docs/how/elasticsearch-search-client-shim.md
This guide explains how to use DataHub's multi-client search engine shim to support different versions of Elasticsearch and OpenSearch through a unified interface.
DataHub's search client shim supports Elasticsearch 7.17, Elasticsearch 8.17+, and OpenSearch 2.x.
This enables smooth migrations between different search engine versions while maintaining backward compatibility with existing DataHub deployments.
The shim consists of several key components:
- SearchClientShim - Main abstraction interface
- SearchClientShimFactory - Factory for creating appropriate client implementations
- Es7CompatibilitySearchClientShim - ES 7.17
- Es8SearchClientShim - ES 8.17+
- OpenSearch2SearchClientShim - OpenSearch 2.x

| Source Engine | Target Engine | Shim Implementation | Status |
|---|---|---|---|
| DataHub → ES 7.17 | ES 7.17 | Es7CompatibilitySearchClientShim | ✅ Complete |
| DataHub → ES 8.17+ | ES 8.17+ | Es8SearchClientShim | ✅ Complete |
| DataHub → OpenSearch 2.x | OpenSearch 2.x | OpenSearch2SearchClientShim | ✅ Complete |
Configure the shim using these environment variables:
# Enable the search client shim (required)
ELASTICSEARCH_SHIM_ENABLED=true
# Specify engine type (or use AUTO_DETECT)
ELASTICSEARCH_SHIM_ENGINE_TYPE=AUTO_DETECT
# Options: AUTO_DETECT, ELASTICSEARCH_7, ELASTICSEARCH_8, OPENSEARCH_2
# Enable auto-detection (recommended)
ELASTICSEARCH_SHIM_AUTO_DETECT=true
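A deployment script can catch typos in these variables before DataHub starts. The sketch below is a hypothetical pre-flight check and not part of DataHub; the function name `validate_shim_env` is an assumption, and the accepted values are only the ones listed above.

```python
from typing import List, Mapping

# Engine types listed in this guide.
VALID_ENGINE_TYPES = {"AUTO_DETECT", "ELASTICSEARCH_7", "ELASTICSEARCH_8", "OPENSEARCH_2"}


def validate_shim_env(env: Mapping[str, str]) -> List[str]:
    """Return the problems found in the shim variables (empty list if none)."""
    problems = []
    # The guide marks this variable as required.
    if env.get("ELASTICSEARCH_SHIM_ENABLED", "").lower() != "true":
        problems.append("ELASTICSEARCH_SHIM_ENABLED must be set to true")
    engine_type = env.get("ELASTICSEARCH_SHIM_ENGINE_TYPE", "")
    if engine_type not in VALID_ENGINE_TYPES:
        problems.append(f"unknown ELASTICSEARCH_SHIM_ENGINE_TYPE: {engine_type!r}")
    return problems
```

Calling `validate_shim_env(os.environ)` in a startup script would report both a missing enable flag and a misspelled engine type in one pass.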
Alternatively, configure via application.yaml:
elasticsearch:
  host: localhost
  port: 9200
  username: ${ELASTICSEARCH_USERNAME:#{null}}
  password: ${ELASTICSEARCH_PASSWORD:#{null}}
  useSSL: false
  # Standard Elasticsearch configuration...
  # Multi-client shim configuration
  shim:
    enabled: true # Enable shim
    engineType: AUTO_DETECT # or specific type
    autoDetectEngine: true # Auto-detect cluster type
Migrating an existing deployment to Elasticsearch 8.x is the most common migration path.
Step 1: Enable the shim
ELASTICSEARCH_SHIM_ENABLED=true
ELASTICSEARCH_SHIM_ENGINE_TYPE=ELASTICSEARCH_8
Step 2: Verify connection
# Check logs for successful connection
Direct migration from Elasticsearch to OpenSearch 2.x.
Configuration:
ELASTICSEARCH_SHIM_ENABLED=true
ELASTICSEARCH_SHIM_ENGINE_TYPE=OPENSEARCH_2
ELASTICSEARCH_SHIM_AUTO_DETECT=true
Let DataHub automatically detect your search engine type:
ELASTICSEARCH_SHIM_ENABLED=true
ELASTICSEARCH_SHIM_ENGINE_TYPE=AUTO_DETECT
ELASTICSEARCH_SHIM_AUTO_DETECT=true
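The shim will detect the engine type of the connected cluster and create the matching SearchClientShim implementation.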
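To make auto-detection concrete, here is a minimal sketch of how an engine type could be derived from the JSON that a cluster returns for `GET /`. It is not DataHub's actual detection code; the function name `detect_engine_type` is hypothetical. It relies on OpenSearch reporting `"distribution": "opensearch"` in the `version` object, a field Elasticsearch does not include.

```python
from typing import Any, Mapping


def detect_engine_type(root_info: Mapping[str, Any]) -> str:
    """Map the JSON body of `GET /` on the cluster to one of this guide's engine types."""
    version = root_info.get("version", {})
    major = int(str(version.get("number", "0")).split(".")[0])
    # OpenSearch identifies itself through the "distribution" field.
    if version.get("distribution") == "opensearch":
        if major == 2:
            return "OPENSEARCH_2"
    elif major == 7:
        return "ELASTICSEARCH_7"
    elif major == 8:
        return "ELASTICSEARCH_8"
    # Anything else is outside the engine types this guide lists.
    raise ValueError(f"Unable to detect search engine type from version {version!r}")
```

If detection fails for a cluster, setting ELASTICSEARCH_SHIM_ENGINE_TYPE explicitly avoids the lookup altogether.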
Update your docker-compose.yml:
services:
  datahub-gms:
    environment:
      - ELASTICSEARCH_SHIM_ENABLED=true
      - ELASTICSEARCH_SHIM_ENGINE_TYPE=AUTO_DETECT
      # ... other ES config
Update your deployment manifests:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: datahub-gms
spec:
  template:
    spec:
      containers:
        - name: datahub-gms
          env:
            - name: ELASTICSEARCH_SHIM_ENABLED
              value: "true"
            - name: ELASTICSEARCH_SHIM_ENGINE_TYPE
              value: "AUTO_DETECT"
            # ... other configuration
Update your values.yaml:
global:
  elasticsearch:
    shim:
      enabled: true
      engineType: "AUTO_DETECT"
      autoDetectEngine: true
docker logs datahub-gms | grep -i "shim\|search"
Look for messages like:
INFO Creating SearchClientShim for engine type: ELASTICSEARCH_7
INFO Auto-detected search engine type: ELASTICSEARCH_7
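For automated checks, the detected engine type can be pulled out of the log stream. The helper below is a sketch that assumes the message format shown above stays stable; `detected_engine_type` is a hypothetical name, not a DataHub utility.

```python
import re
from typing import Iterable, Optional

# Matches the auto-detection message shown above.
DETECTED = re.compile(r"Auto-detected search engine type: (\w+)")


def detected_engine_type(log_lines: Iterable[str]) -> Optional[str]:
    """Return the engine type from the last auto-detection message, or None."""
    result = None
    for line in log_lines:
        match = DETECTED.search(line)
        if match:
            result = match.group(1)
    return result
```

Piping `docker logs datahub-gms` into a script that calls `detected_engine_type(sys.stdin)` gives a value that can be compared against the expected engine type.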
# 1. Check DataHub health endpoint
curl http://localhost:8080/health
# 2. Verify search index access
curl -u user:pass "http://elasticsearch:9200/_cat/indices?v"
# 3. Test search functionality
curl -X POST "http://localhost:8080/api/graphql" \
-H "Content-Type: application/json" \
-d '{"query": "{ search(input: {type: DATASET, query: \"*\"}) { total }}"}'
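The nested quoting in the GraphQL check is easy to get wrong by hand. The sketch below builds the same request body programmatically and reads the hit count from a response; both function names are hypothetical, and the response shape assumed is the standard GraphQL `data`/`errors` envelope.

```python
import json


def search_request_body(entity_type: str = "DATASET", query: str = "*") -> str:
    """Build the JSON body for the GraphQL search check shown above."""
    graphql = (
        "{ search(input: {type: " + entity_type
        + ", query: " + json.dumps(query) + "}) { total }}"
    )
    return json.dumps({"query": graphql})


def search_total(response_body: str) -> int:
    """Extract the hit count from the GraphQL response; raise if errors are reported."""
    payload = json.loads(response_body)
    if payload.get("errors"):
        raise RuntimeError(f"search failed: {payload['errors']}")
    return payload["data"]["search"]["total"]
```

The string returned by `search_request_body()` can be passed to `curl -d` or any HTTP client, and `search_total` turns the reply into a number that a smoke test can assert on.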
ERROR: Unable to connect to search cluster
Solutions:
- Verify ELASTICSEARCH_HOST and ELASTICSEARCH_PORT

ERROR: Unable to detect search engine type
Solutions:
- Set the engine type explicitly, for example ELASTICSEARCH_SHIM_ENGINE_TYPE=ELASTICSEARCH_8
- Check that the cluster is reachable: curl http://elasticsearch:9200/_cluster/health

ERROR: Incompatible API version
Solutions:
ERROR: ClassNotFoundException for ES client
Solutions:
- Check build.gradle for required dependencies

Enable debug logging to troubleshoot issues:
# Add to environment
DATAHUB_LOG_LEVEL=DEBUG
ELASTICSEARCH_SHIM_DEBUG=true
Monitor key metrics during migration:
# Connection pool metrics
curl "http://localhost:8080/actuator/metrics/elasticsearch.connections"
# Search operation metrics
curl "http://localhost:8080/actuator/metrics/elasticsearch.search"
# Error rates
curl "http://localhost:8080/actuator/metrics/elasticsearch.errors"
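When polling these endpoints from a script, it helps to reduce each response to plain numbers. The sketch below assumes the endpoints return the standard Spring Boot Actuator metric JSON (a `measurements` list of `statistic`/`value` pairs); the helper name `measurements` is hypothetical, and the metric names are the ones listed above.

```python
import json
from typing import Dict


def measurements(metric_body: str) -> Dict[str, float]:
    """Flatten an Actuator metric response into {statistic: value}."""
    payload = json.loads(metric_body)
    return {m["statistic"]: m["value"] for m in payload.get("measurements", [])}
```

Comparing the `COUNT` value of the error metric before and after a migration step is one simple way to spot a regression.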
To extend the shim for additional search engines:
1. Implement the SearchClientShim interface
2. Add the new engine type to the SearchEngineType enum
3. Update SearchClientShimFactory to handle the new engine type

| DataHub Version | ES 7.17 | ES 8.x | OpenSearch 2.x |
|---|---|---|---|
| 0.3.15+ | ✅ Full | ✅ 8.17+ | ✅ Full |
| Future | ✅ Full | ✅ Full | ✅ Full |
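The matrix can be encoded as a small check to run against a cluster's reported version before migrating. This is a sketch under one reading of the table: "8.17+" is taken to mean 8.x releases from 8.17 onward, and "ES 7.17" to mean any 7.17.x patch release. The function names are hypothetical.

```python
from typing import Tuple


def parse_version(version: str) -> Tuple[int, int]:
    """Return (major, minor) from a version string such as '8.17.1'."""
    major, minor = version.split(".")[:2]
    return int(major), int(minor)


def is_supported(engine: str, version: str) -> bool:
    """Check a version against the matrix above; engine is 'elasticsearch' or 'opensearch'."""
    major, minor = parse_version(version)
    if engine == "opensearch":
        return major == 2
    if engine == "elasticsearch":
        return (major, minor) == (7, 17) or (major == 8 and minor >= 17)
    return False
```

For example, `is_supported("elasticsearch", "8.11.0")` returns `False` under this reading, which would flag a cluster that needs upgrading before the Elasticsearch 8 shim is used.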
A: Yes, the shim is backward compatible. It is a thin abstraction layer over the existing code.
A: No, DataHub connects to one search cluster at a time. Use the shim to switch between different engine types.
For additional support, join the conversation in the DataHub Community Slack or file an issue in the GitHub repository.