docs/how/migrating-elasticsearch-opensearch.md
:::caution Data Loss Warning
Switching from Elasticsearch to OpenSearch will result in the loss of all timeseries data. This includes:
If you need to preserve this historical data, consider exporting it before migration or plan for a data retention gap. See the Preserving Timeseries Data section below for how to back up and restore this data.
:::
OpenSearch offers several advantages over Elasticsearch for DataHub deployments:
OpenSearch is required for DataHub's Semantic Search feature. Semantic search uses vector embeddings and k-NN (k-nearest neighbors) algorithms to find semantically similar assets based on meaning rather than just keyword matching. This enables:
Elasticsearch does not support the vector search capabilities needed for this functionality.
If you want to provide your own OpenSearch instance, keep the `opensearch.enabled` setting `false` and skip to the next step.
Disable Elasticsearch. Note that this can be done as the last step if you want to migrate data first or keep a backup.
```diff
 elasticsearch:
   # set this to false, if you want to provide your own ES instance.
-  enabled: true
+  enabled: false
```
Enable the OpenSearch cluster. See the OpenSearch helm chart for a full list of values and options.
```diff
 opensearch:
-  enabled: false
+  enabled: true
```
Run `helm upgrade` to apply the changes:
```shell
helm upgrade prerequisites datahub/datahub-prerequisites --values <<path-to-values-file>>
```
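Before pointing DataHub at the new cluster, it can help to confirm that OpenSearch reports a healthy status. The following is a minimal sketch, assuming the cluster is reachable over plain HTTP without authentication (for example through `kubectl port-forward svc/opensearch-cluster-master 9200:9200`, where the service name is assumed to match the default hostname used in this guide). The function names `extract_status` and `cluster_status` are illustrative and not part of any tool.

```shell
#!/usr/bin/env bash
# Sketch: read the cluster status from the _cluster/health API.

# Pull the "status" field out of a _cluster/health response read from stdin.
extract_status() {
  sed -n 's/.*"status"[[:space:]]*:[[:space:]]*"\([a-z]*\)".*/\1/p'
}

# Query <base-url>/_cluster/health and print the reported status.
# The base URL is an assumption; adjust scheme, host, and credentials as needed.
cluster_status() {
  curl -fsS "$1/_cluster/health" | extract_status
}
```

Calling `cluster_status http://localhost:9200` should print `green`, `yellow`, or `red`, depending on the state of the cluster.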
```diff
   # If you want to use OpenSearch instead of ElasticSearch add the USE_AWS_ELASTICSEARCH environment variable below
-  # extraEnvs:
-  #   - name: USE_AWS_ELASTICSEARCH
-  #     value: "true"
+  extraEnvs:
+    - name: USE_AWS_ELASTICSEARCH
+      value: "true"
```
If you are bringing your own OpenSearch Cluster, set the cluster hostname to your OpenSearch Domain.
```diff
 elasticsearch:
-# host: "elasticsearch-master"
   # If you want to use OpenSearch instead of ElasticSearch use different hostname below
+  host: "opensearch-cluster-master"
 ...
   # Elasticsearch/OpenSearch implementation configuration
-  implementation: "elasticsearch" # Sets ELASTICSEARCH_IMPLEMENTATION - "elasticsearch" or "opensearch"
+  implementation: "opensearch" # Sets ELASTICSEARCH_IMPLEMENTATION - "elasticsearch" or "opensearch"
```
Run `helm upgrade` to apply the changes:
```shell
helm upgrade datahub datahub/datahub --values <<path-to-values-file>>
```
Run a Restore-Indices job to complete the migration.
```shell
kubectl create job --from=cronjob/datahub-datahub-restore-indices-job-template datahub-restore-indices-adhoc
```
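To confirm that the ad hoc job has finished before relying on search results, something like the following may help. It is a sketch that assumes the job name from the command above and the current `kubectl` namespace. The `KUBECTL` variable and the `wait_for_restore` function are illustrative names, and the 30-minute timeout is an arbitrary placeholder.

```shell
#!/usr/bin/env bash
# Hypothetical helper: wait for the ad hoc restore-indices job, then show its logs.
# KUBECTL can be overridden (for example with "echo") to preview the commands.
KUBECTL="${KUBECTL:-kubectl}"

wait_for_restore() {
  local job="${1:-datahub-restore-indices-adhoc}"  # job name from the step above
  local timeout="${2:-30m}"                        # assumed timeout; adjust to data volume
  # Block until the job reports the Complete condition, or fail on timeout.
  $KUBECTL wait --for=condition=complete "job/${job}" --timeout="${timeout}" || return 1
  # Print the last lines of the job logs for a quick sanity check.
  $KUBECTL logs "job/${job}" --tail=20
}
```

Running `wait_for_restore` with no arguments waits on `datahub-restore-indices-adhoc`; pass a different job name or timeout as the first and second arguments if your setup differs.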
We recommend exporting the Elasticsearch data using a tool such as elasticsearch-dump before migrating to OpenSearch.
The easiest approach is to set up the OpenSearch cluster before tearing down the Elasticsearch cluster, so that the data can be copied directly from one to the other:
```shell
elasticdump \
  --input=http://elasticsearch-domain:9200/ \
  --output=http://opensearch-domain:9200/ \
  --type=data
```
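If you prefer to copy only selected indices, the same tool can be pointed at one index at a time. The sketch below copies the mapping and then the documents for each index name it is given. It assumes both clusters are reachable without authentication; the `ELASTICDUMP` variable and the `copy_index` and `copy_indices` functions are illustrative names, and the index names you pass are up to you (they can be listed with the `_cat/indices` API on the source cluster).

```shell
#!/usr/bin/env bash
# Sketch: copy the mapping, then the documents, for each listed index.
# ELASTICDUMP can be overridden (for example with "echo") to preview the commands.
ELASTICDUMP="${ELASTICDUMP:-elasticdump}"

copy_index() {
  local src="$1" dst="$2" index="$3"
  # Copy the mapping first so field types are preserved on the target.
  $ELASTICDUMP --input="${src}/${index}" --output="${dst}/${index}" --type=mapping || return 1
  # Then copy the documents themselves.
  $ELASTICDUMP --input="${src}/${index}" --output="${dst}/${index}" --type=data
}

copy_indices() {
  local src="$1" dst="$2"
  shift 2
  local index
  # Stop at the first index that fails to copy.
  for index in "$@"; do
    copy_index "$src" "$dst" "$index" || return 1
  done
}
```

For example, `copy_indices http://elasticsearch-domain:9200 http://opensearch-domain:9200 my_timeseries_index` would copy a single index, where `my_timeseries_index` is a placeholder for a real index name.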
The export and import can also be done independently to a file.
```shell
# Export from Elasticsearch to a file
elasticdump \
  --input=http://elasticsearch-domain:9200/ \
  --output=/data/datahub_data.json \
  --type=data

# Import the file into OpenSearch
elasticdump \
  --input=/data/datahub_data.json \
  --output=http://opensearch-domain:9200/ \
  --type=data
```
See elasticsearch-dump for a full list of supported options.
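After the copy, a quick sanity check is to compare document counts for an index on both clusters. The following is a minimal sketch using the `_count` API, which both Elasticsearch and OpenSearch expose. It assumes unauthenticated HTTP access; the function names are illustrative, and the index name is whatever index you copied.

```shell
#!/usr/bin/env bash
# Sketch: compare document counts for one index on both clusters.

# Pull the numeric "count" field out of a _count API response read from stdin.
extract_count() {
  sed -n 's/.*"count"[[:space:]]*:[[:space:]]*\([0-9][0-9]*\).*/\1/p'
}

# Query <base-url>/<index>/_count and print the number of documents.
doc_count() {
  curl -fsS "$1/$2/_count" | extract_count
}

# Print both counts; succeed only when they are non-empty and equal.
compare_counts() {
  local src dst
  src="$(doc_count "$1" "$3")"
  dst="$(doc_count "$2" "$3")"
  echo "$3: source=${src} target=${dst}"
  [ -n "$src" ] && [ "$src" = "$dst" ]
}
```

A call such as `compare_counts http://elasticsearch-domain:9200 http://opensearch-domain:9200 my_timeseries_index` returns a non-zero exit code when the counts differ or a request fails, which makes it easy to use in a script. Here `my_timeseries_index` is again a placeholder.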