DataHub is an open-source metadata platform and data catalog for the modern data stack, built to enable end-to-end data discovery, data observability, and data governance. It supports a range of data sources, including PostgreSQL.
Because YugabyteDB's YSQL API is wire-compatible with PostgreSQL, DataHub can connect to YugabyteDB as a data source using the PostgreSQL plugin.
You can run the Docker Compose quickstart example provided in the DataHub GitHub repository against YugabyteDB by configuring DataHub to use the PostgreSQL JDBC driver, org.postgresql.Driver.
Make changes in the following files:
In docker/quickstart/docker-compose-without-neo4j.quickstart.yml, change the following:
Change the EBEAN_DATASOURCE configuration [lines 80-84 and 126-130] as follows:
EBEAN_DATASOURCE_DRIVER=org.postgresql.Driver
EBEAN_DATASOURCE_HOST=yugabyte:5433
EBEAN_DATASOURCE_PASSWORD=yugabyte
EBEAN_DATASOURCE_URL=jdbc:postgresql://yugabyte:5433/yugabyte
EBEAN_DATASOURCE_USERNAME=yugabyte
Change mysql-setup to postgres-setup [line 123].
Replace the mysql and mysql-setup containers [lines 197-231] with the yugabyte and postgres-setup containers as follows:
  yugabyte:
    container_name: yugabyte
    hostname: yugabyte
    image: yugabytedb/yugabyte:latest
    command: /bin/bash /home/yugabyte/docker-entrypoint-initdb.d/yb-init.sh
    environment:
      POSTGRES_USER: ${POSTGRES_USER:-yugabyte}
      POSTGRES_PASSWORD: ${POSTGRES_PASSWORD:-yugabyte}
    ports:
      - '5433:5433'
    volumes:
      - ./yb-setup/:/home/yugabyte/docker-entrypoint-initdb.d/
    healthcheck:
      test: bin/ysqlsh -h `hostname -i` -U yugabyte -tAc 'select 1' -d yugabyte
      interval: 10s
      timeout: 5s
      retries: 20
  postgres-setup:
    container_name: postgres-setup
    depends_on:
      yugabyte:
        condition: service_healthy
    environment:
      - POSTGRES_HOST=yugabyte
      - POSTGRES_PORT=5433
      - POSTGRES_USERNAME=yugabyte
      - POSTGRES_PASSWORD=yugabyte
      - DATAHUB_DB_NAME=yugabyte
    hostname: yugabyte-setup
    image: ${DATAHUB_POSTGRES_SETUP_IMAGE:-acryldata/datahub-postgres-setup}:${DATAHUB_VERSION:-head}
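After editing the compose file, a quick sanity check can catch a missed line before you start the containers. The following is a minimal sketch, not part of DataHub or YugabyteDB: `check_compose` is a hypothetical helper name, and it only greps for the values described in the preceding steps (the PostgreSQL driver, the `yugabyte:5433` JDBC URL, and the absence of `mysql-setup`).

```shell
# check_compose: hypothetical helper that greps an edited compose file
# for the settings described above. Returns 0 if everything looks right.
# Usage: check_compose <path-to-compose-file>
check_compose() {
  file="$1"
  status=0
  # Every EBEAN_DATASOURCE_DRIVER entry should name the PostgreSQL driver.
  if grep -F 'EBEAN_DATASOURCE_DRIVER' "$file" | grep -qvF 'org.postgresql.Driver'; then
    echo "non-PostgreSQL EBEAN driver found"
    status=1
  fi
  # The JDBC URL should point at the yugabyte service on the YSQL port.
  if ! grep -qF 'jdbc:postgresql://yugabyte:5433/' "$file"; then
    echo "EBEAN_DATASOURCE_URL does not target yugabyte:5433"
    status=1
  fi
  # No reference to the mysql-setup service should remain.
  if grep -qF 'mysql-setup' "$file"; then
    echo "leftover mysql-setup reference"
    status=1
  fi
  return "$status"
}
```

For example, `check_compose docker/quickstart/docker-compose-without-neo4j.quickstart.yml` prints nothing and returns 0 when all three checks pass.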
In docker/quickstart/, create a directory named yb-setup, and in it create a script file named yb-init.sh (that is, docker/quickstart/yb-setup/yb-init.sh) with the following content. The script runs during container initialization to launch the YugabyteDB cluster and load init.sql.
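# Start a single-node YugabyteDB cluster.
bin/yugabyted start
# Give YSQL a few seconds to start accepting connections.
sleep 5
# Run the DataHub schema script against the node.
bin/ysqlsh -h `hostname -i` -f /home/yugabyte/docker-entrypoint-initdb.d/init.sql
# Keep the container alive after initialization.
tail -f /dev/null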
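The fixed `sleep 5` in the script assumes YSQL is ready within five seconds. If that proves too short on a slow machine, one possible alternative is to poll until a probe query succeeds, much like the healthcheck in the compose file does. The sketch below is an assumption rather than part of the documented setup; `wait_until` is a hypothetical helper name, and the probe in the usage comment reuses the `ysqlsh` invocation from the healthcheck.

```shell
# wait_until: hypothetical helper that retries a probe command until it
# succeeds or the attempts run out.
# Usage: wait_until <retries> <interval_seconds> <command> [args...]
wait_until() {
  retries="$1"
  interval="$2"
  shift 2
  attempt=1
  while [ "$attempt" -le "$retries" ]; do
    # Run the probe quietly; success ends the wait immediately.
    if "$@" >/dev/null 2>&1; then
      return 0
    fi
    attempt=$((attempt + 1))
    sleep "$interval"
  done
  # All attempts failed.
  return 1
}

# Possible use inside yb-init.sh in place of `sleep 5`:
# wait_until 20 1 bin/ysqlsh -h "$(hostname -i)" -U yugabyte -d yugabyte -tAc 'select 1'
```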
Copy the file docker/postgres/init.sql to docker/quickstart/yb-setup/.
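The directory, script, and copy steps can also be scripted. The following is a rough sketch under the assumption that you have a local checkout of the DataHub repository; `prepare_yb_setup` is a hypothetical helper name, and the script body it writes is the four-command yb-init.sh described in this section.

```shell
# prepare_yb_setup: hypothetical helper that creates docker/quickstart/yb-setup/
# containing yb-init.sh and a copy of docker/postgres/init.sql.
# Usage: prepare_yb_setup <path-to-datahub-checkout>
prepare_yb_setup() {
  repo="$1"
  setup_dir="$repo/docker/quickstart/yb-setup"
  mkdir -p "$setup_dir" || return 1
  # Write the init script; the quoted heredoc keeps the backticks literal
  # so that `hostname -i` is evaluated inside the container, not here.
  cat > "$setup_dir/yb-init.sh" <<'EOF'
bin/yugabyted start
sleep 5
bin/ysqlsh -h `hostname -i` -f /home/yugabyte/docker-entrypoint-initdb.d/init.sql
tail -f /dev/null
EOF
  # Copy the PostgreSQL schema script next to the init script.
  cp "$repo/docker/postgres/init.sql" "$setup_dir/"
}
```

For example, `prepare_yb_setup ./datahub` would populate `./datahub/docker/quickstart/yb-setup/`, assuming the repository was cloned into `./datahub`.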
From the docker/quickstart/ directory, run the example using the following command:
docker compose -f docker-compose-without-neo4j.quickstart.yml up -d
After all the containers are running, you can ingest demo data by running ./datahub/docker/ingestion/ingestion.sh, or open http://localhost:9002 (username: datahub, password: datahub) to access the UI.