docs/authentication/README.md
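Authentication is the process of verifying the identity of a user or service. There are two places where authentication occurs inside DataHub: in the frontend, when a user logs in to the DataHub application, and in the backend (Metadata Service), when a request is made to DataHub's APIs.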
In this document, we'll take a closer look at both.
Authentication of normal users of DataHub takes place in two phases.
At login time, authentication is performed by either DataHub itself (via username / password entry) or a third-party Identity Provider. Once the identity of the user has been established, and credentials validated, a persistent session token is generated for the user and stored in a browser-side session cookie.
DataHub provides 3 mechanisms for authentication at login time:
In subsequent requests, the session token represents the authenticated identity of the user and is validated by DataHub's backend service (discussed below). Eventually, the session token expires (after 24 hours by default), at which point the end user is required to log in again.
DataHub also allows Guest users to access the system without an explicit login. Guest authentication is disabled in the default configuration. When Guest access is enabled, accessing DataHub through a configurable URL path logs the user in as an existing user that is designated as the guest. The privileges of guest sessions are controlled by adjusting the privileges of that designated guest user.
When a user makes a request for data within DataHub, the request is authenticated by DataHub's Backend (Metadata Service) via a JSON Web Token. This applies both to requests originating from the DataHub application and to programmatic calls to DataHub APIs. There are two types of tokens that are important:
The duration for which a session token is valid is configurable via the MAX_SESSION_TOKEN_AGE environment variable on the datahub-frontend deployment. Additionally, AUTH_SESSION_TTL_HOURS configures the expiration time of the actor cookie in the user's browser; when that cookie expires, the user is also prompted to log in again. The difference between the two is that actor cookie expiration affects only the browser session, so the session token can still be used programmatically. Once the session token itself expires, it can no longer be used programmatically either, because it is created as a JWT with an expiration claim.
To learn more about DataHub's backend authentication, check out Introducing Metadata Service Authentication.
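
Because the session token is a JWT with an expiration claim, a client can check how long a token remains usable before sending it. The following is a minimal sketch, assuming the token uses the standard three-segment JWT layout and the standard exp claim (seconds since the epoch). The function names are hypothetical, and the payload is only decoded, not verified; signature validation remains the job of the Metadata Service.

```python
import base64
import json
import time
from typing import Optional


def jwt_expiry(token: str) -> Optional[int]:
    """Return the `exp` claim from a JWT payload, or None if it is absent.

    This only decodes the payload; it does NOT verify the signature.
    """
    parts = token.split(".")
    if len(parts) != 3:
        raise ValueError("expected a JWT with three dot-separated segments")
    payload_b64 = parts[1]
    # JWT segments are base64url-encoded without padding; restore it first.
    payload_b64 += "=" * (-len(payload_b64) % 4)
    payload = json.loads(base64.urlsafe_b64decode(payload_b64))
    return payload.get("exp")


def is_expired(token: str, now: Optional[float] = None) -> bool:
    """Return True if the token's `exp` claim is at or before `now`."""
    exp = jwt_expiry(token)
    if exp is None:
        # Assumption for this sketch: no `exp` claim means no expiry.
        return False
    current = time.time() if now is None else now
    return current >= exp
```

A script that reuses a session token could call is_expired(token) before each batch of requests and prompt for a fresh login when it returns True.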
Credentials must be provided as Bearer Tokens inside of the Authorization header in any request made to DataHub's API layer.
Authorization: Bearer <your-token>
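
As an illustration of the header above, here is a minimal Python sketch that builds a request carrying a Bearer token using only the standard library. The URL, the request body, and the build_request helper are assumptions made for the example; substitute the endpoint and payload of the API you are actually calling.

```python
import json
import urllib.request


def build_request(url: str, token: str, body: dict) -> urllib.request.Request:
    """Build a POST request that carries the token as a Bearer credential."""
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        method="POST",
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
    )


# Hypothetical endpoint and placeholder token, for illustration only.
request = build_request(
    "http://localhost:8080/api/graphql",
    "<your-token>",
    {"query": "{ me { corpUser { urn } } }"},
)
# urllib.request.urlopen(request) would send it; omitted so the sketch
# runs without a live DataHub instance.
```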
As with the frontend, the backend can optionally enable Guest authentication. If Guest authentication is enabled, all API calls made to the backend without an Authorization header are treated as coming from the guest user, and the privileges associated with the designated guest user apply to those requests.
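
The rule described above can be modeled in a few lines. This is an illustrative sketch of the decision only, not DataHub's actual implementation: the classify_request function, the GUEST_USER value, and the use of PermissionError are all hypothetical.

```python
from typing import Mapping, Tuple

GUEST_USER = "guest"  # hypothetical name for the designated guest user


def classify_request(headers: Mapping[str, str], guest_enabled: bool) -> Tuple[str, str]:
    """Decide how a request would be treated, based on its headers.

    Returns ("token", <token>) when a Bearer token is present (it would
    still need to be validated), or ("guest", GUEST_USER) when there is no
    Authorization header and guest authentication is enabled.
    """
    # HTTP header names are case-insensitive.
    auth = next((v for k, v in headers.items() if k.lower() == "authorization"), None)
    if auth is None:
        if guest_enabled:
            return ("guest", GUEST_USER)
        raise PermissionError("missing Authorization header")
    scheme, _, token = auth.partition(" ")
    if scheme.lower() != "bearer" or not token.strip():
        raise PermissionError("expected 'Authorization: Bearer <token>'")
    return ("token", token.strip())
```

Note that in this model a request that does carry an Authorization header is never downgraded to guest; only requests with no header at all fall back to the guest user.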
Note that in DataHub local quickstarts, authentication at the backend layer is disabled for convenience. This leaves the backend vulnerable to unauthenticated requests, so this configuration should not be used in production. To enable backend (token-based) authentication, set the METADATA_SERVICE_AUTH_ENABLED=true environment variable for the datahub-gms container or pod.
It is also recommended to provide your own values for DATAHUB_TOKEN_SERVICE_SIGNING_KEY and DATAHUB_TOKEN_SERVICE_SALT, which are used to sign and verify the access tokens generated by the application. Note that starting with v1.5.0, the default values for these variables have been removed from the code.
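
One way to produce such values is with a cryptographically secure random generator. The sketch below uses Python's secrets module; the 32-byte length and the URL-safe text encoding are assumptions, since this document does not state a required length or format for either variable, and the function name is hypothetical.

```python
import secrets


def generate_token_service_secrets(num_bytes: int = 32) -> dict:
    """Generate random values for the two token-service variables.

    Assumption: any sufficiently long random string is acceptable; check
    your DataHub version's requirements before relying on this.
    """
    return {
        "DATAHUB_TOKEN_SERVICE_SIGNING_KEY": secrets.token_urlsafe(num_bytes),
        "DATAHUB_TOKEN_SERVICE_SALT": secrets.token_urlsafe(num_bytes),
    }


if __name__ == "__main__":
    # Print in NAME=value form, suitable for pasting into an env file.
    for name, value in generate_token_service_secrets().items():
        print(f"{name}={value}")
```

Generate the values once and store them in your secret manager or deployment configuration. Regenerating them would be expected to invalidate previously issued tokens, since those were signed with the old key.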
For a quick video on the topic of users and groups within DataHub, have a look at DataHub Basics — Users, Groups, & Authentication 101