docs/authentication/concepts.md
We introduced a few important concepts to the Metadata Service to make authentication work:
In the following sections, we'll take a closer look at each individually.
High-level overview of Metadata Service Authentication
An Actor is a concept within the new Authentication subsystem to represent a unique identity / principal that is initiating actions (e.g. read & write requests) on the platform.
An actor is characterized by two attributes, a type and an id.
For example, the root "datahub" super user would have the following attributes:
{
  "type": "USER",
  "id": "datahub"
}
This actor maps to the following CorpUser urn, which is used for Metadata retrieval:
urn:li:corpuser:datahub
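As a rough illustration of this mapping, the following Python sketch models an Actor and derives its urn. The `Actor` class and `actor_to_urn` helper are hypothetical names chosen for this example, not part of DataHub's API, and only the USER type from the example above is handled.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Actor:
    """Hypothetical stand-in for an Actor: a type plus an id."""
    type: str
    id: str


def actor_to_urn(actor: Actor) -> str:
    """Map a USER actor to its CorpUser urn (other types are out of scope here)."""
    if actor.type != "USER":
        raise ValueError(f"unsupported actor type: {actor.type}")
    return f"urn:li:corpuser:{actor.id}"


print(actor_to_urn(Actor(type="USER", id="datahub")))  # urn:li:corpuser:datahub
```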
An Authenticator is a pluggable component inside the Metadata Service that is responsible for authenticating an inbound request, given context about the request (currently, the request headers). Authentication boils down to successfully resolving an Actor to associate with the inbound request.
There can be many types of Authenticator. A key goal of the abstraction is extensibility: a custom Authenticator can be developed to authenticate requests based on an organization's unique needs.
DataHub ships with 3 Authenticators by default:
DataHubSystemAuthenticator: Verifies that inbound requests have originated from inside DataHub itself using a shared system identifier and secret. This authenticator is always present.
DataHubTokenAuthenticator: Verifies that inbound requests contain a DataHub-issued Access Token (discussed further in the "DataHub Access Token" section below) in their 'Authorization' header. This authenticator is required if Metadata Service Authentication is enabled.
DataHubGuestAuthenticator: Verifies that guest authentication is enabled and a guest user is configured, then allows unauthenticated users to perform operations as the designated guest user. By default, this Authenticator is disabled. It must be enabled explicitly, and enabling it requires a restart of the DataHub GMS service.
An AuthenticatorChain is a series of Authenticators that are configured to run one after another. This allows for configuring multiple ways to authenticate a given request, for example via LDAP or via a local key file.
A request is rejected only if every Authenticator in the chain fails to authenticate it.
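The chain's behavior can be sketched in a few lines of Python. This is a minimal illustration under assumptions, not DataHub's implementation: the `Authenticator` callable shape, the `AuthenticationError` type, and the two toy authenticators are all hypothetical.

```python
from typing import Callable, Mapping, Optional, Sequence

# Hypothetical shape: an authenticator takes request headers and returns an
# actor id on success, or None if it cannot authenticate the request.
Authenticator = Callable[[Mapping[str, str]], Optional[str]]


class AuthenticationError(Exception):
    """Raised when no authenticator in the chain accepts the request."""


def authenticate(chain: Sequence[Authenticator], headers: Mapping[str, str]) -> str:
    """Try each authenticator in order; the first success wins."""
    for authenticator in chain:
        actor = authenticator(headers)
        if actor is not None:
            return actor
    # Every authenticator failed, so the request is rejected.
    raise AuthenticationError("no authenticator could resolve an actor")


# Two toy authenticators (illustrative only, with made-up header values).
def token_authenticator(headers):
    return "alice" if headers.get("Authorization") == "Bearer good-token" else None


def key_file_authenticator(headers):
    return "service" if headers.get("X-Key") == "local-key" else None


chain = [token_authenticator, key_file_authenticator]
print(authenticate(chain, {"X-Key": "local-key"}))  # service
```

Because the loop returns on the first success, the order of the chain decides which Authenticator resolves a request that several of them could accept.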
The Authenticator Chain can be configured in the application.yaml file under authentication.authenticators:
authentication:
  ....
  authenticators:
    # Configure the Authenticators in the chain
    - type: com.datahub.authentication.Authenticator1
      ...
    - type: com.datahub.authentication.Authenticator2
      ....
DataHub uses a two-tier authentication system that decouples authentication extraction from enforcement:
The AuthenticationExtractionFilter is the foundation servlet filter that runs for every request to the Metadata Service. Its single responsibility:
The second tier consists of enforcement mechanisms that can be implemented in multiple ways:
The default enforcement filter that:
/health, /config)
The decoupled design enables flexible enforcement strategies:
This separation of concerns provides several advantages:
The AuthenticationContext is a thread-local storage mechanism that bridges the extraction and enforcement tiers. It serves as the universal authentication state for the entire request lifecycle:
This context enables consistent authentication decisions across all parts of the system. Whether enforcement happens in a servlet filter, a controller method, or custom business logic, each works with the same authentication information established during the extraction phase.
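To make the split between extraction and enforcement concrete, here is a small Python sketch built around a thread-local context. It rests on several assumptions that are not confirmed by the text: that extraction only records the resolved actor and never rejects, that /health and /config are paths allowed through without authentication, and that a rejected request receives a 401 status. All function names and the toy token check are hypothetical.

```python
import threading
from typing import Mapping, Optional

# Thread-local holder, standing in for the AuthenticationContext.
_context = threading.local()

# Assumption: paths that the enforcement tier lets through unauthenticated.
EXCLUDED_PATHS = {"/health", "/config"}


def extract(headers: Mapping[str, str]) -> None:
    """Tier 1: resolve an actor if possible and record it (assumed never to reject)."""
    token = headers.get("Authorization", "")
    # Toy resolution rule, for illustration only.
    _context.actor = "datahub" if token == "Bearer good-token" else None


def current_actor() -> Optional[str]:
    """Read the authentication state established during extraction."""
    return getattr(_context, "actor", None)


def enforce(path: str) -> int:
    """Tier 2: decide based on the shared context; returns an HTTP status."""
    if path in EXCLUDED_PATHS or current_actor() is not None:
        return 200
    return 401  # assumed status for unauthenticated access to a protected path


extract({})
print(enforce("/health"), enforce("/entities"))  # 200 401
extract({"Authorization": "Bearer good-token"})
print(enforce("/entities"))  # 200
```

Any other enforcement point, such as a controller method, could call `current_actor()` and reach the same decision, which is the point of keeping the state in one shared place.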
Along with Metadata Service Authentication comes an important new component called the DataHub Token Service. The purpose of this component is twofold:
Access Tokens granted by the Token Service take the form of JSON Web Tokens (JWTs), a type of stateless token that has a finite lifespan and is verified using a unique signature. JWTs can also contain a set of embedded claims. Tokens issued by the Token Service contain the following claims:
Today, Access Tokens are granted by the Token Service under two scenarios:
The datahub-frontend service issues a request to the Metadata Service to generate a SESSION token on behalf of the user logging in. (Only the frontend service is authorized to perform this action.)
At present, the Token Service supports the symmetric signing method HS256 to generate and verify tokens.
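For readers unfamiliar with HS256, the sketch below signs and verifies a JWT using only the Python standard library. It illustrates the general technique rather than the Token Service's code: the claim names (`sub`, `type`, `exp`) and the secret are placeholders chosen for this example, and the actual claims issued by DataHub may differ.

```python
import base64
import hashlib
import hmac
import json
import time


def _b64url(data: bytes) -> str:
    """Base64url-encode without padding, as JWTs require."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def _b64url_decode(text: str) -> bytes:
    """Decode base64url, restoring the stripped padding."""
    return base64.urlsafe_b64decode(text + "=" * (-len(text) % 4))


def issue_token(claims: dict, secret: bytes) -> str:
    """Build header.payload and sign it with HMAC-SHA256 (HS256)."""
    header = {"alg": "HS256", "typ": "JWT"}
    signing_input = ".".join(
        _b64url(json.dumps(part, separators=(",", ":")).encode("utf-8"))
        for part in (header, claims)
    )
    signature = hmac.new(secret, signing_input.encode("ascii"), hashlib.sha256).digest()
    return f"{signing_input}.{_b64url(signature)}"


def verify_token(token: str, secret: bytes) -> dict:
    """Check the signature, algorithm, and expiry, then return the claims."""
    parts = token.split(".")
    if len(parts) != 3:
        raise ValueError("malformed token")
    header_b64, payload_b64, signature_b64 = parts
    expected = hmac.new(
        secret, f"{header_b64}.{payload_b64}".encode("ascii"), hashlib.sha256
    ).digest()
    # Constant-time comparison avoids leaking how many bytes matched.
    if not hmac.compare_digest(expected, _b64url_decode(signature_b64)):
        raise ValueError("bad signature")
    if json.loads(_b64url_decode(header_b64)).get("alg") != "HS256":
        raise ValueError("unexpected algorithm")
    claims = json.loads(_b64url_decode(payload_b64))
    if claims.get("exp", 0) <= time.time():
        raise ValueError("token expired")
    return claims


secret = b"example-signing-key"  # placeholder, not a real key
token = issue_token(
    {"sub": "datahub", "type": "SESSION", "exp": int(time.time()) + 3600}, secret
)
print(verify_token(token, secret)["sub"])  # datahub
```

Because HS256 is symmetric, the same secret both issues and verifies tokens, so any component able to verify a token could also mint one.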
Now that we're familiar with the concepts, we will talk concretely about what new capabilities have been built on top of Metadata Service Authentication.
The Guest Authentication configuration is present in two configuration files - the application.conf for DataHub frontend, and
application.yaml for GMS. To enable Guest Authentication, set the environment variable GUEST_AUTHENTICATION_ENABLED to true
for both the GMS and the frontend service and restart those services.
If enabled, the default user designated as guest is called guest. This user must be created explicitly and granted privileges,
which determine what a guest is allowed to do.
A recommended approach to operationalizing guest access is to first create a designated guest user account with login credentials while keeping guest access disabled. This allows you to configure and test the exact permissions this user should have. Once you've confirmed the privileges are set correctly, you can then enable guest access, which removes the need for login credentials while maintaining the verified permission settings.
The name of the designated guest user can be changed by defining the env var GUEST_AUTHENTICATION_USER.
The entry URL to authenticate as the guest user is /public and can be changed via the env var GUEST_AUTHENTICATION_PATH.
Here are the relevant portions of the two configs:
For the Frontend
#application.conf
...
auth.guest.enabled = ${?GUEST_AUTHENTICATION_ENABLED}
# The name of the guest user id
auth.guest.user = ${?GUEST_AUTHENTICATION_USER}
# The path to bypass login page and get logged in as guest
auth.guest.path = ${?GUEST_AUTHENTICATION_PATH}
...
and for GMS
#application.yaml
# Required if enabled is true! A configurable chain of Authenticators
...
authenticators:
  ...
  - type: com.datahub.authentication.authenticator.DataHubGuestAuthenticator
    configs:
      guestUser: ${GUEST_AUTHENTICATION_USER:guest}
      enabled: ${GUEST_AUTHENTICATION_ENABLED:false}
...
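The way these two settings interact can be sketched in Python as follows. This is an illustration under assumptions, not the DataHubGuestAuthenticator itself: the helper names are hypothetical, the defaults mirror the `false` and `guest` fallbacks in the YAML above, and the guest urn is assumed to follow the same CorpUser pattern as other users.

```python
import os
from typing import Mapping, Optional


def guest_settings(env: Mapping[str, str] = os.environ) -> dict:
    """Read the two variables, falling back to the defaults from application.yaml."""
    return {
        "enabled": env.get("GUEST_AUTHENTICATION_ENABLED", "false").lower() == "true",
        "guestUser": env.get("GUEST_AUTHENTICATION_USER", "guest"),
    }


def guest_authenticate(settings: dict) -> Optional[str]:
    """Return the guest user's urn when guest access is on, else None."""
    if not settings["enabled"]:
        return None
    # Assumption: the guest resolves to a CorpUser urn like any other user.
    return f"urn:li:corpuser:{settings['guestUser']}"


print(guest_authenticate(guest_settings({})))  # None
print(guest_authenticate(guest_settings({"GUEST_AUTHENTICATION_ENABLED": "true"})))
# urn:li:corpuser:guest
```

Passing an explicit mapping instead of relying on `os.environ` keeps the sketch easy to exercise without touching the real environment.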