Back to Datahub

Compliance Forms

docs/api/tutorials/forms.md

1.5.0.37.3 KB
Original Source

import Tabs from '@theme/Tabs'; import TabItem from '@theme/TabItem';

Compliance Forms

Why Would You Use Compliance Forms?

DataHub Compliance Forms streamline the process of documenting, annotating, and classifying your most critical Data Assets through a collaborative, crowdsourced approach.

With Compliance Forms, you can execute large-scale compliance initiatives by assigning tasks (e.g., documentation, tagging, or classification requirements) to the appropriate stakeholders — data owners, stewards, and subject matter experts.

Learn more about forms in the Compliance Forms Feature Guide.

Goal Of This Guide

This guide will show you how to

  • Create, Update, Read, and Delete a form
  • Assign and Remove a form from entities

Prerequisites

For this tutorial, you need to deploy DataHub Quickstart and ingest sample data. For detailed information, please refer to DataHub Quickstart Guide.

<Tabs> <TabItem value="CLI" label="CLI">

Install the relevant CLI version. Forms are available as of CLI version 0.13.1. The corresponding DataHub Cloud release version is v0.2.16.5 Connect to your instance via init:

  1. Run datahub init to update the instance you want to load into
  2. Set the server to your sandbox instance, https://{your-instance-address}/gms
  3. Set the token to your access token
</TabItem> </Tabs>

Create a Form

<Tabs> <TabItem value="graphQL" label="GraphQL">
graphql
mutation createForm {
  createForm(
    input: {
      id: "metadataInitiative2024"
      name: "Metadata Initiative 2024"
      description: "How we want to ensure the most important data assets in our organization have all of the most important and expected pieces of metadata filled out"
      type: VERIFICATION
      prompts: [
        {
          id: "123"
          title: "retentionTime"
          description: "Apply Retention Time structured property to form"
          type: STRUCTURED_PROPERTY
          structuredPropertyParams: {
            urn: "urn:li:structuredProperty:retentionTime"
          }
        }
      ]
      actors: {
        users: [
          "urn:li:corpuser:[email protected]"
          "urn:li:corpuser:[email protected]"
        ]
        groups: ["urn:li:corpGroup:[email protected]"]
      }
    }
  ) {
    urn
  }
}
</TabItem> <TabItem value="CLI" label="CLI">

Create a yaml file representing the forms you’d like to load. For example, below file represents a form 123456 You can see the full example here.

yaml
- id: 123456
  # urn: "urn:li:form:123456"  # optional if id is provided
  type: VERIFICATION # Supported Types: COMPLETION(DOCUMENTATION), VERIFICATION
  name: "Metadata Initiative 2023"
  description: "How we want to ensure the most important data assets in our organization have all of the most important and expected pieces of metadata filled out"
  prompts:
    - id: "123"
      title: "Retention Time"
      description: "Apply Retention Time structured property to form"
      type: STRUCTURED_PROPERTY
      structured_property_id: io.acryl.privacy.retentionTime
      required: True # optional, will default to True
  entities: # Either pass a list of urns or a group of filters. This example shows a list of urns
    urns:
      - urn:li:dataset:(urn:li:dataPlatform:hdfs,SampleHdfsDataset,PROD)
  # optionally assign the form to a specific set of users and/or groups
  # when omitted, form will be assigned to Asset owners
  actors:
    users:
      - urn:li:corpuser:[email protected] # note: these should be urns
      - urn:li:corpuser:[email protected]
    groups:
      - urn:li:corpGroup:[email protected] # note: these should be urns

:::note Note that the structured properties and related entities should be created before you create the form. Please refer to the Structured Properties Tutorial for more information. :::

You can apply forms to either a list of entity urns, or a list of filters. For a list of entity urns, use this structure:

entities:
urns:
  - urn:li:dataset:...

For a list of filters, use this structure:

entities:
filters:
  types:
    - dataset  # you can use entity type name or urn
  platforms:
    - snowflake  # you can use platform name or urn
  domains:
    - urn:li:domain:finance  # you must use domain urn
  containers:
    - urn:li:container:my_container  # you must use container urn

Note that you can filter to entity types, platforms, domains, and/or containers.

Use the CLI to create your properties:

commandline
datahub forms upsert -f {forms_yaml}

If successful, you should see Created form urn:li:form:...

</TabItem> </Tabs>

Update Form

<Tabs> <TabItem value="graphQL" label="GraphQL">
graphql
mutation updateForm {
  updateForm(
    input: {
      urn: "urn:li:form:metadataInitiative2024"
      name: "Metadata Initiative 2024"
      description: "How we want to ensure the most important data assets in our organization have all of the most important and expected pieces of metadata filled out"
      type: VERIFICATION
      promptsToAdd: [
        {
          id: "456"
          title: "deprecationDate"
          description: "Deprecation date for dataset"
          type: STRUCTURED_PROPERTY
          structuredPropertyParams: {
            urn: "urn:li:structuredProperty:deprecationDate"
          }
        }
      ]
      promptsToRemove: ["123"]
    }
  ) {
    urn
  }
}
</TabItem> </Tabs>

Read Property Definition

<Tabs> <TabItem value="CLI" label="CLI">

You can see the properties you created by running the following command:

commandline
datahub forms get --urn {urn}

For example, you can run datahub forms get --urn urn:li:form:123456.

If successful, you should see metadata about your form returned like below.

json
{
  "urn": "urn:li:form:123456",
  "name": "Metadata Initiative 2023",
  "description": "How we want to ensure the most important data assets in our organization have all of the most important and expected pieces of metadata filled out",
  "prompts": [
    {
      "id": "123",
      "title": "Retention Time",
      "description": "Apply Retention Time structured property to form",
      "type": "STRUCTURED_PROPERTY",
      "structured_property_urn": "urn:li:structuredProperty:io.acryl.privacy.retentionTime"
    }
  ],
  "type": "VERIFICATION"
}
</TabItem> </Tabs>

Delete Form

<Tabs> <TabItem value="graphQL" label="GraphQL">
graphql
mutation deleteForm {
  deleteForm(input: { urn: "urn:li:form:metadataInitiative2024" })
}
</TabItem> </Tabs>

Assign Form to Entities

For assigning a form to a given list of entities:

<Tabs> <TabItem value="graphQL" label="GraphQL">
graphql
mutation batchAssignForm {
  batchAssignForm(
    input: {
      formUrn: "urn:li:form:myform"
      entityUrns: ["urn:li:dataset:mydataset1", "urn:li:dataset:mydataset2"]
    }
  )
}
</TabItem> </Tabs>

Remove Form from Entities

For removing a form from a given list of entities:

<Tabs> <TabItem value="graphQL" label="GraphQL">
graphql
mutation batchRemoveForm {
  batchRemoveForm(
    input: {
      formUrn: "urn:li:form:myform"
      entityUrns: ["urn:li:dataset:mydataset1", "urn:li:dataset:mydataset2"]
    }
  )
}
</TabItem> </Tabs>