Batch - Great Expectations

import TechnicalTag from '../term_tags/_tag.mdx';

A Batch is a selection of records from a <TechnicalTag relative="../" tag="data_asset" text="Data Asset" />.

A Batch provides an interface for describing specific data from any <TechnicalTag relative="../" tag="datasource" text="Data Source" />, and supports the creation of <TechnicalTag relative="../" tag="metric" text="Metrics" /> and <TechnicalTag relative="../" tag="validation" text="Validations" />.

Batches are designed to be "MECE" (mutually exclusive and collectively exhaustive) partitions of Data Assets. In practice, however, the same underlying data can appear in multiple Batches. For example, if an analyst runs an analysis against an entire table each day, and only a fraction of the records are new on any given day, most records will appear in more than one Batch.
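The difference between the two situations can be sketched in plain Python. This is an illustration only, not part of the Great Expectations API: the `is_mece` helper and the record ids are hypothetical, and each Batch is modeled as a simple list of record ids.

```python
from collections import Counter


def is_mece(asset_ids, batches):
    """Return True if every record id in the asset appears in exactly one batch."""
    counts = Counter(rid for batch in batches for rid in batch)
    mutually_exclusive = all(n == 1 for n in counts.values())
    collectively_exhaustive = set(counts) == set(asset_ids)
    return mutually_exclusive and collectively_exhaustive


# Hypothetical Data Asset with five records.
asset_ids = [1, 2, 3, 4, 5]

# One Batch per day of newly arrived records: a true partition.
daily_increments = [[1, 2], [3, 4], [5]]

# One Batch per daily run over the whole table: earlier records repeat.
daily_snapshots = [[1, 2], [1, 2, 3, 4], [1, 2, 3, 4, 5]]

print(is_mece(asset_ids, daily_increments))  # True
print(is_mece(asset_ids, daily_snapshots))   # False
```

The snapshot case fails the check because records 1 through 4 occur in several Batches, even though each daily run is still a perfectly valid Batch to validate.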

Consequently, what "makes a Batch a Batch" is best understood as the act of attending to it, that is, deciding which data you want to look at. Once you have defined how a Data Source's data should be sliced (even if that is to define a single slice containing all of the data in the Data Asset), you have determined what makes each of those slices "a Batch." The Batch is the fundamental unit that Great Expectations validates and about which it collects Metrics.

Relationship to other objects

A Batch is generated by providing a <TechnicalTag relative="../" tag="batch_request" text="Batch Request" /> to a Data Asset. The Batch provides a reference for interacting with the data through the Data Asset, and adds metadata that precisely identifies the specific data included in the Batch.

Metrics are always associated with a Batch of data. The identifier for the Batch is the primary way that Great Expectations identifies what data to use when computing a Metric and how to store that Metric.

Batches are also used by <TechnicalTag relative="../" tag="validator" text="Validators" /> when they run an Expectation Suite against data.

Use Cases

When creating Expectations interactively, a <TechnicalTag relative="../" tag="validator" text="Validator" /> needs access to a specific Batch of data against which to check Expectations. The how-to guide on interactively creating Expectations covers using a Batch in this use case.

During Validation, a <TechnicalTag relative="../" tag="checkpoint" text="Checkpoint" /> checks a Batch of data against Expectations from an <TechnicalTag relative="../" tag="expectation_suite" text="Expectation Suite" />. You must specify a Batch Request for the Checkpoint to run.

Consistency

A Batch is always part of a Data Asset. A Data Asset can be configured to slice its data into Batches in many ways. For example, the slicing can be based on an arbitrary field in the data, including a datetime field.
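As a rough illustration of slicing on a field, the following plain-Python sketch groups records by the year and month of a datetime column. It does not use Great Expectations itself. The `slice_by` helper, the record layout, and the `created_at` field name are all hypothetical.

```python
from collections import defaultdict
from datetime import datetime


def slice_by(records, key):
    """Group records into slices, one per distinct value of key(record)."""
    slices = defaultdict(list)
    for record in records:
        slices[key(record)].append(record)
    return dict(slices)


# Hypothetical records; "created_at" is an assumed field name.
records = [
    {"id": 1, "created_at": datetime(2023, 1, 15)},
    {"id": 2, "created_at": datetime(2023, 1, 31)},
    {"id": 3, "created_at": datetime(2023, 2, 1)},
]

# Slice on a datetime field: one slice per (year, month).
by_month = slice_by(
    records, key=lambda r: (r["created_at"].year, r["created_at"].month)
)

# Degenerate case: a single slice containing the whole asset.
whole_asset = slice_by(records, key=lambda r: "all")

print(sorted(by_month))          # [(2023, 1), (2023, 2)]
print(len(whole_asset["all"]))   # 3
```

Any function of a record can serve as the key, which is why the choice of slicing is a configuration decision rather than a property of the data.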

A Batch is always built using a Batch Request. See Batch Request or Connect to a Data Source.

Once a Data Asset has identified the specific data to include in a Batch based on the Batch Request, it creates a reference to the data and adds metadata, including the parameters used in the Batch Request.

Access

You can use the Data Asset's `get_batch_list_from_batch_request` method to access a Batch. However, you don't typically access the Batch directly. Instead, you pass a Batch Request to a Validator or a Checkpoint.

Create

The `BatchRequest` object is the primary API used to construct Batches. You construct a Batch Request that corresponds to a Batch by calling the Data Asset's `build_batch_request` method.

For more information, see our documentation on Batch Requests.

:::note Note

Instantiating a Batch does not necessarily “fetch” the data by immediately running a query or pulling data into memory. Instead, think of a Batch as a wrapper that includes the information that you will need to fetch the right data when it’s time to Validate.

:::
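The idea in the note above, a Batch as a wrapper that defers fetching, can be sketched in plain Python. This is a conceptual illustration only and does not reflect how Great Expectations implements Batches internally. The `LazyBatch` class, `run_query` function, and `fetch_log` list are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List


@dataclass
class LazyBatch:
    """Hypothetical stand-in for a Batch: metadata plus a deferred loader."""

    metadata: Dict[str, Any]
    loader: Callable[[], List[dict]]
    _data: Any = field(default=None, init=False, repr=False)
    _loaded: bool = field(default=False, init=False, repr=False)

    def data(self) -> List[dict]:
        # Fetch on first access only, then reuse the cached result.
        if not self._loaded:
            self._data = self.loader()
            self._loaded = True
        return self._data


fetch_log = []


def run_query() -> List[dict]:
    """Pretend to run an expensive query and record that it happened."""
    fetch_log.append("fetched")
    return [{"id": 1}, {"id": 2}]


# Creating the wrapper records what to fetch but does not fetch anything.
batch = LazyBatch(metadata={"year": 2023, "month": 1}, loader=run_query)
print(fetch_log)  # []

# The data is fetched once, when it is first needed.
rows = batch.data()
batch.data()
print(fetch_log)  # ['fetched']
```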