docs/platform/connector-development/connector-builder-ui/pagination.md
Pagination is a mechanism used by APIs in which data is split up into "pages" when returning results, so that the entire response data doesn't need to be returned all at once.
The Connector Builder offers a Pagination section which implements the most common pagination methods used by APIs. When enabled, the connector will use the pagination configuration you have provided to request consecutive pages of data from the API until there are no more pages to fetch.
If your API doesn't support pagination, simply leave the Pagination section disabled.
Check the documentation of the API you want to integrate to find which type of pagination it uses. Many API docs have a "Pagination" or "Paging" section that describes this.
The following pagination mechanisms are supported in the Connector Builder:

- Offset Increment
- Page Increment
- Cursor Pagination

Select the matching pagination method for your API and check the sections below for more information about individual methods. If none of these pagination methods work for your API, you can implement a custom pagination strategy or use the low-code CDK or Python CDK instead.
If your API paginates using offsets, the API docs will likely contain one of the following keywords:

- offset
- limit

In this method of pagination, the "limit" specifies the maximum number of records to return per page, while the "offset" indicates the starting position or index from which to retrieve records.
For example, say that the API has the following dataset:
[
{"id": 1, "name": "Product A"},
{"id": 2, "name": "Product B"},
{"id": 3, "name": "Product C"},
{"id": 4, "name": "Product D"},
{"id": 5, "name": "Product E"}
]
Then the API may take in a request like this: GET https://api.example.com/products?limit=2&offset=3, which could result in the following response:
{
"data": [
{"id": 4, "name": "Product D"},
{"id": 5, "name": "Product E"}
]
}
Normally, the caller of the API would need to implement some logic to then increment the offset by the limit amount and then submit another call with the updated offset, and continue on this pattern until all of the records have been retrieved.
The Offset Increment pagination mode in the Connector Builder does this for you. You only need to decide on a limit value (the general recommendation is to use the largest limit that the API supports, to minimize the number of API requests) and configure how the limit and offset are injected into the HTTP requests. Most APIs accept these values as query parameters, as in the example above, but this can differ depending on the API. If an API does not accept a limit, the injection configuration for the limit can be disabled.
Either way, your connector will automatically increment the offset for subsequent requests based on the number of records it receives, and will continue until it receives fewer records than the limit you configured.
So for the example API and dataset above, you could apply the following Pagination configurations in the Connector Builder:
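
- Mode: Offset Increment
- Limit: 2
- Inject limit into outgoing HTTP request:
  - Inject into: request_parameter
  - Field name: limit
- Inject offset into outgoing HTTP request:
  - Inject into: request_parameter
  - Field name: offset

This would cause your connector to make the following requests to the API in order to paginate through all of its data: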
GET https://api.example.com/products?limit=2&offset=0
-> [
{"id": 1, "name": "Product A"},
{"id": 2, "name": "Product B"}
]
GET https://api.example.com/products?limit=2&offset=2
-> [
{"id": 3, "name": "Product C"},
{"id": 4, "name": "Product D"}
]
GET https://api.example.com/products?limit=2&offset=4
-> [
{"id": 5, "name": "Product E"}
]
// fewer than 2 records returned -> stop
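The request sequence above can be reproduced with a small simulation. This is a minimal sketch rather than the Connector Builder's actual implementation: `DATASET`, `fetch_offset_page`, and `read_all_with_offset` are hypothetical names, and the API is replaced by an in-memory list.

```python
from typing import Dict, List

# Stand-in for the example API's dataset (hypothetical, held in memory).
DATASET: List[Dict] = [
    {"id": 1, "name": "Product A"},
    {"id": 2, "name": "Product B"},
    {"id": 3, "name": "Product C"},
    {"id": 4, "name": "Product D"},
    {"id": 5, "name": "Product E"},
]


def fetch_offset_page(limit: int, offset: int) -> List[Dict]:
    """Simulates GET /products?limit=<limit>&offset=<offset>."""
    return DATASET[offset:offset + limit]


def read_all_with_offset(limit: int) -> List[Dict]:
    """Requests pages until one comes back with fewer records than the limit."""
    records: List[Dict] = []
    offset = 0
    while True:
        page = fetch_offset_page(limit, offset)
        records.extend(page)
        if len(page) < limit:
            break
        # Advance the offset by the number of records actually received.
        offset += len(page)
    return records
```

With `limit=2`, the loop requests offsets 0, 2, and 4 and stops after the third page, because that page holds a single record.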
The Connector Builder currently supports injecting these values into the query parameters (i.e. request parameters), headers, or body (JSON or form data).
Inject Offset on First Request: By default, the first request doesn't include an offset parameter (assuming offset=0 is implicit). Enable this option if the API requires an explicit offset=0 parameter in the first request.
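The effect of this option on the outgoing query parameters can be sketched as follows. The function name and its arguments are hypothetical, and treating "offset is 0" as "first request" is an assumption made to keep the example short.

```python
from typing import Dict


def offset_request_params(offset: int, limit: int, inject_on_first_request: bool = False) -> Dict[str, int]:
    """Builds the query parameters for one request of an offset-paginated stream."""
    params = {"limit": limit}
    # Assumption: the first request is the one whose offset is still 0.
    is_first_request = offset == 0
    if not is_first_request or inject_on_first_request:
        params["offset"] = offset
    return params
```

By default the first request carries only `limit`; with the option enabled it also carries `offset=0`. Later requests always include the offset.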
Interpolated Page Size: You can use dynamic page sizes by setting the limit to an interpolated value:

- {{ config['page_size'] }} - Use a value from your connector configuration
- "{{ config['page_size'] }}" - Use a string interpolated value for APIs expecting string parameters

Injection Options: The limit and offset values can be injected into different parts of the HTTP request: query parameters, headers, or the body (JSON or form data).
The following APIs accept offset and limit pagination values as query parameters like in the above example:
If your API paginates using page increments, the API docs will likely contain one of the following keywords:

- page size / page_size / pagesize / per_page
- page number / page_number / pagenum / page

In this method of pagination, the "page size" specifies the maximum number of records to return per request, while the "page number" indicates the specific page of data to retrieve.
This is similar to Offset Increment pagination, but instead of increasing the offset parameter by the number of records per page for the next request, the page number is simply increased by one to fetch the next page, iterating through all of them.
For example, say that the API has the following dataset:
[
{"id": 1, "name": "Product A"},
{"id": 2, "name": "Product B"},
{"id": 3, "name": "Product C"},
{"id": 4, "name": "Product D"},
{"id": 5, "name": "Product E"},
{"id": 6, "name": "Product F"}
]
Then the API may take in a request like this: GET https://api.example.com/products?page_size=2&page=1, which could result in the following response:
{
"data": [
{"id": 1, "name": "Product A"},
{"id": 2, "name": "Product B"}
]
}
then incrementing the page by 1 to call it with GET https://api.example.com/products?page_size=2&page=2 would result in:
{
"data": [
{"id": 3, "name": "Product C"},
{"id": 4, "name": "Product D"}
]
}
and so on.
The Connector Builder abstracts this away so that you only need to decide what page size to set (the general recommendation is to use the largest page size that the API supports in order to minimize the number of API requests), what the starting page number should be (usually either 0 or 1, depending on the API), and how the page size and number are injected into the API requests. Similar to Offset Increment pagination, the page size injection can be disabled if the API does not accept a page size value.
Either way, your connector will automatically increment the page number by 1 for each subsequent request, and continue until it receives fewer records than the page size you configured.
So for the example API and dataset above, you could apply the following configurations in the Connector Builder:
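
- Mode: Page Increment
- Page size: 3
- Start from page: 1
- Inject page size into outgoing HTTP request:
  - Inject into: request_parameter
  - Field name: page_size
- Inject page number into outgoing HTTP request:
  - Inject into: request_parameter
  - Field name: page

This would cause your connector to make the following requests to the API in order to paginate through all of its data: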
GET https://api.example.com/products?page_size=3&page=1
-> [
{"id": 1, "name": "Product A"},
{"id": 2, "name": "Product B"},
{"id": 3, "name": "Product C"}
]
GET https://api.example.com/products?page_size=3&page=2
-> [
{"id": 4, "name": "Product D"},
{"id": 5, "name": "Product E"},
{"id": 6, "name": "Product F"}
]
GET https://api.example.com/products?page_size=3&page=3
-> [
]
// no records returned -> stop
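The same loop can be sketched for page increments. As before, this is an illustrative simulation with hypothetical names (`DATASET`, `fetch_numbered_page`, `read_all_with_page_increment`), not the connector's real code. The `start_from_page` argument lets the sketch cover both zero-based and one-based APIs.

```python
from typing import Dict, List, Tuple

# Stand-in for the six-product example dataset (Product A .. Product F).
DATASET: List[Dict] = [{"id": i, "name": f"Product {chr(64 + i)}"} for i in range(1, 7)]


def fetch_numbered_page(page: int, page_size: int, start_from_page: int) -> List[Dict]:
    """Simulates GET /products?page_size=<page_size>&page=<page>."""
    start = (page - start_from_page) * page_size
    return DATASET[start:start + page_size]


def read_all_with_page_increment(page_size: int, start_from_page: int = 1) -> Tuple[List[Dict], List[int]]:
    """Returns all records plus the list of page numbers that were requested."""
    records: List[Dict] = []
    requested_pages: List[int] = []
    page = start_from_page
    while True:
        requested_pages.append(page)
        batch = fetch_numbered_page(page, page_size, start_from_page)
        records.extend(batch)
        # Stop once a page holds fewer records than the configured page size.
        if len(batch) < page_size:
            break
        page += 1
    return records, requested_pages
```

With a page size of 3 and a start page of 1, the sketch requests pages 1, 2, and 3; the third page is empty, which ends the loop.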
The Connector Builder currently supports injecting these values into the query parameters (i.e. request parameters), headers, or body (JSON or form data).
Start From Page: Specify the initial page number. Use 0 for zero-based pagination (pages 0, 1, 2...) or 1 for one-based pagination (pages 1, 2, 3...). The default is 0.
Inject Page Number on First Request: By default, the first request doesn't include a page parameter (assuming the first page is implicit). Enable this option if the API requires an explicit page number in the first request.
Interpolated Page Size: You can use dynamic page sizes by setting the page size to an interpolated value:

- {{ config['page_size'] }} - Use a value from your connector configuration
- "{{ config['page_size'] }}" - Use a string interpolated value for APIs expecting string parameters

Zero-based vs One-based Pagination: Different APIs use different page numbering schemes:

Zero-based pagination (Start From Page = 0):

- GET /users?page=0&page_size=50 - First page
- GET /users?page=1&page_size=50 - Second page

One-based pagination (Start From Page = 1):

- GET /users?page=1&page_size=50 - First page
- GET /users?page=2&page_size=50 - Second page

Injection Options: The page number and page size values can be injected into different parts of the HTTP request: query parameters, headers, or the body (JSON or form data).
The following APIs accept page size/num pagination values as query parameters like in the above example:
If your API paginates using cursor pagination, the API docs will likely contain one of the following keywords:

- cursor
- link
- next_token

In this method of pagination, some identifier (e.g. a timestamp or record ID) is used to navigate through the API's records, rather than relying on fixed indices or page numbers like in the above methods. When making a request, clients provide a cursor value, and the API returns a subset of records starting from the specified cursor, along with the cursor for the next page. This can be especially helpful in preventing issues like duplicate or skipped records that can arise when using the above pagination methods.
Using the Twitter API as an example, a request is made to the /tweets endpoint, with the page size (called max_results in this case) set to 100. This will return a response like:
{
"data": [
{
"created_at": "2020-12-11T20:44:52.000Z",
"id": "1337498609819021312",
"text": "Thanks to everyone who tuned in today..."
},
{
"created_at": "2020-05-06T17:24:31.000Z",
"id": "1258085245091368960",
"text": "It’s now easier to understand Tweet impact..."
},
...
],
"meta": {
...
"result_count": 100,
"next_token": "7140w"
}
}
The meta.next_token value of that response can then be set as the pagination_token in the next request, causing the API to return the next 100 tweets.
To integrate with such an API in the Connector Builder, you must configure how this "Next page cursor" is obtained for each request. In most cases, the next page cursor is either part of the response body or part of the HTTP headers. Select the respective type and define the property (or nested property) that holds the cursor value, for example "meta, next_token" for the Twitter API.
You can also configure how the cursor value is injected into the API requests. In the above example, this would be set as a request_parameter with the field name pagination_token, but this depends on the API; check the docs to see if they describe how to set the cursor/token for subsequent requests. For cursor pagination, if path is selected as the Inject into option, then the entire request URL for the subsequent request will be replaced by the cursor value. This can be useful for APIs that return a full URL that should be requested for the next page of results, such as the GitHub API.
The "Page size" can optionally be specified as well; if so, how this page size gets injected into the HTTP requests can be configured similar to the above pagination methods.
When using the "response" or "headers" option for obtaining the next page cursor, the connector will stop requesting more pages as soon as no value can be found at the specified location. In some situations, this is not sufficient. If you need more control over how to obtain the cursor value and when to stop requesting pages, use the "custom" option and specify the "stop condition" using a Jinja placeholder.
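The default behavior (follow the cursor until none is found) can be sketched as below. This is an illustration under assumptions, not the connector's implementation: the `PAGES` responses are hypothetical and shortened, the token "9f2kq" is made up for the example, and `fetch_with_token`, `extract_cursor`, and `read_all_with_cursor` are invented names.

```python
from typing import Dict, List, Optional

# Hypothetical paged responses keyed by the cursor that requests them.
# None stands for the first request, which carries no cursor.
PAGES: Dict[Optional[str], Dict] = {
    None: {"data": [{"id": "1"}, {"id": "2"}], "meta": {"result_count": 2, "next_token": "7140w"}},
    "7140w": {"data": [{"id": "3"}, {"id": "4"}], "meta": {"result_count": 2, "next_token": "9f2kq"}},
    "9f2kq": {"data": [{"id": "5"}], "meta": {"result_count": 1}},
}


def fetch_with_token(pagination_token: Optional[str]) -> Dict:
    """Simulates a request whose pagination_token parameter is the given cursor."""
    return PAGES[pagination_token]


def extract_cursor(response: Dict, path: List[str]) -> Optional[str]:
    """Walks a nested property path such as ["meta", "next_token"]."""
    value = response
    for key in path:
        if not isinstance(value, dict) or key not in value:
            return None
        value = value[key]
    return value


def read_all_with_cursor(cursor_path: List[str]) -> List[Dict]:
    records: List[Dict] = []
    token: Optional[str] = None
    while True:
        response = fetch_with_token(token)
        records.extend(response["data"])
        token = extract_cursor(response, cursor_path)
        # No value at the configured location means there are no more pages.
        if not token:
            break
    return records
```

Calling `read_all_with_cursor(["meta", "next_token"])` follows the two tokens and stops on the third response, which has no `next_token`.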
Cursor Value Context: The cursor value template has access to the following context:

- config - Your connector configuration
- headers - Response headers (with parsed link header available as headers.link)
- last_page_size - Number of records in the current page
- last_record - The last record from the current page
- response - The full response body

Stop Condition: The stop condition is a template that determines when to stop paginating. Common patterns include:

- {{ response.more_results is false }} - Stop when a boolean flag indicates no more results
- {{ response.pagination.next_cursor is none }} - Stop when next_cursor is null
- {{ not response.data }} - Stop when data array is empty
- {{ 'next' not in headers['link'] }} - Stop when there's no 'next' link in headers
- {{ response.data|length == 0 or response.pagination.next_cursor is none }} - Complex condition combining multiple checks

Page Size: Even with cursor pagination, you can often specify how many records to return per page. Set this to optimize performance: larger page sizes mean fewer API calls but larger responses.
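Request Path vs Request Parameter: There are two ways to inject cursor values: as a request parameter, where the cursor is sent under a field name you configure, or as the path, where the cursor replaces the entire request URL of the next request.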
:::info
One potential variant of cursor pagination is an API that takes in some sort of record identifier to "start after". For example, the PartnerStack API endpoints accept a starting_after parameter to which a record key is supposed to be passed.
In order to configure cursor pagination for this API in the Connector Builder, you will need to extract the key from the last record returned by the previous request, using a "custom" next page cursor.
This can be done in a couple of different ways:

- Using the {{ last_records }} object; accessing the key field of the last record would look like {{ last_records[-1]['key'] }}. The [-1] syntax points to the last item in the last_records array.
- Using the {{ response }} object; accessing the key field of the last item would look like {{ response['data']['items'][-1]['key'] }}.

This API also has a boolean has_more property included in the response to indicate if there are more items to be retrieved, so the stop condition in this case should be {{ response.data.has_more is false }}.
:::
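A "start after" cursor with a `has_more` stop flag can be sketched as follows. The records, the response shape (`data.items`, `data.has_more`), and the function names are assumptions chosen to match the templates quoted in the note above; the real API may differ.

```python
from typing import Dict, List, Optional

ITEMS: List[Dict] = [{"key": f"rec_{i}"} for i in range(1, 6)]  # hypothetical records


def fetch_starting_after(starting_after: Optional[str], limit: int = 2) -> Dict:
    """Simulates a request that returns the records following the given key."""
    start = 0
    if starting_after is not None:
        keys = [item["key"] for item in ITEMS]
        start = keys.index(starting_after) + 1
    page = ITEMS[start:start + limit]
    return {"data": {"items": page, "has_more": start + limit < len(ITEMS)}}


def read_all_starting_after() -> List[Dict]:
    records: List[Dict] = []
    cursor: Optional[str] = None
    while True:
        response = fetch_starting_after(cursor)
        items = response["data"]["items"]
        records.extend(items)
        # Stop condition, mirroring {{ response.data.has_more is false }}.
        if response["data"]["has_more"] is False or not items:
            break
        # Cursor value, mirroring {{ response['data']['items'][-1]['key'] }}.
        cursor = items[-1]["key"]
    return records
```

The extra `not items` check is a defensive addition in this sketch so that an empty page cannot cause an index error.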
The following APIs implement cursor pagination in various ways:

- Twitter API: includes next_token IDs in its responses, which are passed in as query parameters to subsequent requests
- GitHub API: includes links to subsequent pages of results
- links to subsequent pages of results

For APIs that use unique pagination mechanisms not covered by the standard methods (Offset Increment, Page Increment, or Cursor Pagination), you can implement a custom pagination strategy. This requires writing custom Python code as part of your connector implementation.
Custom pagination strategies are useful for:
To use a custom pagination strategy:

- Provide the class name in the form source_<name>.<package>.<class_name>

Your custom pagination class must:

- Inherit from PaginationStrategy in the Airbyte CDK

If you have a custom pagination class in your connector:
# In source_myapi/components.py
from airbyte_cdk.sources.declarative.requesters.paginators.strategies.pagination_strategy import PaginationStrategy


class MyCustomPaginationStrategy(PaginationStrategy):
    def next_page_token(self, response, last_page_size, last_record, last_page_token_value):
        # Your custom pagination logic here
        # Return the next page token or None to stop pagination
        pass

    def get_page_size(self):
        # Return the page size for this strategy
        return self.page_size
Configure it in the Connector Builder as:
source_myapi.components.MyCustomPaginationStrategy
When implementing custom pagination strategies, you have access to the same context as other pagination methods, including the response data, headers, and configuration values.
Using the "Inject page size / limit / offset into outgoing HTTP request" option in the pagination form works for most cases, but sometimes the API has special requirements that can't be handled this way:
To handle these cases, disable injection in the pagination form and use the generic parameter section at the bottom of the stream configuration form to freely configure query parameters, headers, and properties of the JSON body, using Jinja expressions and the available variables. You can also use these variables as part of the URL path.
For example, the PrestaShop API requires the offset and limit to be passed, separated by a comma, in a single query parameter (?limit=<offset>,<limit>).
For this case, you can use the next_page_token variable to configure a query parameter with key limit and value {{ next_page_token['next_page_token'] or '0' }},50 to inject the offset from the pagination strategy and a hardcoded limit of 50 into the same parameter.
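What that template evaluates to can be mirrored in plain Python. This is only a sketch of the interpolation logic: `combined_limit_param` is a hypothetical helper, and the shape of `next_page_token` (a dictionary with a `next_page_token` key) is taken from the expression above.

```python
from typing import Dict, Optional


def combined_limit_param(next_page_token: Optional[Dict], limit: int = 50) -> Dict[str, str]:
    """Mirrors {{ next_page_token['next_page_token'] or '0' }},50 for a single 'limit' parameter."""
    # Fall back to "0" when there is no token yet (first request) or the offset is falsy.
    offset = (next_page_token or {}).get("next_page_token") or "0"
    return {"limit": f"{offset},{limit}"}
```

The first request therefore sends `limit=0,50`, and a later request with an offset of 100 sends `limit=100,50`.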
GitHub uses cursor pagination with link headers:
Link: <https://api.github.com/repos/octocat/Hello-World/issues?page=2>; rel="next",
<https://api.github.com/repos/octocat/Hello-World/issues?page=5>; rel="last"
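To make the header format concrete, the sketch below extracts the URLs by relation name using only the standard library. It is a simplified illustration, not how the connector parses the header: `parse_link_header` is a hypothetical helper, and the pattern assumes each entry has the form `<url>; rel="name"` with `rel` as its first parameter.

```python
import re
from typing import Dict

# Matches entries of the form <url>; rel="name" (a simplifying assumption).
LINK_PATTERN = re.compile(r'<([^>]+)>\s*;\s*rel="([^"]+)"')


def parse_link_header(header_value: str) -> Dict[str, str]:
    """Maps each rel value (for example "next" or "last") to its URL."""
    return {rel: url for url, rel in LINK_PATTERN.findall(header_value)}
```

A connector-like loop would request `parse_link_header(value).get("next")` as the next URL and stop once no "next" entry is present.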
Configuration:

- Cursor value: {{ headers.link.next.url }}
- Stop condition: {{ 'next' not in headers['link'] }}

Stripe uses the ID of the last object as a cursor:
{
"data": [{"id": "cus_123"}, {"id": "cus_456"}],
"has_more": true
}
Configuration:

- Cursor value: {{ last_record['id'] }}
- Field name: starting_after
- Stop condition: {{ response.has_more is false }}

A typical REST API with zero-based pagination:
Configuration:

- Page size: {{ config['page_size'] }}

When testing your pagination configuration:
If your connector keeps paginating indefinitely:
If some records are missing:
If you're seeing duplicate records:
If the first request fails or behaves unexpectedly:
If pagination doesn't stop when expected:
If pagination parameters aren't being sent correctly: