Copyright (c) 2026 Microsoft Corporation. - Graphrag

python

# Copyright (c) 2026 Microsoft Corporation.
# Licensed under the MIT License.

Token chunking example

The TokenChunker splits text into fixed-size chunks based on token count rather than sentence boundaries. It encodes the text into tokens with the supplied tokenizer, slices the token sequence into chunks of size tokens, and decodes each chunk back to text. The overlap setting controls how much consecutive chunks share.

python

import tiktoken
from graphrag_chunking.token_chunker import TokenChunker

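# Load tiktoken's o200k_base encoding to supply the encode/decode functions.
tokenizer = tiktoken.get_encoding("o200k_base")

# size is the chunk length in tokens; overlap=0 means chunks share no tokens.
chunker = TokenChunker(
    size=3, overlap=0, encode=tokenizer.encode, decode=tokenizer.decode
)
chunks = chunker.chunk("This is a random test fragment of some text")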
print(chunks)  # ["This is a", " random test fragment", " of some text"]
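
The example above uses overlap=0. To show what a non-zero overlap does to chunk boundaries, here is a minimal sliding-window sketch. It is not the TokenChunker implementation: chunk_by_tokens, toy_encode, and toy_decode are hypothetical names introduced for illustration, the whitespace tokenizer stands in for tiktoken so the snippet runs without extra dependencies, and it assumes overlap counts the tokens shared by neighbouring chunks.

```python
from collections.abc import Callable


def chunk_by_tokens(
    text: str,
    size: int,
    overlap: int,
    encode: Callable[[str], list],
    decode: Callable[[list], str],
) -> list[str]:
    """Illustrative sliding-window chunker (hypothetical, not the library code)."""
    if size <= 0:
        raise ValueError("size must be positive")
    if not 0 <= overlap < size:
        raise ValueError("overlap must satisfy 0 <= overlap < size")

    tokens = encode(text)
    step = size - overlap  # how far the window advances between chunks
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(decode(tokens[start : start + size]))
        if start + size >= len(tokens):
            break  # this window already reached the end of the text
    return chunks


# Toy whitespace tokenizer (an assumption for this sketch, not tiktoken).
def toy_encode(text: str) -> list[str]:
    return text.split()


def toy_decode(tokens: list[str]) -> str:
    return " ".join(tokens)


text = "This is a random test fragment of some text"

no_overlap = chunk_by_tokens(text, 3, 0, toy_encode, toy_decode)
print(no_overlap)  # ['This is a', 'random test fragment', 'of some text']

with_overlap = chunk_by_tokens(text, 3, 1, toy_encode, toy_decode)
print(with_overlap)
# ['This is a', 'a random test', 'test fragment of', 'of some text']
```

With overlap=1 the window advances two tokens at a time, so each chunk begins with the last token of the previous one. The toy tokenizer drops whitespace, which is why these chunks lack the leading spaces seen in the tiktoken output above.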