Copyright (c) 2026 Microsoft Corporation.

packages/graphrag-chunking/example_notebooks/token_chunking_example.ipynb

python
# Copyright (c) 2026 Microsoft Corporation.
# Licensed under the MIT License.

Token chunking example

The TokenChunker splits text into chunks of a fixed number of tokens, ignoring sentence boundaries. It encodes the text into tokens with a tokenizer, then slices the token sequence into windows of the specified size, with a configurable number of tokens shared (the overlap) between consecutive chunks.
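
To make the roles of size and overlap concrete, here is a minimal, library-independent sketch of sliding-window token chunking. It is not the TokenChunker implementation: the function chunk_tokens and the whitespace-based toy_encode/toy_decode helpers are hypothetical names introduced only for illustration, and a real tokenizer would produce integer token ids rather than words.

```python
from typing import Callable


def chunk_tokens(
    text: str,
    size: int,
    overlap: int,
    encode: Callable[[str], list],
    decode: Callable[[list], str],
) -> list[str]:
    """Split text into windows of `size` tokens that share `overlap` tokens.

    Hypothetical helper for illustration; not part of graphrag_chunking.
    """
    if size <= 0:
        raise ValueError("size must be positive")
    if not 0 <= overlap < size:
        raise ValueError("overlap must satisfy 0 <= overlap < size")

    tokens = encode(text)
    step = size - overlap  # how far the window advances each iteration
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(decode(tokens[start : start + size]))
        # Stop once the current window has reached the end of the tokens,
        # so no trailing chunk made up only of overlap is emitted.
        if start + size >= len(tokens):
            break
    return chunks


# Toy tokenizer (an assumption for this sketch): one token per word.
def toy_encode(text: str) -> list:
    return text.split()


def toy_decode(tokens: list) -> str:
    return " ".join(tokens)


text = "This is a random test fragment of some text"
no_overlap = chunk_tokens(text, size=3, overlap=0, encode=toy_encode, decode=toy_decode)
with_overlap = chunk_tokens(text, size=3, overlap=1, encode=toy_encode, decode=toy_decode)
print(no_overlap)    # ['This is a', 'random test fragment', 'of some text']
print(with_overlap)  # ['This is a', 'a random test', 'test fragment of', 'of some text']
```

With overlap=1 the window advances by size - overlap = 2 tokens, so each chunk begins with the last token of the previous one. In this sketch the final chunk may be shorter than size when the token count does not divide evenly.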

python
import tiktoken
from graphrag_chunking.token_chunker import TokenChunker

tokenizer = tiktoken.get_encoding("o200k_base")
chunker = TokenChunker(
    size=3, overlap=0, encode=tokenizer.encode, decode=tokenizer.decode
)
chunks = chunker.chunk("This is a random test fragment of some text")
print(chunks)  # ["This is a", " random test fragment", " of some text"]