docs/src/content/docs/advanced_topics/memoization_keys.mdx
As described in Function — Change detection, CocoIndex detects logic, input, and context changes to decide whether a memo can be reused. Function arguments, deps values, and context values with detect_change=True are all fingerprinted through the same data fingerprinting pipeline. By default, most types are fingerprinted automatically. This page covers how to customize that pipeline — how objects are fingerprinted and validated:
For each data value (function argument, deps value, or context value), CocoIndex derives a canonical form with this precedence:
- If the object implements `__coco_memo_key__()`, CocoIndex uses its return value.

The following types are handled automatically (no custom key needed):

- `None`, `bool`, `int`, `float`, `str`, `bytes`, `bytearray`, `memoryview`
- `list`, `tuple`, `dict`, `set`, `frozenset` (recursively canonicalized)
- Type objects (`type`): identified by module and qualified name
- [...] `pickle`

The canonical forms are combined into a deterministic fingerprint. If the fingerprint matches a cached entry, the cached result is reused, unless memo states indicate it is stale (see Memo state validation below).
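As a rough illustration of the precedence and canonicalization described above, the following stdlib-only sketch derives a canonical form and hashes it. It is not CocoIndex's implementation: `canonicalize`, `fingerprint`, the type-tagging scheme, and the choice of SHA-256 over `repr` are assumptions made for this example, and registered key functions and the `pickle` path are left out.

```python
import hashlib


def canonicalize(value: object) -> object:
    """Hypothetical canonical form: tagged, order-independent tuples."""
    # 1. A custom key on an instance takes precedence over built-in handling.
    key_fn = getattr(value, "__coco_memo_key__", None)
    if callable(key_fn) and not isinstance(value, type):
        return ("custom", type(value).__qualname__, canonicalize(key_fn()))
    # 2. Primitives are tagged with their type so that 1 and True differ.
    if value is None or isinstance(value, (bool, int, float, str, bytes)):
        return (type(value).__name__, value)
    if isinstance(value, (bytearray, memoryview)):
        return (type(value).__name__, bytes(value))
    # 3. Containers are canonicalized recursively.
    if isinstance(value, (list, tuple)):
        return (type(value).__name__, tuple(canonicalize(v) for v in value))
    if isinstance(value, dict):
        items = [(canonicalize(k), canonicalize(v)) for k, v in value.items()]
        return ("dict", tuple(sorted(items, key=repr)))
    if isinstance(value, (set, frozenset)):
        members = sorted((canonicalize(v) for v in value), key=repr)
        return (type(value).__name__, tuple(members))
    # 4. Type objects are identified by module and qualified name.
    if isinstance(value, type):
        return ("type", value.__module__, value.__qualname__)
    raise TypeError(f"no canonical form for {type(value).__qualname__}")


def fingerprint(value: object) -> str:
    """Hash the canonical form into a deterministic hex digest."""
    return hashlib.sha256(repr(canonicalize(value)).encode("utf-8")).hexdigest()
```

With this sketch, `fingerprint({"a": 1, "b": 2})` and `fingerprint({"b": 2, "a": 1})` agree because dictionary items are sorted before hashing, while `fingerprint(1)` and `fingerprint(True)` differ because each primitive carries its type tag.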
### `__coco_memo_key__` (when you control the type)

Implement a method on your class that returns a stable, deterministic value:

```python
class MyType:
    def __coco_memo_key__(self) -> object:
        # Return small primitives / tuples.
        return (...)
```
Return something that uniquely identifies the semantic content your function depends on:
- Good: a small tuple such as `(stable_id, version)`
- Avoid: unstable or bulky values such as `datetime.now()` or large raw payloads

Example (DB row):

```python
class UserRow:
    def __init__(self, user_id: int, updated_at: int) -> None:
        self.user_id = user_id
        self.updated_at = updated_at

    def __coco_memo_key__(self) -> object:
        return ("users", self.user_id, self.updated_at)
```
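The same idea applies when an object carries fields that the memoized function does not depend on. The sketch below uses a hypothetical `Document` class (not part of CocoIndex) whose `loaded_at` timestamp is bookkeeping only, so it is deliberately left out of the key.

```python
import time
from dataclasses import dataclass, field


@dataclass
class Document:
    """Hypothetical example type; the names are illustrative only."""

    doc_id: str
    version: int
    # Bookkeeping only: changes on every load, so it must not enter the key.
    loaded_at: float = field(default_factory=time.time)

    def __coco_memo_key__(self) -> object:
        # Only the fields that determine the function's result.
        return ("documents", self.doc_id, self.version)


first = Document("doc-1", version=3)
second = Document("doc-1", version=3)  # loaded later, same content
bumped = Document("doc-1", version=4)  # content changed upstream
```

Here `first` and `second` produce the same key even though they were loaded at different times, while `bumped` produces a different one.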
If you can't add `__coco_memo_key__` (stdlib / third-party types), register a handler:
```python
from pathlib import Path

from cocoindex import register_memo_key_function


def path_key(p: Path) -> object:
    p = p.resolve()
    st = p.stat()
    return (str(p), st.st_mtime_ns, st.st_size)


register_memo_key_function(Path, path_key)
```
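A key function like this can be sanity-checked on its own before it is registered. The sketch below repeats `path_key` so that it runs without CocoIndex installed, and uses a hypothetical `demo` helper to confirm that the key is stable across repeated calls and changes once the file is rewritten with different content.

```python
import tempfile
from pathlib import Path


def path_key(p: Path) -> object:
    p = p.resolve()
    st = p.stat()
    return (str(p), st.st_mtime_ns, st.st_size)


def demo() -> tuple[bool, bool]:
    """Return (key stable when untouched, key changed after rewrite)."""
    with tempfile.TemporaryDirectory() as tmp:
        target = Path(tmp) / "notes.txt"
        target.write_text("v1")
        before = path_key(target)
        stable = path_key(target) == before
        # A rewrite with a different length changes st_size, so the key
        # differs even on file systems with coarse mtime resolution.
        target.write_text("v2 with more text")
        changed = path_key(target) != before
    return stable, changed
```

Because the key calls `stat()`, the file must exist when the fingerprint is computed; a missing path raises `FileNotFoundError`.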
- Keep the value returned by `__coco_memo_key__` small (small primitives/tuples).

Sometimes fingerprint matching alone isn't enough to decide whether a cached result is valid. For example:

- [...] `If-Modified-Since` on the next run.

Memo state validation addresses these by letting you attach a state function to your objects. It runs after a fingerprint match, giving you a chance to check freshness before the cached result is reused.
When CocoIndex finds a fingerprint match, it calls each state function with the stored state from the previous run:
- First run: `prev_state` is `coco.NON_EXISTENCE`. Use `coco.is_non_existence(prev_state)` to detect this.
- Subsequent runs: `prev_state` is whatever you returned last time.

Your state function returns a `coco.MemoStateOutcome(state=..., memo_valid=...)`:

- `state`: the current state value. CocoIndex stores it for the next run.
- `memo_valid` (`bool`, defaults to `False`): whether the cached result is still valid.

This decouples "has the state changed?" from "can we reuse the memo?":

- `MemoStateOutcome(state=new_state)` → cache is invalid (default). The function re-executes and the new state is stored. On the first run (no previous cache), simply return the initial state without setting `memo_valid`.
- `MemoStateOutcome(state=same_state, memo_valid=True)` → nothing changed; the cached result is reused and no state update is needed.
- `MemoStateOutcome(state=new_state, memo_valid=True)` → the state changed but the cached result is still valid (e.g. mtime changed but content hash unchanged). The new state is persisted so the next run uses the updated state.

### `__coco_memo_state__` (when you control the type)

:::info[Type annotations]
Annotate the prev_state parameter with its expected type (matching what you return in MemoStateOutcome(state=...)) so CocoIndex can properly reconstruct stored state values. See Serialization for details on supported types.
:::
Add a `__coco_memo_state__` method alongside `__coco_memo_key__`:
```python
import os
import hashlib
from pathlib import Path

import cocoindex as coco


class LocalFile:
    def __init__(self, path: Path) -> None:
        self.path = path

    def __coco_memo_key__(self) -> object:
        # Identity only: which file is it?
        return str(self.path.resolve())

    def __coco_memo_state__(
        self, prev_state: tuple[int, str] | coco.NonExistenceType
    ) -> coco.MemoStateOutcome:
        st = os.stat(self.path)
        new_mtime = st.st_mtime_ns
        if coco.is_non_existence(prev_state):
            # First run: compute the initial state (memo_valid defaults to False,
            # which is fine since there's no previous cache to reuse)
            content_hash = hashlib.sha256(self.path.read_bytes()).hexdigest()
            return coco.MemoStateOutcome(state=(new_mtime, content_hash))
        prev_mtime, prev_hash = prev_state
        if new_mtime == prev_mtime:
            # mtime unchanged: treated as reusable, no content read needed
            return coco.MemoStateOutcome(state=prev_state, memo_valid=True)
        # mtime changed: read the content and compare hashes. The same hash
        # function is used as on the first run so the values are comparable.
        content_hash = hashlib.sha256(self.path.read_bytes()).hexdigest()
        return coco.MemoStateOutcome(
            state=(new_mtime, content_hash),
            memo_valid=content_hash == prev_hash,
        )
```
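To see the outcomes play out without a CocoIndex installation, the following simulation mirrors the mtime-then-hash logic using stdlib stand-ins. `NON_EXISTENCE`, `MemoStateOutcome`, `file_state`, and `run_scenarios` here are simplified, hypothetical substitutes written for this example, not the library's own objects.

```python
import hashlib
import os
import tempfile
from dataclasses import dataclass
from pathlib import Path
from typing import Any

NON_EXISTENCE = object()  # stand-in for coco.NON_EXISTENCE


@dataclass
class MemoStateOutcome:  # stand-in for coco.MemoStateOutcome
    state: Any
    memo_valid: bool = False


def file_state(path: Path, prev_state: Any) -> MemoStateOutcome:
    mtime = path.stat().st_mtime_ns
    if prev_state is not NON_EXISTENCE and prev_state[0] == mtime:
        # Cheap check passed: reuse without reading the file.
        return MemoStateOutcome(state=prev_state, memo_valid=True)
    digest = hashlib.sha256(path.read_bytes()).hexdigest()
    if prev_state is NON_EXISTENCE:
        # First run: record the initial state; nothing to reuse yet.
        return MemoStateOutcome(state=(mtime, digest))
    # mtime moved: the memo stays valid only if the content hash matches.
    return MemoStateOutcome(state=(mtime, digest), memo_valid=digest == prev_state[1])


def run_scenarios() -> list[bool]:
    """memo_valid for: first run, unchanged, touched, edited."""
    with tempfile.TemporaryDirectory() as tmp:
        path = Path(tmp) / "doc.txt"
        path.write_text("hello")
        first = file_state(path, NON_EXISTENCE)
        unchanged = file_state(path, first.state)
        base = path.stat()
        # "Touch": move mtime forward one second, content untouched.
        os.utime(path, ns=(base.st_atime_ns, base.st_mtime_ns + 1_000_000_000))
        touched = file_state(path, unchanged.state)
        # Real edit: new content, and mtime forced to a distinct value.
        path.write_text("hello, world")
        os.utime(path, ns=(base.st_atime_ns, base.st_mtime_ns + 2_000_000_000))
        edited = file_state(path, touched.state)
        return [first.memo_valid, unchanged.memo_valid, touched.memo_valid, edited.memo_valid]
```

The "touched" case is the one that motivates returning a new state together with `memo_valid=True`: the stored mtime is refreshed, so the next run takes the cheap path again instead of re-hashing the file.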
:::tip[Keys vs states for files]
Without state validation, you'd include mtime and size directly in the memo key:
```python
def __coco_memo_key__(self):
    st = os.stat(self.path)
    return (str(self.path.resolve()), st.st_mtime_ns, st.st_size)
```
This works for simple cases. State validation becomes useful when you need multi-level checks (e.g. check mtime first, then content hash only if it differs), async operations, or stored metadata like ETags. With the MemoStateOutcome return, you can update the state (e.g. new mtime) without invalidating the cache when the content hasn't actually changed.
:::
Pass a state_fn keyword argument to register_memo_key_function. The state function receives the object as its first argument and prev_state as its second. Annotate prev_state with the expected type:
```python
from pathlib import Path

import cocoindex as coco
from cocoindex import register_memo_key_function


def path_key(p: Path) -> object:
    return str(p.resolve())


def path_state(
    p: Path, prev_state: tuple[int, int] | coco.NonExistenceType
) -> coco.MemoStateOutcome:
    st = p.stat()
    new_state = (st.st_mtime_ns, st.st_size)
    memo_valid = not coco.is_non_existence(prev_state) and new_state == prev_state
    return coco.MemoStateOutcome(state=new_state, memo_valid=memo_valid)


register_memo_key_function(Path, path_key, state_fn=path_state)
```
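To make the interplay between the key and the state function concrete, here is a toy in-memory cache that follows the flow described on this page: look up by key, call the state function with the stored state, then either reuse the result or recompute and store the new state. Everything in it (`memoized_call`, `CacheEntry`, and the stand-ins for `NON_EXISTENCE` and `MemoStateOutcome`) is a hypothetical simplification for illustration, not how CocoIndex is implemented.

```python
from dataclasses import dataclass
from typing import Any, Callable

NON_EXISTENCE = object()  # stand-in for coco.NON_EXISTENCE


@dataclass
class MemoStateOutcome:  # stand-in for coco.MemoStateOutcome
    state: Any
    memo_valid: bool = False


@dataclass
class CacheEntry:
    result: Any
    state: Any


def memoized_call(
    cache: dict[Any, CacheEntry],
    key: Any,
    state_fn: Callable[[Any], MemoStateOutcome],
    compute: Callable[[], Any],
) -> Any:
    entry = cache.get(key)
    # With no stored entry, the state function sees NON_EXISTENCE.
    prev_state = entry.state if entry is not None else NON_EXISTENCE
    outcome = state_fn(prev_state)
    if entry is not None and outcome.memo_valid:
        # Reuse the cached result, but persist any updated state.
        entry.state = outcome.state
        return entry.result
    # Key miss or stale memo: recompute and store result and state together.
    result = compute()
    cache[key] = CacheEntry(result=result, state=outcome.state)
    return result
```

A state function that compares a version number against `prev_state` would cause `compute` to run once per version, however many times `memoized_call` is invoked in between.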
A state method can return an Awaitable. CocoIndex handles this automatically:
- From a sync function: the awaitable is run with `asyncio.run()`. If a loop is already running, it raises an error; switch to an async function or use `@coco.fn.as_async`.

```python
import cocoindex as coco


class S3Object:
    def __init__(self, bucket: str, key: str) -> None:
        self.bucket = bucket
        self.key = key

    def __coco_memo_key__(self) -> object:
        return (self.bucket, self.key)

    async def __coco_memo_state__(
        self, prev_state: str | coco.NonExistenceType
    ) -> coco.MemoStateOutcome:
        etag = await self._head_object()
        memo_valid = not coco.is_non_existence(prev_state) and etag == prev_state
        return coco.MemoStateOutcome(state=etag, memo_valid=memo_valid)

    async def _head_object(self) -> str:
        ...  # boto3 / aioboto3 HEAD call
```
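The sync-side behaviour described above (run the awaitable with `asyncio.run()`, fail if a loop is already running) can be sketched with the standard library alone. `resolve_outcome` is a hypothetical helper written for this example; CocoIndex's internal handling may differ.

```python
import asyncio
import inspect
from typing import Any


def resolve_outcome(result: Any) -> Any:
    """Return a plain value as-is; drive an awaitable to completion."""
    if not inspect.isawaitable(result):
        return result
    try:
        asyncio.get_running_loop()
    except RuntimeError:
        # No loop is running in this thread, so asyncio.run() is safe.
        async def _await_it() -> Any:
            return await result

        return asyncio.run(_await_it())
    # A loop is already running: asyncio.run() would fail here, so raise
    # a clearer error and close the coroutine to avoid a warning.
    if inspect.iscoroutine(result):
        result.close()
    raise RuntimeError(
        "async state function called from sync code inside a running event loop"
    )
```

The wrapper coroutine exists because `asyncio.run()` expects a coroutine, whereas a state method might return any awaitable.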
Some types maintain internal state that makes memoization semantically incorrect. For example, a generator that tracks call counts would produce wrong results if memoized.
### `NotMemoKeyable` (when you control the type)

```python
import cocoindex as coco


class MyStatefulGenerator(coco.NotMemoKeyable):
    def __init__(self) -> None:
        self._counter = 0

    def next_value(self) -> int:
        self._counter += 1
        return self._counter
```

For a type you don't control, register it instead:

```python
import cocoindex as coco
from some_library import StatefulGenerator

coco.register_not_memo_keyable(StatefulGenerator)
```
In either case, attempting to use the type as a memo key raises a clear error.
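The guard can be pictured as a check that runs before any fingerprinting. The sketch below uses stand-ins for both opt-out mechanisms; the `NotMemoKeyable` class, the registry, `ensure_memo_keyable`, and the choice of `TypeError` are assumptions made for illustration, and the real error type and message may differ.

```python
class NotMemoKeyable:
    """Hypothetical stand-in for coco.NotMemoKeyable (a marker base class)."""


_not_memo_keyable: set[type] = set()


def register_not_memo_keyable(tp: type) -> None:
    """Hypothetical stand-in for coco.register_not_memo_keyable."""
    _not_memo_keyable.add(tp)


def ensure_memo_keyable(value: object) -> None:
    """Raise if the value's type has opted out of memoization."""
    blocked = isinstance(value, NotMemoKeyable) or any(
        isinstance(value, tp) for tp in _not_memo_keyable
    )
    if blocked:
        raise TypeError(
            f"{type(value).__qualname__} is marked as not memo-keyable"
        )


class Counter(NotMemoKeyable):
    """Opts out by inheritance (a type you control)."""


class ThirdPartyCursor:
    """Opts out by registration (a type you don't control)."""


register_not_memo_keyable(ThirdPartyCursor)
```

Failing loudly at this point is the design goal: a stateful object silently reduced to a fingerprint would let a cached result stand in for a call that should have advanced the object's state.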
- Avoid `id(obj)`, pointer addresses, or random values.
- Use `MemoStateOutcome(state=new_state, memo_valid=True)` for cheap state updates: when a cheap property changes (mtime) but the expensive check (content hash) confirms nothing meaningful changed, return `memo_valid=True` while updating the state. This avoids re-executing the function and avoids re-checking the expensive property next time.
- Use `NotMemoKeyable` for stateful types: it prevents subtle bugs from incorrect memoization of types with side effects.