skills/lance-user-guide/references/io-cheatsheet.md
Use this file when the user asks how to write/read Lance datasets, manage versions, or work with object stores.
Use lance.write_dataset(data, uri, mode=...).
Modes:
- mode="create": create new dataset (error if exists)
- mode="overwrite": create a new version that replaces the latest snapshot
- mode="append": append data as a new version (or create if missing)

Inputs:
- pyarrow.Table
- pyarrow.RecordBatchReader

Use lance.dataset(uri, version=..., asof=..., storage_options=...).
Notes:
- version can be a number or a tag (depending on the environment/version).
- storage_options for object stores (credentials, endpoint, etc.).

Use ds.scanner(...) for pushdowns:
- columns=[...] for projection
- filter="..." for predicate pushdown
- limit=... for limit pushdown
- nearest={...} for vector search
- prefilter=True/False to control filter ordering when combined with nearest
- use_scalar_index=True/False to control scalar index usage

Then materialize:
- scanner(...).to_table()
- scanner(...).to_batches()

Use a scalar index for the filter column when the filter is selective and you set prefilter=True.
Example:

    tbl = ds.scanner(
        nearest={"column": "vector", "q": q, "k": 10},
        filter="category = 'a'",
        prefilter=True,
    ).to_table()

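
The write, read, and scan steps above can be sketched end to end. This is a minimal illustration, not a definitive recipe: `vector_scan_kwargs` and `demo` are hypothetical helpers, and the URI, column names, and sample rows are assumptions made up for the example. Only `lance.write_dataset`, `lance.dataset`, and the `ds.scanner(...)` options listed above come from this cheatsheet.

```python
def vector_scan_kwargs(query, k, filter_expr=None, columns=None,
                       prefilter=True, vector_column="vector"):
    """Build keyword arguments for ds.scanner(...).

    Hypothetical helper: it only assembles the options listed above
    (nearest, filter, prefilter, columns) into one dict.
    """
    kwargs = {"nearest": {"column": vector_column, "q": query, "k": k}}
    if filter_expr is not None:
        kwargs["filter"] = filter_expr
        # prefilter only matters when a filter is combined with nearest.
        kwargs["prefilter"] = prefilter
    if columns is not None:
        kwargs["columns"] = list(columns)
    return kwargs


def demo(uri="/tmp/io_cheatsheet_demo.lance"):
    """Write a tiny dataset, reopen it, and run a filtered vector scan.

    Assumes the lance and pyarrow packages are installed; the URI and
    the sample data are placeholders.
    """
    # Imported lazily so the helper above is usable without Lance installed.
    import lance
    import pyarrow as pa

    table = pa.table({
        "id": [1, 2, 3],
        "category": ["a", "b", "a"],
        "vector": pa.array(
            [[0.0, 1.0], [1.0, 0.0], [0.5, 0.5]],
            type=pa.list_(pa.float32(), 2),  # fixed-size list of 2 floats
        ),
    })

    # mode="overwrite" replaces the latest snapshot with a new version.
    lance.write_dataset(table, uri, mode="overwrite")

    ds = lance.dataset(uri)
    kwargs = vector_scan_kwargs(
        [0.0, 1.0], k=2,
        filter_expr="category = 'a'",
        columns=["id", "category"],
    )
    return ds.scanner(**kwargs).to_table()
```

With Lance and PyArrow installed, `demo()` returns a `pyarrow.Table`; swapping `.to_table()` for `.to_batches()` streams the same result in batches instead of materializing it at once.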
Prefer:
- ds.describe_indices()

Use with care:
- ds.list_indices() can be expensive because it may load index statistics.
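
The preference above can be expressed as a small fallback. This is a sketch under assumptions: `index_summaries` is a hypothetical helper, and it assumes that some installed Lance versions may not expose `describe_indices()`, in which case it falls back to the potentially more expensive `list_indices()`. The shape of the returned items is not specified here and may differ between the two calls.

```python
def index_summaries(ds):
    """Return index metadata, preferring the cheaper call.

    Hypothetical helper: uses ds.describe_indices() when the dataset
    object provides it, otherwise falls back to ds.list_indices(),
    which may load index statistics and therefore cost more.
    """
    describe = getattr(ds, "describe_indices", None)
    if callable(describe):
        return list(describe())
    return list(ds.list_indices())
```

Call it as `index_summaries(ds)` on a dataset opened with `lance.dataset(uri)`.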