docs/source/user-guide/io/hugging-face.md
All cloud-enabled scan functions and their `read_` counterparts transparently support scanning from
Hugging Face:
| Scan | Read |
|---|---|
| scan_parquet | read_parquet |
| scan_csv | read_csv |
| scan_ndjson | read_ndjson |
| scan_ipc | read_ipc |
To scan from Hugging Face, pass an `hf://` path to the scan functions. The `hf://` path
format is `hf://BUCKET/REPOSITORY@REVISION/PATH`, where:

- `BUCKET` is one of `datasets` or `spaces`.
- `REPOSITORY` is the location of the repository, usually in the format `username/repo_name`. A
  branch can optionally be specified by appending `@branch`.
- `REVISION` is the name of the branch (or commit) to use. This is optional and defaults to `main`
  if not given.
- `PATH` is a file or directory path, or a glob pattern, relative to the repository root.

Example `hf://` paths:

| `hf://` path | Bucket | Repository | Branch | Path |
|---|---|---|---|---|
| hf://datasets/nameexhaustion/polars-docs/iris.csv | datasets | nameexhaustion/polars-docs | main | iris.csv |
| hf://datasets/nameexhaustion/polars-docs@foods/*.csv | datasets | nameexhaustion/polars-docs | foods | *.csv |
| hf://datasets/nameexhaustion/polars-docs/hive_dates/ | datasets | nameexhaustion/polars-docs | main | hive_dates/ |
| hf://spaces/nameexhaustion/polars-docs/orders.feather | spaces | nameexhaustion/polars-docs | main | orders.feather |

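
The decomposition shown in the table can be sketched in plain Python. This is only an illustration of the path format described above, not how Polars parses paths internally. The `HfPath` type and the `parse_hf_path` helper are hypothetical names, and the sketch assumes the repository always has the `username/repo_name` form and that the revision contains no `/`.

```python
from typing import NamedTuple


class HfPath(NamedTuple):
    """Components of an hf:// path (hypothetical helper type)."""

    bucket: str
    repository: str
    revision: str
    path: str


def parse_hf_path(uri: str) -> HfPath:
    """Split hf://BUCKET/REPOSITORY@REVISION/PATH into its components.

    Assumes REPOSITORY is exactly `username/repo_name`.
    """
    prefix = "hf://"
    if not uri.startswith(prefix):
        raise ValueError(f"not an hf:// path: {uri!r}")

    # At most four pieces: bucket, username, repo_name[@revision], path.
    parts = uri[len(prefix):].split("/", 3)
    if len(parts) < 3:
        raise ValueError(f"expected hf://BUCKET/USER/REPO[/PATH], got {uri!r}")

    bucket, user, repo = parts[:3]
    path = parts[3] if len(parts) == 4 else ""
    if bucket not in ("datasets", "spaces"):
        raise ValueError(f"unsupported bucket: {bucket!r}")

    # The revision is optional and defaults to "main".
    repo_name, _, revision = repo.partition("@")
    return HfPath(bucket, f"{user}/{repo_name}", revision or "main", path)


print(parse_hf_path("hf://datasets/nameexhaustion/polars-docs@foods/*.csv"))
```

Each row of the table corresponds to one call to this helper with the path in the first column.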
A Hugging Face API key can be passed to Polars to access private locations using either of the following methods:

- Passing a `token` in `storage_options` to the scan function, e.g.
  `scan_parquet(..., storage_options={'token': '<your HF token>'})`
- Setting the `HF_TOKEN` environment variable, e.g. `export HF_TOKEN=<your HF token>`

--8<-- "python/user-guide/io/hugging-face.py:setup"
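
As a rough sketch of how the two token options described above might be combined in application code, the hypothetical helper below prefers an explicitly supplied token and otherwise falls back to `HF_TOKEN`. The name `hf_storage_options` is an assumption made for illustration and is not part of Polars. Since setting `HF_TOKEN` is already one of the supported methods, the fallback is shown only to make the precedence explicit.

```python
import os
from typing import Optional


def hf_storage_options(token: Optional[str] = None) -> Optional[dict]:
    """Build a storage_options dict for a Hugging Face scan (hypothetical helper).

    An explicit token wins; otherwise HF_TOKEN is used if it is set.
    Returns None when no token is available, e.g. for public repositories.
    """
    token = token or os.environ.get("HF_TOKEN")
    return {"token": token} if token else None


# Intended use, with a placeholder path that is not a real repository:
#   import polars as pl
#   lf = pl.scan_parquet("hf://datasets/<user>/<repo>/data.parquet",
#                        storage_options=hf_storage_options())
```

Keeping the token out of source code and reading it from the environment avoids committing credentials by accident.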
{{code_block('user-guide/io/hugging-face','scan_iris_csv',['scan_csv'])}}
--8<-- "python/user-guide/io/hugging-face.py:scan_iris_repr"
See this file at https://huggingface.co/datasets/nameexhaustion/polars-docs/blob/main/iris.csv
{{code_block('user-guide/io/hugging-face','scan_iris_ndjson',['scan_ndjson'])}}
--8<-- "python/user-guide/io/hugging-face.py:scan_iris_repr"
See this file at https://huggingface.co/datasets/nameexhaustion/polars-docs/blob/main/iris.jsonl
{{code_block('user-guide/io/hugging-face','scan_parquet_hive_repr',['scan_parquet'])}}
--8<-- "python/user-guide/io/hugging-face.py:scan_parquet_hive_repr"
See this folder at https://huggingface.co/datasets/nameexhaustion/polars-docs/tree/main/hive_dates/
{{code_block('user-guide/io/hugging-face','scan_ipc',['scan_ipc'])}}
--8<-- "python/user-guide/io/hugging-face.py:scan_ipc_repr"
See this file at https://huggingface.co/spaces/nameexhaustion/polars-docs/blob/main/orders.feather
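
The web links above follow a regular pattern relative to the `hf://` paths used in the examples: files appear under `blob/<revision>/` and folders under `tree/<revision>/`. The sketch below reproduces that mapping. The function name `hf_web_url` is hypothetical, the repository is assumed to have the `username/repo_name` form, and glob patterns are rejected because they do not correspond to a single page.

```python
def hf_web_url(uri: str) -> str:
    """Map an hf:// file or directory path to its huggingface.co page (sketch)."""
    prefix = "hf://"
    if not uri.startswith(prefix):
        raise ValueError(f"not an hf:// path: {uri!r}")

    parts = uri[len(prefix):].split("/", 3)
    if len(parts) < 3:
        raise ValueError(f"expected hf://BUCKET/USER/REPO[/PATH], got {uri!r}")
    bucket, user, repo = parts[:3]
    path = parts[3] if len(parts) == 4 else ""

    if any(ch in path for ch in "*?["):
        raise ValueError("glob patterns have no single web URL")

    # The revision is optional and defaults to "main".
    repo_name, _, revision = repo.partition("@")
    # Directories (trailing slash or repository root) use "tree"; files use "blob".
    kind = "tree" if path == "" or path.endswith("/") else "blob"
    return (
        f"https://huggingface.co/{bucket}/{user}/{repo_name}"
        f"/{kind}/{revision or 'main'}/{path}"
    )


print(hf_web_url("hf://datasets/nameexhaustion/polars-docs/hive_dates/"))
```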