explorations/longmemeval/DATA_DOWNLOAD_GUIDE.md
The LongMemEval datasets are large files (several GB) hosted on HuggingFace with Git LFS. Here are all the ways to download them:
export HF_TOKEN=your_token_here
# or
export HUGGINGFACE_TOKEN=your_token_here
pnpm install
pnpm download:hf
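Since the download depends on one of the two token variables above being set, a small pre-flight check can catch a missing token before the download starts. This is a minimal sketch: the helper name `resolve_hf_token` is hypothetical, and preferring `HF_TOKEN` over `HUGGINGFACE_TOKEN` is an assumption, not something the download script is known to do.

```shell
# Hypothetical helper: print whichever token variable is set.
# Assumption: HF_TOKEN is checked first, then HUGGINGFACE_TOKEN.
resolve_hf_token() {
  if [ -n "${HF_TOKEN:-}" ]; then
    printf '%s\n' "$HF_TOKEN"
  elif [ -n "${HUGGINGFACE_TOKEN:-}" ]; then
    printf '%s\n' "$HUGGINGFACE_TOKEN"
  else
    echo "No HuggingFace token found; set HF_TOKEN or HUGGINGFACE_TOKEN" >&2
    return 1
  fi
}
```

One way to use it is `resolve_hf_token >/dev/null && pnpm download:hf`, which only starts the download when a token is present.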
# macOS
brew install git-lfs
# Ubuntu/Debian
sudo apt-get install git-lfs
# Initialize Git LFS
git lfs install
git clone https://huggingface.co/datasets/xiaowu0162/longmemeval
cd longmemeval
cp *.json ../data/
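If Git LFS was not initialized before cloning, the `.json` files can end up as small text pointers rather than the real datasets. The sketch below checks for that by looking for the LFS pointer header at the start of each file. The function names `is_lfs_pointer` and `check_lfs_pointers` are hypothetical, and `head -c` is assumed to be available (it is on GNU and BSD systems).

```shell
# Returns 0 if the file starts with the Git LFS pointer header,
# meaning the real content was never downloaded.
is_lfs_pointer() {
  head -c 200 "$1" | grep -q '^version https://git-lfs.github.com/spec/v1'
}

# Scan every .json file in a directory (default: current directory).
# Returns 1 if at least one file is still a pointer.
check_lfs_pointers() {
  dir="${1:-.}"
  status=0
  for f in "$dir"/*.json; do
    [ -e "$f" ] || continue   # no matches: the glob stays literal
    if is_lfs_pointer "$f"; then
      echo "$f is still an LFS pointer; run 'git lfs pull' in the clone" >&2
      status=1
    fi
  done
  return "$status"
}
```

Running `check_lfs_pointers .` inside the clone, or `check_lfs_pointers ../data` after copying, reports any file that still needs `git lfs pull`.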
Go to: https://drive.google.com/file/d/1zJgtYRFhOh5zDQzzatiddfjYhFSnyQ80/view
cd packages/longmemeval/data
tar -xzvf ~/Downloads/longmemeval_data.tar.gz
ls -lh *.json
# You should see:
# - longmemeval_s.json (~40MB)
# - longmemeval_m.json (~200MB)
# - longmemeval_oracle.json (~2MB)
If you have a HuggingFace account:
packages/longmemeval/data/
This means the download failed due to authentication. Use one of the authenticated methods above.
HuggingFace has bandwidth limits. Try:
Make sure you're logged in to HuggingFace and have accepted any dataset terms of use.
After downloading, verify the files:
# Check file sizes
ls -lh data/*.json
# Check file content (should be valid JSON)
head -n 5 data/longmemeval_s.json
Expected sizes:
- longmemeval_oracle.json: ~2MB
- longmemeval_s.json: ~40MB
- longmemeval_m.json: ~200MB
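Looking at the first few lines with `head` shows that a file starts like JSON, but it does not confirm that the whole file parses. A stricter check could look like the sketch below. It assumes `python3` is installed (not something this guide requires), and the function names `verify_json` and `verify_longmemeval` are hypothetical. Note that `json.load` reads each file fully into memory.

```shell
# Returns 0 if the file parses as JSON. Assumes python3 is on PATH.
verify_json() {
  python3 -c 'import json, sys; json.load(open(sys.argv[1], encoding="utf-8"))' "$1" 2>/dev/null
}

# Check that the three expected dataset files exist, are non-empty,
# and parse as JSON. Directory defaults to ./data.
verify_longmemeval() {
  dir="${1:-data}"
  status=0
  for name in longmemeval_oracle.json longmemeval_s.json longmemeval_m.json; do
    f="$dir/$name"
    if [ ! -s "$f" ]; then
      echo "MISSING or empty: $f"
      status=1
    elif verify_json "$f"; then
      echo "OK: $f"
    else
      echo "INVALID JSON: $f"
      status=1
    fi
  done
  return "$status"
}
```

Calling `verify_longmemeval data` prints one line per file and returns a non-zero status if any file is missing or does not parse.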