PlotCraft is a rigorous benchmark designed to evaluate the advanced data visualization capabilities of LLMs. It presents ~1k challenging tasks to assess how well models can generate and refine complex plots from natural language instructions.
Key Features:
conda create -n plotcraftbench python=3.13
conda activate plotcraftbench
pip install -r requirements.txt
zip -F data.zip --out complete_data.zip
unzip complete_data.zip -d data
Obtain Kaggle API credentials: Navigate to your Kaggle account settings (Account tab) and select 'Create New Token'. This downloads kaggle.json containing your API credentials.
Configure credentials: Move kaggle.json to the appropriate location:
Linux/macOS: ~/.kaggle/kaggle.json
Windows: C:\Users\<Windows-username>\.kaggle\kaggle.json
Install Kaggle CLI:
pip install kaggle
python download_datasets.py data/
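The credential step above can be sketched for Linux/macOS as follows. This is a minimal illustration, not part of the repository: `install_kaggle_token` is a hypothetical helper, and the source path of the downloaded token depends on where your browser saved it. The `chmod 600` call restricts the file to your user, which the Kaggle CLI warns about if omitted.

```shell
# Hypothetical helper: move a downloaded token into the Kaggle config directory.
install_kaggle_token() {
    local token_src="$1"                    # where kaggle.json was downloaded (assumed)
    local config_dir="${2:-$HOME/.kaggle}"  # default location on Linux/macOS
    if [ ! -f "$token_src" ]; then
        echo "error: $token_src not found" >&2
        return 1
    fi
    mkdir -p "$config_dir"
    mv "$token_src" "$config_dir/kaggle.json"
    # Make the credentials readable and writable by the owner only.
    chmod 600 "$config_dir/kaggle.json"
}
```

For example, if the token landed in your downloads folder, `install_kaggle_token ~/Downloads/kaggle.json` would place it at `~/.kaggle/kaggle.json`.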
Alternatively, download datasets manually using the URLs specified in download_url.json files located in each subdirectory (e.g., data/<dataset-name>/download_url.json). After downloading, place all CSV and XLSX files in the root of their respective subdirectories.
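When downloading manually, it helps to see which dataset subdirectories still lack data files. The sketch below is an illustration under the layout described above (one `download_url.json` per subdirectory, with CSV/XLSX files placed next to it); `list_missing_datasets` is a hypothetical helper, not a script shipped with the benchmark.

```shell
# Hypothetical helper: print dataset subdirectories that have a
# download_url.json but no CSV or XLSX file in their root yet.
list_missing_datasets() {
    local data_dir="${1:-data}"
    local url_file dataset_dir first_match
    for url_file in "$data_dir"/*/download_url.json; do
        [ -e "$url_file" ] || continue   # the glob matched nothing
        dataset_dir="$(dirname "$url_file")"
        # Look only in the root of the subdirectory; stop at the first hit.
        first_match="$(find "$dataset_dir" -maxdepth 1 -type f \
            \( -iname '*.csv' -o -iname '*.xlsx' \) -print -quit)"
        if [ -z "$first_match" ]; then
            basename "$dataset_dir"
        fi
    done
}
```

Running `list_missing_datasets data` prints one directory name per line; empty output means every subdirectory already contains at least one data file.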
Edit run_single_turn.sh or run_multi_turn.sh to set your OpenAI-compatible API endpoint and keys:
API_KEY="" # Evaluation API key
API_KEY_GEN="" # Generation API key
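Since both keys default to empty strings, a run started before editing the scripts would likely fail at the first API call. A small guard such as the one below could catch that earlier. It is a sketch only: `check_api_keys` is a hypothetical function, and it assumes the two variables are set in the same shell where it runs.

```shell
# Hypothetical guard: fail early if either key was left empty.
check_api_keys() {
    local status=0
    if [ -z "${API_KEY:-}" ]; then
        echo "error: API_KEY (evaluation key) is empty" >&2
        status=1
    fi
    if [ -z "${API_KEY_GEN:-}" ]; then
        echo "error: API_KEY_GEN (generation key) is empty" >&2
        status=1
    fi
    return "$status"
}
```

Placed after the key assignments, a line like `check_api_keys || exit 1` would stop the script before any requests are sent.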
Single-Turn Mode (generate plots from scratch):
bash run_single_turn.sh <model_name> <api_base_url>
Multi-Turn Mode (iterative refinement):
bash run_multi_turn.sh <model_name> <api_base_url>
Results will be saved to results_single_turn/ (single-turn) or results_multi_turn/ (multi-turn).
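To evaluate several models against the same endpoint, the commands above can be wrapped in a loop. The following is a minimal sketch: `run_models` is a hypothetical wrapper, and it assumes the run scripts exit with a non-zero status on failure, which this README does not confirm.

```shell
# Hypothetical wrapper: run one of the benchmark scripts for several models.
# Usage: run_models <script> <api_base_url> <model_name>...
run_models() {
    local script="$1" api_base_url="$2"
    shift 2
    local model failed=0
    for model in "$@"; do
        echo "== $script: $model =="
        # Keep going if one model fails, but remember the failure.
        if ! bash "$script" "$model" "$api_base_url"; then
            echo "warning: run failed for $model" >&2
            failed=1
        fi
    done
    return "$failed"
}
```

For instance, `run_models run_single_turn.sh <api_base_url> <model_a> <model_b>` runs single-turn mode once per model, using the same argument order as the commands above.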