LiveCodeBench for Reasoning Models

This is a customized version of LiveCodeBench specifically designed for reasoning models, such as QwQ-32B-Preview.

Getting Started

1. Download the Dataset

huggingface-cli download --repo-type dataset livecodebench/code_generation_lite --local-dir code_generation_lite

2. Inference and Evaluation

Before running inference and evaluation, set the following variables in scripts/evaluate_qwq.sh:

PYTHON_BIN: Set the path to your Python binary.
MODEL_DIR: Set the path to the QwQ-32B-Preview model directory.

Run the following command to perform inference and evaluation:

bash scripts/evaluate_qwq.sh

Inference results are saved to the lcb_result/qwq-32b-preview directory. We have included the results obtained with the Qwen/QwQ-32B-Preview model.

Modifications

Added a new chat template for Qwen/QwQ-32B-Preview.
Adjusted the prompt template to let the model think more freely (removed the instruction "You will NOT return anything except for the program").
Implemented a filter to select examples within a specified date range.
Modified the extract_code function to select the last code block according to the updated prompt template.
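
The last two modifications can be illustrated with a minimal sketch. This is not the repository's actual code: the helper names filter_by_date and extract_last_code_block are hypothetical, and the sketch assumes each problem is a dict with an ISO-formatted "contest_date" field and that the model wraps its code in triple-backtick fences.

```python
import re
from datetime import datetime

# Build the fence marker programmatically so this example can itself sit inside a fence.
FENCE = "`" * 3
CODE_BLOCK = re.compile(FENCE + r"[^\n]*\n(.*?)" + FENCE, re.DOTALL)


def filter_by_date(problems, start_date=None, end_date=None):
    """Keep problems whose contest date lies in [start_date, end_date] (inclusive).

    Hypothetical helper: assumes problem["contest_date"] is an ISO string
    such as "2024-09-01T00:00:00" and the bounds are "YYYY-MM-DD" strings.
    """
    start = datetime.strptime(start_date, "%Y-%m-%d") if start_date else None
    end = datetime.strptime(end_date, "%Y-%m-%d") if end_date else None
    kept = []
    for problem in problems:
        date = datetime.fromisoformat(problem["contest_date"])
        if start is not None and date < start:
            continue
        if end is not None and date > end:
            continue
        kept.append(problem)
    return kept


def extract_last_code_block(model_output):
    """Return the contents of the last fenced code block, or "" if there is none.

    Hypothetical stand-in for the modified extract_code: earlier blocks are
    treated as part of the model's reasoning and ignored.
    """
    blocks = CODE_BLOCK.findall(model_output)
    return blocks[-1].strip() if blocks else ""
```

Taking the last block rather than the first reflects the idea that a reasoning model may emit draft code while thinking and give its final answer at the end. The sketch does not handle an unterminated final fence, which a real implementation may need to consider.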

Acknowledgments

A big thank you to the LiveCodeBench team for their valuable contributions to the open-source community.