qwencoder-eval/instruct/README.md

Setup

  1. Create a Conda Environment
    Use the following commands to create and activate a new environment for a benchmark (here `eval_plus`) and install its requirements:

    bash
    benchmark="eval_plus"
    conda create -n eval_${benchmark}_env python=3.9
    conda activate eval_${benchmark}_env
    cd ${benchmark}
    pip install -r requirements.txt
    

    Please set up all evaluation environments in test.sh.

  2. Install Dependencies
    After activating the environment, install all required dependencies by running the following for each benchmark:

    bash
    pip install -r requirements.txt
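
When several benchmarks each need their own environment, the two setup steps can be wrapped in a small loop. The following is a minimal sketch, not part of the repository: the `run` and `setup_benchmark_env` helpers and the `DRY_RUN` switch are hypothetical names, only the `eval_plus` benchmark is named in this README, and each benchmark directory is assumed to contain its own `requirements.txt`. It uses `conda run` so that no `conda activate` is needed inside a script.

```shell
#!/usr/bin/env bash
# Sketch: create one conda environment per benchmark directory.
# ASSUMPTION: benchmark names passed as arguments are directory names that
# each contain a requirements.txt; only "eval_plus" appears in this README.
set -euo pipefail

# By default the commands are only printed. Set DRY_RUN=0 to execute them.
DRY_RUN="${DRY_RUN:-1}"

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

setup_benchmark_env() {
    local benchmark="$1"
    local env_name="eval_${benchmark}_env"
    run conda create -y -n "$env_name" python=3.9
    # `conda run` executes pip inside the environment without activating it.
    run conda run -n "$env_name" pip install -r "${benchmark}/requirements.txt"
}

# Fall back to the single benchmark named in this README when no arguments are given.
if [ "$#" -eq 0 ]; then
    set -- eval_plus
fi

for benchmark in "$@"; do
    setup_benchmark_env "$benchmark"
done
```

Running the script without arguments prints the commands for `eval_plus`. Review the output before setting `DRY_RUN=0`.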
    

Evaluation

  1. Model Path Modification
    Before running the evaluation script, ensure that the model paths are correctly set. Modify the paths as needed based on your local environment or cloud storage setup.

  2. Run Evaluation
    Once the environment is ready and the model paths are configured, run the evaluation suite by executing the following script:

    bash
    EVAL_SCRIPT="./evaluate.sh"
    MODEL_DIR="/path/to/Qwen2.5-coder-Instruct/"
    OUTPUT_DIR="/path/to/results/"
    TP=2
    bash ${EVAL_SCRIPT} ${MODEL_DIR} ${OUTPUT_DIR} ${TP}
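
A thin wrapper can catch a wrong model path before a long evaluation run starts. This is a sketch under stated assumptions: `run_eval` is a hypothetical helper, `evaluate.sh` is assumed to take exactly the three positional arguments shown above, and `TP` is assumed to be a whole number (presumably the tensor-parallel size, which this README does not spell out). Creating the output directory up front is a precaution, since it is not stated whether `evaluate.sh` creates it.

```shell
#!/usr/bin/env bash
# Sketch: validate inputs, then call evaluate.sh with the three positional
# arguments used in this README (model dir, output dir, TP).
set -euo pipefail

run_eval() {
    local model_dir="$1" output_dir="$2" tp="${3:-2}"
    # EVAL_SCRIPT can be overridden from the environment; the default matches the README.
    local eval_script="${EVAL_SCRIPT:-./evaluate.sh}"

    if [ ! -d "$model_dir" ]; then
        echo "model directory not found: $model_dir" >&2
        return 1
    fi
    case "$tp" in
        ''|*[!0-9]*)
            echo "TP must be a whole number, got: $tp" >&2
            return 1
            ;;
    esac

    # Harmless if evaluate.sh also creates the directory itself.
    mkdir -p "$output_dir"
    bash "$eval_script" "$model_dir" "$output_dir" "$tp"
}
```

After sourcing the file, a call looks like `run_eval /path/to/Qwen2.5-coder-Instruct/ /path/to/results/ 2`.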
    

Quantization Evaluation Results

Python

| Model | HE | HE+ | MBPP | MBPP+ | BCB-inst-full | BCB-inst-hard | LCB (2407-2411) |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Qwen2.5-Coder-32B-Instruct | 92.7 | 87.2 | 90.2 | 75.1 | 49.6 | 27.0 | 31.4 |
| Qwen2.5-Coder-32B-AWQ | 92.1 | 84.1 | 87.8 | 75.1 | 48.9 | 27.0 | 31.7 |
| Qwen2.5-Coder-32B-Instruct-GPTQ-Int8 | 92.1 | 85.4 | 90.5 | 76.5 | 48.6 | 26.4 | 30.7 |
| Qwen2.5-Coder-32B-Instruct-GPTQ-Int4 | 89.6 | 83.5 | 87.0 | 75.9 | 49.7 | 27.0 | 30.3 |
| Qwen2.5-Coder-32B-Instruct-GGUF-Q8_0 | 90.9 | 86.0 | 89.4 | 76.2 | 47.7 | 23.6 | 31.1 |
| Qwen2.5-Coder-32B-Instruct-GGUF-Q6_K | 90.2 | 86.0 | 89.7 | 76.2 | 48.3 | 25.7 | 31.6 |
| Qwen2.5-Coder-32B-Instruct-GGUF-Q5_K_M | 90.9 | 85.4 | 89.2 | 75.7 | 48.8 | 25.7 | 31.4 |
| Qwen2.5-Coder-32B-Instruct-GGUF-Q5_0 | 90.9 | 86.0 | 88.9 | 74.9 | 48.4 | 23.6 | 30.7 |
| Qwen2.5-Coder-32B-Instruct-GGUF-Q4_K_M | 89.6 | 85.4 | 89.4 | 75.9 | 48.9 | 24.3 | 29.9 |
| Qwen2.5-Coder-32B-Instruct-GGUF-Q4_0 | 89.6 | 85.4 | 90.2 | 77.8 | 48.2 | 25.7 | 32.6 |
| Qwen2.5-Coder-32B-Instruct-GGUF-Q3_K_M | 91.5 | 86.6 | 90.7 | 76.7 | 48.3 | 23.6 | 32.0 |
| Qwen2.5-Coder-32B-Instruct-GGUF-Q2_K | 92.7 | 84.8 | 87.3 | 74.3 | 47.6 | 23.0 | 28.5 |

Multiple Programming Languages

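| Model | Python | Java | C++ | C# | TS | JS | PHP | Bash | Avg. |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Qwen2.5-Coder-32B-Instruct | 92.7 | 80.4 | 79.5 | 82.9 | 86.8 | 85.7 | 78.9 | 48.1 | 79.4 |
| Qwen2.5-Coder-32B-AWQ | 90.2 | 83.5 | 80.7 | 82.3 | 85.5 | 85.1 | 79.5 | 49.4 | 79.5 |
| Qwen2.5-Coder-32B-Instruct-GPTQ-Int8 | 90.9 | 84.2 | 81.4 | 80.4 | 85.5 | 87.6 | 79.5 | 49.4 | 79.8 |
| Qwen2.5-Coder-32B-Instruct-GPTQ-Int4 | 92.1 | 83.5 | 82.6 | 80.4 | 84.3 | 86.3 | 78.3 | 50.0 | 79.7 |
| Qwen2.5-Coder-32B-Instruct-GGUF-Q8_0 | 90.2 | 84.8 | 82.0 | 81.0 | 85.5 | 87.6 | 80.1 | 49.4 | 80.1 |
| Qwen2.5-Coder-32B-Instruct-GGUF-Q6_K | 90.9 | 83.5 | 82.0 | 81.6 | 84.9 | 87.0 | 80.7 | 48.7 | 79.9 |
| Qwen2.5-Coder-32B-Instruct-GGUF-Q5_K_M | 90.2 | 83.5 | 82.6 | 81.0 | 85.5 | 87.0 | 79.5 | 48.7 | 79.8 |
| Qwen2.5-Coder-32B-Instruct-GGUF-Q5_0 | 90.2 | 84.8 | 82.0 | 81.6 | 85.5 | 87.0 | 80.1 | 48.1 | 79.9 |
| Qwen2.5-Coder-32B-Instruct-GGUF-Q4_K_M | 90.2 | 84.8 | 81.4 | 82.3 | 85.5 | 86.3 | 80.1 | 50.6 | 80.2 |
| Qwen2.5-Coder-32B-Instruct-GGUF-Q4_0 | 88.4 | 82.9 | 80.1 | 81.0 | 86.8 | 85.7 | 78.3 | 48.1 | 78.9 |
| Qwen2.5-Coder-32B-Instruct-GGUF-Q3_K_M | 90.9 | 84.2 | 85.1 | 82.3 | 84.9 | 87.0 | 80.1 | 49.4 | 80.5 |
| Qwen2.5-Coder-32B-Instruct-GGUF-Q2_K | 90.2 | 81.0 | 82.6 | 81.6 | 83.6 | 84.5 | 80.1 | 48.7 | 79.1 |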
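
The Avg. column appears to be the unweighted mean of the eight per-language scores. This is inferred from the numbers and is not stated in the README. The sketch below recomputes it for one row. `mean_score` is a hypothetical helper name and is not part of the evaluation suite.

```shell
#!/usr/bin/env bash
# Sketch: unweighted mean of the given scores, rounded to one decimal place.
mean_score() {
    printf '%s\n' "$@" |
        awk '{ sum += $1; n++ } END { if (n > 0) printf "%.1f\n", sum / n }'
}

# Per-language scores for Qwen2.5-Coder-32B-Instruct, copied from the table above
# (Python, Java, C++, C#, TS, JS, PHP, Bash).
mean_score 92.7 80.4 79.5 82.9 86.8 85.7 78.9 48.1
```

For this row the result is 79.4, which matches the reported Avg. value. The same check can be applied to any other row when adding new quantization results.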

Code Editing & Code Reasoning & SQL

| Model | Aider (whole) | Aider (diff) | CRUXEval-Input-CoT | CRUXEval-Output-CoT | Spider | Bird |
| --- | --- | --- | --- | --- | --- | --- |
| Qwen2.5-Coder-32B-Instruct | 73.7 | 71.4 | 75.2 | 83.4 | 85.1 | 58.4 |
| Qwen2.5-Coder-32B-AWQ | 73.7 | 67.7 | 75.1 | 83.1 | 83.6 | 57.3 |
| Qwen2.5-Coder-32B-Instruct-GPTQ-Int8 | 74.4 | 73.7 | 75.8 | 83.6 | 84.8 | 58.1 |
| Qwen2.5-Coder-32B-Instruct-GPTQ-Int4 | 72.2 | 67.7 | 75.8 | 83.5 | 85.0 | 57.6 |
| Qwen2.5-Coder-32B-Instruct-GGUF-Q8_0 | 72.9 | 69.9 | 80.5 | 83.8 | 84.5 | 57.9 |
| Qwen2.5-Coder-32B-Instruct-GGUF-Q6_K | 72.9 | 73.7 | 78.1 | 83.5 | 84.7 | 58.1 |
| Qwen2.5-Coder-32B-Instruct-GGUF-Q5_K_M | 74.4 | 69.9 | 78.4 | 84.6 | 85.3 | 57.7 |
| Qwen2.5-Coder-32B-Instruct-GGUF-Q5_0 | 71.4 | 72.2 | 80.6 | 83.2 | 84.9 | 57.4 |
| Qwen2.5-Coder-32B-Instruct-GGUF-Q4_K_M | 75.2 | 69.2 | 79.0 | 83.5 | 84.5 | 57.5 |
| Qwen2.5-Coder-32B-Instruct-GGUF-Q4_0 | 74.4 | 71.4 | 78.5 | 84.0 | 84.7 | 57.2 |
| Qwen2.5-Coder-32B-Instruct-GGUF-Q3_K_M | 72.9 | 68.4 | 78.8 | 83.9 | 84.4 | 57.4 |
| Qwen2.5-Coder-32B-Instruct-GGUF-Q2_K | 69.9 | 61.7 | 75.5 | 81.1 | 83.4 | 56.1 |