## 1. Introduction

We provide a test script to evaluate the performance of the DeepSeek-Coder models on code completion benchmarks. Here we select the widely used DS-1000 benchmark.

## 2. Evaluation

We directly use the scripts provided by the DS-1000 repository to evaluate the models. Refer to the DS-1000 repository for more details about the evaluation procedure. A rough sketch of the end-to-end workflow is shown below.
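
For orientation, the sketch below shows what the generate-then-score loop might look like. It is not the official DS-1000 script: the Hugging Face dataset id `xlang-ai/DS-1000`, the `test` split, and the `prompt` field are assumptions, and `complete` is a placeholder for any model completion function (one possible implementation is sketched in Section 3). Scoring itself should still be done with the DS-1000 repository's scripts.

```python
# Rough sketch of the generate-then-score loop; NOT the official DS-1000
# evaluation script. The dataset id, split, and "prompt" field are
# assumptions; check the DS-1000 repository for the exact format its
# scoring scripts expect.
import json

from datasets import load_dataset

problems = load_dataset("xlang-ai/DS-1000", split="test")  # assumed id/split

with open("completions.jsonl", "w") as fout:
    for idx, problem in enumerate(problems):
        completion = complete(problem["prompt"])  # hypothetical completion fn
        fout.write(json.dumps({"id": idx, "completion": completion}) + "\n")
```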

## 3. Experimental Results

We report the experimental results for the completion mode of DS-1000 below. We set the maximum sequence length to 2048 and employ greedy search as the decoding strategy. To ensure a fair comparison, we apply identical hyperparameters across all open-source models under evaluation.
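
As a concrete illustration of these settings, a completion function using greedy decoding with a 2048-token budget via Hugging Face `transformers` might look like the sketch below. The checkpoint id is one of the public DeepSeek-Coder models, and the `complete` helper is our own name, not part of any official script.

```python
# Sketch of the decoding setup described above: greedy search, maximum
# length 2048, via Hugging Face transformers. Any evaluated model can be
# swapped in for the checkpoint id below.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "deepseek-ai/deepseek-coder-6.7b-base"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

def complete(prompt: str) -> str:
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    outputs = model.generate(
        **inputs,
        do_sample=False,                      # greedy search
        max_length=2048,                      # prompt + completion budget
        pad_token_id=tokenizer.eos_token_id,  # silence the padding warning
    )
    # keep only the newly generated tokens, dropping the prompt
    new_tokens = outputs[0][inputs["input_ids"].shape[1]:]
    return tokenizer.decode(new_tokens, skip_special_tokens=True)
```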

| Model | Size | Matplotlib | NumPy | Pandas | PyTorch | SciPy | Scikit-Learn | TensorFlow | Avg |
|---|---|---|---|---|---|---|---|---|---|
| Codex-001 | - | 41.8% | 26.6% | 9.4% | 9.7% | 15.0% | 18.5% | 17.2% | 20.2% |
| Codex-002 | - | 57.0% | 43.1% | 26.5% | 41.8% | 31.8% | 44.8% | 39.3% | 39.2% |
| CodeShell | 7B | 34.1% | 21.8% | 10.7% | 11.8% | 17.0% | 20.0% | 15.6% | 18.8% |
| CodeGeeX2 | 6B | 38.7% | 26.8% | 14.4% | 11.8% | 19.8% | 27.0% | 17.8% | 22.9% |
| StarCoder | 16B | 47.7% | 31.4% | 12.7% | 25.0% | 22.6% | 35.7% | 22.2% | 27.2% |
| CodeLlama-Base | 7B | 41.9% | 24.6% | 14.8% | 16.2% | 18.9% | 17.4% | 17.8% | 22.1% |
| CodeLlama-Base | 13B | 46.5% | 28.6% | 18.2% | 19.1% | 18.9% | 27.8% | 33.3% | 26.8% |
| CodeLlama-Base | 34B | 50.3% | 42.7% | 23.0% | 25.0% | 28.3% | 33.9% | 40.0% | 34.3% |
| DeepSeek-Coder-Base | 1.3B | 32.3% | 21.4% | 9.3% | 8.8% | 8.5% | 16.5% | 8.9% | 16.2% |
| DeepSeek-Coder-Base | 5.7B | 51.1% | 31.8% | 19.9% | 14.7% | 17.0% | 29.6% | 15.6% | 27.7% |
| DeepSeek-Coder-Base | 6.7B | 48.4% | 35.5% | 20.6% | 19.1% | 22.6% | 38.3% | 24.4% | 30.5% |
| DeepSeek-Coder-Base | 33B | 56.1% | 49.6% | 25.8% | 36.8% | 36.8% | 40.0% | 46.7% | 40.2% |