qwencoder-eval/instruct/eval-dev-quality/docs/reports/v0.4.0/5/README.md
This report was generated by the DevQualityEval benchmark, version 0.4.0.
Keep in mind that LLMs are nondeterministic, so the results below reflect only a single snapshot.
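The results of all models have been divided into the following categories:

- Models that encountered an error.
- Models that produced no code.
- Models that produced invalid code.
- Models that produced executable code.
- Models that produced code that reached full statement coverage.
- Models that did not respond with more content than requested.
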
The following sections list all models with their categories. The complete log of the evaluation with all outputs can be found here. Detailed scoring can be found here.
Models in this category encountered an error.

- `openrouter/jebcarter/psyfighter-13b`
- `openrouter/anthropic/claude-2`
- `openrouter/openchat/openchat-7b:free`
- `openrouter/openrouter/cinematika-7b:free`
- `openrouter/jondurbin/bagel-34b`
- `openrouter/anthropic/claude-instant-1`
- `openrouter/haotian-liu/llava-13b`
- `openrouter/undi95/toppy-m-7b:free`
- `openrouter/anthropic/claude-2:beta`
- `openrouter/nousresearch/nous-hermes-2-vision-7b`
- `openrouter/anthropic/claude-instant-1:beta`
- `openrouter/nousresearch/nous-capybara-7b:free`
- `openrouter/perplexity/pplx-70b-online`
- `openrouter/lynn/soliloquy-l3`

Models in this category produced no code.

- `openrouter/neversleep/noromaid-20b`
- `openrouter/koboldai/psyfighter-13b-2`
- `openrouter/nousresearch/nous-capybara-7b`
- `openrouter/neversleep/noromaid-mixtral-8x7b-instruct`
- `openrouter/gryphe/mythomist-7b:free`
- `openrouter/meta-llama/codellama-34b-instruct`
- `openrouter/undi95/remm-slerp-l2-13b:extended`
- `openrouter/meta-llama/llama-3-8b-instruct:extended`
- `openrouter/anthropic/claude-instant-1.0`
- `openrouter/openai/gpt-3.5-turbo-1106`
- `openrouter/gryphe/mythomist-7b`
- `openrouter/nousresearch/nous-hermes-llama2-13b`
- `openrouter/teknium/openhermes-2.5-mistral-7b`
- `openrouter/mistralai/mistral-tiny`
- `openrouter/intel/neural-chat-7b`
- `openrouter/gryphe/mythomax-l2-13b:nitro`
- `openrouter/gryphe/mythomax-l2-13b:extended`
- `openrouter/mistralai/mistral-7b-instruct:nitro`
- `openrouter/01-ai/yi-34b`
- `openrouter/google/gemini-pro`
- `openrouter/togethercomputer/stripedhyena-hessian-7b`
- `openrouter/cognitivecomputations/dolphin-mixtral-8x7b`
- `openrouter/openai/gpt-3.5-turbo-instruct`
- `openrouter/pygmalionai/mythalion-13b`
- `openrouter/mancer/weaver`
- `openrouter/undi95/remm-slerp-l2-13b`
- `openrouter/openai/gpt-3.5-turbo-0613`

Models in this category produced invalid code.

- `openrouter/mistralai/mistral-7b-instruct:free`
- `openrouter/codellama/codellama-70b-instruct`
- `openrouter/microsoft/wizardlm-2-7b`
- `openrouter/google/palm-2-codechat-bison`
- `openrouter/open-orca/mistral-7b-openorca`
- `openrouter/google/palm-2-chat-bison`
- `openrouter/perplexity/pplx-7b-chat`
- `openrouter/01-ai/yi-34b-chat`
- `openrouter/nousresearch/nous-hermes-2-mistral-7b-dpo`
- `openrouter/anthropic/claude-3-haiku:beta`
- `openrouter/undi95/toppy-m-7b`
- `openrouter/huggingfaceh4/zephyr-7b-beta:free`
- `openrouter/xwin-lm/xwin-lm-70b`
- `openrouter/01-ai/yi-6b`
- `openrouter/undi95/toppy-m-7b:nitro`
- `openrouter/openai/gpt-3.5-turbo-16k`
- `openrouter/huggingfaceh4/zephyr-7b-beta`
- `openrouter/rwkv/rwkv-5-world-3b`
- `openrouter/cohere/command-r-plus`
- `openrouter/anthropic/claude-2.0`
- `openrouter/google/gemini-pro-vision`
- `openrouter/sao10k/fimbulvetr-11b-v2`
- `openrouter/sophosympatheia/midnight-rose-70b`
- `openrouter/alpindale/goliath-120b`
- `openrouter/perplexity/sonar-medium-online`
- `openrouter/openchat/openchat-7b`
- `openrouter/mistralai/mixtral-8x7b`
- `openrouter/meta-llama/llama-3-8b-instruct`
- `openrouter/austism/chronos-hermes-13b`
- `openrouter/mistralai/mistral-large`
- `openrouter/mistralai/mixtral-8x22b`
- `openrouter/anthropic/claude-1.2`
- `openrouter/meta-llama/llama-2-13b-chat`
- `openrouter/nousresearch/nous-capybara-34b`
- `openrouter/openrouter/cinematika-7b`
- `openrouter/perplexity/pplx-7b-online`
- `openrouter/cohere/command`
- `openrouter/anthropic/claude-instant-1.1`
- `openrouter/teknium/openhermes-2-mistral-7b`
- `openrouter/mistralai/mixtral-8x7b-instruct:nitro`
- `openrouter/nousresearch/nous-hermes-2-mixtral-8x7b-sft`
- `openrouter/perplexity/sonar-small-online`
- `openrouter/cohere/command-r`
- `openrouter/togethercomputer/stripedhyena-nous-7b`
- `openrouter/google/gemma-7b-it:nitro`
- `openrouter/nousresearch/nous-hermes-yi-34b`
- `openrouter/nousresearch/nous-hermes-2-mixtral-8x7b-dpo`
- `openrouter/recursal/rwkv-5-3b-ai-town`
- `openrouter/meta-llama/llama-3-8b-instruct:nitro`
- `openrouter/perplexity/pplx-70b-chat`
- `openrouter/fireworks/firellava-13b`
- `openrouter/jondurbin/airoboros-l2-70b`
- `openrouter/google/gemma-7b-it:free`
- `openrouter/anthropic/claude-2.0:beta`
- `openrouter/google/palm-2-codechat-bison-32k`
- `openrouter/google/gemma-7b-it`
- `openrouter/recursal/eagle-7b`
- `openrouter/perplexity/sonar-small-chat`
- `openrouter/google/palm-2-chat-bison-32k`

Models in this category produced executable code.

- `openrouter/mistralai/mistral-7b-instruct`

Models in this category produced code that reached full statement coverage.

- `openrouter/anthropic/claude-2.1:beta`
- `openrouter/openai/gpt-4-32k`
- `openrouter/microsoft/wizardlm-2-8x22b:nitro`
- `openrouter/anthropic/claude-3-sonnet`
- `openrouter/openai/gpt-3.5-turbo-0125`
- `openrouter/lizpreciatior/lzlv-70b-fp16-hf`
- `openrouter/databricks/dbrx-instruct`
- `openrouter/meta-llama/llama-2-70b-chat:nitro`
- `openrouter/openai/gpt-4-1106-preview`
- `openrouter/mistralai/mistral-medium`
- `openrouter/mistralai/mixtral-8x7b-instruct`
- `openrouter/phind/phind-codellama-34b`
- `openrouter/perplexity/sonar-medium-chat`
- `openrouter/anthropic/claude-3-opus`
- `openrouter/anthropic/claude-1`
- `openrouter/anthropic/claude-2.1`
- `openrouter/mistralai/mistral-small`
- `openrouter/meta-llama/llama-3-70b-instruct:nitro`
- `openrouter/anthropic/claude-3-opus:beta`
- `openrouter/gryphe/mythomax-l2-13b`
- `openrouter/microsoft/wizardlm-2-8x22b`
- `openrouter/anthropic/claude-3-sonnet:beta`
- `openrouter/anthropic/claude-3-haiku`
- `openrouter/mistralai/mixtral-8x22b-instruct`
- `openrouter/meta-llama/llama-3-70b-instruct`
- `openrouter/meta-llama/llama-2-70b-chat`
- `openrouter/huggingfaceh4/zephyr-orpo-141b-a35b`

Models in this category did not respond with more content than requested.

- `openrouter/openai/gpt-4-turbo`
- `openrouter/openai/gpt-3.5-turbo`
- `openrouter/google/gemini-pro-1.5`
- `symflower/symbolic-execution`
- `openrouter/openai/gpt-4-turbo-preview`
- `openrouter/openai/gpt-4-32k-0314`
- `openrouter/openai/gpt-4`
- `openrouter/openai/gpt-3.5-turbo-0301`
- `openrouter/openai/gpt-4-vision-preview`
- `openrouter/openai/gpt-4-0314`
- `openrouter/anthropic/claude-instant-1.2`
- `openrouter/openrouter/auto`
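
The six category descriptions in this report read like an ordered scale, from "encountered an error" up to "did not respond with more content than requested". The sketch below shows one way such a scale could be applied to a single model result. It is an illustration only: the `Category` names, the `categorize` function, and its boolean parameters are hypothetical and not taken from DevQualityEval, and the assumption that each category implies all lower ones is not stated in the report.

```python
from enum import IntEnum


class Category(IntEnum):
    """Hypothetical names for the six categories, in the order this report lists them."""
    ERROR = 0
    NO_CODE = 1
    INVALID_CODE = 2
    EXECUTABLE_CODE = 3
    FULL_STATEMENT_COVERAGE = 4
    NO_EXCESS_RESPONSE = 5


def categorize(errored: bool, has_code: bool, executes: bool,
               full_coverage: bool, no_excess: bool) -> Category:
    """Return the highest category reached, assuming each step requires the previous one."""
    if errored:
        return Category.ERROR
    if not has_code:
        return Category.NO_CODE
    if not executes:
        return Category.INVALID_CODE
    if not full_coverage:
        return Category.EXECUTABLE_CODE
    if not no_excess:
        return Category.FULL_STATEMENT_COVERAGE
    return Category.NO_EXCESS_RESPONSE


# Example: code that runs but does not cover every statement.
result = categorize(errored=False, has_code=True, executes=True,
                    full_coverage=False, no_excess=False)
print(result.name)
```

Using an ordered enum makes it straightforward to compare two models (`a > b`) under the stated assumption that the categories are cumulative.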