aider/website/docs/leaderboards/edit.md
{: .note :} This old aider code editing leaderboard has been replaced by the new, much more challenging polyglot leaderboard.
Aider's code editing benchmark asks the LLM to edit Python source files to complete 133 small coding exercises from Exercism. This measures the LLM's coding ability and whether it can write new code that integrates into existing code. The model also has to successfully apply all its changes to the source file without human intervention.
<input type="text" id="editSearchInput" placeholder="Search..." style="width: 100%; max-width: 800px; margin: 10px auto; padding: 8px; display: block; border: 1px solid #ddd; border-radius: 4px;">
<table style="width: 100%; max-width: 800px; margin: auto; border-collapse: collapse; box-shadow: 0 2px 4px rgba(0,0,0,0.1); font-size: 14px;">
  <thead style="background-color: #f2f2f2;">
    <tr>
      <th style="padding: 8px; text-align: left;">Model</th>
      <th style="padding: 8px; text-align: center;">Percent completed correctly</th>
      <th style="padding: 8px; text-align: center;">Percent using correct edit format</th>
      <th style="padding: 8px; text-align: left;">Command</th>
      <th style="padding: 8px; text-align: center;">Edit format</th>
    </tr>
  </thead>
  <tbody>
    {% assign edit_sorted = site.data.edit_leaderboard | sort: 'pass_rate_2' | reverse %}
    {% for row in edit_sorted %}
    <tr style="border-bottom: 1px solid #ddd;">
      <td style="padding: 8px;">{{ row.model }}</td>
      <td style="padding: 8px; text-align: center;">{{ row.pass_rate_2 }}%</td>
      <td style="padding: 8px; text-align: center;">{{ row.percent_cases_well_formed }}%</td>
      <td style="padding: 8px;"><code>{{ row.command }}</code></td>
      <td style="padding: 8px; text-align: center;">{{ row.edit_format }}</td>
    </tr>
    {% endfor %}
  </tbody>
</table>
<canvas id="editChart" width="800" height="450" style="margin-top: 20px"></canvas>
<script src="https://unpkg.com/patternomaly/dist/patternomaly.js"></script>
<script src="https://cdn.jsdelivr.net/npm/chart.js"></script>
<script>
{% assign data_source = edit_sorted %}
{% assign pass_rate_field = "pass_rate_2" %}
{% include leaderboard.js %}
</script>
<style>
  tr.selected {
    color: #0056b3;
  }
  table {
    table-layout: fixed;
  }
  td, th {
    word-wrap: break-word;
    overflow-wrap: break-word;
  }
  td:nth-child(3), td:nth-child(4) {
    font-size: 12px;
  }
</style>

The key benchmarking results are the two percentage columns in the table above: the percent of exercises completed correctly and the percent of cases using the correct edit format.
Aider uses different "edit formats" to collect code edits from different LLMs. The "whole" format is the easiest for an LLM to use, but it uses a lot of tokens and may limit how large a file can be edited. Models that can use one of the diff formats are much more efficient: they use far fewer tokens, so they can edit larger files at lower cost and without hitting token limits.
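To make the size difference concrete, here is a minimal sketch that compares returning an entire updated file with returning only a diff. It is an illustration under stated assumptions, not aider's implementation: it uses Python's standard `difflib` unified diff as a stand-in for a diff-style edit format, counts characters as a rough proxy for tokens, and the helper names `whole_edit` and `diff_edit` and the sample file are hypothetical.

```python
import difflib


def whole_edit(updated: str) -> str:
    """'Whole'-style edit: the reply contains the entire updated file."""
    return updated


def diff_edit(original: str, updated: str) -> str:
    """Diff-style edit: only the changed lines plus one line of context.

    Assumption: a standard unified diff stands in for a diff-like edit format.
    """
    return "".join(
        difflib.unified_diff(
            original.splitlines(keepends=True),
            updated.splitlines(keepends=True),
            fromfile="before.py",
            tofile="after.py",
            n=1,
        )
    )


# Hypothetical 200-line source file with a single-line change.
original = "".join(f"value_{i} = {i}\n" for i in range(200))
updated = original.replace("value_100 = 100\n", "value_100 = -1\n")

# Character counts are only a rough proxy for token counts.
whole_size = len(whole_edit(updated))
diff_size = len(diff_edit(original, updated))
print(f"whole: {whole_size} chars, diff: {diff_size} chars")
```

For a one-line change, the diff-style reply stays small no matter how long the file is, while the whole-file reply grows with the file.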
Aider is configured to use the best edit format for the popular OpenAI and Anthropic models and the other models recommended on the LLM page. For lesser-known models, aider defaults to the "whole" editing format, since it is the easiest format for an LLM to use.
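The fallback behavior described above can be pictured as a lookup with a default. This is only a sketch of the idea, not aider's actual code: the table `PREFERRED_EDIT_FORMATS`, the function `choose_edit_format`, and the model names are hypothetical placeholders.

```python
# Hypothetical table of models with a known preferred edit format.
# The model names are placeholders, not real configuration.
PREFERRED_EDIT_FORMATS = {
    "example-well-known-model": "diff",
    "example-recommended-model": "diff",
}

# Unknown models fall back to the format that is easiest for an LLM to use.
DEFAULT_EDIT_FORMAT = "whole"


def choose_edit_format(model_name: str) -> str:
    """Return the preferred edit format, or "whole" for unrecognized models."""
    return PREFERRED_EDIT_FORMATS.get(model_name, DEFAULT_EDIT_FORMAT)


print(choose_edit_format("example-well-known-model"))
print(choose_edit_format("some-lesser-known-model"))
```

Defaulting to the most forgiving format trades token efficiency for a higher chance that an unfamiliar model produces edits that can be applied.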
Contributions of benchmark results are welcome! See the benchmark README for information on running aider's code editing benchmarks. Submit results by opening a PR with edits to the benchmark results data files.
<p class="post-date">
By Paul Gauthier,
last updated
<!--[[[cog
import subprocess
import datetime

files = [
    'aider/website/docs/leaderboards/edit.md',
    'aider/website/_data/edit_leaderboard.yml',
]

def get_last_modified_date(file):
    result = subprocess.run(['git', 'log', '-1', '--format=%ct', file], capture_output=True, text=True)
    if result.returncode == 0:
        timestamp = int(result.stdout.strip())
        return datetime.datetime.fromtimestamp(timestamp)
    return datetime.datetime.min

mod_dates = [get_last_modified_date(file) for file in files]
latest_mod_date = max(mod_dates)
cog.out(f"{latest_mod_date.strftime('%B %d, %Y.')}")
]]]-->
April 12, 2025.
<!--[[[end]]]-->
</p>