!!! abstract "Running swe-agent competitively on benchmarks"
    This page contains information on our competitive runs on SWE-bench, as well as tips and tricks for evaluating on large batches.

    * Please make sure you're familiar with [the command line basics](cl_tutorial.md) and the [batch mode](batch_mode.md)
    * The default examples will be executing code in a Docker sandbox, so make sure you have docker installed ([docker troubleshooting](../installation/tips.md)).

!!! hint "Most recent configs"
    You can find all benchmark submission configs here

Examples of configurations for SWE-bench submissions:

* A configuration that uses `claude-3-7-sonnet-20250219`.
* A configuration that runs multiple attempts with `claude-3-7-sonnet-latest`,
  then uses o1 to discriminate between them.
  This is a very expensive configuration.
  If you use it, also make sure to use Claude 3.7 instead of Claude 3.5.

!!! warning "Retry configurations and command line arguments"
    Note that the structure of a configuration with agents that run multiple attempts differs from that of the
    default agent. In particular, supplying options like `--agent.model.name` will cause (potentially confusing)
    error messages. Take a look at the above configuration file to see the structure!

You can find the command with which to run each config at the top of the config file.
To run on multiple workers with Claude, you need multiple API keys in order to have enough cache break points. To do so, set the following environment variable before running:

```bash
# concatenate your keys
export CLAUDE_API_KEY_ROTATION="KEY1:::KEY2:::KEY3"
```

See our notes on Claude for more details.
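
As an illustration of the `:::`-delimited format shown above, the following sketch builds the value from a list of keys and splits it back again. The helpers `join_keys` and `split_keys` are hypothetical names used only for this example; they are not part of swe-agent, and the sketch says nothing about how swe-agent itself assigns keys to workers.

```python
import os

SEPARATOR = ":::"  # delimiter used in the CLAUDE_API_KEY_ROTATION example


def join_keys(keys):
    """Concatenate API keys into one rotation string (hypothetical helper)."""
    cleaned = [key.strip() for key in keys if key.strip()]
    for key in cleaned:
        if SEPARATOR in key:
            raise ValueError("API keys must not contain the separator")
    return SEPARATOR.join(cleaned)


def split_keys(value):
    """Split a rotation string back into individual keys (hypothetical helper)."""
    return [key for key in value.split(SEPARATOR) if key]


if __name__ == "__main__":
    rotation = join_keys(["KEY1", "KEY2", "KEY3"])
    # Setting os.environ only affects this process and its children.
    os.environ["CLAUDE_API_KEY_ROTATION"] = rotation
    print(rotation)                   # KEY1:::KEY2:::KEY3
    print(len(split_keys(rotation)))  # 3
```

Building the string programmatically can help avoid stray whitespace or empty entries when the keys come from separate files or secrets.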
We run our configuration on a machine with 32GB memory and 8 cores. To avoid out-of-memory (OOM) situations, we recommend setting

```bash
--instances.deployment.docker_args=--memory=10g
```

limiting the maximum amount of memory per worker.
In our case, this completely avoided any instances of running OOM.
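
The arithmetic behind such a limit can be sketched as follows. This is a worst-case estimate that assumes every container reaches its memory cap at the same time. The helper `max_workers` and the 2 GB reserved for the host are hypothetical choices for illustration, not values taken from the setup described above.

```python
def max_workers(total_gb, per_worker_gb, reserve_gb=2.0):
    """Worst-case number of workers that fit into memory.

    total_gb:      physical memory of the machine
    per_worker_gb: per-container cap (the value passed to --memory)
    reserve_gb:    memory kept free for the host (assumed value)
    """
    if per_worker_gb <= 0:
        raise ValueError("per_worker_gb must be positive")
    usable = total_gb - reserve_gb
    # Floor division: a partially fitting worker does not count.
    return max(0, int(usable // per_worker_gb))


if __name__ == "__main__":
    print(max_workers(32, 10))  # 3
    print(max_workers(32, 4))   # 7
```

A bound like this is conservative by construction, since it only holds if all workers hit their cap simultaneously.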
However, OOM situations can potentially lock you out of the server, so you might want to use a script like the following as a second layer of defense to kill any process that hogs too much memory (note that this will affect any script, not just swe-agent):

<details>
<summary>Memory sentinel</summary>

```python
--8<-- "docs/usage/memory_sentinel.py"
```

</details>

If swe-agent dies or you frequently abort it, you might have leftover Docker containers: they are cleaned up when swe-agent terminates normally, but can be left behind if it is killed. You can use a sentinel script like the following to clean them up periodically (note that this will affect any long-running container, not just those from swe-agent):

<details>
<summary>Container sentinel</summary>

```bash
--8<-- "docs/usage/containers_sentinel.sh"
```

</details>