docs/sglang_multiturn/sandbox_fusion.rst
Last updated: 06/10/2025.
sandbox-fusion as the code execution system, providing the community with a reimplementation of retools.sglang tool-calling protocol and define tools for sandbox fusion.async-rollout process, ensuring sandbox fusion tools follow asyncIO conventions.code field in the JSON from the model.language parameter is defined... code-block:: python
OpenAIFunctionToolSchema( type="function", function=OpenAIFunctionSchema( name="code_interpreter", description="A tool for executing code.", parameters=OpenAIFunctionParametersSchema( type="object", properties={ "code": OpenAIFunctionPropertySchema( type="string", description="The code to execute.", enum=None, ) }, required=["code"], ), strict=False, ) )
+----------------------------+--------------------------------------------------------------+
| Parameter Name | Description |
+============================+==============================================================+
| num_workers | Number of worker threads/processes per DP to request runner. |
+----------------------------+--------------------------------------------------------------+
| rate_limit | Global limit of concurrent code executions. Default: 10 |
+----------------------------+--------------------------------------------------------------+
| default_timeout | Timeout (in seconds) for each code execution. Default: 30 |
+----------------------------+--------------------------------------------------------------+
| default_language | Default programming language. Default: "python" |
+----------------------------+--------------------------------------------------------------+
| enable_global_rate_limit | Whether to enable global rate limiting. Default: True |
+----------------------------+--------------------------------------------------------------+
| sandbox_fusion_url | URL for the veFaas sandbox execution service |
+----------------------------+--------------------------------------------------------------+
Objective:
Limit the number of inflight requests using a token bucket model.
Ensure ordered submission to code runners to avoid starvation due to backoff.
Design Highlights:
Use Ray Global Actor as a singleton distributed counter at cluster level.
Semaphore used for counting, with acquire and release in separate thread pools to preserve order.
Use Ray’s cloud-pickle to serialize functions for decoupled ExecutionWorker.
.. code-block:: python
@ray.remote(concurrency_groups={"acquire": 1,"release": 10}) class TokenBucketWorker: def init(self, rate_limit: int): self.rate_limit = rate_limit self.current_count = 0 self._semaphore = threading.Semaphore(rate_limit)
@ray.method(concurrency_group="acquire")
def acquire(self):
self._semaphore.acquire()
self.current_count += 1
@ray.method(concurrency_group="release")
def release(self):
self._semaphore.release()
self.current_count -= 1
def get_current_count(self):
return self.current_count
class ExecutionWorker: def init(self, enable_global_rate_limit=True, rate_limit=10): self.rate_limit_worker = self._init_rate_limit(rate_limit) if enable_global_rate_limit else None
def _init_rate_limit(self, rate_limit):
return TokenBucketWorker.options(name="rate-limiter", get_if_exists=True).remote(rate_limit)
def execute(self, fn: Callable[..., T], *fn_args, **fn_kwargs) -> T:
with ExitStack() as stack:
stack.callback(self.rate_limit_worker.release.remote)
ray.get(self.rate_limit_worker.acquire.remote())
try:
return fn(*fn_args, **fn_kwargs)
except Exception as e:
logger.warning(f"Error when executing code: {e}")
def init_execution_pool(num_workers: int, enable_global_rate_limit=True, rate_limit=10, mode: PoolMode=PoolMode.ThreadMode): if mode == PoolMode.ThreadMode: return ray.remote(ExecutionWorker).options(max_concurrency=num_workers).remote( enable_global_rate_limit=enable_global_rate_limit, rate_limit=rate_limit ) else: raise NotImplementedError("Process mode is not implemented yet")
Use instance_id to identify requests across multiple dialogue rounds.
Use execution_pool to implement async invocation.
Cleanup state after rollout completion.
.. code-block:: python
class SandboxFusionTool(BaseTool): def init(self, config: dict, tool_schema: OpenAIFunctionToolSchema): ... self.execution_pool = init_execution_pool(...) ...
async def create(self, instance_id: Optional[str] = None, ...):
...
async def execute(self, instance_id: str, parameters: dict[str, Any], **kwargs) -> Tuple[str, float, dict]:
code = parameters.get("code", "")
timeout = parameters.get("timeout", self.default_timeout)
language = parameters.get("language", self.default_language)
if not isinstance(code, str):
code = str(code)
result = await self.execution_pool.execute.remote(self.execute_code,instance_id,code,timeout,language)
self._instance_dict[instance_id]["reward"].append(result.strip())
return result, result, {}
def execute_code(self,instance_id,code,timeout=30,language="python"):
result_status, metadata = _process_single_case(0, None, None,self.sandbox_fusion_url, code, timeout, language)
# we should always expect this since we don't have correct answer
if metadata["run_status"] == "Finished":
actual_output = metadata["stdout"] if metadata["stdout"] is not None else ""
return actual_output
else:
return "no stdout here"
async def calc_reward(self, instance_id: str, ...):
...
async def release(self, instance_id: str, ...):
...
AsyncRolloutReq is built correctly.max_seq_len.sglang output, validate tool call and result.we provide e2e test scripts in tests/special_e2e folder, named tests/special_e2e/run_gsm8k_fsdp_sgl_multiturn_sf_tool.sh
by setting 'trainer.rollout_data_dir' you can dump the rollout data to local disk. here is an sample taken from the rollout data:
.. code-block:: python
{ "input": "
system\nYou are a math expert. You are given a question and you need to solve it step by step. Reasoning step by step before any tool call. You should use the `calc_gsm8k_reward` tool after step by step solving the question, before generate final answer at least once and refine your answer if necessary. Put your final answer in the format of `#### <answer>`.\n\n# Tools\n\nYou may call one or more functions to assist with the user query.\n\nYou are provided with function signatures within <tools></tools> XML tags:\n<tools>\n{\"type\": \"function\", \"function\": {\"name\": \"code_interpreter\", \"description\": \"A tool for executing code.\", \"parameters\": {\"type\": \"object\", \"properties\": {\"code\": {\"type\": \"string\", \"description\": \"The code to execute.\", \"enum\": null}}, \"required\": [\"code\"]}, \"strict\": false}}\n</tools>\n\nFor each function call, return a json object with function name and arguments within <tool_call></tool_call> XML tags:\n<tool_call>\n{\"name\": <function-name>, \"arguments\": <args-json-object>}\n</tool_call>\n
user\nJohn gets a bonus that's the same percentage every year. Last year he made $100,000 and got a $10,000 bonus. This year he makes $200,000. How much will John make this year when adding both his total pay and bonus together? Let's think step by step and output the final answer after `####`.\n
assistant\n",
"output": "assistant\n<think>\nOkay, let's see. John gets a bonus that's the same percentage every year. Last year, he made $100,000 and got a $10,000 bonus. This year, he makes $200,000. The question is asking how much he'll make this year when adding both his total pay and bonus together.\n\nFirst, I need to figure out the percentage bonus he gets. Since the bonus is the same percentage each year, if I can find that percentage based on last year's details, I can apply it to this year's earnings to find this year's bonus. Then, I'll add that bonus to his total pay this year.\n\nLast year, he earned $100,000 and got a $10,000 bonus. To find the percentage, I can divide the bonus by his total earnings. So, 10,000 divided by 100,000. Let me calculate that.\n\n10,000 / 100,000 equals 0.1, which is 10% as a percentage. So, his bonus is 10% of his pay each year.\n\nThis year, he makes $200,000. So, the bonus this year would be 10% of 200,000. Let me compute that using the code interpreter to ensure accuracy.\n
<tool_call>\n{\"name\": \"code_interpreter\", \"arguments\": {\"code\": \"total_pay_this_year = 200000\\nbonus_percentage = 10 / 100\\nbonus_this_year = total_pay_this_year * bonus_percentage\\ntotal_income_this_year = total_pay_this_year + bonus_this_year\\nprint(total_income_this_year)\", \"executes\": \"True\"}}\n</tool_call>\n
tool\n220000.0\n\n
assistant\n<think>\nSo the code shows that John's total income this year would be $220,000. This is consistent with the 10% bonus on last year's pay of $100,000, which was $10,000. Therefore, this year's bonus of $20,000 added to his total pay of $200,000 results in a total income of $220,000.\n\nI need to verify that there are no other factors at play. The problem states it's a constant percentage bonus each year, so there's no mention of changing percentages or other variables. Therefore, applying a straightforward percentage of 10% on this year's partial salary amount would result in $20,000. Adding that to his total pay gives $220,000. The calculation is correct and aligns with the question's parameters.\n</think>\n\n#### 220000.0",
"score": 0,
"step": 1
}
here is the readable format version:
.. code-block:: python
[system]
You are a math expert. You are given a question and you need to solve it step by step. Reasoning step by step before any tool call. You should use the calc_gsm8k_reward tool after step by step solving the question, before generate final answer at least once and refine your answer if necessary. Put your final answer in the format of #### <answer>.
You may call one or more functions to assist with the user query.
You are provided with function signatures within <tools></tools> XML tags: <tools> {"type": "function", "function": {"name": "code_interpreter", "description": "A tool for executing code.", "parameters": {"type": "object", "properties": {"code": {"type": "string", "description": "The code to execute.", "enum": null}}, "required": ["code"]}, "strict": false}} </tools>
For each function call, return a json object with function name and arguments within <tool_call></tool_call> XML tags: <tool_call> {"name": <function-name>, "arguments": <args-json-object>} </tool_call>
[user]
John gets a bonus that's the same percentage every year. Last year he made $100,000 and got a $10,000 bonus. This year he makes $200,000. How much will John make this year when adding both his total pay and bonus together? Let's think step by step and output the final answer after ####.
[assistant]
<think> Okay, let's see. John gets a bonus that's the same percentage every year. Last year, he made $100,000 and got a $10,000 bonus. This year, he makes $200,000. The question is asking how much he'll make this year when adding both his total pay and bonus together.First, I need to figure out the percentage bonus he gets. Since the bonus is the same percentage each year, if I can find that percentage based on last year's details, I can apply it to this year's earnings to find this year's bonus. Then, I'll add that bonus to his total pay this year.
Last year, he earned $100,000 and got a $10,000 bonus. To find the percentage, I can divide the bonus by his total earnings. So, 10,000 divided by 100,000. Let me calculate that.
10,000 / 100,000 equals 0.1, which is 10% as a percentage. So, his bonus is 10% of his pay each year.
This year, he makes $200,000. So, the bonus this year would be 10% of 200,000. Let me compute that using the code interpreter to ensure accuracy. <tool_call> {"name": "code_interpreter", "arguments": {"code": "total_pay_this_year = 200000\nbonus_percentage = 10 / 100\nbonus_this_year = total_pay_this_year * bonus_percentage\ntotal_income_this_year = total_pay_this_year + bonus_this_year\nprint(total_income_this_year)", "executes": "True"}} </tool_call>
[tool]
220000.0
[assistant]
<think> So the code shows that John's total income this year would be $220,000. This is consistent with the 10% bonus on last year's pay of $100,000, which was $10,000. Therefore, this year's bonus of $20,000 added to his total pay of $200,000 results in a total income of $220,000.I need to verify that there are no other factors at play. The problem states it's a constant percentage bonus each year, so there's no mention of changing percentages or other variables. Therefore, applying a straightforward percentage of 10% on this year's partial salary amount would result in $20,000. Adding that to his total pay gives $220,000. The calculation is correct and aligns with the question's parameters. </think>
You can also use the RolloutViewer TUI tool to view the dumped rollout data:
.. code-block:: bash
python scripts/rollout_viewer.py ${trainer.rollout_data_dir}
.. image:: https://github.com/user-attachments/assets/e34e5157-2880-4a21-afb2-73885d0dfb11 :alt: RolloutViewer screenshot