qwencoder-eval/tool_calling_eval/berkeley-function-call-leaderboard/LOG_GUIDE.md
An inference log is included along with the llm response in the results file to help you analyze and debug the model's performance, and to better understand the model behavior. To enable a more detailed log, use the
--include-input-logflag in the generation command.
The log is structured as a list, representing a conversational interaction between the model, system, and user. There are five types of roles in the log:
user: Represents the user's input or query.
assistant: Represents the model's raw response.
tool: Represents the output of a function execution, if the model makes a valid function call. Each function call results in a separate tool entry.
state_info: Represents the state of the backend API system at the end of each turn. The initial state is also included at the beginning of the log. You can exclude this entry by using the --exclude-state-log flag in the generation command.
inference_input: Snapshot of the fully-transformed input just before it's sent to the model API endpoint. Useful for debugging input integrity and format.
--include-input-log flag is set in the generation command.handler_log: Represents internal logs from the inference pipeline. These entries indicate various stages and events within the pipeline, including:
model_response_decoded field. Following this, any function calls are executed, and the current turn continues.model_response_decoded field. The pipeline then proceeds to the next turn.For single-turn categories, the only log entry available is the inference input (under handler_log role), because there is no interaction with the model or system.
For multi-turn categories, we understand the provided ground truth may seem nonsensical without context. We have provided a utility script to simulate a conversation between the ground truth and the system:
cd berkeley-function-call-leaderboard/bfcl_eval/scripts
python visualize_multi_turn_ground_truth_conversation.py
The generated conversation logs will be saved in berkeley-function-call-leaderboard/bfcl_eval/scripts/ground_truth_conversation.