docs/content/Tools/creating-a-tool.mdx
import { Callout } from 'nextra/components'; import { Steps } from 'nextra/components';
This guide gives developers a step-by-step approach to creating custom tools for DocsGPT. Custom tools extend DocsGPT's capabilities, enabling it to interact with new data sources and services and to perform specialized actions tailored to your needs.
While DocsGPT offers a range of built-in tools and a versatile API Tool, there are many scenarios where a custom Python tool is the best solution:
Before you begin, ensure you have:
application/agents/tools/ directory where custom tools reside.
Custom tools in DocsGPT are Python classes that inherit from a base Tool class and implement specific methods to define their behavior, capabilities, and configuration needs.
The foundation for all custom tools is the abstract base class Tool, located in application/agents/tools/base.py. Your custom tool class must inherit from it.
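As a preview of the methods described in the rest of this guide, here is a minimal sketch of a complete tool. The `Tool` class below is only a stand-in so the snippet runs on its own; it assumes the real base class in application/agents/tools/base.py declares the same three methods. `EchoTool`, the `echo_say` action, and the `prefix` setting are hypothetical names used for illustration.

```python
from abc import ABC, abstractmethod


class Tool(ABC):
    """Stand-in for the base class in application/agents/tools/base.py (assumed shape)."""

    @abstractmethod
    def execute_action(self, action_name: str, **kwargs): ...

    @abstractmethod
    def get_actions_metadata(self): ...

    @abstractmethod
    def get_config_requirements(self): ...


class EchoTool(Tool):
    """Hypothetical tool that returns the text it is given."""

    def __init__(self, config: dict):
        self.config = config
        self.prefix = config.get("prefix", "")  # optional, non-secret setting

    def execute_action(self, action_name: str, **kwargs) -> dict:
        if action_name != "echo_say":
            return {"status_code": 400, "message": f"Unknown action '{action_name}'"}
        text = kwargs.get("text")
        if not text:
            return {"status_code": 400, "message": "Parameter 'text' is missing"}
        return {"status_code": 200, "message": "OK", "data": f"{self.prefix}{text}"}

    def get_actions_metadata(self) -> list:
        return [{
            "name": "echo_say",
            "description": "Return the given text, with an optional prefix.",
            "parameters": {
                "type": "object",
                "properties": {
                    "text": {"type": "string", "description": "Text to echo."},
                },
                "required": ["text"],
                "additionalProperties": False,
            },
        }]

    def get_config_requirements(self) -> dict:
        return {
            "prefix": {
                "type": "string",
                "description": "Text prepended to every reply",
                "secret": False,
            },
        }
```

In a real tool file you would import `Tool` from the base module rather than redefining it.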
Your custom tool class needs to implement the following methods:
__init__(self, config: dict)
Purpose: Initializes the tool from the config dictionary. This dictionary is populated based on the tool's settings, often configured through the DocsGPT UI or environment variables. For example, you would store API keys, base URLs, or database connection strings here.
Example (brave.py):
class BraveSearchTool(Tool):
    def __init__(self, config):
        self.config = config
        self.token = config.get("token", "")  # API Key for Brave Search
        self.base_url = "https://api.search.brave.com/res/v1"
execute_action(self, action_name: str, **kwargs) -> dict
Purpose: This is the workhorse of your tool. The LLM, acting as an agent, calls this method when it decides to use one of the actions your tool provides.
Parameters:
- action_name (str): A string specifying which of the tool's actions to run (e.g., "brave_web_search").
- **kwargs (dict): A dictionary containing the parameters for that specific action. These parameters are defined in the tool's metadata (get_actions_metadata()) and are extracted or inferred by the LLM from the user's query.
Return Value: A dictionary containing the result of the action. It's good practice to include keys like:
- status_code (int): An HTTP-like status code (e.g., 200 for success, 500 for error).
- message (str): A human-readable message describing the outcome.
- data (any): The actual data payload returned by the action (if applicable).
- error (str): An error message if the action failed.
Example (read_webpage.py):
def execute_action(self, action_name: str, **kwargs) -> str:
    if action_name != "read_webpage":
        return f"Error: Unknown action '{action_name}'. This tool only supports 'read_webpage'."
    url = kwargs.get("url")
    if not url:
        return "Error: URL parameter is missing."
    # ... (logic to fetch and parse webpage) ...
    try:
        # ...
        return markdown_content
    except Exception as e:
        return f"Error processing URL {url}: {e}"
A more structured return:
# ... inside execute_action
try:
    # ... logic ...
    return {"status_code": 200, "message": "Webpage read successfully", "data": markdown_content}
except Exception as e:
    return {"status_code": 500, "message": f"Error processing URL {url}", "error": str(e)}
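When a tool exposes several actions, one way to keep execute_action small is to route each action name to a private method and wrap the call in shared error handling. The sketch below illustrates that idea under stated assumptions: `NotesTool`, its action names, and the 400/404 codes are hypothetical, and in DocsGPT the class would inherit from the Tool base class.

```python
class NotesTool:
    """Hypothetical tool; in DocsGPT it would inherit from the Tool base class."""

    def __init__(self, config: dict):
        self.config = config
        self.notes = []
        # Map each advertised action name to the private method that handles it.
        self._actions = {
            "notes_add": self._add,
            "notes_list": self._list,
        }

    def execute_action(self, action_name: str, **kwargs) -> dict:
        handler = self._actions.get(action_name)
        if handler is None:
            return {
                "status_code": 404,
                "message": f"Unknown action '{action_name}'",
                "error": "unknown_action",
            }
        try:
            data = handler(**kwargs)
            return {"status_code": 200, "message": "Action completed", "data": data}
        except TypeError as e:
            # Typically raised when a required argument is missing or an
            # unexpected one is passed to the handler.
            return {"status_code": 400, "message": "Invalid parameters", "error": str(e)}
        except Exception as e:
            return {
                "status_code": 500,
                "message": f"Action '{action_name}' failed",
                "error": str(e),
            }

    def _add(self, text: str) -> int:
        """Store a note and return the number of notes held."""
        self.notes.append(text)
        return len(self.notes)

    def _list(self) -> list:
        """Return a copy of all stored notes."""
        return list(self.notes)


tool = NotesTool({})
print(tool.execute_action("notes_add", text="buy milk"))
print(tool.execute_action("notes_list"))
```

Every branch returns the same dictionary shape, so the caller never has to guess whether it received a string or a structured result.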
get_actions_metadata(self) -> list
Purpose: This method is critical for the LLM to understand what your tool can do, when to use it, and what parameters it needs. It effectively advertises your tool's capabilities.
Return Value: A list of dictionaries. Each dictionary describes one distinct action the tool can perform and must follow a specific JSON schema structure.
- name (str): A unique and descriptive name for the action (e.g., mytool_get_user_details). It's a common convention to prefix with the tool name to avoid collisions.
- description (str): A clear, concise, and unambiguous description of what the action does. Write this for the LLM. The LLM uses this description to decide if this action is appropriate for a given user query.
- parameters (dict): A JSON Schema object defining the parameters that the action expects. This schema tells the LLM what arguments are needed, their types, and which are required.
  - type: Should always be "object".
  - properties: A dictionary where each key is a parameter name, and the value is an object defining its type (e.g., "string", "integer", "boolean") and description.
  - required: A list of strings, where each string is the name of a parameter that is mandatory for the action.
Example (postgres.py - partial):
def get_actions_metadata(self):
    return [
        {
            "name": "postgres_execute_sql",
            "description": "Execute an SQL query against the PostgreSQL database...",
            "parameters": {
                "type": "object",
                "properties": {
                    "sql_query": {
                        "type": "string",
                        "description": "The SQL query to execute.",
                    },
                },
                "required": ["sql_query"],
                "additionalProperties": False,  # Good practice to prevent unexpected params
            },
        },
        # ... other actions like postgres_get_schema
    ]
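Because the metadata already lists required and allowed parameters, a tool can reuse it to check the arguments the LLM supplies before doing any work. The helper below is a minimal sketch of that idea; `check_arguments` is a hypothetical function, not part of DocsGPT, and it inspects only `required` and `additionalProperties`, not parameter types.

```python
def check_arguments(action_metadata: dict, kwargs: dict) -> list:
    """Return a list of problems found when comparing kwargs to the action's schema."""
    schema = action_metadata["parameters"]
    properties = schema.get("properties", {})
    problems = []
    # Every name listed under "required" must be present.
    for name in schema.get("required", []):
        if name not in kwargs:
            problems.append(f"missing required parameter '{name}'")
    # With additionalProperties set to False, unknown names are rejected.
    if schema.get("additionalProperties") is False:
        for name in kwargs:
            if name not in properties:
                problems.append(f"unexpected parameter '{name}'")
    return problems


# Metadata entry copied from the postgres.py example above.
metadata = {
    "name": "postgres_execute_sql",
    "description": "Execute an SQL query against the PostgreSQL database...",
    "parameters": {
        "type": "object",
        "properties": {
            "sql_query": {"type": "string", "description": "The SQL query to execute."},
        },
        "required": ["sql_query"],
        "additionalProperties": False,
    },
}

print(check_arguments(metadata, {"sql_query": "SELECT 1"}))
print(check_arguments(metadata, {"limit": 5}))
```

A tool could call such a helper at the top of execute_action and return the problems in its error result.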
get_config_requirements(self) -> dict
Purpose: Defines the configuration parameters that your tool needs to function (e.g., API keys, specific base URLs, connection strings, default settings). This information can be used by the DocsGPT UI to dynamically render configuration fields for your tool or for validation.
Return Value: A dictionary where keys are the configuration item names (which will be keys in the config dict passed to __init__) and values are dictionaries describing each requirement:
- type (str): The expected data type of the config value (e.g., "string", "boolean", "integer").
- description (str): A human-readable description of what this configuration item is for.
- secret (bool, optional): Set to True if the value is sensitive (e.g., an API key) and should be masked or handled specially in UIs. Defaults to False.
Example (brave.py):
def get_config_requirements(self):
    return {
        "token": {  # This 'token' will be a key in the config dict for __init__
            "type": "string",
            "description": "Brave Search API key for authentication",
            "secret": True
        },
    }
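To illustrate how the requirements dictionary can drive validation and masking, here is a small sketch. Both helpers are hypothetical and only show one way a caller might consume the output of get_config_requirements(); how DocsGPT itself uses this data is not specified here.

```python
def missing_config(requirements: dict, config: dict) -> list:
    """Return the names of declared config items that are absent or empty."""
    return [key for key in requirements if config.get(key) in (None, "")]


def masked_config(requirements: dict, config: dict) -> dict:
    """Return a copy of config that is safe to log, with secret values hidden."""
    masked = {}
    for key, value in config.items():
        is_secret = requirements.get(key, {}).get("secret", False)
        masked[key] = "****" if is_secret and value else value
    return masked


# Requirements copied from the brave.py example above.
requirements = {
    "token": {
        "type": "string",
        "description": "Brave Search API key for authentication",
        "secret": True,
    },
}

print(missing_config(requirements, {}))
print(masked_config(requirements, {"token": "my-api-key"}))
```

Masking before logging keeps values marked secret out of log files and error messages.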
DocsGPT's ToolManager (located in application/agents/tools/tool_manager.py) automatically discovers and loads tools.
As long as your custom tool:
- Is placed in the application/agents/tools/ directory (and the filename is not base.py and does not start with __).
- Inherits from the Tool base class.
- Implements all the required methods (execute_action, get_actions_metadata, get_config_requirements).
The ToolManager should be able to load it when DocsGPT starts.
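The conditions above can be pictured with a short sketch of how a discovery step might filter files and pick out tool classes. This is not the ToolManager implementation; `is_candidate_file`, `find_tool_classes`, and the in-memory demo module are assumptions made so the example runs without a DocsGPT checkout.

```python
import inspect
import types


def is_candidate_file(filename: str) -> bool:
    """Apply the filename rules: skip base.py and names starting with '__'."""
    return (
        filename.endswith(".py")
        and filename != "base.py"
        and not filename.startswith("__")
    )


def find_tool_classes(module: types.ModuleType, base: type) -> list:
    """Return classes in `module` that subclass `base`, excluding `base` itself."""
    return [
        obj
        for _, obj in inspect.getmembers(module, inspect.isclass)
        if issubclass(obj, base) and obj is not base
    ]


class Tool:
    """Stand-in for the real base class."""


class DemoTool(Tool):
    """Hypothetical tool used only for this demonstration."""


# Build an in-memory module instead of importing a file from disk.
demo = types.ModuleType("demo_tool")
demo.Tool = Tool
demo.DemoTool = DemoTool

print(is_candidate_file("brave.py"), is_candidate_file("__init__.py"))
print(find_tool_classes(demo, Tool))
```

The base class is excluded explicitly because a tool module normally imports it, which makes it visible among the module's members.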
- The config dictionary passed to your tool's __init__ method is typically populated from settings defined in the DocsGPT UI (if available for the tool) or from environment variables/configuration files that DocsGPT loads (see ⚙️ App Configuration). The keys in this dictionary should match the names you define in get_config_requirements().
- Declare sensitive values as configuration requirements (with secret: True in get_config_requirements()) and let DocsGPT's configuration system inject them via the config dictionary at runtime. This ensures that secrets are managed securely and are not exposed in your codebase.
- Make sure the descriptions in get_actions_metadata() are extremely clear, specific, and unambiguous. This is the primary way the LLM understands your tool.
- Handle errors in your execute_action logic (and the private methods it calls). Return informative error messages in the result dictionary so the LLM or user can understand what went wrong.
If you develop a custom tool that you believe could be valuable to the broader DocsGPT community and is general-purpose:
By following this guide, you can create powerful custom tools that extend DocsGPT's capabilities to your specific operational environment.