docs/en/tools/file-document/docxsearchtool.mdx
DOCXSearchTool

The DOCXSearchTool is a RAG tool designed for semantic searching within DOCX documents.
It enables users to effectively search and extract relevant information from DOCX files using query-based searches.
This tool is invaluable for data analysis, information management, and research tasks,
streamlining the process of finding specific information within large document collections.
Install the crewai_tools package, together with docx2txt for DOCX text extraction, by running the following command in your terminal:
uv pip install docx2txt 'crewai[tools]'
The following example shows two ways to initialize the DOCXSearchTool: without a file, so that it can search any DOCX file supplied later, or with a specific DOCX file path.
from crewai_tools import DOCXSearchTool
# Initialize the tool to search within any DOCX file's content
tool = DOCXSearchTool()
# OR
# Initialize the tool with a specific DOCX file,
# so the agent can only search the content of the specified DOCX file
tool = DOCXSearchTool(docx='path/to/your/document.docx')
The following parameters can be used to customize the DOCXSearchTool's behavior:
| Argument | Type | Description |
|---|---|---|
| docx | string | Optional. Path to the DOCX file to search. If omitted at initialization, the path of any DOCX file can be supplied later, when the tool is run. |
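
Because `docx` is only a path string, it can help to check the file before handing it to the tool. The sketch below is not part of crewai_tools: `looks_like_docx` is a hypothetical helper that relies only on the fact that a DOCX file is a ZIP archive containing `word/document.xml`. It uses the Python standard library and does not check that the document content itself is valid.

```python
import zipfile
from pathlib import Path


def looks_like_docx(path: str) -> bool:
    """Return True if `path` is a .docx file whose ZIP archive holds word/document.xml.

    Hypothetical pre-check, not a crewai_tools function.
    """
    candidate = Path(path)
    # Reject missing files and files without the .docx extension.
    if not candidate.is_file() or candidate.suffix.lower() != ".docx":
        return False
    # A DOCX file is a ZIP container; anything else cannot be a DOCX.
    if not zipfile.is_zipfile(str(candidate)):
        return False
    # The main document part lives at word/document.xml inside the archive.
    with zipfile.ZipFile(str(candidate)) as archive:
        return "word/document.xml" in archive.namelist()
```

A caller might run this check first and only construct `DOCXSearchTool(docx=path)` when it returns `True`.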
By default, the tool uses OpenAI for both embeddings and summarization. To customize the embedding model or the vector database, pass a config dictionary as follows:
from chromadb.config import Settings
from crewai_tools import DOCXSearchTool

tool = DOCXSearchTool(
    config={
        # Embedding provider and model used to index and query the document
        "embedding_model": {
            "provider": "openai",
            "config": {
                "model": "text-embedding-3-small",
                # "api_key": "sk-...",
            },
        },
        # Vector database that stores the embeddings
        "vectordb": {
            "provider": "chromadb",  # or "qdrant"
            "config": {
                # "settings": Settings(persist_directory="/content/chroma", allow_reset=True, is_persistent=True),
                # from qdrant_client.models import VectorParams, Distance
                # "vectors_config": VectorParams(size=384, distance=Distance.COSINE),
            }
        },
    }
)
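
When several tools share the same settings, the nested dictionary can be assembled in one place. The following is a minimal sketch, not part of crewai_tools: `build_rag_config` is a hypothetical helper that reproduces the dictionary shape shown above using only the standard library. Restricting the vector database to `"chromadb"` or `"qdrant"` is an assumption of this sketch, based on the two providers named in the example.

```python
from typing import Any, Dict, Optional


def build_rag_config(
    embedding_provider: str = "openai",
    embedding_model: str = "text-embedding-3-small",
    vectordb_provider: str = "chromadb",
    vectordb_config: Optional[Dict[str, Any]] = None,
) -> Dict[str, Any]:
    """Build a config dictionary with the shape used in the example above.

    Hypothetical helper; the provider check is a restriction of this sketch.
    """
    if vectordb_provider not in {"chromadb", "qdrant"}:
        raise ValueError(f"unsupported vectordb provider: {vectordb_provider!r}")
    return {
        "embedding_model": {
            "provider": embedding_provider,
            "config": {"model": embedding_model},
        },
        "vectordb": {
            "provider": vectordb_provider,
            # Copy the caller's dict so later mutation does not leak into the config.
            "config": dict(vectordb_config or {}),
        },
    }
```

The result would then be passed as `DOCXSearchTool(config=build_rag_config())`, with provider-specific options such as the commented `settings` or `vectors_config` entries supplied through `vectordb_config`.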