⚙️ Custom

Here, "custom" means that you supply your own loader and chunker, tailored to your data. Pass them to the `add` method through its `loader` and `chunker` arguments.

```python
from embedchain import App

# `my_module` is a placeholder for the module that defines your classes.
from my_module import CustomLoader
from my_module import CustomChunker

app = App()
loader = CustomLoader()
chunker = CustomChunker()

# "source" is whatever identifier your loader knows how to read.
app.add("source", data_type="custom", loader=loader, chunker=chunker)
```
<Note>
The custom loader and chunker must be classes that inherit from the [`BaseLoader`](https://github.com/embedchain/embedchain/blob/main/embedchain/loaders/base_loader.py) and [`BaseChunker`](https://github.com/embedchain/embedchain/blob/main/embedchain/chunkers/base_chunker.py) classes, respectively.
</Note>

<Note>
If the `data_type` is not a valid data type, the `add` method falls back to the `custom` data type and expects the user to pass a custom loader and chunker.
</Note>
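
As a rough illustration of the inheritance requirement, here is a minimal sketch of a custom loader. The class name `InMemoryNotesLoader` and its `notes` argument are hypothetical. The return shape of `load_data` (a `doc_id` plus a list of `content`/`meta_data` records) is an assumption modelled on the built-in loaders, so check it against the `BaseLoader` source for your installed version. The `try`/`except` fallback exists only so the sketch runs without embedchain installed.

```python
import hashlib

try:
    # Real base class, at the module path linked in the note above.
    from embedchain.loaders.base_loader import BaseLoader
except ImportError:
    # Stand-in (not part of embedchain) so the sketch runs on its own.
    class BaseLoader:
        def load_data(self, url):
            raise NotImplementedError


class InMemoryNotesLoader(BaseLoader):
    """Hypothetical loader that serves text from a dict instead of fetching it."""

    def __init__(self, notes):
        super().__init__()
        self.notes = notes

    def load_data(self, source):
        content = self.notes[source]
        # Assumed return shape: one document id plus a list of records,
        # each carrying the text and its metadata.
        doc_id = hashlib.sha256((source + content).encode("utf-8")).hexdigest()
        return {
            "doc_id": doc_id,
            "data": [{"content": content, "meta_data": {"url": source}}],
        }


loader = InMemoryNotesLoader({"note-1": "Embedchain loads and indexes unstructured data."})
result = loader.load_data("note-1")
print(result["data"][0]["content"])
```

An instance of such a class would then be passed as `loader=` to `app.add`, alongside a chunker that inherits from `BaseChunker`.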

Example:

```python
from embedchain import App
from embedchain.loaders.github import GithubLoader

app = App()

# "ghp_xxx" is a placeholder; substitute your own GitHub token.
loader = GithubLoader(config={"token": "ghp_xxx"})

app.add("repo:embedchain/embedchain type:repo", data_type="github", loader=loader)

app.query("What is Embedchain?")
# Answer: Embedchain is a Data Platform for Large Language Models (LLMs).
# It allows users to seamlessly load, index, retrieve, and sync unstructured
# data in order to build dynamic, LLM-powered applications. There is also a
# JavaScript implementation called embedchain-js available on GitHub.
```