embedchain/docs/components/data-sources/custom.mdx
"Custom" means that you supply your own loader and chunker, tailored to your data. To use them, pass both objects to the `add` method.
from embedchain import App

# `my_module` is a placeholder for the module that defines your own classes.
from my_module import CustomChunker, CustomLoader

app = App()
loader = CustomLoader()
chunker = CustomChunker()

# Pass both objects to `add` together with data_type="custom".
app.add("source", data_type="custom", loader=loader, chunker=chunker)
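To make the two roles concrete, here is a minimal, library-independent sketch of what a loader and a chunker each do. It does not import embedchain, and everything in it is an assumption made for illustration: the method names `load_data` and `chunk`, the shape of the returned dictionary, and the helper `load_and_chunk`. Real custom classes may need to follow the interface of embedchain's own base classes, so treat this as a picture of the data flow rather than a drop-in implementation.

```python
import hashlib
from typing import List


class CustomLoader:
    """Hypothetical loader: turns a source string into raw documents."""

    def load_data(self, source: str) -> dict:
        # A real loader would fetch or read `source`. Here the source string
        # itself is used as the content so the sketch stays self-contained.
        doc_id = hashlib.sha256(source.encode("utf-8")).hexdigest()
        return {
            "doc_id": doc_id,
            "data": [{"content": source, "meta_data": {"url": source}}],
        }


class CustomChunker:
    """Hypothetical chunker: splits text into fixed-size character chunks."""

    def __init__(self, chunk_size: int = 20):
        if chunk_size <= 0:
            raise ValueError("chunk_size must be positive")
        self.chunk_size = chunk_size

    def chunk(self, text: str) -> List[str]:
        step = self.chunk_size
        return [text[i:i + step] for i in range(0, len(text), step)]


def load_and_chunk(loader: CustomLoader, chunker: CustomChunker, source: str):
    """Run the loader, then chunk every document it returned."""
    loaded = loader.load_data(source)
    chunks: List[str] = []
    for doc in loaded["data"]:
        chunks.extend(chunker.chunk(doc["content"]))
    return loaded["doc_id"], chunks
```

The loader only decides how to read a source and the chunker only decides how to split text, which is why the two can be swapped independently.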
Example:
from embedchain import App
from embedchain.loaders.github import GithubLoader
app = App()
# "ghp_xxx" is a placeholder; use your own GitHub personal access token.
loader = GithubLoader(config={"token": "ghp_xxx"})
app.add("repo:embedchain/embedchain type:repo", data_type="github", loader=loader)
app.query("What is Embedchain?")
# Answer: Embedchain is a Data Platform for Large Language Models (LLMs). It allows
# users to seamlessly load, index, retrieve, and sync unstructured data in order to
# build dynamic, LLM-powered applications. There is also a JavaScript implementation
# called embedchain-js available on GitHub.
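The example above passes the token inline in the `config` dictionary. One possible alternative is to build that dictionary from an environment variable so the token stays out of source code. The helper name `github_loader_config` and the variable name `GITHUB_TOKEN` are assumptions chosen for this sketch, not part of embedchain.

```python
import os


def github_loader_config(env_var: str = "GITHUB_TOKEN") -> dict:
    """Build a {"token": ...} config dict from an environment variable."""
    token = os.environ.get(env_var)
    if not token:
        # Fail early with a clear message instead of sending an empty token.
        raise RuntimeError(f"Set the {env_var} environment variable first")
    return {"token": token}
```

The result can then be passed where the inline dictionary was used, for example `GithubLoader(config=github_loader_config())`.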