docs/development/create-a-component.md
A fundamental concept in kotaemon is the "component".
Anything that isn't data or a data structure is a "component". A component can be thought of as a step within a pipeline: it takes some input, processes it, and returns an output, just like a Python function. That output then becomes the input of the next component in the pipeline. In fact, a pipeline is itself a component, or more precisely a nested component: one that uses one or more other components in its processing step. Since there is no real difference between a pipeline and a component, kotaemon refers to both as "component".
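As a rough illustration of that idea in plain Python (not kotaemon's actual classes), the sketch below uses a hypothetical `Step` base class. `Strip`, `Upper`, `Exclaim`, and `Chain` are made-up names; the only point is that a pipeline exposes the same call interface as the steps it contains, so it can be nested inside another pipeline.

```python
class Step:
    """Hypothetical stand-in for a component: input in, output out."""

    def run(self, text: str) -> str:
        raise NotImplementedError

    def __call__(self, text: str) -> str:
        # Calling the object delegates to run(), like calling a function.
        return self.run(text)


class Strip(Step):
    def run(self, text: str) -> str:
        return text.strip()


class Upper(Step):
    def run(self, text: str) -> str:
        return text.upper()


class Exclaim(Step):
    def run(self, text: str) -> str:
        return text + "!"


class Chain(Step):
    """A pipeline: a step whose processing consists of calling other steps."""

    def __init__(self, *steps: Step) -> None:
        self.steps = steps

    def run(self, text: str) -> str:
        for step in self.steps:
            text = step(text)  # each output becomes the next input
        return text


clean = Chain(Strip(), Upper())   # a pipeline built from two steps
shout = Chain(clean, Exclaim())   # a pipeline that reuses another pipeline
result = shout("  hello ")
```

Because `Chain` subclasses `Step`, nothing in `shout` needs to know whether `clean` is a single step or a whole pipeline.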
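To define a component, you will:

1. Create a class that subclasses `kotaemon.base.BaseComponent`.
2. Declare its parameters and nodes (other components) with type annotations.
3. Implement the processing logic in `run`.

The syntax of a component is as follows: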

    from kotaemon.base import BaseComponent
    from kotaemon.llms import LCAzureChatOpenAI
    from kotaemon.parsers import RegexExtractor


    class FancyPipeline(BaseComponent):
        # Plain parameters, declared with type annotations
        param1: str = "This is param1"
        param2: int = 10
        param3: float

        node1: BaseComponent  # this is a node because of the BaseComponent type annotation
        node2: LCAzureChatOpenAI  # this is also a node because LCAzureChatOpenAI subclasses BaseComponent
        node3: RegexExtractor  # this is also a node because RegexExtractor subclasses BaseComponent

        def run(self, some_text: str):
            # Build the prompt, send it to the LLM node, then extract matches
            prompt = (self.param1 + some_text) * int(self.param2 + self.param3)
            llm_pred = self.node2(prompt).text
            matches = self.node3(llm_pred)
            return matches

Then this component can be used as follows:

    llm = LCAzureChatOpenAI(endpoint="some-endpoint")
    extractor = RegexExtractor(pattern=["yes", "Yes"])

    component = FancyPipeline(
        param1="Hello",
        param3=1.5,
        node1=llm,
        node2=llm,
        node3=extractor,
    )
    component("goodbye")

This way, we can define each operation as a reusable component, and use them to compose larger reusable components!
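To experiment with this declaration style without an Azure endpoint, the same shape can be mimicked with the standard library alone. The sketch below is a stand-in under stated assumptions, not how kotaemon is implemented: `Component`, `FakeLLM`, `KeywordExtractor`, `MiniPipeline`, and `nodes_of` are hypothetical names, and the node detection via `typing.get_type_hints` only illustrates how a type annotation can mark an attribute as a node.

```python
from dataclasses import dataclass
from typing import List, get_type_hints


class Component:
    """Hypothetical base class; calling an instance runs it."""

    def run(self, *args, **kwargs):
        raise NotImplementedError

    def __call__(self, *args, **kwargs):
        return self.run(*args, **kwargs)


def nodes_of(cls) -> List[str]:
    """Names of attributes annotated with a Component subclass."""
    return [
        name
        for name, hint in get_type_hints(cls).items()
        if isinstance(hint, type) and issubclass(hint, Component)
    ]


class FakeLLM(Component):
    """Stands in for a real LLM node; returns a canned reply."""

    def run(self, prompt: str) -> str:
        return f"Yes, I saw: {prompt}"


@dataclass
class KeywordExtractor(Component):
    keywords: List[str]  # plain parameter

    def run(self, text: str) -> List[str]:
        return [k for k in self.keywords if k in text]


@dataclass
class MiniPipeline(Component):
    greeting: str                # plain parameter
    llm: FakeLLM                 # node: annotated with a Component subclass
    extractor: KeywordExtractor  # node

    def run(self, text: str) -> List[str]:
        return self.extractor(self.llm(self.greeting + text))


pipeline = MiniPipeline(
    greeting="Hello ",
    llm=FakeLLM(),
    extractor=KeywordExtractor(keywords=["Yes", "goodbye"]),
)
matches = pipeline("goodbye")
node_names = nodes_of(MiniPipeline)
```

Here `node_names` lists `llm` and `extractor` but not `greeting`, mirroring the distinction between parameters and nodes in `FancyPipeline`.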
By defining a component as above, we formally encapsulate all the necessary information inside a single class. This introduces several benefits: