examples/custom_provider_plugin/README.md
This example demonstrates how to create a custom provider plugin that extends LangExtract with your own model backend.
Note: This is an example included in the LangExtract repository for reference. It is not part of the LangExtract package and won't be installed when you pip install langextract.
Automated Creation: Instead of manually copying this example, use the provider plugin generator script:
python scripts/create_provider_plugin.py MyProvider --with-schema
This will create a complete plugin structure with all boilerplate code ready for customization.
custom_provider_plugin/
├── pyproject.toml # Package configuration and metadata
├── README.md # This file
├── langextract_provider_example/ # Package directory
│ ├── __init__.py # Package initialization
│ ├── provider.py # Custom provider implementation
│ └── schema.py # Custom schema implementation (optional)
└── test_example_provider.py # Test script
In provider.py:
from langextract.core import base_model
from langextract.providers import router

@router.register(
    r'^gemini',  # Pattern for model IDs this provider handles
)
class CustomGeminiProvider(base_model.BaseLanguageModel):
    def __init__(self, model_id: str, **kwargs):
        # Initialize your backend client
        ...

    def infer(self, batch_prompts, **kwargs):
        # Call your backend API and return results
        ...
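To make the pattern-based registration above more concrete, here is a minimal, self-contained sketch of how a regex registry can map model IDs to provider classes. It is a simplified stand-in, not LangExtract's actual router: `register`, `matching_providers`, `_ENTRIES`, and `BuiltInGemini` are hypothetical names used only for illustration.

```python
import re

# Hypothetical stand-in registry: a list of (compiled pattern, provider class).
_ENTRIES = []


def register(*patterns):
    """Class decorator that records which model-ID patterns a provider handles."""
    def decorator(cls):
        for pattern in patterns:
            _ENTRIES.append((re.compile(pattern), cls))
        return cls
    return decorator


def matching_providers(model_id):
    """Return every registered provider class whose pattern matches model_id."""
    return [cls for rx, cls in _ENTRIES if rx.search(model_id)]


@register(r'^gemini')
class BuiltInGemini:  # Stands in for a default provider using the same pattern
    pass


@register(r'^gemini')
class CustomGeminiProvider:
    pass


names = [cls.__name__ for cls in matching_providers("gemini-2.5-flash")]
print(names)  # ['BuiltInGemini', 'CustomGeminiProvider']
```

In this sketch, two classes match the same model ID. That kind of overlap is why a provider that reuses an existing pattern has to be selected by name rather than by pattern alone.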
In pyproject.toml:
[project.entry-points."langextract.providers"]
custom_gemini = "langextract_provider_example:CustomGeminiProvider"
This entry point allows LangExtract to automatically discover your provider.
In schema.py, providers can optionally implement custom schemas for structured output:
Flow: Examples → from_examples() → to_provider_config() → Provider kwargs → Inference
from langextract.core import schema as core_schema

class CustomProviderSchema(core_schema.BaseSchema):
    def __init__(self, schema_dict):
        self._schema_dict = schema_dict

    @classmethod
    def from_examples(cls, examples_data, attribute_suffix="_attributes"):
        # Analyze examples to find patterns
        # Build schema based on extraction classes and attributes seen
        schema_dict = {}  # Placeholder: fill in from examples_data
        return cls(schema_dict)

    def to_provider_config(self):
        # Convert schema to provider kwargs
        return {
            "response_schema": self._schema_dict,
            "enable_structured_output": True,
        }

    @property
    def requires_raw_output(self):
        # True = provider emits raw JSON, no markdown fences needed
        return True
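As an illustration of the "analyze examples, then build a schema" step, the following sketch derives a JSON-Schema-like dictionary from a few examples. The `Extraction` and `Example` dataclasses are hypothetical stand-ins for LangExtract's example data, and the output layout (an `extractions` array with one `<class>` and one `<class>_attributes` property per extraction class) is an assumption chosen for the sketch, not a format required by the library.

```python
from dataclasses import dataclass, field


@dataclass
class Extraction:  # Hypothetical stand-in for an example extraction
    extraction_class: str
    extraction_text: str
    attributes: dict = field(default_factory=dict)


@dataclass
class Example:  # Hypothetical stand-in for one few-shot example
    text: str
    extractions: list


def build_schema_dict(examples, attribute_suffix="_attributes"):
    """Collect extraction classes and attribute names, then emit a schema dict."""
    class_attrs = {}
    for example in examples:
        for ext in example.extractions:
            # Updating a set with a dict adds the dict's keys (attribute names).
            class_attrs.setdefault(ext.extraction_class, set()).update(ext.attributes)

    item_properties = {}
    for cls_name, attrs in sorted(class_attrs.items()):
        item_properties[cls_name] = {"type": "string"}
        item_properties[cls_name + attribute_suffix] = {
            "type": "object",
            "properties": {a: {"type": "string"} for a in sorted(attrs)},
        }
    return {
        "type": "object",
        "properties": {
            "extractions": {
                "type": "array",
                "items": {"type": "object", "properties": item_properties},
            }
        },
        "required": ["extractions"],
    }


examples = [
    Example(
        text="Ada met Bob in Paris.",
        extractions=[
            Extraction("person", "Ada", {"role": "host"}),
            Extraction("place", "Paris"),
        ],
    )
]
schema_dict = build_schema_dict(examples)
item_props = schema_dict["properties"]["extractions"]["items"]["properties"]
print(sorted(item_props))
# ['person', 'person_attributes', 'place', 'place_attributes']
```

A real `from_examples()` would wrap a dictionary like this in the schema class, for instance with `return cls(schema_dict)`.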
Then in your provider:
class CustomProvider(base_model.BaseLanguageModel):
    @classmethod
    def get_schema_class(cls):
        return CustomProviderSchema  # Tell LangExtract about your schema

    def __init__(self, **kwargs):
        # Receive schema config in kwargs when use_schema_constraints=True
        self.response_schema = kwargs.get('response_schema')

    def infer(self, batch_prompts, **kwargs):
        config = {}  # Request options for your backend
        # Use schema during API calls
        if self.response_schema:
            config['response_schema'] = self.response_schema
        ...  # Call your backend with config and return results
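The hand-off from `to_provider_config()` to the provider's keyword arguments can be traced end to end with stand-in classes. This is a minimal sketch that runs without LangExtract installed. `SketchSchema`, `SketchProvider`, `build_request_config`, and the `temperature` option are hypothetical and only mirror the shape of the snippets above.

```python
class SketchSchema:
    """Stand-in schema: holds a schema dict and exposes it as provider kwargs."""

    def __init__(self, schema_dict):
        self._schema_dict = schema_dict

    def to_provider_config(self):
        return {
            "response_schema": self._schema_dict,
            "enable_structured_output": True,
        }


class SketchProvider:
    """Stand-in provider: picks the schema out of its constructor kwargs."""

    def __init__(self, model_id, **kwargs):
        self.model_id = model_id
        self.response_schema = kwargs.get("response_schema")
        self.structured = kwargs.get("enable_structured_output", False)

    def build_request_config(self, **kwargs):
        # Start from per-call options, then attach the schema if one was given.
        config = {"temperature": kwargs.get("temperature", 0.0)}
        if self.response_schema:
            config["response_schema"] = self.response_schema
        return config


schema = SketchSchema({"type": "object"})
provider = SketchProvider("my-model", **schema.to_provider_config())
request_config = provider.build_request_config()
print(request_config)
# {'temperature': 0.0, 'response_schema': {'type': 'object'}}
```

Without a schema, `SketchProvider("my-model").build_request_config()` leaves out the `response_schema` key, which matches the `if self.response_schema:` guard in the provider snippet.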
# Navigate to this example directory first
cd examples/custom_provider_plugin
# Install in development mode
pip install -e .
# Test the provider (must be run from this directory)
python test_example_provider.py
Since this example registers the same pattern as the default Gemini provider, you must explicitly specify it:
import langextract as lx

# Option A: build a model explicitly and pass it to extract()
config = lx.factory.ModelConfig(
    model_id="gemini-2.5-flash",
    provider="CustomGeminiProvider",
    provider_kwargs={"api_key": "your-api-key"},
)
model = lx.factory.create_model(config)
result = lx.extract(
    text_or_documents="Your text here",
    model=model,
    prompt_description="Extract key information",
    examples=[...],
)

# Option B: let extract() build the model from a ModelConfig
result = lx.extract(
    text_or_documents="Your text here",
    config=lx.factory.ModelConfig(
        model_id="gemini-2.5-flash",
        provider="CustomGeminiProvider",
        provider_kwargs={"api_key": "your-api-key"},
    ),
    prompt_description="Extract key information",
    examples=[...],
)
# Copy this example directory
cp -r examples/custom_provider_plugin/ ~/langextract-myprovider/
# Rename the package directory
cd ~/langextract-myprovider/
mv langextract_provider_example langextract_myprovider
Edit pyproject.toml:
- Set name = "langextract-myprovider"
- Set the entry point to myprovider = "langextract_myprovider:MyProvider"
Edit provider.py:
- Rename CustomGeminiProvider to MyProvider
- Update the @router.register(...) patterns to match your model IDs
Edit schema.py:
- Rename the schema class to MyProviderSchema
- Customize from_examples() for your extraction format
- Update to_provider_config() for your API requirements
- Set requires_raw_output (abstract in BaseSchema) based on whether your provider emits raw JSON/YAML or fenced output
# Install in development mode
pip install -e .
# Test your provider
python -c "
from langextract.providers import load_plugins_once, router
load_plugins_once()
print('Provider registered:', any('myprovider' in str(e) for e in router.list_entries()))
"
# Build package
python -m build
# Upload to PyPI
twine upload dist/*
Share with the community:
- load_plugins_once() in tests
- pyproject.toml
- apply_schema(None) is called (see provider.py for implementation)

Apache License 2.0