Custom Provider Plugin Example

`examples/custom_provider_plugin/README.md`

This example demonstrates how to create a custom provider plugin that extends LangExtract with your own model backend.

Note: This is an example included in the LangExtract repository for reference. It is not part of the LangExtract package and won't be installed when you `pip install langextract`.

Automated Creation: Instead of manually copying this example, use the provider plugin generator script:

```bash
python scripts/create_provider_plugin.py MyProvider --with-schema
```

This will create a complete plugin structure with all boilerplate code ready for customization.

Structure

```
custom_provider_plugin/
├── pyproject.toml                   # Package configuration and metadata
├── README.md                        # This file
├── langextract_provider_example/    # Package directory
│   ├── __init__.py                  # Package initialization
│   ├── provider.py                  # Custom provider implementation
│   └── schema.py                    # Custom schema implementation (optional)
└── test_example_provider.py         # Test script
```

Key Components

Provider Implementation (provider.py)

```python
from langextract.core import base_model
from langextract.providers import router

@router.register(
    r'^gemini',  # Pattern for model IDs this provider handles
)
class CustomGeminiProvider(base_model.BaseLanguageModel):
    def __init__(self, model_id: str, **kwargs):
        # Initialize your backend client here.
        ...

    def infer(self, batch_prompts, **kwargs):
        # Call your backend API and return results.
        ...
```

Package Configuration (pyproject.toml)

```toml
[project.entry-points."langextract.providers"]
custom_gemini = "langextract_provider_example:CustomGeminiProvider"
```

This entry point allows LangExtract to automatically discover your provider.

Custom Schema Support (schema.py)

Providers can optionally implement custom schemas for structured output:

Flow: Examples → `from_examples()` → `to_provider_config()` → Provider kwargs → Inference

```python
from langextract.core import schema as core_schema

class CustomProviderSchema(core_schema.BaseSchema):
    @classmethod
    def from_examples(cls, examples_data, attribute_suffix="_attributes"):
        # Analyze examples to find patterns and build a schema from the
        # extraction classes and attributes seen.
        schema_dict = {}  # built from examples_data
        return cls(schema_dict)

    def to_provider_config(self):
        # Convert the schema to provider kwargs.
        return {
            "response_schema": self._schema_dict,
            "enable_structured_output": True,
        }

    @property
    def requires_raw_output(self):
        # True = provider emits raw JSON; no markdown fences needed.
        return True
```

Then in your provider:

```python
class CustomProvider(base_model.BaseLanguageModel):
    @classmethod
    def get_schema_class(cls):
        return CustomProviderSchema  # Tell LangExtract about your schema

    def __init__(self, **kwargs):
        # Schema config arrives in kwargs when use_schema_constraints=True.
        self.response_schema = kwargs.get('response_schema')

    def infer(self, batch_prompts, **kwargs):
        # Apply the schema during API calls.
        config = {}
        if self.response_schema:
            config['response_schema'] = self.response_schema
```
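The flow above can be exercised end to end without LangExtract installed. The sketch below uses a standalone toy class (`ToySchema` and the plain-dict examples are hypothetical stand-ins, not LangExtract types) purely to illustrate the `from_examples()` → `to_provider_config()` → provider-kwargs pipeline:

```python
class ToySchema:
    """Toy stand-in for a provider schema; not a LangExtract class."""

    def __init__(self, schema_dict):
        self._schema_dict = schema_dict

    @classmethod
    def from_examples(cls, examples_data, attribute_suffix="_attributes"):
        # Collect the extraction classes seen across the examples.
        classes = sorted({e["extraction_class"] for e in examples_data})
        return cls({"classes": classes})

    def to_provider_config(self):
        # Convert the schema to provider kwargs.
        return {
            "response_schema": self._schema_dict,
            "enable_structured_output": True,
        }

examples = [
    {"extraction_class": "person", "extraction_text": "Alice"},
    {"extraction_class": "city", "extraction_text": "Paris"},
]
kwargs = ToySchema.from_examples(examples).to_provider_config()
print(kwargs["response_schema"]["classes"])  # -> ['city', 'person']
```

A real schema would also fold in attributes and per-class structure, but the shape of the hand-off to the provider is the same.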

Installation

```bash
# Navigate to this example directory first
cd examples/custom_provider_plugin

# Install in development mode
pip install -e .

# Test the provider (must be run from this directory)
python test_example_provider.py
```

Usage

Since this example registers the same pattern as the default Gemini provider, you must explicitly specify it:

```python
import langextract as lx

# Option A: build a model explicitly and pass it to extract()
config = lx.factory.ModelConfig(
    model_id="gemini-2.5-flash",
    provider="CustomGeminiProvider",
    provider_kwargs={"api_key": "your-api-key"},
)
model = lx.factory.create_model(config)

result = lx.extract(
    text_or_documents="Your text here",
    model=model,
    prompt_description="Extract key information",
    examples=[...],
)

# Option B: let extract() build the model from a ModelConfig
result = lx.extract(
    text_or_documents="Your text here",
    config=lx.factory.ModelConfig(
        model_id="gemini-2.5-flash",
        provider="CustomGeminiProvider",
        provider_kwargs={"api_key": "your-api-key"},
    ),
    prompt_description="Extract key information",
    examples=[...],
)
```

Creating Your Own Provider - Step by Step

1. Copy and Rename

```bash
# Copy this example directory
cp -r examples/custom_provider_plugin/ ~/langextract-myprovider/

# Rename the package directory
cd ~/langextract-myprovider/
mv langextract_provider_example langextract_myprovider
```

2. Update Package Configuration

Edit pyproject.toml:

  • Change name = "langextract-myprovider"
  • Update description and author information
  • Change entry point: myprovider = "langextract_myprovider:MyProvider"

3. Modify Provider Implementation

Edit provider.py:

  • Change class name from CustomGeminiProvider to MyProvider
  • Update @router.register(...) patterns to match your model IDs
  • Replace Gemini API calls with your backend
  • Add any provider-specific parameters
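Registration patterns are ordinary regular expressions matched against model IDs, so they can be sanity-checked in isolation before wiring them into `@router.register(...)`. A quick sketch (the `myprovider` prefix is a placeholder for your own naming scheme):

```python
import re

# Placeholder pattern: match model IDs that start with "myprovider".
pattern = re.compile(r"^myprovider")

print(bool(pattern.match("myprovider-small")))   # True
print(bool(pattern.match("gemini-2.5-flash")))   # False
```

Checking the negative case against the built-in providers' model IDs is a cheap way to catch the pattern conflicts mentioned in the pitfalls below.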

4. Add Schema Support (Optional)

Edit schema.py:

  • Rename to MyProviderSchema
  • Customize from_examples() for your extraction format
  • Update to_provider_config() for your API requirements
  • Implement requires_raw_output (abstract in BaseSchema) based on whether your provider emits raw JSON/YAML or fenced output

5. Install and Test

```bash
# Install in development mode
pip install -e .

# Test your provider
python -c "
from langextract.providers import load_plugins_once, router
load_plugins_once()
print('Provider registered:', any('myprovider' in str(e) for e in router.list_entries()))
"
```

6. Write Tests

  • Test that your provider loads and handles basic inference
  • Verify schema support works (if implemented)
  • Test error handling for your specific API

7. Publish to PyPI and Share with Community

```bash
# Build package
python -m build

# Upload to PyPI
twine upload dist/*
```

Once published, share your provider with the community so others can install it with pip and use it alongside LangExtract.

Common Pitfalls to Avoid

  1. Forgetting to trigger plugin loading - Plugins load lazily; call load_plugins_once() in tests
  2. Pattern conflicts - Avoid registration patterns that clash with built-in providers
  3. Missing dependencies - List all requirements in pyproject.toml
  4. Schema mismatches - Test schema generation with real examples
  5. Not handling None schema - The provider must clear its schema state when apply_schema(None) is called (see provider.py for the implementation)

License

Apache License 2.0