Custom Provider Plugin Example

`examples/custom_provider_plugin/README.md`

This example demonstrates how to create a custom provider plugin that extends LangExtract with your own model backend.

Note: This is an example included in the LangExtract repository for reference. It is not part of the LangExtract package and won't be installed when you `pip install langextract`.

Automated Creation: Instead of manually copying this example, use the provider plugin generator script:

```bash
python scripts/create_provider_plugin.py MyProvider --with-schema
```

This will create a complete plugin structure with all boilerplate code ready for customization.

Structure

```
custom_provider_plugin/
├── pyproject.toml                   # Package configuration and metadata
├── README.md                        # This file
├── langextract_provider_example/    # Package directory
│   ├── __init__.py                  # Package initialization
│   ├── provider.py                  # Custom provider implementation
│   └── schema.py                    # Custom schema implementation (optional)
└── test_example_provider.py         # Test script
```

Key Components

Provider Implementation (provider.py)

```python
from langextract.core import base_model
from langextract.providers import router

@router.register(
    r'^gemini',  # Pattern for model IDs this provider handles
)
class CustomGeminiProvider(base_model.BaseLanguageModel):
    def __init__(self, model_id: str, **kwargs):
        # Initialize your backend client here.
        ...

    def infer(self, batch_prompts, **kwargs):
        # Call your backend API and return results.
        ...
```

Package Configuration (pyproject.toml)

```toml
[project.entry-points."langextract.providers"]
custom_gemini = "langextract_provider_example:CustomGeminiProvider"
```

This entry point allows LangExtract to automatically discover your provider.

Custom Schema Support (schema.py)

Providers can optionally implement custom schemas for structured output:

Flow: Examples → `from_examples()` → `to_provider_config()` → Provider kwargs → Inference

```python
from langextract.core import schema as core_schema

class CustomProviderSchema(core_schema.BaseSchema):
    @classmethod
    def from_examples(cls, examples_data, attribute_suffix="_attributes"):
        # Analyze examples to find patterns and build a schema from the
        # extraction classes and attributes seen.
        schema_dict = {}  # built from examples_data
        return cls(schema_dict)

    def to_provider_config(self):
        # Convert the schema to provider kwargs.
        return {
            "response_schema": self._schema_dict,
            "enable_structured_output": True,
        }

    @property
    def requires_raw_output(self):
        # True = provider emits raw JSON; no markdown fences needed.
        return True
```

Then in your provider:

```python
class CustomProvider(base_model.BaseLanguageModel):
    @classmethod
    def get_schema_class(cls):
        return CustomProviderSchema  # Tell LangExtract about your schema

    def __init__(self, **kwargs):
        # Schema config arrives in kwargs when use_schema_constraints=True.
        self.response_schema = kwargs.get('response_schema')

    def infer(self, batch_prompts, **kwargs):
        # Apply the schema during API calls.
        config = {}
        if self.response_schema:
            config['response_schema'] = self.response_schema
```
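The flow above can be exercised end to end without LangExtract installed. The sketch below uses a standalone toy class (`ToySchema` and the plain-dict examples are hypothetical stand-ins, not LangExtract types) purely to illustrate the `from_examples()` → `to_provider_config()` → provider-kwargs pipeline:

```python
class ToySchema:
    """Toy stand-in for a provider schema; not a LangExtract class."""

    def __init__(self, schema_dict):
        self._schema_dict = schema_dict

    @classmethod
    def from_examples(cls, examples_data, attribute_suffix="_attributes"):
        # Collect the extraction classes seen across the examples.
        classes = sorted({e["extraction_class"] for e in examples_data})
        return cls({"classes": classes})

    def to_provider_config(self):
        # Convert the schema to provider kwargs.
        return {
            "response_schema": self._schema_dict,
            "enable_structured_output": True,
        }

examples = [
    {"extraction_class": "person", "extraction_text": "Alice"},
    {"extraction_class": "city", "extraction_text": "Paris"},
]
kwargs = ToySchema.from_examples(examples).to_provider_config()
print(kwargs["response_schema"]["classes"])  # -> ['city', 'person']
```

A real schema would also fold in attributes and per-class structure, but the shape of the hand-off to the provider is the same.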

Installation

```bash
# Navigate to this example directory first
cd examples/custom_provider_plugin

# Install in development mode
pip install -e .

# Test the provider (must be run from this directory)
python test_example_provider.py
```

Usage

Since this example registers the same pattern as the default Gemini provider, you must explicitly specify it:

```python
import langextract as lx

# Option A: build a model explicitly and pass it to extract()
config = lx.factory.ModelConfig(
    model_id="gemini-2.5-flash",
    provider="CustomGeminiProvider",
    provider_kwargs={"api_key": "your-api-key"},
)
model = lx.factory.create_model(config)

result = lx.extract(
    text_or_documents="Your text here",
    model=model,
    prompt_description="Extract key information",
    examples=[...],
)

# Option B: let extract() build the model from a ModelConfig
result = lx.extract(
    text_or_documents="Your text here",
    config=lx.factory.ModelConfig(
        model_id="gemini-2.5-flash",
        provider="CustomGeminiProvider",
        provider_kwargs={"api_key": "your-api-key"},
    ),
    prompt_description="Extract key information",
    examples=[...],
)
```

Creating Your Own Provider - Step by Step

1. Copy and Rename

```bash
# Copy this example directory
cp -r examples/custom_provider_plugin/ ~/langextract-myprovider/

# Rename the package directory
cd ~/langextract-myprovider/
mv langextract_provider_example langextract_myprovider
```

2. Update Package Configuration

Edit pyproject.toml:

  • Change name = "langextract-myprovider"
  • Update description and author information
  • Change entry point: myprovider = "langextract_myprovider:MyProvider"

3. Modify Provider Implementation

Edit provider.py:

  • Change class name from CustomGeminiProvider to MyProvider
  • Update @router.register(...) patterns to match your model IDs
  • Replace Gemini API calls with your backend
  • Add any provider-specific parameters
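Registration patterns are ordinary regular expressions matched against model IDs, so they can be sanity-checked in isolation before wiring them into `@router.register(...)`. A quick sketch (the `myprovider` prefix is a placeholder for your own naming scheme):

```python
import re

# Placeholder pattern: match model IDs that start with "myprovider".
pattern = re.compile(r"^myprovider")

print(bool(pattern.match("myprovider-small")))   # True
print(bool(pattern.match("gemini-2.5-flash")))   # False
```

Checking the negative case against the built-in providers' model IDs is a cheap way to catch the pattern conflicts mentioned in the pitfalls below.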

4. Add Schema Support (Optional)

Edit schema.py:

  • Rename to MyProviderSchema
  • Customize from_examples() for your extraction format
  • Update to_provider_config() for your API requirements
  • Implement requires_raw_output (abstract in BaseSchema) based on whether your provider emits raw JSON/YAML or fenced output

5. Install and Test

```bash
# Install in development mode
pip install -e .

# Test your provider
python -c "
from langextract.providers import load_plugins_once, router
load_plugins_once()
print('Provider registered:', any('myprovider' in str(e) for e in router.list_entries()))
"
```

6. Write Tests

  • Test that your provider loads and handles basic inference
  • Verify schema support works (if implemented)
  • Test error handling for your specific API

7. Publish to PyPI and Share with Community

```bash
# Build package
python -m build

# Upload to PyPI
twine upload dist/*
```

Once published, share your provider with the community so others can install it with pip and use it alongside LangExtract.

Common Pitfalls to Avoid

  1. Forgetting to trigger plugin loading - Plugins load lazily; call load_plugins_once() in tests
  2. Pattern conflicts - Avoid registration patterns that clash with built-in providers
  3. Missing dependencies - List all requirements in pyproject.toml
  4. Schema mismatches - Test schema generation with real examples
  5. Not handling None schema - The provider must clear its schema state when apply_schema(None) is called (see provider.py for the implementation)

License

Apache License 2.0