Python SDK Testing Patterns

.agents/skills/python-sdk/testing.md

Test Naming Convention

python
# Pattern: test_WHAT__CASE_DESCRIPTION__EXPECTED_RESULT
def test_tracked_function__error_inside_inner_function__caught_in_top_level_span():
    pass

# Happy path: test_WHAT__happyflow
def test_optimization_lifecycle__happyflow():
    pass
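
A name check like the one below could catch drift from this convention in review or in a lint step. It is a minimal sketch, not part of the SDK or its test suite: the helper name `follows_naming_convention` and the regex are assumptions, and it assumes lowercase snake_case segments separated by double underscores.

```python
import re

# Hypothetical helper (not an Opik API). One segment is lowercase snake_case
# with single underscores only, so "__" always acts as a segment separator.
_SEGMENT = r"[a-z0-9]+(?:_[a-z0-9]+)*"

# test_WHAT__happyflow                          -> one "__" separator
# test_WHAT__CASE_DESCRIPTION__EXPECTED_RESULT  -> two "__" separators
TEST_NAME_PATTERN = re.compile(rf"test_{_SEGMENT}(?:__{_SEGMENT}){{1,2}}")


def follows_naming_convention(test_name: str) -> bool:
    """Return True if the name has two or three '__'-separated segments."""
    return TEST_NAME_PATTERN.fullmatch(test_name) is not None
```

A name without any double-underscore separator, such as `test_something`, is rejected, which makes a missing case description or expected result visible.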

Using fake_backend (Integration Tests)

For testing integrations that create traces/spans:

python
from tests.testlib import TraceModel, SpanModel, ANY_BUT_NONE, assert_equal
from opik.decorator import tracker

def test_track__one_nested_function__happyflow(fake_backend):
    @tracker.track
    def f_inner(x):
        return "inner-output"

    @tracker.track
    def f_outer(x):
        f_inner("inner-input")
        return "outer-output"

    f_outer("outer-input")
    tracker.flush_tracker()

    EXPECTED_TRACE_TREE = TraceModel(
        id=ANY_BUT_NONE,
        name="f_outer",
        input={"x": "outer-input"},
        output={"output": "outer-output"},
        start_time=ANY_BUT_NONE,
        end_time=ANY_BUT_NONE,
        spans=[
            SpanModel(
                id=ANY_BUT_NONE,
                name="f_outer",
                input={"x": "outer-input"},
                output={"output": "outer-output"},
                spans=[
                    SpanModel(
                        id=ANY_BUT_NONE,
                        name="f_inner",
                        input={"x": "inner-input"},
                        output={"output": "inner-output"},
                        spans=[],
                    )
                ],
            )
        ],
    )

    assert len(fake_backend.trace_trees) == 1
    assert_equal(EXPECTED_TRACE_TREE, fake_backend.trace_trees[0])

Using Verifiers (E2E Tests)

For tests that make real API calls against a running backend:

python
import opik
from tests.e2e import verifiers

def test_trace_creation__e2e__happyflow(opik_client: opik.Opik):
    trace = opik_client.trace(
        name="test-trace",
        input={"query": "test"},
        output={"result": "success"}
    )

    verifiers.verify_trace(
        opik_client=opik_client,
        trace_id=trace.id,
        name="test-trace",
        input={"query": "test"},
        output={"result": "success"},
    )

Testlib Utilities

python
from tests.testlib import ANY_BUT_NONE, ANY_STRING, assert_equal

# ANY_BUT_NONE - matches any value that is not None
# ANY_STRING - matches any string
# assert_equal - deep comparison with ANY_* support
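
To make the matcher idea concrete, here is a minimal sketch of how wildcard values like these can be built on Python's `__eq__` protocol. It is an illustration only and not the actual `tests.testlib` implementation: the class names and the `SKETCH_*` constants are hypothetical, and the real `assert_equal` may compare models differently.

```python
from typing import Any


class _AnyButNone:
    """Hypothetical matcher: compares equal to every value except None."""

    def __eq__(self, other: Any) -> bool:
        return other is not None

    def __repr__(self) -> str:
        return "<ANY_BUT_NONE>"


class _AnyString:
    """Hypothetical matcher: compares equal to any str instance."""

    def __eq__(self, other: Any) -> bool:
        return isinstance(other, str)

    def __repr__(self) -> str:
        return "<ANY_STRING>"


SKETCH_ANY_BUT_NONE = _AnyButNone()
SKETCH_ANY_STRING = _AnyString()

# Built-in containers compare element by element with ==, so the matchers
# also work when nested inside dicts and lists.
expected = {"id": SKETCH_ANY_BUT_NONE, "name": SKETCH_ANY_STRING, "spans": []}
actual = {"id": "some-generated-id", "name": "f_outer", "spans": []}
assert expected == actual
```

Because the comparison is delegated to the matcher, the expected structure can pin down the fields that matter (names, inputs, outputs) while ignoring generated values such as ids and timestamps.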

Parameterized Tests

python
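import pytest
from opik.evaluation.metrics import Sentiment  # assumed import path for the metric under test

@pytest.mark.parametrize(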
    "text,expected_sentiment",
    [
        ("I love this product!", "positive"),
        ("This is terrible.", "negative"),
        ("The sky is blue.", "neutral"),
    ],
)
def test_sentiment_classification(text, expected_sentiment):
    metric = Sentiment()
    result = metric.score(text)
    assert expected_sentiment in result.reason

Key Rules

  • Always test public API only
  • Use fake_backend for integration tests
  • Use verifiers for E2E tests
  • Study existing similar tests before adding new ones

Running E2E Tests Locally

Pick the backend setup that matches what you're doing. Both options need a few env vars that the CI workflow sets implicitly.

Option A: Docker backend (matches GitHub Actions)

bash
# backend in docker, matches GitHub Actions:
TOGGLE_RUNNERS_ENABLED=true ./opik.sh --backend

# then run the suite:
cd sdks/python
OPIK_URL_OVERRIDE=http://localhost:5173/api/ \
  venv/bin/pytest tests/e2e/ \
    --ignore=tests/e2e/test_guardrails.py \
    -vv --durations=20

Option B — dev-runner (iterating on backend code)

The native Java backend inherits your shell env, so you must export the MinIO credentials and the runners flag before starting it. Otherwise the attachment and runner tests fail because of the environment, not because of a real regression.

bash
export AWS_ACCESS_KEY_ID=THAAIOSFODNN7EXAMPLE
export AWS_SECRET_ACCESS_KEY=LESlrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY
export TOGGLE_RUNNERS_ENABLED=true
./scripts/dev-runner.sh --restart

cd sdks/python
OPIK_URL_OVERRIDE=http://localhost:8080/ \
  venv/bin/pytest tests/e2e/ \
    --ignore=tests/e2e/test_guardrails.py \
    -vv --durations=20

The AWS_* values are MinIO's root user and root password (from deployment/docker-compose/docker-compose.yaml). They are the canonical "EXAMPLE" strings, so no real AWS account is involved. The Java backend's S3 client reuses the standard AWS env-var names when pointed at MinIO via S3_URL.

Why the gotchas exist

  • TOGGLE_RUNNERS_ENABLED defaults to false in docker-compose. Without it, 8 tests in tests/e2e/runner/ error out on setup because the runners feature is disabled in the backend.
  • --ignore=tests/e2e/test_guardrails.py — the guardrails Python service isn't part of the default compose stack. CI ignores these explicitly (python_sdk_e2e_tests.yml:90).
  • MinIO creds — only relevant in Option B (dev-runner). Option A's backend container already has them baked in.
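
For the dev-runner setup, a small preflight check can turn these gotchas into an explicit error before the backend starts. The sketch below is an assumption-laden illustration, not a script from the repository: the function name `dev_runner_env_problems` is hypothetical, and it only inspects the three variables named in this document.

```python
import os
from typing import List, Mapping, Optional

# Variables this document says the native backend needs from the shell.
REQUIRED_FOR_DEV_RUNNER = (
    "AWS_ACCESS_KEY_ID",
    "AWS_SECRET_ACCESS_KEY",
    "TOGGLE_RUNNERS_ENABLED",
)


def dev_runner_env_problems(env: Optional[Mapping[str, str]] = None) -> List[str]:
    """Hypothetical helper: list env problems that would break e2e tests."""
    env = os.environ if env is None else env
    # Unset and empty values are both treated as missing.
    problems = [
        f"{name} is not set"
        for name in REQUIRED_FOR_DEV_RUNNER
        if not env.get(name)
    ]
    flag = env.get("TOGGLE_RUNNERS_ENABLED")
    if flag and flag.lower() != "true":
        problems.append(f"TOGGLE_RUNNERS_ENABLED is {flag!r}, expected 'true'")
    return problems


if __name__ == "__main__":
    for problem in dev_runner_env_problems():
        print(f"preflight: {problem}")
```

The check only helps when it runs in the same shell that later starts the backend, since that is the environment the Java process inherits.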

Debugging a failing e2e test

bash
# Capture SDK DEBUG logs to a file:
OPIK_FILE_LOGGING_LEVEL=DEBUG OPIK_LOGGING_FILE=/tmp/opik-sdk.log \
  venv/bin/pytest tests/e2e/test_tracing.py::test_name -vv

# Backend-side errors show up in:
docker logs opik-backend-1   # Option A
tail -f /tmp/opik-opik-backend.log   # Option B (dev-runner)