.agents/skills/adk-style/references/testing.md
# Good — describes what the caller observes
def test_empty_queue_returns_none():
def test_retry_stops_after_max_attempts():
def test_missing_key_raises_key_error():
# Bad — describes implementation details
def test_deque_popleft_called():
def test_retry_counter_incremented():
def test_dict_getitem_raises():
The first line of the test docstring describes the expected behavior from the caller's perspective. For complex tests (multi-step or multi-invocation), follow it with a structured breakdown of Setup, Act, and Assert.
# Good — simple test, one-liner is enough
"""Getting from an empty cache returns the default value."""
# Good — complex test with structured breakdown
"""Partial FR re-runs nested Workflow, resolved child completes
while unresolved stays interrupted.

Setup: outer_wf → inner_wf → (child_a, child_b) → join.
  Both children interrupt on first run.
Act:
  - Run 2: resolve only child_a's FR.
  - Run 3: resolve child_b's FR.
Assert:
  - Run 2: child_a produces output, invocation still interrupted.
  - Run 3: child_b produces output, join completes, no interrupts.
"""
# Bad — restates the implementation
"""LRUCache._store.get returns sentinel when key missing."""
"""ThreadPool._accept_tasks flag checked in submit()."""
If a test checks multiple unrelated behaviors, split it. If you can't describe the test in one sentence, it's testing too much.
# Bad — tests capacity AND eviction AND default in one test
def test_cache_behavior():
  assert cache.size == 0
  assert cache.get('x') is None
  cache.put('a', 1)
  assert cache.size == 1
# Good — split into focused tests
def test_new_cache_is_empty():
  """A freshly created cache has no entries."""

def test_cache_evicts_oldest_when_full():
  """Adding to a full cache removes the least recently used entry."""
# Bad — reaches into private attributes
assert pool._workers[0].is_alive
assert parser._state == 'HEADER'
assert isinstance(router._handler, _FastHandler)
# Good — tests through the public interface
assert pool.active_count == 1
assert parser.parse('data') == expected
assert router.route('/api') == handler
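To make the contrast concrete, here is a minimal sketch built around a hypothetical LRU `Cache` class (an assumption for illustration, not an ADK API). The tests touch only `put`, `get`, and `size`, so swapping the internal `OrderedDict` for another structure should not break them.

```python
from collections import OrderedDict


class Cache:
  """Hypothetical LRU cache, defined here only so the tests can run."""

  def __init__(self, capacity=2):
    self._capacity = capacity
    self._entries = OrderedDict()

  @property
  def size(self):
    return len(self._entries)

  def get(self, key, default=None):
    if key not in self._entries:
      return default
    self._entries.move_to_end(key)  # mark as most recently used
    return self._entries[key]

  def put(self, key, value):
    self._entries[key] = value
    self._entries.move_to_end(key)
    if len(self._entries) > self._capacity:
      self._entries.popitem(last=False)  # evict least recently used


def test_new_cache_is_empty():
  """A freshly created cache has no entries."""
  assert Cache().size == 0


def test_cache_evicts_oldest_when_full():
  """Adding to a full cache removes the least recently used entry."""
  cache = Cache(capacity=2)
  cache.put('a', 1)
  cache.put('b', 2)

  cache.put('c', 3)

  assert cache.get('a') is None
  assert cache.get('b') == 2
  assert cache.get('c') == 3
```

Neither test mentions `_entries` or `_capacity`; eviction is observed only through what `get` returns afterwards.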
ADK tests should use real implementations as much as possible instead of mocking.
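As a rough illustration of preferring a real collaborator over a mock, the sketch below wires a component to a small in-memory store. `InMemoryStore` and `Counter` are hypothetical names invented for this example and are not ADK classes.

```python
class InMemoryStore:
  """Hypothetical real dependency: a tiny in-memory key-value store."""

  def __init__(self):
    self._data = {}

  def save(self, key, value):
    self._data[key] = value

  def load(self, key):
    return self._data.get(key)


class Counter:
  """Hypothetical component under test; persists its count in a store."""

  def __init__(self, store):
    self._store = store

  def increment(self, name):
    count = (self._store.load(name) or 0) + 1
    self._store.save(name, count)
    return count


def test_increment_persists_count_across_instances():
  """A count saved by one Counter is visible to another on the same store."""
  store = InMemoryStore()
  Counter(store).increment('visits')

  result = Counter(store).increment('visits')

  assert result == 2
```

A mock-based version would typically assert that `save` was called with particular arguments, which pins the test to the call sequence rather than to the observable result.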
Define the simplest possible setup that triggers the behavior:
# Good — minimal fixture, one purpose
def make_user(role='viewer'):
  return User(name='test', email='[email protected]', role=role)
# Bad — kitchen-sink fixture with unrelated setup
def make_full_test_env():
  db = create_database()
  user = create_user_with_billing()
  setup_notifications()
  ...
When a helper class or fixture is used by only one test, define it inline inside the test function. This keeps the setup visible at the point of use and avoids scrolling to distant module-level definitions. Extract to module level only when 3+ tests share the same helper.
# Good — helper defined inline, right next to the test
@pytest.mark.asyncio
async def test_state_delta_bundled_with_output():
  """State set before yield is flushed onto the output event."""

  class _Node(BaseNode):

    async def _run_impl(self, *, ctx, node_input):
      ctx.state['color'] = 'blue'
      yield 'result'

  ctx, events = _make_ctx()
  await NodeRunner(node=_Node(name='n'), parent_ctx=ctx).run()

  assert events[0].output == 'result'
  assert events[0].actions.state_delta['color'] == 'blue'
# Bad — helper defined 300 lines above, reader must scroll
class _StateThenOutputNode(BaseNode):

  async def _run_impl(self, *, ctx, node_input):
    ctx.state['color'] = 'blue'
    yield 'result'

# ... 300 lines later ...

async def test_state_delta_bundled_with_output():
  node = _StateThenOutputNode(name='n')
  ...
# Good — reads like a specification
assert queue.size == 0
assert config.get('timeout') == 30
assert response.status_code == 404
# Bad — overly defensive, tests framework behavior
assert isinstance(queue, Queue)
assert hasattr(config, 'get')
assert len(response.headers) > 0
Every test has three distinct steps: set up the situation (Arrange), perform the action (Act), and check the outcome (Assert).
Keep steps distinct. Separate with blank lines. In simple tests where each step is a single statement, blank lines can be omitted. In complex tests, use descriptive comments like "Given [situation]", "When [action]", "Then [expectation]" — avoid bare labels that add no information.
# Good — clear visual separation
def test_cache_returns_stored_value():
  cache = Cache()
  cache.put('key', 'value')

  result = cache.get('key')

  assert result == 'value'
# Good — simple test, blank lines omitted
def test_new_cache_is_empty():
  assert Cache().size == 0
# Bad — steps interleaved
def test_cache_behavior():
  cache = Cache()
  cache.put('key', 'value')
  result = cache.get('key')
  assert result == 'value'
  cache.put('key2', 'value2')  # more setup after assert
  assert cache.size == 2
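For a test where each step spans several statements, descriptive Given/When/Then comments might look like the sketch below. The `retry` helper is hypothetical and exists only to give the test something to exercise; in a pytest suite the try/except would normally be written with `pytest.raises`.

```python
def retry(operation, max_attempts=3):
  """Hypothetical helper: call operation until it succeeds or attempts run out."""
  last_error = None
  for _ in range(max_attempts):
    try:
      return operation()
    except ValueError as error:
      last_error = error
  raise last_error


def test_retry_stops_after_max_attempts():
  """Retry gives up and re-raises once max_attempts calls have failed."""
  # Given an operation that always fails and records each call
  calls = []

  def always_fails():
    calls.append(1)
    raise ValueError('boom')

  # When it is retried with a limit of three attempts
  raised = None
  try:
    retry(always_fails, max_attempts=3)
  except ValueError as error:
    raised = error

  # Then the error surfaces after exactly three calls
  assert str(raised) == 'boom'
  assert len(calls) == 3
```

Each comment says what the step establishes, which a bare `# Arrange` or `# Act` label would not.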
"""Tests for <ComponentName>.
Verifies that <component> correctly <high-level behavior>.
"""
# --- Fixtures (minimal, one purpose each) ---

def _make_service():
  ...

# --- Tests (one behavior per test) ---

def test_<behavior_description>():
  """<One sentence: what the system does from the outside.>"""
  # Given a service with default config
  service = _make_service()
  input_data = 'hello'

  # When the operation is performed
  result = service.do_something(input_data)

  # Then the result matches expectations
  assert result == expected