.agents/skills/write-tests/SKILL.md
Tests are not production code. They are documentation — each one is a tiny executable spec that says "this system does X." A reader should grasp the intent in seconds. A failure should point to exactly one broken behavior, not leave you going through a 40-line test body.
Follow these steps in order before writing any test code.
Decide the framework. Testing a function's return value, side effects, or module interactions
→ Vitest (lives under packages/<name>/test/). Testing that a real HTTP request to a running app
produces the correct Sentry envelope → Playwright (lives under
dev-packages/e2e-tests/test-applications/<app>/tests/).
Read 2–3 existing test files in the target test/ directory. Specifically note:
vi.mock style they use (string path or import form)beforeEach (clearAllMocks vs restoreAllMocks)../../src/... vs @sentry/...)describe/it nesting depth and naming stylepatchRoute also need patchAppUse to work correctly?)Match what you find. Consistency within a package matters more than idealized best practice.
Check for existing test utilities before writing helpers from scratch:
packages/core/test/mocks/ — TestClient, getDefaultTestClientOptions, fake transportspackages/core/test/testutils.ts — clearGlobalScope(), version gatingdev-packages/test-utils/ — waitForTransaction, waitForError, waitForSession,
getPlaywrightConfig, mock Sentry server, event proxydev-packages/node-integration-tests/utils/ — createEsmAndCjsTests, assertion helpersIdentify the behaviors that matter most — edge cases, error paths, boundary conditions. Don't aim for quantity; aim for the tests that would catch real regressions.
The goal is not to maximize test count. A large suite of shallow happy-path tests gives a false sense of coverage — they pass on every change, including changes that introduce bugs. A smaller suite that targets edge cases, error paths, and boundary conditions catches far more regressions.
Before writing a test, ask: "If this test didn't exist, what bug could ship?" If you can't answer that concretely, the test probably isn't worth writing. Prioritize:
null inputnext())Don't waste tests on: trivial getters/setters, pure delegation to well-tested libraries, TypeScript type constraints (the compiler already checks those), or re-testing the same behavior that a higher-level test already covers.
Structure every test with the AAA pattern, separated by blank lines. The whitespace makes the phases obvious — no labels or comments needed.
it('skips errors already captured by middleware', () => {
const error = new Error('already captured');
Object.defineProperty(error, '__sentry_captured__', { value: true });
responseHandler(createMockContext(500, error));
expect(mockCaptureException).not.toHaveBeenCalled();
});
Each test makes exactly one behavioral claim. Multiple expect calls are fine when they assert on
different facets of the same outcome. But if you're checking two unrelated behaviors, those are
two tests. No conditional logic, no branching, no try/catch — a test is a straight line.
If someone refactored the internals but the function still returned the correct result, would this test break? If yes, you're testing wiring, not behavior.
// Bad: asserts nothing meaningful
it('handles the request', async () => {
expect(() => handler(mockReq)).not.toThrow();
});
// Good: asserts on the observable outcome
it('sets transaction name from route path', () => {
responseHandler(createMockContext(200));
expect(mockSetTransactionName).toHaveBeenCalledWith('GET /test');
});
Default to exact matching. toMatchObject, expect.objectContaining, and expect.arrayContaining
silently ignore fields that matter. This has caused real bugs to ship in this codebase.
Use toEqual by default. The same applies to toHaveBeenCalledWith — spell out every
argument rather than wrapping in objectContaining. This is the single most common place where
loose assertions creep in:
// Bad: silently ignores any missing or extra properties in the call
expect(startSpan).toHaveBeenCalledWith(expect.objectContaining({ name: 'middleware', op: 'middleware.hono' }));
// Good: exact match on the full argument — if the shape changes, the test catches it
expect(startSpan).toHaveBeenCalledWith({
name: 'middleware',
op: 'middleware.hono',
onlyIfParent: true,
parentSpan: fakeRootSpan,
attributes: { 'sentry.op': 'middleware.hono', 'sentry.origin': 'auto.middleware.hono' },
});
The only valid reasons to use toMatchObject or objectContaining are: (1) the object is
generated by a framework or third-party library and contains fields you don't control (timestamps,
random IDs, internal framework state), or (2) the object has 10+ fields and the test only
cares about 2–3 of them (in which case individual .toBe() checks on those fields are still
preferred). If you wrote the object being asserted, you can spell it out — use toEqual.
When you do fall back, prefer individual .toBe() checks over objectContaining:
expect(event.transaction).toBe('GET /users/:id');
expect(event.contexts?.trace?.op).toBe('http.server');
Every toContain / toContainEqual needs a toHaveLength companion. Without it, the
assertion passes even if the array has unexpected extra items:
// Bad: doesn't notice extra unexpected spans
expect(spanNames).toContain('authMiddleware');
// Good: locks down both content and count
expect(spanNames).toHaveLength(1);
expect(spanNames).toContain('authMiddleware');
Use exported constants, not magic numbers. If the code under test uses named constants like
SPAN_STATUS_OK, reference those same constants in assertions. If the constant's value ever
changes, tests using magic numbers silently pass with wrong expectations.
Names should be concise, descriptive, and read as correct English. Lead with the verb.
| Quality | Example |
|---|---|
| Good | 'captures error when context.error is set' |
| Good | 'does not re-capture errors already captured by wrapMiddlewareWithSpan' |
| Good | 'returns empty array when no items match' |
| Bad | 'should correctly return the formatted price string when given a valid positive number' |
| Bad | 'test error handling' / 'works correctly' |
Drop "should" — it adds words without adding meaning.
// Weak
const url = 'http://test';
// Strong — exercises URL parsing, path handling, query strings
const url = 'https://api.example.com/users/42?include=profile&format=json';
If the valid range is 1–100, test -2, -1, 0, 1, 2, 99, 100, 101, Number.POSITIVE_INFINITY. Bugs cluster at boundaries — off-by-one errors, inclusive/exclusive confusion, type coercion.
'', [], {}, undefined, null0, false, '', NaN — these trip up loose truthiness checksEach edge case gets its own test with a descriptive name.
*.test.ts, mirroring the source path: src/shared/patchRoute.ts →
test/shared/patchRoute.test.ts.../../src/...). But when importing from a different package
(e.g., @sentry/core in a @sentry/node test), use the package name — that's a real dependency, not the code under test./** @vitest-environment jsdom */ at top of file.Prefer spies and stubs over full module mocks. A spy observes behavior without replacing the system under test. A full mock replaces it — and now you're testing your mock, not your code.
const warnSpy = vi.spyOn(SentryCore.debug, 'warn');
sentry(app);
expect(warnSpy).toHaveBeenCalledWith(expect.stringContaining('not initialized'));
When you need vi.mock: If the package's existing tests use string paths
(vi.mock('../../src/utils')), match that style. If you're creating the first test file for a
package, prefer the import form for type safety:
vi.mock(import('../../src/utils'), async importOriginal => {
const actual = await importOriginal();
return { ...actual, helperFn: vi.fn() };
});
Always restore mocks. This repo does not set restoreMocks: true globally — you are
responsible for cleanup. Leaked mocks cause mysterious failures in unrelated tests. Use whatever
cleanup the existing tests in your package use. If creating the first test file, use:
beforeEach(() => {
vi.restoreAllMocks();
});
Use the library's built-in matchers. Never use try/catch in tests.
expect(() => parseConfig(null)).toThrow('config is required');
await expect(asyncOp()).rejects.toThrow(TimeoutError);
For async callbacks where you need to verify an assertion actually ran, use expect.assertions(n).
Use it.each or it.for for data-driven cases — not raw for loops. it.each integrates with
the test runner, gives better output, and each case runs independently.
it.each([
{ input: 0, expected: 'zero' },
{ input: 1, expected: 'one' },
{ input: -1, expected: 'negative' },
])('classifies $input as $expected', ({ input, expected }) => {
expect(classify(input)).toBe(expected);
});
Tests must never depend on execution order or share mutable state. For this codebase, many tests need to reset global Sentry state:
beforeEach(() => {
clearGlobalScope();
getCurrentScope().clear();
getIsolationScope().clear();
});
1-2 levels of describe is usually enough. Deeper nesting makes tests harder to find and read.
describe('patchRoute', () => {
describe('sub-app middleware wrapping', () => {
it('wraps .use() middleware handlers', async () => { ... });
it('does not wrap sole route handlers', async () => { ... });
});
});
Write E2E tests when you need to verify that the SDK correctly instruments a real application.
Unit tests can't catch integration bugs between the SDK and a framework's request lifecycle.
Also use the /e2e skill for running E2E tests.
dev-packages/e2e-tests/test-applications/<app-name>/tests/*.test.ts.APP_NAME) go in tests/constants.ts.playwright.config.ts using getPlaywrightConfig from
@sentry-internal/test-utils.Set up a promise for the expected Sentry event, trigger the action, then await and assert.
test('captures transaction for GET /users/:id', async ({ baseURL }) => {
const transactionPromise = waitForTransaction(APP_NAME, event => {
return event.contexts?.trace?.op === 'http.server' && event.transaction === 'GET /users/:id';
});
const response = await fetch(`${baseURL}/users/42`);
expect(response.status).toBe(200);
const transaction = await transactionPromise;
expect(transaction.contexts?.trace?.op).toBe('http.server');
expect(transaction.transaction).toBe('GET /users/:id');
});
The predicate must be specific enough to match only your event. A vague predicate can match an unrelated event from a parallel test, causing flaky passes or hangs.
Prefer asserting on the exact span count alongside individual field checks:
const spans = transaction.spans || [];
expect(spans).toHaveLength(2);
const middlewareSpan = spans.find(s => s.description === 'middlewareA');
expect(middlewareSpan?.op).toBe('middleware.hono');
expect(middlewareSpan?.origin).toBe('auto.middleware.hono');
expect(middlewareSpan?.status).toBe('ok');
Check both the exception value and the mechanism. The mechanism tells you how the error was captured — that's the SDK's actual responsibility:
const errorEvent = await errorPromise;
expect(errorEvent.exception?.values?.[0]?.value).toBe('connection refused');
const mechanism = errorEvent.exception?.values?.[0]?.mechanism;
expect(mechanism?.handled).toBe(false);
expect(mechanism?.type).toBe('auto.http.hono.context_error');
For Playwright tests (unlike Vitest), for...of loops are the established codebase convention.
Use for...of (not .forEach()) so Playwright's test registration works correctly:
for (const { name, prefix } of SCENARIOS) {
test.describe(name, () => {
test('captures named middleware span', async ({ baseURL }) => {
// ...
});
});
}
APP_NAME must match proxyServerName in start-event-proxy.mjs.await: The waitFor* helpers return a promise. Without await, the test passes
vacuously and the assertion never runs.Before you're done, verify each test against these criteria:
beforeEach)"foo", "test", 123)toThrow / rejects.toThrow onlytoEqual by default; toHaveBeenCalledWith spells out full argumentstoContain, toContainEqual) paired with toHaveLengthSPAN_STATUS_OK) instead of magic numbersvitest run <file> or single Playwright test)