`e2e/` — live end-to-end test harness

True end-to-end coverage for the Slack bridge: send real user messages in a real Slack workspace, sample the bot's reply while it's streaming, take screenshots in the middle of long streams, and verify what landed.

Why this exists. Unit tests (under src/__tests__/) lock in the internal contracts of each module, but they don't catch issues that only surface end-to-end: an open code fence leaking through the rest of the Slack message during streaming, a mrkdwn translation that looks right in tests but renders oddly in Slack's actual client, a Block Kit limit we forgot about, or a Bolt event that doesn't fire under some setting.

The catalog at e2e/cases.ts is the source of truth for what "feature-complete" means.

What's in here

e2e/
├── README.md         this
├── cases.ts          catalog of test cases (technical axes; expand liberally)
├── slack-api.ts      Slack Web API helpers (history, thread replies, sampling)
├── run.ts            harness entrypoint — sends prompts, samples, screenshots
└── results/          per-run output: screenshots + JSON report

Running

bash

# from packages/slack/

# one-time: log into Slack once in the playwright browser profile.
# Subsequent runs reuse that profile.
pnpm exec playwright open --browser=chromium --user-data-dir=./e2e/.chrome-profile \
  https://app.slack.com/client/T05QFA4BW9X/C0B49MEJ1HQ

# then:
pnpm e2e

The runner expects .env to already contain SLACK_BOT_TOKEN, which is used to poll the channel history while the bot streams. The user message itself is sent through the Playwright-driven Slack UI, using Atai's session cookies from the persistent profile.
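As a rough illustration of where the bot token ends up, here is a minimal sketch of how a polling request against Slack's `conversations.replies` Web API method could be assembled. `buildRepliesRequest` and `SlackRequest` are hypothetical names invented for this example; the real helpers in slack-api.ts may look quite different.

```typescript
// Sketch only: buildRepliesRequest is a hypothetical helper, not taken from slack-api.ts.
interface SlackRequest {
  url: string;
  headers: Record<string, string>;
}

function buildRepliesRequest(token: string, channel: string, threadTs: string): SlackRequest {
  // Fail early when the token is missing from the environment.
  if (!token) throw new Error("SLACK_BOT_TOKEN is not set");
  // conversations.replies takes the channel ID and the thread's parent timestamp.
  const query = `channel=${encodeURIComponent(channel)}&ts=${encodeURIComponent(threadTs)}`;
  return {
    url: `https://slack.com/api/conversations.replies?${query}`,
    // Slack Web API methods accept the token as a Bearer header.
    headers: { Authorization: `Bearer ${token}` },
  };
}
```

The result could then be passed to `fetch(req.url, { headers: req.headers })`, with the token read from the environment after .env has been loaded.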

How sampling works

For each case the harness:

1. Sends the prompt via the Slack UI (or /agent slash command).
2. Polls conversations.replies (or .history for DMs / flat replies) every sampleIntervalMs until maxWaitMs elapses.
3. At each sample, records:
   - elapsed time
   - bot's reply text snapshot
   - bracket-balance check (isBalanced(text))
4. At each screenshots[i] offset (ms after send), takes a screenshot of the Slack thread pane via Playwright.
5. After the run, writes results/<timestamp>/report.json and the screenshots.
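The polling steps above could be sketched roughly as follows. This is an illustration under assumptions, not the harness's actual code: `Sample`, `sampleReply`, and the injected `fetchText` callback are hypothetical, and this `isBalanced` is only one plausible reading of the bracket-balance check (every code fence closed, and brackets outside fences properly nested).

```typescript
// Hypothetical sketch: only sampleIntervalMs, maxWaitMs, and isBalanced are
// names from this README; everything else is assumed for illustration.
interface Sample {
  elapsedMs: number;
  text: string;
  balanced: boolean;
}

// Assumed semantics: all ``` fences are closed and (), [], {} outside fences nest.
function isBalanced(text: string): boolean {
  const parts = text.split("```");
  // An even number of parts means an odd number of fence markers: one is still open.
  if (parts.length % 2 === 0) return false;
  const closers: Record<string, string> = { ")": "(", "]": "[", "}": "{" };
  const stack: string[] = [];
  // Even-indexed parts lie outside code fences; only those are checked.
  for (let i = 0; i < parts.length; i += 2) {
    for (const ch of parts[i].split("")) {
      if (ch === "(" || ch === "[" || ch === "{") stack.push(ch);
      else if (ch in closers && stack.pop() !== closers[ch]) return false;
    }
  }
  return stack.length === 0;
}

// Polls fetchText every sampleIntervalMs until maxWaitMs has elapsed.
async function sampleReply(
  fetchText: () => Promise<string>,
  sampleIntervalMs: number,
  maxWaitMs: number,
): Promise<Sample[]> {
  const samples: Sample[] = [];
  const start = Date.now();
  while (Date.now() - start <= maxWaitMs) {
    const elapsedMs = Date.now() - start;
    const text = await fetchText();
    samples.push({ elapsedMs, text, balanced: isBalanced(text) });
    await new Promise<void>((resolve) => setTimeout(resolve, sampleIntervalMs));
  }
  return samples;
}
```

Injecting `fetchText` keeps the loop independent of the Slack client, so the same loop could be exercised against a canned sequence of snapshots. A sample with `balanced: false` in the middle of a stream is expected; one at the end of the run would suggest a fence or bracket that never closed.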

What this catches that unit tests don't

- Open code fences leaking through the rest of the Slack message
- Slack's chat.update rate limits creating visible "jumps"
- mrkdwn rendering differences vs. our translator's expectations
- The bot's actual streaming cadence with the model
- Thread vs DM rendering differences
- Real concurrency from multiple users in the channel
- Mid-stream cancellation / kill behaviour

Adding cases

Edit cases.ts. The bar is low: anything you'd want to see working in Slack belongs in the catalog. Don't be afraid of duplication with unit tests. The unit test proves the code is internally correct, while the E2E run proves Slack actually renders it that way.
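For orientation, a catalog entry might look something like the sketch below. The shape is an assumption: only sampleIntervalMs, maxWaitMs, and screenshots are named in this README, while `E2ECase`, `id`, `prompt`, and `validateCase` are hypothetical and may not match the real cases.ts.

```typescript
// Hypothetical case shape; check cases.ts for the real field names.
interface E2ECase {
  id: string;               // assumed: short identifier used in the report
  prompt: string;           // assumed: the user message sent through the Slack UI
  sampleIntervalMs: number; // how often the bot's reply is polled
  maxWaitMs: number;        // when sampling stops
  screenshots: number[];    // offsets in ms after send at which to capture the thread pane
}

// Hypothetical sanity check that flags timing values that cannot work together.
function validateCase(c: E2ECase): string[] {
  const problems: string[] = [];
  if (c.sampleIntervalMs <= 0) problems.push("sampleIntervalMs must be positive");
  if (c.maxWaitMs < c.sampleIntervalMs) problems.push("maxWaitMs is shorter than one sample interval");
  for (const offset of c.screenshots) {
    if (offset < 0 || offset > c.maxWaitMs) {
      problems.push(`screenshot offset ${offset} falls outside the run`);
    }
  }
  return problems;
}

// Example entry: a long streamed code block, with two mid-stream screenshots.
const longCodeBlock: E2ECase = {
  id: "long-code-block",
  prompt: "Write a 200-line TypeScript file in a single code block.",
  sampleIntervalMs: 500,
  maxWaitMs: 60000,
  screenshots: [5000, 20000],
};
```

A check like this is cheap to run before any browser is launched, which matters because a mistimed screenshot offset would otherwise only show up after a full live run.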

Limitations (current)

Sending the user message still relies on UI automation (no user token), so a one-time sign-in in the persistent profile is required.
Possible future enhancement: a long-lived user OAuth token would let us skip the browser entirely for the send step. Screenshots would still need the browser, but sampling already uses the API alone.
