docs/POSTHOG_BROKEN_LINKS.md
This document describes how to set up dashboards and alerts in PostHog to monitor broken links (404 errors) on the CopilotKit documentation site.
The 404 page (`app/not-found.tsx`) tracks the `broken_link_accessed` event with the following properties:

- `broken_url` (string): The pathname that resulted in a 404 (e.g., `/langgraph/quickstart`)
- `broken_url_full` (string): Full URL including query params (e.g., `https://docs.copilotkit.ai/langgraph/quickstart?theme=dark`)
- `query_params` (string|null): Query string if present
- `referrer_url` (string): Full URL where the user came from (or `"(direct)"` if none)
- `referrer_domain` (string|null): Domain/hostname of the referrer (e.g., `www.copilotkit.ai`, `partner.com`)
- `referrer_path` (string|null): Path of the referrer page (e.g., `/features/generative-ui`)
- `is_internal_referrer` (boolean): Whether the referrer is from copilotkit.ai or localhost
- `user_agent` (string): Browser user agent string
- `is_likely_bot` (boolean): Whether the request appears to be from a bot/crawler
- `timestamp` (string): ISO timestamp of the event
- `viewport_width` (number): Browser viewport width
- `viewport_height` (number): Browser viewport height

Purpose: High-level view of 404 errors and trends
Insights to add:
- Total 404s Over Time
  - Event: `broken_link_accessed`
  - Filter: `is_likely_bot = false`
- Top Broken URLs
  - Event: `broken_link_accessed`
  - Filter: `is_likely_bot = false`
  - Breakdown: `broken_url`
- Internal vs External Referrers
  - Event: `broken_link_accessed`
  - Filter: `is_likely_bot = false`
  - Breakdown: `is_internal_referrer`
- Bot vs Human Traffic
  - Event: `broken_link_accessed`
  - Breakdown: `is_likely_bot`

Purpose: Find and fix broken links on our own pages
Insights to add:
- Pages with Broken Links
  - Event: `broken_link_accessed`
  - Filters: `is_internal_referrer = true`, `is_likely_bot = false`
  - Breakdown: `referrer_path`
- Broken Link Pairs
  - Event: `broken_link_accessed`
  - Filters: `is_internal_referrer = true`, `is_likely_bot = false`
  - Breakdown: `referrer_path` and `broken_url`
- Recent Internal 404s
  - Event: `broken_link_accessed`
  - Filters: `is_internal_referrer = true`, `is_likely_bot = false`

Purpose: Identify broken links from partner sites, blogs, and external sources
Insights to add:
- Top External Referrer Domains
  - Event: `broken_link_accessed`
  - Filters: `is_internal_referrer = false`, `is_likely_bot = false`
  - Breakdown: `referrer_domain`
- Partner Broken Link Pairs
  - Event: `broken_link_accessed`
  - Filters: `is_internal_referrer = false`, `is_likely_bot = false`
  - Breakdown: `referrer_domain`, `referrer_path`, and `broken_url`
- All Referrer Sources
  - Event: `broken_link_accessed`
  - Filter: `is_likely_bot = false`
  - Breakdown: `referrer_domain`

Purpose: Get notified when there's an unusual increase in 404 errors
Configuration:
- `is_likely_bot = false` (exclude bot traffic)

When it fires: A sudden spike often indicates:
Purpose: Detect when a specific URL starts getting many 404s
Configuration:
- `is_likely_bot = false`

When it fires: Indicates that a specific URL is being heavily accessed but doesn't exist
Purpose: Catch broken links on our own documentation pages
Configuration:
- `is_internal_referrer = true`
- `is_likely_bot = false`

When it fires: One of our docs pages has a broken link that needs fixing
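The three alerts above share one idea: count matching events per period and fire when the latest count is unusually high. PostHog evaluates its alerts itself, so the following is only an illustrative sketch of that kind of rule. The function name `isSpike`, the `multiplier` of 3, and the `minCount` floor of 10 are hypothetical values chosen for the example, not settings taken from this document.

```typescript
// Hypothetical rule: the latest bucket must clear an absolute floor AND
// exceed a multiple of the average of the earlier buckets.
function isSpike(counts: number[], multiplier = 3, minCount = 10): boolean {
  // At least one earlier bucket is needed to form a baseline.
  if (counts.length < 2) return false;

  const latest = counts[counts.length - 1];
  const history = counts.slice(0, -1);
  const baseline = history.reduce((sum, n) => sum + n, 0) / history.length;

  return latest >= minCount && latest > baseline * multiplier;
}
```

The absolute floor keeps a quiet site from alerting when 404s go from one to four in an hour, while the multiplier adapts the rule to whatever the normal volume happens to be.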
-- Broken URLs reached from a specific page (here /langgraph/quickstart), last 30 days
SELECT
properties.broken_url,
COUNT(*) as hits
FROM events
WHERE
event = 'broken_link_accessed'
AND properties.referrer_path = '/langgraph/quickstart'
AND properties.is_likely_bot = false
AND timestamp > now() - INTERVAL 30 DAY
GROUP BY properties.broken_url
ORDER BY hits DESC
-- Top 20 broken URLs over the last 7 days, with the number of distinct referrers
SELECT
properties.broken_url,
COUNT(*) as hits,
COUNT(DISTINCT properties.referrer_url) as unique_referrers
FROM events
WHERE
event = 'broken_link_accessed'
AND properties.is_likely_bot = false
AND timestamp > now() - INTERVAL 7 DAY
GROUP BY properties.broken_url
ORDER BY hits DESC
LIMIT 20
-- Broken URLs that were requested with query parameters, last 7 days
SELECT
properties.broken_url_full,
COUNT(*) as hits
FROM events
WHERE
event = 'broken_link_accessed'
AND properties.query_params IS NOT NULL
AND properties.is_likely_bot = false
AND timestamp > now() - INTERVAL 7 DAY
GROUP BY properties.broken_url_full
ORDER BY hits DESC
-- Top 20 external referrer / broken URL pairs over the last 30 days
SELECT
properties.referrer_domain,
properties.referrer_path,
properties.broken_url,
COUNT(*) as hits
FROM events
WHERE
event = 'broken_link_accessed'
AND properties.is_internal_referrer = false
AND properties.is_likely_bot = false
AND timestamp > now() - INTERVAL 30 DAY
GROUP BY
properties.referrer_domain,
properties.referrer_path,
properties.broken_url
ORDER BY hits DESC
LIMIT 20
-- Broken links coming from one partner domain (replace 'partner-site.com'), last 90 days
SELECT
properties.referrer_path,
properties.broken_url,
COUNT(*) as hits
FROM events
WHERE
event = 'broken_link_accessed'
AND properties.referrer_domain = 'partner-site.com'
AND properties.is_likely_bot = false
AND timestamp > now() - INTERVAL 90 DAY
GROUP BY
properties.referrer_path,
properties.broken_url
ORDER BY hits DESC
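These queries can be pasted into PostHog's SQL editor. To run one from a script instead, a request along the following lines may work. This is a sketch that assumes PostHog's project query endpoint (`/api/projects/<project_id>/query/`) with a `HogQLQuery` payload and a personal API key sent as a bearer token; check the current PostHog API reference before relying on it. The helper name `buildHogQLRequest` and all argument values are hypothetical.

```typescript
interface HogQLRequest {
  url: string;
  init: { method: "POST"; headers: Record<string, string>; body: string };
}

// Hypothetical helper: only builds the request so it can be inspected
// or passed to fetch(url, init). It performs no network call itself.
function buildHogQLRequest(
  host: string, // your PostHog instance, e.g. "https://us.posthog.com" (assumption)
  projectId: string, // numeric project ID from the PostHog project settings
  personalApiKey: string, // personal API key; keep it out of source control
  sql: string, // any of the queries above
): HogQLRequest {
  const base = host.replace(/\/+$/, ""); // drop trailing slashes
  return {
    url: `${base}/api/projects/${encodeURIComponent(projectId)}/query/`,
    init: {
      method: "POST",
      headers: {
        "Content-Type": "application/json",
        Authorization: `Bearer ${personalApiKey}`,
      },
      body: JSON.stringify({ query: { kind: "HogQLQuery", query: sql } }),
    },
  };
}
```

Keeping the request construction separate from the network call makes it easy to log or unit test the payload before sending it with `fetch(url, init)`.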
Weekly Review:
After Deployments:
Monthly Cleanup:
npm run check-links
Partner Outreach (as needed):
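The tracking automatically flags common bots by matching the user agent string against these patterns:

- `/bot/i`
- `/crawl/i`
- `/spider/i`
- `/slurp/i`
- `/mediapartners/i`
- `/googlebot/i`
- `/bingbot/i`
- `/facebookexternalhit/i`
- `/twitterbot/i`

Flagged events are still recorded with `is_likely_bot = true`. Use the `is_likely_bot = false` filter in dashboards to focus on human traffic.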
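As a rough illustration of how these patterns could produce the `is_likely_bot` value, here is a minimal sketch. The pattern list is copied from this document; the helper name `isLikelyBot` is hypothetical, and the actual code in `app/not-found.tsx` may be structured differently.

```typescript
// Patterns as listed in this document; all are case-insensitive.
const BOT_PATTERNS: RegExp[] = [
  /bot/i,
  /crawl/i,
  /spider/i,
  /slurp/i,
  /mediapartners/i,
  /googlebot/i,
  /bingbot/i,
  /facebookexternalhit/i,
  /twitterbot/i,
];

// Hypothetical helper: true when any pattern matches the user agent.
function isLikelyBot(userAgent: string): boolean {
  return BOT_PATTERNS.some((pattern) => pattern.test(userAgent));
}
```

Note that `/bot/i` already matches anything the `googlebot`, `bingbot`, and `twitterbot` patterns would match, so those entries mainly serve as documentation of the crawlers the list is meant to cover.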
The tracking does NOT collect:
The data collected (URLs, referrers, user agents) is standard web analytics information used solely for improving documentation quality.
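To make the scope of the collected data concrete, the sketch below derives the event properties documented at the top of this file from nothing more than the page URL, the referrer, the user agent, and the viewport size. It is an assumption-laden illustration, not the code in `app/not-found.tsx`: the names `PageContext`, `isInternalHost`, and `buildBrokenLinkProperties` are hypothetical, "internal" is assumed to mean localhost, copilotkit.ai, or a subdomain of it, and `query_params` is assumed to omit the leading `?`.

```typescript
// Inputs are passed in explicitly so the sketch also runs outside a browser.
interface PageContext {
  href: string; // e.g. window.location.href
  referrer: string; // e.g. document.referrer ("" when there is none)
  userAgent: string; // e.g. navigator.userAgent
  viewportWidth: number;
  viewportHeight: number;
  isLikelyBot: boolean; // result of the user agent check
}

// Assumption: internal means localhost, copilotkit.ai, or a subdomain of it.
function isInternalHost(hostname: string): boolean {
  return (
    hostname === "localhost" ||
    hostname === "copilotkit.ai" ||
    hostname.endsWith(".copilotkit.ai")
  );
}

// Hypothetical helper that maps the raw inputs to the documented properties.
function buildBrokenLinkProperties(ctx: PageContext, now: Date = new Date()) {
  const page = new URL(ctx.href);

  // An empty or unparsable referrer is treated as a direct visit.
  let referrer: URL | null = null;
  try {
    referrer = ctx.referrer ? new URL(ctx.referrer) : null;
  } catch {
    referrer = null;
  }

  return {
    broken_url: page.pathname,
    broken_url_full: page.href,
    // Assumption: the leading "?" is stripped from the query string.
    query_params: page.search ? page.search.slice(1) : null,
    referrer_url: referrer ? ctx.referrer : "(direct)",
    referrer_domain: referrer ? referrer.hostname : null,
    referrer_path: referrer ? referrer.pathname : null,
    is_internal_referrer: referrer ? isInternalHost(referrer.hostname) : false,
    user_agent: ctx.userAgent,
    is_likely_bot: ctx.isLikelyBot,
    timestamp: now.toISOString(),
    viewport_width: ctx.viewportWidth,
    viewport_height: ctx.viewportHeight,
  };
}
```

Every field in the result is computed from the inputs listed in `PageContext`; nothing else about the visitor is read.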