Back to Source Monitor

Phase 1: Backend Fixes — Research

.vbw-planning/milestones/polish-and-reliability/phases/01-backend-fixes/.context-lead.md

0.13.04.6 KB
Original Source

Phase 01 Context (Compiled)

Goal

Not available

Success Criteria

Not available

Requirements (Not available)

No matching requirements found

Active Decisions

None

Codebase Map Available

Codebase mapping exists in .vbw-planning/codebase/. Key files:

  • ARCHITECTURE.md
  • CONCERNS.md
  • PATTERNS.md
  • DEPENDENCIES.md
  • STRUCTURE.md
  • CONVENTIONS.md
  • TESTING.md
  • STACK.md

Read ARCHITECTURE.md, CONCERNS.md, and STRUCTURE.md first to bootstrap codebase understanding.

Research Findings

Phase 1: Backend Fixes — Research

Item 1: HTTP Client Hardening

Key Files

  • lib/source_monitor/http.rbDEFAULT_USER_AGENT, default_headers, client, configure_request
  • lib/source_monitor/configuration/http_settings.rbHTTPSettings with user_agent, headers accessors
  • lib/source_monitor/fetching/feed_fetcher.rbrequest_headers (per-source custom_headers merge)
  • test/lib/source_monitor/http_test.rb — header merge/override tests

Current Behavior

  • DEFAULT_USER_AGENT = "SourceMonitor/#{SourceMonitor::VERSION}" (line 17 of http.rb)
  • default_headers builds: User-Agent, Accept (RSS-only), Accept-Encoding (line 89-97)
  • configure_request merges: default_headers(settings).merge(headers) (line 60) — passed headers override defaults
  • FeedFetcher#request_headers starts from source.custom_headers, adds If-None-Match/If-Modified-Since
  • HTTPSettings#default_user_agent returns same "SourceMonitor/#{VERSION}" string
  • user_agent accessor supports callables (lambda/proc) via resolve_callable

Change Plan

  • Change HTTPSettings#default_user_agent to return polite bot string: "Mozilla/5.0 (compatible; SourceMonitor/#{VERSION})"
  • Add Accept-Language: en-US,en;q=0.9 and DNT: 1 to default_headers
  • Add Referer header in FeedFetcher#request_headers using source.website_url
  • Update default_headers Accept to prepend text/html: "text/html, application/rss+xml, ..."
  • Update tests asserting on DEFAULT_USER_AGENT and Accept values

Item 2: Health Check Status Transition

Key Files

  • app/jobs/source_monitor/source_health_check_job.rbperform, broadcast_outcome
  • lib/source_monitor/health/source_health_check.rbcall, create_log, successful_status?
  • lib/source_monitor/health/source_health_monitor.rbcall, determine_status, improving_streak?
  • lib/source_monitor/health.rbsetup!, register_fetch_callback (after_fetch_completed wiring)
  • app/jobs/source_monitor/fetch_feed_job.rbperform(source_id, force: true) to enqueue full fetch

Current Behavior

  • SourceHealthCheckJob#perform does HTTP GET, creates HealthCheckLog, broadcasts toast — never updates health_status
  • SourceHealthMonitor runs via after_fetch_completed callback — computes rolling success rate from fetch_logs
  • health_status values: healthy, warning, critical, improving, declining, auto_paused (free string, no enum)
  • improving_streak? requires 2+ consecutive successes with prior failure in window
  • FetchFeedJob.perform_later(source.id, force: true) enqueues immediate full fetch bypassing should_run?

Change Plan

  • After successful health check on degraded source, enqueue FetchFeedJob.perform_later(source.id, force: true)
  • Degraded = health_status in %w[declining critical warning]
  • Full fetch creates real fetch_log → triggers SourceHealthMonitor → natural status transition
  • Add check in SourceHealthCheckJob#perform after broadcast_outcome

Item 3: Scrape Rate Limiting

Key Files

  • lib/source_monitor/configuration/scraping_settings.rbDEFAULT_MAX_IN_FLIGHT = 25, reset!
  • lib/source_monitor/scraping/enqueuer.rbrate_limit_exhausted? (line 108-114)
  • lib/source_monitor/scraping/state.rbIN_FLIGHT_STATUSES = %w[pending processing]
  • lib/source_monitor/scraping/bulk_source_scraper.rb — loop with break on rate_limited (line 75-124)
  • lib/source_monitor/scraping/bulk_result_presenter.rb — message builder (line 42-75)
  • test/lib/source_monitor/scraping/bulk_source_scraper_test.rb — rate limit test

Current Behavior

  • DEFAULT_MAX_IN_FLIGHT = 25 — counts pending + processing items
  • When limit hit: enqueuer returns :rate_limited, bulk scraper breaks loop, presenter shows message
  • Setting to nil already disables the limit
  • normalize_numeric ensures only positive integers or nil

Change Plan

  • Change DEFAULT_MAX_IN_FLIGHT = nil (was 25)
  • Config option remains: users can set their own limit
  • Rate limit check, presenter, tests all already handle nil correctly
  • Update tests that depend on default being 25