.Context Dev

.vbw-planning/milestones/polish-and-reliability/phases/06-fetch-throughput-defaults/.context-dev.md

Phase 06 Context

Goal

Not available

Skills Reference

Codebase Map Available

Codebase mapping exists in .vbw-planning/codebase/. Key files:

  • ARCHITECTURE.md
  • CONCERNS.md
  • PATTERNS.md
  • DEPENDENCIES.md
  • STRUCTURE.md
  • CONVENTIONS.md
  • TESTING.md
  • STACK.md

Read CONVENTIONS.md, PATTERNS.md, STRUCTURE.md, and DEPENDENCIES.md first to bootstrap codebase understanding.

Changed Files (Delta)

  • .vbw-planning/config.json
  • .vbw-planning/discovery.json
  • .vbw-planning/ROADMAP.md
  • .vbw-planning/STATE.md
  • CLAUDE.md

Code Slices

.vbw-planning/config.json (46 lines)

{
  "effort": "thorough",
  "autonomy": "standard",
  "auto_commit": true,
  "planning_tracking": "manual",
  "auto_push": "never",
  "verification_tier": "standard",
  "skill_suggestions": true,
  "auto_install_skills": false,
  "discovery_questions": true,
  "context_compiler": true,
  "visual_format": "unicode",
  "max_tasks_per_plan": 5,
  "prefer_teams": "always",
  "branch_per_milestone": false,
  "plain_summary": true,
  "active_profile": "default",
  "custom_profiles": {},
  "model_profile": "quality",
  "model_overrides": {},
  "agent_max_turns": {
    "scout": 15,
    "qa": 25,
    "architect": 30,
    "debugger": 80,
    "lead": 50,
    "dev": 75
  },
  "qa_skip_agents": [
    "docs"
  ],
  "worktree_isolation": "on",
  "token_budgets": false,
  "two_phase_completion": false,
  "metrics": false,
  "smart_routing": false,
  "validation_gates": false,
  "snapshot_resume": false,
  "lease_locks": false,
  "event_recovery": false,
  "monorepo_routing": false,
  "rolling_summary": false,
  "require_phase_discussion": false,
  "auto_uat": false,
  "compaction_trigger": 130000
}

.vbw-planning/discovery.json (128 lines, first 30 shown)

{
  "answered": [
    {
      "question": "What matters most in the conventions cleanup?",
      "answer": "All of the above: Model conventions, Controller patterns, Dead code removal",
      "category": "scope",
      "phase": "4",
      "date": "2026-02-10"
    },
    {
      "question": "How should we handle convention violations that would change public API behavior?",
      "answer": "Fix everything -- rename/restructure even if it changes method signatures or route patterns",
      "category": "api-policy",
      "phase": "4",
      "date": "2026-02-10"
    },
    {
      "question": "Favicon discovery strategy?",
      "answer": "Multi-strategy cascade: /favicon.ico -> HTML parsing (full GET, Nokogiri, prefer largest) -> Google Favicon API. Skip DuckDuckGo.",
      "area": "favicon-discovery",
      "phase": "02",
      "date": "2026-02-20"
    },
    {
      "question": "How to handle downloaded favicons before storage?",
      "answer": "Store raw original via Active Storage, define two variants: 32x32 (standard) and 64x64 (retina). SVGs stored as-is AND rasterized to PNG.",
      "area": "image-processing",
      "phase": "02",
      "date": "2026-02-20"
    },

.vbw-planning/ROADMAP.md (153 lines, first 30 shown)

# Roadmap

## Milestone: polish-and-reliability

### Phases

1. [x] **Backend Fixes** -- Fix browser User-Agent default, health check status transitions, and smarter scrape rate limiting
2. [x] **Favicon Support** -- Automatically save source favicons via Active Storage with background fetch job
3. [x] **Toast Stacking** -- Cap visible toast notifications with click-to-expand for bulk operation UX
4. [x] **Bug Fixes & Polish** -- Fix OPML import warning, toast positioning, dashboard alignment, source deletion, and published column
5. [x] **Source Enhancements** -- Add pagination/filtering for sources, per-source scrape rate limiting, and word count metrics
6. [ ] **Fetch Throughput & Small Server Defaults** -- Fix fetch pipeline error handling, add scheduling jitter/stagger, and optimize defaults for 1-CPU/2GB deployments

### Phase Details

#### Phase 1: Backend Fixes

**Goal:** Fix three independent backend issues: bot-blocked feeds due to User-Agent, health check not updating status, and overly aggressive scrape limiting.

**Requirements:**
- REQ-UA-01: Change default User-Agent from "SourceMonitor/VERSION" to a browser-like string
- REQ-HC-01: After a successful manual health check on a declining/critical/warning source, trigger SourceHealthMonitor re-evaluation or directly transition status to "improving"
- REQ-SL-01: Refine max_in_flight_per_source to only count actively-running scrape jobs (not queued ones)

**Success Criteria:**
- [ ] Default UA string resembles a real browser (e.g., Mozilla/5.0 compatible)
- [ ] Successful manual health check on a declining source transitions it to improving
- [ ] Scrape limit counts only actively-running jobs, queued items don't count toward the cap
- [ ] All existing tests pass, new tests cover changed behavior
- [ ] RuboCop zero offenses, Brakeman zero warnings

.vbw-planning/STATE.md (32 lines)

# State

## Current Position

- **Milestone:** polish-and-reliability
- **Phase:** 6 -- Fetch Throughput & Small Server Defaults
- **Status:** Planned
- **Progress:** 83%
- **Plans:** 4 (0/4 complete)

## Decisions

| Decision | Date | Context |
|----------|------|---------|
| Active Storage for favicons | 2026-02-20 | has_one_attached with guard, consistent with ItemContent pattern |
| Smarter scrape limit | 2026-02-20 | Count only running jobs, not queued; keeps safety but removes false bottleneck |
| Browser-like default UA | 2026-02-20 | Simple global fix for bot-blocked feeds like Uber |
| Health check triggers status update | 2026-02-20 | Successful manual health check should transition declining -> improving |
| Toast cap + hover expand | 2026-02-20 | Max 3 visible, +N more badge, hover to see all |

## Todos

- [x] Fix deprecation: `rails/tasks/statistics.rake` removed from Rakefile (2026-02-21)

## Metrics

- **Started:** 2026-02-20
- **Phases:** 6
- **Tests at start:** 1033

## Blockers
None

CLAUDE.md (245 lines, first 30 shown)

# SourceMonitor

**Core value:** Drop-in Rails engine for feed monitoring, content scraping, and operational dashboards.

## Active Context

**Milestone:** polish-and-reliability (extended)
**Phase:** 6 of 6 -- Fetch Throughput & Small Server Defaults (pending planning)
**Previous phases:** Backend Fixes, Favicon Support, Toast Stacking, Bug Fixes & Polish, Source Enhancements (all complete)
**Next action:** /vbw:vibe to plan and execute Phase 6

## Key Decisions

- Keep PostgreSQL-only for now
- Keep host-app auth model
- Ruby autoload for lib/ modules (not Zeitwerk)
- PG parallel fork segfault resolved: switched to thread-based parallelism in aia-ssl-fix milestone

## Installed Skills

- agent-browser (global)
- flowdeck (global)
- ralph-tui-create-json (global)
- ralph-tui-prd (global)
- vastai (global)
- find-skills (global)

## Learned Patterns

- Sub-module extraction: create `module/submodule.rb` with `require_relative`, lazy accessors, forwarding methods for backward compat

Active Plan


phase: 6
plan: 4
title: Queue Separation -- Maintenance Queue
wave: 1
depends_on: []
must_haves:

  • "Configuration has maintenance_queue_name attr_accessor defaulting to 'source_monitor_maintenance'"
  • "Configuration has maintenance_queue_concurrency attr_accessor defaulting to 1"
  • "queue_name_for(:maintenance) returns the configured maintenance queue name with prefix"
  • "concurrency_for(:maintenance) returns the configured maintenance queue concurrency"
  • "FetchFeedJob and ScheduleFetchesJob remain on :fetch queue"
  • "ScrapeItemJob remains on :scrape queue"
  • "SourceHealthCheckJob, ImportSessionHealthCheckJob, ImportOpmlJob, LogCleanupJob, ItemCleanupJob, FaviconFetchJob, DownloadContentImagesJob all use :maintenance queue"
  • "example solid_queue.yml includes source_monitor_maintenance queue"
  • "all existing job tests pass, new tests verify queue assignments"
  • "RuboCop zero offenses on changed files"

skills_used:

  • sm-job
  • sm-configuration-setting

Objective

Add a third "maintenance" queue for non-fetch jobs so the fetch queue is dedicated to FetchFeedJob + ScheduleFetchesJob only. This prevents slow maintenance operations (cleanup, favicon, images, health check, import) from competing for fetch queue slots. REQ-FT-09, REQ-FT-10.

Context

  • @ lib/source_monitor/configuration.rb -- queue_name_for (lines 60-79) and concurrency_for (lines 81-90) currently support the :fetch and :scrape roles only
  • @ app/jobs/source_monitor/application_job.rb -- source_monitor_queue helper delegates to SourceMonitor.queue_name(role)
  • @ app/jobs/source_monitor/fetch_feed_job.rb -- source_monitor_queue :fetch (stays)
  • @ app/jobs/source_monitor/schedule_fetches_job.rb -- source_monitor_queue :fetch (stays)
  • @ app/jobs/source_monitor/scrape_item_job.rb -- source_monitor_queue :scrape (stays)
  • @ app/jobs/source_monitor/source_health_check_job.rb -- source_monitor_queue :fetch (change to :maintenance)
  • @ app/jobs/source_monitor/import_session_health_check_job.rb -- source_monitor_queue :fetch (change to :maintenance)
  • @ app/jobs/source_monitor/import_opml_job.rb -- source_monitor_queue :fetch (change to :maintenance)
  • @ app/jobs/source_monitor/log_cleanup_job.rb -- source_monitor_queue :fetch (change to :maintenance)
  • @ app/jobs/source_monitor/item_cleanup_job.rb -- source_monitor_queue :fetch (change to :maintenance)
  • @ app/jobs/source_monitor/favicon_fetch_job.rb -- source_monitor_queue :fetch (change to :maintenance)
  • @ app/jobs/source_monitor/download_content_images_job.rb -- source_monitor_queue :fetch (change to :maintenance)
  • @ examples/advanced_host/files/config/solid_queue.yml -- needs maintenance queue entry
  • @ test/lib/source_monitor/configuration_test.rb -- existing configuration tests

Tasks

06-04-T1: Add maintenance queue configuration

Files: lib/source_monitor/configuration.rb

Add maintenance_queue_name to attr_accessor (default: "#{DEFAULT_QUEUE_NAMESPACE}_maintenance"). Add maintenance_queue_concurrency to attr_accessor (default: 1 -- conservative for small servers). Extend queue_name_for to handle :maintenance role. Extend concurrency_for to handle :maintenance role.

Acceptance: SourceMonitor.config.maintenance_queue_name returns "source_monitor_maintenance". SourceMonitor.config.queue_name_for(:maintenance) returns the name with any ActiveJob prefix. SourceMonitor.config.concurrency_for(:maintenance) returns 1.
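
The acceptance criteria above can be illustrated with a minimal, self-contained sketch. It is not the engine's actual Configuration class, which is not shown in this context. `SketchConfiguration`, the `queue_name_prefix` accessor (standing in for the ActiveJob prefix), the "_" delimiter, the fetch and scrape queue names, the scrape concurrency value, and the `ArgumentError` raised for unknown roles are all assumptions made for illustration.

```ruby
# Hypothetical stand-in for SourceMonitor::Configuration; the real class in
# lib/source_monitor/configuration.rb may be structured differently.
class SketchConfiguration
  DEFAULT_QUEUE_NAMESPACE = "source_monitor"

  attr_accessor :fetch_queue_name, :scrape_queue_name, :maintenance_queue_name,
                :fetch_queue_concurrency, :scrape_queue_concurrency,
                :maintenance_queue_concurrency, :queue_name_prefix

  def initialize
    @fetch_queue_name = "#{DEFAULT_QUEUE_NAMESPACE}_fetch"   # assumed default
    @scrape_queue_name = "#{DEFAULT_QUEUE_NAMESPACE}_scrape" # assumed default
    @maintenance_queue_name = "#{DEFAULT_QUEUE_NAMESPACE}_maintenance"
    @fetch_queue_concurrency = 2
    @scrape_queue_concurrency = 2 # assumed; not stated in this context
    @maintenance_queue_concurrency = 1 # conservative for small servers
    @queue_name_prefix = nil # stands in for the ActiveJob queue name prefix
  end

  # Returns the configured queue name for a role, with the prefix if one is set.
  def queue_name_for(role)
    base = case role.to_sym
           when :fetch then fetch_queue_name
           when :scrape then scrape_queue_name
           when :maintenance then maintenance_queue_name
           else raise ArgumentError, "unknown queue role: #{role.inspect}"
           end
    [queue_name_prefix, base].compact.join("_")
  end

  # Returns the configured concurrency for a role.
  def concurrency_for(role)
    case role.to_sym
    when :fetch then fetch_queue_concurrency
    when :scrape then scrape_queue_concurrency
    when :maintenance then maintenance_queue_concurrency
    else raise ArgumentError, "unknown queue role: #{role.inspect}"
    end
  end
end

config = SketchConfiguration.new
puts config.queue_name_for(:maintenance)
puts config.concurrency_for(:maintenance)
```

Handling :maintenance as one more branch of the existing role dispatch keeps the error for truly unknown roles intact, which is what test (5) in 06-04-T4 is meant to guard.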

06-04-T2: Move non-fetch jobs to maintenance queue

Files: app/jobs/source_monitor/source_health_check_job.rb, app/jobs/source_monitor/import_session_health_check_job.rb, app/jobs/source_monitor/import_opml_job.rb, app/jobs/source_monitor/log_cleanup_job.rb, app/jobs/source_monitor/item_cleanup_job.rb, app/jobs/source_monitor/favicon_fetch_job.rb, app/jobs/source_monitor/download_content_images_job.rb

Change source_monitor_queue :fetch to source_monitor_queue :maintenance in all 7 job files. This is a one-line change per file.

Acceptance: grep -r 'source_monitor_queue :fetch' app/jobs/ returns only fetch_feed_job.rb and schedule_fetches_job.rb. All 7 other jobs show source_monitor_queue :maintenance.
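
As a rough illustration of the one-line change, the sketch below imitates the source_monitor_queue class macro with a stand-in base class. `SketchApplicationJob` and the `QUEUE_NAMES` table are hypothetical; the real helper delegates to SourceMonitor.queue_name(role) and inherits from ActiveJob, neither of which is reproduced here. Only three of the ten jobs are shown.

```ruby
# Hypothetical queue names per role; the real names come from configuration.
QUEUE_NAMES = {
  fetch: "source_monitor_fetch",
  scrape: "source_monitor_scrape",
  maintenance: "source_monitor_maintenance"
}.freeze

# Stand-in for SourceMonitor::ApplicationJob (not the engine's real class).
class SketchApplicationJob
  class << self
    attr_reader :queue_role

    # Class macro: records which queue role the job belongs to.
    def source_monitor_queue(role)
      @queue_role = role
    end

    # Resolves the role to a concrete queue name.
    def queue_name
      QUEUE_NAMES.fetch(queue_role)
    end
  end
end

class FetchFeedJob < SketchApplicationJob
  source_monitor_queue :fetch # stays on the fetch queue
end

class ScrapeItemJob < SketchApplicationJob
  source_monitor_queue :scrape # stays on the scrape queue
end

class LogCleanupJob < SketchApplicationJob
  source_monitor_queue :maintenance # was :fetch before this task
end

[FetchFeedJob, ScrapeItemJob, LogCleanupJob].each do |job|
  puts "#{job.name} -> #{job.queue_name}"
end
```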

06-04-T3: Update example Solid Queue config

Files: examples/advanced_host/files/config/solid_queue.yml

Add source_monitor_maintenance queue entry with concurrency: 1 (matching the conservative default). Add a comment explaining the three queue roles.

Acceptance: Example config shows three SourceMonitor queues: fetch, scrape, maintenance.

06-04-T4: Write tests for queue separation

Files: test/lib/source_monitor/configuration_test.rb

Add tests: (1) "maintenance_queue_name defaults to source_monitor_maintenance", (2) "queue_name_for(:maintenance) returns maintenance queue name", (3) "concurrency_for(:maintenance) returns maintenance queue concurrency", (4) "maintenance_queue_name is configurable", (5) "queue_name_for raises for unknown role" (ensure :maintenance doesn't break existing error for truly unknown roles). Also add a test that verifies each job class resolves to the expected queue: fetch jobs → fetch queue, maintenance jobs → maintenance queue, scrape jobs → scrape queue.

Acceptance: All tests pass. bin/rails test test/lib/source_monitor/configuration_test.rb exits 0.

Verification

bash
bin/rails test test/lib/source_monitor/configuration_test.rb
bin/rubocop lib/source_monitor/configuration.rb app/jobs/source_monitor/*.rb examples/advanced_host/files/config/solid_queue.yml

Success Criteria

  • Non-fetch jobs (7 jobs) use maintenance queue
  • Fetch queue dedicated to FetchFeedJob + ScheduleFetchesJob only
  • Scrape queue unchanged (ScrapeItemJob)
  • config.maintenance_queue_name setting exists with default "source_monitor_maintenance"
  • config.maintenance_queue_concurrency defaults to 1 (small server friendly)
  • Example Solid Queue config updated with three SourceMonitor queues
  • All existing tests pass, new tests verify queue assignments
  • RuboCop zero offenses

Research Findings

Research: Fetch Throughput & Small Server Defaults

Source: Debug Investigation (3 parallel debuggers, all HIGH confidence)

Finding 1: Queue Saturation (Debugger H1)

Current state:

  • fetch_queue_concurrency defaults to 2 (lib/source_monitor/configuration.rb:40)
  • DEFAULT_BATCH_SIZE = 100 in Scheduler (lib/source_monitor/scheduler.rb:8)
  • ScheduleFetchesJob runs every minute (test/dummy/config/recurring.yml)
  • Each fetch is I/O-bound: 5-15s per request (15s HTTP timeout, 5s open timeout)
  • All non-scrape job types share the same fetch queue: FetchFeedJob, ScheduleFetchesJob, SourceHealthCheckJob, ImportSessionHealthCheckJob, ImportOpmlJob, FaviconFetchJob, DownloadContentImagesJob, LogCleanupJob, ItemCleanupJob
  • No limits_concurrency from Solid Queue is used; advisory locks per-source only
  • Advisory lock contention causes 30-second retry wait (fetch_feed_job.rb:5,11-13)

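Math: With concurrency=2 and ~2s average fetch time, throughput is 2 × (60s / 2s) = ~60 jobs/min. With up to 100 jobs enqueued per one-minute batch cycle, the backlog grows by up to ~40 jobs per cycle whenever a full batch is due, and faster as fetch times approach the 5-15s range noted above.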
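
The arithmetic can be checked with a small sketch. The helper names `jobs_per_minute` and `backlog_growth_per_minute` are hypothetical, and the model is deliberately simple: it assumes a full batch of 100 is enqueued every minute and ignores lock contention and retries.

```ruby
# Jobs a queue can finish per minute when each worker thread handles one
# fetch at a time.
def jobs_per_minute(concurrency:, avg_fetch_seconds:)
  concurrency * (60.0 / avg_fetch_seconds)
end

# How fast the backlog grows when more jobs are enqueued than completed.
def backlog_growth_per_minute(enqueued_per_minute:, concurrency:, avg_fetch_seconds:)
  capacity = jobs_per_minute(concurrency: concurrency, avg_fetch_seconds: avg_fetch_seconds)
  [enqueued_per_minute - capacity, 0.0].max
end

# Compare the ~2s average with the 5s and 15s figures from the list above.
[2, 5, 15].each do |seconds|
  capacity = jobs_per_minute(concurrency: 2, avg_fetch_seconds: seconds)
  growth = backlog_growth_per_minute(enqueued_per_minute: 100, concurrency: 2,
                                     avg_fetch_seconds: seconds)
  puts format("avg fetch %2ds: capacity %.1f jobs/min, backlog growth %.1f jobs/min",
              seconds, capacity, growth)
end
```

At a 2s average this gives a capacity of 60 jobs/min and a backlog growth of 40 jobs/min; at 15s the capacity drops to 8 jobs/min.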

Finding 2: Thundering Herd (Debugger H2)

Current state:

  • ImportOpmlJob#build_attributes does NOT set next_fetch_at -- all imported sources start as NULL (immediately due)
  • SourcesController#create also has no next_fetch_at initialization
  • Scheduler treats NULL as immediately due (table[:next_fetch_at].eq(nil).or(table[:next_fetch_at].lteq(now)))
  • Fixed-interval path has ZERO jitter: Time.current + fixed_minutes.minutes exactly
  • Adaptive jitter is ±10% (JITTER_PERCENT = 0.1) but insufficient when base times are nearly identical
  • Scheduler enqueues all 100 due sources in a tight loop with no delay between them
  • No queue-aware scheduling: adaptive interval never checks how many other sources are already scheduled

Finding 3: Stale Processing Status (Debugger H3)

Current state:

  • update_source_state! (fetch_runner.rb:83-91) rescues ALL StandardError including DB update failures
  • No ensure block guarantees fetch_status reset from "fetching" to "idle"/"failed"
  • FollowUpHandler#call has no error handling -- exceptions propagate past mark_complete!
  • StalledFetchReconciler recovers after 10 minutes (STALE_QUEUE_TIMEOUT = 10.minutes)
  • User's screenshot showed sources 9-10 minutes overdue -- exactly at the reconciler threshold

Findings

Existing Configuration Hooks (host app can already override)

  • config.fetch_queue_concurrency -- defaults to 2
  • config.fetch_queue_name / config.scrape_queue_name -- queue names
  • ENV["SOURCE_MONITOR_FETCH_CONCURRENCY"] -- env var override in example config
  • config.fetching.adaptive_enabled -- toggle adaptive intervals
  • config.fetching.increase_factor / decrease_factor -- interval tuning
  • config.fetching.min_interval / max_interval -- interval bounds
  • config.http.timeout / config.http.open_timeout -- HTTP timeouts
  • config.http.max_retries -- retry count

Missing Configuration Hooks

  • No batch size configuration (hardcoded DEFAULT_BATCH_SIZE = 100)
  • No stale queue timeout configuration (hardcoded 10.minutes)
  • No jitter percentage configuration (hardcoded JITTER_PERCENT = 0.1)
  • No option to stagger initial fetch times on import

Relevant Patterns

  1. Configuration DSL pattern: All settings use SourceMonitor.configure { |config| ... } -- new knobs should follow the same pattern via settings sub-objects
  2. FetchingSettings (lib/source_monitor/configuration/fetching_settings.rb): Already has adaptive interval knobs; batch size and jitter should live here
  3. SchedulerSettings: Does not exist yet -- scheduler has hardcoded constants
  4. AdvisoryLock pattern: Per-source locking prevents duplicate fetches but doesn't help with throughput
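
A minimal sketch of how the missing knobs might be exposed through the same configure-block pattern follows. The `SketchMonitor` module and the setting names `batch_size`, `jitter_percent`, and `stale_queue_timeout_seconds` are hypothetical; the defaults simply mirror the hardcoded constants listed above (100, 0.1, 10 minutes).

```ruby
# Hypothetical miniature of the configuration DSL; not the engine's real code.
module SketchMonitor
  class FetchingSettings
    attr_accessor :batch_size, :jitter_percent, :stale_queue_timeout_seconds

    def initialize
      @batch_size = 100                  # mirrors DEFAULT_BATCH_SIZE
      @jitter_percent = 0.1              # mirrors JITTER_PERCENT
      @stale_queue_timeout_seconds = 600 # mirrors STALE_QUEUE_TIMEOUT (10 minutes)
    end
  end

  class Configuration
    attr_reader :fetching

    def initialize
      @fetching = FetchingSettings.new
    end
  end

  def self.config
    @config ||= Configuration.new
  end

  # Yields the shared configuration object so a host app can override defaults.
  def self.configure
    yield config
    config
  end
end

# A host app on a small server might override only what it needs.
SketchMonitor.configure do |config|
  config.fetching.batch_size = 25
  config.fetching.stale_queue_timeout_seconds = 300
end

puts SketchMonitor.config.fetching.batch_size
```

Keeping the defaults equal to today's constants would make the new settings opt-in, which fits the backward-compatibility concern for existing host apps.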

Risks

  1. Memory on 1-CPU/2GB: Increasing concurrency too high will exhaust memory. Each Solid Queue worker thread holds a DB connection. Sweet spot is likely 3-5 for this hardware.
  2. Connection pool: More concurrent workers need larger DB connection pool. Default pool is usually 5 -- must coordinate with Solid Queue config.
  3. Backward compatibility: Changing defaults could affect existing host apps. All changes should be opt-in or conservative new defaults.

Recommendations

Priority 1: Fix error handling (correctness)

  • Split rescue in update_source_state! to only catch broadcast errors
  • Add ensure block in FetchRunner#run for status safety net
  • Add rescue in FollowUpHandler#call
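
The three fixes can be sketched together in a stand-alone form. `SketchSource` and `SketchFetchRunner` are hypothetical stand-ins, the fetcher, follow-up, and broadcaster are plain lambdas, and the status values are the "fetching", "idle", and "failed" strings mentioned in Finding 3. The real FetchRunner and FollowUpHandler are not shown in this context and may differ.

```ruby
# Hypothetical stand-in for a source record with a fetch_status column.
class SketchSource
  attr_accessor :fetch_status

  def initialize
    @fetch_status = "idle"
  end
end

# Hypothetical runner illustrating the three error-handling changes.
class SketchFetchRunner
  attr_reader :swallowed_errors

  def initialize(source, fetcher:, follow_up: -> {}, broadcaster: ->(_source) {})
    @source = source
    @fetcher = fetcher
    @follow_up = follow_up
    @broadcaster = broadcaster
    @swallowed_errors = []
  end

  def run
    @source.fetch_status = "fetching"
    @fetcher.call
    update_source_state!("idle")
    run_follow_up
  ensure
    # Safety net: never leave the source stuck in "fetching".
    @source.fetch_status = "failed" if @source.fetch_status == "fetching"
  end

  private

  def update_source_state!(status)
    @source.fetch_status = status # a failure here propagates; it is not rescued
    begin
      @broadcaster.call(@source)
    rescue StandardError => e
      @swallowed_errors << e # only broadcast failures are swallowed
    end
  end

  def run_follow_up
    @follow_up.call
  rescue StandardError => e
    @swallowed_errors << e # follow-up failures no longer abort completion
  end
end

source = SketchSource.new
runner = SketchFetchRunner.new(source, fetcher: -> { raise IOError, "timeout" })
begin
  runner.run
rescue IOError
  puts "fetch failed, status is #{source.fetch_status}"
end
```

With the ensure block in place, a fetch error still propagates to the caller, but the source ends up in a terminal status instead of waiting for the stalled-fetch reconciler.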

Priority 2: Add scheduling jitter/stagger (throughput)

  • Add jitter to fixed-interval path
  • Stagger next_fetch_at during OPML import
  • Make JITTER_PERCENT configurable via FetchingSettings
  • Stagger job enqueuing in Scheduler (spread across the minute window)
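
Two of these ideas, jitter on the fixed-interval path and spreading enqueues across the minute window, can be sketched in plain Ruby. The helper names `jittered_interval_seconds` and `staggered_offsets` are hypothetical, and the even spacing is only one possible stagger strategy; the ±10% figure comes from the existing JITTER_PERCENT constant.

```ruby
JITTER_PERCENT = 0.1 # same value as the existing adaptive jitter

# Returns the base interval shifted by a random amount within ±jitter_percent.
def jittered_interval_seconds(base_seconds, jitter_percent: JITTER_PERCENT, rng: Random.new)
  spread = base_seconds * jitter_percent.to_f
  base_seconds + rng.rand(-spread..spread)
end

# Returns evenly spaced delays (in seconds) so that `count` jobs are spread
# across the window instead of being enqueued in a tight loop.
def staggered_offsets(count, window_seconds: 60)
  return [] if count.zero?

  step = window_seconds.to_f / count
  Array.new(count) { |i| (i * step).round(2) }
end

# A 10-minute fixed interval lands somewhere between 540 and 660 seconds.
puts jittered_interval_seconds(600)

# Four due sources are spread at 15-second steps across one minute.
p staggered_offsets(4)
```

In an Active Job setting, each offset could be passed as the delay for the corresponding enqueue (for example via `set(wait: ...)`); during OPML import the same offsets could seed next_fetch_at so that imported sources do not all become due at once.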

Priority 3: Optimize defaults for small servers (configuration)

  • Keep default fetch_queue_concurrency at 2 (no change) -- it is already appropriate for 1-CPU/2GB
  • Lower DEFAULT_BATCH_SIZE from 100 to 25 and make configurable
  • Lower STALE_QUEUE_TIMEOUT from 10 to 5 minutes
  • Separate utility jobs (cleanup, favicon) from fetch queue

Priority 4: Document scaling guidance

  • Add comments/docs showing how to tune for different server sizes
  • Provide example configurations for small/medium/large deployments