.vbw-planning/milestones/polish-and-reliability/phases/06-fetch-throughput-defaults/.context-dev.md
Not available
Codebase mapping exists in .vbw-planning/codebase/. Key files:
- ARCHITECTURE.md
- CONCERNS.md
- PATTERNS.md
- DEPENDENCIES.md
- STRUCTURE.md
- CONVENTIONS.md
- TESTING.md
- STACK.md

Read CONVENTIONS.md, PATTERNS.md, STRUCTURE.md, and DEPENDENCIES.md first to bootstrap codebase understanding.
- .vbw-planning/config.json
- .vbw-planning/discovery.json
- .vbw-planning/ROADMAP.md
- .vbw-planning/STATE.md
- CLAUDE.md

.vbw-planning/config.json (46 lines)
{
"effort": "thorough",
"autonomy": "standard",
"auto_commit": true,
"planning_tracking": "manual",
"auto_push": "never",
"verification_tier": "standard",
"skill_suggestions": true,
"auto_install_skills": false,
"discovery_questions": true,
"context_compiler": true,
"visual_format": "unicode",
"max_tasks_per_plan": 5,
"prefer_teams": "always",
"branch_per_milestone": false,
"plain_summary": true,
"active_profile": "default",
"custom_profiles": {},
"model_profile": "quality",
"model_overrides": {},
"agent_max_turns": {
"scout": 15,
"qa": 25,
"architect": 30,
"debugger": 80,
"lead": 50,
"dev": 75
},
"qa_skip_agents": [
"docs"
],
"worktree_isolation": "on",
"token_budgets": false,
"two_phase_completion": false,
"metrics": false,
"smart_routing": false,
"validation_gates": false,
"snapshot_resume": false,
"lease_locks": false,
"event_recovery": false,
"monorepo_routing": false,
"rolling_summary": false,
"require_phase_discussion": false,
"auto_uat": false,
"compaction_trigger": 130000
}
.vbw-planning/discovery.json (128 lines, first 30 shown)
{
"answered": [
{
"question": "What matters most in the conventions cleanup?",
"answer": "All of the above: Model conventions, Controller patterns, Dead code removal",
"category": "scope",
"phase": "4",
"date": "2026-02-10"
},
{
"question": "How should we handle convention violations that would change public API behavior?",
"answer": "Fix everything -- rename/restructure even if it changes method signatures or route patterns",
"category": "api-policy",
"phase": "4",
"date": "2026-02-10"
},
{
"question": "Favicon discovery strategy?",
"answer": "Multi-strategy cascade: /favicon.ico -> HTML parsing (full GET, Nokogiri, prefer largest) -> Google Favicon API. Skip DuckDuckGo.",
"area": "favicon-discovery",
"phase": "02",
"date": "2026-02-20"
},
{
"question": "How to handle downloaded favicons before storage?",
"answer": "Store raw original via Active Storage, define two variants: 32x32 (standard) and 64x64 (retina). SVGs stored as-is AND rasterized to PNG.",
"area": "image-processing",
"phase": "02",
"date": "2026-02-20"
},
.vbw-planning/ROADMAP.md (153 lines, first 30 shown)
# Roadmap
## Milestone: polish-and-reliability
### Phases
1. [x] **Backend Fixes** -- Fix browser User-Agent default, health check status transitions, and smarter scrape rate limiting
2. [x] **Favicon Support** -- Automatically save source favicons via Active Storage with background fetch job
3. [x] **Toast Stacking** -- Cap visible toast notifications with click-to-expand for bulk operation UX
4. [x] **Bug Fixes & Polish** -- Fix OPML import warning, toast positioning, dashboard alignment, source deletion, and published column
5. [x] **Source Enhancements** -- Add pagination/filtering for sources, per-source scrape rate limiting, and word count metrics
6. [ ] **Fetch Throughput & Small Server Defaults** -- Fix fetch pipeline error handling, add scheduling jitter/stagger, and optimize defaults for 1-CPU/2GB deployments
### Phase Details
#### Phase 1: Backend Fixes
**Goal:** Fix three independent backend issues: bot-blocked feeds due to User-Agent, health check not updating status, and overly aggressive scrape limiting.
**Requirements:**
- REQ-UA-01: Change default User-Agent from "SourceMonitor/VERSION" to a browser-like string
- REQ-HC-01: After a successful manual health check on a declining/critical/warning source, trigger SourceHealthMonitor re-evaluation or directly transition status to "improving"
- REQ-SL-01: Refine max_in_flight_per_source to only count actively-running scrape jobs (not queued ones)
**Success Criteria:**
- [ ] Default UA string resembles a real browser (e.g., Mozilla/5.0 compatible)
- [ ] Successful manual health check on a declining source transitions it to improving
- [ ] Scrape limit counts only actively-running jobs, queued items don't count toward the cap
- [ ] All existing tests pass, new tests cover changed behavior
- [ ] RuboCop zero offenses, Brakeman zero warnings
.vbw-planning/STATE.md (32 lines)
# State
## Current Position
- **Milestone:** polish-and-reliability
- **Phase:** 6 -- Fetch Throughput & Small Server Defaults
- **Status:** Planned
- **Progress:** 83%
- **Plans:** 4 (0/4 complete)
## Decisions
| Decision | Date | Context |
|----------|------|---------|
| Active Storage for favicons | 2026-02-20 | has_one_attached with guard, consistent with ItemContent pattern |
| Smarter scrape limit | 2026-02-20 | Count only running jobs, not queued; keeps safety but removes false bottleneck |
| Browser-like default UA | 2026-02-20 | Simple global fix for bot-blocked feeds like Uber |
| Health check triggers status update | 2026-02-20 | Successful manual health check should transition declining -> improving |
| Toast cap + hover expand | 2026-02-20 | Max 3 visible, +N more badge, hover to see all |
## Todos
- [x] Fix deprecation: `rails/tasks/statistics.rake` removed from Rakefile (2026-02-21)
## Metrics
- **Started:** 2026-02-20
- **Phases:** 6
- **Tests at start:** 1033
## Blockers
None
CLAUDE.md (245 lines, first 30 shown)
# SourceMonitor
**Core value:** Drop-in Rails engine for feed monitoring, content scraping, and operational dashboards.
## Active Context
**Milestone:** polish-and-reliability (extended)
**Phase:** 6 of 6 -- Fetch Throughput & Small Server Defaults (pending planning)
**Previous phases:** Backend Fixes, Favicon Support, Toast Stacking, Bug Fixes & Polish, Source Enhancements (all complete)
**Next action:** /vbw:vibe to plan and execute Phase 6
## Key Decisions
- Keep PostgreSQL-only for now
- Keep host-app auth model
- Ruby autoload for lib/ modules (not Zeitwerk)
- PG parallel fork segfault resolved: switched to thread-based parallelism in aia-ssl-fix milestone
## Installed Skills
- agent-browser (global)
- flowdeck (global)
- ralph-tui-create-json (global)
- ralph-tui-prd (global)
- vastai (global)
- find-skills (global)
## Learned Patterns
- Sub-module extraction: create `module/submodule.rb` with `require_relative`, lazy accessors, forwarding methods for backward compat
phase: 6
plan: 4
title: Queue Separation -- Maintenance Queue
wave: 1
depends_on: []
must_haves:
Add a third "maintenance" queue for non-fetch jobs so the fetch queue is dedicated to FetchFeedJob + ScheduleFetchesJob only. This prevents slow maintenance operations (cleanup, favicon, images, health check, import) from competing for fetch queue slots. REQ-FT-09, REQ-FT-10.
@ lib/source_monitor/configuration.rb -- queue_name_for (line 60-79) and concurrency_for (line 81-90) currently support :fetch and :scrape roles only
@ app/jobs/source_monitor/application_job.rb -- source_monitor_queue helper delegates to SourceMonitor.queue_name(role)
@ app/jobs/source_monitor/fetch_feed_job.rb -- source_monitor_queue :fetch (stays)
@ app/jobs/source_monitor/schedule_fetches_job.rb -- source_monitor_queue :fetch (stays)
@ app/jobs/source_monitor/scrape_item_job.rb -- source_monitor_queue :scrape (stays)
@ app/jobs/source_monitor/source_health_check_job.rb -- source_monitor_queue :fetch (change to :maintenance)
@ app/jobs/source_monitor/import_session_health_check_job.rb -- source_monitor_queue :fetch (change to :maintenance)
@ app/jobs/source_monitor/import_opml_job.rb -- source_monitor_queue :fetch (change to :maintenance)
@ app/jobs/source_monitor/log_cleanup_job.rb -- source_monitor_queue :fetch (change to :maintenance)
@ app/jobs/source_monitor/item_cleanup_job.rb -- source_monitor_queue :fetch (change to :maintenance)
@ app/jobs/source_monitor/favicon_fetch_job.rb -- source_monitor_queue :fetch (change to :maintenance)
@ app/jobs/source_monitor/download_content_images_job.rb -- source_monitor_queue :fetch (change to :maintenance)
@ examples/advanced_host/files/config/solid_queue.yml -- needs maintenance queue entry
@ test/lib/source_monitor/configuration_test.rb -- existing configuration tests

Files: lib/source_monitor/configuration.rb
Add maintenance_queue_name to attr_accessor (default: "#{DEFAULT_QUEUE_NAMESPACE}_maintenance"). Add maintenance_queue_concurrency to attr_accessor (default: 1 -- conservative for small servers). Extend queue_name_for to handle :maintenance role. Extend concurrency_for to handle :maintenance role.
Acceptance: SourceMonitor.config.maintenance_queue_name returns "source_monitor_maintenance". SourceMonitor.config.queue_name_for(:maintenance) returns the name with any ActiveJob prefix. SourceMonitor.config.concurrency_for(:maintenance) returns 1.
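The configuration change described above could look roughly like the following. This is a minimal sketch, not the engine's actual Configuration class: the class name QueueConfigSketch, the queue_name_prefix attribute (standing in for the ActiveJob prefix), the "_" delimiter, the fetch and scrape default names, the scrape concurrency default, and the use of ArgumentError for unknown roles are all assumptions made for illustration.

```ruby
# Hypothetical stand-in for the engine's configuration object.
class QueueConfigSketch
  DEFAULT_QUEUE_NAMESPACE = "source_monitor"

  attr_accessor :fetch_queue_name, :scrape_queue_name, :maintenance_queue_name,
                :fetch_queue_concurrency, :scrape_queue_concurrency,
                :maintenance_queue_concurrency, :queue_name_prefix

  def initialize
    @fetch_queue_name = "#{DEFAULT_QUEUE_NAMESPACE}_fetch"    # assumed default
    @scrape_queue_name = "#{DEFAULT_QUEUE_NAMESPACE}_scrape"  # assumed default
    @maintenance_queue_name = "#{DEFAULT_QUEUE_NAMESPACE}_maintenance"
    @fetch_queue_concurrency = 2
    @scrape_queue_concurrency = 2                             # assumed default
    @maintenance_queue_concurrency = 1                        # conservative for small servers
    @queue_name_prefix = nil                                  # stands in for the ActiveJob prefix
  end

  # Resolve a role to its queue name, applying the prefix when one is set.
  def queue_name_for(role)
    base = case role.to_sym
           when :fetch then fetch_queue_name
           when :scrape then scrape_queue_name
           when :maintenance then maintenance_queue_name
           else raise ArgumentError, "unknown queue role #{role.inspect}"
           end
    [queue_name_prefix, base].compact.join("_")
  end

  # Resolve a role to its worker concurrency.
  def concurrency_for(role)
    case role.to_sym
    when :fetch then fetch_queue_concurrency
    when :scrape then scrape_queue_concurrency
    when :maintenance then maintenance_queue_concurrency
    else raise ArgumentError, "unknown queue role #{role.inspect}"
    end
  end
end
```

Keeping the unknown-role branch in both methods preserves the existing failure mode for typos, which is what test (5) in the test task is meant to protect.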
Files: app/jobs/source_monitor/source_health_check_job.rb, app/jobs/source_monitor/import_session_health_check_job.rb, app/jobs/source_monitor/import_opml_job.rb, app/jobs/source_monitor/log_cleanup_job.rb, app/jobs/source_monitor/item_cleanup_job.rb, app/jobs/source_monitor/favicon_fetch_job.rb, app/jobs/source_monitor/download_content_images_job.rb
Change source_monitor_queue :fetch to source_monitor_queue :maintenance in all 7 job files. This is a one-line change per file.
Acceptance: grep -r 'source_monitor_queue :fetch' app/jobs/ returns only fetch_feed_job.rb and schedule_fetches_job.rb. All 7 other jobs show source_monitor_queue :maintenance.
Files: examples/advanced_host/files/config/solid_queue.yml
Add source_monitor_maintenance queue entry with concurrency: 1 (matching the conservative default). Add a comment explaining the three queue roles.
Acceptance: Example config shows three SourceMonitor queues: fetch, scrape, maintenance.
Files: test/lib/source_monitor/configuration_test.rb
Add tests: (1) "maintenance_queue_name defaults to source_monitor_maintenance", (2) "queue_name_for(:maintenance) returns maintenance queue name", (3) "concurrency_for(:maintenance) returns maintenance queue concurrency", (4) "maintenance_queue_name is configurable", (5) "queue_name_for raises for unknown role" (ensure :maintenance doesn't break existing error for truly unknown roles). Also add a test that verifies each job class resolves to the expected queue: fetch jobs → fetch queue, maintenance jobs → maintenance queue, scrape jobs → scrape queue.
Acceptance: All tests pass. bin/rails test test/lib/source_monitor/configuration_test.rb exits 0.
bin/rails test test/lib/source_monitor/configuration_test.rb
bin/rubocop lib/source_monitor/configuration.rb app/jobs/source_monitor/*.rb examples/advanced_host/files/config/solid_queue.yml
- config.maintenance_queue_name setting exists with default "source_monitor_maintenance"
- config.maintenance_queue_concurrency defaults to 1 (small server friendly)

Current state:
- fetch_queue_concurrency defaults to 2 (lib/source_monitor/configuration.rb:40)
- DEFAULT_BATCH_SIZE = 100 in Scheduler (lib/source_monitor/scheduler.rb:8)
- test/dummy/config/recurring.yml)
- limits_concurrency from Solid Queue is used; advisory locks per-source only
- fetch_feed_job.rb:5,11-13)

Math: With concurrency=2 and ~2s avg fetch time, throughput is ~60 jobs/min. But with 100 jobs enqueued per batch cycle, backlog grows continuously.
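
The throughput math above can be checked with a few lines of arithmetic. This is a back-of-envelope sketch: both method names are hypothetical, and the one-minute batch cycle (cycle_seconds: 60) is an assumption, since the cycle length is not stated in the surviving text.

```ruby
# Jobs the fetch queue can complete per minute at steady state.
def jobs_per_minute(concurrency:, avg_fetch_seconds:)
  concurrency * (60.0 / avg_fetch_seconds)
end

# Net change in queued jobs per minute; a positive value means the backlog grows.
# A negative value means the queue drains faster than it fills.
def backlog_growth_per_minute(batch_size:, cycle_seconds:, concurrency:, avg_fetch_seconds:)
  enqueued_per_minute = batch_size * (60.0 / cycle_seconds)
  enqueued_per_minute - jobs_per_minute(concurrency: concurrency, avg_fetch_seconds: avg_fetch_seconds)
end

throughput = jobs_per_minute(concurrency: 2, avg_fetch_seconds: 2.0)

# Assumed one-minute cycle with the current batch size of 100.
growth_at_100 = backlog_growth_per_minute(
  batch_size: 100, cycle_seconds: 60, concurrency: 2, avg_fetch_seconds: 2.0
)

# Same assumptions with a batch size of 25.
growth_at_25 = backlog_growth_per_minute(
  batch_size: 25, cycle_seconds: 60, concurrency: 2, avg_fetch_seconds: 2.0
)

puts "throughput: #{throughput} jobs/min"
puts "backlog growth at batch 100: #{growth_at_100} jobs/min"
puts "backlog growth at batch 25: #{growth_at_25} jobs/min"
```

Under these assumptions a batch of 100 adds about 40 jobs per minute more than the workers can clear, while a batch of 25 leaves headroom.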
Current state:
- ImportOpmlJob#build_attributes does NOT set next_fetch_at -- all imported sources start as NULL (immediately due)
- SourcesController#create also has no next_fetch_at initialization
- table[:next_fetch_at].eq(nil).or(table[:next_fetch_at].lteq(now)))
- Time.current + fixed_minutes.minutes exactly
- JITTER_PERCENT = 0.1) but insufficient when base times are nearly identical

Current state:
- update_source_state! (fetch_runner.rb:83-91) rescues ALL StandardError including DB update failures
- ensure block guarantees fetch_status reset from "fetching" to "idle"/"failed"
- FollowUpHandler#call has no error handling -- exceptions propagate past mark_complete!
- StalledFetchReconciler recovers after 10 minutes (STALE_QUEUE_TIMEOUT = 10.minutes)

- config.fetch_queue_concurrency -- defaults to 2
- config.fetch_queue_name / config.scrape_queue_name -- queue names
- ENV["SOURCE_MONITOR_FETCH_CONCURRENCY"] -- env var override in example config
- config.fetching.adaptive_enabled -- toggle adaptive intervals
- config.fetching.increase_factor / decrease_factor -- interval tuning
- config.fetching.min_interval / max_interval -- interval bounds
- config.http.timeout / config.http.open_timeout -- HTTP timeouts
- config.http.max_retries -- retry count

- SourceMonitor.configure { |config| ... } -- new knobs should follow the same pattern via settings sub-objects
- lib/source_monitor/configuration/fetching_settings.rb): Already has adaptive interval knobs; batch size and jitter should live here

- update_source_state! to only catch broadcast errors
- ensure block in FetchRunner#run for status safety net
- FollowUpHandler#call
- next_fetch_at during OPML import
- JITTER_PERCENT configurable via FetchingSettings
- fetch_queue_concurrency to 2 (keep current) -- it's actually appropriate for 1-CPU/2GB
- DEFAULT_BATCH_SIZE from 100 to 25 and make configurable
- STALE_QUEUE_TIMEOUT from 10 to 5 minutes
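
The two error-handling items (rescuing only broadcast errors in update_source_state!, and an ensure block as a status safety net) can be illustrated together. The sketch below is not the engine's FetchRunner: FetchRunnerSketch, SourceSketch, BroadcastError, and the fetch/persist/broadcast callables are hypothetical names introduced only to show the shape of the pattern.

```ruby
# Hypothetical error type for a failed UI broadcast.
class BroadcastError < StandardError; end

# Hypothetical stand-in for a source record.
SourceSketch = Struct.new(:fetch_status)

class FetchRunnerSketch
  attr_reader :broadcast_failures

  def initialize(source, fetch:, persist:, broadcast:)
    @source = source
    @fetch = fetch
    @persist = persist
    @broadcast = broadcast
    @broadcast_failures = []
  end

  def run
    completed = false
    @source.fetch_status = "fetching"
    result = @fetch.call(@source)
    update_source_state!(result)
    completed = true
    result
  ensure
    # Safety net: never leave the source stuck in "fetching",
    # whether the body finished or raised.
    if @source.fetch_status == "fetching"
      @source.fetch_status = completed ? "idle" : "failed"
    end
  end

  private

  def update_source_state!(result)
    # Persistence failures are NOT rescued here; they propagate to the caller.
    @persist.call(@source, result)
    begin
      @broadcast.call(@source)
    rescue BroadcastError => e
      # Only the broadcast is allowed to fail quietly.
      @broadcast_failures << e.message
    end
  end
end
```

With this shape a failed broadcast leaves the source "idle" and the fetch counts as successful, while a failed persist marks the source "failed" and still raises, so the failure is visible to the job layer instead of waiting for a stalled-fetch reconciler.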