.vbw-planning/codebase/ARCHITECTURE.md
SourceMonitor is a mountable Rails 8 engine that ingests RSS, Atom, and JSON feeds, scrapes full article content, and surfaces Solid Queue-powered dashboards for monitoring and remediation. It is packaged as a RubyGem and mounted into a host Rails application.
Mountable Rails Engine with isolate_namespace SourceMonitor. The engine:
- Defines its own ApplicationRecord, ApplicationController, and ApplicationJob base classes
- Prefixes its tables with sourcemon_

lib/source_monitor/fetching/

The primary data ingestion pipeline:
- FeedFetcher -- Orchestrates the HTTP request, feed parsing via Feedjira, item creation, adaptive interval scheduling, and retry policies
- FetchRunner -- Entry point for enqueuing fetch jobs; handles concurrency control
- FetchError hierarchy -- Typed error classes (TimeoutError, ConnectionError, HTTPError, ParsingError)
- RetryPolicy -- Exponential backoff with circuit breaker pattern
- StalledFetchReconciler -- Recovers stalled fetch jobs

lib/source_monitor/scraping/ + lib/source_monitor/scrapers/

Pluggable content extraction system:
- Scrapers::Base -- Abstract adapter contract; subclasses implement #call returning a Result struct
- Scrapers::Readability -- Default adapter using ruby-readability
- Scraping::ItemScraper -- Orchestrator that resolves the adapter, executes the scrape, and persists results
- Scraping::BulkSourceScraper -- Batch scraping across all items for a source
- Scraping::Enqueuer -- Manages scrape job queuing with in-flight throttling
- Scraping::State -- Tracks in-flight scrape state via cache/memory

lib/source_monitor/health/

Source health tracking system:
- SourceHealthMonitor -- Computes health status from recent fetch history (sliding window)
- SourceHealthCheck -- One-off health probe for a source URL
- ImportSourceHealthCheck -- Variant for import wizard health checks
- SourceHealthReset -- Resets health status for a source

lib/source_monitor/scheduler.rb

- Scheduler -- Periodic job that finds sources due for fetch using FOR UPDATE SKIP LOCKED
- StalledFetchReconciler for recovery

lib/source_monitor/events.rb

- ItemCreatedEvent, ItemScrapedEvent, FetchCompletedEvent
- SourceMonitor.config.events.after_* DSL

lib/source_monitor/configuration.rb

Rich nested configuration object with sub-configs:
- HTTPSettings -- timeout, retries, user agent, proxy, headers
- ScraperRegistry -- pluggable adapter registration
- RetentionSettings -- item retention days, max items, strategy (destroy/soft_delete)
- RealtimeSettings -- adapter selection (solid_cable/redis/async)
- FetchingSettings -- adaptive interval tuning
- HealthSettings -- health window and threshold configuration
- AuthenticationSettings -- pluggable authentication/authorization handlers
- ScrapingSettings -- concurrency limits
- Events -- callback registration
- Models -- concern injection and custom validation registration

lib/source_monitor/model_extensions.rb

Dynamic model customization system:
- ModelExtensions.register(model_class, key) -- called in each model class body
- ModelExtensions.reload! -- re-applies all extensions (called on configuration change)

lib/source_monitor/realtime/

- Realtime::Adapter -- Configures Action Cable based on the selected adapter
- Realtime::Broadcaster -- Broadcasts source/item updates and toast notifications
- Dashboard::TurboBroadcaster -- Wires dashboard stat updates to Turbo Streams

lib/source_monitor/setup/

Comprehensive host-app installation workflow:
- Setup::CLI -- Command-line interface for setup
- Setup::Workflow -- Orchestrates multi-step installation
- Setup::Requirements / Setup::Detectors -- System requirement checks
- Setup::GemfileEditor / Setup::BundleInstaller / Setup::NodeInstaller -- Dependency installation
- Setup::InstallGenerator -- Rails generator for migrations, routes, initializer
- Setup::Verification::Runner -- Post-install verification (Solid Queue, Action Cable)

app/controllers/source_monitor/import_sessions_controller.rb

Multi-step wizard for bulk feed import:
- ImportSession model with JSONB columns

Source (sourcemon_sources)
|-- has_many Item (sourcemon_items)
| |-- has_one ItemContent (sourcemon_item_contents) [separate table for large content]
| |-- has_many ScrapeLog (sourcemon_scrape_logs)
| +-- has_many LogEntry (sourcemon_log_entries) [polymorphic]
|-- has_many FetchLog (sourcemon_fetch_logs)
|-- has_many HealthCheckLog (sourcemon_health_check_logs)
+-- has_many LogEntry (sourcemon_log_entries)
LogEntry (sourcemon_log_entries)
|-- delegated_type :loggable -> FetchLog | ScrapeLog | HealthCheckLog
ImportSession (sourcemon_import_sessions)
+-- JSONB columns for wizard state
ImportHistory (sourcemon_import_histories)
+-- Records completed imports
All jobs inherit from SourceMonitor::ApplicationJob, which in turn inherits from the host app's ApplicationJob (or ActiveJob::Base).
| Job | Queue | Purpose |
|---|---|---|
| ScheduleFetchesJob | fetch | Recurring: triggers Scheduler to find and enqueue due sources |
| FetchFeedJob | fetch | Fetches a single source's feed, creates items |
| ScrapeItemJob | scrape | Scrapes content for a single item |
| SourceHealthCheckJob | fetch | Runs health check for a source |
| ImportSessionHealthCheckJob | fetch | Health checks during OPML import wizard |
| ImportOpmlJob | fetch | Bulk-creates sources from OPML import |
| LogCleanupJob | fetch | Recurring: prunes old log entries |
| ItemCleanupJob | fetch | Recurring: prunes old items per retention policy |
Queue names are configurable via config.fetch_queue_name and config.scrape_queue_name.
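
As a rough illustration of how the job table and the two queue settings relate, the sketch below resolves a job's queue from a configuration object. QueueConfig, JOB_QUEUE_ROLES, and queue_name_for are hypothetical names made up for this example, and the queue values are invented; only the job names and the fetch_queue_name / scrape_queue_name attributes come from the text above. The engine's real lookup may work differently.

```ruby
# Hypothetical stand-in for the engine's configuration object. Only the two
# attribute names (fetch_queue_name, scrape_queue_name) come from the document.
QueueConfig = Struct.new(:fetch_queue_name, :scrape_queue_name, keyword_init: true)

# Which configurable queue each job uses, copied from the job table.
JOB_QUEUE_ROLES = {
  "ScheduleFetchesJob"          => :fetch,
  "FetchFeedJob"                => :fetch,
  "ScrapeItemJob"               => :scrape,
  "SourceHealthCheckJob"        => :fetch,
  "ImportSessionHealthCheckJob" => :fetch,
  "ImportOpmlJob"               => :fetch,
  "LogCleanupJob"               => :fetch,
  "ItemCleanupJob"              => :fetch
}.freeze

# Resolve the concrete queue name for a job from the current configuration.
def queue_name_for(job_name, config)
  role = JOB_QUEUE_ROLES.fetch(job_name) { raise ArgumentError, "unknown job: #{job_name}" }
  role == :scrape ? config.scrape_queue_name : config.fetch_queue_name
end

# Example values are invented; a host app would choose its own queue names.
config = QueueConfig.new(fetch_queue_name: "feeds_fetch", scrape_queue_name: "feeds_scrape")
puts queue_name_for("FetchFeedJob", config)   # prints "feeds_fetch"
puts queue_name_for("ScrapeItemJob", config)  # prints "feeds_scrape"
```

Keeping scraping on its own queue means slow content extraction can be given separate worker capacity without delaying feed fetches.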
- Security::Authentication -- Pluggable authentication via handler callbacks (symbol method names or callables)
- Security::ParameterSanitizer -- HTML sanitization of all user inputs via ActionView::Base.full_sanitizer
- Models::Sanitizable -- Concern that sanitizes string and hash model attributes before validation
- Models::UrlNormalizable -- URL normalization and validation concern
- SanitizesSearchParams -- Controller concern for search parameter sanitization
- CSRF protection (protect_from_forgery with: :exception)
- Instrumentation -- Emits ActiveSupport::Notifications events for fetch lifecycle
- Metrics -- In-memory counters and gauges, populated via notification subscribers
- source_monitor.fetch.start, source_monitor.fetch.finish, source_monitor.scheduler.run, source_monitor.items.duplicate, source_monitor.items.retention
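
The sketch below shows one way in-memory counters and gauges could be fed by notification subscribers, assuming the dotted names listed above are the notification event names. So that it runs without Rails, it uses a tiny stand-in bus instead of ActiveSupport::Notifications. TinyNotifications, MetricsSketch, the metric keys, and the payload keys (duration_ms, count) are all hypothetical and are not taken from the engine.

```ruby
# Stand-in for ActiveSupport::Notifications so the sketch runs without Rails.
# TinyNotifications and MetricsSketch are hypothetical names, not engine APIs.
module TinyNotifications
  @subscribers = Hash.new { |hash, name| hash[name] = [] }

  # Register a block to be called with (name, payload) for one event name.
  def self.subscribe(name, &block)
    @subscribers[name] << block
  end

  # Deliver an event to every subscriber registered for that name.
  def self.instrument(name, payload = {})
    @subscribers[name].each { |subscriber| subscriber.call(name, payload) }
  end
end

# In-memory counters and gauges, keyed by metric name.
module MetricsSketch
  @counters = Hash.new(0)
  @gauges = {}

  class << self
    attr_reader :counters, :gauges

    def increment(key, by = 1)
      @counters[key] += by
    end

    def gauge(key, value)
      @gauges[key] = value
    end
  end
end

# Count finished fetches and remember the most recent duration.
# The :duration_ms payload key is an assumption made for this example.
TinyNotifications.subscribe("source_monitor.fetch.finish") do |_name, payload|
  MetricsSketch.increment("fetch_finished_total")
  MetricsSketch.gauge("last_fetch_duration_ms", payload[:duration_ms])
end

# Count duplicate items; the :count payload key is also an assumption.
TinyNotifications.subscribe("source_monitor.items.duplicate") do |_name, payload|
  MetricsSketch.increment("items_duplicate_total", payload.fetch(:count, 1))
end

TinyNotifications.instrument("source_monitor.fetch.finish", duration_ms: 120)
TinyNotifications.instrument("source_monitor.fetch.finish", duration_ms: 95)
TinyNotifications.instrument("source_monitor.items.duplicate", count: 3)

p MetricsSketch.counters
p MetricsSketch.gauges
```

In a host application the same subscriber blocks would be registered with ActiveSupport::Notifications.subscribe, whose block receives name, start, finish, id, and payload. Because the counters live in process memory, they reset on restart and are not shared between processes.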