.vbw-planning/milestones/ui-fixes-and-smart-scraping/phases/04-smart-scrape-recommendations/.context-dev.md
Not available
Codebase mapping exists in .vbw-planning/codebase/. Key files:
- ARCHITECTURE.md
- CONCERNS.md
- PATTERNS.md
- DEPENDENCIES.md
- STRUCTURE.md
- CONVENTIONS.md
- TESTING.md
- STACK.md

Read CONVENTIONS.md, PATTERNS.md, STRUCTURE.md, and DEPENDENCIES.md first to bootstrap codebase understanding.
- .vbw-planning/discovery.json
- .vbw-planning/STATE.md
- app/assets/builds/source_monitor/application.css
- Gemfile.lock
- test/dummy/Gemfile.lock

.vbw-planning/discovery.json (184 lines, first 30 shown)
{
"answered": [
{
"question": "What matters most in the conventions cleanup?",
"answer": "All of the above: Model conventions, Controller patterns, Dead code removal",
"category": "scope",
"phase": "4",
"date": "2026-02-10"
},
{
"question": "How should we handle convention violations that would change public API behavior?",
"answer": "Fix everything -- rename/restructure even if it changes method signatures or route patterns",
"category": "api-policy",
"phase": "4",
"date": "2026-02-10"
},
{
"question": "Favicon discovery strategy?",
"answer": "Multi-strategy cascade: /favicon.ico -> HTML parsing (full GET, Nokogiri, prefer largest) -> Google Favicon API. Skip DuckDuckGo.",
"area": "favicon-discovery",
"phase": "02",
"date": "2026-02-20"
},
{
"question": "How to handle downloaded favicons before storage?",
"answer": "Store raw original via Active Storage, define two variants: 32x32 (standard) and 64x64 (retina). SVGs stored as-is AND rasterized to PNG.",
"area": "image-processing",
"phase": "02",
"date": "2026-02-20"
},
.vbw-planning/STATE.md (25 lines)
# State
**Project:** SourceMonitor
**Milestone:** ui-fixes-and-smart-scraping
**Phase:** 04 (Smart Scrape Recommendations)
**Plans:** 0/5 complete
**Progress:** 75%
**Status:** Planned
## Decisions
| Decision | Date | Context |
|----------|------|---------|
| Active Storage for favicons | 2026-02-20 | has_one_attached with guard, consistent with ItemContent pattern |
| Smarter scrape limit | 2026-02-20 | Count only running jobs, not queued; keeps safety but removes false bottleneck |
| Browser-like default UA | 2026-02-20 | Simple global fix for bot-blocked feeds like Uber |
| Health check triggers status update | 2026-02-20 | Successful manual health check should transition declining -> improving |
| Toast cap + hover expand | 2026-02-20 | Max 3 visible, +N more badge, hover to see all |
## Todos
- [x] Fix deprecation: `rails/tasks/statistics.rake` removed from Rakefile (2026-02-21)
## Blockers
None
app/assets/builds/source_monitor/application.css (2193 lines, first 30 shown)
*, ::before, ::after {
--tw-border-spacing-x: 0;
--tw-border-spacing-y: 0;
--tw-translate-x: 0;
--tw-translate-y: 0;
--tw-rotate: 0;
--tw-skew-x: 0;
--tw-skew-y: 0;
--tw-scale-x: 1;
--tw-scale-y: 1;
--tw-pan-x: ;
--tw-pan-y: ;
--tw-pinch-zoom: ;
--tw-scroll-snap-strictness: proximity;
--tw-gradient-from-position: ;
--tw-gradient-via-position: ;
--tw-gradient-to-position: ;
--tw-ordinal: ;
--tw-slashed-zero: ;
--tw-numeric-figure: ;
--tw-numeric-spacing: ;
--tw-numeric-fraction: ;
--tw-ring-inset: ;
--tw-ring-offset-width: 0px;
--tw-ring-offset-color: #fff;
--tw-ring-color: rgb(59 130 246 / 0.5);
--tw-ring-offset-shadow: 0 0 #0000;
--tw-ring-shadow: 0 0 #0000;
--tw-shadow: 0 0 #0000;
--tw-shadow-colored: 0 0 #0000;
Gemfile.lock (426 lines, first 30 shown)
PATH
remote: .
specs:
source_monitor (0.10.2)
cssbundling-rails (~> 1.4)
faraday (~> 2.9)
faraday-follow_redirects (~> 0.4)
faraday-gzip (~> 3.0)
faraday-retry (~> 2.2)
feedjira (>= 3.2, < 5.0)
jsbundling-rails (~> 1.3)
nokolexbor (~> 0.5)
rails (>= 8.0.3, < 10.0)
ransack (~> 4.2)
ruby-readability (~> 0.7)
solid_cable (>= 3.0, < 4.0)
solid_queue (>= 0.3, < 3.0)
turbo-rails (~> 2.0)
GEM
remote: https://rubygems.org/
specs:
action_text-trix (2.1.16)
railties
actioncable (8.1.2)
actionpack (= 8.1.2)
activesupport (= 8.1.2)
nio4r (~> 2.0)
websocket-driver (>= 0.6.1)
zeitwerk (~> 2.6)
test/dummy/Gemfile.lock (409 lines, first 30 shown)
PATH
remote: ../..
specs:
source_monitor (0.10.2)
cssbundling-rails (~> 1.4)
faraday (~> 2.9)
faraday-follow_redirects (~> 0.4)
faraday-gzip (~> 3.0)
faraday-retry (~> 2.2)
feedjira (>= 3.2, < 5.0)
jsbundling-rails (~> 1.3)
nokolexbor (~> 0.5)
rails (>= 8.0.3, < 10.0)
ransack (~> 4.2)
ruby-readability (~> 0.7)
solid_cable (>= 3.0, < 4.0)
solid_queue (>= 0.3, < 3.0)
turbo-rails (~> 2.0)
GEM
remote: https://rubygems.org/
specs:
action_text-trix (2.1.16)
railties
actioncable (8.1.2)
actionpack (= 8.1.2)
activesupport (= 8.1.2)
nio4r (~> 2.0)
websocket-driver (>= 0.6.1)
zeitwerk (~> 2.6)
phase: "04"
plan: "05"
title: "Bulk Scrape Enablement with Confirmation Modal"
wave: 3
depends_on: ["01", "03"]
must_haves:
Add the ability to select multiple scrape candidate sources and enable scraping for them in bulk. Includes checkboxes on source rows, a confirmation modal with count/warning, and a controller action that updates scraping_enabled on selected sources.
Files:
app/controllers/source_monitor/bulk_scrape_enablements_controller.rb
Description: Create a controller for bulk-enabling scraping on selected sources. It follows the CRUD-everything pattern as a standalone resource (not nested under an individual source):
module SourceMonitor
  class BulkScrapeEnablementsController < ApplicationController
    def create
      # Coerce submitted IDs to integers and drop blank or non-numeric values.
      source_ids = Array(params.dig(:bulk_scrape_enablement, :source_ids)).map(&:to_i).reject(&:zero?)

      if source_ids.empty?
        handle_empty_selection
        return
      end

      # Only touch sources that do not already have scraping enabled.
      sources = Source.where(id: source_ids, scraping_enabled: false)

      # update_all issues a single UPDATE and skips validations and callbacks,
      # so updated_at is set explicitly.
      updated_count = sources.update_all(
        scraping_enabled: true,
        scraper_adapter: default_adapter,
        updated_at: Time.current
      )

      respond_to do |format|
        format.turbo_stream do
          responder = SourceMonitor::TurboStreams::StreamResponder.new
          responder.toast(
            message: "Scraping enabled for #{updated_count} #{'source'.pluralize(updated_count)}.",
            level: :success
          )
          # Redirect to refresh the sources index
          responder.redirect(source_monitor.sources_path)
          render turbo_stream: responder.render(view_context)
        end

        format.html do
          redirect_to source_monitor.sources_path,
            notice: "Scraping enabled for #{updated_count} #{'source'.pluralize(updated_count)}."
        end
      end
    end

    private

    # Fall back to "readability" when no default adapter is configured.
    def default_adapter
      SourceMonitor.config.scrapers.default_adapter_name || "readability"
    end

    def handle_empty_selection
      respond_to do |format|
        format.turbo_stream do
          responder = SourceMonitor::TurboStreams::StreamResponder.new
          responder.toast(message: "No sources selected.", level: :warning)
          render turbo_stream: responder.render(view_context), status: :unprocessable_entity
        end

        format.html do
          redirect_to source_monitor.sources_path, alert: "No sources selected."
        end
      end
    end
  end
end
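The ID sanitization at the top of `create` can be checked in isolation. The following is a minimal sketch, assuming a plain Hash stands in for `ActionController::Parameters`; the helper name `sanitize_source_ids` is hypothetical and is not part of the plan.

```ruby
# Hypothetical helper mirroring the ID sanitization in
# BulkScrapeEnablementsController#create. A plain Hash stands in for
# ActionController::Parameters (an assumption made for this sketch).
def sanitize_source_ids(params)
  raw = params.dig(:bulk_scrape_enablement, :source_ids)
  ids = Array(raw).map(&:to_i) # nil becomes [], non-numeric strings become 0
  ids.reject(&:zero?)          # drop blanks and non-numeric entries
end

# Blank and non-numeric entries are dropped; valid IDs are kept in order.
p sanitize_source_ids({ bulk_scrape_enablement: { source_ids: ["1", "", "abc", "7"] } })

# A missing key yields an empty list, which triggers the empty-selection branch.
p sanitize_source_ids({})
```

Duplicate and negative values pass through this step unchanged. In the controller they are harmless because `Source.where(id: ...)` only matches existing rows.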
Tests:
test/controllers/source_monitor/bulk_scrape_enablements_controller_test.rb:
Files:
config/routes.rb
Description: Add a top-level resource route (not nested under sources, since it operates across multiple sources):
resources :bulk_scrape_enablements, only: :create
Tests:
POST /bulk_scrape_enablements routes to bulk_scrape_enablements#create
Files:
app/views/source_monitor/sources/index.html.erb
app/views/source_monitor/sources/_row.html.erb
Description:
Wrap the sources table in a data-controller="select-all" scope. Add a master checkbox header cell in index.html.erb:
<th scope="col" class="w-10 px-3 py-3">
  <input type="checkbox"
         data-select-all-target="master"
         data-action="select-all#toggleAll"
         class="rounded border-slate-300 text-blue-600 focus:ring-blue-500"
         aria-label="Select all sources">
</th>
_row.html.erb (only for scrape candidates):
<td class="w-10 px-3 py-4">
  <% if scrape_candidates.include?(source.id) %>
    <input type="checkbox"
           name="bulk_scrape_enablement[source_ids][]"
           value="<%= source.id %>"
           data-select-all-target="item"
           data-action="select-all#toggleItem"
           class="rounded border-slate-300 text-violet-600 focus:ring-violet-500"
           aria-label="Select <%= source.name %>">
  <% end %>
</td>
bulk-action-bar that shows/hides based on checkbox state, or reuse select-all with a connected bar target. Add a form that submits to bulk_scrape_enablements#create with a confirmation modal:
<div data-select-all-target="actionBar" class="hidden sticky bottom-0 border-t border-slate-200 bg-white px-4 py-3 shadow-md">
  <div class="flex items-center justify-between">
    <span class="text-sm text-slate-700">
      <span data-select-all-target="count">0</span> source(s) selected
    </span>
    <button type="button"
            data-action="modal#open"
            class="inline-flex items-center rounded-md bg-violet-600 px-4 py-2 text-sm font-semibold text-white shadow hover:bg-violet-500">
      Enable Scraping
    </button>
  </div>
</div>
Tests:
Files:
app/views/source_monitor/sources/_bulk_scrape_enable_modal.html.erb
Description:
Create a confirmation modal using the existing modal Stimulus controller. The modal shows a warning and submits the bulk enablement form:
<div data-controller="modal" class="relative">
  <div data-modal-target="panel" class="hidden fixed inset-0 z-50 flex items-center justify-center bg-black/50" data-action="click->modal#backdrop">
    <div class="w-full max-w-md rounded-lg bg-white shadow-xl" data-action="click->modal#stop">
      <div class="border-b border-slate-200 px-6 py-4">
        <h3 class="text-lg font-semibold text-slate-900">Enable Scraping</h3>
      </div>
      <div class="px-6 py-4">
        <p class="text-sm text-slate-700">
          This will enable scraping for the selected sources using the default scraper adapter.
          Each source's items will be scraped on their next scheduled run.
        </p>
        <p class="mt-3 text-sm font-medium text-amber-700">
          This action will modify the selected sources' configuration.
        </p>
      </div>
      <div class="flex justify-end gap-3 border-t border-slate-200 px-6 py-4">
        <button type="button"
                data-action="modal#close"
                class="rounded-md border border-slate-200 px-4 py-2 text-sm font-medium text-slate-700 hover:bg-slate-50">
          Cancel
        </button>
        <button type="submit"
                class="rounded-md bg-violet-600 px-4 py-2 text-sm font-semibold text-white shadow hover:bg-violet-500">
          Confirm Enable
        </button>
      </div>
    </div>
  </div>
</div>
The modal is wired into the form wrapping the sources table so the submit button posts the checked source IDs.
Tests:
Files:
app/assets/javascripts/source_monitor/controllers/select_all_controller.js
Description:
Extend the existing select-all controller to support an optional actionBar target and count target. When any checkbox is checked, show the action bar and update the count:
Add to static targets: "actionBar", "count"
Add method updateActionBar():
updateActionBar() {
  if (!this.hasActionBarTarget) return;
  const checkedCount = this.itemTargets.filter(cb => cb.checked).length;
  if (this.hasCountTarget) {
    this.countTarget.textContent = checkedCount;
  }
  if (checkedCount > 0) {
    this.actionBarTarget.classList.remove("hidden");
  } else {
    this.actionBarTarget.classList.add("hidden");
  }
}
Call this.updateActionBar() at the end of toggleAll(), toggleItem(), syncMaster(), itemTargetConnected(), and itemTargetDisconnected().
Tests:
yarn build must succeed (ESLint check)
File: app/models/source_monitor/source.rb
- scraping_enabled (boolean), scraper_adapter (string, validates presence), scrape_settings (JSONB), min_scrape_interval (optional)
- active scope for filtering
- due_for_fetch class method for scheduling
- avg_word_count method (lines 134-139): currently computes average of scraped word counts only
- avg_feed_words and avg_scraped_words (lines 83-99): query ItemContent for feed/scraped word counts by source
- items_count auto-maintained via has_many association

File: lib/source_monitor/configuration.rb
- SourceMonitor.configure { |c| c.attr = value } pattern
- @scraping = ScrapingSettings.new already initialized
- max_in_flight_per_source, max_bulk_batch_size, min_scrape_interval
- attr_accessor + DEFAULT_* constant + reset! method

Files: app/controllers/source_monitor/dashboard_controller.rb, lib/source_monitor/dashboard/queries.rb
- queries.stats, queries.recent_activity, queries.quick_actions, queries.job_metrics, queries.upcoming_fetch_schedule
- { total_sources, active_sources, failed_sources, total_items, fetches_today, health_distribution }

Files: app/controllers/source_monitor/sources_controller.rb + view
- searchable_with mixin)
- @q (Ransack query), computes @avg_feed_word_counts and @avg_scraped_word_counts as hashes (source_id -> avg)
- item_activity_rates, word count maps

Files: app/jobs/source_monitor/scrape_item_job.rb, lib/source_monitor/scraping/bulk_source_scraper.rb, lib/source_monitor/scraping/item_scraper.rb
- source.scraping_enabled?, respects min_scrape_interval, calls ItemScraper
- :current, :unscraped, :all. Returns Result struct with: status, selection, attempted_count, enqueued_count, already_enqueued_count, failure_count, failure_details, messages, rate_limited
- AdapterResolver, calls adapter, persists result

Files: app/models/source_monitor/item.rb, app/models/source_monitor/item_content.rb
- has_one :item_content (autosave: true)
- feed_word_count and scraped_word_count (computed)
- feed_word_count from item.content, scraped_word_count from item_content.scraped_content
- before_save hook

File: lib/source_monitor/analytics/sources_index_metrics.rb
- base_scope, result_scope, search_params
- SourceFetchIntervalDistribution

File: app/controllers/source_monitor/source_bulk_scrapes_controller.rb
- POST /sources/:source_id/bulk_scrape
- { bulk_scrape: { selection: :current/:unscraped/:all } }
- SourceTurboResponses mixin

File: config/routes.rb
- resource :bulk_scrape, only: :create
- get "/dashboard" -> dashboard#index
- create_source! factory, mocks services, tests Turbo Stream responses
- dom_id(source, :row)
- scrape_recommendation_threshold to ScrapingSettings with DEFAULT constant + reset
- avg_feed_words)
- SourcesIndexMetrics pattern for recommendation computation
- avg_feed_words Ransacker joins ItemContent; need all items to have content records
- Source.scrape_candidates scope using avg_feed_words threshold
- POST /sources/:id/test_scrape -> comparison page
- SourceMonitor::Analytics::ScrapeRecommendations for candidate computation
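
The threshold setting and candidate selection noted above could look roughly like the following plain-Ruby sketch. It is illustrative only: the class name `SketchScrapingSettings`, the default of 200 words, the helper `scrape_candidate_ids`, and the rule that a source qualifies when its average feed word count falls below the threshold are all assumptions. The real implementation would be an ActiveRecord scope on `Source`.

```ruby
# Sketch of the "attr_accessor + DEFAULT_* constant + reset!" settings pattern.
# The class name and the default value are hypothetical.
class SketchScrapingSettings
  DEFAULT_SCRAPE_RECOMMENDATION_THRESHOLD = 200 # assumed default, in words

  attr_accessor :scrape_recommendation_threshold

  def initialize
    reset!
  end

  # Restore every setting to its default.
  def reset!
    @scrape_recommendation_threshold = DEFAULT_SCRAPE_RECOMMENDATION_THRESHOLD
  end
end

# Plain-Ruby stand-in for a scrape_candidates query (hypothetical helper).
# sources:        array of hashes with :id and :scraping_enabled
# avg_feed_words: hash of source_id => average feed word count
# Assumption: a source is a candidate when scraping is off and its average
# feed word count is known and below the threshold.
def scrape_candidate_ids(sources, avg_feed_words, threshold)
  candidates = sources.select do |source|
    avg = avg_feed_words[source[:id]]
    !source[:scraping_enabled] && !avg.nil? && avg < threshold
  end
  candidates.map { |source| source[:id] }
end

settings = SketchScrapingSettings.new
sources = [
  { id: 1, scraping_enabled: false }, # short feed content
  { id: 2, scraping_enabled: false }, # long feed content
  { id: 3, scraping_enabled: true },  # already scraping
  { id: 4, scraping_enabled: false }  # no average available
]
avg_feed_words = { 1 => 45.0, 2 => 850.0, 3 => 30.0 }

p scrape_candidate_ids(sources, avg_feed_words, settings.scrape_recommendation_threshold)
```

Source 4 is skipped here because it has no average. This mirrors the note that the avg_feed_words Ransacker joins ItemContent and so depends on items having content records.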