.Context Dev

.vbw-planning/milestones/ui-fixes-and-smart-scraping/phases/04-smart-scrape-recommendations/.context-dev.md


Phase 04 Context

Goal

Not available

Codebase Map Available

Codebase mapping exists in .vbw-planning/codebase/. Key files:

  • ARCHITECTURE.md
  • CONCERNS.md
  • PATTERNS.md
  • DEPENDENCIES.md
  • STRUCTURE.md
  • CONVENTIONS.md
  • TESTING.md
  • STACK.md

Read CONVENTIONS.md, PATTERNS.md, STRUCTURE.md, and DEPENDENCIES.md first to bootstrap codebase understanding.

Changed Files (Delta)

  • .vbw-planning/discovery.json
  • .vbw-planning/STATE.md
  • app/assets/builds/source_monitor/application.css
  • Gemfile.lock
  • test/dummy/Gemfile.lock

Code Slices

.vbw-planning/discovery.json (184 lines, first 30 shown)

{
  "answered": [
    {
      "question": "What matters most in the conventions cleanup?",
      "answer": "All of the above: Model conventions, Controller patterns, Dead code removal",
      "category": "scope",
      "phase": "4",
      "date": "2026-02-10"
    },
    {
      "question": "How should we handle convention violations that would change public API behavior?",
      "answer": "Fix everything -- rename/restructure even if it changes method signatures or route patterns",
      "category": "api-policy",
      "phase": "4",
      "date": "2026-02-10"
    },
    {
      "question": "Favicon discovery strategy?",
      "answer": "Multi-strategy cascade: /favicon.ico -> HTML parsing (full GET, Nokogiri, prefer largest) -> Google Favicon API. Skip DuckDuckGo.",
      "area": "favicon-discovery",
      "phase": "02",
      "date": "2026-02-20"
    },
    {
      "question": "How to handle downloaded favicons before storage?",
      "answer": "Store raw original via Active Storage, define two variants: 32x32 (standard) and 64x64 (retina). SVGs stored as-is AND rasterized to PNG.",
      "area": "image-processing",
      "phase": "02",
      "date": "2026-02-20"
    },

.vbw-planning/STATE.md (25 lines)

# State

**Project:** SourceMonitor
**Milestone:** ui-fixes-and-smart-scraping
**Phase:** 04 (Smart Scrape Recommendations)
**Plans:** 0/5 complete
**Progress:** 75%
**Status:** Planned

## Decisions

| Decision | Date | Context |
|----------|------|---------|
| Active Storage for favicons | 2026-02-20 | has_one_attached with guard, consistent with ItemContent pattern |
| Smarter scrape limit | 2026-02-20 | Count only running jobs, not queued; keeps safety but removes false bottleneck |
| Browser-like default UA | 2026-02-20 | Simple global fix for bot-blocked feeds like Uber |
| Health check triggers status update | 2026-02-20 | Successful manual health check should transition declining -> improving |
| Toast cap + hover expand | 2026-02-20 | Max 3 visible, +N more badge, hover to see all |

## Todos

- [x] Fix deprecation: `rails/tasks/statistics.rake` removed from Rakefile (2026-02-21)

## Blockers
None

app/assets/builds/source_monitor/application.css (2193 lines, first 30 shown)

*, ::before, ::after {
  --tw-border-spacing-x: 0;
  --tw-border-spacing-y: 0;
  --tw-translate-x: 0;
  --tw-translate-y: 0;
  --tw-rotate: 0;
  --tw-skew-x: 0;
  --tw-skew-y: 0;
  --tw-scale-x: 1;
  --tw-scale-y: 1;
  --tw-pan-x:  ;
  --tw-pan-y:  ;
  --tw-pinch-zoom:  ;
  --tw-scroll-snap-strictness: proximity;
  --tw-gradient-from-position:  ;
  --tw-gradient-via-position:  ;
  --tw-gradient-to-position:  ;
  --tw-ordinal:  ;
  --tw-slashed-zero:  ;
  --tw-numeric-figure:  ;
  --tw-numeric-spacing:  ;
  --tw-numeric-fraction:  ;
  --tw-ring-inset:  ;
  --tw-ring-offset-width: 0px;
  --tw-ring-offset-color: #fff;
  --tw-ring-color: rgb(59 130 246 / 0.5);
  --tw-ring-offset-shadow: 0 0 #0000;
  --tw-ring-shadow: 0 0 #0000;
  --tw-shadow: 0 0 #0000;
  --tw-shadow-colored: 0 0 #0000;

Gemfile.lock (426 lines, first 30 shown)

PATH
  remote: .
  specs:
    source_monitor (0.10.2)
      cssbundling-rails (~> 1.4)
      faraday (~> 2.9)
      faraday-follow_redirects (~> 0.4)
      faraday-gzip (~> 3.0)
      faraday-retry (~> 2.2)
      feedjira (>= 3.2, < 5.0)
      jsbundling-rails (~> 1.3)
      nokolexbor (~> 0.5)
      rails (>= 8.0.3, < 10.0)
      ransack (~> 4.2)
      ruby-readability (~> 0.7)
      solid_cable (>= 3.0, < 4.0)
      solid_queue (>= 0.3, < 3.0)
      turbo-rails (~> 2.0)

GEM
  remote: https://rubygems.org/
  specs:
    action_text-trix (2.1.16)
      railties
    actioncable (8.1.2)
      actionpack (= 8.1.2)
      activesupport (= 8.1.2)
      nio4r (~> 2.0)
      websocket-driver (>= 0.6.1)
      zeitwerk (~> 2.6)

test/dummy/Gemfile.lock (409 lines, first 30 shown)

PATH
  remote: ../..
  specs:
    source_monitor (0.10.2)
      cssbundling-rails (~> 1.4)
      faraday (~> 2.9)
      faraday-follow_redirects (~> 0.4)
      faraday-gzip (~> 3.0)
      faraday-retry (~> 2.2)
      feedjira (>= 3.2, < 5.0)
      jsbundling-rails (~> 1.3)
      nokolexbor (~> 0.5)
      rails (>= 8.0.3, < 10.0)
      ransack (~> 4.2)
      ruby-readability (~> 0.7)
      solid_cable (>= 3.0, < 4.0)
      solid_queue (>= 0.3, < 3.0)
      turbo-rails (~> 2.0)

GEM
  remote: https://rubygems.org/
  specs:
    action_text-trix (2.1.16)
      railties
    actioncable (8.1.2)
      actionpack (= 8.1.2)
      activesupport (= 8.1.2)
      nio4r (~> 2.0)
      websocket-driver (>= 0.6.1)
      zeitwerk (~> 2.6)

Active Plan


phase: "04"
plan: "05"
title: "Bulk Scrape Enablement with Confirmation Modal"
wave: 3
depends_on: ["01", "03"]
must_haves:

  • "Checkboxes on source rows for selecting scrape candidates"
  • "Bulk enable scraping action for selected sources"
  • "Confirmation modal showing count and warning before enabling"
  • "Controller action to bulk-update scraping_enabled"
  • "Tests for bulk controller, Stimulus controller, and confirmation flow"

Plan 05: Bulk Scrape Enablement with Confirmation Modal

Overview

Add the ability to select multiple scrape candidate sources and enable scraping for them in bulk. Includes checkboxes on source rows, a confirmation modal with count/warning, and a controller action that updates scraping_enabled on selected sources.


Task 1: Create BulkScrapeEnablementsController

Files:

  • app/controllers/source_monitor/bulk_scrape_enablements_controller.rb

Description: Create a controller for bulk-enabling scraping on selected sources. It follows the CRUD-everything pattern as a standalone resource (not nested under an individual source):

ruby
module SourceMonitor
  class BulkScrapeEnablementsController < ApplicationController
    def create
      # Coerce submitted IDs to integers; blank or non-numeric values become 0 and are dropped.
      source_ids = Array(params.dig(:bulk_scrape_enablement, :source_ids)).map(&:to_i).reject(&:zero?)

      if source_ids.empty?
        handle_empty_selection
        return
      end

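      # Only touch sources that are not already enabled, so updated_count reflects real changes.
      sources = Source.where(id: source_ids, scraping_enabled: false)
      # update_all issues a single SQL UPDATE and skips validations and callbacks,
      # which is why updated_at is set explicitly.
      updated_count = sources.update_all(
        scraping_enabled: true,
        scraper_adapter: default_adapter,
        updated_at: Time.current
      )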

      respond_to do |format|
        format.turbo_stream do
          responder = SourceMonitor::TurboStreams::StreamResponder.new
          responder.toast(
            message: "Scraping enabled for #{updated_count} #{'source'.pluralize(updated_count)}.",
            level: :success
          )
          # Redirect to refresh the sources index
          responder.redirect(source_monitor.sources_path)
          render turbo_stream: responder.render(view_context)
        end
        format.html do
          redirect_to source_monitor.sources_path,
            notice: "Scraping enabled for #{updated_count} #{'source'.pluralize(updated_count)}."
        end
      end
    end

    private

    def default_adapter
      SourceMonitor.config.scrapers.default_adapter_name || "readability"
    end

    def handle_empty_selection
      respond_to do |format|
        format.turbo_stream do
          responder = SourceMonitor::TurboStreams::StreamResponder.new
          responder.toast(message: "No sources selected.", level: :warning)
          render turbo_stream: responder.render(view_context), status: :unprocessable_entity
        end
        format.html do
          redirect_to source_monitor.sources_path, alert: "No sources selected."
        end
      end
    end
  end
end

Tests:

  • test/controllers/source_monitor/bulk_scrape_enablements_controller_test.rb:
    • POST create with valid source_ids: updates sources, returns success toast
    • POST create with empty source_ids: returns warning
    • Only updates sources where scraping_enabled is false
    • Sets scraper_adapter to default adapter
    • Turbo stream format returns redirect action
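
The ID sanitization in the create action is easy to check in isolation. The sketch below reproduces that one expression in plain Ruby; `sanitize_source_ids` is a hypothetical helper name used only for illustration and is not part of the plan.

```ruby
# Hypothetical helper that mirrors the sanitization expression in #create.
# Array() wraps nil and scalars, to_i turns blank or non-numeric input
# into 0, and reject(&:zero?) drops those zeros.
def sanitize_source_ids(raw)
  Array(raw).map(&:to_i).reject(&:zero?)
end

mixed  = sanitize_source_ids(["3", "", "abc", "7", nil])
empty  = sanitize_source_ids(nil)
single = sanitize_source_ids("12")

puts mixed.inspect
puts empty.inspect
puts single.inspect
```

Duplicates and negative numbers pass through this expression unchanged. The `Source.where(id: ...)` lookup will not match IDs that do not exist, so they do not affect the update count.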

Task 2: Add route for bulk scrape enablements

Files:

  • config/routes.rb

Description: Add a top-level resource route (not nested under sources, since it operates across multiple sources):

ruby
resources :bulk_scrape_enablements, only: :create

Tests:

  • Route test: assert POST /bulk_scrape_enablements routes to bulk_scrape_enablements#create

Task 3: Add checkboxes and bulk action bar to sources index

Files:

  • app/views/source_monitor/sources/index.html.erb
  • app/views/source_monitor/sources/_row.html.erb

Description: Wrap the sources table in a data-controller="select-all" scope. Add:

  1. Header checkbox in the table header (master select-all):
erb
<th scope="col" class="w-10 px-3 py-3">
  <input type="checkbox"
         data-select-all-target="master"
         data-action="select-all#toggleAll"
         class="rounded border-slate-300 text-blue-600 focus:ring-blue-500"
         aria-label="Select all sources">
</th>
  2. Row checkboxes in _row.html.erb (only for scrape candidates):
erb
<td class="w-10 px-3 py-4">
  <% if scrape_candidates.include?(source.id) %>
    <input type="checkbox"
           name="bulk_scrape_enablement[source_ids][]"
           value="<%= source.id %>"
           data-select-all-target="item"
           data-action="select-all#toggleItem"
           class="rounded border-slate-300 text-violet-600 focus:ring-violet-500"
           aria-label="Select <%= source.name %>">
  <% end %>
</td>
  3. Bulk action bar below the table (sticky bottom bar, hidden while no checkboxes are checked). Either create a bulk-action-bar Stimulus controller that shows or hides the bar based on checkbox state, or reuse select-all with an actionBar target, as the markup below does. Add a form that submits to bulk_scrape_enablements#create, gated by a confirmation modal:
erb
<div data-select-all-target="actionBar" class="hidden sticky bottom-0 border-t border-slate-200 bg-white px-4 py-3 shadow-md">
  <div class="flex items-center justify-between">
    <span class="text-sm text-slate-700">
      <span data-select-all-target="count">0</span> source(s) selected
    </span>
    <button type="button"
            data-action="modal#open"
            class="inline-flex items-center rounded-md bg-violet-600 px-4 py-2 text-sm font-semibold text-white shadow hover:bg-violet-500">
      Enable Scraping
    </button>
  </div>
</div>

Tests:

  • System/integration test: checkboxes appear for candidate sources, not for non-candidates

Task 4: Create confirmation modal partial

Files:

  • app/views/source_monitor/sources/_bulk_scrape_enable_modal.html.erb

Description: Create a confirmation modal using the existing modal Stimulus controller. The modal shows a warning and submits the bulk enablement form:

erb
<div data-controller="modal" class="relative">
  <div data-modal-target="panel" class="hidden fixed inset-0 z-50 flex items-center justify-center bg-black/50" data-action="click->modal#backdrop">
    <div class="w-full max-w-md rounded-lg bg-white shadow-xl" data-action="click->modal#stop">
      <div class="border-b border-slate-200 px-6 py-4">
        <h3 class="text-lg font-semibold text-slate-900">Enable Scraping</h3>
      </div>
      <div class="px-6 py-4">
        <p class="text-sm text-slate-700">
          This will enable scraping for the selected sources using the default scraper adapter.
          Each source's items will be scraped on their next scheduled run.
        </p>
        <p class="mt-3 text-sm font-medium text-amber-700">
          This action will modify the selected sources' configuration.
        </p>
      </div>
      <div class="flex justify-end gap-3 border-t border-slate-200 px-6 py-4">
        <button type="button"
                data-action="modal#close"
                class="rounded-md border border-slate-200 px-4 py-2 text-sm font-medium text-slate-700 hover:bg-slate-50">
          Cancel
        </button>
        <button type="submit"
                class="rounded-md bg-violet-600 px-4 py-2 text-sm font-semibold text-white shadow hover:bg-violet-500">
          Confirm Enable
        </button>
      </div>
    </div>
  </div>
</div>

Render the modal inside the form that wraps the sources table, so that the "Confirm Enable" submit button posts the checked source IDs.

Tests:

  • System test: modal appears when "Enable Scraping" button clicked, "Confirm Enable" submits form

Task 5: Extend select-all Stimulus controller for action bar visibility

Files:

  • app/assets/javascripts/source_monitor/controllers/select_all_controller.js

Description: Extend the existing select-all controller to support an optional actionBar target and count target. When any checkbox is checked, show the action bar and update the count:

Add to static targets: "actionBar", "count"

Add method updateActionBar():

javascript
updateActionBar() {
  if (!this.hasActionBarTarget) return;
  const checkedCount = this.itemTargets.filter(cb => cb.checked).length;
  if (this.hasCountTarget) {
    this.countTarget.textContent = checkedCount;
  }
  if (checkedCount > 0) {
    this.actionBarTarget.classList.remove("hidden");
  } else {
    this.actionBarTarget.classList.add("hidden");
  }
}

Call this.updateActionBar() at the end of toggleAll(), toggleItem(), syncMaster(), itemTargetConnected(), and itemTargetDisconnected().

Tests:

  • yarn build must succeed (ESLint check)
  • System test: action bar appears/disappears based on checkbox state

Research Findings

Phase 04: Smart Scrape Recommendations — Research

Findings

1. Source Model Structure

File: app/models/source_monitor/source.rb

  • Key attributes: scraping_enabled (boolean), scraper_adapter (string, validates presence), scrape_settings (JSONB), min_scrape_interval (optional)
  • Existing scopes & methods:
    • active scope for filtering
    • due_for_fetch class method for scheduling
    • avg_word_count method (lines 134-139) — currently computes average of scraped word counts only
    • Ransacker columns for avg_feed_words and avg_scraped_words (lines 83-99) — query ItemContent for feed/scraped word counts by source
  • Counter caches: items_count auto-maintained via has_many association

2. Configuration DSL

File: lib/source_monitor/configuration.rb

  • SourceMonitor.configure { |c| c.attr = value } pattern
  • @scraping = ScrapingSettings.new already initialized
  • ScrapingSettings has: max_in_flight_per_source, max_bulk_batch_size, min_scrape_interval
  • Pattern: attr_accessor + DEFAULT_* constant + reset! method
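
The attr_accessor + DEFAULT_* constant + reset! pattern can be sketched in plain Ruby as follows. This is an illustration only: `ScrapingSettingsSketch` is a hypothetical stand-in, the real ScrapingSettings class may be structured differently, and the 200-word default is taken from the recommendation later in this document.

```ruby
# Hypothetical stand-in for ScrapingSettings, showing the pattern only.
class ScrapingSettingsSketch
  # Assumed default, in words (from the Recommendations section).
  DEFAULT_SCRAPE_RECOMMENDATION_THRESHOLD = 200

  attr_accessor :scrape_recommendation_threshold

  def initialize
    reset!
  end

  # Restores every setting to its DEFAULT_* value.
  def reset!
    @scrape_recommendation_threshold = DEFAULT_SCRAPE_RECOMMENDATION_THRESHOLD
  end
end

settings = ScrapingSettingsSketch.new
initial = settings.scrape_recommendation_threshold

settings.scrape_recommendation_threshold = 350
overridden = settings.scrape_recommendation_threshold

settings.reset!
after_reset = settings.scrape_recommendation_threshold
```

Keeping the default in a constant lets tests call reset! between cases without repeating the literal value.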

3. Dashboard Structure

Files: app/controllers/source_monitor/dashboard_controller.rb, lib/source_monitor/dashboard/queries.rb

  • Dashboard renders via queries.stats, queries.recent_activity, queries.quick_actions, queries.job_metrics, queries.upcoming_fetch_schedule
  • StatsQuery returns: { total_sources, active_sources, failed_sources, total_items, fetches_today, health_distribution }
  • Widget structure: bordered card with header + divider + list/content

4. Sources Index Structure

Files: app/controllers/source_monitor/sources_controller.rb + view

  • Uses Ransack for search (searchable_with mixin)
  • Index action: builds @q (Ransack query), computes @avg_feed_word_counts and @avg_scraped_word_counts as hashes (source_id -> avg)
  • Row partial renders with item_activity_rates, word count maps
  • Row has dropdown menu with View/Edit/Delete actions

5. Scraping Pipeline

Files: app/jobs/source_monitor/scrape_item_job.rb, lib/source_monitor/scraping/bulk_source_scraper.rb, lib/source_monitor/scraping/item_scraper.rb

  • ScrapeItemJob: Checks source.scraping_enabled?, respects min_scrape_interval, calls ItemScraper
  • BulkSourceScraper: Supports the selections :current, :unscraped, and :all. Returns a Result struct with: status, selection, attempted_count, enqueued_count, already_enqueued_count, failure_count, failure_details, messages, rate_limited
  • ItemScraper: Resolves adapter via AdapterResolver, calls adapter, persists result

6. Item Model & Word Count Computation

Files: app/models/source_monitor/item.rb, app/models/source_monitor/item_content.rb

  • Item has has_one :item_content (autosave: true)
  • ItemContent stores feed_word_count and scraped_word_count (computed)
  • Computation: feed_word_count from item.content, scraped_word_count from item_content.scraped_content
  • Both computed in before_save hook
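
For intuition, a word count of this kind could be computed as in the sketch below. The counting rule is an assumption: the research notes do not say how ItemContent tokenizes text, so this version strips HTML tags with a regular expression and splits on whitespace. `word_count` is a hypothetical helper name.

```ruby
# Hypothetical word counter; the real before_save hook may count differently.
def word_count(text)
  return 0 if text.nil?

  # Replace tags with spaces so adjacent words do not merge, then split
  # on runs of whitespace.
  text.gsub(/<[^>]*>/, " ").split.size
end

feed_count  = word_count("<p>Hello <b>brave</b> new world</p>")
blank_count = word_count("   ")
nil_count   = word_count(nil)
```

A regex is only a rough way to strip markup. An HTML parser would be more robust for real feed content.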

7. Analytics Patterns

File: lib/source_monitor/analytics/sources_index_metrics.rb

  • Takes base_scope, result_scope, search_params
  • Computes distribution via SourceFetchIntervalDistribution
  • Results cached via private attributes
  • Used in controller index action to populate display data

8. Existing Bulk Action Patterns

File: app/controllers/source_monitor/source_bulk_scrapes_controller.rb

  • Single-source bulk scrape: POST /sources/:source_id/bulk_scrape
  • Params: { bulk_scrape: { selection: :current/:unscraped/:all } }
  • Responds with Turbo Stream, uses SourceTurboResponses mixin

9. Routes

File: config/routes.rb

  • Sources resources with nested resource :bulk_scrape, only: :create
  • Dashboard: get "/dashboard" -> dashboard#index

10. Test Patterns

  • Uses create_source! factory, mocks services, tests Turbo Stream responses
  • Asserts turbo-stream tags with dom_id(source, :row)

Relevant Patterns

  1. Configuration: Add scrape_recommendation_threshold to ScrapingSettings with DEFAULT constant + reset
  2. Dashboard widget: Follow existing card pattern (header + divider + content)
  3. Word count filtering: Leverage existing Ransack ransackers (avg_feed_words)
  4. Bulk enablement: Follow bulk scrape pattern with Turbo Stream responder
  5. Presenter pattern: Source row uses locals for computed data
  6. Analytics class: Follow SourcesIndexMetrics pattern for recommendation computation

Risks

  1. Word count accuracy: the avg_feed_words ransacker joins ItemContent, so the average is only accurate if every item has a content record
  2. Bulk action feedback: BulkSourceScraper can rate-limit — UI must handle partial enqueuing
  3. Modal confirmation UX: Needs Stimulus controller for toggle
  4. Test-first scrape page: New route/view — must follow CRUD-everything pattern
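
One way to handle the partial-enqueue risk is to derive the toast level and message from the counts on the result. The sketch below is a rough illustration: `ResultSketch` is a hypothetical struct that borrows a few field names from the Result struct described in the findings, and `bulk_scrape_feedback` is an invented helper, not an existing API.

```ruby
# Hypothetical struct using a subset of the Result fields listed above.
ResultSketch = Struct.new(
  :attempted_count, :enqueued_count, :already_enqueued_count,
  :failure_count, :rate_limited,
  keyword_init: true
)

# Builds a toast payload; reports a warning whenever fewer items were
# enqueued than attempted.
def bulk_scrape_feedback(result)
  parts = ["Enqueued #{result.enqueued_count} of #{result.attempted_count} items"]
  parts << "#{result.already_enqueued_count} already queued" if result.already_enqueued_count.positive?
  parts << "#{result.failure_count} failed" if result.failure_count.positive?
  parts << "rate limit reached" if result.rate_limited

  partial = result.enqueued_count < result.attempted_count
  { level: partial ? :warning : :success, message: "#{parts.join(', ')}." }
end

partial_feedback = bulk_scrape_feedback(
  ResultSketch.new(attempted_count: 5, enqueued_count: 3,
                   already_enqueued_count: 1, failure_count: 1, rate_limited: true)
)
full_feedback = bulk_scrape_feedback(
  ResultSketch.new(attempted_count: 2, enqueued_count: 2,
                   already_enqueued_count: 0, failure_count: 0, rate_limited: false)
)
```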

Recommendations

  1. Add scrape_recommendation_threshold to ScrapingSettings with a default of 200 words
  2. Add Source.scrape_candidates scope using avg_feed_words threshold
  3. Dashboard widget: new card between stats and recent activity
  4. Index badge: warning indicator on rows where avg_feed_words < threshold
  5. Bulk selection: Stimulus checkboxes + new controller action
  6. Test-first: New route POST /sources/:id/test_scrape -> comparison page
  7. Create SourceMonitor::Analytics::ScrapeRecommendations for candidate computation
  8. Confirmation modal via Stimulus controller
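
The candidate computation behind recommendations 2 and 7 can be sketched in plain Ruby, without the database. Everything here is hypothetical: the module name, the hash-based source records, and the choice to skip sources with no word-count data are assumptions for illustration. The real implementation would presumably query through the avg_feed_words ransacker or an equivalent SQL aggregate.

```ruby
# Hypothetical in-memory version of the scrape-candidate selection.
module ScrapeRecommendationsSketch
  DEFAULT_THRESHOLD = 200 # words, matching the recommended default

  # sources: array of hashes with :id, :scraping_enabled, :avg_feed_words.
  # A source is a candidate when scraping is off and its average feed word
  # count is known and below the threshold.
  def self.candidate_ids(sources, threshold: DEFAULT_THRESHOLD)
    sources.select do |source|
      avg = source[:avg_feed_words]
      !source[:scraping_enabled] && !avg.nil? && avg < threshold
    end.map { |source| source[:id] }
  end
end

sources = [
  { id: 1, scraping_enabled: false, avg_feed_words: 45.0 },  # short feed: candidate
  { id: 2, scraping_enabled: false, avg_feed_words: 820.5 }, # full-text feed
  { id: 3, scraping_enabled: true,  avg_feed_words: 30.0 },  # already scraping
  { id: 4, scraping_enabled: false, avg_feed_words: nil }    # no content records yet
]

default_candidates = ScrapeRecommendationsSketch.candidate_ids(sources)
wide_candidates    = ScrapeRecommendationsSketch.candidate_ids(sources, threshold: 1000)
```

Skipping sources with a nil average avoids recommending scraping for sources that have not been fetched yet. Whether that is the desired behavior is a product decision the plan leaves open.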