Test-First Scrape Comparison Page

.vbw-planning/milestones/ui-fixes-and-smart-scraping/phases/04-smart-scrape-recommendations/04-PLAN-04.md

Plan 04: Test-First Scrape Comparison Page

Overview

Build a "test scrape" feature that picks one recent item from a source, scrapes it on-demand, and displays a comparison page showing feed word count vs scraped word count. This lets users validate whether enabling scraping would improve content quality before committing.
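
The comparison rests on two word counts. The plan does not show how feed_word_count and scraped_word_count are computed, so the following is only a rough sketch of the idea, using a hypothetical word_count helper that strips tags crudely and splits on whitespace:

```ruby
# Hypothetical helper, not part of the plan: the real word-count logic
# lives elsewhere in the app and may differ (e.g. proper HTML parsing).
def word_count(html)
  text = html.to_s.gsub(/<[^>]*>/, " ") # crude tag stripping, good enough for a sketch
  text.split.size                       # split on any run of whitespace
end

feed    = "<p>Short teaser for the article.</p>"
scraped = "<article><p>Short teaser for the article.</p>" \
          "<p>Plus the full body text that the feed left out.</p></article>"

puts word_count(feed)    # prints 5
puts word_count(scraped) # prints 15
```

A feed that carries only a teaser yields a much smaller count than the scraped page, which is the signal the comparison page is meant to surface.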


Task 1: Create SourceScrapeTestsController

Files:

  • app/controllers/source_monitor/source_scrape_tests_controller.rb

Description: Create a new controller following the CRUD-everything pattern. It backs a singular resource nested under sources (one test at a time):

ruby
module SourceMonitor
  class SourceScrapeTestsController < ApplicationController
    before_action :set_source

    def create
      item = pick_test_item
      unless item
        handle_no_item
        return
      end

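      # Scrape synchronously so the comparison can be rendered in this response.
      result = SourceMonitor::Scraping::ItemScraper.new(item: item, source: @source).call

      @test_result = {
        # Reload first so the reads below see any item_content changes made by the scraper.
        item: item.reload,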
        scrape_result: result,
        feed_word_count: item.item_content&.feed_word_count,
        scraped_word_count: item.item_content&.scraped_word_count,
        feed_content_preview: item.content.to_s.truncate(500),
        scraped_content_preview: item.item_content&.scraped_content.to_s.truncate(500),
        improvement: compute_improvement(item)
      }

      respond_to do |format|
        format.turbo_stream do
          render turbo_stream: turbo_stream.replace(
            "scrape_test_result_#{@source.id}",
            partial: "source_monitor/source_scrape_tests/result",
            locals: { source: @source, test_result: @test_result }
          )
        end
        format.html { render :show } # expects a show.html.erb template alongside the _result partial
      end
    end

    private

    def set_source
      @source = Source.find(params[:source_id])
    end

    def pick_test_item
      @source.items
             .joins(:item_content)
             .where.not(sourcemon_item_contents: { feed_word_count: nil })
             .order(published_at: :desc)
             .first
    end

    def handle_no_item
      respond_to do |format|
        format.turbo_stream do
          responder = SourceMonitor::TurboStreams::StreamResponder.new
          responder.toast(message: "No items with feed content available for test scrape.", level: :warning)
          render turbo_stream: responder.render(view_context)
        end
        format.html do
          redirect_to source_monitor.source_path(@source), alert: "No items available for test scrape."
        end
      end
    end

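    # Percentage change in word count from feed to scraped content, rounded to one decimal.
    # Returns 0 when there is no feed baseline; a nil scraped count is treated as 0 words.
    def compute_improvement(item)
      feed = item.item_content&.feed_word_count.to_i
      scraped = item.item_content&.scraped_word_count.to_i
      return 0 if feed.zero?

      ((scraped - feed).to_f / feed * 100).round(1)
    end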
  end
end
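
To make the percentage formula in compute_improvement concrete, here is a standalone sketch with the same arithmetic. The improvement_percent name and its plain-integer arguments are illustrative assumptions; the controller reads the counts from item.item_content instead.

```ruby
# Standalone copy of the formula, for illustration only.
# nil stands for "count unavailable" and is coerced to 0 by to_i.
def improvement_percent(feed_words, scraped_words)
  feed = feed_words.to_i
  return 0 if feed.zero? # no baseline, so no meaningful percentage

  ((scraped_words.to_i - feed).to_f / feed * 100).round(1)
end

improvement_percent(120, 840) # => 600.0
improvement_percent(300, 250) # => -16.7
improvement_percent(0, 500)   # => 0
improvement_percent(200, nil) # => -100.0
```

The last case is worth noting: when a scrape fails and leaves no scraped word count, the formula reports a 100% drop rather than "no data", so the result view may want to rely on the scrape status message in that situation.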

Tests:

  • test/controllers/source_monitor/source_scrape_tests_controller_test.rb:
    • POST create with source that has items: returns success, assigns test_result
    • POST create with source that has no items: redirects with alert
    • Turbo stream format returns turbo_stream response

Task 2: Add route for scrape tests

Files:

  • config/routes.rb

Description: Add singular nested resource under sources (CRUD pattern):

ruby
resources :sources do
  # ... existing nested resources ...
  resource :scrape_test, only: :create, controller: "source_scrape_tests"
end

Tests:

  • Route test: assert POST /sources/:source_id/scrape_test routes to source_scrape_tests#create

Task 3: Create scrape test result partial

Files:

  • app/views/source_monitor/source_scrape_tests/_result.html.erb

Description: Create a partial that displays the comparison between feed and scraped content:

erb
<div id="scrape_test_result_<%= source.id %>" class="rounded-lg border border-slate-200 bg-white shadow-sm">
  <div class="border-b border-slate-200 px-5 py-4">
    <h3 class="text-lg font-medium">Scrape Test Result</h3>
    <p class="mt-1 text-xs text-slate-500">
      Tested item: "<%= test_result[:item].title.to_s.truncate(60) %>"
    </p>
  </div>
  <div class="px-5 py-4">
    <div class="grid grid-cols-2 gap-6">
      <div>
        <dt class="text-xs font-medium uppercase tracking-wide text-slate-500">Feed Word Count</dt>
        <dd class="mt-1 text-2xl font-semibold text-slate-900"><%= test_result[:feed_word_count] || "N/A" %></dd>
      </div>
      <div>
        <dt class="text-xs font-medium uppercase tracking-wide text-slate-500">Scraped Word Count</dt>
        <dd class="mt-1 text-2xl font-semibold text-slate-900"><%= test_result[:scraped_word_count] || "N/A" %></dd>
      </div>
    </div>

    <% if test_result[:improvement] && test_result[:improvement] != 0 %>
      <div class="mt-4">
        <% color = test_result[:improvement] > 0 ? "text-green-600" : "text-amber-600" %>
        <span class="text-sm font-medium <%= color %>">
          <%= test_result[:improvement] > 0 ? "+" : "" %><%= test_result[:improvement] %>% word count change
        </span>
      </div>
    <% end %>

    <% if test_result[:scrape_result]&.success? %>
      <div class="mt-4 rounded-md bg-green-50 px-3 py-2 text-sm text-green-700">
        Scrape successful. Compare the word counts above to decide whether enabling scraping for this source would capture more content.
      </div>
    <% else %>
      <div class="mt-4 rounded-md bg-amber-50 px-3 py-2 text-sm text-amber-700">
        Scrape had issues: <%= test_result[:scrape_result]&.message || "Unknown error" %>
      </div>
    <% end %>
  </div>
</div>
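
The sign and color logic in the partial could also be expressed as a small Ruby method, which is easier to unit test than inline ERB. This is a sketch only; improvement_badge and its return shape are hypothetical and not part of the plan.

```ruby
# Hypothetical presenter for the "word count change" line.
# Returns nil when there is nothing to show (nil or zero change),
# matching the guard around that block in the partial.
def improvement_badge(improvement)
  return nil if improvement.nil? || improvement.zero?

  sign  = improvement.positive? ? "+" : "" # negative numbers already carry "-"
  color = improvement.positive? ? "text-green-600" : "text-amber-600"
  { text: "#{sign}#{improvement}% word count change", css_class: color }
end

improvement_badge(600.0) # => { text: "+600.0% word count change", css_class: "text-green-600" }
improvement_badge(-16.7) # => { text: "-16.7% word count change", css_class: "text-amber-600" }
improvement_badge(0)     # => nil
```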

Tests:

  • Covered by controller integration tests

Task 4: Add test scrape button to source show page

Files:

  • app/views/source_monitor/sources/_details.html.erb OR app/views/source_monitor/sources/show.html.erb

Description: Add a "Test Scrape" button on the source show page, visible only when the source has scraping disabled. The button triggers POST /sources/:source_id/scrape_test via Turbo:

erb
<% unless source.scraping_enabled? %>
  <div id="scrape_test_result_<%= source.id %>" class="mt-4">
    <%= button_to "Test Scrape",
          source_monitor.source_scrape_test_path(source),
          method: :post,
          class: "inline-flex items-center rounded-md border border-violet-200 bg-violet-50 px-3 py-1.5 text-sm font-medium text-violet-700 hover:bg-violet-100",
          data: { turbo_stream: true },
          title: "Scrape a recent item to compare feed vs scraped word count" %>
  </div>
<% end %>

The Turbo Stream response from the controller replaces this div with the result partial, whose root element carries the same id.

Tests:

  • System test: assert "Test Scrape" button visible on source show page when scraping is disabled