.vbw-planning/milestones/ui-fixes-and-smart-scraping/phases/04-smart-scrape-recommendations/04-PLAN-01.md
Add the backend foundation for smart scrape recommendations: a configurable word count threshold, a model scope to find candidate sources, and a query object that computes recommendation data for the dashboard and sources index.
Files:
lib/source_monitor/configuration/scraping_settings.rb
Description:
Add scrape_recommendation_threshold attribute to ScrapingSettings following the existing pattern:
DEFAULT_SCRAPE_RECOMMENDATION_THRESHOLD = 200 constant
attr_accessor :scrape_recommendation_threshold
reset! method
normalize_numeric (since it's an integer word count)
Tests:
test/lib/source_monitor/configuration_test.rb -- Add tests for:
config.scraping.scrape_recommendation_threshold = 150
Files:
app/models/source_monitor/source.rb
Description:
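Add a class method scrape_candidates that returns active sources where scraping_enabled is false and the average feed_word_count across the source's items (ignoring NULL values) is below the threshold.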
Use the existing avg_feed_words ransacker SQL pattern as reference for the subquery. Accept an optional threshold parameter that defaults to SourceMonitor.config.scraping.scrape_recommendation_threshold.
def self.scrape_candidates(threshold: SourceMonitor.config.scraping.scrape_recommendation_threshold)
  threshold_value = threshold.to_i
  return none if threshold_value <= 0

  active
    .where(scraping_enabled: false)
    .where(
      "#{table_name}.id IN (
        SELECT i.source_id
        FROM #{Item.table_name} i
        INNER JOIN #{ItemContent.table_name} ic ON ic.item_id = i.id
        WHERE ic.feed_word_count IS NOT NULL
        GROUP BY i.source_id
        HAVING AVG(ic.feed_word_count) < ?
      )", threshold_value
    )
end
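The selection rules in the query above can be illustrated in plain Ruby. This is only a sketch of the intended semantics, not the implementation: `SourceRow`, `ItemRow`, and `scrape_candidate_ids` are hypothetical in-memory stand-ins, and the `active` flag is an assumed simplification of whatever the `active` scope checks. It may be useful as a reference when writing the model tests.

```ruby
# Hypothetical in-memory stand-ins; the real method runs this logic in SQL.
SourceRow = Struct.new(:id, :active, :scraping_enabled, keyword_init: true)
ItemRow = Struct.new(:source_id, :feed_word_count, keyword_init: true)

def scrape_candidate_ids(sources, items, threshold:)
  threshold_value = threshold.to_i
  return [] if threshold_value <= 0 # mirrors `return none`

  # Mirrors the subquery: drop NULL counts, group by source, then average.
  averages = items
    .reject { |item| item.feed_word_count.nil? }
    .group_by(&:source_id)
    .transform_values { |rows| rows.sum(&:feed_word_count).to_f / rows.size }

  sources
    .select { |source| source.active && !source.scraping_enabled }
    .select { |source| averages.key?(source.id) && averages[source.id] < threshold_value }
    .map(&:id)
end

SOURCES = [
  SourceRow.new(id: 1, active: true, scraping_enabled: false),  # short feed content
  SourceRow.new(id: 2, active: true, scraping_enabled: false),  # long feed content
  SourceRow.new(id: 3, active: true, scraping_enabled: true),   # already scraping
  SourceRow.new(id: 4, active: true, scraping_enabled: false),  # only NULL word counts
  SourceRow.new(id: 5, active: false, scraping_enabled: false)  # inactive
]

ITEMS = [
  ItemRow.new(source_id: 1, feed_word_count: 80),
  ItemRow.new(source_id: 1, feed_word_count: 120),
  ItemRow.new(source_id: 1, feed_word_count: nil),
  ItemRow.new(source_id: 2, feed_word_count: 500),
  ItemRow.new(source_id: 3, feed_word_count: 50),
  ItemRow.new(source_id: 4, feed_word_count: nil),
  ItemRow.new(source_id: 5, feed_word_count: 10)
]
```

With these rows, `scrape_candidate_ids(SOURCES, ITEMS, threshold: 200)` selects only source 1. Source 4 is skipped because a source with no non-NULL word counts never appears in the grouped subquery, and the comparison is strict, so an average exactly equal to the threshold does not qualify.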
Tests:
test/models/source_monitor/source_test.rb -- Add tests for:
Files:
lib/source_monitor/analytics/scrape_recommendations.rb
Description:
Create a query object following the SourcesIndexMetrics pattern that computes scrape recommendation data:
module SourceMonitor
  module Analytics
    class ScrapeRecommendations
      def initialize(threshold: SourceMonitor.config.scraping.scrape_recommendation_threshold)
        @threshold = threshold.to_i
      end

      def candidates_count
        @candidates_count ||= Source.scrape_candidates(threshold: @threshold).count
      end

      def candidate_ids
        @candidate_ids ||= Source.scrape_candidates(threshold: @threshold).pluck(:id)
      end

      def candidate?(source_id)
        candidate_ids.include?(source_id)
      end

      private

      attr_reader :threshold
    end
  end
end
Add autoload :ScrapeRecommendations to the Analytics module or use require_relative.
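To show how the memoized readers of the query object behave when a view checks many rows, here is a self-contained sketch. `FakeSource`, its `Relation` struct, `ScrapeRecommendationsSketch`, and the injected `source_class:` parameter are all hypothetical and exist only so the example runs without Rails; the planned class calls `Source.scrape_candidates` directly.

```ruby
# Hypothetical stand-in for Source.scrape_candidates. It counts "queries"
# so the effect of memoization is visible.
class FakeSource
  class << self
    attr_accessor :query_count
  end
  self.query_count = 0

  Relation = Struct.new(:ids) do
    def count
      FakeSource.query_count += 1
      ids.size
    end

    def pluck(_column)
      FakeSource.query_count += 1
      ids.dup
    end
  end

  def self.scrape_candidates(threshold:)
    Relation.new(threshold > 0 ? [3, 7, 9] : [])
  end
end

# Same shape as the planned query object, with the source class injected
# (an assumption made for this sketch only).
class ScrapeRecommendationsSketch
  def initialize(threshold:, source_class: FakeSource)
    @threshold = threshold.to_i
    @source_class = source_class
  end

  def candidates_count
    @candidates_count ||= @source_class.scrape_candidates(threshold: @threshold).count
  end

  def candidate_ids
    @candidate_ids ||= @source_class.scrape_candidates(threshold: @threshold).pluck(:id)
  end

  def candidate?(source_id)
    candidate_ids.include?(source_id)
  end
end

RECS = ScrapeRecommendationsSketch.new(threshold: 200)

# A sources index could flag each row; the ID list is fetched only once.
BADGES = [3, 4, 7].to_h { |id| [id, RECS.candidate?(id)] }
```

Checking three rows triggers a single `pluck`, and repeated calls to `candidates_count` trigger a single `count`. Because `candidate_ids` is an Array, `candidate?` is a linear scan, which is likely fine for a modest number of sources.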
Tests:
test/lib/source_monitor/analytics/scrape_recommendations_test.rb -- Add tests for:
candidates_count returns correct count
candidate_ids returns correct IDs
candidate? returns true/false correctly
Files:
lib/source_monitor.rb (add autoload declaration only)
Description:
Add autoload entry for SourceMonitor::Analytics::ScrapeRecommendations in the Analytics module section, following the existing autoload pattern used throughout lib/source_monitor.rb.
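A minimal, runnable sketch of how such an autoload entry behaves is shown below. It assumes the path string "source_monitor/analytics/scrape_recommendations" is resolved via the load path, and it builds a throwaway directory (the hypothetical `LIB_DIR`) with a stub class so it can run outside the gem; the exact form of the existing entries in lib/source_monitor.rb may differ.

```ruby
require "tmpdir"
require "fileutils"

# Hypothetical stand-in for the gem's lib directory, created on the fly
# so the sketch runs without the real source tree.
LIB_DIR = Dir.mktmpdir("source_monitor_autoload_demo")
ANALYTICS_DIR = File.join(LIB_DIR, "source_monitor", "analytics")
FileUtils.mkdir_p(ANALYTICS_DIR)

# Stub version of the file the plan creates.
File.write(File.join(ANALYTICS_DIR, "scrape_recommendations.rb"), <<~STUB)
  module SourceMonitor
    module Analytics
      class ScrapeRecommendations
      end
    end
  end
STUB

$LOAD_PATH.unshift(LIB_DIR)

module SourceMonitor
  module Analytics
    # Registers the constant; the file is required on first reference.
    autoload :ScrapeRecommendations, "source_monitor/analytics/scrape_recommendations"
  end
end

# Before the first reference, autoload? reports the registered path.
REGISTERED_PATH = SourceMonitor::Analytics.autoload?(:ScrapeRecommendations)

# The first reference loads the file and resolves the constant.
LOADED_CLASS = SourceMonitor::Analytics::ScrapeRecommendations
```

Once the constant has been referenced, `autoload?` returns nil for it, which is one way to confirm the file was actually loaded rather than merely registered.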
Tests:
assert_kind_of Class, SourceMonitor::Analytics::ScrapeRecommendations