.vbw-planning/milestones/generator-enhancements/phases/05-active-storage-images/PLAN-02.md
@app/models/source_monitor/item_content.rb -- (from Plan 01) Has has_many_attached :images. The job attaches downloaded images here via item_content.images.attach(blob).
@lib/source_monitor/configuration/images_settings.rb -- (from Plan 01) Provides download_enabled?, max_download_size, download_timeout, allowed_content_types.
@lib/source_monitor/fetching/feed_fetcher/entry_processor.rb -- The integration point. After ItemCreator.call returns a created item, if image downloading is enabled and the item has HTML content in item.content, enqueue DownloadContentImagesJob.perform_later(item.id). This applies only to newly created items, not updates.
@lib/source_monitor/http.rb -- Faraday client factory. The Downloader creates its own Faraday connection: no retry (images are best-effort), short timeout from config, Accept header for images.
@app/jobs/source_monitor/application_job.rb -- Base job class. DownloadContentImagesJob inherits from this. Uses source_monitor_queue :fetch (reusing the fetch queue because image downloads, like fetches, are I/O-bound).
@app/jobs/source_monitor/fetch_feed_job.rb -- Pattern to follow for job structure: discard_on for deserialization errors, simple perform that delegates to service objects.
@test/test_helper.rb -- WebMock disables external HTTP. Image download tests need WebMock stubs. Use stub_request(:get, url).to_return(body: png_bytes, headers: { "Content-Type" => "image/png" }).
@.claude/skills/sm-configure/SKILL.md -- Needs a new section for config.images with examples.
@app/models/source_monitor/item.rb -- The item model. item.content is a text column on sourcemon_items storing the feed entry content (HTML). This is where inline images live. The job reads item.content, rewrites it, and saves it back. item_content (separate table) stores scraped_html/scraped_content from scraping -- that happens later and is separate from feed content.
Key design decisions:
- The job takes item_id (not item_content_id). The feed content with inline images is in item.content. The job reads item.content, downloads images, attaches blobs to item_content.images (building item_content if needed), and writes the rewritten HTML back to item.content.
- The Downloader returns {io:, filename:, content_type:} (a Result struct that also carries byte_size) or nil on failure.
- Idempotency: if item_content.images.attached?, the job skips re-downloading.

A service object, SourceMonitor::Images::Downloader, that downloads a single image from a URL, validates it, and returns the result.
```ruby
# frozen_string_literal: true

require "faraday"
require "rack/mime"
require "securerandom"
require "stringio"
require "uri"

module SourceMonitor
  module Images
    class Downloader
      Result = Struct.new(:io, :filename, :content_type, :byte_size, keyword_init: true)

      attr_reader :url, :settings

      def initialize(url, settings: nil)
        @url = url
        @settings = settings || SourceMonitor.config.images
      end

      # Downloads the image and returns a Result, or nil if the download fails
      # or the image does not meet validation criteria.
      def call
        response = fetch_image
        return unless response

        # Normalize e.g. "image/PNG; charset=binary" to "image/png".
        content_type = response.headers["content-type"]&.split(";")&.first&.strip&.downcase
        return unless allowed_content_type?(content_type)

        body = response.body
        return unless body && body.bytesize > 0
        # Note: the size cap is checked after the body is already in memory.
        return if body.bytesize > settings.max_download_size

        Result.new(
          io: StringIO.new(body),
          filename: derive_filename(url, content_type),
          content_type: content_type,
          byte_size: body.bytesize
        )
      rescue Faraday::Error, URI::InvalidURIError, Timeout::Error
        # Images are best-effort: network and URL errors yield nil.
        nil
      end

      private

      def fetch_image
        connection = Faraday.new do |f|
          f.options.timeout = settings.download_timeout
          # Half the read timeout, capped at 5s and never below 1s.
          f.options.open_timeout = (settings.download_timeout / 2).clamp(1, 5)
          f.headers["User-Agent"] = SourceMonitor.config.http.user_agent || "SourceMonitor/#{SourceMonitor::VERSION}"
          f.headers["Accept"] = "image/*"
          f.adapter Faraday.default_adapter
        end

        response = connection.get(url)
        # No redirect following or retries: anything but 200 is a miss.
        response.status == 200 ? response : nil
      end

      def allowed_content_type?(content_type)
        return false if content_type.blank?

        settings.allowed_content_types.include?(content_type)
      end

      # Prefer the URL's basename when it has an extension; otherwise build
      # a random name with an extension derived from the MIME type.
      def derive_filename(image_url, content_type)
        uri = URI.parse(image_url)
        basename = File.basename(uri.path) if uri.path.present?
        return basename if basename.present? && basename.include?(".")

        fallback_filename(content_type)
      rescue URI::InvalidURIError
        fallback_filename(content_type)
      end

      def fallback_filename(content_type)
        ext = Rack::Mime::MIME_TYPES.invert[content_type] || ".bin"
        "image-#{SecureRandom.hex(8)}#{ext}"
      end
    end
  end
end
```
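The Downloader's validation and naming rules can be exercised without HTTP or Rails, which helps when deciding which WebMock cases the test file needs. The sketch below is a plain-Ruby restatement, not the Downloader itself: `normalize_content_type`, `acceptable?`, `filename_for`, and `random_filename` are hypothetical helper names, and the three-entry `EXTENSIONS` hash is an assumed stand-in for `Rack::Mime::MIME_TYPES.invert` so the snippet runs without Rack.

```ruby
require "securerandom"
require "uri"

# Hypothetical plain-Ruby restatement of the Downloader's checks (no HTTP, no Rails).
ALLOWED_TYPES = %w[image/jpeg image/png image/gif image/webp image/svg+xml].freeze
MAX_SIZE = 10 * 1024 * 1024 # 10 MB, the documented default

# Assumption: a three-entry stand-in for Rack::Mime::MIME_TYPES.invert.
EXTENSIONS = { "image/jpeg" => ".jpg", "image/png" => ".png", "image/webp" => ".webp" }.freeze

# "image/PNG; charset=binary" -> "image/png"; a missing header stays nil.
def normalize_content_type(header)
  header&.split(";")&.first&.strip&.downcase
end

# Same order of checks as Downloader#call: type allowlist, non-empty body, size cap.
def acceptable?(header, body, allowed: ALLOWED_TYPES, max_size: MAX_SIZE)
  type = normalize_content_type(header)
  return false if type.nil? || type.empty? || !allowed.include?(type)
  return false if body.nil? || body.bytesize.zero?

  body.bytesize <= max_size
end

# Keep the URL's basename when it has an extension; otherwise generate a name.
def filename_for(url, content_type)
  basename = File.basename(URI.parse(url).path.to_s)
  return basename if basename.include?(".")

  random_filename(content_type)
rescue URI::InvalidURIError
  random_filename(content_type)
end

def random_filename(content_type)
  "image-#{SecureRandom.hex(8)}#{EXTENSIONS.fetch(content_type, '.bin')}"
end

puts normalize_content_type("image/PNG; charset=binary") # => image/png
puts acceptable?("text/html", "<html></html>")           # => false
puts filename_for("https://cdn.example.com/photos/cat.png", "image/png") # => cat.png
```

Because the size check runs on a body that is already in memory, an oversized image is still fully transferred before it is rejected; a Content-Length pre-check would avoid that, at the cost of trusting a header the server may omit.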
Update lib/source_monitor.rb:
Add autoload :Downloader, "source_monitor/images/downloader" inside the module Images block (added in Plan 01).
Create test/lib/source_monitor/images/downloader_test.rb:
Use WebMock stubs for all HTTP interactions. Tests: … (including a timeout case stubbed with `to_timeout`).

The job takes item_id, reads item.content for inline images, downloads them, attaches them to item_content.images, and rewrites item.content with Active Storage URLs.
```ruby
# frozen_string_literal: true

module SourceMonitor
  class DownloadContentImagesJob < ApplicationJob
    source_monitor_queue :fetch

    discard_on ActiveJob::DeserializationError

    def perform(item_id)
      item = SourceMonitor::Item.find_by(id: item_id)
      return unless item
      return unless SourceMonitor.config.images.download_enabled?

      html = item.content
      return if html.blank?

      # Build or find item_content for attachment storage
      item_content = item.item_content || item.build_item_content

      # Skip if images already attached (idempotency)
      return if item_content.persisted? && item_content.images.attached?

      base_url = item.url
      rewriter = SourceMonitor::Images::ContentRewriter.new(html, base_url: base_url)
      image_urls = rewriter.image_urls
      return if image_urls.empty?

      # Save item_content first so we can attach blobs to it
      item_content.save! unless item_content.persisted?

      # Download images and build URL mapping
      url_mapping = download_images(item_content, image_urls)
      return if url_mapping.empty?

      # Rewrite HTML with Active Storage URLs
      rewritten_html = rewriter.rewrite do |original_url|
        url_mapping[original_url]
      end

      # Update the item content with rewritten HTML
      item.update!(content: rewritten_html)
    end

    private

    def download_images(item_content, image_urls)
      url_mapping = {}
      settings = SourceMonitor.config.images

      image_urls.each do |image_url|
        result = SourceMonitor::Images::Downloader.new(image_url, settings: settings).call
        next unless result

        blob = ActiveStorage::Blob.create_and_upload!(
          io: result.io,
          filename: result.filename,
          content_type: result.content_type
        )
        item_content.images.attach(blob)

        # Generate a serving URL for the blob
        url_mapping[image_url] = Rails.application.routes.url_helpers.rails_blob_path(blob, only_path: true)
      rescue StandardError => _error
        # Individual image failure should not block others.
        # Original URL will be preserved (graceful fallback).
        next
      end

      url_mapping
    end
  end
end
```
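The job's graceful-fallback contract is that the rewrite block returns nil for any image whose download failed, so that image keeps its original URL. Plan 01's ContentRewriter is not shown here, so the sketch below uses a hypothetical regex-based `rewrite_img_srcs` helper that only handles double-quoted `src` attributes on `<img>` tags; it illustrates the mapping behavior only, and a real rewriter should use an HTML parser.

```ruby
# Hypothetical stand-in for ContentRewriter#rewrite: matches only <img> tags
# whose src is double-quoted. Illustration only; real HTML needs a parser.
IMG_SRC = /(<img\b[^>]*?\ssrc=")([^"]+)(")/i

def rewrite_img_srcs(html)
  html.gsub(IMG_SRC) do
    prefix, original_url, suffix = Regexp.last_match.captures
    replacement = yield(original_url)
    # A nil replacement means the download failed: keep the original URL.
    "#{prefix}#{replacement || original_url}#{suffix}"
  end
end

html = '<p><img src="https://example.com/a.png"><img alt="b" src="https://example.com/b.png"></p>'
# Illustrative mapping: only a.png downloaded; the blob path is made up.
url_mapping = { "https://example.com/a.png" => "/rails/active_storage/blobs/redirect/abc123/a.png" }

puts rewrite_img_srcs(html) { |url| url_mapping[url] }
```

Since the job returns early when `url_mapping` is empty, item.content is only written back when at least one image was stored.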
Create test/jobs/source_monitor/download_content_images_job_test.rb:
Tests using WebMock stubs and Active Storage test helpers:
For each test:
- Set `SourceMonitor.configure { |c| c.images.download_to_active_storage = true }` where needed.
- Create an item with inline images, e.g. `content: '<p><img src="…"></p>'`.
- Run `DownloadContentImagesJob.perform_now(item.id)`.
- Check `item.reload.content` for rewritten URLs and `item.item_content.images.count`.
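The idempotency test runs the job twice and expects only one round of downloads. A plain-Ruby sketch of what that test asserts, using hypothetical `FakeItemContent` and `FakeDownloadJob` classes (no Rails or Active Storage; `images_attached?` plays the role of `item_content.images.attached?`):

```ruby
# Hypothetical stand-ins modelling only the state the idempotency guard reads.
class FakeItemContent
  attr_reader :images

  def initialize
    @images = []
    @persisted = false
  end

  def persisted?
    @persisted
  end

  def save!
    @persisted = true
  end

  # Plays the role of item_content.images.attached?
  def images_attached?
    !@images.empty?
  end
end

class FakeDownloadJob
  attr_reader :downloads

  def initialize(&downloader)
    @downloads = 0
    @downloader = downloader
  end

  def perform(item_content, image_urls)
    # Same guard as DownloadContentImagesJob#perform.
    return if item_content.persisted? && item_content.images_attached?
    return if image_urls.empty?

    item_content.save! unless item_content.persisted?
    image_urls.each do |url|
      @downloads += 1
      blob = @downloader.call(url) # stands in for Downloader#call; nil on failure
      item_content.images << blob if blob
    end
  end
end

content = FakeItemContent.new
job = FakeDownloadJob.new { |url| "blob-for-#{url}" }
2.times { job.perform(content, ["https://example.com/a.png"]) }
puts job.downloads # => 1

failing = FakeItemContent.new
flaky = FakeDownloadJob.new { |_url| nil }
2.times { flaky.perform(failing, ["https://example.com/a.png"]) }
puts flaky.downloads # => 2
```

The second case shows a consequence of keying the guard on attachments: if every download failed on the first run, nothing is attached, so a rerun tries again.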
</action>
<verify>
Run PARALLEL_WORKERS=1 bin/rails test test/jobs/source_monitor/download_content_images_job_test.rb and confirm all tests pass. Run bin/rubocop app/jobs/source_monitor/download_content_images_job.rb and confirm no offenses.
</verify>
<done>
DownloadContentImagesJob created. Takes item_id, reads item.content, downloads images via Downloader, attaches to item_content via Active Storage, rewrites HTML with blob paths. Idempotent, graceful failure handling. Tests cover all scenarios.
</done>
</task>
Add an integration hook after item creation. In the process_feed_entries method, after the SourceMonitor::Events.after_item_created call (line 40), add:

```ruby
enqueue_image_download(result.item)
```

This call is inside the `if result.created?` block, so it only fires for new items.

Add a private method:

```ruby
def enqueue_image_download(item)
  return unless SourceMonitor.config.images.download_enabled?
  return if item.content.blank?

  SourceMonitor::DownloadContentImagesJob.perform_later(item.id)
rescue StandardError => error
  # Image download enqueue failure must never break feed processing
  if defined?(Rails) && Rails.respond_to?(:logger) && Rails.logger
    Rails.logger.error("[SourceMonitor] Failed to enqueue image download for item #{item.id}: #{error.message}")
  end
end
```
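The rescue in enqueue_image_download is what keeps one failed enqueue from aborting the rest of process_feed_entries. A plain-Ruby sketch, assuming a hypothetical `FlakyQueue` whose `perform_later` raises for one item and injecting the queue, logger, and enable flag as keyword arguments (the real method reads them from SourceMonitor.config and Rails):

```ruby
require "logger"
require "stringio"

# Hypothetical queue whose enqueue fails for item 2 (e.g. an adapter outage).
class FlakyQueue
  attr_reader :enqueued

  def initialize
    @enqueued = []
  end

  def perform_later(item_id)
    raise IOError, "queue unavailable" if item_id == 2

    @enqueued << item_id
  end
end

FeedItem = Struct.new(:id, :content)

# Mirrors enqueue_image_download, with the queue, logger, and flag injected.
def enqueue_image_download(item, queue:, logger:, enabled: true)
  return unless enabled
  return if item.content.to_s.strip.empty?

  queue.perform_later(item.id)
rescue StandardError => error
  # Log and move on; never re-raise into feed processing.
  logger.error("[SourceMonitor] Failed to enqueue image download for item #{item.id}: #{error.message}")
end

log = StringIO.new
logger = Logger.new(log)
queue = FlakyQueue.new
items = [FeedItem.new(1, "<p>a</p>"), FeedItem.new(2, "<p>b</p>"),
         FeedItem.new(3, "<p>c</p>"), FeedItem.new(4, "")]
items.each { |item| enqueue_image_download(item, queue: queue, logger: logger) }

p queue.enqueued # => [1, 3]
```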
Create/update entry processor test:
Check if test/lib/source_monitor/fetching/feed_fetcher/entry_processor_test.rb exists. If not, create it with a proper test class. Add tests:
Use assert_enqueued_with(job: SourceMonitor::DownloadContentImagesJob, args: [item.id]) and assert_no_enqueued_jobs(only: SourceMonitor::DownloadContentImagesJob).
Test setup needs a source, a mock feed with entries (use Feedjira or a mock object), and configure images download as needed per test.
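The entry processor tests reduce to a small decision table: enqueue only when downloads are enabled, the entry created a new item, and that item has content. A plain-Ruby sketch of that table, using a hypothetical `should_enqueue?` predicate (in the plan, the `created` condition comes from the enclosing `if result.created?` block rather than the method), can double as a checklist of cases:

```ruby
# Hypothetical predicate combining the plan's three conditions. In the plan,
# `created` comes from the enclosing `if result.created?` block.
def should_enqueue?(enabled:, created:, content:)
  enabled && created && !content.to_s.strip.empty?
end

# [enabled, created, content, expected enqueue?]
CASES = [
  [true,  true,  "<p><img src='x.png'></p>", true],  # new item with content
  [false, true,  "<p><img src='x.png'></p>", false], # downloads disabled
  [true,  false, "<p><img src='x.png'></p>", false], # existing item updated
  [true,  true,  nil,                        false], # no content
  [true,  true,  "   ",                      false]  # blank content
].freeze

CASES.each do |enabled, created, content, expected|
  actual = should_enqueue?(enabled: enabled, created: created, content: content)
  raise "mismatch for #{[enabled, created, content].inspect}" unless actual == expected
end
puts "all #{CASES.size} cases pass"
```

The first row corresponds to `assert_enqueued_with`; every other row corresponds to `assert_no_enqueued_jobs(only: SourceMonitor::DownloadContentImagesJob)`.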
Update .claude/skills/sm-configure/SKILL.md:
- Add ``| Images | `config.images` | `ImagesSettings` |`` to the Configuration Sections table.
- Add a `### Image Downloads (Active Storage)` section with this example:

```ruby
config.images.download_to_active_storage = true
config.images.max_download_size = 5 * 1024 * 1024 # 5 MB
config.images.download_timeout = 15
config.images.allowed_content_types = %w[image/jpeg image/png image/webp]
```

- Add ``| `lib/source_monitor/configuration/images_settings.rb` | Image download settings |`` to the Key Source Files table.

Update .claude/skills/sm-configure/reference/configuration-reference.md:
Add a complete "Images Settings" section documenting all ImagesSettings options:
| Setting | Type | Default | Description |
|---|---|---|---|
| download_to_active_storage | Boolean | false | Enable background image downloading |
| max_download_size | Integer | 10485760 (10 MB) | Maximum image file size in bytes |
| download_timeout | Integer | 30 | HTTP timeout for image downloads in seconds |
| allowed_content_types | Array | ["image/jpeg", "image/png", "image/gif", "image/webp", "image/svg+xml"] | Permitted MIME types |
Include a usage example and note about Active Storage prerequisites (host app must have Active Storage installed).
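For cross-checking the documented defaults, they can be captured in a plain-Ruby sketch. `ImagesSettingsSketch` is hypothetical, not Plan 01's ImagesSettings, and it assumes (as labeled in the code) that `download_enabled?` simply reflects `download_to_active_storage`:

```ruby
# Hypothetical sketch of the documented defaults; the real class lives in
# lib/source_monitor/configuration/images_settings.rb (Plan 01).
class ImagesSettingsSketch
  DEFAULT_CONTENT_TYPES = %w[image/jpeg image/png image/gif image/webp image/svg+xml].freeze

  attr_accessor :download_to_active_storage, :max_download_size,
                :download_timeout, :allowed_content_types

  def initialize
    @download_to_active_storage = false
    @max_download_size = 10 * 1024 * 1024 # 10485760 bytes (10 MB)
    @download_timeout = 30                # seconds
    @allowed_content_types = DEFAULT_CONTENT_TYPES.dup
  end

  # Assumption: the enable flag is the only input.
  def download_enabled?
    download_to_active_storage == true
  end
end

settings = ImagesSettingsSketch.new
settings.download_to_active_storage = true
settings.max_download_size = 5 * 1024 * 1024 # 5242880 bytes, as in the SKILL.md example
puts settings.download_enabled? # => true
puts settings.max_download_size # => 5242880
```

Assuming `allowed_content_types` has a plain writer as in this sketch, assigning it replaces the default list rather than extending it, so the SKILL.md example `%w[image/jpeg image/png image/webp]` also stops GIF and SVG downloads.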
</action>
<verify>
Run PARALLEL_WORKERS=1 bin/rails test test/lib/source_monitor/fetching/feed_fetcher/entry_processor_test.rb and confirm all tests pass. Run bin/rubocop lib/source_monitor/fetching/feed_fetcher/entry_processor.rb and confirm no offenses. Verify grep -n 'config.images' .claude/skills/sm-configure/SKILL.md returns matches.
</verify>
<done>
Integration hook wired in entry_processor. DownloadContentImagesJob enqueued with item.id for newly created items with HTML content when config enabled. Entry processor tests verify all scenarios. sm-configure skill and reference updated with config.images documentation.
</done>
</task>
<task type="auto">
<name>full-plan-02-verification</name>
<files>
lib/source_monitor/images/downloader.rb
app/jobs/source_monitor/download_content_images_job.rb
lib/source_monitor/fetching/feed_fetcher/entry_processor.rb
.claude/skills/sm-configure/SKILL.md
</files>
<action>
Run the full test suite and linting to confirm no regressions:
- `PARALLEL_WORKERS=1 bin/rails test test/lib/source_monitor/images/downloader_test.rb test/jobs/source_monitor/download_content_images_job_test.rb test/lib/source_monitor/fetching/feed_fetcher/entry_processor_test.rb` -- all new tests pass
- `bin/rails test` -- full suite passes with 874+ runs and 0 failures
- `bin/rubocop` -- zero offenses
- `bin/brakeman --no-pager` -- zero warnings
- `config.images.download_to_active_storage = true` enables image downloads
- The sm-configure SKILL.md documents the `config.images` section

If any test failures, RuboCop offenses, or Brakeman warnings are found, fix them before completing.
</action>
<verify>
bin/rails test exits 0 with 874+ runs, 0 failures. bin/rubocop exits 0 with 0 offenses. bin/brakeman --no-pager exits 0 with 0 warnings. grep -n 'config.images' .claude/skills/sm-configure/SKILL.md returns matches.
</verify>
<done>
Plan 02 complete. Full image download pipeline is operational: config enables feature, entry processor enqueues job for new items with content, job downloads images via Downloader, attaches to item_content via Active Storage, rewrites item.content with blob URLs. Graceful fallback on all failure modes. Documentation updated. Full test suite passes.
</done>
</task>
</tasks>
<verification>
- `PARALLEL_WORKERS=1 bin/rails test test/lib/source_monitor/images/downloader_test.rb` -- all tests pass
- `PARALLEL_WORKERS=1 bin/rails test test/jobs/source_monitor/download_content_images_job_test.rb` -- all tests pass
- `PARALLEL_WORKERS=1 bin/rails test test/lib/source_monitor/fetching/feed_fetcher/entry_processor_test.rb` -- all tests pass
- `bin/rails test` -- 874+ runs, 0 failures
- `bin/rubocop` -- 0 offenses
- `bin/brakeman --no-pager` -- 0 warnings
- `grep -n 'class Downloader' lib/source_monitor/images/downloader.rb` returns a match
- `grep -n 'class DownloadContentImagesJob' app/jobs/source_monitor/download_content_images_job.rb` returns a match
- `grep -n 'enqueue_image_download' lib/source_monitor/fetching/feed_fetcher/entry_processor.rb` returns a match
- `grep -n 'config.images' .claude/skills/sm-configure/SKILL.md` returns matches
- `grep -n 'ImagesSettings' .claude/skills/sm-configure/reference/configuration-reference.md` returns matches
</verification>
<success_criteria>
</success_criteria>
<output>
.vbw-planning/phases/05-active-storage-images/PLAN-02-SUMMARY.md
</output>