.vbw-planning/milestones/03-coverage-analysis-quick-wins-critical-path-test-co/phases/04-code-quality-conventions-cleanup/PLAN-02.md
Extract lib/source_monitor/items/item_creator.rb (601 lines, 50+ methods) into focused sub-modules following the exact same extraction pattern used by FeedFetcher in Phase 3 (sub-module directory with require from main file). The public API (ItemCreator.call(source:, entry:) returning a Result struct) must remain unchanged. All existing ItemCreator tests must continue to pass without modification.
Cluster 1: Core attribute building (build_attributes, ~90 lines)
The build_attributes method (lines 233-271) assembles all item attributes by calling field extraction methods. This is the main orchestration method and should stay in the main file.
Cluster 2: Field extraction from feed entries (~300 lines)
Methods that extract specific fields from Feedjira entry objects:
- extract_guid (lines 273-287)
- extract_url (lines 288-311)
- extract_summary (lines 312-317)
- extract_content (lines 318-327)
- extract_timestamp (lines 328-337)
- extract_updated_timestamp (lines 338-343)
- extract_author (lines 344-347)
- extract_authors (lines 348-384)
- extract_categories (lines 385-394)
- extract_tags (lines 395-408)
- extract_keywords (lines 409-415)
- extract_enclosures (lines 416-467)
- extract_media_thumbnail_url (lines 468-476)
- extract_media_content (lines 477-500)
- extract_language (lines 501-512)
- extract_copyright (lines 513-524)
- extract_comments_url (lines 525-528)
- extract_comments_count (lines 529-535)
- extract_metadata (lines 536-544)
Plus utility methods: generate_fingerprint, string_or_nil, sanitize_string_array, split_keywords, safe_integer, json_entry?, atom_entry?, normalize_metadata (lines 545-601)

Cluster 3: Feed content processing (~75 lines)
Methods for processing raw feed content through readability:
- process_feed_content (lines 137-158)
- should_process_feed_content? (lines 160-165)
- feed_content_parser_class (lines 167-170)
- wrap_content_for_readability (lines 171-186)
- default_feed_readability_options (lines 187-193)
- build_feed_content_metadata (lines 194-209)
- html_fragment? (lines 210-213)
- deep_copy (lines 214-231)

What stays in the main file (~200 lines):
- self.call, call method
- existing_item_for, find_item_by_guid, find_item_by_fingerprint
- instrument_duplicate, update_existing_item, create_new_item
- handle_concurrent_duplicate, find_conflicting_item, apply_attributes
- build_attributes (calls into extracted modules)

@lib/source_monitor/fetching/feed_fetcher.rb -- 285 lines. The extraction pattern to follow: main file requires sub-modules, uses lazy accessors (e.g., def source_updater; @source_updater ||= SourceUpdater.new(...); end), delegates method calls.
@lib/source_monitor/fetching/feed_fetcher/source_updater.rb -- Example sub-module: namespaced under FeedFetcher, constructor receives dependencies.
@lib/source_monitor/fetching/feed_fetcher/entry_processor.rb -- Another example sub-module.
@test/lib/source_monitor/items/item_creator_test.rb -- Existing tests. Must pass without modification.
</context>
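The extraction pattern described in the context above (main file requires sub-modules, builds them through lazy accessors, and delegates) can be sketched as follows. This is a minimal illustration, not the actual FeedFetcher code: ExampleFetcher, the update! method, and the constructor arguments are all hypothetical, and both classes sit in one file only so the sketch runs standalone.

```ruby
# Hypothetical sketch of the extraction pattern; names are illustrative.
# In the real layout the nested class would live in its own file and be
# loaded with require_relative from the top of the main file.

class ExampleFetcher
  # Sub-module namespaced under the main class. It receives its
  # dependencies through the constructor (arguments are assumptions).
  class SourceUpdater
    def initialize(source:)
      @source = source
    end

    # Hypothetical method: records a status on the source hash.
    def update!(status:)
      @source[:last_status] = status
      @source
    end
  end

  def initialize(source:)
    @source = source
  end

  # The main class keeps orchestration and delegates the detail work.
  def call
    source_updater.update!(status: :fetched)
  end

  private

  # Lazy accessor: built on first use, memoized afterwards.
  def source_updater
    @source_updater ||= SourceUpdater.new(source: @source)
  end
end

result = ExampleFetcher.new(source: { name: "example" }).call
```

Because the accessor is private and memoized, callers of the main class never see the sub-module, which is what lets the public API stay unchanged during the extraction.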
name: extract-entry-parser
files:
- lib/source_monitor/items/item_creator/entry_parser.rb (new)
- lib/source_monitor/items/item_creator.rb

action: Create lib/source_monitor/items/item_creator/entry_parser.rb containing a SourceMonitor::Items::ItemCreator::EntryParser class. Move these methods from item_creator.rb into the new class:

- extract_guid -- entry GUID extraction with JSON/Atom fallbacks
- extract_url -- URL extraction with canonical/alternate link resolution
- extract_summary -- summary text extraction
- extract_content -- content extraction from multiple methods
- extract_timestamp -- published_at extraction
- extract_updated_timestamp -- updated_at extraction
- extract_author -- single author extraction
- extract_authors -- multi-author extraction with JSON parsing
- extract_categories -- category extraction
- extract_tags -- tag extraction
- extract_keywords -- keyword extraction with separator splitting
- extract_enclosures -- enclosure/attachment extraction
- extract_media_thumbnail_url -- media thumbnail extraction
- extract_media_content -- media content metadata extraction
- extract_language -- language detection
- extract_copyright -- copyright extraction
- extract_comments_url -- comments link extraction
- extract_comments_count -- comments count extraction
- extract_metadata -- raw metadata extraction
- generate_fingerprint -- content fingerprint generation
- string_or_nil, sanitize_string_array, split_keywords, safe_integer, json_entry?, atom_entry?, normalize_metadata

The EntryParser constructor takes source: and entry: (same as ItemCreator). It exposes a single public method, parse, that returns a hash of all extracted attributes (what build_attributes currently assembles). Add require_relative "item_creator/entry_parser" at the top of item_creator.rb. In ItemCreator, create an entry_parser lazy accessor and delegate the field extraction to it.
verify: ruby -c lib/source_monitor/items/item_creator/entry_parser.rb exits 0 AND bin/rails test test/lib/source_monitor/items/item_creator_test.rb exits 0 with zero failures
done: EntryParser extracted with all field extraction methods. Tests pass unchanged.
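A minimal sketch of the EntryParser shape Task 1 describes: a constructor taking source: and entry:, one public parse method, and everything else private. The method bodies are simplified assumptions (the real extract_guid, extract_url, and generate_fingerprint carry more fallbacks than shown), and StubEntry is a hypothetical stand-in for a Feedjira entry.

```ruby
require "digest"

module SourceMonitor
  module Items
    class ItemCreator
      # Sketch only: three of the extraction methods, heavily simplified.
      class EntryParser
        def initialize(source:, entry:)
          @source = source
          @entry = entry
        end

        # Single public entry point: returns the attribute hash that
        # build_attributes would otherwise assemble.
        def parse
          {
            guid: extract_guid,
            url: extract_url,
            title: string_or_nil(entry.title),
            fingerprint: generate_fingerprint
          }
        end

        private

        attr_reader :source, :entry

        # Assumed fallback: use the URL when the entry has no usable id.
        def extract_guid
          string_or_nil(entry.entry_id) || extract_url
        end

        def extract_url
          string_or_nil(entry.url)
        end

        # Assumed fingerprint inputs; the real method may hash other fields.
        def generate_fingerprint
          Digest::SHA256.hexdigest([entry.title, extract_url].join("\n"))
        end

        # Returns a stripped string, or nil when the value is blank.
        def string_or_nil(value)
          text = value.to_s.strip
          text.empty? ? nil : text
        end
      end
    end
  end
end

# Hypothetical stand-in for a Feedjira entry object.
StubEntry = Struct.new(:entry_id, :url, :title, keyword_init: true)
entry = StubEntry.new(entry_id: "  ", url: "https://example.com/a", title: "Hello")
attrs = SourceMonitor::Items::ItemCreator::EntryParser.new(source: nil, entry: entry).parse
```

Keeping the extract_* methods private means the existing ItemCreator tests only ever observe the hash returned by parse, which is why they should not need modification.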
name: extract-content-extractor
files:
- lib/source_monitor/items/item_creator/content_extractor.rb (new)
- lib/source_monitor/items/item_creator.rb

action: Create lib/source_monitor/items/item_creator/content_extractor.rb containing a SourceMonitor::Items::ItemCreator::ContentExtractor class. Move these methods:

- process_feed_content -- orchestrates content processing through readability
- should_process_feed_content? -- determines if content should be processed
- feed_content_parser_class -- resolves the parser class
- wrap_content_for_readability -- wraps raw content with HTML structure for parsing
- default_feed_readability_options -- default options for readability
- build_feed_content_metadata -- builds metadata about processing results
- html_fragment? -- checks if content is HTML
- deep_copy -- deep copies complex values

The ContentExtractor constructor takes source:. It exposes process_feed_content(raw_content, title:) as the primary public method. Add require_relative "item_creator/content_extractor" at the top of item_creator.rb. In ItemCreator, create a content_extractor lazy accessor. The EntryParser from Task 1 should call content_extractor.process_feed_content(...) instead of the local method -- wire this through the constructor or pass as a dependency.
verify: ruby -c lib/source_monitor/items/item_creator/content_extractor.rb exits 0 AND bin/rails test test/lib/source_monitor/items/item_creator_test.rb exits 0
done: ContentExtractor extracted. Feed content processing isolated. Tests pass unchanged.
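One way to wire the dependency from Task 2 is constructor injection, sketched below. This assumes the "wire through the constructor" option; the content_extractor: keyword, the [content, metadata] return shape, and the simplified method bodies are all assumptions, and no real readability parsing is performed. build_attributes is public here only so the sketch can be exercised.

```ruby
module SourceMonitor
  module Items
    class ItemCreator
      class ContentExtractor
        def initialize(source:)
          @source = source
        end

        # Assumed return shape: [processed_content, metadata_or_nil].
        def process_feed_content(raw_content, title:)
          return [raw_content, nil] unless should_process_feed_content?(raw_content)

          [wrap_content_for_readability(raw_content, title: title), { "status" => "wrapped" }]
        end

        private

        def should_process_feed_content?(raw_content)
          html_fragment?(raw_content)
        end

        # Simplified check: anything that looks like an HTML tag.
        def html_fragment?(value)
          value.to_s.match?(/<[a-z][\s\S]*>/i)
        end

        def wrap_content_for_readability(raw_content, title:)
          "<html><head><title>#{title}</title></head><body>#{raw_content}</body></html>"
        end
      end

      class EntryParser
        # content_extractor: is a hypothetical keyword for the injected dependency.
        def initialize(source:, entry:, content_extractor:)
          @source = source
          @entry = entry
          @content_extractor = content_extractor
        end

        def parse
          content, metadata = @content_extractor.process_feed_content(@entry.content, title: @entry.title)
          { title: @entry.title, content: content, metadata: metadata }
        end
      end

      def initialize(source:, entry:)
        @source = source
        @entry = entry
      end

      def build_attributes
        entry_parser.parse
      end

      private

      def content_extractor
        @content_extractor ||= ContentExtractor.new(source: @source)
      end

      def entry_parser
        @entry_parser ||= EntryParser.new(source: @source, entry: @entry, content_extractor: content_extractor)
      end
    end
  end
end

# Hypothetical stand-in for a Feedjira entry object.
ContentStubEntry = Struct.new(:title, :content, keyword_init: true)
html_entry = ContentStubEntry.new(title: "T", content: "<p>Hi</p>")
html_attrs = SourceMonitor::Items::ItemCreator.new(source: nil, entry: html_entry).build_attributes
```

Injecting the extractor keeps EntryParser free of any knowledge about how ContentExtractor is built, so either class can be tested in isolation with a stub.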
name: slim-item-creator-and-wire
files:
- lib/source_monitor/items/item_creator.rb

action: After Tasks 1-2, the main item_creator.rb should contain:

- self.call
- call method (find or create)
- existing_item_for, find_item_by_guid, find_item_by_fingerprint
- instrument_duplicate, update_existing_item, create_new_item
- handle_concurrent_duplicate, find_conflicting_item, apply_attributes
- build_attributes (now delegates to entry_parser.parse)

Clean up any dead code, orphaned requires, or duplicated constants. Ensure the main file is under 300 lines. Run RuboCop on all modified/new files.
verify: wc -l lib/source_monitor/items/item_creator.rb shows fewer than 300 lines AND bin/rubocop lib/source_monitor/items/item_creator.rb lib/source_monitor/items/item_creator/ exits 0 AND bin/rails test test/lib/source_monitor/items/item_creator_test.rb exits 0
done: ItemCreator main file under 300 lines. All sub-modules wired. RuboCop clean.
- Public API (ItemCreator.call(source:, entry:) returning Result struct) works identically to before the extraction. Verify by inspecting any tests that use ItemCreator in other test files (e.g., feed_fetcher_test.rb, import_opml_job tests) to confirm they still pass.
- bin/rails test exits 0 with 760+ runs and 0 failures AND bin/rubocop -f simple shows no offenses detected
- wc -l lib/source_monitor/items/item_creator.rb shows fewer than 300 lines
- wc -l lib/source_monitor/items/item_creator/entry_parser.rb lib/source_monitor/items/item_creator/content_extractor.rb shows both exist
- bin/rails test test/lib/source_monitor/items/item_creator_test.rb exits 0 with zero failures
- bin/rails test exits 0 with 760+ runs and 0 failures
- bin/rubocop lib/source_monitor/items/ exits 0