docs/feed_partial_index_optimization.md
This document analyzes the dramatic performance improvement achieved by using partial indexes with a 45-day cutoff for feed queries.
The existing indexes include ALL articles regardless of age:
index_articles_on_published_score_published_at_for_moderationindex_articles_on_subforem_published_score_published_atThese indexes contain:
Feed queries typically only need:
Partial indexes only include rows that match a WHERE condition, dramatically reducing:
-- BEFORE: Full index (includes all articles ever)
CREATE INDEX index_articles_on_published_score_published_at_for_moderation
ON articles (published, score, published_at);
-- AFTER: Partial index (only recent articles)
CREATE INDEX index_articles_on_published_score_published_at_recent_feed
ON articles (published, score, published_at DESC)
WHERE published = true AND published_at > '2024-11-17 00:00:00';
Assumptions:
Size Reduction:
PostgreSQL Buffer Cache:
Index Scan Performance:
Real-world Impact:
CPU Usage:
I/O Operations:
# Primary feed query index
add_index :articles,
[:published, :score, :published_at],
name: 'index_articles_on_published_score_published_at_recent_feed',
where: "published = true AND published_at > '#{45.days.ago}'",
order: { published_at: :desc }
# Subforem-specific feed index
add_index :articles,
[:subforem_id, :published, :score, :published_at],
name: 'index_articles_on_subforem_published_score_published_at_recent_feed',
where: "published = true AND published_at > '#{45.days.ago}'",
order: { published_at: :desc }
# Featured articles index
add_index :articles,
[:featured, :published, :published_at],
name: 'index_articles_on_featured_published_published_at_recent_feed',
where: "published = true AND published_at > '#{45.days.ago}'",
order: { published_at: :desc }
# Type filtering index
add_index :articles,
[:type_of, :published, :score, :published_at],
name: 'index_articles_on_type_of_published_score_published_at_recent_feed',
where: "published = true AND published_at > '#{45.days.ago}'",
order: { published_at: :desc }
# User filtering index
add_index :articles,
[:user_id, :published, :score, :published_at],
name: 'index_articles_on_user_id_published_score_published_at_recent_feed',
where: "published = true AND published_at > '#{45.days.ago}'",
order: { published_at: :desc }
# Hotness score index
add_index :articles,
[:hotness_score, :published, :published_at],
name: 'index_articles_on_hotness_score_published_published_at_recent_feed',
where: "published = true AND published_at > '#{45.days.ago}'",
order: { hotness_score: :desc, published_at: :desc }
Analysis:
Configurable:
Settings::UserExperience.feed_lookback_daysQuery Execution Time
SELECT query, mean_time, calls
FROM pg_stat_statements
WHERE query LIKE '%articles%published%'
ORDER BY mean_time DESC;
Index Usage
SELECT schemaname, tablename, indexname, idx_scan, idx_tup_read, idx_tup_fetch
FROM pg_stat_user_indexes
WHERE tablename = 'articles' AND indexname LIKE '%recent_feed%';
Index Size
SELECT indexname, pg_size_pretty(pg_relation_size(indexname::regclass)) as size
FROM pg_indexes
WHERE tablename = 'articles' AND indexname LIKE '%recent_feed%';
After deployment, you should see:
Query outside 45-day range
Index maintenance
Migration time
algorithm: :concurrently for zero downtimeIf issues arise:
The partial index optimization provides dramatic performance improvements for feed queries:
This is a high-impact, low-risk optimization that should be implemented immediately for any production system with significant article volume.
The 45-day cutoff is conservative and can be adjusted based on actual usage patterns, but even with this conservative approach, the performance gains are substantial.