.ai/principles/distilled/database-queries.md
Prerequisite: If you haven't already, also read .ai/principles/distilled/database-fundamentals.md - it contains foundational rules that apply to all database work.
add_concurrent_index, add_concurrent_foreign_key, validate constraint) complete within 5min for migrations and 20min for post-migrationsEXPLAIN(analyze, buffers) and document results in the MR description for sequential scans on large tables, nested loops with large datasets, missing or inefficient index usage, high-cost operations, unexpected sort operationsmatches instead of raw LIKE/ILIKE SQL fragments to ensure correct case-insensitive behavior on PostgreSQLLIKE/ILIKE with a leading wildcard (for example, ILIKE '%value') without a trigram GIN indexindex_TABLE_on_COLUMN_trigramdisable_ddl_transaction! in migrationsSELECT statements when using JOINs to avoid ambiguous column errors during deploymentpluck to load IDs into memory for use as arguments in another query; use subqueries instead. Exception: when using CTEs with update_all, first pluck IDs from the CTE result and scope the update to those IDs (the CTE is dropped otherwise)pluck only within model code, or when values are needed in Ruby or cached for multiple related queriespluck results to MAX_PLUCK (1,000) records when pluck is necessaryWHERE EXISTS instead of WHERE IN wherever possible.exists?, .count, pagination) for query plan flip issues when using complex scopes with IN subqueriesGitlab::SQL::CTE) to stabilize query plans when .exists? causes plan flips, but only as a last resortUPDATE or DELETE — the CTE is dropped and the operation affects the entire table. Exception: when using CTEs with update_all, first pluck IDs from the CTE result and scope the update to those IDsiterating_tables_in_batches patternsORDER BY id over ORDER BY created_at unless accurate creation-date ordering is requiredORDER BY created_at, id (with appropriate composite index) when accurate creation-date ordering is required, especially in Cells architectureUNION (via Gitlab::SQL::Union or FromUnion) instead of complex JOINs when combining result sets from multiple queriesSELECT column_names with SELECT * in UNION sub-queries; use consistent column selection across all sub-queriesUser.cached_column_list for explicit column lists in UNION queries to avoid stale schema cache issuesApplicationRecord or Ci::ApplicationRecord, not ActiveRecord::Base; use MigrationRecord only in migration context.find_or_create_by or .first_or_create — they are not atomic; use ApplicationRecord.safe_find_or_create_by or .upsert instead.safe_find_or_create_by only in isolated code not wrapped in an existing transaction (subtransactions carry risk).upsert with unique_by when the common path is record creation and duplicate avoidance is only needed on edge casesModel.transaction when all records in the block belong to the same database table/connection; use ApplicationRecord.transaction (not ActiveRecord::Base.transaction) only when the model is not known or records span multiple modelsApplicationRecord.transaction for models on a different database (for example, Ci::* models on CiDatabase) — statements will not be rolled backsleep(n) inside a transaction blockhas_one relationship to a new table instead of adding columns to an oversized main table when the new data applies to only a subset of rowseach_batch (via EachBatch module) instead of Rails in_batches for iterating large tableseach_batch on non-unique columns without calling distinct first — it may cause infinite loopsdistinct_each_batch for iterating over non-unique columns using the loose-index scan techniquedistinct_each_batch than with standard each_batch due to recursive CTE overheadGitlab::Metrics::RuntimeLimiter) and implement a "continue later" mechanism by scheduling a new job with a cursor when the limit is reachedsleep 0.01) between batches when large volumes of data are modified to reduce primary database pressurelog_extra_metadata_on_done for traceability in KibanaGitlab::Pagination::Keyset::Iterator) when iterating by timestamp columns or composite primary keys where EachBatch cannot be usedeach_batch_count instead of summing relation.count inside each_batch for counting large tableseach_batch scope — move filters to the yielded relation inside the blockGitlab::SQL::CTE) on the yielded relation when complex conditions cause unstable query plans inside each_batchBulkInsertSafe / bulk_insert! for inserting large arrays of ActiveRecord objects in bulkBulkInsertableAssociations with with_bulk_insert to bulk-insert has_many associationsid) in all ORDER BY clauses to ensure stable sort orderORDER BY clauseORDER BY columns from different tables — denormalize if necessary to keep sort columns in one tableproject_id or sort columns to join tables) only when query performance cannot be achieved otherwise, and document the trade-offsFor the full picture, see: