src/data/question-groups/sql-queries/content/detect-date-gaps.md
You will use the LAG() function to detect gaps in a sequence of dates per user. You will compare each date with the previous one and check if the difference is greater than 1.
Let's use a table ClockIns to demonstrate how you detect gaps. The table has two columns, userId and clockInDate, representing the user identification number and the date the user clocked in with an access card into a facility. The table looks like this:
| userId | clockInDate |
|---|---|
| 1 | 2025-01-01 |
| 1 | 2025-01-02 |
| 1 | 2025-01-05 |
| 1 | 2025-01-06 |
| 2 | 2025-01-06 |
| 2 | 2025-01-06 |
| 2 | 2025-01-07 |
| 3 | 2025-01-02 |
| 3 | 2025-01-04 |
| 3 | 2025-01-06 |
| 3 | 2025-01-07 |
To query to find gaps per user looks like this:
WITH clockInGaps AS (
SELECT
userid,
clockInDate,
LAG(clockInDate) OVER (PARTITION BY userId ORDER BY clockInDate) AS previousClockInDate
FROM
clockIns
)
SELECT
userId,
previousClockInDate AS gapStart,
clockInDate AS gapEend,
clockInDate - previousClockInDate - 1 AS gapDays
FROM clockInGaps
WHERE clockInDate - previousClockInDate > 1
ORDER BY userId, gapStart;
The code above starts with creating an expression clockInGaps that queries for each user and their clockInDate and uses the LAG function to get the previous date for each user. Then, the main query filters each row and finds the gaps between the current date and the previous date. The result of the query looks like this:
| userId | gapStart | gapEnd | gapDays |
|---|---|---|---|
| 1 | 2025-01-02 | 2025-01-05 | 2 |
| 2 | 2025-01-07 | 2025-01-10 | 2 |
| 3 | 2025-01-02 | 2025-01-04 | 1 |
| 3 | 2025-01-04 | 2025-01-06 | 1 |