docs/features/deduplication.mdx
Reworkd automatically deduplicates data whenever your scrapers re-run.
When saving data, Reworkd uses a unique key (or composite key) built from the record's fields to determine whether a record is new or a duplicate of one that has already been saved.
| Scenario | Action Taken by Reworkd |
|---|---|
| New row of data saved | Inserts the data and marks it as a CREATE change. |
| Duplicate row of data saved | Skips insertion; no duplicate is created. |
| Updating data that has been seen before (existing key) | Updates the existing record without duplication and marks it as an UPDATE change. |
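The three scenarios in the table can be illustrated with a small sketch. This is not Reworkd's implementation or API: `save_record`, the in-memory `store` dictionary, and the `"SKIP"` label are hypothetical names used only to show how a key lookup could drive the CREATE, skip, and UPDATE outcomes.

```python
def save_record(store: dict, record: dict, key_fields: tuple) -> str:
    """Save a record into an in-memory store keyed by its deduplication key.

    Returns "CREATE", "UPDATE", or "SKIP" (a hypothetical label for the
    "duplicate row" scenario, which does not produce a change).
    """
    # Build the deduplication key from the selected fields.
    key = tuple(record[field] for field in key_fields)
    existing = store.get(key)

    if existing is None:
        # Key has never been seen: insert the record.
        store[key] = dict(record)
        return "CREATE"

    if existing == record:
        # Same key and identical field values: nothing to do.
        return "SKIP"

    # Same key but different field values: overwrite in place.
    store[key] = dict(record)
    return "UPDATE"


store = {}
first = save_record(store, {"sku": "A1", "price": 10}, ("sku",))
second = save_record(store, {"sku": "A1", "price": 10}, ("sku",))
third = save_record(store, {"sku": "A1", "price": 12}, ("sku",))
print(first, second, third)
```

In this sketch the store still holds a single record for `A1` after all three saves, since the second save is skipped and the third replaces the stored values.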
When creating your schema, you must also select which fields make up your primary (deduplication) key. This key is critical for avoiding duplicated data. It must:
If no single field is an obvious key, combine multiple attributes into a reliable composite key.
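As a rough illustration of why a composite key helps, the sketch below counts how many distinct keys a set of records produces under different field choices. The job-listing data, the field names, and the helpers `make_key` and `count_distinct` are all hypothetical and are not part of Reworkd.

```python
def make_key(record: dict, fields: tuple) -> tuple:
    """Combine the chosen fields into a single hashable composite key."""
    return tuple(record[field] for field in fields)


def count_distinct(records: list, fields: tuple) -> int:
    """Count how many distinct keys the records produce for these fields."""
    return len({make_key(record, fields) for record in records})


# Hypothetical scraped job listings with no single unique ID field.
listings = [
    {"title": "Engineer", "company": "Acme", "location": "Berlin"},
    {"title": "Engineer", "company": "Acme", "location": "Paris"},
    {"title": "Engineer", "company": "Globex", "location": "Berlin"},
]

# Keying on title alone would collapse three different listings into one.
title_only = count_distinct(listings, ("title",))

# Keying on all three fields keeps every listing distinct.
composite = count_distinct(listings, ("title", "company", "location"))

print(title_only, composite)
```

A key that is too narrow merges records that are really different, while a key that includes volatile fields (for example a price or a timestamp) would treat every change as a new record, so the fields chosen should be the ones that identify the item rather than describe its current state.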