-
Notifications
You must be signed in to change notification settings - Fork 4.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. Weβll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Document Full Refresh Overwrite Deduped #43413
Conversation
The latest updates on your projects. Learn more about Vercel for Git βοΈ
|
|
||
## Destination-specific mechanism for full refresh | ||
|
||
The mechanism by which a destination connector acomplishes the full refresh will vary wildly from destination to destinaton. For our certified database and data warehouse destinations, we will be recreating the final table each sync. This allows us leave the previous sync's data viewable by writing to a "final-table-tmp" location as the sync is running, and at the end dropping the olf "final" table, and renaming the new one into place. That said, this may not possible for all destinations, and we may need to erase the existing data at the start of each full-refresh sync. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The mechanism by which a destination connector acomplishes the full refresh will vary wildly from destination to destinaton. For our certified database and data warehouse destinations, we will be recreating the final table each sync. This allows us leave the previous sync's data viewable by writing to a "final-table-tmp" location as the sync is running, and at the end dropping the olf "final" table, and renaming the new one into place. That said, this may not possible for all destinations, and we may need to erase the existing data at the start of each full-refresh sync. | |
The mechanism by which a destination connector acomplishes the full refresh will vary wildly from destination to destinaton. For our certified database and data warehouse destinations, we will be recreating the final table each sync. This allows us leave the previous sync's data viewable by writing to a "final-table-tmp" location as the sync is running, and at the end dropping the old "final" table, and renaming the new one into place. That said, this may not possible for all destinations, and we may need to erase the existing data at the start of each full-refresh sync. |
The **Full Refresh** modes are the simplest methods that Airbyte uses to sync data, as they always retrieve all available information requested from the source, regardless of whether it has been synced before. This contrasts with [**Incremental sync**](./incremental-append.md), which does not sync data that has already been synced before. | ||
|
||
In the **Overwrite + Deduped** variant, new syncs will replace all data in the existing destination table and then pull the new data in. Therefore, data that has been removed from the source after an old sync will be deleted in the destination table. The **deduped** variant also means that the data in the final table will be unique per primary key \(unlike [Full Refresh Overwrite](./full-refresh-overwrite.md)\). This is determined by sorting the data using the cursor field and keeping only the latest de-duplicated data row. | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we add a quick sentence here about "As Full Refresh syncs are Resumable as much as possible, we recommend Full Refresh Overwrite Dedup if avoiding duplicates is a strong requirement. Bear in mind this incurs additional warehouse costs.', and linking the Resumable doc page in there.
Something like this!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One suggestions. Feel free to merge after!
What
Document Full Refresh Overwrite Deduped
Review guide
User Impact
Can this PR be safely reverted and rolled back?