Design: Simplify destination connector interface #434
The idea is to simplify the `Destination` interface in the SDK and move the complexity of asynchronous writes into the SDK itself. We need to write a proper design document for this; this text can serve as its basis.
Problem
Right now the destination connector developer has two options:

1. Implement `Write` and write records to the destination one by one, synchronously.
2. Implement `WriteAsync`, which caches records, and `Flush`, which writes the batch of cached records to the destination.

The second option is much more complex to implement than the first. The developer needs to build a caching mechanism that blocks when it grows too large, and must account for concurrent access, since both functions can be called concurrently. They also need to call the ack function to signal that a record was successfully written. None of this concerns the developer if they implement the first option.
Additionally, the interface is a bit awkward, since the developer is expected to implement either `Write` or `WriteAsync`, but not both.

Solution
We could fix all of these problems by moving the complexity of asynchronous writes into the SDK and letting it handle batching. Instead of the functions `Write`, `WriteAsync` and `Flush`, there would be only one function, `Write`, which receives a batch of records. It would be expected to write all of those records to the 3rd party system synchronously and return once all of them are written. The SDK would take care of batching records based on some configuration options and would call `Write` only once the records should actually be written to the 3rd party system.

The benefit is that all connectors get batching capabilities out of the box, there is a single implementation shared by all of them (no reinventing the wheel), and the complexity of implementing a destination connector drops significantly.
We would still be able to write records synchronously one by one (i.e. the behavior of the old `Write` function) by setting the batch size to 1, or to cache records and write them in batches (i.e. the behavior of the old `WriteAsync` and `Flush` functions) by increasing the batch size.

The batching behavior would need to be configurable by the end user based on:

- the number of records (e.g. write once 5 records are collected),
- the memory size (e.g. write once the batch reaches 5 MB), and
- a time limit (e.g. write once records have been cached for 5 seconds).
Open questions