
Design: Simplify destination connector interface #434

Closed
lovromazgon opened this issue Jun 1, 2022 · 3 comments

@lovromazgon
Member

The idea is to simplify the Destination interface in the SDK and move the complexity related to asynchronous writes into the SDK.

We need to write a proper design document around this. This text can be used as a basis for that.

Problem

Right now the destination connector developer has two options:

  • Implement Write and write records one by one synchronously to the destination.
  • Implement WriteAsync that caches records and Flush that writes the batch of cached records to the destination.

Implementing the second option is considerably more complex than implementing the first: the developer needs to implement a cache that blocks when it grows too large, and must handle concurrent access, since both functions can be called concurrently. They also need to call the ack function to signal that a record was successfully written. None of this concerns a developer who implements the first option.

Additionally, the interface is somewhat unusual: the developer is expected to implement either Write or WriteAsync, but not both.

Solution

We could fix all of these problems by moving the complexity of asynchronous writes into the SDK and letting it handle batching. Instead of the functions Write, WriteAsync and Flush, there would be only one function:

Write(context.Context, []Record) error

This function would be expected to write all records to the 3rd party system synchronously and return once all of them are written. The SDK would take care of batching records based on some configuration options and would call Write only once a batch is ready to be written to the 3rd party system.

The benefit is that all connectors get batching capabilities out of the box, we have one implementation for all of them (no reinventing the wheel) and we lower the complexity of implementing a destination connector significantly.

We would have the ability to write records synchronously one by one (i.e. the behavior of the old Write function) by setting the batch size to 1, or cache records and write them in batches (i.e. the behavior of the old WriteAsync and Flush functions) by increasing the batch size.

The batching behavior would need to be configurable for the end user based on the number of elements (e.g. write once we collect 5 records), memory size (e.g. write once the size of the batch reaches 5 MB) and time limit (e.g. write once records are cached for 5 seconds).


Open questions

  • How does the connector return an error that is specific enough for the SDK to know exactly which records to ack/nack?
  • How does the SDK expose parameters to the user (e.g. batch element size, batch memory size, batch timeout)?
  • How can we let the developer choose sane defaults for SDK parameters?
@lovromazgon
Member Author

  • How does the connector return an error that is specific enough for the SDK to know exactly which records to ack/nack?

We should probably take the same approach as io.Writer, which returns two values: an int saying how many bytes were written successfully, and an error explaining why, if not all bytes were written.

@lovromazgon
Member Author

  • How does the SDK expose parameters to the user (e.g. batch element size, batch memory size, batch timeout)?

We changed the Destination and Source interfaces to also return their parameters, so that we can wrap them with middleware that adds its own parameters.

  • How can we let the developer choose sane defaults for SDK parameters?

The developer is responsible for wrapping their Destination/Source with middleware. If they want to change the defaults, they can configure them in the middleware.
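A sketch of how these two answers could fit together, assuming a hypothetical `Parameters()` method, a hypothetical `Parameter` type, a hypothetical `DestinationWithBatching` middleware constructor with a `WithBatchSize` option, and a hypothetical parameter name `sdk.batch.size`; none of these are meant to reflect the actual SDK API. The middleware merges the connector's parameters with its own, and the developer picks a different default by passing an option when wrapping. Only the parameter side is shown; Write is passed through unchanged.

```go
package main

import (
	"context"
	"fmt"
	"strconv"
)

// Record is a hypothetical, simplified record type.
type Record struct{ Payload []byte }

// Parameter is a hypothetical description of one configuration option.
type Parameter struct {
	Default     string
	Description string
}

// Destination sketches an interface that also reports its parameters.
type Destination interface {
	Parameters() map[string]Parameter
	Write(ctx context.Context, records []Record) error
}

// batchMiddleware wraps a Destination and adds a batching parameter.
// Write is inherited from the embedded Destination and passed through as-is.
type batchMiddleware struct {
	Destination
	defaultBatchSize int
}

// Option lets the connector developer override middleware defaults.
type Option func(*batchMiddleware)

// WithBatchSize changes the default batch size exposed to the user.
func WithBatchSize(n int) Option {
	return func(m *batchMiddleware) { m.defaultBatchSize = n }
}

// DestinationWithBatching wraps d; without options the default batch size
// is 1, i.e. records are written one by one.
func DestinationWithBatching(d Destination, opts ...Option) Destination {
	m := &batchMiddleware{Destination: d, defaultBatchSize: 1}
	for _, opt := range opts {
		opt(m)
	}
	return m
}

// Parameters returns the connector's parameters plus the middleware's own.
func (m *batchMiddleware) Parameters() map[string]Parameter {
	params := make(map[string]Parameter)
	for k, v := range m.Destination.Parameters() {
		params[k] = v
	}
	params["sdk.batch.size"] = Parameter{
		Default:     strconv.Itoa(m.defaultBatchSize),
		Description: "Maximum number of records in a batch.",
	}
	return params
}

// myDestination is a hypothetical connector with one parameter of its own.
type myDestination struct{}

func (myDestination) Parameters() map[string]Parameter {
	return map[string]Parameter{"url": {Description: "URL of the 3rd party system."}}
}

func (myDestination) Write(context.Context, []Record) error { return nil }

func main() {
	d := DestinationWithBatching(myDestination{}, WithBatchSize(100))
	p := d.Parameters()
	fmt.Println(len(p), p["sdk.batch.size"].Default) // prints "2 100"
}
```

Functional options keep the wrapping call short when the defaults are fine, while still letting each connector ship its own sane defaults.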

@lovromazgon
Member Author

All connectors done!

Repository owner moved this from In Progress to Done in Conduit Main Sep 2, 2022