-
Notifications
You must be signed in to change notification settings - Fork 1.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Chore: refactor DataSink traits to avoid duplication #14121
Chore: refactor DataSink traits to avoid duplication #14121
Conversation
add file_extension to FileSinkConfig
add schema() function to DataSink trait make FileSink a subtrait for DataSink
@@ -90,13 +109,23 @@ pub struct FileSinkConfig { | |||
pub insert_op: InsertOp, | |||
/// Controls whether partition columns are kept for the file | |||
pub keep_partition_by_columns: bool, | |||
/// File extension without a dot(.) | |||
pub file_extension: String, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
FileSinkConfig
did not carry any file-type related information before but giving file_extension explicitly to the helper functions also bothered us. That's why we've added this to the FileSinkConfig
. Could not get a consensus on keeping FileSinkConfig file-type agnostic or adding this please let us know what do you think :) @alamb
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I poked around with this PR and since you need to pass the extension into the demuxer tasks I didn't see any better way to handle this
So TLDR is I think this is fine and I don't have any better suggestion
Due to Rust's incomplete specialization implementation, we have some trivial duplication regarding |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks like a nice improvement to the API to me. I am sure we can improve it more but this seems like a stepforward to me
/// Retrieves the file sink configuration. | ||
fn config(&self) -> &FileSinkConfig; | ||
|
||
/// Spawns writer tasks and uses `tokio::join` to collect results. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It might be helpful to document what a Demux task is here on this trait
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed, let's do this @mertak-synnada
@@ -90,13 +109,23 @@ pub struct FileSinkConfig { | |||
pub insert_op: InsertOp, | |||
/// Controls whether partition columns are kept for the file | |||
pub keep_partition_by_columns: bool, | |||
/// File extension without a dot(.) | |||
pub file_extension: String, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I poked around with this PR and since you need to pass the extension into the demuxer tasks I didn't see any better way to handle this
So TLDR is I think this is fine and I don't have any better suggestion
Thank you @mertak-synnada |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I went through this PR twice and sent some commits for further DRY. Let's enrich the demuxer comments per @alamb's suggestion and then merge this.
Which issue does this PR close?
Closes #.
This PR creates an abstraction for FileSinkers' generic behaviors and centralizes some of the functionalities. The main intention is to separate only file type-specific implementations and follow the same routes on FileSink operators.
Rationale for this change
What changes are included in this PR?
Are these changes tested?
Are there any user-facing changes?