Allow Inserts to Partitioned Listing Table #7744
For 2, rather than peeking, I am thinking about extending the `RecordBatchStream` trait like so:

```rust
/// Trait for types that stream [arrow::record_batch::RecordBatch]
pub trait RecordBatchStream: Stream<Item = Result<RecordBatch>> {
    /// Returns the schema of this `RecordBatchStream`.
    ///
    /// Implementations of this trait should guarantee that all `RecordBatch`es
    /// returned by this stream have the same schema as returned from this method.
    fn schema(&self) -> SchemaRef;

    fn partition_info(&self) -> &PartitionInfo;
}
```
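(For context: `PartitionInfo` is not an existing DataFusion type. A minimal sketch of what such a type might carry, with hypothetical names throughout:)

```rust
/// Hypothetical type (not in DataFusion): metadata a stream could expose
/// so a FileSink knows which partition all of its batches belong to.
#[derive(Debug, Clone)]
pub struct PartitionInfo {
    /// (column name, value) pairs, e.g. [("date", "2023-10-05")],
    /// suitable for building a hive-style path like `date=2023-10-05/`.
    pub partition_values: Vec<(String, String)>,
}
```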
I am not sure about extending the `RecordBatchStream` trait. I think actually writing to a partitioned datasource will require a more dynamic approach, with something similar to a …
This description makes sense, and I agree that we can't know the number of partitions during planning. I'll spend some more time thinking on this. Perhaps `FileSink` could consume a … Will have to think on this more... 🤔
@alamb I went with receivers of receivers rather than streams of streams; this approach is implemented here: #7791. So far, I am not trying to do Hive-style partitioning, but I think this PR sets it up to be much easier. A sketch of the pattern follows.
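(For readers unfamiliar with "receivers of receivers": the following is an illustrative sketch using tokio mpsc channels, not the actual #7791 code. A demux task routes each batch to a per-partition channel, announcing each newly created channel to the sink, so the sink consumes a receiver whose items are themselves receivers. `Batch`, `demux`, and the channel sizes are all hypothetical:)

```rust
use std::collections::HashMap;
use tokio::sync::mpsc::{channel, Receiver, Sender};

/// Stand-in for arrow's RecordBatch to keep the sketch self-contained;
/// the partition key would really be derived from the batch's columns.
#[derive(Debug)]
struct Batch {
    partition: String,
    rows: usize,
}

/// Routes each incoming batch to a per-partition channel, creating the
/// channel (and announcing it on `tx_of_rx`) the first time a partition
/// value is seen. The sink side therefore receives a "receiver of receivers".
async fn demux(mut input: Receiver<Batch>, tx_of_rx: Sender<Receiver<Batch>>) {
    let mut senders: HashMap<String, Sender<Batch>> = HashMap::new();
    while let Some(batch) = input.recv().await {
        let tx = match senders.get(&batch.partition) {
            Some(tx) => tx.clone(),
            None => {
                let (tx, rx) = channel(8);
                senders.insert(batch.partition.clone(), tx.clone());
                // Announce the new partition's stream to the sink.
                let _ = tx_of_rx.send(rx).await;
                tx
            }
        };
        let _ = tx.send(batch).await;
    }
}

#[tokio::main]
async fn main() {
    let (in_tx, in_rx) = channel(8);
    let (tx_of_rx, mut rx_of_rx) = channel(8);
    tokio::spawn(demux(in_rx, tx_of_rx));

    for p in ["a", "b", "a"] {
        in_tx.send(Batch { partition: p.into(), rows: 1 }).await.unwrap();
    }
    drop(in_tx); // close the input so demux terminates

    // The sink spawns one writer per inner receiver, as partitions appear.
    let mut writers = Vec::new();
    while let Some(mut rx) = rx_of_rx.recv().await {
        writers.push(tokio::spawn(async move {
            while let Some(batch) = rx.recv().await {
                println!("write {batch:?}");
            }
        }));
    }
    for w in writers {
        w.await.unwrap();
    }
}
```

The appeal of channels over nested streams here is that the set of partitions can grow dynamically at execution time, without the planner needing to know the partition count up front.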
Thank you @devinjdangelo -- #7791 looks great. I plan to check it out carefully tomorrow.
Is your feature request related to a problem or challenge?
It is currently unsupported to run an `INSERT INTO` query against a listing table that is partitioned by a column. See the sketch below for a concrete example.
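(Concretely, this is the kind of statement that fails today. A minimal sketch; the table name, schema, path, and exact DDL are illustrative and may vary by DataFusion version:)

```rust
use datafusion::error::Result;
use datafusion::prelude::SessionContext;

#[tokio::main]
async fn main() -> Result<()> {
    let ctx = SessionContext::new();

    // A listing table partitioned by `region` (illustrative path/schema).
    ctx.sql(
        "CREATE EXTERNAL TABLE sales (amount INT) \
         STORED AS CSV PARTITIONED BY (region) \
         LOCATION '/tmp/sales/'",
    )
    .await?;

    // The feature request: this INSERT currently errors because writing
    // to a column-partitioned listing table is not supported.
    ctx.sql("INSERT INTO sales VALUES (100, 'west')").await?;
    Ok(())
}
```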
Describe the solution you'd like
For 2, unless there is a slicker solution, `FileSink` could simply peek at each stream before initializing a writer; a sketch of this idea follows.
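(As an illustration of the "peek" idea, here is a hypothetical helper, not an existing DataFusion API: buffer the first item of a stream so the sink can choose a writer from it, then replay that item ahead of the rest of the stream so no data is lost. A real version would peek at a `RecordBatch`, which is cheap to clone since its columns are reference-counted:)

```rust
use futures::stream::{self, Stream, StreamExt};

/// Hypothetical peek helper: pull the first item off a stream, then return
/// a stream that yields that item followed by the remainder.
async fn peek_first<T, S>(mut s: S) -> (Option<T>, impl Stream<Item = T>)
where
    S: Stream<Item = T> + Unpin,
    T: Clone,
{
    let first = s.next().await;
    // An Option is an iterator of 0 or 1 items, so this replays the
    // peeked item (if any) before the untouched tail of the stream.
    let replay = stream::iter(first.clone());
    (first, replay.chain(s))
}

#[tokio::main]
async fn main() {
    let batches = stream::iter(vec!["part=a/b1", "part=a/b2"]);
    let (first, rest) = peek_first(batches).await;
    // Inspect `first` to decide which partition writer to open...
    println!("peeked: {first:?}");
    // ...then consume `rest`, which still includes the peeked item.
    let all: Vec<_> = rest.collect().await;
    println!("all: {all:?}");
}
```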
Describe alternatives you've considered
No response
Additional context
Progress on inserts to sorted tables may be relevant: https://github.com/apache/arrow-datafusion/pull/7743/files