Morsel-Driven Parallelism Using Rayon #2199

tustvold · 2022-04-11T12:52:08Z

UPDATE June 2024: DataFusion does not use morsel-driven parallelism; instead, it uses Volcano-style pull execution with exchange operators.

Further details and analysis are available in https://dl.acm.org/doi/10.1145/3626246.3653368
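For readers unfamiliar with the terminology, the following is a rough, std-only sketch of what "pull + exchange" execution means. It is not DataFusion code: `Operator`, `Scan`, `Filter`, `Exchange`, and `count_even` are hypothetical names invented for illustration, and batches are plain `Vec<i64>` rather than Arrow record batches. Each operator is pulled by its consumer, and the exchange operator is the only place where parallelism is introduced, one thread per input partition.

```rust
use std::sync::mpsc;
use std::thread;

/// A pull-based (Volcano-style) operator: the consumer asks for the next batch.
trait Operator: Send {
    fn next_batch(&mut self) -> Option<Vec<i64>>;
}

/// Leaf operator that yields one partition's rows in fixed-size batches.
struct Scan {
    rows: Vec<i64>,
    pos: usize,
    batch_size: usize,
}

impl Operator for Scan {
    fn next_batch(&mut self) -> Option<Vec<i64>> {
        if self.pos >= self.rows.len() {
            return None;
        }
        let end = (self.pos + self.batch_size).min(self.rows.len());
        let batch = self.rows[self.pos..end].to_vec();
        self.pos = end;
        Some(batch)
    }
}

/// Keeps only the rows accepted by `pred`, pulling from its child on demand.
struct Filter {
    child: Box<dyn Operator>,
    pred: fn(i64) -> bool,
}

impl Operator for Filter {
    fn next_batch(&mut self) -> Option<Vec<i64>> {
        let batch = self.child.next_batch()?;
        Some(batch.into_iter().filter(|v| (self.pred)(*v)).collect())
    }
}

/// Exchange: drives each input partition on its own thread and merges the
/// resulting batches into a single pull-based output.
struct Exchange {
    rx: mpsc::Receiver<Vec<i64>>,
}

impl Exchange {
    fn new(inputs: Vec<Box<dyn Operator>>) -> Self {
        let (tx, rx) = mpsc::channel();
        for mut input in inputs {
            let tx = tx.clone();
            thread::spawn(move || {
                while let Some(batch) = input.next_batch() {
                    if tx.send(batch).is_err() {
                        break; // the consumer went away
                    }
                }
            });
        }
        // The original `tx` is dropped here, so `recv` fails once every
        // producer thread has finished, which signals end of stream.
        Exchange { rx }
    }
}

impl Operator for Exchange {
    fn next_batch(&mut self) -> Option<Vec<i64>> {
        self.rx.recv().ok()
    }
}

/// Builds one filtered scan per partition behind an exchange and drains the plan.
fn count_even(partitions: Vec<Vec<i64>>) -> usize {
    let inputs: Vec<Box<dyn Operator>> = partitions
        .into_iter()
        .map(|rows| {
            let scan = Scan { rows, pos: 0, batch_size: 1024 };
            Box::new(Filter { child: Box::new(scan), pred: |v| v % 2 == 0 }) as Box<dyn Operator>
        })
        .collect();
    let mut root = Exchange::new(inputs);
    let mut count = 0;
    while let Some(batch) = root.next_batch() {
        count += batch.len();
    }
    count
}
```

In this sketch the degree of parallelism is fixed by the number of partitions handed to `Exchange::new`, which is the coupling the proposal below set out to remove.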

This is a proposal to reformulate the parallelism story within DataFusion using a morsel-driven approach based on rayon. More details, background, and discussion can be found in the proposal document; please feel free to comment there.

The key highlights are:

- Decouples parallelism from the partitioning expressed in the physical plan, allowing for:
  - Better handling of imbalanced partitions
  - Adaptive parallelism based on compute availability at execution time
  - Parallelism within a partition, such as decoding parquet columns in parallel, parallel sort, etc.
- A first step toward reducing the complexity associated with the current futures-based concurrency model
- Improvements to thread-locality, observability, and performance
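The core idea behind the first highlight can be sketched in a few lines. This is a minimal, std-only illustration, not the proposed scheduler and not rayon: `morsel_sum` is a hypothetical function, and a shared atomic cursor stands in for a real work queue. Workers claim small fixed-size morsels until the input is exhausted, so the amount of work each thread does depends on how fast it runs rather than on how the data was partitioned.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::thread;

/// Sums `data` by letting `workers` threads repeatedly claim fixed-size
/// morsels from a shared cursor until the input is exhausted.
/// Returns the total and the number of morsels each worker processed.
fn morsel_sum(data: &[u64], workers: usize, morsel_size: usize) -> (u64, Vec<usize>) {
    assert!(workers > 0 && morsel_size > 0);
    // Index of the next unclaimed row.
    let cursor = AtomicUsize::new(0);

    thread::scope(|scope| {
        let handles: Vec<_> = (0..workers)
            .map(|_| {
                scope.spawn(|| {
                    let (mut sum, mut morsels) = (0u64, 0usize);
                    loop {
                        // Claim the next morsel; a worker that finishes early
                        // simply comes back for more.
                        let start = cursor.fetch_add(morsel_size, Ordering::Relaxed);
                        if start >= data.len() {
                            break;
                        }
                        let end = (start + morsel_size).min(data.len());
                        sum += data[start..end].iter().sum::<u64>();
                        morsels += 1;
                    }
                    (sum, morsels)
                })
            })
            .collect();

        // Combine the per-worker partial results.
        let mut total = 0u64;
        let mut per_worker = Vec::with_capacity(workers);
        for handle in handles {
            let (sum, morsels) = handle.join().expect("worker panicked");
            total += sum;
            per_worker.push(morsels);
        }
        (total, per_worker)
    })
}
```

Calling `morsel_sum(&data, 4, 256)` on a 10,000-element slice splits the input into 40 morsels. How those morsels are distributed across the four workers varies from run to run, which is the point: the split is decided at execution time, not at planning time.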

…ayon (#2199) (#2226)

- Morsel-driven Parallelism using rayon (#2199)
- Fix LIFO spawn ordering
- Further docs for ExecutionPipeline
- Deduplicate concurrent wakes
- Add license headers
- Sort Cargo.toml
- Revert accidental change to ParquetExec
- Handle wakeups triggered by other threads
- Use SeqCst memory ordering
- Review feedback
- Add panic handler
- Cleanup structs Add test of tokio interoperation
- Review feedback
- Use BatchPartitioner Cleanup error handling
- Clarify shutdown characteristics
- Fix racy test_panic
- Don't overload Query nomenclature
- Rename QueryResults to ExecutionResults
- Further review feedback
- Merge scheduler into datafusion/core
- Review feedback
- Fix partitioned execution
- Format
- Format Cargo.toml
- Fix doc link

JasonLi-cn · 2022-09-30T07:47:24Z

binary code

use datafusion::arrow::record_batch::RecordBatch;
use datafusion::arrow::util::pretty::print_batches;
use datafusion::error::Result;
use datafusion::prelude::*;
use datafusion::scheduler::Scheduler;
use futures::TryStreamExt;
use std::env;

#[tokio::main]
async fn main() -> Result<()> {
    let name = "test_table";
    let mut args = env::args();
    args.next();
    let table_path = args.next().expect("parquet file");
    let sql = &args.next().expect("sql");
    let using_scheduler = args.next().is_some();

    // create local session context
    let config = SessionConfig::new()
        .with_information_schema(true)
        .with_target_partitions(4);
    let context = SessionContext::with_config(config);

    // register parquet file with the execution context
    context
        .register_parquet(name, &table_path, ParquetReadOptions::default())
        .await?;

    // plan the query up front so the scheduler path can run the physical plan directly
    let task = context.task_ctx();
    let query = context.sql(sql).await?;
    let plan = query.create_physical_plan().await?;

    println!("Start query, using scheduler {}", using_scheduler);
    let now = std::time::Instant::now();
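    let results = if using_scheduler {
        // run the pre-built physical plan on the scheduler with 4 worker threads
        let scheduler = Scheduler::new(4);
        let stream = scheduler.schedule(plan, task)?.stream();
        let results: Vec<RecordBatch> = stream.try_collect().await?;
        results
    } else {
        // default execution path; note that it re-plans the query inside the timed region
        plan_and_collect(&context, sql).await?
    };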
    let elapsed = now.elapsed().as_millis();
    println!("End query, elapsed {} ms", elapsed);
    print_batches(&results)?;
    Ok(())
}

/// Plans and executes `sql`, collecting all result batches in memory
async fn plan_and_collect(
    context: &SessionContext,
    sql: &str,
) -> Result<Vec<RecordBatch>> {
    context.sql(sql).await?.collect().await
}

test data

format: parquet
number of files: 4
rows: 16405852 * 4 = 65623408
number of columns: 6
schema: uint32, uint32, uint32, uint32, string, uint32

test result

SQLs:

select count(distinct column0) from test_table;
select * from test_table order by column5 limit 10;

The performance is similar with and without the Scheduler! Is there a problem with how I am using it?

@tustvold

tustvold · 2022-09-30T08:38:05Z

Yes, that is expected. I've had to park work on this for a bit in favour of some other things. See #2504 for the follow-on work.

JasonLi-cn · 2022-09-30T09:26:08Z

> Yes that is expected, I've had to park working on this for a bit in favour of some other things. See #2504 for the follow on work

Ok, thanks!
By the way, do you know the difference between ClickHouse's query execution pipeline and DataFusion's execution model (which resembles a vectorized Volcano model)? And what are the advantages of ClickHouse's approach?

alamb · 2024-06-19T16:48:43Z

Updated description of this ticket to note DF doesn't use morsel driven parallelism, and added link to the paper https://dl.acm.org/doi/10.1145/3626246.3653368

tustvold added the enhancement New feature or request label Apr 11, 2022

xudong963 added the design label Apr 11, 2022

matthewmturner mentioned this issue Apr 11, 2022

Experiment with rust-s3 and caching datafusion-contrib/datafusion-objectstore-s3#53

Draft

This was referenced Apr 11, 2022

RFC: More Granular File Operators #2079

Closed

Support Non-Tokio Schedulers #2201

Closed

Render Single Line Description of ExecutionPlan #2216

Closed

tustvold added a commit to tustvold/arrow-datafusion that referenced this issue Apr 13, 2022

Morsel-driven Parallelism using rayon (apache#2199)

f8da884

tustvold mentioned this issue Apr 13, 2022

Introduce new optional scheduler, using Morsel-driven Parallelism + rayon (#2199) #2226

Merged

This was referenced Apr 21, 2022

Make ExecutionPlan Sync #2307

Closed

Fix CrossJoinExec evaluating during plan #2310

Merged

alamb closed this as completed in #2226 May 4, 2022

tustvold mentioned this issue May 10, 2022

[EPIC]: Morsel-Driven Scheduler IO #2504

Closed

This was referenced May 29, 2022

Support pluggable async executor in physical plan execution #2643

Closed

Support logical plan compilation #2648

Closed

alamb mentioned this issue Aug 15, 2022

Parallel fetching of column chunks when reading parquet files #2949

Closed

tustvold mentioned this issue Sep 7, 2024

error decoding response body after upgrade to object store 0.10 apache/arrow-rs#5882

Open

tustvold mentioned this issue Nov 23, 2024

Add example for using a separate threadpool for CPU bound work #13424

Closed

tustvold mentioned this issue Dec 9, 2024

Move CPU Bound Tasks off Tokio Threadpool #13692

Open
