feat(execute): add method to parallelize aggregate transformations #4851

Merged
jsternberg merged 1 commit into master from feat/parallel-aggregates
Jun 13, 2022

Conversation

@jsternberg jsternberg commented Jun 8, 2022

Aggregate transformations can now be given multiple parents and will
work in parallel. To mark an aggregate transformation as
parallelizable, implement a `Merge()` method on the transformation.

A node only receives multiple parents when a planner rule rewrites the
plan so that a node which normally has one parent has several instead.
Aggregate transformations that implement this interface can then act as
merge nodes.

Done checklist

  • docs/SPEC.md updated
  • Test cases written

@jsternberg jsternberg requested a review from a team as a code owner June 8, 2022 19:48
@jsternberg jsternberg requested review from Marwes and removed request for a team June 8, 2022 19:48
@jsternberg jsternberg force-pushed the feat/parallel-aggregates branch from 1ee1fb6 to bc32180 Compare June 8, 2022 21:11
Comment on lines 253 to 263
done, err := t.mergeState(key, value)
if err != nil || !done {
	return err
}

// We are done. Retrieve the state.
var state interface{}
if merged, ok := t.d.Delete(key); ok {
	state = merged.(*aggregateParallelState).state
}
return t.computeFor(key, state)
Contributor

Nit: This looks almost identical to a section in flushKey. Maybe factor it out into a function?

Contributor Author

In the original version, these two sections didn't do the same thing. I've refactored it so that mergeState calls computeFor itself, which means the done return value isn't needed anymore.

tr, _, err := execute.NewAggregateParallelTransformation(
	executetest.RandomDatasetID(),
	parents,
	&mock.AggregateParallelTransformation{
Contributor

Nit: This mock.AggregateParallelTransformation{ seems to be the same in all(?) of the added tests. Could it be refactored into something reusable?

parents  map[DatasetID]*RandomAccessGroupLookup
finished int
err      error
mu       sync.RWMutex
Contributor

Would it be possible to make a separate parallelAggregateTransformation? Mixing parallel and sequential concerns in the methods here makes the code harder to read than necessary (IMO).

Contributor Author

Done. I think it's a bit cleaner now.

@jsternberg jsternberg force-pushed the feat/parallel-aggregates branch 2 times, most recently from 6744a09 to 1a7c988 Compare June 10, 2022 18:11
@jsternberg jsternberg force-pushed the feat/parallel-aggregates branch from 1a7c988 to 18ef575 Compare June 10, 2022 19:14
@jsternberg jsternberg requested a review from Marwes June 10, 2022 19:55
@jsternberg jsternberg merged commit a0f4b40 into master Jun 13, 2022
@jsternberg jsternberg deleted the feat/parallel-aggregates branch June 13, 2022 14:47