Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Work towards Batcher unification #553

Merged

Conversation

frankmcsherry
Copy link
Member

@frankmcsherry frankmcsherry commented Dec 8, 2024

The MergeBatcher implementation is shared across multiple implementations, that re-do fairly similar work. This PR attempts to extract their differences into compact support traits they can implement, largely about how to read from and write into containers, while in the process of merging.

The PR is meant to be largely behavior preserving, with some light glitches here and there around the elegance with which one can manipulate the general objects. I'm 100% up for discussing those as part of review, and either shipping or prioritizing any fixes for them. Examples include: 1. dropping some containers in the act of merging (the final containers in a chain), 2. calling unwrap() far too often, rather than destructuring.

No hurry to merge, but did want to put this in place so we can start looking at it.

Although the PR reads as fairly LOC positive, several hundred are a new columnar.rs example meant to exercise the non-privileged access to these traits and types. Also, the PR does not yet git rm the columnation merger which is now dead code, nor is it yet able to get the flat container merger to build (???) so it cannot be removed.

Copy link
Member

@antiguru antiguru left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we can land this change, but there's one comment around pre-sizing chunks. At the moment, we can use a different capacity than the one from Timely, with this change we'd fall back to Timely's ensure_capacity, which may or may not be what we want here. I think this can be mitigated by using a newtype pattern.

src/trace/implementations/merge_batcher.rs Outdated Show resolved Hide resolved
examples/columnar.rs Outdated Show resolved Hide resolved
examples/columnar.rs Outdated Show resolved Hide resolved
src/trace/implementations/merge_batcher.rs Show resolved Hide resolved
Co-authored-by: Petros Angelatos <petrosagg@gmail.com>
@frankmcsherry
Copy link
Member Author

Performance-wise, this PR seems to maintain parity with master, and potentially improve slightly (though, it could be noise). On this branch,

Running `target/release/examples/spines 1000000 1000 old -w4`
Running ["old"] arrangement
3.868985583s    loading complete
...
7.910668916s    queries complete
8.129606833s    shut down

Running `target/release/examples/spines 1000000 1000 new -w4`
Running ["new"] arrangement
1.49757225s     loading complete
...
3.173547834s    queries complete
3.176058584s    shut down

Running `target/release/examples/columnar 1000000 1000 -w4`
1.260987s       loading complete
...
2.485150291s    queries complete
2.486176791s    shut down

vs on master (examples/columnar does not exist)

Running `target/release/examples/spines 1000000 1000 old -w4`
Running ["old"] arrangement
4.156000333s    loading complete
...
8.666894917s    queries complete
8.894178375s    shut down

Running `target/release/examples/spines 1000000 1000 new -w4`
Running ["new"] arrangement
1.5104135s      loading complete
...
3.193643833s    queries complete
3.195464916s    shut down

In discussion with @antiguru we've concluded that it should be ok despite a few quirks that we think we can dial in, around buffer sizing and other nits. We'll make sure that we assess the performance on MZ, and "fail forward" from this work.

@frankmcsherry frankmcsherry merged commit 5976a2f into TimelyDataflow:master Dec 20, 2024
7 checks passed
@frankmcsherry frankmcsherry deleted the batcher_unification branch December 20, 2024 21:37
@github-actions github-actions bot mentioned this pull request Dec 20, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants