This repository has been archived by the owner on Jun 21, 2024. It is now read-only.

capture: set kafka partitioner to murmur2_random #27

Merged 1 commit on Apr 25, 2024

Conversation

Contributor

@xvello xvello commented Apr 25, 2024

Working on PostHog/posthog#21850 showed a discrepancy in the message key -> partition computation between Python and Rust. The issue is that rdkafka's default partitioner uses CRC32, which is not supported in kafka-python. Let's align on Python's murmur2 partitioner (CPU overhead should be negligible).

Tested locally on a topic with 10 partitions: both producers now pick the same partition.
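
For illustration, here is a minimal sketch of the key -> partition computation that the `murmur2_random` partitioner is meant to match: the murmur2 variant used by the Kafka Java client, masked to a non-negative value and reduced modulo the partition count. The names `murmur2` and `partition_for` are hypothetical helpers written for this example, not code from this PR or from rdkafka. The sketch only covers non-null keys (librdkafka documents that `murmur2_random` assigns null keys randomly).

```rust
/// Murmur2 (32-bit) as used by the Kafka Java client for key hashing:
/// fixed seed 0x9747b28c, input consumed in little-endian 4-byte blocks.
/// Hypothetical helper for illustration only.
fn murmur2(data: &[u8]) -> u32 {
    const SEED: u32 = 0x9747_b28c;
    const M: u32 = 0x5bd1_e995;
    const R: u32 = 24;

    let mut h = SEED ^ (data.len() as u32);

    // Mix every complete 4-byte block.
    let mut chunks = data.chunks_exact(4);
    for chunk in &mut chunks {
        let mut k = u32::from_le_bytes([chunk[0], chunk[1], chunk[2], chunk[3]]);
        k = k.wrapping_mul(M);
        k ^= k >> R;
        k = k.wrapping_mul(M);
        h = h.wrapping_mul(M);
        h ^= k;
    }

    // Mix the 1 to 3 trailing bytes, if any.
    let tail = chunks.remainder();
    if tail.len() >= 3 {
        h ^= (tail[2] as u32) << 16;
    }
    if tail.len() >= 2 {
        h ^= (tail[1] as u32) << 8;
    }
    if !tail.is_empty() {
        h ^= tail[0] as u32;
        h = h.wrapping_mul(M);
    }

    // Final avalanche.
    h ^= h >> 13;
    h = h.wrapping_mul(M);
    h ^= h >> 15;
    h
}

/// Maps a non-null key to a partition: clear the sign bit, then take the
/// remainder. Panics if `num_partitions` is 0.
fn partition_for(key: &[u8], num_partitions: u32) -> u32 {
    (murmur2(key) & 0x7fff_ffff) % num_partitions
}
```

Calling `partition_for(b"some-key", 10)` returns a partition index in `0..10`. If both producers implement this computation, the same key should land on the same partition, which is the agreement this change is after.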

@xvello xvello requested a review from a team April 25, 2024 14:17
@tiina303 tiina303 left a comment

So this might shuffle keys around? We don't have noticeable lag atm, so it seems safe to ship.

@bretthoerner
Contributor

So this might shuffle keys around? We don't have noticeable lag atm, so it seems safe to ship.

It will, but it will shuffle them where they need to be anyway. :)

@xvello
Contributor Author

xvello commented Apr 25, 2024

@tiina303 both PRs will indeed break ordering for both producers: they move both onto a third, common partitioning scheme, where we expect them to agree on the partition.

I'll double-check processing lag before releasing both changes; as long as it's low, the data risk seems acceptable.

@xvello xvello merged commit 26a67e9 into main Apr 25, 2024
4 checks passed
@xvello xvello deleted the xvello/partitionner branch April 25, 2024 14:42