Fix slow kafka sink when queue.buffering.max.ms is set to > 0. #585

Merged
merged 4 commits into from
Nov 12, 2020

Conversation

mfelsche
Member

Pull request

Description

Our Kafka sink was not only enqueueing every Kafka record coming from an event, but also waiting for its delivery before returning. The setting queue.buffering.max.ms determines how long the rdkafka broker thread waits, after receiving a message in its queue, before it sends that message off to the broker. The goal of this wait is to batch messages for more efficient transport. But we handle each sink in its own task, and this one task is blocked until message N is delivered, so no other message can arrive before queue.buffering.max.ms expires. Ergo, we not only always waited for the full queue.buffering.max.ms (if set to > 0), we also never got any Kafka-internal batching and always sent out exactly 1 message to the broker at a time. Yes, indeed!

Now we only enqueue all messages coming from an event and then, in a separately spawned task, wait for their delivery in order to send out acks/fails.
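The difference between the two behaviours can be illustrated with a small std-only simulation. This is a sketch under stated assumptions, not the code from this PR and not the rdkafka API: `spawn_broker`, `send_blocking`, `send_enqueue_only`, `Enqueued` and `Stats` are hypothetical names, the "broker thread" is a plain thread that waits up to `linger` (standing in for queue.buffering.max.ms) after the first record before flushing a batch, and OS threads stand in for async tasks.

```rust
use std::sync::mpsc::{channel, Receiver, Sender};
use std::thread;
use std::time::{Duration, Instant};

/// One queued record plus the channel used to report its delivery.
struct Enqueued {
    id: usize,
    /// Receives (record id, size of the batch the record was sent in).
    delivered: Sender<(usize, usize)>,
}

/// What a run observed: the size of every batch and the number of acks.
#[derive(Debug)]
pub struct Stats {
    pub batch_sizes: Vec<usize>,
    pub acked: usize,
}

/// Simulated broker thread (an assumption, not librdkafka itself): after the
/// first record arrives it waits up to `linger` for more records, then
/// "sends" everything collected so far as one batch.
fn spawn_broker(rx: Receiver<Enqueued>, linger: Duration) -> thread::JoinHandle<Vec<usize>> {
    thread::spawn(move || {
        let mut batch_sizes = Vec::new();
        while let Ok(first) = rx.recv() {
            let mut batch = vec![first];
            let deadline = Instant::now() + linger;
            loop {
                let remaining = deadline.saturating_duration_since(Instant::now());
                match rx.recv_timeout(remaining) {
                    Ok(next) => batch.push(next),
                    Err(_) => break, // linger expired, or every sender is gone
                }
            }
            let size = batch.len();
            for record in &batch {
                let _ = record.delivered.send((record.id, size));
            }
            batch_sizes.push(size);
        }
        batch_sizes
    })
}

/// Old behaviour: enqueue one record, then block until it is delivered.
pub fn send_blocking(records: usize, linger: Duration) -> Stats {
    let (tx, rx) = channel();
    let broker = spawn_broker(rx, linger);
    let mut acked = 0;
    for id in 0..records {
        let (ack_tx, ack_rx) = channel();
        tx.send(Enqueued { id, delivered: ack_tx }).unwrap();
        // Nothing else can be enqueued while this task waits here.
        ack_rx.recv().unwrap();
        acked += 1;
    }
    drop(tx);
    Stats { batch_sizes: broker.join().unwrap(), acked }
}

/// New behaviour: enqueue everything, then await deliveries in another thread.
pub fn send_enqueue_only(records: usize, linger: Duration) -> Stats {
    let (tx, rx) = channel();
    let broker = spawn_broker(rx, linger);
    let mut pending = Vec::with_capacity(records);
    for id in 0..records {
        let (ack_tx, ack_rx) = channel();
        tx.send(Enqueued { id, delivered: ack_tx }).unwrap();
        pending.push(ack_rx);
    }
    drop(tx);
    // The waiting happens off the enqueueing path, so records can pile up
    // in the broker queue and be batched.
    let waiter = thread::spawn(move || {
        pending.into_iter().filter(|ack| ack.recv().is_ok()).count()
    });
    Stats { batch_sizes: broker.join().unwrap(), acked: waiter.join().unwrap() }
}
```

Calling `send_blocking(3, Duration::from_millis(50))` yields three batches of one record each and takes roughly three linger periods, which mirrors the slowdown described above. `send_enqueue_only` with the same arguments acks the same three records but lets them share a batch, typically a single one.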

Related

Checklist

  • The RFC, if required, has been submitted and approved
  • Any user-facing impact of the changes is reflected in docs.tremor.rs
  • The code is tested
  • Use of unsafe code is reasoned about in a comment

Also fix librdkafka not batching, whatever value queue.buffering.max.ms had.

Signed-off-by: Matthias Wahl <mwahl@wayfair.com>
@mfelsche added the bug, offramp, and performance labels Nov 12, 2020
Matthias Wahl added 2 commits November 12, 2020 23:11
…flight messages.

Licenser
Licenser previously approved these changes Nov 12, 2020
Member

@Licenser Licenser left a comment

LGTM 👍

@codecov

codecov bot commented Nov 12, 2020

Codecov Report

Merging #585 (0c47a39) into main (1eb38ea) will increase coverage by 0.03%.
The diff coverage is 89.58%.

@@            Coverage Diff             @@
##             main     #585      +/-   ##
==========================================
+ Coverage   80.51%   80.54%   +0.03%     
==========================================
  Files         101      101              
  Lines       13868    13916      +48     
==========================================
+ Hits        11166    11209      +43     
- Misses       2702     2707       +5     
Impacted Files Coverage Δ
tremor-pipeline/src/event.rs 92.02% <89.58%> (-1.02%) ⬇️

Δ = absolute <relative> (impact), ø = not affected, ? = missing data

@Licenser Licenser merged commit c807ef5 into main Nov 12, 2020
@Licenser Licenser deleted the kafka-sink-hang branch November 12, 2020 23:14
Development

Successfully merging this pull request may close these issues.

Investigate kafka egress issue