KafkaConsumer runs 8x slower since v1.4.5 #1888

Closed
jutley opened this issue Aug 22, 2019 · 3 comments


@jutley

jutley commented Aug 22, 2019

My org uses the prometheus-kafka-consumer-group-exporter project, which depends on this library. All this project does is read through the __consumer_offsets topic and generate metrics about the consumer groups. The latest version fails to read through the topic quickly enough to keep up.

With some digging, I found that this performance change started at this commit: 8c07925. I verified this with an experiment where I ran the following script against this commit, the commit preceding it (7a99013), and master:

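import datetime

import schedule
from kafka import KafkaConsumer

consumer_config = {
    'bootstrap_servers': <redacted>,
    'auto_offset_reset': 'earliest',  # start from the beginning of the topic
    'group_id': None                  # no consumer group, so no offset commits
}

consumer = KafkaConsumer(
    '__consumer_offsets',
    **consumer_config
)

# Running count of messages consumed so far.
iterations = 0

def print_status():
    # Print a timestamp and the message count so throughput can be read off.
    print(datetime.datetime.now(), iterations)

# Report progress every 5 seconds.
schedule.every(5).seconds.do(print_status)

while True:
    for message in consumer:
        iterations += 1
        # Fire print_status() if its 5-second interval has elapsed.
        schedule.run_pending()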

I ran three 2 minute trials against each commit to test the throughput of the consumer. Here are the results:
[image: trial results for each commit]

As you can see, this commit caused the consumer to run 8 times slower (2,426,802 messages vs. 300,187 messages). This has not improved since then.
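For reference, the 8x figure follows directly from the two message counts quoted above. The sketch below also converts them to a rate, assuming each count covers a single 2-minute trial and that the larger count belongs to the commit before the change; neither assumption is stated explicitly in the report.

```python
# Message counts quoted in the report.
fast_count = 2_426_802  # assumed: commit 7a99013, before the change
slow_count = 300_187    # assumed: commit 8c07925, after the change

# Assumption: each count is for one 2-minute trial.
trial_seconds = 120

slowdown = fast_count / slow_count
fast_rate = fast_count / trial_seconds
slow_rate = slow_count / trial_seconds

print(f"slowdown: {slowdown:.1f}x")
print(f"throughput: {fast_rate:,.0f} msg/s vs. {slow_rate:,.0f} msg/s")
```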

I do not understand the details around this commit, but it has rendered this project unusable on the latest version of kafka-python.

@dpkp
Owner

dpkp commented Sep 3, 2019

Hey, thanks for taking the time to investigate and run performance tests! We have some rudimentary benchmarking scripts in benchmarks/ that confirm your results. I'll spend some time investigating and see if we can get this improved. I expect some performance regression, but not 8x. My hunch is that the move towards more decoupled send/receive is pushing into the legacy blocking-send code and causing us to be IO-bound where we should be (and have traditionally been) CPU-bound. Indeed, a very quick flamegraph comparison suggests that the current master branch is IO-bound and spends most of its time blocked in _send_bytes_blocking, while 1.4.4 is CPU-bound on decoding + processing messages.
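A cruder way than a flamegraph to tell IO-bound from CPU-bound code is to compare CPU time against wall-clock time for the same piece of work. The sketch below is not the method used above; `cpu_fraction`, `blocked_work`, and `busy_work` are hypothetical names, and the two workloads are stand-ins for waiting on a socket and decoding messages.

```python
import time

def cpu_fraction(work):
    """Run work() and return (process CPU time) / (wall-clock time)."""
    wall_start = time.perf_counter()
    cpu_start = time.process_time()
    work()
    cpu_used = time.process_time() - cpu_start
    wall_used = time.perf_counter() - wall_start
    return cpu_used / wall_used if wall_used > 0 else 0.0

def blocked_work():
    # Stand-in for blocking on network IO: sleeping uses almost no CPU.
    time.sleep(0.2)

def busy_work():
    # Stand-in for decoding and processing messages: pure computation.
    sum(i * i for i in range(1_000_000))

# A fraction near 0 suggests the code is mostly waiting (IO-bound);
# a fraction near 1 suggests it is mostly computing (CPU-bound).
print("blocked:", round(cpu_fraction(blocked_work), 2))
print("busy:   ", round(cpu_fraction(busy_work), 2))
```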

FWIW, the referenced commit is one of many aimed at fixing deadlock failures and other concurrency related issues and unfortunately there are no plans to revert. The deadlocking issues are serious and are my highest priority in the short term. But this is also very important and hopefully I can put some time into this as well. Thanks again!

This was referenced Sep 16, 2019
@dpkp
Owner

dpkp commented Sep 29, 2019

I believe this is fixed by #1902 -- thanks for the detailed writeup!

As of 1.4.7 (and current master), the KafkaConsumer iterator will no longer use a custom implementation and just wrap consumer.poll(). This will simplify the internals and also should end up increasing performance a bit from the 1.4.4 numbers you measured.
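To illustrate what "wrapping consumer.poll()" means, here is a minimal sketch of an iterator built on top of poll(). It is not the library's actual implementation. It relies only on poll() returning a dict that maps partitions to lists of records; `iterate_via_poll`, `max_batches`, and `FakeConsumer` are hypothetical names introduced so the example runs without a broker.

```python
def iterate_via_poll(consumer, timeout_ms=1000, max_batches=None):
    """Yield records one at a time by repeatedly calling consumer.poll().

    max_batches is a hypothetical knob that stops after that many poll()
    calls; None means loop forever, like a normal consumer iterator.
    """
    batches = 0
    while max_batches is None or batches < max_batches:
        # poll() returns {partition: [records]}, or an empty dict on timeout.
        records_by_partition = consumer.poll(timeout_ms=timeout_ms)
        batches += 1
        for records in records_by_partition.values():
            for record in records:
                yield record

class FakeConsumer:
    """Hypothetical stand-in for KafkaConsumer; serves canned poll() results."""

    def __init__(self, batches):
        self._batches = list(batches)

    def poll(self, timeout_ms=0):
        return self._batches.pop(0) if self._batches else {}

fake = FakeConsumer([{"p0": ["a", "b"], "p1": ["c"]}, {}, {"p0": ["d"]}])
print(list(iterate_via_poll(fake, max_batches=3)))  # ['a', 'b', 'c', 'd']
```

With a real KafkaConsumer, the same function would be called without max_batches and would keep polling indefinitely.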

If you have time and the inclination, I would love to see results from your tests against the upcoming 1.4.7 release as well.

dpkp closed this as completed Sep 29, 2019
@jutley
Author

jutley commented Oct 2, 2019

This is fantastic! I don't quite have the time for tests as rigorous as my initial ones, but I did some basic ones:

7a99013: 19,000 msg/s  # the fast but deadlocking commit
8c07925:  5,175 msg/s  # the slow but safe commit
0552b04: 24,380 msg/s  # 1.4.7

I'm not sure why I'm only seeing about a 4x decrease with the "breaking" commit, but the good news is that 1.4.7 seems to be the most performant of all of them!
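The ratios behind those statements can be checked from the three rates listed above; this snippet only does the arithmetic on the quoted figures.

```python
# Rates quoted above, in messages per second.
rates = {
    "7a99013": 19_000,  # fast but deadlocking
    "8c07925": 5_175,   # slow but safe
    "0552b04": 24_380,  # 1.4.7
}

# How much slower the "breaking" commit is than the one before it.
slowdown = rates["7a99013"] / rates["8c07925"]

# How much faster 1.4.7 is than the original fast commit.
speedup = rates["0552b04"] / rates["7a99013"]

print(f"8c07925 is {slowdown:.2f}x slower than 7a99013")
print(f"0552b04 is {speedup:.2f}x faster than 7a99013")
```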
