Add two consumer benchmark #149

Merged (27 commits) on Mar 11, 2024

blindspotbounty (Collaborator) commented Nov 28, 2023

This PR contains:

  1. Utilities moved from KafkaTests to Kafka under an @_spi annotation
  2. A fix for some .finished state handling and the waitForNewMessages() call
  3. Two consumer benchmark tests, with automatic and manual commits ("SwiftKafkaConsumer...")
  4. Two benchmark tests using pure librdkafka, with the same automatic and manual commits, for comparison ("librdkafka...")
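The difference between the two benchmark variants can be sketched with an in-memory stand-in for the consumer. This is a hypothetical model: `BenchmarkConsumer`, `InMemoryConsumer`, `CommitMode` and `runConsumerBenchmark` are illustrative names, not the swift-kafka-client API, and the automatic mode simply assumes the client commits offsets on its own in the background.

```swift
/// Hypothetical stand-in for a Kafka consumer (not the swift-kafka-client API).
protocol BenchmarkConsumer {
    associatedtype Message
    /// Returns the next message, or nil when nothing more will arrive.
    mutating func nextMessage() -> Message?
    /// Commits the offset of `message` explicitly.
    mutating func commit(_ message: Message)
}

enum CommitMode {
    /// The client is assumed to commit offsets periodically on its own.
    case automatic
    /// The benchmark commits every consumed message itself.
    case manual
}

/// Consumes up to `messageCount` messages and returns how many were consumed.
func runConsumerBenchmark<C: BenchmarkConsumer>(
    _ consumer: inout C,
    messageCount: Int,
    mode: CommitMode
) -> Int {
    var consumed = 0
    while consumed < messageCount, let message = consumer.nextMessage() {
        if mode == .manual {
            consumer.commit(message)
        }
        consumed += 1
    }
    return consumed
}

/// In-memory consumer that records explicit commits, for illustration only.
struct InMemoryConsumer: BenchmarkConsumer {
    var pending: [Int]
    var committedOffsets: [Int] = []

    mutating func nextMessage() -> Int? {
        pending.isEmpty ? nil : pending.removeFirst()
    }

    mutating func commit(_ message: Int) {
        committedOffsets.append(message)
    }
}

// Same 1000-message workload as the benchmarks, in both modes.
var manualConsumer = InMemoryConsumer(pending: Array(0..<1000))
let manualCount = runConsumerBenchmark(&manualConsumer, messageCount: 1000, mode: .manual)

var automaticConsumer = InMemoryConsumer(pending: Array(0..<1000))
let automaticCount = runConsumerBenchmark(&automaticConsumer, messageCount: 1000, mode: .automatic)

print("manual: \(manualConsumer.committedOffsets.count) explicit commits")
print("automatic: \(automaticConsumer.committedOffsets.count) explicit commits")
```

In manual mode every message adds an explicit commit call, which is one plausible contributor to the much longer wall-clock times in the offset-commit tables; the tables themselves do not isolate that cause.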

The following are the baseline results (1000 messages):

Host 'xxx-MacBook-Pro.local' with 12 'arm64' processors with 96 GB memory, running:
Darwin Kernel Version 23.1.0: Mon Oct  9 21:28:45 PDT 2023; root:xnu-10002.41.9~6/RELEASE_ARM64_T6020

============================
SwiftKafkaConsumerBenchmarks
============================

SwiftKafkaConsumer - basic consumer (messages: 1000)
╒══════════════════════════════════╤═════════╤═════════╤═════════╤═════════╤═════════╤═════════╤═════════╤═════════╕
│ Metric                           │      p0 │     p25 │     p50 │     p75 │     p90 │     p99 │    p100 │ Samples │
╞══════════════════════════════════╪═════════╪═════════╪═════════╪═════════╪═════════╪═════════╪═════════╪═════════╡
│ (Alloc + Retain) - Release Δ     │    2056 │    2057 │    2061 │    2063 │    2063 │    2063 │    2063 │       3 │
├──────────────────────────────────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┤
│ Context switches                 │     168 │     168 │     170 │     174 │     174 │     174 │     174 │       3 │
├──────────────────────────────────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┤
│ Memory (allocated resident) (M)  │      21 │      21 │      21 │      22 │      22 │      22 │      22 │       3 │
├──────────────────────────────────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┤
│ Object allocs                    │    4109 │    4111 │    4115 │    4117 │    4117 │    4117 │    4117 │       3 │
├──────────────────────────────────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┤
│ Releases (K)                     │      24 │      24 │      24 │      24 │      24 │      24 │      24 │       3 │
├──────────────────────────────────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┤
│ Retains (K)                      │      18 │      18 │      18 │      18 │      18 │      18 │      18 │       3 │
├──────────────────────────────────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┤
│ Throughput (# / s)               │       1 │       1 │       1 │       1 │       1 │       1 │       1 │       3 │
├──────────────────────────────────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┤
│ Time (total CPU) (ms)            │      19 │      19 │      19 │      20 │      20 │      20 │      20 │       3 │
├──────────────────────────────────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┤
│ Time (wall clock) (ms)           │    1564 │    1564 │    1574 │    1677 │    1677 │    1677 │    1677 │       3 │
╘══════════════════════════════════╧═════════╧═════════╧═════════╧═════════╧═════════╧═════════╧═════════╧═════════╛

SwiftKafkaConsumer - with offset commit (messages: 1000)
╒══════════════════════════════════╤═════════╤═════════╤═════════╤═════════╤═════════╤═════════╤═════════╤═════════╕
│ Metric                           │      p0 │     p25 │     p50 │     p75 │     p90 │     p99 │    p100 │ Samples │
╞══════════════════════════════════╪═════════╪═════════╪═════════╪═════════╪═════════╪═════════╪═════════╪═════════╡
│ (Alloc + Retain) - Release Δ     │    5039 │    5039 │    5039 │    5039 │    5039 │    5039 │    5039 │       1 │
├──────────────────────────────────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┤
│ Context switches (K)             │      10 │      10 │      10 │      10 │      10 │      10 │      10 │       1 │
├──────────────────────────────────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┤
│ Memory (allocated resident) (M)  │      22 │      22 │      22 │      22 │      22 │      22 │      22 │       1 │
├──────────────────────────────────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┤
│ Object allocs (K)                │      13 │      13 │      13 │      13 │      13 │      13 │      13 │       1 │
├──────────────────────────────────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┤
│ Releases (K)                     │      47 │      47 │      47 │      47 │      47 │      47 │      47 │       1 │
├──────────────────────────────────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┤
│ Retains (K)                      │      29 │      29 │      29 │      29 │      29 │      29 │      29 │       1 │
├──────────────────────────────────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┤
│ Time (total CPU) (ms)            │     662 │     662 │     662 │     662 │     662 │     662 │     662 │       1 │
├──────────────────────────────────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┤
│ Time (wall clock) (s)            │     104 │     104 │     104 │     104 │     104 │     104 │     104 │       1 │
╘══════════════════════════════════╧═════════╧═════════╧═════════╧═════════╧═════════╧═════════╧═════════╧═════════╛

librdkafka - basic consumer (messages: 1000)
╒══════════════════════════════════╤═════════╤═════════╤═════════╤═════════╤═════════╤═════════╤═════════╤═════════╕
│ Metric                           │      p0 │     p25 │     p50 │     p75 │     p90 │     p99 │    p100 │ Samples │
╞══════════════════════════════════╪═════════╪═════════╪═════════╪═════════╪═════════╪═════════╪═════════╪═════════╡
│ (Alloc + Retain) - Release Δ     │       0 │       0 │       0 │       0 │       0 │       0 │       0 │       9 │
├──────────────────────────────────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┤
│ Context switches                 │     118 │     127 │     133 │     135 │     149 │     149 │     149 │       9 │
├──────────────────────────────────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┤
│ Memory (allocated resident) (M)  │      21 │      32 │      44 │      55 │      64 │      64 │      64 │       9 │
├──────────────────────────────────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┤
│ Object allocs                    │       0 │       0 │       0 │       0 │       0 │       0 │       0 │       9 │
├──────────────────────────────────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┤
│ Releases                         │       0 │       0 │       0 │       0 │       0 │       0 │       0 │       9 │
├──────────────────────────────────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┤
│ Retains                          │       0 │       0 │       0 │       0 │       0 │       0 │       0 │       9 │
├──────────────────────────────────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┤
│ Throughput (# / s)               │       2 │       2 │       2 │       2 │       2 │       2 │       2 │       9 │
├──────────────────────────────────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┤
│ Time (total CPU) (μs)            │    4571 │    6066 │    6684 │    7806 │    8607 │    8607 │    8607 │       9 │
├──────────────────────────────────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┤
│ Time (wall clock) (ms)           │     611 │     614 │     619 │     624 │     626 │     626 │     626 │       9 │
╘══════════════════════════════════╧═════════╧═════════╧═════════╧═════════╧═════════╧═════════╧═════════╧═════════╛

librdkafka - with offset commit (messages: 1000)
╒══════════════════════════════════╤═════════╤═════════╤═════════╤═════════╤═════════╤═════════╤═════════╤═════════╕
│ Metric                           │      p0 │     p25 │     p50 │     p75 │     p90 │     p99 │    p100 │ Samples │
╞══════════════════════════════════╪═════════╪═════════╪═════════╪═════════╪═════════╪═════════╪═════════╪═════════╡
│ (Alloc + Retain) - Release Δ     │       0 │       0 │       0 │       0 │       0 │       0 │       0 │       6 │
├──────────────────────────────────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┤
│ Context switches                 │    5118 │    5123 │    5127 │    5139 │    5147 │    5147 │    5147 │       6 │
├──────────────────────────────────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┤
│ Memory (allocated resident) (M)  │      20 │      26 │      32 │      43 │      47 │      47 │      47 │       6 │
├──────────────────────────────────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┤
│ Object allocs                    │       0 │       0 │       0 │       0 │       0 │       0 │       0 │       6 │
├──────────────────────────────────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┤
│ Releases                         │       0 │       0 │       0 │       0 │       0 │       0 │       0 │       6 │
├──────────────────────────────────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┤
│ Retains                          │       0 │       0 │       0 │       0 │       0 │       0 │       0 │       6 │
├──────────────────────────────────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┤
│ Throughput (# / s)               │       1 │       1 │       1 │       1 │       1 │       1 │       1 │       6 │
├──────────────────────────────────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┤
│ Time (total CPU) (ms)            │      73 │      74 │      76 │      77 │      80 │      80 │      80 │       6 │
├──────────────────────────────────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┤
│ Time (wall clock) (ms)           │     980 │     983 │     987 │     995 │    1012 │    1012 │    1012 │       6 │
╘══════════════════════════════════╧═════════╧═════════╧═════════╧═════════╧═════════╧═════════╧═════════╧═════════╛
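To put the p50 wall-clock values above side by side, the ratios can be computed directly. The numbers are copied from the tables; the SwiftKafkaConsumer offset-commit run has a single sample reported as 104 s, so that ratio is rough. `WallClockP50` is just an illustrative helper.

```swift
/// p50 wall-clock times in milliseconds, copied from the tables above.
struct WallClockP50 {
    let scenario: String
    let swiftKafkaMs: Double
    let librdkafkaMs: Double
    let messages: Double

    /// How many times slower SwiftKafkaConsumer is than plain librdkafka.
    var slowdown: Double { swiftKafkaMs / librdkafkaMs }

    /// Average wall-clock time per message for each implementation.
    var swiftKafkaMsPerMessage: Double { swiftKafkaMs / messages }
    var librdkafkaMsPerMessage: Double { librdkafkaMs / messages }
}

let basic = WallClockP50(
    scenario: "basic consumer", swiftKafkaMs: 1574, librdkafkaMs: 619, messages: 1000)
// 104 s from a single sample, converted to milliseconds.
let withCommit = WallClockP50(
    scenario: "with offset commit", swiftKafkaMs: 104_000, librdkafkaMs: 987, messages: 1000)

for row in [basic, withCommit] {
    print("\(row.scenario): \(row.slowdown)x slower, "
        + "\(row.swiftKafkaMsPerMessage) ms vs \(row.librdkafkaMsPerMessage) ms per message")
}
```

At p50 this works out to roughly 2.5x for the basic consumer (about 1.6 ms versus 0.6 ms per message) and roughly 105x with offset commits (about 104 ms versus 1 ms per message).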

blindspotbounty marked this pull request as ready for review on November 28, 2023 17:59
blindspotbounty (Collaborator, Author) commented:

The previous CI run failed with a timeout, but the log gives little information beyond the following:

20:27:28 Build timed out (after 10 minutes). Marking the build as failed.

I've disabled all tests but one to check that it works correctly, and reduced pollInterval from the default (100 ms) to 10 ms.
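Why a shorter pollInterval can shorten runs that wait on new messages can be shown with a toy model. It assumes (a simplification, not the client's actual scheduling) that the loop polls at fixed multiples of the interval and that a message is picked up by the first poll at or after its arrival; `totalPickupDelayMs` and the arrival times are hypothetical.

```swift
/// Toy model: polls happen at t = 0, interval, 2 * interval, ... and a message
/// arriving at time `arrival` waits until the next poll at or after it.
func totalPickupDelayMs(arrivalsMs: [Int], pollIntervalMs: Int) -> Int {
    precondition(pollIntervalMs > 0, "poll interval must be positive")
    return arrivalsMs.reduce(0) { total, arrival in
        precondition(arrival >= 0, "arrival times must be non-negative")
        // Round the arrival time up to the next multiple of the poll interval.
        let pickup = (arrival + pollIntervalMs - 1) / pollIntervalMs * pollIntervalMs
        return total + (pickup - arrival)
    }
}

// Hypothetical arrival times (ms since the loop started).
let arrivals = [5, 150, 230]
let delayAt100ms = totalPickupDelayMs(arrivalsMs: arrivals, pollIntervalMs: 100) // 215
let delayAt10ms = totalPickupDelayMs(arrivalsMs: arrivals, pollIntervalMs: 10)   // 5
print("100 ms interval: \(delayAt100ms) ms waiting, 10 ms interval: \(delayAt10ms) ms waiting")
```

The trade-off is more wake-ups (and CPU time) while the queue is empty, so a smaller interval is a tuning choice rather than a free win.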

blindspotbounty (Collaborator, Author) commented:

@FranzBusch, @felixschlegel, just a gentle reminder.

blindspotbounty (Collaborator, Author) commented:

Just to note: I could not adjust the thresholds and absolute values so that they adequately fit the CI results. Depending on the machine's hardware or its load, results can be up to several times faster or slower.
For example, memory usage depends on how fast the CPU is.
So, I suggest disabling these benchmark tests in CI for now and running them locally. Once the code performs more stably, or once we can run main and the PR branch within a single CI run, I will try to enable them in CI as well.
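Running main and the PR branch in the same CI job would let the check compare relative changes instead of absolute values recorded on a different machine. A minimal sketch, assuming lower-is-better metrics and the 20% tolerance that appears in this PR's commit history; `MetricComparison` and the sample numbers are hypothetical.

```swift
/// One metric measured on both branches within the same CI run.
struct MetricComparison {
    let metric: String
    let baseline: Double   // measured on main
    let candidate: Double  // measured on the PR branch

    /// Relative change versus main; 0.25 means 25% higher than baseline.
    var relativeChange: Double {
        precondition(baseline > 0, "baseline must be positive")
        return (candidate - baseline) / baseline
    }

    /// Flags a regression for lower-is-better metrics (time, allocations).
    /// Higher-is-better metrics such as throughput would need the sign flipped.
    func isRegression(tolerance: Double) -> Bool {
        relativeChange > tolerance
    }
}

let tolerance = 0.20
let results = [
    MetricComparison(metric: "Object allocs", baseline: 1000, candidate: 1150),
    MetricComparison(metric: "Time (wall clock) (ms)", baseline: 1000, candidate: 1300),
]
let regressions = results.filter { $0.isRegression(tolerance: tolerance) }.map(\.metric)
print("Regressions: \(regressions)")
```

Because both branches share the same machine and load, hardware differences largely cancel out, which is the point of the suggestion above.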

hassila commented Mar 4, 2024

I think that, unless we have a dedicated CI runner, CI should only check deterministic metrics such as mallocs or syscalls. Other metrics can then be used for local tests.

blindspotbounty (Collaborator, Author) commented:

Hi @hassila

Unfortunately, mallocs also depend on the CPU, in a tricky way. A more powerful (or less loaded) machine leads to shorter queues in librdkafka/AsyncStream, and therefore to fewer mallocs and lower memory consumption (e.g. librdkafka seems to re-use fewer messages).

A less powerful machine (or one loaded with other tasks) leads to more elements in the queue. It is still not many elements, but the relative difference is tens of percent.
It would be nice if we could find a good way to fine-tune librdkafka and swift-kafka for this lab test.

Unfortunately, several days of trying (in the background) to tune the benchmark results, librdkafka, and swift-kafka brought no luck. Let me know if you have any particular ideas for adjusting the test in CI; I am happy to try them.
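The queue-depth effect described above can be illustrated with a toy model in which message buffers come from a pool that reuses freed buffers, so the number of allocations equals the peak queue depth. This is a simplification for illustration, not a description of how librdkafka or AsyncStream actually manage memory; `simulateAllocations` and its per-tick rates are hypothetical.

```swift
/// Toy model: each tick the producer enqueues up to `producedPerTick` messages
/// and the consumer dequeues up to `consumedPerTick`. A dequeued message frees
/// its buffer for reuse; a new buffer is allocated only when none is free.
func simulateAllocations(totalMessages: Int, producedPerTick: Int, consumedPerTick: Int) -> Int {
    precondition(totalMessages >= 0 && producedPerTick > 0 && consumedPerTick > 0)
    var produced = 0, consumed = 0
    var queued = 0, freeBuffers = 0, allocations = 0

    while consumed < totalMessages {
        // Producer side: reuse a free buffer if possible, otherwise allocate.
        let batch = min(producedPerTick, totalMessages - produced)
        for _ in 0..<batch {
            if freeBuffers > 0 { freeBuffers -= 1 } else { allocations += 1 }
            queued += 1
        }
        produced += batch

        // Consumer side: drain what it can and return the buffers to the pool.
        let drained = min(consumedPerTick, queued)
        queued -= drained
        freeBuffers += drained
        consumed += drained
    }
    return allocations
}

// Same producer rate; the "slower machine" drains half as many messages per tick.
let fastMachine = simulateAllocations(totalMessages: 1000, producedPerTick: 10, consumedPerTick: 10)
let slowMachine = simulateAllocations(totalMessages: 1000, producedPerTick: 10, consumedPerTick: 5)
print("allocations: fast consumer \(fastMachine), slow consumer \(slowMachine)")
```

The slower the consumer relative to the producer, the deeper the queue and the more allocations, even though the workload is identical.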

@blindspotbounty blindspotbounty merged commit 5ebb47a into swift-server:main Mar 11, 2024
4 of 6 checks passed
blindspotbounty added a commit to ordo-one/swift-kafka-client that referenced this pull request Aug 6, 2024
* Feature: expose librdkafka statistics as swift metrics (swift-server#92)

* introduce statistics for producer

* add statistics to new consumer with events

* fix some artefacts

* adjust to KeyRefreshAttempts

* draft: statistics with metrics

* make structures internal

* Update Sources/Kafka/Configuration/KafkaConfiguration+Metrics.swift

Co-authored-by: Felix Schlegel <fefefe152@gmail.com>

* Update Sources/Kafka/Configuration/KafkaConsumerConfiguration.swift

Co-authored-by: Felix Schlegel <fefefe152@gmail.com>

* Update Sources/Kafka/Configuration/KafkaConfiguration+Metrics.swift

Co-authored-by: Felix Schlegel <fefefe152@gmail.com>

* Update Sources/Kafka/Configuration/KafkaConfiguration+Metrics.swift

Co-authored-by: Felix Schlegel <fefefe152@gmail.com>

* address review comments

* formatting

* map gauges in one place

* move json mode as rd kafka statistics, misc renaming + docc

* address review comments

* remove import Metrics

* divide producer/consumer configuration

* apply swiftformat

* fix code after conflicts

* fix formatting

---------

Co-authored-by: Felix Schlegel <fefefe152@gmail.com>

* Add benchmark infrastructure without actual tests (swift-server#146)

* add benchmark infrastructure without actual test

* apply swiftformat

* fix header in sh file

* use new async seq methods

* Update to latest librdkafka & add a define for RAND_priv_bytes (swift-server#148)

Co-authored-by: Franz Busch <f.busch@apple.com>

* exit from consumer batch loop when no more messages left (swift-server#153)

* Lower requirements for consumer state machine (swift-server#154)

* lower requirements for kafka consumer

* add twin test for kafka producer

* defer source.finish (swift-server#157)

* Add two consumer benchmark (swift-server#149)

* benchmark for consumer

* attempt to speed up benchmarks

* check CI works for one test

* enable one more test

* try to lower poll interval

* adjust max duration of test

* remain only manual commit test

* check if commit is the reason for test delays

* try all with schedule commit

* revert max test time to 5 seconds

* dockerfiles

* test set thresholds

* create dummy thresholds from ci results

* disable benchmark in CI

* add header

* add stable metrics

* update thresholds to stable metrics only

* try use '1' instead of 'true'

* adjust thresholds to CI results (as temporary measure)

* set 20% threshold..

* move arc to unstable metrics

* try use 'true' in quotes for CI

* try reduce number of messages for more reliable results

* try upgrade bench

* disable benchmark in CI

* Update librdkafka for BoringSSL (swift-server#162)

* chore(patch): [sc-8379] use returned error (swift-server#163)

* [producer message] Allow optional key for initializer (swift-server#164)

Co-authored-by: Harish Yerra <hyerra@apple.com>

* Allow groupID to be specified when assigning partition (swift-server#161)

* Allow groupID to be specified when assigning partition

Motivation:

A Consumer Group can provide a lot of benefits even if the
dynamic loadbalancing features are not used.

Modifications:

Allow for an optional GroupID when creating a partition
consumer.

Result:

Consumer Groups can now be used when manual assignment is
used.

* fix format

---------

Co-authored-by: Ómar Kjartan Yasin <omarkj@apple.com>
Co-authored-by: blindspotbounty <127803250+blindspotbounty@users.noreply.github.com>
Co-authored-by: Franz Busch <f.busch@apple.com>

* Wrap rd_kafka_consumer_poll into iterator (use librdkafka embedded backpressure) (swift-server#158)

* remove message sequence

* test consumer with implicit rebalance

* misc + format

* remove artefact

* don't check a lot of messages

* fix typo

* slow down first consumer to lower message to fit CI timeout

* remove helpers

* use exact benchmark version to avoid missing thresholds error (as no thresholds so far)

* add deprecated marks for backpressure, change comment for future dev

* address comments

---------

Co-authored-by: Felix Schlegel <fefefe152@gmail.com>
Co-authored-by: Axel Andersson <axel@ordo.one>
Co-authored-by: Franz Busch <f.busch@apple.com>
Co-authored-by: Samuel M <samuel.mn77@yahoo.com>
Co-authored-by: Harish Yerra <hyerra@gmail.com>
Co-authored-by: Harish Yerra <hyerra@apple.com>
Co-authored-by: Omar Yasin <omarkj@gmail.com>
Co-authored-by: Ómar Kjartan Yasin <omarkj@apple.com>
Labels: none yet · Projects: none yet · 3 participants