Add two consumer benchmark #149

blindspotbounty · 2023-11-28T17:44:59Z

This PR contains:

Moved utilities from KafkaTests to Kafka with @_spi annotation
Fix for some .finished state and waitForNewMessages() call
Two consumer tests, with automatic and manual commits ("SwiftKafkaConsumer...")
Two tests using pure librdkafka with the same automatic and manual commits, for comparison ("librdkafka...")

The following results are for this baseline (1000 messages):

Host 'xxx-MacBook-Pro.local' with 12 'arm64' processors with 96 GB memory, running:
Darwin Kernel Version 23.1.0: Mon Oct  9 21:28:45 PDT 2023; root:xnu-10002.41.9~6/RELEASE_ARM64_T6020

============================
SwiftKafkaConsumerBenchmarks
============================

SwiftKafkaConsumer - basic consumer (messages: 1000)
╒══════════════════════════════════╤═════════╤═════════╤═════════╤═════════╤═════════╤═════════╤═════════╤═════════╕
│ Metric                           │      p0 │     p25 │     p50 │     p75 │     p90 │     p99 │    p100 │ Samples │
╞══════════════════════════════════╪═════════╪═════════╪═════════╪═════════╪═════════╪═════════╪═════════╪═════════╡
│ (Alloc + Retain) - Release Δ     │    2056 │    2057 │    2061 │    2063 │    2063 │    2063 │    2063 │       3 │
├──────────────────────────────────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┤
│ Context switches                 │     168 │     168 │     170 │     174 │     174 │     174 │     174 │       3 │
├──────────────────────────────────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┤
│ Memory (allocated resident) (M)  │      21 │      21 │      21 │      22 │      22 │      22 │      22 │       3 │
├──────────────────────────────────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┤
│ Object allocs                    │    4109 │    4111 │    4115 │    4117 │    4117 │    4117 │    4117 │       3 │
├──────────────────────────────────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┤
│ Releases (K)                     │      24 │      24 │      24 │      24 │      24 │      24 │      24 │       3 │
├──────────────────────────────────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┤
│ Retains (K)                      │      18 │      18 │      18 │      18 │      18 │      18 │      18 │       3 │
├──────────────────────────────────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┤
│ Throughput (# / s)               │       1 │       1 │       1 │       1 │       1 │       1 │       1 │       3 │
├──────────────────────────────────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┤
│ Time (total CPU) (ms)            │      19 │      19 │      19 │      20 │      20 │      20 │      20 │       3 │
├──────────────────────────────────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┤
│ Time (wall clock) (ms)           │    1564 │    1564 │    1574 │    1677 │    1677 │    1677 │    1677 │       3 │
╘══════════════════════════════════╧═════════╧═════════╧═════════╧═════════╧═════════╧═════════╧═════════╧═════════╛

SwiftKafkaConsumer - with offset commit (messages: 1000)
╒══════════════════════════════════╤═════════╤═════════╤═════════╤═════════╤═════════╤═════════╤═════════╤═════════╕
│ Metric                           │      p0 │     p25 │     p50 │     p75 │     p90 │     p99 │    p100 │ Samples │
╞══════════════════════════════════╪═════════╪═════════╪═════════╪═════════╪═════════╪═════════╪═════════╪═════════╡
│ (Alloc + Retain) - Release Δ     │    5039 │    5039 │    5039 │    5039 │    5039 │    5039 │    5039 │       1 │
├──────────────────────────────────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┤
│ Context switches (K)             │      10 │      10 │      10 │      10 │      10 │      10 │      10 │       1 │
├──────────────────────────────────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┤
│ Memory (allocated resident) (M)  │      22 │      22 │      22 │      22 │      22 │      22 │      22 │       1 │
├──────────────────────────────────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┤
│ Object allocs (K)                │      13 │      13 │      13 │      13 │      13 │      13 │      13 │       1 │
├──────────────────────────────────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┤
│ Releases (K)                     │      47 │      47 │      47 │      47 │      47 │      47 │      47 │       1 │
├──────────────────────────────────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┤
│ Retains (K)                      │      29 │      29 │      29 │      29 │      29 │      29 │      29 │       1 │
├──────────────────────────────────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┤
│ Time (total CPU) (ms)            │     662 │     662 │     662 │     662 │     662 │     662 │     662 │       1 │
├──────────────────────────────────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┤
│ Time (wall clock) (s)            │     104 │     104 │     104 │     104 │     104 │     104 │     104 │       1 │
╘══════════════════════════════════╧═════════╧═════════╧═════════╧═════════╧═════════╧═════════╧═════════╧═════════╛

librdkafka - basic consumer (messages: 1000)
╒══════════════════════════════════╤═════════╤═════════╤═════════╤═════════╤═════════╤═════════╤═════════╤═════════╕
│ Metric                           │      p0 │     p25 │     p50 │     p75 │     p90 │     p99 │    p100 │ Samples │
╞══════════════════════════════════╪═════════╪═════════╪═════════╪═════════╪═════════╪═════════╪═════════╪═════════╡
│ (Alloc + Retain) - Release Δ     │       0 │       0 │       0 │       0 │       0 │       0 │       0 │       9 │
├──────────────────────────────────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┤
│ Context switches                 │     118 │     127 │     133 │     135 │     149 │     149 │     149 │       9 │
├──────────────────────────────────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┤
│ Memory (allocated resident) (M)  │      21 │      32 │      44 │      55 │      64 │      64 │      64 │       9 │
├──────────────────────────────────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┤
│ Object allocs                    │       0 │       0 │       0 │       0 │       0 │       0 │       0 │       9 │
├──────────────────────────────────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┤
│ Releases                         │       0 │       0 │       0 │       0 │       0 │       0 │       0 │       9 │
├──────────────────────────────────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┤
│ Retains                          │       0 │       0 │       0 │       0 │       0 │       0 │       0 │       9 │
├──────────────────────────────────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┤
│ Throughput (# / s)               │       2 │       2 │       2 │       2 │       2 │       2 │       2 │       9 │
├──────────────────────────────────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┤
│ Time (total CPU) (μs)            │    4571 │    6066 │    6684 │    7806 │    8607 │    8607 │    8607 │       9 │
├──────────────────────────────────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┤
│ Time (wall clock) (ms)           │     611 │     614 │     619 │     624 │     626 │     626 │     626 │       9 │
╘══════════════════════════════════╧═════════╧═════════╧═════════╧═════════╧═════════╧═════════╧═════════╧═════════╛

librdkafka - with offset commit (messages: 1000)
╒══════════════════════════════════╤═════════╤═════════╤═════════╤═════════╤═════════╤═════════╤═════════╤═════════╕
│ Metric                           │      p0 │     p25 │     p50 │     p75 │     p90 │     p99 │    p100 │ Samples │
╞══════════════════════════════════╪═════════╪═════════╪═════════╪═════════╪═════════╪═════════╪═════════╪═════════╡
│ (Alloc + Retain) - Release Δ     │       0 │       0 │       0 │       0 │       0 │       0 │       0 │       6 │
├──────────────────────────────────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┤
│ Context switches                 │    5118 │    5123 │    5127 │    5139 │    5147 │    5147 │    5147 │       6 │
├──────────────────────────────────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┤
│ Memory (allocated resident) (M)  │      20 │      26 │      32 │      43 │      47 │      47 │      47 │       6 │
├──────────────────────────────────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┤
│ Object allocs                    │       0 │       0 │       0 │       0 │       0 │       0 │       0 │       6 │
├──────────────────────────────────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┤
│ Releases                         │       0 │       0 │       0 │       0 │       0 │       0 │       0 │       6 │
├──────────────────────────────────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┤
│ Retains                          │       0 │       0 │       0 │       0 │       0 │       0 │       0 │       6 │
├──────────────────────────────────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┤
│ Throughput (# / s)               │       1 │       1 │       1 │       1 │       1 │       1 │       1 │       6 │
├──────────────────────────────────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┤
│ Time (total CPU) (ms)            │      73 │      74 │      76 │      77 │      80 │      80 │      80 │       6 │
├──────────────────────────────────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┤
│ Time (wall clock) (ms)           │     980 │     983 │     987 │     995 │    1012 │    1012 │    1012 │       6 │
╘══════════════════════════════════╧═════════╧═════════╧═════════╧═════════╧═════════╧═════════╧═════════╧═════════╛
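
To make the comparison easier to read, here is a small worked calculation over the median (p50) wall-clock values from the four tables above. It is only arithmetic on the reported numbers, not part of the PR's benchmark code; the helper names are illustrative. Note that the SwiftKafkaConsumer "with offset commit" figure comes from a single sample and is reported rounded to whole seconds, so its ratio is approximate.

```python
# Median (p50) wall-clock times in milliseconds, copied from the tables above.
# The SwiftKafkaConsumer "with offset commit" row is reported as 104 s (one sample).
MESSAGES = 1000

p50_wall_ms = {
    ("SwiftKafkaConsumer", "basic"): 1574,
    ("SwiftKafkaConsumer", "commit"): 104 * 1000,
    ("librdkafka", "basic"): 619,
    ("librdkafka", "commit"): 987,
}


def slowdown(mode):
    """Ratio of SwiftKafkaConsumer to librdkafka median wall-clock time."""
    swift = p50_wall_ms[("SwiftKafkaConsumer", mode)]
    native = p50_wall_ms[("librdkafka", mode)]
    return swift / native


def per_message_ms(client, mode):
    """Average wall-clock time spent per consumed message, in milliseconds."""
    return p50_wall_ms[(client, mode)] / MESSAGES


for mode in ("basic", "commit"):
    print(
        f"{mode}: SwiftKafkaConsumer is about {slowdown(mode):.1f}x slower "
        f"({per_message_ms('SwiftKafkaConsumer', mode):.3f} ms/msg vs "
        f"{per_message_ms('librdkafka', mode):.3f} ms/msg)"
    )
```

On these numbers the basic consumer is roughly 2.5 times slower than pure librdkafka, while the manual-commit case is about two orders of magnitude slower.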

blindspotbounty · 2023-11-29T08:43:30Z

The previous CI run failed with a timeout, but there is not much information, just the following:

20:27:28 Build timed out (after 10 minutes). Marking the build as failed.

I've disabled all but one test to check that it works correctly, and reduced pollInterval from the default (100 ms) to 10 ms.

blindspotbounty · 2024-01-30T18:01:06Z

@FranzBusch, @felixschlegel just a kind reminder.

blindspotbounty · 2024-03-04T16:25:53Z

Just to note: I could not adjust the thresholds and absolute values so that they adequately fit the CI results. Depending on the machine's hardware or its load, results may be several times faster or slower.
E.g. memory usage depends on how fast the CPU performs.
So I suggest disabling these benchmark tests in CI for now and running them locally. Once either the code performs more stably or we can run main and the PR branch within one CI run for benchmarks, I will try to enable them in CI as well.
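
A rough illustration of the threshold problem described here, not code from the PR or from the benchmark tooling: `within_threshold` is a hypothetical helper that applies a relative tolerance, with the 20% default taken from the "set 20% threshold.." commit. The baseline is the p50 "Object allocs" value of the basic SwiftKafkaConsumer run in the first comment; the 3x value is a made-up stand-in for a "several times" slower or faster machine.

```python
def within_threshold(baseline, measured, rel_tol=0.20):
    """Return True if `measured` deviates from `baseline` by at most `rel_tol`."""
    if baseline == 0:
        # A zero baseline only matches a zero measurement.
        return measured == 0
    return abs(measured - baseline) / baseline <= rel_tol


# p50 "Object allocs" of the basic SwiftKafkaConsumer run reported in this PR.
baseline_allocs = 4115

# Same machine, p100 of the same run: well inside a 20% tolerance.
print(within_threshold(baseline_allocs, 4117))

# Hypothetical CI machine where the metric is three times higher: check fails.
print(within_threshold(baseline_allocs, 3 * baseline_allocs))
```

A fixed relative tolerance handles run-to-run noise on one machine, but it cannot absorb differences of several times between machines, which is why comparing main and the PR branch within the same CI run is the more robust option.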

hassila · 2024-03-04T17:09:55Z

I think CI should only check things like mallocs or syscalls, which are deterministic, unless we have a dedicated CI runner. Other metrics can then be used for local tests.

blindspotbounty · 2024-03-05T08:45:37Z

Hi @hassila

Unfortunately, mallocs also depend on the CPU in a tricky way. On a more powerful (or less loaded) machine, the queues in librdkafka/AsyncStream stay shorter, which leads to fewer mallocs and lower memory consumption (e.g. librdkafka seems to re-use fewer messages).

On a less powerful machine (or one loaded with other tasks), the queues hold more elements. It is still not a lot of elements, but in relative terms the difference is tens of percent.
I would say it would be nice if we could even find a good way to fine-tune librdkafka and swift-kafka for this lab test.

Unfortunately, several days of trying (in the background) to tune the benchmark results, librdkafka and swift-kafka brought no luck. Let me know if you have any particular ideas for adjusting the tests in CI; I am happy to try them.

* Feature: expose librdkafka statistics as swift metrics (swift-server#92)
  * introduce statistics for producer
  * add statistics to new consumer with events
  * fix some artefacts
  * adjust to KeyRefreshAttempts
  * draft: statistics with metrics
  * make structures internal
  * Update Sources/Kafka/Configuration/KafkaConfiguration+Metrics.swift
    Co-authored-by: Felix Schlegel <fefefe152@gmail.com>
  * Update Sources/Kafka/Configuration/KafkaConsumerConfiguration.swift
    Co-authored-by: Felix Schlegel <fefefe152@gmail.com>
  * Update Sources/Kafka/Configuration/KafkaConfiguration+Metrics.swift
    Co-authored-by: Felix Schlegel <fefefe152@gmail.com>
  * Update Sources/Kafka/Configuration/KafkaConfiguration+Metrics.swift
    Co-authored-by: Felix Schlegel <fefefe152@gmail.com>
  * address review comments
  * formatting
  * map gauges in one place
  * move json mode as rd kafka statistics, misc renaming + docc
  * address review comments
  * remove import Metrics
  * divide producer/consumer configuration
  * apply swiftformat
  * fix code after conflicts
  * fix formatting
  ---------
  Co-authored-by: Felix Schlegel <fefefe152@gmail.com>
* Add benchmark infratructure without actual tests (swift-server#146)
  * add benchmark infratructure without actual test
  * apply swiftformat
  * fix header in sh file
  * use new async seq methods
* Update to latest librdkafka & add a define for RAND_priv_bytes (swift-server#148)
  Co-authored-by: Franz Busch <f.busch@apple.com>
* exit from consumer batch loop when no more messages left (swift-server#153)
* Lower requirements for consumer state machine (swift-server#154)
  * lower requirements for kafka consumer
  * add twin test for kafka producer
* defer source.finish (swift-server#157)
* Add two consumer benchmark (swift-server#149)
  * benchmark for consumer
  * attempty to speedup benchmarks
  * check CI works for one test
  * enable one more test
  * try to lower poll interval
  * adjust max duration of test
  * remain only manual commit test
  * check if commit is the reason for test delays
  * try all with schedule commit
  * revert max test time to 5 seconds
  * dockerfiles
  * test set threasholds
  * create dummy thresholds from ci results
  * disable benchmark in CI
  * add header
  * add stable metrics
  * update thresholds to stable metrics only
  * try use '1' instead of 'true'
  * adjust thresholds to CI results (as temporary measure)
  * set 20% threshold..
  * move arc to unstable metrics
  * try use 'true' in quotes for CI
  * try reduce number of messages for more reliable results
  * try upgrade bench
  * disable benchmark in CI
* Update librdkafka for BoringSSL (swift-server#162)
* chore(patch): [sc-8379] use returned error (swift-server#163)
* [producer message] Allow optional key for initializer (swift-server#164)
  Co-authored-by: Harish Yerra <hyerra@apple.com>
* Allow groupID to be specified when assigning partition (swift-server#161)
  * Allow groupID to be specified when assigning partition
    Motivation: A Consumer Group can provide a lot of benefits even if the dynamic loadbalancing features are not used.
    Modifications: Allow for an optional GroupID when creating a partition consumer.
    Result: Consumer Groups can now be used when manual assignment is used.
  * fix format
  ---------
  Co-authored-by: Ómar Kjartan Yasin <omarkj@apple.com>
  Co-authored-by: blindspotbounty <127803250+blindspotbounty@users.noreply.github.com>
  Co-authored-by: Franz Busch <f.busch@apple.com>
* Wrap rd_kafka_consumer_poll into iterator (use librdkafka embedded backpressure) (swift-server#158)
  * remove message sequence
  * test consumer with implicit rebalance
  * misc + format
  * remove artefact
  * don't check a lot of messages
  * fix typo
  * slow down first consumer to lower message to fit CI timeout
  * remove helpers
  * use exact benchmark version to avoid missing thresholds error (as no thresholds so far)
  * add deprecated marks for backpressure, change comment for future dev
  * address comments
---------
Co-authored-by: Felix Schlegel <fefefe152@gmail.com>
Co-authored-by: Axel Andersson <axel@ordo.one>
Co-authored-by: Franz Busch <f.busch@apple.com>
Co-authored-by: Samuel M <samuel.mn77@yahoo.com>
Co-authored-by: Harish Yerra <hyerra@gmail.com>
Co-authored-by: Harish Yerra <hyerra@apple.com>
Co-authored-by: Omar Yasin <omarkj@gmail.com>
Co-authored-by: Ómar Kjartan Yasin <omarkj@apple.com>

benchmark for consumer

98f17fa

blindspotbounty force-pushed the one-consumer-benchmark branch from 259ff65 to 98f17fa Compare November 28, 2023 17:53

blindspotbounty marked this pull request as ready for review November 28, 2023 17:59

blindspotbounty added 2 commits November 29, 2023 10:33

attempty to speedup benchmarks

57a349e

check CI works for one test

0123caf

blindspotbounty added 7 commits November 29, 2023 13:16

enable one more test

61b7aa5

try to lower poll interval

376b30c

adjust max duration of test

01a9448

remain only manual commit test

274e4d9

check if commit is the reason for test delays

1811752

try all with schedule commit

632b6b7

revert max test time to 5 seconds

a617092

blindspotbounty mentioned this pull request Dec 4, 2023

Consumer performance benchmark #140

Closed

Merge branch 'main' into one-consumer-benchmark

e0c7ae7

blindspotbounty mentioned this pull request Dec 11, 2023

Wrap rd_kafka_consumer_poll into iterator (use librdkafka embedded backpressure) #158

Merged

blindspotbounty requested review from FranzBusch and mr-swifter February 14, 2024 08:56

blindspotbounty added 4 commits February 14, 2024 14:40

Merge branch 'main' into one-consumer-benchmark

f9b1d55

dockerfiles

7de0be7

test set threasholds

6a2a1b7

create dummy thresholds from ci results

34e7b4d

blindspotbounty force-pushed the one-consumer-benchmark branch from cc54b75 to 34e7b4d Compare February 27, 2024 08:52

blindspotbounty added 6 commits February 27, 2024 12:06

disable benchmark in CI

7f4c8e2

add header

01779ed

add stable metrics

df17518

update thresholds to stable metrics only

3255efc

try use '1' instead of 'true'

d316aac

adjust thresholds to CI results (as temporary measure)

4b7bbdd

blindspotbounty added 6 commits March 4, 2024 15:15

set 20% threshold..

860e1d5

move arc to unstable metrics

e328b68

try use 'true' in quotes for CI

4702bbb

try reduce number of messages for more reliable results

7d2828b

try upgrade bench

fedbeff

disable benchmark in CI

2799c02

mr-swifter approved these changes Mar 7, 2024

blindspotbounty merged commit 5ebb47a into swift-server:main Mar 11, 2024
4 of 6 checks passed
