
add a benchmark and profile script and hook into CI #1028

Merged
4 commits merged into petergoldstein:main on Feb 12, 2025

Conversation

danmayer (Collaborator)

This will make it easier for us to start upstreaming some of the performance- and latency-related PRs we have been testing out. We should be able to compare profiles before and after applying a change, and post any benchmark differences as well.

current benchmark:

$ bundle exec bin/benchmark
yjit: false
ruby 3.3.0 (2023-12-25 revision 5124f9ac75) [arm64-darwin23]
Warming up --------------------------------------
          client set   691.000 i/100ms
        raw sock set     1.403k i/100ms
Calculating -------------------------------------
          client set      6.391k (± 9.7%) i/s -     63.572k in  10.035280s
        raw sock set     14.013k (± 2.0%) i/s -    140.300k in  10.016703s

Comparison:
        raw sock set:    14012.6 i/s
          client set:     6391.3 i/s - 2.19x  slower
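For context, the "raw sock" baseline skips Dalli's layers and writes the memcached text protocol to the socket directly. A minimal sketch of that framing (the helper name is hypothetical, not the PR's exact code):

```ruby
# Hypothetical helper: frame a memcached text-protocol "set" command the way a
# raw-socket baseline would write it to the wire.
TERMINATOR = "\r\n"

def build_set_command(key, value, ttl: 3600, flags: 0)
  # Protocol shape: <command> <key> <flags> <exptime> <bytes>\r\n<data>\r\n
  "set #{key} #{flags} #{ttl} #{value.bytesize}#{TERMINATOR}#{value}#{TERMINATOR}"
end

cmd = build_set_command('key', 'abc')
# A real run would then do: sock.write(cmd); sock.readline  # server replies "STORED\r\n"
```

Because the baseline does no serialization, compression, or option handling, the roughly 2x gap against the full client is an upper bound on what the client-side code paths cost.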

while a profile can be viewed with https://vernier.prof/

[Screenshot: vernier profile, 2025-02-10]

bin/benchmark (outdated thread):
TERMINATOR = "\r\n"
puts "yjit: #{RubyVM::YJIT.enabled?}"

client = Dalli::Client.new('localhost', serializer: StringSerializer, compress: false, raw: true)
Collaborator:

Standard client vs multi-client could be another param

danmayer (author):

Yeah, true. I will be getting this back into use in CI soon, once I port the set_multi feature.
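The CI workflow later in this PR passes a `BENCH_TARGET` environment variable; parametrizing client type could follow the same pattern. A sketch of ENV-driven target selection (the specific target names here are illustrative assumptions, not the PR's code):

```ruby
# Illustrative: pick which benchmarks to run from an ENV flag, defaulting to all.
VALID_TARGETS = %w[all set get get_multi set_multi].freeze

def resolve_targets(env_value)
  target = env_value || 'all'
  raise ArgumentError, "unknown BENCH_TARGET: #{target}" unless VALID_TARGETS.include?(target)

  target == 'all' ? VALID_TARGETS - ['all'] : [target]
end
```

A standard-vs-multi client flag could be resolved the same way, keeping the benchmark script configuration entirely in the environment as CI already expects.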

sock.setsockopt(Socket::SOL_SOCKET, Socket::SO_KEEPALIVE, true)
# Benchmarks didn't see any performance gains from increasing the SO_RCVBUF buffer size
# sock.setsockopt(Socket::SOL_SOCKET, ::Socket::SO_RCVBUF, 1024 * 1024 * 8)
# Benchamrks did see an improvement in performance when increasing the SO_SNDBUF buffer size
Collaborator:

We should use the same buffer size that is in Dalli proper.

danmayer (author):

OK, yeah, Dalli can also take these adjustments, but you are correct: we should default to the same buffer size, and only change it when folks pass options to adjust Dalli, adjusting the socket to match.

danmayer (author):

For now I am just dropping this setting, but as we look at tweaking better defaults we will want to try a few things out.
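For reference, a sketch of the socket tuning discussed above: keepalive on, an 8 MB send buffer (SO_SNDBUF helped in the benchmarks; SO_RCVBUF showed no gain, so it stays at the default). The helper name is hypothetical:

```ruby
require 'socket'

# Hypothetical helper mirroring the raw-socket setup in bin/benchmark:
# enable keepalive and request a larger send buffer before connecting.
def tune_socket(sock, sndbuf: 1024 * 1024 * 8)
  sock.setsockopt(Socket::SOL_SOCKET, Socket::SO_KEEPALIVE, true)
  sock.setsockopt(Socket::SOL_SOCKET, Socket::SO_SNDBUF, sndbuf)
  sock
end

sock = tune_socket(Socket.new(:INET, :STREAM))
```

Note that the kernel may clamp (or on Linux, double) the requested buffer size depending on limits like `net.core.wmem_max`, so it is worth confirming the effective value with `getsockopt` when benchmarking.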

Collaborator:

Suggested change:
- # Benchamrks did see an improvement in performance when increasing the SO_SNDBUF buffer size
+ # Benchmarks did see an improvement in performance when increasing the SO_SNDBUF buffer size

sock.setsockopt(Socket::SOL_SOCKET, Socket::SO_SNDBUF, 1024 * 1024 * 8)

# ensure the clients are all connected and working
client.set('key', payload)
Collaborator:

Should we do a get after to confirm the key was set?

sock.flush
sock.readline # clear the buffer

# ensure we have basic data for the benchmarks and get calls
Collaborator:

I don't quite get why we are doing this... And why is the payload so much smaller/not configurable?

danmayer (author):

So these are the defaults for get_multi... I can make the payload adjustable, but we don't typically see get_multi with 1 MB values, so I picked something more in the normal range. I will make it configurable and have it default to 1/10th of the full get/set size.
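A hedged sketch of that sizing scheme (the variable names and the 100 KB default are assumptions for illustration, not the PR's actual values):

```ruby
# Illustrative payload sizing: the multi-key payload defaults to 1/10th of the
# single-key payload size, and both can be overridden via ENV.
payload_size       = Integer(ENV.fetch('PAYLOAD_SIZE', 100_000))
multi_payload_size = Integer(ENV.fetch('MULTI_PAYLOAD_SIZE', payload_size / 10))

payload       = 'x' * payload_size
multi_payload = 'x' * multi_payload_size
```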

@petergoldstein (Owner):

I'm good with this conceptually. You all can determine the details and merge when you think that it's ready.

@danmayer danmayer requested a review from grcooper February 11, 2025 23:17
@danmayer danmayer merged commit dba1337 into petergoldstein:main Feb 12, 2025
25 checks passed
@danmayer danmayer deleted the add_benchmark_and_profile branch February 12, 2025 19:12
@nickamorim (Collaborator) left a comment:

I'm just getting around to reviewing this now, after it's been merged, but in general I find the code in bin/benchmark and bin/profile hard to parse, and it could be DRY'd up.

@@ -0,0 +1,38 @@
name: Profiles

on: [push, pull_request]
Collaborator:

Do we need these on every push / PR?

danmayer (author):

I think it is good to have on any PR, since we can review the profile whenever we have concerns that a change might impact performance.

- name: Run Profiles
run: RUBY_YJIT_ENABLE=1 BENCH_TARGET=all bundle exec bin/profile
- name: Upload profile results
uses: actions/upload-artifact@v4
Collaborator:

Where do these get uploaded to? Any instructions on how to pull them down?

danmayer (author):

Good call; I added documentation to the action file on how to get them.


count -= 1
tail = count.zero? ? '' : 'q'
sock.write(String.new("ms #{key} #{value_bytesize} c F0 T#{ttl} MS #{tail}\r\n",
capacity: key.size + value_bytesize + 40) << value << TERMINATOR)
Collaborator:

What's 40?

danmayer (author):

It just needs to cover all the fixed characters like ms, the spaces, c, F0, etc. I picked a round number since I was modifying this command a few times; a few extra unused bytes in the buffer didn't matter.
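An alternative sketch that computes the capacity exactly instead of padding by a fixed 40 bytes (illustrative only, not a suggested change to the PR; the helper name is hypothetical):

```ruby
TERMINATOR = "\r\n"

# Sketch: build the meta-protocol "ms" (meta set) command, pre-sizing the
# buffer to exactly header + value + terminator so no reallocation occurs
# while appending the value.
def ms_command(key, value, ttl:, tail: '')
  header = "ms #{key} #{value.bytesize} c F0 T#{ttl} MS #{tail}#{TERMINATOR}"
  buf = String.new(header, capacity: header.bytesize + value.bytesize + TERMINATOR.bytesize)
  buf << value << TERMINATOR
end
```

The fixed-pad version trades a handful of wasted bytes for not having to keep the capacity math in sync with the command format, which is a reasonable call in a benchmark script.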

pairs.each do |key, value|
client.set(key, value, 3600, raw: true)
end
end
Collaborator:

Should there be a corresponding get call to make sure the keys were set successfully?

danmayer (author):

The benchmark verifies during the get bench that the keys are there, or it will raise an exception. So I don't think we need to check beforehand; it would fail faster that way, but it would also duplicate more code.

danmayer (author):

So it's handled with raise 'mismatch' unless result == payload
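A runnable sketch of that round-trip check, using a tiny in-memory stand-in for Dalli::Client so it is self-contained (FakeClient and verify_roundtrip! are hypothetical names, not part of Dalli or this PR):

```ruby
# Minimal in-memory stand-in for a memcached client, just for illustration.
class FakeClient
  def initialize
    @store = {}
  end

  def set(key, value, _ttl = 3600, _opts = {})
    @store[key] = value
  end

  def get(key)
    @store[key]
  end
end

# The check the benchmark relies on: read the value back and fail loudly
# if it does not match what was written.
def verify_roundtrip!(client, key, payload)
  client.set(key, payload)
  result = client.get(key)
  raise 'mismatch' unless result == payload

  result
end
```

Because the get benchmark already performs this comparison on every iteration, a separate pre-flight check would only move the failure earlier at the cost of duplicated code.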

@danmayer (author):

> I'm just getting around to reviewing this now after it's been merged but in general I find the code in bin/benchmark and bin/profile hard to parse and could be DRY'd up.

I will open a PR to fix up some more of this, but it is a bit hard to fully DRY up: I didn't want to create classes or other files just to share code between the benchmark and the profile, so there is definitely duplication between the two. As with tests, I often find that fully DRYing up profiles and benchmarks makes it harder to focus on and change only the piece you care about digging into, so I don't want to get too fancy with abstractions that might leak into and affect the code we are trying to examine.

@danmayer (author):

@nickamorim since this was closed, and my other PR was already updating profile/benchmark to support the meta protocol, I addressed your feedback as part of that PR: #1029
