
add a benchmark and profile script and hook into CI #1028

Merged
4 commits merged into petergoldstein:main on Feb 12, 2025

Conversation

danmayer (Collaborator)

This will make it easier for us to start upstreaming some of the performance- and latency-related PRs we have been testing out. We should be able to compare profiles before and after applying a change, and post any benchmark differences as well.

current benchmark:

$ bundle exec bin/benchmark
yjit: false
ruby 3.3.0 (2023-12-25 revision 5124f9ac75) [arm64-darwin23]
Warming up --------------------------------------
          client set   691.000 i/100ms
        raw sock set     1.403k i/100ms
Calculating -------------------------------------
          client set      6.391k (± 9.7%) i/s -     63.572k in  10.035280s
        raw sock set     14.013k (± 2.0%) i/s -    140.300k in  10.016703s

Comparison:
        raw sock set:    14012.6 i/s
          client set:     6391.3 i/s - 2.19x  slower
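For context, the "raw sock" baseline skips Dalli's layers and writes the memcached text protocol to the socket directly. A minimal sketch of that framing (the helper name is hypothetical, not the PR's exact code):

```ruby
# Hypothetical helper: frame a memcached text-protocol "set" command the way a
# raw-socket baseline would write it to the wire.
TERMINATOR = "\r\n"

def build_set_command(key, value, ttl: 3600, flags: 0)
  # Protocol shape: <command> <key> <flags> <exptime> <bytes>\r\n<data>\r\n
  "set #{key} #{flags} #{ttl} #{value.bytesize}#{TERMINATOR}#{value}#{TERMINATOR}"
end

cmd = build_set_command('key', 'abc')
# A real run would then do: sock.write(cmd); sock.readline  # server replies "STORED\r\n"
```

Because the baseline does no serialization, compression, or option handling, the roughly 2x gap against the full client is an upper bound on what the client-side code paths cost.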

while a profile can be viewed with https://vernier.prof/

[Screenshot: vernier profile, 2025-02-10]

bin/benchmark (outdated thread):
TERMINATOR = "\r\n"
puts "yjit: #{RubyVM::YJIT.enabled?}"

client = Dalli::Client.new('localhost', serializer: StringSerializer, compress: false, raw: true)
Collaborator:

Standard client vs multi-client could be another param

danmayer (author):

Yeah, true. I will be getting this back into use in CI soon, once I port the set_multi feature.
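The CI workflow later in this PR passes a `BENCH_TARGET` environment variable; parametrizing client type could follow the same pattern. A sketch of ENV-driven target selection (the specific target names here are illustrative assumptions, not the PR's code):

```ruby
# Illustrative: pick which benchmarks to run from an ENV flag, defaulting to all.
VALID_TARGETS = %w[all set get get_multi set_multi].freeze

def resolve_targets(env_value)
  target = env_value || 'all'
  raise ArgumentError, "unknown BENCH_TARGET: #{target}" unless VALID_TARGETS.include?(target)

  target == 'all' ? VALID_TARGETS - ['all'] : [target]
end
```

A standard-vs-multi client flag could be resolved the same way, keeping the benchmark script configuration entirely in the environment as CI already expects.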

sock.setsockopt(Socket::SOL_SOCKET, Socket::SO_KEEPALIVE, true)
# Benchmarks didn't see any performance gains from increasing the SO_RCVBUF buffer size
# sock.setsockopt(Socket::SOL_SOCKET, ::Socket::SO_RCVBUF, 1024 * 1024 * 8)
# Benchamrks did see an improvement in performance when increasing the SO_SNDBUF buffer size
Collaborator:

We should use the same buffer size that is in Dalli proper.

danmayer (author):

OK, yeah, Dalli can also take these adjustments, but you are correct: we should default to the same buffer size, and only change it when folks pass options to adjust Dalli, adjusting the socket to match.

danmayer (author):

For now I am just dropping this setting, but as we look at tweaking better defaults we will want to try a few things out.
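For reference, a sketch of the socket tuning discussed above: keepalive on, an 8 MB send buffer (SO_SNDBUF helped in the benchmarks; SO_RCVBUF showed no gain, so it stays at the default). The helper name is hypothetical:

```ruby
require 'socket'

# Hypothetical helper mirroring the raw-socket setup in bin/benchmark:
# enable keepalive and request a larger send buffer before connecting.
def tune_socket(sock, sndbuf: 1024 * 1024 * 8)
  sock.setsockopt(Socket::SOL_SOCKET, Socket::SO_KEEPALIVE, true)
  sock.setsockopt(Socket::SOL_SOCKET, Socket::SO_SNDBUF, sndbuf)
  sock
end

sock = tune_socket(Socket.new(:INET, :STREAM))
```

Note that the kernel may clamp (or on Linux, double) the requested buffer size depending on limits like `net.core.wmem_max`, so it is worth confirming the effective value with `getsockopt` when benchmarking.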

Collaborator:

Suggested change:
- # Benchamrks did see an improvement in performance when increasing the SO_SNDBUF buffer size
+ # Benchmarks did see an improvement in performance when increasing the SO_SNDBUF buffer size

sock.setsockopt(Socket::SOL_SOCKET, Socket::SO_SNDBUF, 1024 * 1024 * 8)

# ensure the clients are all connected and working
client.set('key', payload)
Collaborator:

Should we do a get after to confirm the key was set?

sock.flush
sock.readline # clear the buffer

# ensure we have basic data for the benchmarks and get calls
Collaborator:

I don't quite get why we are doing this... And why is the payload so much smaller/not configurable?

danmayer (author):

So these are the defaults for get_multi... I can make the payload adjustable, but we don't typically see get_multi with 1 MB values, so I picked something more in the normal range. I will make it configurable and have it default to 1/10th of the full get/set size.
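A hedged sketch of that sizing scheme (the variable names and the 100 KB default are assumptions for illustration, not the PR's actual values):

```ruby
# Illustrative payload sizing: the multi-key payload defaults to 1/10th of the
# single-key payload size, and both can be overridden via ENV.
payload_size       = Integer(ENV.fetch('PAYLOAD_SIZE', 100_000))
multi_payload_size = Integer(ENV.fetch('MULTI_PAYLOAD_SIZE', payload_size / 10))

payload       = 'x' * payload_size
multi_payload = 'x' * multi_payload_size
```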

@petergoldstein (Owner):

I'm good with this conceptually. You all can determine the details and merge when you think that it's ready.

@danmayer danmayer requested a review from grcooper February 11, 2025 23:17
@danmayer danmayer merged commit dba1337 into petergoldstein:main Feb 12, 2025
25 checks passed
@danmayer danmayer deleted the add_benchmark_and_profile branch February 12, 2025 19:12
@nickamorim (Collaborator) left a comment:

I'm just getting around to reviewing this now, after it's been merged, but in general I find the code in bin/benchmark and bin/profile hard to parse, and it could be DRY'd up.

@@ -0,0 +1,38 @@
name: Profiles

on: [push, pull_request]
Collaborator:

Do we need these on every push / PR?

danmayer (author):

I think it is good to have on any PR, since we can review the profile whenever we have concerns that a change might impact performance.

- name: Run Profiles
run: RUBY_YJIT_ENABLE=1 BENCH_TARGET=all bundle exec bin/profile
- name: Upload profile results
uses: actions/upload-artifact@v4
Collaborator:

Where do these get uploaded to? Any instructions on how to pull them down?

danmayer (author):

Good call; I added documentation to the action file on how to get them.


count -= 1
tail = count.zero? ? '' : 'q'
sock.write(String.new("ms #{key} #{value_bytesize} c F0 T#{ttl} MS #{tail}\r\n",
capacity: key.size + value_bytesize + 40) << value << TERMINATOR)
Collaborator:

What's 40?

danmayer (author):

It just needs to cover all the fixed characters like ms, the spaces, c, F0, etc. I picked a round number since I was modifying this command a few times; a few extra unused bytes in the buffer didn't matter.
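An alternative sketch that computes the capacity exactly instead of padding by a fixed 40 bytes (illustrative only, not a suggested change to the PR; the helper name is hypothetical):

```ruby
TERMINATOR = "\r\n"

# Sketch: build the meta-protocol "ms" (meta set) command, pre-sizing the
# buffer to exactly header + value + terminator so no reallocation occurs
# while appending the value.
def ms_command(key, value, ttl:, tail: '')
  header = "ms #{key} #{value.bytesize} c F0 T#{ttl} MS #{tail}#{TERMINATOR}"
  buf = String.new(header, capacity: header.bytesize + value.bytesize + TERMINATOR.bytesize)
  buf << value << TERMINATOR
end
```

The fixed-pad version trades a handful of wasted bytes for not having to keep the capacity math in sync with the command format, which is a reasonable call in a benchmark script.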

pairs.each do |key, value|
client.set(key, value, 3600, raw: true)
end
end
Collaborator:

Should there be a corresponding get call to make sure the keys were set successfully?

danmayer (author):

The benchmark verifies during the get bench that the keys are there, or it will raise an exception. So I don't think we need to check beforehand; it would fail faster that way, but it would also duplicate more code.

danmayer (author):

So it's handled with raise 'mismatch' unless result == payload
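A runnable sketch of that round-trip check, using a tiny in-memory stand-in for Dalli::Client so it is self-contained (FakeClient and verify_roundtrip! are hypothetical names, not part of Dalli or this PR):

```ruby
# Minimal in-memory stand-in for a memcached client, just for illustration.
class FakeClient
  def initialize
    @store = {}
  end

  def set(key, value, _ttl = 3600, _opts = {})
    @store[key] = value
  end

  def get(key)
    @store[key]
  end
end

# The check the benchmark relies on: read the value back and fail loudly
# if it does not match what was written.
def verify_roundtrip!(client, key, payload)
  client.set(key, payload)
  result = client.get(key)
  raise 'mismatch' unless result == payload

  result
end
```

Because the get benchmark already performs this comparison on every iteration, a separate pre-flight check would only move the failure earlier at the cost of duplicated code.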

@danmayer (author):

> I'm just getting around to reviewing this now after it's been merged but in general I find the code in bin/benchmark and bin/profile hard to parse and could be DRY'd up.

I will open a PR to fix up some more of this, but it is a bit hard to fully DRY up: I didn't want to create classes or other files just to share code between the benchmark and the profile, so there is definitely duplication between the two. As with tests, I often find that fully DRYing up profiles and benchmarks makes it harder to focus on and change only the piece you care about digging into, so I don't want to get too fancy with abstractions that might leak into and affect the code we are trying to examine.

@danmayer (author):

@nickamorim since this was closed, and my other PR was already updating profile/benchmark to support the meta protocol, I addressed your feedback as part of that PR: #1029
