-
Notifications
You must be signed in to change notification settings - Fork 15.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Memory leak in Ruby gem with repeated recursive messages #19498
Comments
Thanks for this report and the detailed reproduction instructions! We are looking into the issue now. Were you able to see if this issue existed before (i.e. in 28.0) or on main? I see you said you couldn't use a newer version of protoc "because there appears to be a Ruby code generation bug in newer releases". Would you be able to open a new bug for that issue so we can look into it as well? |
The memory growth issue sounds like another expression of #10106 which leads me to believe this is working as intended. Applying the workaround suggested there to this issue would mean using require_relative "gen/protobuf/simple_pb"
datum = Proto::Leak::Recursive.new
100_000.times { Proto::Leak::Recursive.new(data: [Google::Protobuf.deep_copy(datum)]) } CC: @ahayworth |
Thanks. I hadn't come across that issue, so thank you for linking them.
I tried to strip this down to the bare minimum to illustrate the issue. But, the code in question is mostly in google-cloud-bigtable, having to do with chaining filters. The Bigtable client can hold long-lived references. I can file a different issue for that project, but I think users of Bigtable are going to be limited in what they can do to workaround this. Calling |
Is the issue with the misreported memory size a known problem? If not, I'll file a separate issue / open a PR to track that. I think that has a fairly straightforward solution. Having that in place would help application authors narrow down where all their memory is going. |
Please do open an issue for the memory reporting. |
@deannagarcia I'm sorry, that was a note from an earlier draft I forgot to delete. It stems from Ubuntu 24.04 shipping |
Thanks for the clarification! So did you download a newer version of protoc to use? I would expect you to see errors if working with a protoc version that doesn't match the runtime. |
Yes. I ended up using Linuxbrew (Homebrew for Linux) to unify the setup on my MacBook and my Linux machine. But, I've also used the available static builds successfully. I checked both Ubuntu 24.10 and the 25.04 pre-release and they're both shipping |
@nirvdrum Regarding the |
What version of protobuf and what language are you using?
Version: 4.29.0
Language: Ruby
What operating system (Linux, Windows, ...) and version?
What runtime / compiler are you using (e.g., python version or gcc version)
What did you do?
Steps to reproduce the behavior:
I've pulled together a small reproduction. To see the issue:
git clone https://github.com/nirvdrum/grpc-protobuf-experiments
cd grpc-protobuf-experiments
bundle install
bundle exec rake
bundle exec ruby leak-simple.rb
At the heart of the problem is a simple message definition:
The repository includes a Gemfile to lock down the versions I used to reproduce the problem and a Rakefile to ease the
protoc
instantiation.You must be using an olderUbuntu uses an older release ofprotoc
(e.g.,brew install protobuf@21
) because there appears to be a Ruby code generation bug in newer releases. This older version ofprotoc
has no bearing on the runtime, where we do use a modern version of the google-protobuf gem.protoc
(3.12.21), not a newer -- apologies for the confusion.The leak-simple.rb script in the linked repo creates local instances of
Proto::Leak::Recursive
in a loop with each object linking back to a global instance ofProto::Leak::Recursive
via the repeateddata
field. After an inner loop finishes executing we trigger GC and then print out the RSS of the process along with Ruby's view of how much memory is allocated. Increasing the inner loop iterations will print out larger RSS values.In simplified form, the loop looks like:
The repository version expands on that simple loop to print out the total memory growth (RSS) over the duration of the script execution. Running with
VERBOSE=true
will also print out periodic memory usage numbers. The repo also has leak-bigtable.rb and leak-bigtable-extracted.rb scripts. These can also be run withbundle exec ruby leak-bigtable.rb
andbundle exec ruby leak-bigtable-extracted.rb
. All three scripts are functionally similar, but vary on the source of the proto definitions.The leak-bigtable.rb file uses the google-cloud-bigtable gem to create instances of
Google::Cloud::Bigtable::V2::RowFilter::Chain
. This script reproduces the memory leak we have that's impacting production workloads.The leak-bigtable-extracted.rb file uses a subset of the proto definitions from the google-cloud-bigtable gem. The goal here was to use the source proto definitions but strip them down to the bare minimum.
The leak-simple.rb gets at the heart of the problem and reproduces without the large google-cloud-bigtable dependency.
What did you expect to see
I'd expect objects created in a loop that don't escape will not lead to a growth in memory consumption.
What did you see instead?
Memory continues to grow unbounded. However,
ObjectSpace
does not reflect this growth, suggesting that the memory is growing off-heap in the the google-protobuf extension. After debugging, however, we believe the problem has to do with the fuse operation forGoogle::Protobuf::Internal::Arena
. It looks like there is an arena that is held for a long time and constantly appended to. However, that would be memory on the Ruby heap andObjectSpace
is telling us that the heap isn't growing. We believe that's an artifact of a second bug where thememsize
hook (i.e.,Arena_memsize
or a related function) is not properly reporting the post-fuse size. In particular, dividing the size by the fused count looks like the wrong operation:protobuf/ruby/ext/google/protobuf_c/protobuf.c
Line 172 in 8f6afdc
The total memsize is divided by the total number of arenas fused into it, presumably because the other arenas would report their size, but those arenas are no longer alive and thus have no size to report.
The text was updated successfully, but these errors were encountered: