-
Notifications
You must be signed in to change notification settings - Fork 15.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ruby: segfault iterating over protobuf objects #11968
Comments
The fact that this only occurs with custom built Rubies is suspicious. Is it possible that you are mixing versions? eg. you are building against headers for Ruby version A and then loading into Ruby version B?
This is a very old and obsolete pre-release. We decided not to bump the major version at that time. |
It is suspicious. I don't think I'm mixing versions but I'm happy to run any commands that might help rule that out. It looks like I'm building native extensions for the correct ruby w/ native extensions (I'm using asdf, FWIW)
Here's some more info from the crash in case it helps:
|
We have the same issue, it only occurs on ARM - our x86 docker images have no issues and it looks likes it's an issue with the pre-packaged binaries. |
I'm facing probably something similar on M1 – 3.22.0 works fine, 3.22.2 throws segmentation fault: Log
|
same for me too, the 3.22.2 version throw the error when 3.22.0 is working |
3.22.3 unfortunately still the same. |
+1 on Docker on MacBook Air M1.
|
3.23.1 still not working. |
I was able to reproduce the crash from https://github.com/semanticart/proto-segfault-example. I got the following stack trace:
|
@haberman That's good to hear! |
This appears to be a race condition involving the object cache. With assertions enabled, I've isolated this to two different assertion failures:
And:
This looks really similar to what @casperisfine was debugging in #12216 |
Also I was able to make the test case more reproducible with less output by using require 'json'
require 'google/protobuf'
Google::Protobuf::DescriptorPool.generated_pool.build do
add_message 'example.Config' do
repeated :rows, :message, 1, 'example.ConfigRow'
end
add_message 'example.ConfigRow' do
repeated :values, :message, 1, 'example.Value'
end
add_message 'example.Value' do
optional :string, :string, 1
end
end
module Example
Config = Google::Protobuf::DescriptorPool.generated_pool.lookup('example.Config').msgclass
ConfigRow = Google::Protobuf::DescriptorPool.generated_pool.lookup('example.ConfigRow').msgclass
Value = Google::Protobuf::DescriptorPool.generated_pool.lookup('example.Value').msgclass
end
content = { "rows": [{ "values": [{ "string": 'hello' }] },
{ "values": [{ "string": 'thanks' }] }] }.to_json
config = Example::Config.decode_json(content)
thread_count = 2
iterations_per_thread = 100
GC.stress = 0x01 | 0x04
threads = []
(1..thread_count).each do |i|
threads << Thread.new do
(1..iterations_per_thread).each do |iter|
config.rows.each do |row|
row.values.each do |conditional_value|
conditional_value
end
end
end
end
end
threads.map(&:join)
puts 'DONE!' |
I bisected the latest repro to a001250 Can someone confirm? |
For what it's worth, the repro can be even further simplified, as it looks like simply fetching the size of the repeated field is sufficient to cause a segfault. It even happens if the values themselves are empty, but not if there's only one row. require 'json'
require 'google/protobuf'
Google::Protobuf::DescriptorPool.generated_pool.build do
add_message 'example.Config' do
repeated :rows, :message, 1, 'example.ConfigRow'
end
add_message 'example.ConfigRow' do
repeated :values, :string, 1
end
end
module Example
Config = Google::Protobuf::DescriptorPool.generated_pool.lookup('example.Config').msgclass
ConfigRow = Google::Protobuf::DescriptorPool.generated_pool.lookup('example.ConfigRow').msgclass
end
content = { "rows": [{}, {}] }.to_json
config = Example::Config.decode_json(content)
thread_count = 2
iterations_per_thread = 100
GC.stress = 0x04
threads = []
(1..thread_count).each do |i|
threads << Thread.new do
(1..iterations_per_thread).each do |iter|
config.rows.each do |row|
row.values.size
end
end
end
end
threads.map(&:join) Oddly enough, calling |
The weak map implementation of the object cache introduced here is not thread safe. Here's a timeline of the relevant commits and the results on Mac OS.
We could do one of these 3:
|
If |
The problem is that if I'm not mistaken |
Hmm, I don't want to make users choose between much higher memory usage and SEGV race conditions. @casperisfine from what you said before, it sounds like we could almost write our own weak map, but there is one missing function that we would need. Any possible workarounds for that? |
Yes, we'd need I don't think there's a way to work around that I'm afraid. We could make a feature request for it, but it would at best come with Ruby 3.3 at the end of the year, and IIRC all current issues are already fixed in WeakMap upstream (and will be backported eventually). I haven't dug in this issue though, not 100% sure if it's caused by the bugs I fixed in Ruby's WeakMap. |
Ah I see there is a minimal reproduction provided, it should be easy to check if it's fixed on ruby-head. (it's late here though, so I'm not gonna do it now) |
It looks like that symbol is available in Ruby 2.7 up to 3.0, but was removed in 3.1. :(
|
So apparently it's still crashing on
|
Fix: protocolbuffers#11968 Since calling a method on WeakMap allow the RubyVM to switch threads, adding a non-fully initialized object into the cache allow another thread from using it. Note: with this fix, the other two thread may create their own wrapper for the same underlying Array. We don't know if it's a big deal or not. Co-Authored-By: Peter Zhu <peter@peterzhu.ca>
We investigated this with @peterzhu2118 and found the fix: #13054 It's a fairly basic race condition. |
This is a great find. Thanks for doing it. I am going to take the discussion over to the PR itself |
Using the repro in https://github.com/semanticart/proto-segfault-example, I was able to confirm that 3.24.0.rc.2 (which contains the fix from #13204) fixes the issue. |
Unfortunately, 3.24.0.rc.2 is still not working for me on M1. Log
|
@krystof-k that stack trace looks unrelated to what is described in this bug. Could you open a new bug with a repro for this crash? |
I opened #13355. I have no idea how to create a repro, I'm just using a gem which has protobuf as a dependency. |
Iterating over Protobuf objects in threads causes a segfault.
What version of protobuf and what language are you using?
Version: 3.21.12
Language: Ruby
What operating system (Linux, Windows, ...) and version?
OSX darwin19 (Catalina) and darwin22 (Ventura) -- probably others, I've only been able to test these.
What runtime / compiler are you using (e.g., python version or gcc version)
What did you do?
Iterating over Protobuf objects in threads causes a segfault. Here's the minimal reproduction:
What did you expect to see
No segfault.
What did you see instead?
The segfault.
Example crash output for v3.21.12:
Example content from ~/Library/Logs/DiagnosticReports
Anything else we should know about your project / environment
It seems to affect any ruby version I build with ruby-build (via asdf) or homebrew on both intel and apple silicon.
System rubies seem unaffected.
I've got a repro repository here: https://github.com/semanticart/proto-segfault-example
I'll note that I've been unable to recreate the segfault reliably in github actions.
FWIW, the pre-release 4.0.0 gem seems unaffected.
The text was updated successfully, but these errors were encountered: