-
Notifications
You must be signed in to change notification settings - Fork 15.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Python references that go out of scope might not release the memory allocated by _upb_Arena_SlowMalloc #14571
Comments
This is the proto we use: https://github.com/apache/beam/blob/47d0fd566f86aaad35d26709c52ee555381823a4/model/pipeline/src/main/proto/org/apache/beam/model/pipeline/v1/beam_runner_api.proto#L91 I am sure the repro can be simplified further. |
@tvalentyn can you help create a repo that not depend on apache-beam? Thanks |
I am OOO atm and likely won't get to it before next week, but cc'ing: @AnandInguva in case he has time and interested. |
Interesting. I gave it a try, and I was not able to recreate the leak in isolation while using essentially the same proto. I'll check whether this has to do with how beam compiles proto stubs or something else. |
That would be really helpful, thanks! |
Got a smaller repro. Appears that there some interaction between protobuf and dill. Beam uses old version of dill, but the issue is reproducible even with latest dill. Good news is there is some work underway for Beam to not depend on Dill or at least be able to use cloudpickle as default..
|
Leak correlates with this line: https://github.com/uqfoundation/dill/blob/f66ed3b602d6e9b942542491614dfd18622583d7/dill/_dill.py#L72C32-L72C32 Here is a repro that doesn't use dill:
|
Leak doesn't happen with protobuf==3.20.3 or with |
I think I see the root cause. In map lookup, we use the message's arena to convert the Python Lines 203 to 205 in 022d223
We should use a temporary arena to perform the conversion. Only at the point where we are allocating non-temporary data should we use the message's arena. |
That explains why |
Indeed, in |
|
And probably several other places in |
Previously we were allocating memory on the message's arena every time we performed a `map[key]` or `map.get(key)` operation. This is unnecessary, as the key's data is only needed ephemerally, for the duration of the lookup, and we can therefore alias the Python object's string data instead of copying it. This required fixing a bug in the convert.c operation. Previously in the `arena==NULL` case, if the user passes a bytes object instead of a unicode string, the code would return a pointer to a temporary Python object that had already been freed, leading to use-after-free. I fixed this by referencing the bytes object's data directly, and using utf8_range to verify the UTF-8. Fixes: #14571 PiperOrigin-RevId: 578311252
Previously we were allocating memory on the message's arena every time we performed a `map[key]` or `map.get(key)` operation. This is unnecessary, as the key's data is only needed ephemerally, for the duration of the lookup, and we can therefore alias the Python object's string data instead of copying it. This required fixing a bug in the convert.c operation. Previously in the `arena==NULL` case, if the user passes a bytes object instead of a unicode string, the code would return a pointer to a temporary Python object that had already been freed, leading to use-after-free. I fixed this by referencing the bytes object's data directly, and using utf8_range to verify the UTF-8. Fixes: #14571 PiperOrigin-RevId: 578311252
Previously we were allocating memory on the message's arena every time we performed a `map[key]` or `map.get(key)` operation. This is unnecessary, as the key's data is only needed ephemerally, for the duration of the lookup, and we can therefore alias the Python object's string data instead of copying it. This required fixing a bug in the convert.c operation. Previously in the `arena==NULL` case, if the user passes a bytes object instead of a unicode string, the code would return a pointer to a temporary Python object that had already been freed, leading to use-after-free. I fixed this by referencing the bytes object's data directly, and using utf8_range to verify the UTF-8. Fixes: #14571 PiperOrigin-RevId: 578563555
Repro:
pip install apache-beam==2.51.0
.memleak_repro.py
Run
python memleak_repro.py pipeline.pb
, observe RAM usage in top/htop, etc, increasing the number of iterations as necessary to keep the process running longer.For profiling, run:
pip install memray
memray run -o output.bin --force memleak_repro.py pipeline.pb ; memray table --leak output.bin -o table.html --force ; memray flamegraph --leak output.bin -o flamegraph.html -f
memleak_repro.py:20
Leak doesn't happen with
PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION=python
set.Leak can also be observed with C-stack tools, such as tcmalloc+pprof:
valgrind+massif:
Using memray in
--native
mode also points to_upb_Arena_SlowMalloc
.Command line:
memray run --native -o output.bin --force memleak_repro.py pipeline.pb ; memray table --leak output.bin -o table.html --force ; memray flamegraph --leak output.bin -o flamegraph.html -f
The text was updated successfully, but these errors were encountered: