There are several tools one can use to investigate memory leaks in Beam Python, but they are not straightforward to use, especially for people who don't work on Beam.
If you set the --experiments=enable_heap_dump option, heap dumps will be appended to the SDK status responses, which the SDK can provide to the runner. Dataflow workers serve the SDK status page on localhost:8081/sdk_status, which can be queried via: gcloud compute ssh --zone "xx-somezone-z" "some-dataflow-gce-worker-01300848-wqox-harness-bvf7" --project "some-project-id" --command "curl localhost:8081/sdk_status".
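For anyone who needs to poll several workers, the query above can be scripted. The following is a minimal sketch that only wraps the gcloud invocation quoted in this issue; the helper names `sdk_status_command` and `fetch_sdk_status` are made up for illustration, and the worker, zone, and project values are the placeholders from the command above.

```python
import shlex
import subprocess


def sdk_status_command(worker, zone, project, port=8081):
    """Build the gcloud argv that curls the SDK status page on one worker."""
    return [
        "gcloud", "compute", "ssh", worker,
        "--zone", zone,
        "--project", project,
        "--command", f"curl localhost:{port}/sdk_status",
    ]


def fetch_sdk_status(worker, zone, project, timeout=120):
    """Run the command and return the status page text.

    Requires an installed and authenticated gcloud CLI.
    """
    result = subprocess.run(
        sdk_status_command(worker, zone, project),
        capture_output=True, text=True, check=True, timeout=timeout)
    return result.stdout


if __name__ == "__main__":
    # Placeholder values copied from the issue text; this only prints the
    # command line and does not contact any worker.
    argv = sdk_status_command(
        "some-dataflow-gce-worker-01300848-wqox-harness-bvf7",
        "xx-somezone-z",
        "some-project-id")
    print(shlex.join(argv))
```

With the heap dump experiment enabled, the text returned by `fetch_sdk_status` should include the appended heap dump, which could then be saved per worker and compared over time.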
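The per-workitem heap profiling options (beam/sdks/python/apache_beam/options/pipeline_options.py, lines 1280 to 1293 in 3172736), whose embedded snippet ends with:
    help='A number between 0 and 1 indicating the ratio '
    'of bundles that should be profiled.')
could be used to inspect the objects that are left in the heap after a bundle execution.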
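To illustrate what "objects left in the heap after a bundle" means in practice, here is a minimal sketch that uses the standard library's `tracemalloc` as a stand-in. It is not how Beam's profiling options are implemented; `process_bundle` is a hypothetical bundle that leaks on purpose, and `heap_growth` is an illustrative helper.

```python
import tracemalloc

_leaked = []  # module-level state that outlives the bundle


def process_bundle(elements):
    # Hypothetical bundle: keeps a 100 KiB buffer per element (a deliberate leak).
    for _ in elements:
        _leaked.append(bytearray(100 * 1024))
    return len(_leaked)


def heap_growth(bundle_fn, elements, top=5):
    """Return the allocation sites whose retained memory changed most."""
    tracemalloc.start()
    try:
        before = tracemalloc.take_snapshot()
        bundle_fn(elements)
        after = tracemalloc.take_snapshot()
    finally:
        tracemalloc.stop()
    # Differences are grouped by source line and sorted by absolute size change.
    return after.compare_to(before, "lineno")[:top]


if __name__ == "__main__":
    for stat in heap_growth(process_bundle, range(10)):
        print(stat)
```

The first entry points at the line that allocates the retained buffers. Note that `tracemalloc` only sees allocations made through Python's allocators, so leaks inside native extensions would need a different tool.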
Attaching an off-the-shelf profiler is possible but requires instrumentation, and fetching and analyzing the profiles is not convenient. For an example, see #28246 (comment).
We should see whether we can instrument Beam to make profile collection easier, both for leaks in pure Python and for leaks in native code that can be caught. We should make it easy to integrate Beam with an external profiler such as memray.
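As a rough picture of the kind of instrumentation this could involve, the sketch below wraps a unit of work in a `memray.Tracker` when memray is installed and falls back to a no-op otherwise. This is an assumption about how such a hook might look, not an existing Beam feature: `allocation_tracker`, `profiled_bundle`, and `process_bundle` are hypothetical names, and the output file naming is arbitrary.

```python
import contextlib
import importlib.util
import os
import tempfile


def allocation_tracker(output_path, native_traces=False):
    """Return a context manager that records allocations if memray is installed."""
    if importlib.util.find_spec("memray") is None:
        # memray is optional here; without it the work runs unprofiled.
        return contextlib.nullcontext()
    import memray
    # native_traces=True also records native (C/C++) stack frames, which is
    # what helps with leaks in native code.
    return memray.Tracker(output_path, native_traces=native_traces)


def process_bundle(elements):
    # Hypothetical stand-in for the user code executed for one bundle.
    return [e * 2 for e in elements]


def profiled_bundle(elements, output_dir):
    """Run one bundle under the tracker, writing one capture file per process."""
    path = os.path.join(output_dir, f"bundle-{os.getpid()}.bin")
    with allocation_tracker(path):
        return process_bundle(elements)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        print(profiled_bundle([1, 2, 3], tmp))
        print(os.listdir(tmp))
```

memray refuses to overwrite an existing capture file and allows only one active tracker per process, so a real integration would need unique file names per bundle and a way to ship the captures off the worker.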
Ideally, it should also be possible to export memory profiles to a cloud profiler with as little effort from the user as possible.