
[bugfix] [ir_utils] ad-hoc solution to memory leak #3

Open
wants to merge 1 commit into main

Conversation


@egebeysel commented Dec 3, 2024

Ad-Hoc Bugfix against Memory Leak

The current default behaviour upstream keeps all (or a large portion) of the torch.Tensors and np.ndarrays (which are kept as memoryviews) in memory until the process is killed. An upstream issue has been opened; a proper, more well-rounded solution will come at some point.

The source of the problem is the ModuleBuilder object's attributes global_ref_tracker and fx_py_attr_tracker, whose class is imported from iree.compiler.

The RefTracker class defined in iree.compiler.extras.fx_importer has these lines in its track method:

if referrent is not Empty:
    # Register a callback that runs once `referrent` is garbage collected.
    weakref.finalize(referrent, self._ref_finalizer, ref_id)

referrent corresponds to a torch.Tensor (or a memoryview of an np.ndarray), and this call causes it to be kept in memory until the process dies. Normally, the function passed to finalize is called automatically, as an additional finalization callback, when referrent is garbage collected. Unfortunately, with these lines present, neither the weakref.finalize objects nor their corresponding tensors are ever released. With the lines removed, the tensors and arrays are garbage collected automatically, except for the last one that went through the aot.export path, which can still be collected with an explicit gc.collect() at the end.
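For reference, here is a minimal, self-contained sketch (hypothetical names, not code from this PR) of the normal weakref.finalize behaviour described above: the callback fires as soon as the tracked object is destroyed, and the finalize object dies with it.

import weakref

class Payload:
    """Stand-in for a tracked torch.Tensor or np.ndarray memoryview."""

def _ref_finalizer(ref_id):
    # Runs automatically once the tracked object has been destroyed.
    print(f"finalizer ran for ref_id={ref_id}")

obj = Payload()
finalizer = weakref.finalize(obj, _ref_finalizer, id(obj))

del obj  # the last strong reference goes away; the finalizer fires here
assert not finalizer.alive  # the finalize object is released as well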

The memory usage w/o the changes: [figure: memory profile without the fix]

The memory usage w/ the changes: [figure: memory profile with the fix]

The last drop comes from the final gc.collect() and releases the last pair of tensors and arrays.
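A rough way to observe that final drop (a sketch only; psutil and the surrounding export script are assumptions, not part of this PR):

import gc
import os

import psutil  # assumed available, used only to read the process RSS

proc = psutil.Process(os.getpid())
print(f"RSS before collect: {proc.memory_info().rss / 1e6:.1f} MB")
gc.collect()  # releases the last tensor/array pair left over from aot.export
print(f"RSS after collect:  {proc.memory_info().rss / 1e6:.1f} MB")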

  • Report this to the upstream issue

egebeysel (Author) commented Dec 3, 2024

This stack of pull requests is managed by Graphite. Learn more about stacking.

@egebeysel mentioned this pull request Dec 3, 2024
@egebeysel marked this pull request as ready for review December 3, 2024 12:15
@egebeysel changed the base branch from beysel.update_turbine to main December 3, 2024 15:34
@chrsmcgrr left a comment

Looks good. I just remembered we should merge this into a feature branch until it gets merged to upstream main. I will quickly create a branch and you can change the base target.

@@ -19,7 +20,7 @@
     ContextCache,
     Empty,
     EmptyType,
-    RefTracker,
+    # RefTracker,


nit: remove

@@ -150,6 +151,50 @@ def infer_external_from_tensor(
         return bool(self.external), self.external_scope, None


+###############################################################################
+# Reference Trackers - ad-hoc solution to the memory leaks


Can you add a small description about why this RefTracker is here and what makes it different from the torch.RefTracker implementation?


Can you retarget to integrate-iree-turbine-20241203


chrsmcgrr commented Dec 9, 2024

And can you open a PR to update the requirements for attic to point to this branch (integrate-iree-turbine-20241203): https://github.com/RooflineAI/roof-mlir/blob/main/packages/attic/requirements.txt#L12

pip should be able to install this package as a git link
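For illustration, a hypothetical requirements.txt entry using pip's VCS-link syntax (OWNER/REPO and the package name are placeholders; only the branch name comes from this thread):

package-name @ git+https://github.com/OWNER/REPO.git@integrate-iree-turbine-20241203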
