[bugfix] [ir_utils] ad-hoc solution to memory leak #3
base: main
Conversation
Force-pushed from 191280c to dee2006
Force-pushed from dee2006 to 39eb7dd
Looks good. I just remembered we should merge this into a feature branch until it gets merged to upstream main. I will quickly create a branch and you can change the base target.
@@ -19,7 +20,7 @@
     ContextCache,
     Empty,
     EmptyType,
-    RefTracker,
+    # RefTracker,
nit: remove
@@ -150,6 +151,50 @@ def infer_external_from_tensor(
         return bool(self.external), self.external_scope, None


+###############################################################################
+# Reference Trackers - ad-hoc solution to the memory leaks
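The hunk above only shows the header of the new "Reference Trackers" section, not its body. Purely as an illustration of the general approach, a tracker that avoids pinning its referents might hold only weak references, along these lines (hypothetical names and structure, not the PR's actual code):

```python
import gc
import weakref


class WeakRefTracker:
    """Illustrative tracker that holds only weak references, so a tracked
    object can still be garbage collected while it is registered."""

    def __init__(self):
        self._refs = {}  # id(obj) -> weakref.ref(obj)

    def track(self, referrent):
        ref_id = id(referrent)
        if ref_id not in self._refs:
            self._refs[ref_id] = weakref.ref(referrent)
            # Drop the bookkeeping entry once `referrent` is collected;
            # weakref.finalize keeps no strong reference to `referrent`.
            weakref.finalize(referrent, self._refs.pop, ref_id, None)
        return self._refs[ref_id]


class Payload:
    """Stand-in for a tensor / memoryview payload."""


tracker = WeakRefTracker()
obj = Payload()
tracker.track(obj)
tracked_while_alive = len(tracker._refs)

del obj       # last strong reference gone
gc.collect()  # make collection deterministic for the example
tracked_after_gc = len(tracker._refs)
```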
Can you add a small description of why this RefTracker is here and what makes it different from the `torch.RefTracker` implementation?
Can you retarget to the new feature branch?
And can you open a PR to update the requirements for attic to point to this branch? (pip should be able to install this package as a git link.)
Ad-Hoc Bugfix against Memory Leak
The current default behaviour in the upstream keeps all (or a big portion) of the `torch.Tensor`s and `np.ndarray`s (which are kept as `memoryview`s) in memory until the process is killed. An upstream issue has been opened; a proper and more well-rounded solution will come at some point.

The source of the problem is the `ModuleBuilder` object attributes `global_ref_tracker` and `fx_py_attr_tracker`; their class is imported from `iree.compiler`. The `RefTracker` class that is defined in `iree.compiler.extras.fx_importer` has these lines in its `track` method:

`referrent` corresponds to a `torch.Tensor` (or a `memoryview` of an `np.ndarray`), and this line causes it to be kept in memory until the process dies. Normally, the function passed into `finalize` is called automatically, sort of as an additional finalization callback, when `referrent` is garbage collected. Unfortunately, if these lines are present, both the `weakref.finalize` objects and their corresponding tensors are not released. In the absence of the above lines, however, the tensors and the arrays are garbage collected automatically, except for the last one that used the `aot.export` path, which can nevertheless be garbage collected with `gc.collect()` at the end.

The memory usage w/o the changes: (plot)

The memory usage w/ the changes: (plot)

The last drop is with `gc.collect()` and releases the last pair of tensors and arrays.
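The normal `weakref.finalize` behaviour described above can be shown in isolation (a minimal sketch with an illustrative stand-in class, independent of the IREE code):

```python
import gc
import weakref


class Buf:
    """Stand-in for a tensor / array payload."""


released = []

buf = Buf()
# Register a finalizer: it is called, as an extra finalization callback,
# once `buf` is garbage collected.
fin = weakref.finalize(buf, released.append, "buf")
alive_before = fin.alive  # True while `buf` is still alive

del buf       # drop the last strong reference
gc.collect()  # make collection deterministic for the example

callback_ran = released == ["buf"]
alive_after = fin.alive   # False: the finalizer has fired
```

As long as nothing else keeps the `weakref.finalize` object and its referent pinned, the callback fires and the payload is released, which is exactly the behaviour the leak was preventing.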