[API Proposal]: Soft References #63113
Comments
Tagging subscribers to this area: @dotnet/gc
Issue Details
Background and motivation
MemoryCache is the "standard" way to cache things in .NET, but its behavior is unintuitive and it does not guarantee that it will evict cache entries quickly enough to prevent OutOfMemoryExceptions, among other issues. Plus, it apparently relies on some kind of black magic (the documentation for which I have not been able to locate) to detect the size in bytes of objects in the cache, so that using it correctly is difficult: even if I am using it correctly, it's difficult to be confident of that! I would like to be able to put objects in a cache that contain two kinds of references: (1) references to "owned" subobjects that should be counted as part of the parent object, and (2) references to (large) shared objects that can never be evicted. I can't imagine how anything except the garbage collector would be able to detect that (1) is only reachable via the cache and so should be counted for "eviction" purposes, while (2) cannot be GC'd.
Finally, if the goal is to prevent memory exhaustion, MemoryCache is problematic because multiple cache instances can exist that do not coordinate with one another. Weak references tend to be collected far too quickly to be used in caches. Soft references would solve this problem. Soft references are like weak references, but garbage-collected much less aggressively.
API Proposal
An obvious interface would be to replicate WeakReference<T>:
namespace System;
public sealed class SoftReference<T> : IWeakReference<T>
{
public SoftReference(T target);
~SoftReference();
public void SetTarget(T target);
public bool TryGetTarget([MaybeNullWhen(false)][NotNullWhen(true)] out T target);
// If updateLastUsed is true, LastUsed is set to DateTime.UtcNow
public bool TryGetTarget([MaybeNullWhen(false)][NotNullWhen(true)] out T target, bool updateLastUsed);
}
// An interface implemented by WeakReference<T> and SoftReference<T>
public interface IWeakReference<T>
{
bool TryGetTarget([MaybeNullWhen(false)][NotNullWhen(true)] out T target);
void SetTarget(T target);
}
API Usage
SoftReference<MyEntity> _entity;
void Cache(MyEntity entity) => _entity = new SoftReference<MyEntity>(entity);
// Later on...
if (_entity != null && _entity.TryGetTarget(out MyEntity e))
Console.WriteLine("We've still got it!");
else
Console.WriteLine("Ain't got it!");
Alternative Designs
Another obvious design is to define SoftReference<T> as a derived class of WeakReference<T>. A third possibility is to add "softness" as a feature of the existing WeakReference class.
WeakReferences have a "track resurrection" feature. It's not clear to me that this would add value to a soft reference, but IMO this feature should be supported if it does not add significant complexity to the GC.
It should be kept in mind that after this feature is introduced, many applications could be dominated by soft references (i.e. at any given time, most objects are reachable only through soft references). Therefore, perhaps there should be a property of GC to control the preferred total memory usage of the process, which would affect the aggressiveness of soft-reference collection:
// Sets a limit for memory usage that the GC should attempt to enforce
// by collecting more aggressively near the limit. This particularly affects
// the degree to which objects referenced via soft references are collected.
public long SoftMemoryLimit { get; set; }
(I would very much like a hard memory limit too, but that's another story.)
Ideally, the GC would not collect all soft-referenced objects when memory pressure is encountered, but instead prioritize which objects to collect first according to some kind of "priority". I believe the most commonly-desired way to prioritize would be by recency: to first get rid of objects that have not been used recently. To that end there could be a LastUsed property:
// Controls garbage collection priority; unreachable objects with
// lower values for LastUsed tend to be collected first.
// - To artificially delay GC for a soft reference, increase it (eg add 24 hours)
// - To artificially encourage GC for a soft reference, decrease it (eg subtract
// 24 hours; or use DateTime.MinValue to treat soft ref like WeakReference)
// - Setter could convert all dates to UTC so that the GC can directly compare
// LastUsed.Ticks of different soft references.
public DateTime LastUsed { get; set; }
// Variant of TryGetTarget that sets LastUsed = DateTime.UtcNow if target is alive
public bool TryGetTargetAndSetLastUsed([MaybeNullWhen(false)][NotNullWhen(true)] out T target) {
if (TryGetTarget(out target)) {
LastUsed = DateTime.UtcNow;
return true;
}
return false;
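}
Rather than adding bool updateLastUsed as a parameter on TryGetTarget, it could be a separate boolean property so that it is possible to configure IWeakReference.TryGetTarget() to update LastUsed.
It is possible that there are multiple soft references to the same object. It is tempting to put the last-used timestamp on the object itself so that there can only be a single timestamp:
// Any class could implement this interface in order to control GC priority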
public interface IGCSoftReferencePriority
{
DateTime LastUsed { get; set; }
}
However this approach would have major disadvantages:
Risks
I have no idea how difficult it would be to implement this in the GC.
|
Should it be a variant of |
How do you propose to implement "much less aggressively"? |
Soft references could be considered as strong references until memory pressure gets high where they would be converted to weak references. Under the hood they could (maybe I'm wrong I have no idea) be implemented as a new type of There are two major questions. The first is what does "memory pressure gets high" mean, and it's kind of solved as evidenced in the shared In the meanwhile, using constructs like the |
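A minimal sketch of the "strong until memory pressure gets high, then weak" idea described above, done in user code rather than in the GC. `PressureSoftReference<T>` is a hypothetical name, and the definition of "high pressure" used here (the machine's memory load reaching the GC's own high-load threshold, as reported by `GC.GetGCMemoryInfo()`) is an assumption, not something this thread settles. Someone still has to call `DowngradeIfUnderPressure()`; the sketch does not answer when that should happen.

```csharp
using System;

// Hypothetical helper, not a runtime API: holds its target strongly until
// memory pressure is reported, then keeps only a weak reference to it.
public sealed class PressureSoftReference<T> where T : class
{
    private T? _strong;                       // keeps the target alive while "soft"
    private readonly WeakReference<T> _weak;  // still tracks the target after downgrade

    public PressureSoftReference(T target)
    {
        _strong = target;
        _weak = new WeakReference<T>(target);
    }

    public bool IsDowngraded => _strong is null;

    // Drops the strong reference; from now on the GC may collect the target.
    public void Downgrade() => _strong = null;

    // Assumption: pressure is "high" once the memory load reaches the threshold
    // at which the GC itself considers the load high. The values describe the
    // most recent GC, so a zero threshold is treated as "no data yet".
    public static bool IsMemoryPressureHigh()
    {
        GCMemoryInfo info = GC.GetGCMemoryInfo();
        return info.HighMemoryLoadThresholdBytes > 0
            && info.MemoryLoadBytes >= info.HighMemoryLoadThresholdBytes;
    }

    public void DowngradeIfUnderPressure()
    {
        if (IsMemoryPressureHigh()) Downgrade();
    }

    public bool TryGetTarget(out T? target)
    {
        T? strong = _strong;                  // read the field once
        if (strong is not null)
        {
            target = strong;
            return true;
        }
        return _weak.TryGetTarget(out target);
    }
}
```

After `Downgrade()`, the target survives only as long as something else references it or the GC has not yet run, which is exactly the weak-reference behavior the proposal wants to delay.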
ArrayPool uses a number of strategies. In addition to checking for memory pressure, it also uses a timestamp to track when the pooled array was last used, and releases the pooled array once it has not been used for a while.
It is not obvious to me that the soft references have lower overhead than checking memory pressure. For example, I expect that we would see regression in ArrayPool if it was switched to use the soft references proposed here. #53895 has a lot of additional discussion of this problem space. |
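To make the timestamp strategy mentioned above concrete, here is a rough sketch. It is not ArrayPool's actual code: `IdleTrimmingCache` and its members are hypothetical names, the clock is passed in by the caller (for example `Environment.TickCount64`), and the class is not thread-safe.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical cache, for illustration only: every entry carries a last-used
// timestamp, and Trim evicts the entries that have been idle for too long.
public sealed class IdleTrimmingCache<TKey, TValue> where TKey : notnull
{
    private readonly Dictionary<TKey, (TValue Value, long LastUsedMs)> _entries = new();

    public int Count => _entries.Count;

    public void Set(TKey key, TValue value, long nowMs) =>
        _entries[key] = (value, nowMs);

    public bool TryGet(TKey key, long nowMs, out TValue? value)
    {
        if (_entries.TryGetValue(key, out var entry))
        {
            _entries[key] = (entry.Value, nowMs); // refresh the timestamp on every hit
            value = entry.Value;
            return true;
        }
        value = default;
        return false;
    }

    // Removes entries idle for longer than maxIdleMs; returns how many were removed.
    public int Trim(long nowMs, long maxIdleMs)
    {
        var stale = new List<TKey>();
        foreach (var pair in _entries)
        {
            if (nowMs - pair.Value.LastUsedMs > maxIdleMs)
                stale.Add(pair.Key);
        }
        foreach (TKey key in stale)
            _entries.Remove(key);
        return stale.Count;
    }
}
```

A caller would typically run `Trim` from a timer or from a GC notification, and could shorten `maxIdleMs` when memory load is high.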
@jkotas I'm no GC expert and don't even know how weak references work, so I'd leave it up to the GC team to choose an implementation. I like how @teo-tsirpanis frames it, as soft refs should act like strong refs until some memory pressure threshold is crossed (such as the |
... you're asking for some sort of
It doesn't guarantee that cache entries will ever be evicted (ie, they can all be held).
Why would they? Being independent is usually a benefit.
It sounds like you want two caches tuned differently, or a regular cache and a dictionary. If necessary, implement a wrapper that divides them appropriately for you. |
@Clockwork-Muse I want a cache that uses as much memory as possible with (for all practical purposes) no chance of OutOfMemoryException (and also avoiding using swap space, if available). If you can tell me how to do that without soft references, please do. |
What I was getting at was - are you planning on writing one, or would a different implementation of an in-memory cache be sufficient (provided by a third-party library, or due to changes to
Side note: this may be outside application notice/control, depending on OS/settings. |
I do not want to write another cache, and I do not know how a cache could (even in principle) have the characteristics that soft references would provide.
Yes, but to the extent it is possible, it would involve OS-specific metrics/mechanisms that maybe the CLR already uses to help choose the GC interval. I expect it is possible to ask the OS how much physical memory there is, and how much memory is being used by other apps and by the current app; this seems like all that is needed to choose a soft cap. |
Soft reference is actively used in JVM for caching but personally I'm not a fan of this mechanism because |
... When I did some cursory research for this issue earlier, I found documentation that for at least one JVM implementation, |
@Clockwork-Muse , Client JVM treats SoftReferences as WeakReferences. Server JVM does not. |
That may be outdated information, because HotSpot uses GraalVM under the hood. Anyway, the JVM's memory management strategy differs from that of the .NET CLR, which doesn't try to occupy the maximum possible heap size. Given that, a generation-based approach may be better and much more understandable. |
@sakno My suggestion is that Edit: "upper bound" is the wrong term, as it is possible for the amount of allocated memory to exceed it. What I meant was that the GC would choose its own target memory usage during GCs, and |
Having said all that, if a soft reference is merely a weak reference that can only be collected in Gen2, that sounds like a feature that the team could do quickly and easily, and if that means the feature could be available in .NET 7 I would be very happy. |
@qwertie , a reference that can only be collected in Gen2 can be implemented without runtime support using existing weak references. The main idea is to use a finalizer as a callback from the GC to track the object's generation:
public readonly struct SoftReference<T>
where T : class
{
private sealed class Tracker
{
internal readonly T Target;
private readonly WeakReference parent;
internal Tracker(T target, WeakReference parent)
{
Target = target;
this.parent = parent;
}
~Tracker()
{
if (GC.GetGeneration(Target) < GC.MaxGeneration)
GC.ReRegisterForFinalize(this);
else
parent.Target = Target; // downgrade from soft to weak reference
}
}
private readonly WeakReference? reference;
public SoftReference(T? target)
{
if (target is null)
{
reference = null;
}
else if (GC.GetGeneration(target) == GC.MaxGeneration)
{
reference = new(target, trackResurrection: false);
}
else
{
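// Tracker needs its parent WeakReference, so create the reference first
// and point it at the tracker afterwards.
reference = new(null, trackResurrection: true);
var tracker = new Tracker(target, reference);
reference.Target = tracker;
GC.KeepAlive(tracker);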
}
}
public void Clear()
{
if (reference?.Target is Tracker tracker)
{
GC.SuppressFinalize(tracker);
reference.Target = null;
}
}
public bool IsAlive => reference?.IsAlive ?? false;
public T? Target => reference?.Target switch
{
Tracker tracker => tracker.Target,
T target => target,
_ => null
};
}
The code inside the finalizer can be used to analyze your memory limit using the GC.GetGCMemoryInfo method. Edit: A soft reference must keep a reference to the object even if the object reaches Gen2 but remains alive due to the presence of strong references. In this case we need to "downgrade" the reference from soft to weak (see the else branch in the finalizer). |
Thanks, I'll give that a try when I have time. |
@sakno I found an issue with your implementation that would require allocating a new public class SoftReference<T> where T : class
{
private sealed class Tracker
{
private T? target;
private readonly WeakReference<T> reference;
internal Tracker(T? target, bool trackResurrection)
{
this.target = target;
reference = new(target, trackResurrection);
}
internal void SetTarget(T? target)
{
this.target = target;
reference.SetTarget(target);
}
internal bool TryGetTarget(out T? target)
{
target = this.target;
return target != null || reference.TryGetTarget(out target);
}
~Tracker()
{
if (target != null && GC.GetGeneration(target) == GC.MaxGeneration)
{
target = null; // downgrade from soft to weak reference
}
GC.ReRegisterForFinalize(this);
}
}
private readonly WeakReference<Tracker> reference; // WeakReference allows finalizer to run, but it always resurrects itself until this is finalized.
public SoftReference(T? target, bool trackResurrection)
{
var tracker = new Tracker(target, trackResurrection);
reference = new(tracker, trackResurrection: true);
GC.KeepAlive(tracker);
}
~SoftReference()
{
reference.TryGetTarget(out Tracker tracker);
GC.SuppressFinalize(tracker);
}
internal void SetTarget(T? target)
{
reference.TryGetTarget(out Tracker tracker);
tracker.SetTarget(target);
}
internal bool TryGetTarget(out T? target)
{
reference.TryGetTarget(out Tracker tracker);
return tracker.TryGetTarget(out target);
}
} Also, is there any reason for the runtime to not have this type? Maybe it's not quite in the spirit of [Edit] I imagine this behavior of only collecting on a certain GC generation could easily be added to |
@timcassell , you can reduce 1 internal allocation. Here is the code: https://github.com/dotnet/dotNext/blob/develop/src/DotNext/Runtime/SoftReference.cs. The code in your example doesn't handle some specific situations like setting P.S.: Provided implementation also includes an option to control memory pressure in Gen2. |
@sakno Your implementation does not include the ability to overwrite the |
Why would the |
Assume that you have a target object with two references:
Soft reference downgrades to weak reference when it In my implementation, |
@sakno Once the Also, it is true that a reference located elsewhere in the code will keep the target alive in the |
Oh, I got it. Your implementation should work fine as well. My implementation allows to reduce one internal allocation. Also, from my personal view, the tracker is needed to keep the strong reference as long as needed. When the reference is downgraded, no need to keep the reference to the tracker itself because it is no longer useful. |
That's only true because you don't support overwriting the target. It is necessary to keep it alive to overwrite the target without extra allocations. |
I have used a similar system to track liveness, but I had an issue where the referenced object (and all objects referenced only by it) would be finalized and resurrected. This is a problem if those objects do cleanup work in their finalizers, because C# doesn't have a mechanism for an object to detect resurrection. After resurrection, the state of the object will be invalid. From the discussion and code in this thread, it is not clear to me whether this problem is addressed. Could anyone explain briefly if I missed something? |
@acaly Thanks for bringing that oddity to attention. I have adjusted my implementation to fix that issue.
public class SoftReference<T> where T : class
{
private sealed class Tracker
{
private readonly SoftReference<T> parent;
internal Tracker(SoftReference<T> parent)
{
this.parent = parent;
}
~Tracker()
{
parent.OnGC();
GC.ReRegisterForFinalize(this);
}
internal void StopTracking()
{
GC.SuppressFinalize(this);
}
}
private T? target;
private readonly WeakReference<T?> targetReference;
private readonly WeakReference<Tracker> callbackReference; // WeakReference allows finalizer to run, but it always resurrects itself until this is finalized.
public SoftReference(T? target, bool trackResurrection)
{
this.target = target;
targetReference = new(target, trackResurrection);
var tracker = new Tracker(this);
callbackReference = new(tracker, trackResurrection: true);
GC.KeepAlive(tracker);
}
~SoftReference()
{
callbackReference.TryGetTarget(out Tracker tracker);
tracker.StopTracking();
}
private void OnGC()
{
if (target != null && GC.GetGeneration(target) == GC.MaxGeneration)
{
target = null; // downgrade from soft to weak reference
}
}
public void SetTarget(T? target)
{
this.target = target;
targetReference.SetTarget(target);
}
public bool TryGetTarget(out T? target)
{
target = this.target;
return target != null || targetReference.TryGetTarget(out target);
}
} |
@timcassell , you need to suppress finalization of the target object to avoid the problem mentioned by @acaly . In |
No, that doesn't make sense at all. Moving the target and weak reference out of the tracker resolves the issue. Also, since my implementation supports tracking resurrection, we absolutely do not want to override what the user expects (and even if we don't support tracking resurrection, we still don't want to force a re-register finalization on an object we don't own). [Edit] Also, suppressing finalization of the target doesn't also suppress finalization of objects that it references, and it will still have an invalid state when we resurrect it. That's why it must be moved out of the tracker to prevent finalization at all. |
Btw, this implementation does not guarantee the target will live until a gen 2 collection, it only guarantees it will live until it is promoted to gen 2. To guarantee life until a gen 2 collection will require internal APIs. I believe there is a Gen2Callback internally of some sort. [Edit] Do resurrected objects get promoted to higher generations? If so, we could check the generation of the tracker object in its finalizer before calling [Edit2] I just reread the GC documentation, and it seems I was incorrect here. Objects that are promoted to gen 2 will only be collected in a gen 2 collection, even if they are eligible for collection during a gen 0 or gen 1 collection. |
There is another problem - the implementation is not thread safe. |
Which part? SetTarget and TryGetTarget are as thread safe as WeakReference is. [Edit] Actually I take that back. TryGetTarget should cache the target in a local before returning instead of overwriting the out variable. I thought about thread safety for the OnGC, but I wasn't sure if it really matters. Isn't the GC usually stop-the-world and single threaded? Is a concurrent GC really an issue to be concerned about? |
Ok, here's a thread-safer version. I think there's no need to try to synchronize I also don't believe public class SoftReference<T> where T : class
{
private sealed class Tracker
{
private readonly SoftReference<T> parent;
internal Tracker(SoftReference<T> parent)
{
this.parent = parent;
}
~Tracker()
{
parent.OnGC();
GC.ReRegisterForFinalize(this);
}
internal void StopTracking()
{
GC.SuppressFinalize(this);
}
}
volatile private T? target;
private readonly WeakReference<T?> targetReference;
private readonly WeakReference<Tracker> callbackReference; // WeakReference allows finalizer to run, but it always resurrects itself until this is finalized.
public SoftReference(T? target, bool trackResurrection)
{
this.target = target;
targetReference = new(target, trackResurrection);
var tracker = new Tracker(this);
callbackReference = new(tracker, trackResurrection: true);
GC.KeepAlive(tracker);
}
~SoftReference()
{
callbackReference.TryGetTarget(out Tracker tracker);
tracker.StopTracking();
}
private void OnGC()
{
T? _target = target;
if (_target != null && GC.GetGeneration(_target) == GC.MaxGeneration)
{
Interlocked.CompareExchange(ref target, null, _target); // downgrade from soft to weak reference
}
}
internal void SetTarget(T? target)
{
this.target = target;
targetReference.SetTarget(target);
}
internal bool TryGetTarget(out T? target)
{
return targetReference.TryGetTarget(out target);
}
} But if you really wanted, you could just [Edit] I removed the strong reference read in |
One bit of feedback on this and WeakReference from someone much like the proposer that uses WeakReference extensively for caching/GC control applications. |
@timcassell One lesson I learned previously from playing with resurrection is to never use it. One issue of your code is, when the Also I don't want to assume that checking the referenced object's generation can reflect the overall memory pressure. As jkotas said, it's probably much easier to explicitly check memory usage. The worst thing you can do is to add a separate background thread and check periodically. Even though, it will still be better, because GC no longer needs to handle those weak references and resurrections repeatedly, especially when the number of tracked objects increases. |
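For comparison, the periodic-check alternative mentioned above might look roughly like this. `PeriodicTrimmer` is a hypothetical name; the pressure predicate and the trim action are supplied by the caller (the predicate could be based on `GC.GetGCMemoryInfo()`, but that choice is an assumption left open here). No finalizers or resurrection are involved.

```csharp
using System;
using System.Threading;

// Hypothetical helper: polls a caller-supplied predicate on a timer and runs
// a caller-supplied trim action whenever the predicate reports pressure.
public sealed class PeriodicTrimmer : IDisposable
{
    private readonly Func<bool> _shouldTrim;
    private readonly Action _trim;
    private readonly Timer _timer;

    public PeriodicTrimmer(Func<bool> shouldTrim, Action trim, TimeSpan period)
    {
        _shouldTrim = shouldTrim;
        _trim = trim;
        // The first check runs after one period, then repeats every period.
        _timer = new Timer(_ => CheckNow(), null, period, period);
    }

    // Also callable directly, e.g. right after a large allocation.
    public void CheckNow()
    {
        if (_shouldTrim()) _trim();
    }

    public void Dispose() => _timer.Dispose();
}
```

Timer callbacks run on thread-pool threads and can overlap if a trim takes longer than the period, so the trim action has to be safe to call concurrently.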
You can implement weak references and strong references yourself and add whatever interfaces to it. The standard |
Does According to the documentation, it will not be called.
I agree. I don't particularly like this approach, I was just piggy-backing off @sakno's idea. |
@timcassell @acaly Even if would not be removed from the finalizer queue, why can't we just apply the disposable pattern? The code in the finalizer would then not be evaluated any further. Also, shouldn't the tracker inherit from CriticalFinalizerObject?
using System;
using System.Runtime.ConstrainedExecution;
using System.Threading;
public class SoftReference<T>
where T : class
{
private sealed class Tracker : CriticalFinalizerObject, IDisposable
{
private bool disposed;
private readonly SoftReference<T> parent;
internal Tracker(SoftReference<T> parent)
{
this.parent = parent;
}
~Tracker()
{
if (disposed)
{
return;
}
parent.OnGC();
GC.ReRegisterForFinalize(this);
}
public void Dispose()
{
disposed = true;
GC.SuppressFinalize(this);
}
}
private volatile T? target;
private readonly WeakReference<T?> targetReference;
private readonly WeakReference<Tracker> callbackReference;
public SoftReference(T? target, bool trackResurrection)
{
this.target = target;
targetReference = new WeakReference<T?>(target, trackResurrection);
var tracker = new Tracker(this);
callbackReference = new WeakReference<Tracker>(tracker, trackResurrection: true);
GC.KeepAlive(tracker);
}
~SoftReference()
{
callbackReference.TryGetTarget(out Tracker tracker);
tracker.Dispose();
}
private void OnGC()
{
T? _target = target;
if (_target != null && GC.GetGeneration(_target) == GC.MaxGeneration)
{
Interlocked.CompareExchange(ref target, null, _target);
}
}
internal void SetTarget(T? target)
{
this.target = target;
targetReference.SetTarget(target);
}
internal bool TryGetTarget(out T? target)
{
return targetReference.TryGetTarget(out target);
}
} |
Yes, you could do that.
There is nothing critical about this. If the framework wants to clean up the domain without calling the finalizer, we don't care. |
It could be that, from my perspective, a soft reference should be more proactive. However, CriticalFinalizerObject seems to guarantee that memory is freed even under pressure. Isn't it a goal of SoftReference to be cleaned up under pressure? |
|
Background and motivation
MemoryCache is the "standard" way to cache things in .NET, but its behavior is unintuitive and it does not guarantee that it will evict cache entries quickly enough to prevent OutOfMemoryExceptions, among other issues such as badly bloated cache entries. Plus, it apparently relies on some kind of black magic (the documentation for which I have not been able to locate) to detect the size in bytes of objects in the cache, so that using it correctly is difficult: even if I am using it correctly, it's difficult to be confident of that! I would like to be able to put objects in a cache that contain two kinds of references: (1) references to "owned" subobjects that should be counted as part of the parent object, and (2) references to (large) shared objects that can never be evicted. I can't imagine how anything except the garbage collector would be able to detect that (1) is only reachable via the cache and so should be counted for "eviction" purposes, while (2) cannot be GC'd.
Finally, if the goal is to prevent memory exhaustion, MemoryCache is problematic because multiple cache instances can exist that do not coordinate with one another. Weak references tend to be collected far too quickly to be used in caches. Soft references would solve this problem. Soft references are like weak references, but garbage-collected much less aggressively.
API Proposal
An obvious interface would be to replicate WeakReference<T>:
API Usage
Alternative Designs
Another obvious design is to define SoftReference<T> as a derived class of WeakReference<T>. A third possibility is to add "softness" as a feature of the existing WeakReference class.
WeakReferences have a "track resurrection" feature. It's not clear to me that this would add value to a soft reference, but IMO this feature should be supported if it does not add significant complexity to the GC.
It should be kept in mind that after this feature is introduced, many applications could be dominated by soft references (i.e. at any given time, most objects are reachable only through soft references). Therefore, perhaps there should be a property of GC to control the preferred total memory usage of the process, which would affect the aggressiveness of soft-reference collection.
(I would very much like a hard memory limit too, but that's another story.)
Ideally, the GC would not collect all soft-referenced objects when memory pressure is encountered, but instead prioritize which objects to collect first according to some kind of "priority". I believe the most commonly-desired way to prioritize would be by recency: to first get rid of objects that have not been used recently. To that end there could be a LastUsed property:
Rather than adding bool updateLastUsed as a parameter on TryGetTarget, it could be a separate boolean property so that it is possible to configure IWeakReference.TryGetTarget() to update LastUsed.
It is possible that there are multiple soft references to the same object. It is tempting to put the last-used timestamp on the object itself so that there can only be a single timestamp:
However this approach would have major disadvantages:
IGCSoftReferencePriority
IGCSoftReferencePriority is implemented might be too expensive inside the GC
Risks
I have no idea how difficult it would be to implement this in the GC.