New caching logic may inadvertently release objects that are still needed #257

samschott · 2023-01-23T23:01:05Z

Describe the bug

The current logic to synchronise object cache access, introduced in #246, may lead to premature releases, because the return values from alloc() and init() calls will point to the same memory address for the allocated and the initialed object but the class name may have changed between those calls.

The code below will then result in the ObjCInstance returned by alloc() to be a different Python object than the one returned from the following init().

rubicon-objc/src/rubicon/objc/api.py

Lines 898 to 902 in e198bb0

    
           if cached_class_name != current_class_name: 
        
               # There has been a cache hit, but the object is a different class. 
        
               # Purge the cache for this address, and treat it as a cache miss. 
        
               del cls._cached_objects[object_ptr.value] 
        
               raise KeyError(object_ptr.value)

Because the object returned by alloc() is typically not stored in any variable in a chained alloc().init() call, it will be garbage collected, leading to a release() call from our own __del__ override.

rubicon-objc/src/rubicon/objc/api.py

Lines 964 to 970 in e198bb0

    
               def __del__(self): 
        
                   """Release the corresponding objc instance if we own it, i.e., if it 
        
                   was returned by by a method starting with 'alloc', 'new', 'copy', or 
        
                   'mutableCopy' and it wasn't already explicitly released by calling 
        
                   :meth:`release` or :meth:`autorelease`.""" 
        
                   if self._needs_release: 
        
                       send_message(self, "release", restype=objc_id, argtypes=[])

Steps to reproduce

Run the following script:

from toga_cocoa.libs import NSWindow
from rubicon.objc.runtime import libobjc

w_pre_init = NSWindow.alloc()

ptr_pre_init = w_pre_init.ptr.value
name_pre_init = libobjc.class_getName(libobjc.object_getClass(w_pre_init.ptr))

w_post_init = w_pre_init.init()

ptr_post_init = w_post_init.ptr.value
name_post_init = libobjc.class_getName(libobjc.object_getClass(w_post_init.ptr))

print(f"name_pre_init = {name_pre_init}")
print(f"name_post_init = {name_post_init}")
print(f"pointers equal: {ptr_pre_init == ptr_post_init}")

This can in practice lead to segfaults when creating a window early but showing it only later, e.g., on button press.

Expected behavior

We should not send release() immediately after an init() call, because it will be sent to the same memory address of an object that we still want.

Screenshots

No response

Environment

Operating System: macOS 13.1
Python version: 3.11.1
Software versions:
- rubicon-objc: 0.4.4

Logs

No response

Additional context

I've observed such class-name changes in particular with NSWindow, NSView and NSApplication but no other classes so far.

The text was updated successfully, but these errors were encountered:

samschott · 2023-01-23T23:09:20Z

@freakboy3742, I see that you even have left a comment about class-changes happening during initialisation:

rubicon-objc/src/rubicon/objc/api.py

Lines 785 to 792 in e198bb0

    
           # 1. Objective-C runtime provides a function object_setClass that 
        
           #    can change an object's class after creation, and some code 
        
           #    manipulates objects' isa pointers directly (although the latter 
        
           #    is no longer officially supported by Apple). This is not 
        
           #    commonly done in practice, and even then it is usually only 
        
           #    done during object creation/initialization, so it's basically 
        
           #    safe to assume that an object's class will never change after 
        
           #    it's been wrapped in an ObjCInstance.

I guess you did not heed your own warning ;)

freakboy3742 · 2023-01-24T06:21:55Z

Thanks for the report - this one's a doozy.

I can reproduce the results you've described with the test code; I haven't been able to reproduce a segfault, but that's going to be unpredictable by definition. Interestingly, in the process of writing #246, I saw cache evictions for NSWindow/NSKVONotifying_NSWindow, but didn't catch the implications on memory allocation (and I didn't see a segfault either).

Three possible fixes I can think of:

Widen the 'stale class name' criteria

Instead of using a strict classname match, we broaden the criteria to identify common renames that don't constitute a an actual stale class. For example, we could use current_class_name.endswith(f'_{cached_class_name}') as a class name match, rather than ==. That would be sufficient for the NSWindow case; I haven't seen what NSView or NSApplication name changes can occur, so I'm not sure if this approach would be sufficient in the general case.

Preserve `_needs_release`

When evicting a "stale" object from the instance cache, check the _needs_release value on the stale instance. If it's True, set it to False on the stale instance, but ensure that the replacement cache value has _needs_release=True.

In the NSWindow.alloc().init() case from your example, this means the instance created around the alloc() will be set _needs_release=True; the updated instance generated for the init() call will inherit the _needs_release=True value, but the alloc() wrapper will be set back to _needs_release=False, so when it is disposed of, it won't trigger a release on the still-valid object.

The problem I can see here is that I'm not 100% sure what will happen for the stale object case at the core of #246 - if the stale object was set "_needs_release = True", that means the stale object won't be released, but the new object will think it needs to be released, even though that may not be warranted.

Something akin to #256

Instead of Rubicon trying to directly mirror the ownership of ObjC objects, we move to a model where we try to control memory allocation on the ObjC side by waiting for Python to garbage collect. Essentially every Python-side ObjCInstance instance gets a retain on creation, and a release on __del__. At the very least, we'd know that if an ObjCInstance exists, the memory address is valid; but obviously there's a big risk of a memory leak here unless we make sure Python is disposing of objects.

@samschott @dgelessus Any thoughts on any of these?

mhsmith · 2023-01-24T07:49:35Z

Essentially every Python-side ObjCInstance instance gets a retain on creation, and a release on __del__. At the very least, we'd know that if an ObjCInstance exists, the memory address is valid; but obviously there's a big risk of a memory leak here unless we make sure Python is disposing of objects.

This is the approach that Chaquopy has always taken, with the intention of making it simply impossible for Python code using the library to cause an invalid pointer access. In practice it's always worked very well, and I'm not aware of it ever causing a memory leak.

As I understand it, CPython always calls __del__ immediately when an object's reference count drops to zero, and augments this with a periodic garbage collection of unreachable objects contained in reference cycles. In most cases, there will be no cycle, and the object will be released deterministically.

As mentioned in #256, the main concern would be "cyclic references between Python objects and Objective C objects". I'll have to leave it to those who understand the code better than me to judge how common that would be.

samschott · 2023-01-24T21:13:07Z

I can reproduce the results you've described with the test code; I haven't been able to reproduce a segfault, but that's going to be unpredictable by definition.

I've also struggled to reproduce the segfault in a minimal example, but I'm seeing it consistently in full apps. This includes my own, but also toga's window example app: click "Create Window" and three windows pop up. Try to close the top one and you consistently get a segfault.

Regarding the options which you presented:

Widen the 'stale class name' criteria

This approach would certainly work for the class-name changes that I have observed, but it is hacky because it relies on deducing a rule from a pattern of three. IMO, it makes an already hacky workaround even worse. Gripes that I already have with the current approach:

Just because the class name and memory address are the same, it doesn't mean that we are still looking at the same instance. As you have noted in the in-line comment, this is a fair compromise.
The lock guards against race conditions on the Python side, but we still might have something on the Obj-C side deallocating the object and replacing with another after our checks pass but before we return the cache-hit.

Preserve `_needs_release`

That seems to set us up for more chaos and segfaults. As you write:

if the stale object was set "_needs_release = True", that means the stale object won't be released, but the new object will think it needs to be released, even though that may not be warranted.

Indeed, there is no guarantee that the new object needs to be released. In the alloc().init() case, this is certainly true, but in the more general case which prompted those changes, it won't be.

Something akin to #256

The more I think about it, the more I like this option. We initially introduced the _needs_release logic to ensure that objects returned by alloc() and related calls, which are automatically owned by us, do get released eventually without requiring explicit release() calls by the user. The latter would be very challenging to get right in Python.

This was the minimally-invasive solution. The alternative would be to manually issue retain() calls whenever an ObjCInstance is created, except for alloc() and related calls, and always release objects on __del__(). This should work just as well.

Cyclic references between Python objects and Obj-C objects, or on the Obj-C, are already a problem which the user must work around by using objc_property(weak=True) whenever those may be introduced.

That being said, we might run to a whole number of corner cases and memory leaks which we did not anticipate. We will need to carefully think through different code paths. The problem with memory leaks: they are harder to notice and debug than segfaults.

One path could be to implement option 1 for the next release, because the current thread-safety issues do need to be fixed, while preparing option 3 in a release candidate and carefully testing it with a number of toga apps.

freakboy3742 · 2023-01-24T23:39:01Z

Completely agreed that option (1) is a hack stacked on a hack; however, it's also the fastest path forward right now. I've pushed #258 as an implementation of that approach, and a solution to this immediate issue (it seems to fix the window example); a deeper investigation of #256 is likely the "real" fix here.

samschott added the bug A crash or error in behavior. label Jan 23, 2023

freakboy3742 mentioned this issue Jan 24, 2023

Allow for some class renames between alloc and init #258

Merged

4 tasks

freakboy3742 closed this as completed in #258 Jan 30, 2023

samschott mentioned this issue Nov 19, 2024

Force cache eviction when instance classes don't match. #540

Closed

4 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

New caching logic may inadvertently release objects that are still needed #257

New caching logic may inadvertently release objects that are still needed #257

samschott commented Jan 23, 2023 •

edited

Loading

samschott commented Jan 23, 2023 •

edited

Loading

freakboy3742 commented Jan 24, 2023

mhsmith commented Jan 24, 2023

samschott commented Jan 24, 2023 •

edited

Loading

freakboy3742 commented Jan 24, 2023

New caching logic may inadvertently release objects that are still needed #257

New caching logic may inadvertently release objects that are still needed #257

Comments

samschott commented Jan 23, 2023 • edited Loading

Describe the bug

Steps to reproduce

Expected behavior

Screenshots

Environment

Logs

Additional context

samschott commented Jan 23, 2023 • edited Loading

freakboy3742 commented Jan 24, 2023

Widen the 'stale class name' criteria

Preserve _needs_release

Something akin to #256

mhsmith commented Jan 24, 2023

samschott commented Jan 24, 2023 • edited Loading

Widen the 'stale class name' criteria

Preserve _needs_release

Something akin to #256

freakboy3742 commented Jan 24, 2023

samschott commented Jan 23, 2023 •

edited

Loading

samschott commented Jan 23, 2023 •

edited

Loading

Preserve `_needs_release`

samschott commented Jan 24, 2023 •

edited

Loading

Preserve `_needs_release`