[BUG]: Relinquished instance exception for still alive object #879

WillFroom · 2025-01-24T15:17:19Z

Problem description

I am hitting an issue where nanobind is raising an exception when I am trying to use a C++ object that has already been on a round-trip to python and back but is now wrapped by a outer class.

I have attached a MRE, I get the following error on the use_inner_class line: RuntimeWarning: nanobind: attempted to access a relinquished instance of type ....

I believe what I am doing is valid, but if there is a way around this please let me know.

I haven't tried reverting it but I think that this change: #591 may be being to strict?

Reproducible example code

Extension:

struct InnerClass {};

struct OuterClass {
  std::unique_ptr<InnerClass> inner_class;
};

NB_MODULE(_extension, module_) {
  namespace nb = nanobind;

  nb::class_<InnerClass>(module_, "InnerClass");

  nb::class_<OuterClass>(module_, "OuterClass")
      .def("__init__",
           [](OuterClass* self, std::unique_ptr<InnerClass> inner_class) {
             new (self) OuterClass();
             self->inner_class = std::move(inner_class);
           })
      .def(
          "get_inner_class",
          [](OuterClass* self) { return self->inner_class.get(); },
          nb::rv_policy::reference_internal);

  module_.def("make_inner_class",
              []() { return std::make_unique<InnerClass>(); });

  module_.def("use_inner_class", [](const InnerClass* inner_class) {});
}

Failing test:

  import _extension

  inner_class = _extension.make_inner_class()
  outer_class = _extension.OuterClass(inner_class)
  _extension.use_inner_class(outer_class.get_inner_class()) // RuntimeWarning: nanobind: attempted to access a relinquished instance of type InnerClass

The text was updated successfully, but these errors were encountered:

wjakob · 2025-01-27T01:33:00Z

Hello @WillFroom. First, a quick check before I look deeper into this: did you read about the nb_deleter in the documentation, and does the use of that deleter fix the issue?

Using unique casters without this special deleter has some inherent limitations that are discussed in the documentation sections on ownership.

WillFroom · 2025-01-27T08:47:20Z

Hi @wjakob, I did read the documentation about nb::deleter and did try updating the __init__ method for OuterClass to consume a std::unique_ptr<InnerClass, nb::deleter<InnerClass>> (among other combinations) and I still hit the same issue when I try and access the wrapped inner class.

wjakob · 2025-01-28T08:00:05Z

Ah. I think I understand. What you are trying to do is currently not supported by the library.

What is important to keep in mind is that nanobind associates at most one Python instance with each C++ object.

_extension.make_inner_class() creates a Python wrapper around the existing C++ pointer and transfers ownership from the C++ unique pointer to this wrapper.

outer_class = _extension.OuterClass(inner_class) transfers ownership back to C++ and marks the Python wrapper as invalid so that it cannot be used anymore.

outer_class.get_inner_class() returns a C++ pointer. nanobind checks to see if an instance wrapping this pointer exists. And indeed, there is one. It simply returns that instance.

.. but that instance is marked as invalid, so the use in _extension.use_inner_class fails.

The behavior that is tested and documented is that a type is consistently accessed using unique ownership. For example, if get_inner_class returned a unique pointer, we would revert back ownership to Python, and then this would work.

No proposal on how to fix this yet, this is just to explain the cause.

wjakob · 2025-01-28T08:19:48Z

Is it possible that your experiment works after adding:

del inner_class

just before the last line? That would confirm the hypothesis.

WillFroom · 2025-01-28T09:01:23Z

Adding del inner_class before trying to use the wrapped inner class does indeed stop the error being raised.

@vfdev-5

This commit addresses an issue arising when multiple threads want to access the Python object associated with the same C++ instance, which does not exist yet and therefore must be created. @vfdev-5 reported that TSAN detects a race condition in code that uses this pattern, caused by concurrent unprotected reads/writes of internal ``nb_inst`` fields. There is also a larger problem: depending on how operations are sequenced, it is possible that two threads simultaneously create a Python wrapper, which violates the usual invariant that each (C++ instance pointer, type) pair maps to at most one Python object. This PR updates nanobind to preserve this invariant. When registering a newly created wrapper object in the internal data structures, nanobind checks if another equivalent wrapper has been created in the meantime. If so, we destroy the thread's instance and return the registered one. This requires some extra handling code, that, however, only runs with very low probability. It also adds a new ``registered`` bit flag to ``nb_inst``, which makes it possible to have ``nb_inst`` objects that aren't registered in the internal data structures. I am planning to use that feature to fix the (unrelated) issue #879.

wjakob · 2025-01-28T13:24:22Z

Hello @WillFroom,

I thought about this issue some more, and it's complicated. I think it will need a significant redesign of how unique pointers are handed in nanobind.

Currently, instances have a "relinquished" flag to indicate that ownership has been transferred away. However, the data structures of nanobind still map the C++ instance pointer to this relinquished object, which prevents return via rv_policy::reference or rv_policy::reference_internal. To support your use case, we'd need a new mode for nb_inst instances that keeps them from being listed in the instance map, so that they cannot be found. Subsequently returning via rv_policy::reference_internal can then return a second Python object. However, the details of this are gnarly, likely has tons of corner cases, and it's possible that the whole notion of the relinquished flag needs to be dispensed with. The requirement for any fix would also be that it doesn't impact runtime performance of the general case (code that doesn't use unique pointers) or add huge amounts of code to the headers.

This isn't related to anything I need for my own work, and it will take a lot of time to understand and fix. I think I prefer to close it as "out of of (my) support scope" with an express invitation for external contributions.

WillFroom changed the title ~~[BUG]: Relinquished instance exception for still alive type~~ [BUG]: Relinquished instance exception for still alive object Jan 24, 2025

wjakob mentioned this issue Feb 2, 2025

WIP: Handle dangling objects and rv_policy changes robustly #889

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[BUG]: Relinquished instance exception for still alive object #879

[BUG]: Relinquished instance exception for still alive object #879

WillFroom commented Jan 24, 2025 •

edited

Loading

wjakob commented Jan 27, 2025 •

edited

Loading

WillFroom commented Jan 27, 2025 •

edited

Loading

wjakob commented Jan 28, 2025

wjakob commented Jan 28, 2025

WillFroom commented Jan 28, 2025

wjakob commented Jan 28, 2025 •

edited

Loading

[BUG]: Relinquished instance exception for still alive object #879

[BUG]: Relinquished instance exception for still alive object #879

Comments

WillFroom commented Jan 24, 2025 • edited Loading

Problem description

Reproducible example code

wjakob commented Jan 27, 2025 • edited Loading

WillFroom commented Jan 27, 2025 • edited Loading

wjakob commented Jan 28, 2025

wjakob commented Jan 28, 2025

WillFroom commented Jan 28, 2025

wjakob commented Jan 28, 2025 • edited Loading

WillFroom commented Jan 24, 2025 •

edited

Loading

wjakob commented Jan 27, 2025 •

edited

Loading

WillFroom commented Jan 27, 2025 •

edited

Loading

wjakob commented Jan 28, 2025 •

edited

Loading