Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[BUG]: Relinquished instance exception for still alive object #879

Open
WillFroom opened this issue Jan 24, 2025 · 6 comments
Open

[BUG]: Relinquished instance exception for still alive object #879

WillFroom opened this issue Jan 24, 2025 · 6 comments

Comments

@WillFroom
Copy link

WillFroom commented Jan 24, 2025

Problem description

I am hitting an issue where nanobind is raising an exception when I am trying to use a C++ object that has already been on a round-trip to python and back but is now wrapped by a outer class.

I have attached a MRE, I get the following error on the use_inner_class line: RuntimeWarning: nanobind: attempted to access a relinquished instance of type ....

I believe what I am doing is valid, but if there is a way around this please let me know.

I haven't tried reverting it but I think that this change: #591 may be being to strict?

Reproducible example code

Extension:

struct InnerClass {};

struct OuterClass {
  std::unique_ptr<InnerClass> inner_class;
};

NB_MODULE(_extension, module_) {
  namespace nb = nanobind;

  nb::class_<InnerClass>(module_, "InnerClass");

  nb::class_<OuterClass>(module_, "OuterClass")
      .def("__init__",
           [](OuterClass* self, std::unique_ptr<InnerClass> inner_class) {
             new (self) OuterClass();
             self->inner_class = std::move(inner_class);
           })
      .def(
          "get_inner_class",
          [](OuterClass* self) { return self->inner_class.get(); },
          nb::rv_policy::reference_internal);

  module_.def("make_inner_class",
              []() { return std::make_unique<InnerClass>(); });

  module_.def("use_inner_class", [](const InnerClass* inner_class) {});
}

Failing test:

  import _extension

  inner_class = _extension.make_inner_class()
  outer_class = _extension.OuterClass(inner_class)
  _extension.use_inner_class(outer_class.get_inner_class()) // RuntimeWarning: nanobind: attempted to access a relinquished instance of type InnerClass
@WillFroom WillFroom changed the title [BUG]: Relinquished instance exception for still alive type [BUG]: Relinquished instance exception for still alive object Jan 24, 2025
@wjakob
Copy link
Owner

wjakob commented Jan 27, 2025

Hello @WillFroom. First, a quick check before I look deeper into this: did you read about the nb_deleter in the documentation, and does the use of that deleter fix the issue?

Using unique casters without this special deleter has some inherent limitations that are discussed in the documentation sections on ownership.

@WillFroom
Copy link
Author

WillFroom commented Jan 27, 2025

Hi @wjakob, I did read the documentation about nb::deleter and did try updating the __init__ method for OuterClass to consume a std::unique_ptr<InnerClass, nb::deleter<InnerClass>> (among other combinations) and I still hit the same issue when I try and access the wrapped inner class.

@wjakob
Copy link
Owner

wjakob commented Jan 28, 2025

Ah. I think I understand. What you are trying to do is currently not supported by the library.

What is important to keep in mind is that nanobind associates at most one Python instance with each C++ object.

_extension.make_inner_class() creates a Python wrapper around the existing C++ pointer and transfers ownership from the C++ unique pointer to this wrapper.

outer_class = _extension.OuterClass(inner_class) transfers ownership back to C++ and marks the Python wrapper as invalid so that it cannot be used anymore.

outer_class.get_inner_class() returns a C++ pointer. nanobind checks to see if an instance wrapping this pointer exists. And indeed, there is one. It simply returns that instance.

.. but that instance is marked as invalid, so the use in _extension.use_inner_class fails.

The behavior that is tested and documented is that a type is consistently accessed using unique ownership. For example, if get_inner_class returned a unique pointer, we would revert back ownership to Python, and then this would work.

No proposal on how to fix this yet, this is just to explain the cause.

@wjakob
Copy link
Owner

wjakob commented Jan 28, 2025

Is it possible that your experiment works after adding:

del inner_class

just before the last line? That would confirm the hypothesis.

@WillFroom
Copy link
Author

Adding del inner_class before trying to use the wrapped inner class does indeed stop the error being raised.

wjakob added a commit that referenced this issue Jan 28, 2025
This commit addresses an issue arising when multiple threads want to
access the Python object associated with the same C++ instance, which
does not exist yet and therefore must be created. @vfdev-5 reported that
TSAN detects a race condition in code that uses this pattern, caused by
concurrent unprotected reads/writes of internal ``nb_inst`` fields.

There is also a larger problem: depending on how operations are
sequenced, it is possible that two threads simultaneously create a
Python wrapper, which violates the usual invariant that each (C++
instance pointer, type) pair maps to at most one Python object.

This PR updates nanobind to preserve this invariant. When registering a
newly created wrapper object in the internal data structures, nanobind
checks if another equivalent wrapper has been created in the meantime.
If so, we destroy the thread's instance and return the registered one.

This requires some extra handling code, that, however, only runs with
very low probability. It also adds a new ``registered`` bit flag to
``nb_inst``, which makes it possible to have ``nb_inst`` objects that
aren't registered in the internal data structures. I am planning to use
that feature to fix the (unrelated) issue #879.
@wjakob
Copy link
Owner

wjakob commented Jan 28, 2025

Hello @WillFroom,

I thought about this issue some more, and it's complicated. I think it will need a significant redesign of how unique pointers are handed in nanobind.

Currently, instances have a "relinquished" flag to indicate that ownership has been transferred away. However, the data structures of nanobind still map the C++ instance pointer to this relinquished object, which prevents return via rv_policy::reference or rv_policy::reference_internal. To support your use case, we'd need a new mode for nb_inst instances that keeps them from being listed in the instance map, so that they cannot be found. Subsequently returning via rv_policy::reference_internal can then return a second Python object. However, the details of this are gnarly, likely has tons of corner cases, and it's possible that the whole notion of the relinquished flag needs to be dispensed with. The requirement for any fix would also be that it doesn't impact runtime performance of the general case (code that doesn't use unique pointers) or add huge amounts of code to the headers.

This isn't related to anything I need for my own work, and it will take a lot of time to understand and fix. I think I prefer to close it as "out of of (my) support scope" with an express invitation for external contributions.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants