Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow prealloc and registration before calling C++ ctor #4116

Open
wants to merge 2 commits into
base: master
Choose a base branch
from

Conversation

adriendelsalle
Copy link

@adriendelsalle adriendelsalle commented Aug 3, 2022

Description

This is a attempt to allow access to a Python derived class instance from C++ constructor.
At this time, the constructor is called before allocation and registration so py::cast(this) inside constructor doesn't return the expected Python object.

This PR:

  • adds a new preallocate annotation to be passed when declaring a constructor to select new specializations
    • this extra is added to keep the current behavior, e.g. not rely on placement new operator
  • make construct_or_initialize to construct C++ object inplace and add specializations (relying on SFINAE) to handle both preallocated and not preallocated (current and unchanged behavior) constructors:
    • allocate the memory
    • register the C++ object
    • call the constructor using a placement new operator

Closes #4114

@rwgk
Copy link
Collaborator

rwgk commented Aug 4, 2022

First impression:

  • the kind of magic you're aiming for is likely to be surprising / confusing
  • I think there will not be a lot of use cases
  • but it adds quite a bit of complexity to already very complex code

I'll click the Approve CI run button here in case you want to keep experimenting.

@adriendelsalle
Copy link
Author

@rwgk thanks for having triggered the CI and for the feedback!

I'm targeting to get the same flexibility as Python __init__: not only to set new attributes but also be able to mutate, do some validation steps or some other costly operation dynamically at initialization of the instance. Which is the point of initialization.

For now, someone how wants to initialize Python object the same way from C++ code should do something like this (taking the same example as I took in the issue/discussion):

  • C++ side
#include "pybind11/pybind11.h"
#include "pybind11/stl.h"

namespace py = pybind11;

class Base {
public:
  Base() = default;
  virtual ~Base() = default;
};

class BaseTrampoline : public Base {
public:
  BaseTrampoline() = default;

  void do_some_modifications() {
    auto pyobj = py::cast(this);
    pyobj.attr("bar") = 10.;
    for (auto &n : pyobj.attr("__inputs__").cast<std::vector<std::string>>())
      pyobj.attr(n.data()) = n;
  }
};

PYBIND11_MODULE(test, m) {
  py::class_<Base, BaseTrampoline>(m, "Base")
      .def(py::init_alias<>() /*, py::preallocate()*/)
      .def("do_some_modifications",
           [](BaseTrampoline &t) { t.do_some_modifications(); });
}
  • Python side
from test import Base


class Derived(Base):
    __inputs__ = ["a", "b"]

    def __init__(self):
        Base.__init__(self)
        self.do_some_modifications()


d = Derived()

assert d.bar == 10.0
assert d.a == "a"
assert d.b == "b"

or to simplify the syntax for end users:

from test import Base as Base_

class Base(Base_):
    def __init__(self):
        Base_.__init__(self)
        self.do_some_modifications()

class Derived(Base):
    __inputs__ = ["a", "b"]

d = Derived()

assert d.bar == 10.0
assert d.a == "a"
assert d.b == "b"

It requires:

  • an extra class member function
  • an extra binded method
  • to overload the __init__ in all derived class, call the base __init__ and this new binded method
  • to pay the cost of 2 calls instead of one

I'll also try to improve the PR to get minimal changes, and add tests!

@rwgk
Copy link
Collaborator

rwgk commented Aug 5, 2022

General high-level cost-benefit thinking:

  • cost: increase of complexity in library (include/pybind11 in this case)
  • benefit: decreased complexity in user code

The benefit must justify the cost.

If a change only shifts complexity in user code (e.g. from Python user code to C++ user code) it is not a net benefit, and I'd be very skeptical about accepting a cost.

I'll look at the final version with that in mind.

@adriendelsalle adriendelsalle force-pushed the allow-cast-in-ctor branch 3 times, most recently from 52cce3f to 8332ca1 Compare August 8, 2022 11:31
allocate memory before calling the ctor
register the instance before calling the ctor
use a placement new call to set the value_ptr
use `prealloacte` ennotation to select placement new
refactor construct_or_initialize to construct inplace
add a type trait to check if a template param pack contains a type
add tests
add documentation
@adriendelsalle
Copy link
Author

adriendelsalle commented Aug 8, 2022

Fair enough @rwgk, I just :

  • removed the template specialization from constructor::execute and alias_constructor::execute to simplify, and now delegate to construct_or_initialize the inplace construction, with or without preallocation and registration
  • added tests
  • added documentation

It looks ready for review from my perspective, let me know if you have any question I'm happy to discuss about those changes!

@adriendelsalle adriendelsalle marked this pull request as ready for review August 8, 2022 11:44
typename... Args,
detail::enable_if_t<std::is_constructible<Class, Args...>::value && Preallocate, int> = 0>
inline void construct_or_initialize(value_and_holder &v_h, Args &&...args) {
v_h.value_ptr() = malloc(sizeof(Class));
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am pretty sure we only 'Pymem_alloc*' apis for portability reasons.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1

I need to find time to look at this carefully, but a quick search for "free" didn't match anything in the diff.

@adriendelsalle did you look already how the memory is deallocated? We want to be sure it is compatible with the allocation mechanism.

Purely from memory and hand-waving: I believe for old-style __init__ the current code can wind up with a delete that has no matching new (some other allocation mechanism is used). If you happen to stumble over that, too, adding a comment would be great. (I wanted to do that but never got back to it.)

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Awesome to have your comments!
I'm using the lib for years but only started to go in the details few days ago..so I'm not really comfortable with alloc/dealloc.
Is there some dev or design/architecture doc for that? I'll do my best to digest that tomorrow!

Copy link
Collaborator

@rwgk rwgk Aug 8, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there some dev or design/architecture doc for that?

Not to my knowledge.
At least I found it necessary to learn the hard way, experimenting a lot with the code.
Feel free to ask questions here. Maybe we can learn together.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seeing the only now, the valgrind failure seems to be spot-on:

https://github.com/pybind/pybind11/runs/7732180285?check_suite_focus=true

==7330== Mismatched free() / delete / delete []

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I saw that! Hopefully I added tests to cover the changes. I'll reproduce locally an debug that

Feel free to ask questions here. Maybe we can learn together.

👍

Copy link
Author

@adriendelsalle adriendelsalle Aug 9, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just pushed a change to replace malloc with new allocation to be fully consistent with delete used in dealloc. It should be then very similar to what's done in the allocation+construction overload of construct_or_initialize.

It's now all green using Valgrind on the tests locally. I used act to play the CI job.

I am pretty sure we only 'Pymem_alloc*' apis for portability reasons.

I don't really get the point, could you please elaborate a bit? I don't see why allocation and construction of a C++ object should use some Py* alloc.

Here is my understanding of how the changes integrate with the implem:

  • passing py::preallocate() as an extra argument of a new style constructor will preallocate and register just before actually calling the constructor of the class
  • if the call is fine, the holder is still not registered making this piece of code to be called (only to init the holder then):
    if (overloads->is_constructor && !self_value_and_holder.holder_constructed()) {
    auto *pi = reinterpret_cast<instance *>(parent.ptr());
    self_value_and_holder.type->init_instance(pi, nullptr);
    }
  • if the call is not successful (because of cast error AFAIU), the next overload try (if any) will deallocate the memory via operator delete if already allocated (no holder constructed yet):
    if (func.is_new_style_constructor) {
    // The `value` may have been preallocated by an old-style `__init__`
    // if it was a preceding candidate for overload resolution.
    if (self_value_and_holder) {
    self_value_and_holder.type->dealloc(self_value_and_holder);
    • maybe it would worth to update the comment because now it's not only in case of old-style __init__ that the memory is already allocated?
    • I don't really understand the comment:
      } else if (rec->is_new_style_constructor && arg_index == 0) {
      // A new-style `__init__` takes `self` as `value_and_holder`.
      // Rewrite it to the proper class type.

      since it looks like to me the opposite: a new-style __init__ takes value_and_holder as self, isn't it?
      call.init_self = PyTuple_GET_ITEM(args_in, 0);
      call.args.emplace_back(reinterpret_cast<PyObject *>(&self_value_and_holder));

@rwgk
Copy link
Collaborator

rwgk commented Aug 11, 2022

I spent a few minutes to look through the code, without too much attention to details. Technically this is a undoubtedly a high-quality PR, but ...

From the new documentation:

In advanced cases, it's really handy to access the Python sibling of a C++ object to get/set attributes, do some heavy computations, just hide the implementation details, or even dynamically create a new attribute!

To me this reads like an invitation to create a mess. (Even the existing py::dynamic_attr has pitfalls that often make me turn over in my mind how we could get rid of it.)

The general worry: mismatches between the C++ side and the Python side of a given type.

Similar to what I wrote before, mismatches are likely to be surprising and confusing, ultimately bug prone.

Another metaphor that often crosses my mind: this PR creates another wrinkle in the carpet. Someone is bound to stumble over it, at scale every day.

I'm not exactly opposed to adding py::preallocate(), the net delta in the production code is only about +30 lines, but I'd want the documentation to be a big fat warning: you can do this if you really need to, but better don't, you'll likely regret it in the long run. Better to use pybind11 for what it is, a bindings library, rather than mixing in "heavy computations", "hiding implementation details" (which obfuscate mismatches), and "dynamically create a new attribute" (more mismatches).

This PR is at a scale that at least one or two other maintainers will have to approve.

@henryiii, @Skylion007 what do you think?

@adriendelsalle
Copy link
Author

Thanks for taking time and the detailed feedback @rwgk , I understand the worries about creating complexity.

I'm just trying to make it more straightforward for people having a quite intensive use of the lib to take advantage of C++ code in initialization of the Python object (vs the inefficient workaround I previously posted in this PR).

The general worry: mismatches between the C++ side and the Python side of a given type.

What do you mean? Here the C++ constructor will act just as a regular Python base class would do (create new attributes, set existing ones, do some processing/checks/whatever)

But I can only speak from my experience and about my use cases, far from me the idea of adding confusion :).
I can surely improve the documentation as you mentioned, let me know!

@rwgk
Copy link
Collaborator

rwgk commented Aug 11, 2022

Here the C++ constructor will act just as a regular Python base class would do

That's not common and not obvious. It will come as a surprise to anyone but the author of the code. People will wonder what's going on, and will have to dig in to find out that some non-bindings logic is hidden in the bindings layer.

I can see that the feature this PR is enabling could be useful occasionally when migrating from one bindings tool to another, e.g. migrating from SWIG to pybind11, when both the C++ API and the Python API are more-or-less frozen already and there is no practical alternative to fully emulating the established behavior. If the documentation was written to emphasize that, but generally discourage packing non-bindings logic into the bindings layer, it would look much better to me.

Another thought that crossed my mind in the meantime: do we actually need py:: preallocate, vs changing the behavior to preallocate universally? Does something break if we do that?

@adriendelsalle
Copy link
Author

If the documentation was written to emphasize that, but generally discourage packing non-bindings logic into the bindings layer, it would look much better to me.

will do!

Another thought that crossed my mind in the meantime: do we actually need py:: preallocate, vs changing the behavior to preallocate universally? Does something break if we do that?

yes it would break classes with custom operators, see #948. specific tests have been written for that. I guess I also have to add a big warning about this non working combo.

@Skylion007
Copy link
Collaborator

BTW, have we tested if this works with __new__ construction as well? Additionally, do we still need to preallocate the attrs when using a __new__ construction? Or is that a valid workaround? Or perhaps we should do the preallocation by override the new method? I know it's not that well documented compared to the py::init() method, and I believe it run before the __init__ function. There is an example of how to define the __new__ method in test.

@adriendelsalle
Copy link
Author

I don't know..I'll take a look, thanks for the pointer!

@adriendelsalle
Copy link
Author

have we tested if this works with new construction as well?

__new__ construction, if I understand correcly, allows a factory to return a constructed C++ object, which in turn will generate a Python instance and go back in C++ no-op __init__, that is never called because caught in

if (self_value_and_holder.instance_registered()) {
return none().release().ptr();
}

So this PR is not changing anything on that: it's not possible to register a C++/Python couple if the Python or C++ instance doesn't exist yet

Additionally, do we still need to preallocate the attrs when using a new construction? Or is that a valid workaround?

I don't get it sorry :/. Could you please elaborate on that?
If you want to know if defining __new__ allows to access attributes, nope (see the previous bullet)

Or perhaps we should do the preallocation by override the new method?

Yes it could be an option, it also mean that the Python object needs to be allocated there to do the job of the pybind11_object_new that has been overriden

type->tp_new = pybind11_object_new;

It can be also quite tricky to keep the existing behavior with C++ construction in __init__ while moving the allocation in __new__. At least the flag/extra option should be passed and stored on the class record (then with much more impacts on the code base) to make sure the behavior in both methods are consistent.

There is an example of how to define the new method in test.

I don't really see how it's useful since there is no capability to act on the Python object instantiation from C++. For now, the C++ alloc/init is something totally independent from the Python one. Maybe to init the C++ object in __init__ without having to call the base init..

Another point is that __new__ needs to return an instance that is not necessarily of the same type than the current class.
I'm pretty sure that subclassing in Python a base redefining __new__ that way would not work: the subclass instance object would have the same type as the closest C++ base.. I'll open a specific issue for that.

@adriendelsalle
Copy link
Author

Also note that I currently didn't implemented the support of preallocation for detail::initimpl::factory and detail::initimpl::pickle_factory. I need to take a look how to do this with minimum changes if necessary.

@adriendelsalle
Copy link
Author

Gentle ping @Skylion007 @rwgk
What do you recommend to make it possible to merge? So that way I can improve this PR in the direction you would like. Thanks

@rwgk
Copy link
Collaborator

rwgk commented Sep 3, 2022

Coming back to this:

Another thought that crossed my mind in the meantime: do we actually need py:: preallocate, vs changing the behavior to preallocate universally? Does something break if we do that?

yes it would break classes with custom operators, see #948. specific tests have been written for that. I guess I also have to add a big warning about this non working combo.

Is that something we could handle with SFINAE (enable_if)?
Enabling the feature you're after, without introducing another thing users have to learn about (py::preallocate()), would be much better. You could cut the new documentation almost in half, simply stating that it also works for constructors if the wrapped C++ type does not have a custom operator new.

Other thoughts:

  • We'll need approvals from two maintainers.
  • I can run Google-global testing only for changes on top of the smart_holder branch.

Not being sure if there is at least one other maintainer to support this PR, I'm reluctant investing the time applying this pretty tricky PR to the smart_holder branch. Is that something you could try? I'm not sure if it will just work, or get us deep into the weeds. Note that the smart_holder tests are run twice: 1. with unique_ptr as the default holder (like master), 2. with smart_holder as the default holder. I think 1. will probably just work, but I'm not sure about 2.

Currently the smart_holder branch is only a couple minor commits behind master. I'll update again sometime soon, but I don't think it'll matter for applying this PR to the branch.

If you get this working without the py::preallocate() type and it passes global testing, I think this PR will be a nice to have.

@Skylion007
Copy link
Collaborator

I was thinking about this some more and really the main reason you want to prealloc is add Py::Objects to a dynamic dictionary after construction. So wouldn't a better solution would be to have an entry point for the python construction like we do the C++ construction. Or in other words, have some function that is evaluated lazily after the C++ object is constructed/initialized? This is already possible with factory methods. Really it sounds like the py::init() is doing the C++ initialization, which it maybe shouldn't be. __new__ should be, but we abuse it for other things. If we could define a post__init similar to how attrs does, that would be ideal: https://www.attrs.org/en/stable/init.html

@adriendelsalle
Copy link
Author

Thanks both of you for the help/feedbacks!
100% agree with your comment about __new__ and __init__, and I imagined few solutions around that but didn't spent enough time to make it fully working.

The post constructor hook would probably make it, I'll investigate!
I'll probably take a look in ~few weeks at that, especially if some tricky rebasing is necessary.

@rwgk
Copy link
Collaborator

rwgk commented Sep 9, 2022

I'll probably take a look in ~few weeks at that, especially if some tricky rebasing is necessary.

If you had rebase on smart_holder in mind: it's not a requirement, but it would help me a lot running our (Google's) global testing for this PR sooner rather than later.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants