Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Using C extention base class for every message #193

Open
wants to merge 7 commits into
base: rolling
Choose a base branch
from

Conversation

almaslov
Copy link

@almaslov almaslov commented Feb 7, 2023

Added C extension base class for every message. These base classes define native fields and corresponding slots(aka struct PyMemberDef). During deserialization they allow to call __new__(which is C function) instead of expensive __init__(which is pure Python) and initialize base class C fields instead of Python class members.

Details are in #192

@almaslov
Copy link
Author

almaslov commented Feb 8, 2023

As far as I can understand errors in CI, my code depends on commit from rosidl_parser that is not tagged yet, and so not available in latest package from http://repo.ros2.org/ubuntu/testing. So CI will fill for all branches forked from current rolling.

@clalancette
Copy link
Contributor

So CI will fill for all branches forked from current rolling.

Yeah, that's correct. We need to do some releases, hopefully later this week. We have other CI that we can run (by hand), so after this goes through review we can rely on that.

@almaslov almaslov force-pushed the c-extention-base-class branch from e8b82d4 to b305037 Compare February 14, 2023 08:57
@Voldivh
Copy link
Contributor

Voldivh commented Feb 15, 2023

Hi @almaslov, I was trying to test out your solution. However, I encountered an error during compilation once I switched to your branch. The error was:

make[2]: *** No rule to make target 'rosidl_generator_py/rosidl_generator_py/_rosidl_generator_py_bases.c', needed by 'CMakeFiles/rosidl_generator_py_custom__bases.dir/rosidl_generator_py/rosidl_generator_py/_rosidl_generator_py_bases.c.o'.  Stop.

I believe this line is not generating the appropriate files or is not generating them on the proper path. Could you take a look at it?

@almaslov
Copy link
Author

Hello! Thank you for your time!
I've just tried to build rosidl_generator_py from scratch in fresh ubuntu:jammy docker container and had no problems. I've walked through this guide to setup environment for my build: https://docs.ros.org/en/rolling/Installation/Alternatives/Ubuntu-Development-Setup.html .

_bases.c file is generated in build/rosidl_generator_py/rosidl_generator_py/rosidl_generator_py/_rosidl_generator_py_bases.c as expected.
I believe I had this issue while development and cleaning package build directory fixed it. If it doesn't help in your case, could you please describe your environment.

@Voldivh
Copy link
Contributor

Voldivh commented Feb 17, 2023

Hi! After cleaning the environment it compiled without issue, I did notice that the problem happens when you try to compile on top of a previous build from other branches.

The test I was trying to do was to publish a message (in particular a Image from the sensor_msgs) and measure the time it took from publishing the message until the subscriber received it. I believe that the message should pass through the serialization and deserealization of the message. I tried out that test with the rolling branch of this repository and the branch from your PR as well and both yielded basically the same amount of time. I thought I was going to be able to notice a difference in the latency of the messages when switching the branches. Could you perhaps point me in the right direction to test out your branch improvement? I'm not sure if perhaps the message was too small (The image was something around 200Kb) as to notice a difference in performance or if I should go with a custom message with several nested data structures.
The way I was measuring the time was using the clock from the node, store that value in the header timestamp and checking the time again on the subscriber's callback.

@almaslov
Copy link
Author

As I can't see from same experiment on my local machine, deserialization takes about 150 microseconds and message transferring - 1150 microseconds. Deserialization on my feature branch takes about 130 microseconds, so optimization gain is less than 2%. I think the reason is that most of time is spent in memcpy'ing 200kb from C array to numpy array. I believe here is an area for optimization too, but it is out of current changes scope.

or if I should go with a custom message with several nested data structures.

I think this case will be more demonstrative. For example array of geographic_msgs/GeoPointStamped(as in my project) or TestArrayComplex from this issue

@Voldivh
Copy link
Contributor

Voldivh commented Feb 17, 2023

I think this case will be more demonstrative. For example array of geographic_msgs/GeoPointStamped(as in my project) or TestArrayComplex from this issue

I like this idea, I'll go ahead and test it out with those messages and post the results here.
As for the PR, it seems to be compiling without any issues and is passing all the test on on my docker container with ubuntu::focal, I'll try it out on ubuntu::jammy as well but I don't think I would get any difference. We should wait for the next release and start another CI after that.

@ros-discourse
Copy link

This pull request has been mentioned on ROS Discourse. There might be relevant details there:

https://discourse.ros.org/t/ros-2-tsc-meeting-minutes-2023-02-16/29927/1

@Voldivh
Copy link
Contributor

Voldivh commented Feb 24, 2023

Hi @almaslov, sorry for the late response. I tested out the PR taking into account the message types and ubuntu distros as we talked before and I did notice an improvement as the data structures were more complex and had nested structures, as we expected it to happen. As for the building, everything went fine without any issues during compilation and testing.

Friendly ping to @clalancette , is the next rolling release out? I believe we can trigger a new CI in order to confirm everything is fine with merging this with upstream.

@clalancette
Copy link
Contributor

Friendly ping to @clalancette , is the next rolling release out? I believe we can trigger a new CI in order to confirm everything is fine with merging this with upstream.

Yes, it is out. I'll run the Rpr job again, but note that we'll also want to review this code and run full CI on it.

@clalancette
Copy link
Contributor

@ros-pull-request-builder retest this please

Copy link
Contributor

@Voldivh Voldivh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Left a some minor comments after the first pass of the changes. In overall I believe the PR is fine, however, I would like your opinion @wjwwood @sloretz .

@
static PyObject * @(func_name)()
{
// TODO(almaslov) proper deinitialization
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we get this TODO done?

Copy link
Author

@almaslov almaslov Mar 6, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry for late response.
I’ve researched a little about best practice for this case, and in short, I think it will reqire changing public API.
First let me explain what is this comment about. Deserialization function should import a message class to be able to construct its instance. In current upstream solution it is imported on every call. So if large message contains 1000 submessages, than import machinery is called 1000 times too. And thought repeated calls of module import is relatively cheap, in total 1000 imports are quite expensive. _import_type.c.em is a naive implementation of cache which stores cached value in static C variable. First problem here is that imported type objects are not decref’ed if support module unimported and will be garbage collected only on interpreter exit. The second problem is potential issues with python subinterpreters (some info here).

I think these imported type objects should be stored in some state-object to be finalized later. It could be implemented in some ways

  • Python3 has recommende way to store global state - PyModule_GetState, which is described here: https://peps.python.org/pep-3121/.
  • rclpy bindings could pass some special state-object and manage its lifetime to implement more complicated logic.

As I can see, additional parameter with state-object required by convert_to_py and convert_from_py C methods in any way.
What do you think about it?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry for the late response as well, I now understand a little better the use of _import_type.c.em and the issue at hand. The general idea of the module seems fine to me. In regards to the issue at hand, after reading a bit about the solution you propose I like the idea in order to avoid memory leaks. The only thing that worries me is the modification of the public API, Do you have a general idea of how much changes this would imply?

Another possible solution could be to manually handle the decref of the imported type objects by using Py_DECREF() at the moment of unloading the module. Do you think it could work? Let me know what you think.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I’ve dived even deeper to python documentation, sources and mail threads to make modules importing and deinitialization the best I can. So here are some points:

  • Python extention modules can be initialized in a single-phase and multi-phase way.
  • The way typesupport modules initialized right now is single-phase. Also these modules have m_size field set to -1 which means: «Setting m_size to -1 means that the module does not support subinterpreters, because it has global state». In fact it does not have any global state, by my changes introduce it.
  • Acording to this PEP (note it is python3.12): https://peps.python.org/pep-0554/ , importing such module in subinterpreter will cause ImportError in future.
  • To support subinterpretes, python module should implement multi-phase initialization and manage its state independent from other modules.
  • I couldn’t find any code in python3.10 that checks compatibility of module with subinterpreter, so i’m not sure if current implementation(main branch) of typesupport in particular, and whole ros in general work with sub interpreters.

The only thing that worries me is the modification of the public API, Do you have a general idea of how much changes this would imply?

It depends. If subinterpreters support is not required, then I think current API is OK.
Otherwise, i belive a few changes will be enough:

  • Support multi-phase initialization
  • THIS IS API EXTENSION. Add new capsule to _msg_pkg_typesupport_entry_point.c.em. It should contain pointer to state object owned by pymodule
  • THIS IS API EXTENSION. Save this capsule to cls object in __import_type_support__ near the _CONVERT_FROM_PY and _CONVERT_TO_PY fields
  • THIS IS API CHANGE. Add new parameter to @(type_name)__convert_to_py and @(type_name)__convert_from_py functions in _msg_support.c.em
  • Pass saved state as a parameter in convert_from_py and convert_to_py methods in rclpy/utils.cpp

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@almaslov and I discussed this a bit offline. I can't think of any instance where we are using sub-interpreters, so this should likely work. I think that we should add a comment in the code to indicate that this is unsupported in the case of sub-interpreters.

I think that the likely scenario is that you would end up with a double-free in that case (though that is a guess), and we should save someone some time in the future if they run into this unsupported use case.

]
# This field is modified after class creation.
# See the comment to Metaclass_@(message.structure.namespaced_type.name).__new__
__slots__ = []
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just to check if I understand correctly. You are creating __slots__ as an empty list in order to prevent adding __dict__ to class instances and you add the values to __slots__ only when constructing the message, i.e. when the message object is going to be used?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Description from sloretz is very accurate.

Comment on lines +59 to +65
@[ if repeated_header_file]@
// already included above
// @
@[ else]@
@{include_directives.add(header_file)}@
@[ end if]@
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
@[ if repeated_header_file]@
// already included above
// @
@[ else]@
@{include_directives.add(header_file)}@
@[ end if]@
@[ if not repeated_header_file]@
@{include_directives.add(header_file)}@
@[ end if]@

NIT: Can we modify this to be more explicit on when the if is used?

@@ -156,8 +156,49 @@ PyObject * @('__'.join(type_.namespaces + [convert_camel_case_to_lower_case_unde
@[ end if]@
@[end for]@

@# TODO(aamaslov) duplicated struct definition here
Copy link
Contributor

@Voldivh Voldivh Feb 28, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we remove the duplicate struct definition?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Struct is moved to @(package_name)_decl.h header file.

Copy link
Contributor

@sloretz sloretz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the general idea is a good one. It will take me a bit to give it a thorough review though.

# overriden and no longer can be used. On the other side, any __slots__ value
# should be defined to guaratee #2.
# So we define empty list in message class body and modify it after type object
# constructed. This is neither prohibited, nor encouraged by CPython documentation.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I understand correctly, normally __slots__ requires subclasses to only define new slots:

"The action of a slots declaration is not limited to the class where it is defined. slots declared in parents are available in child classes. However, child subclasses will get a and weakref unless they also define slots (which should only contain names of any additional slots)."

This defines an empty __slots__ attribute on the subclass prior to calling type.__new__, which is where cPython makes the decision to add __dict__ or not, and where the quoted note seems to matter. It then adds to __slots__ afterwards with the assumption that cPython won't use it for any other purpose, but it allows users to get a list of the members of a message.

I'm hesitant because it seems not very well defined when cPython will stop using __slots__.

An alternative option would be to make a different API for retrieving the members of a message. That would require downstream users to change their code in Rolling, but would give us freedom to put __slots__ in base classes.

Another alternative would be to implement the entire class in C so that there's only one class.

Maybe yet another option would be to do monkey patching instead of a metaclass. Say there was a class Example: that had an Example.__new__ method that returned an instance of ExampleBase. The first time it ran it would monkey patch ExampleBase with the methods from Example.

Out of all the options, my favorite is making a new API for retrieving members of the message. @clalancette What do you think? Pinging you because it would change a (maybe unintentionally) public API without a good way to tick/tock it.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I have to agree with @sloretz ; while this might currently work for __slots__, it seems like something that could break in future versions of Python. I don't think we should rely on that behavior here.

Out of all the options, my favorite is making a new API for retrieving members of the message. @clalancette What do you think? Pinging you because it would change a (maybe unintentionally) public API without a good way to tick/tock it.

I'm curious what you have in mind here? We already have property/setter methods for the real class that the user interacts with. Are you suggesting adding a new API to the Metaclass? And what would that be?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm curious what you have in mind here? We already have property/setter methods for the real class that the user interacts with. Are you suggesting adding a new API to the Metaclass? And what would that be?

I was suggesting a new API to get the message fields, followed by replacing any downstream use of __slots__ with that API. Then we no longer need to modify __slots__ here as there's already a supported way to get message fields.

Turns out that API already exists 🎉 , so I think we can get rid of modifying __slots__ altogether.

>>> from sensor_msgs.msg import JointState
>>> JointState.get_fields_and_field_types()
{'header': 'std_msgs/Header', 'name': 'sequence<string>', 'position': 'sequence<double>', 'velocity': 'sequence<double>', 'effort': 'sequence<double>'}

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Turns out that API already exists tada , so I think we can get rid of modifying slots altogether.

I may be misunderstanding you, but there are other benefits to slots. Mostly that we don't create the __dict__ and __weakref__ methods, which in my experience has substantial memory savings. So I think we want to preserve using __slots__ if we can.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not saying we get rid of __slots__. I'm saying we don't need to make sure the subclass contains all the message fields. They're already in the __slots__ of the base class.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not saying we get rid of slots. I'm saying we don't need to make sure the subclass contains all the message fields. They're already in the slots of the base class.

Yes! Thank you for the clear explanation in person, that helped. Now I understand and I agree.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great. Than I'll take a look at all the packages dependent on message __slots__ value and come back with more merge requests or suggestions.

Copy link
Contributor

@Voldivh Voldivh Apr 4, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for pushing this @almaslov! Just as a comment, on #194 we encountered the issue of other packages using __slots__ instead of the method get_fields_and_field_types(). I've been working on some PRs on those packages to solve that issue. All the information regarding which packages have been solved or not is on the conversation of #194.

@ros-discourse
Copy link

This pull request has been mentioned on ROS Discourse. There might be relevant details there:

https://discourse.ros.org/t/ros-2-tsc-meeting-minutes-2023-03-16/30432/1

@almaslov
Copy link
Author

I’ve made some changes to make deinitialization for imported modules in assumption, that subinterpreters support is not expected yet. There is now a single function @(package_name)__lazy_import which accepts integer parameter - index of module to be imported. Implementation is in @(package_name)_import.c souce file.
Also I’ve added examples of generated files to example directory for review purpose. I will delete it before merging.

Маслов Александр Александрович and others added 6 commits July 1, 2024 11:27
Signed-off-by: Маслов Александр Александрович <aamaslov@sberautotech.ru>
Signed-off-by: Маслов Александр Александрович <aamaslov@sberautotech.ru>
Signed-off-by: Маслов Александр Александрович <aamaslov@sberautotech.ru>
Signed-off-by: Маслов Александр Александрович <aamaslov@sberautotech.ru>
Signed-off-by: Маслов Александр Александрович <aamaslov@sberautotech.ru>
Add '_check_fields' as a default value in a __slots__ list for generated messages.

Signed-off-by: EsipovPA <esipov.p@mail.ru>
@EsipovPA EsipovPA force-pushed the c-extention-base-class branch from 5ff6598 to a12e6c4 Compare July 7, 2024 00:03
@EsipovPA
Copy link

EsipovPA commented Jul 7, 2024

Hello everyone!

I've took the liberty of updating this PR.
I've notices that there were some conflicts with the rolling branch.
Looks like these are done with.

Is it OK to request a new review?
@almaslov @Voldivh

@cottsay cottsay assigned cottsay and sloretz and unassigned cottsay Jul 18, 2024
Signed-off-by: EsipovPA <esipov.p@mail.ru>
@EsipovPA
Copy link

EsipovPA commented Aug 12, 2024

@cottsay @sloretz
Hello guys!

it seems like tests have finished successfully.
May I ask you to take a look at this PR, when you have some time?

Just a gentle reminder 🙂

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants