-
Notifications
You must be signed in to change notification settings - Fork 46
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Using C extention base class for every message #193
base: rolling
Are you sure you want to change the base?
Conversation
6375caf
to
093fb33
Compare
As far as I can understand errors in CI, my code depends on commit from |
Yeah, that's correct. We need to do some releases, hopefully later this week. We have other CI that we can run (by hand), so after this goes through review we can rely on that. |
e8b82d4
to
b305037
Compare
Hi @almaslov, I was trying to test out your solution. However, I encountered an error during compilation once I switched to your branch. The error was: make[2]: *** No rule to make target 'rosidl_generator_py/rosidl_generator_py/_rosidl_generator_py_bases.c', needed by 'CMakeFiles/rosidl_generator_py_custom__bases.dir/rosidl_generator_py/rosidl_generator_py/_rosidl_generator_py_bases.c.o'. Stop. I believe this line is not generating the appropriate files or is not generating them on the proper path. Could you take a look at it? |
Hello! Thank you for your time!
|
Hi! After cleaning the environment it compiled without issue, I did notice that the problem happens when you try to compile on top of a previous build from other branches. The test I was trying to do was to publish a message (in particular a Image from the sensor_msgs) and measure the time it took from publishing the message until the subscriber received it. I believe that the message should pass through the serialization and deserealization of the message. I tried out that test with the rolling branch of this repository and the branch from your PR as well and both yielded basically the same amount of time. I thought I was going to be able to notice a difference in the latency of the messages when switching the branches. Could you perhaps point me in the right direction to test out your branch improvement? I'm not sure if perhaps the message was too small (The image was something around 200Kb) as to notice a difference in performance or if I should go with a custom message with several nested data structures. |
As I can't see from same experiment on my local machine, deserialization takes about 150 microseconds and message transferring - 1150 microseconds. Deserialization on my feature branch takes about 130 microseconds, so optimization gain is less than 2%. I think the reason is that most of time is spent in memcpy'ing 200kb from C array to numpy array. I believe here is an area for optimization too, but it is out of current changes scope.
I think this case will be more demonstrative. For example array of |
I like this idea, I'll go ahead and test it out with those messages and post the results here. |
This pull request has been mentioned on ROS Discourse. There might be relevant details there: https://discourse.ros.org/t/ros-2-tsc-meeting-minutes-2023-02-16/29927/1 |
Hi @almaslov, sorry for the late response. I tested out the PR taking into account the message types and ubuntu distros as we talked before and I did notice an improvement as the data structures were more complex and had nested structures, as we expected it to happen. As for the building, everything went fine without any issues during compilation and testing. Friendly ping to @clalancette , is the next rolling release out? I believe we can trigger a new CI in order to confirm everything is fine with merging this with upstream. |
Yes, it is out. I'll run the Rpr job again, but note that we'll also want to review this code and run full CI on it. |
@ros-pull-request-builder retest this please |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@ | ||
static PyObject * @(func_name)() | ||
{ | ||
// TODO(almaslov) proper deinitialization |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we get this TODO done?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry for late response.
I’ve researched a little about best practice for this case, and in short, I think it will reqire changing public API.
First let me explain what is this comment about. Deserialization function should import a message class to be able to construct its instance. In current upstream solution it is imported on every call. So if large message contains 1000 submessages, than import machinery is called 1000 times too. And thought repeated calls of module import is relatively cheap, in total 1000 imports are quite expensive. _import_type.c.em
is a naive implementation of cache which stores cached value in static C variable. First problem here is that imported type objects are not decref’ed if support module unimported and will be garbage collected only on interpreter exit. The second problem is potential issues with python subinterpreters (some info here).
I think these imported type objects should be stored in some state-object to be finalized later. It could be implemented in some ways
- Python3 has recommende way to store global state - PyModule_GetState, which is described here: https://peps.python.org/pep-3121/.
- rclpy bindings could pass some special state-object and manage its lifetime to implement more complicated logic.
As I can see, additional parameter with state-object required by convert_to_py
and convert_from_py
C methods in any way.
What do you think about it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry for the late response as well, I now understand a little better the use of _import_type.c.em
and the issue at hand. The general idea of the module seems fine to me. In regards to the issue at hand, after reading a bit about the solution you propose I like the idea in order to avoid memory leaks. The only thing that worries me is the modification of the public API, Do you have a general idea of how much changes this would imply?
Another possible solution could be to manually handle the decref of the imported type objects by using Py_DECREF()
at the moment of unloading the module. Do you think it could work? Let me know what you think.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I’ve dived even deeper to python documentation, sources and mail threads to make modules importing and deinitialization the best I can. So here are some points:
- Python extention modules can be initialized in a single-phase and multi-phase way.
- The way typesupport modules initialized right now is single-phase. Also these modules have
m_size
field set to -1 which means: «Setting m_size to -1 means that the module does not support subinterpreters, because it has global state». In fact it does not have any global state, by my changes introduce it. - Acording to this PEP (note it is python3.12): https://peps.python.org/pep-0554/ , importing such module in subinterpreter will cause
ImportError
in future. - To support subinterpretes, python module should implement multi-phase initialization and manage its state independent from other modules.
- I couldn’t find any code in python3.10 that checks compatibility of module with subinterpreter, so i’m not sure if current implementation(main branch) of typesupport in particular, and whole ros in general work with sub interpreters.
The only thing that worries me is the modification of the public API, Do you have a general idea of how much changes this would imply?
It depends. If subinterpreters support is not required, then I think current API is OK.
Otherwise, i belive a few changes will be enough:
- Support multi-phase initialization
- THIS IS API EXTENSION. Add new capsule to
_msg_pkg_typesupport_entry_point.c.em
. It should contain pointer to state object owned bypymodule
- THIS IS API EXTENSION. Save this capsule to
cls
object in__import_type_support__
near the_CONVERT_FROM_PY
and_CONVERT_TO_PY
fields - THIS IS API CHANGE. Add new parameter to
@(type_name)__convert_to_py
and@(type_name)__convert_from_py
functions in_msg_support.c.em
- Pass saved state as a parameter in
convert_from_py
andconvert_to_py
methods inrclpy/utils.cpp
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@almaslov and I discussed this a bit offline. I can't think of any instance where we are using sub-interpreters, so this should likely work. I think that we should add a comment in the code to indicate that this is unsupported in the case of sub-interpreters.
I think that the likely scenario is that you would end up with a double-free in that case (though that is a guess), and we should save someone some time in the future if they run into this unsupported use case.
] | ||
# This field is modified after class creation. | ||
# See the comment to Metaclass_@(message.structure.namespaced_type.name).__new__ | ||
__slots__ = [] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just to check if I understand correctly. You are creating __slots__
as an empty list in order to prevent adding __dict__
to class instances and you add the values to __slots__
only when constructing the message, i.e. when the message object is going to be used?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Description from sloretz
is very accurate.
@[ if repeated_header_file]@ | ||
// already included above | ||
// @ | ||
@[ else]@ | ||
@{include_directives.add(header_file)}@ | ||
@[ end if]@ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@[ if repeated_header_file]@ | |
// already included above | |
// @ | |
@[ else]@ | |
@{include_directives.add(header_file)}@ | |
@[ end if]@ | |
@[ if not repeated_header_file]@ | |
@{include_directives.add(header_file)}@ | |
@[ end if]@ |
NIT: Can we modify this to be more explicit on when the if is used?
@@ -156,8 +156,49 @@ PyObject * @('__'.join(type_.namespaces + [convert_camel_case_to_lower_case_unde | |||
@[ end if]@ | |||
@[end for]@ | |||
|
|||
@# TODO(aamaslov) duplicated struct definition here |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we remove the duplicate struct definition?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Struct is moved to @(package_name)_decl.h
header file.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the general idea is a good one. It will take me a bit to give it a thorough review though.
# overriden and no longer can be used. On the other side, any __slots__ value | ||
# should be defined to guaratee #2. | ||
# So we define empty list in message class body and modify it after type object | ||
# constructed. This is neither prohibited, nor encouraged by CPython documentation. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If I understand correctly, normally __slots__
requires subclasses to only define new slots:
This defines an empty __slots__
attribute on the subclass prior to calling type.__new__
, which is where cPython makes the decision to add __dict__
or not, and where the quoted note seems to matter. It then adds to __slots__
afterwards with the assumption that cPython won't use it for any other purpose, but it allows users to get a list of the members of a message.
I'm hesitant because it seems not very well defined when cPython will stop using __slots__
.
An alternative option would be to make a different API for retrieving the members of a message. That would require downstream users to change their code in Rolling, but would give us freedom to put __slots__
in base classes.
Another alternative would be to implement the entire class in C so that there's only one class.
Maybe yet another option would be to do monkey patching instead of a metaclass. Say there was a class Example:
that had an Example.__new__
method that returned an instance of ExampleBase
. The first time it ran it would monkey patch ExampleBase
with the methods from Example
.
Out of all the options, my favorite is making a new API for retrieving members of the message. @clalancette What do you think? Pinging you because it would change a (maybe unintentionally) public API without a good way to tick/tock it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, I have to agree with @sloretz ; while this might currently work for __slots__
, it seems like something that could break in future versions of Python. I don't think we should rely on that behavior here.
Out of all the options, my favorite is making a new API for retrieving members of the message. @clalancette What do you think? Pinging you because it would change a (maybe unintentionally) public API without a good way to tick/tock it.
I'm curious what you have in mind here? We already have property
/setter
methods for the real class that the user interacts with. Are you suggesting adding a new API to the Metaclass? And what would that be?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm curious what you have in mind here? We already have property/setter methods for the real class that the user interacts with. Are you suggesting adding a new API to the Metaclass? And what would that be?
I was suggesting a new API to get the message fields, followed by replacing any downstream use of __slots__
with that API. Then we no longer need to modify __slots__
here as there's already a supported way to get message fields.
Turns out that API already exists 🎉 , so I think we can get rid of modifying __slots__
altogether.
>>> from sensor_msgs.msg import JointState
>>> JointState.get_fields_and_field_types()
{'header': 'std_msgs/Header', 'name': 'sequence<string>', 'position': 'sequence<double>', 'velocity': 'sequence<double>', 'effort': 'sequence<double>'}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Turns out that API already exists tada , so I think we can get rid of modifying slots altogether.
I may be misunderstanding you, but there are other benefits to slots. Mostly that we don't create the __dict__
and __weakref__
methods, which in my experience has substantial memory savings. So I think we want to preserve using __slots__
if we can.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not saying we get rid of __slots__
. I'm saying we don't need to make sure the subclass contains all the message fields. They're already in the __slots__
of the base class.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not saying we get rid of slots. I'm saying we don't need to make sure the subclass contains all the message fields. They're already in the slots of the base class.
Yes! Thank you for the clear explanation in person, that helped. Now I understand and I agree.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great. Than I'll take a look at all the packages dependent on message __slots__
value and come back with more merge requests or suggestions.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for pushing this @almaslov! Just as a comment, on #194 we encountered the issue of other packages using __slots__
instead of the method get_fields_and_field_types()
. I've been working on some PRs on those packages to solve that issue. All the information regarding which packages have been solved or not is on the conversation of #194.
This pull request has been mentioned on ROS Discourse. There might be relevant details there: https://discourse.ros.org/t/ros-2-tsc-meeting-minutes-2023-03-16/30432/1 |
I’ve made some changes to make deinitialization for imported modules in assumption, that subinterpreters support is not expected yet. There is now a single function |
Signed-off-by: Маслов Александр Александрович <aamaslov@sberautotech.ru>
Signed-off-by: Маслов Александр Александрович <aamaslov@sberautotech.ru>
Signed-off-by: Маслов Александр Александрович <aamaslov@sberautotech.ru>
Signed-off-by: Маслов Александр Александрович <aamaslov@sberautotech.ru>
Signed-off-by: Маслов Александр Александрович <aamaslov@sberautotech.ru>
Add '_check_fields' as a default value in a __slots__ list for generated messages. Signed-off-by: EsipovPA <esipov.p@mail.ru>
5ff6598
to
a12e6c4
Compare
Signed-off-by: EsipovPA <esipov.p@mail.ru>
Added C extension base class for every message. These base classes define native fields and corresponding slots(aka
struct PyMemberDef
). During deserialization they allow to call__new__
(which is C function) instead of expensive__init__
(which is pure Python) and initialize base class C fields instead of Python class members.Details are in #192