IPC Contribution Request Initial Version #85
Documentation artifact: https://github.com/eclipse-score/score/actions/runs/12316734790/artifacts/2317420650
Documentation artifact: https://github.com/eclipse-score/score/actions/runs/12316876942/artifacts/2317465283
a1afbb8 to 533ae71
Documentation artifact: https://github.com/eclipse-score/score/actions/runs/12316930452/artifacts/2317483118
533ae71 to 8db7e9f
Documentation artifact: https://github.com/eclipse-score/score/actions/runs/12316990309/artifacts/2317503218
8db7e9f to f088043
Documentation artifact: https://github.com/eclipse-score/score/actions/runs/12317010841/artifacts/2317509754
Issue-ref: see #69
f088043 to 7c80a1f
Documentation artifact: https://github.com/eclipse-score/score/actions/runs/12317098130/artifacts/2317535823
Issue-ref: see #69
Documentation artifact: https://github.com/eclipse-score/score/actions/runs/12319864212/artifacts/2318446003
Documentation artifact: https://github.com/eclipse-score/score/actions/runs/12337163662/artifacts/2322404081
Documentation artifact: https://github.com/eclipse-score/score/actions/runs/12689474723/artifacts/2406814745
Issue-ref: see #69
8a955cb to 8b1e99b
Documentation artifact: https://github.com/eclipse-score/score/actions/runs/12689600993/artifacts/2406852971
Signed-off-by: Nico Hartmann <14351007+HartmannNico@users.noreply.github.com>
CFT to comment on the topics in the initial list Signed-off-by: Nico Hartmann <14351007+HartmannNico@users.noreply.github.com>
* the same or different compute devices, built into
* the same or different devices.
What is the distinction between a "compute device" and a "device" in this context?
Compute device = CPU, GPU, etc.
Device = SoC
Exactly. Better terms welcome. The compute device executes instructions -> CPU, MCU, GPU, NPU; "Chip" would be too broad.
The "device" is referring to the ECU or board the compute device is bonded to.

In current communication designs (SomeIP, Protobuf, Zenoh), communication on networks is at the center of considerations. We believe this to be the wrong approach for software-defined machines. Network-based communication revolves around protocols built on a wire-based communication paradigm, requiring serialization and segmentation of data.

Instead, we promote a memory-centered core paradigm and put Inter-Process Communication (IPC) into our conceptual focus. This allows for true zero-copy design, does not segment or serialize data and fosters significantly higher performance in algorithms. As a bonus, it provides easy gateway mechanisms into serial communication like Ethernet and CAN.
It seems IPC (general concept) is being conflated with Zero-Copy Communication (specific mechanism). Could you adjust the phrasing to make this distinction clearer? For example, you might specify "Zero-Copy Inter-Process Communication (IPC)" instead.
IPC is the communication between processes (or better: communication endpoints) with data transport by means of memory. Zero-Copy is the desired mode of implementation to execute IPC. IPC is a prerequisite to enable Zero-Copy.
"In computer science, interprocess communication (IPC) is the sharing of data between running processes in a computer system." (see https://en.wikipedia.org/wiki/Inter-process_communication)
There are different approaches (https://en.wikipedia.org/wiki/Inter-process_communication#Approaches) of implementing IPC, where Shared Memory is one of them, in order to achieve a better performance.
So I know that you mean "IPC via Shared-Memory" when you use the term "IPC", but for readers who see it for the first time, it is confusing.
Yep. Then we should be clear here. We agreed in the architecture workshop we do not talk about pipes, message queues and the like from OS perspective.
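To make the distinction in this thread concrete, here is a minimal sketch of "IPC via shared memory" using only the Python standard library. It is an illustration, not part of the proposal: the `SAMPLE` layout and the `demo` function are hypothetical, and both handles live in one process for brevity, whereas the consumer would normally attach from a different process.

```python
import struct
from multiprocessing import shared_memory

# Hypothetical fixed layout for one value: little-endian u32 counter + f64 reading.
SAMPLE = struct.Struct("<Id")


def demo() -> tuple:
    # "Provider" side: create a named shared memory segment.
    provider = shared_memory.SharedMemory(create=True, size=SAMPLE.size)
    try:
        # "Consumer" side: attach to the same segment by name.
        consumer = shared_memory.SharedMemory(name=provider.name)
        try:
            # The provider writes the value in place; no message is sent
            # and nothing is serialized onto a wire.
            SAMPLE.pack_into(provider.buf, 0, 7, 1.5)
            # The consumer reads the very same bytes through its own mapping.
            return SAMPLE.unpack_from(consumer.buf, 0)
        finally:
            consumer.close()
    finally:
        provider.close()
        provider.unlink()
```

Python still converts the fields into objects when reading; a native implementation would access them in place, which is the zero-copy mode the proposal aims for. Pipes and message queues, by contrast, copy the payload through the kernel.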
* **Remote Procedures**: A remote procedure is the information carrier for **execution** progress. An Id handle identifies the remote procedure together with an ordered set of named *parameters*. Each parameter is defined by a data type. A caller of a remote procedure can cause its activation by invoking the remote procedure with passed *arguments*. An argument is a single value instance for a parameter. See `Remote Procedures`_, `Names`_, `Data Types`_.
* **Events**: An event is the information carrier for runtime **synchronization**. A unique Id identifies the event. It signals the change of state. There is no data conveyed with the event. See `Events`_, `Names`_.

While the Id uniquely identifies an information element within the communication framework, it can also have a *name* as alias to conveniently identify the element. While the Id may not be publicly known, the *name* allows for public lookup.
If the name also needs to be unique, it seems like the Id might be redundant. It would be helpful to clarify the distinct roles of the Id and the name in order to highlight the different purposes of both.
The name is unique within the namespace in which it is defined.
The two items Id and Name were conceived so that the Id is something more compute-like (a number) while a name should foster human interaction. The Id could be a hash of the name, or any other unique identifier for an element.
Furthermore, many elements do not need to have a name; the idea is to allow anonymous elements with no name. Those would have an Id, but no mapping to a name.
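The split between Id and name described in this reply could look roughly like the sketch below. All names here (`Registry`, `id_from_name`, the 64-bit truncated SHA-256) are assumptions for illustration; the proposal only says that the Id *could* be a hash of the name.

```python
import hashlib
from typing import Dict, Optional


def id_from_name(name: str) -> int:
    """Derive a 64-bit Id from a name (one option mentioned in the thread)."""
    digest = hashlib.sha256(name.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "little")


class Registry:
    """Hypothetical registry: every element has an Id, only some have a name."""

    def __init__(self) -> None:
        self._elements: Dict[int, object] = {}
        self._names: Dict[str, int] = {}
        self._next_anonymous = 1

    def register(self, element: object, name: Optional[str] = None) -> int:
        if name is None:
            # Anonymous element: it gets an Id but no name mapping.
            while self._next_anonymous in self._elements:
                self._next_anonymous += 1
            element_id = self._next_anonymous
        else:
            if name in self._names:
                raise ValueError(f"name already in use: {name}")
            element_id = id_from_name(name)
            if element_id in self._elements:
                raise ValueError(f"Id collision for name: {name}")
            self._names[name] = element_id
        self._elements[element_id] = element
        return element_id

    def lookup(self, name: str) -> Optional[int]:
        """Public lookup: resolve a human-readable name to the Id."""
        return self._names.get(name)

    def get(self, element_id: int) -> object:
        return self._elements[element_id]
```

In this sketch the name is only an alias used for lookup, while all internal bookkeeping runs on the numeric Id, so anonymous elements need no special handling beyond skipping the name table.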
We define three fundamental building elements:

* **Endpoints**: Endpoints are both the source and the target of every information exchange in the communication framework. An endpoint providing information is consequently called a *provider*. With the same logic a *consumer* is an endpoint consuming information. Endpoints have an *Id* that uniquely identifies the endpoint within a node.
* **Nodes**: A node is an entity in the communication system that hosts several endpoints. It is the central element of the communication fabric by connecting endpoints and routing data. Nodes have an *Id* that uniquely identifies the node within a fabric. A node itself is also an endpoint.
The last sentence "A node itself is also an endpoint." implies according to the definition of endpoints that a node can contain other nodes ("Endpoints have an Id that uniquely identifies the endpoint within a node."). Is that the case?
In communication there will be messages that address a node. Given the definition that only endpoints initiate and consume the information exchanged (aka messages), it follows that a node must be an endpoint.
I do not see the logic that a node can therefore have sub-nodes: a node is an endpoint, but an endpoint is not a node.
A node in my opinion should not have "sub-nodes".
* **Links**: A link is the fundamental abstraction of a connection between any two nodes. A link conveys information between nodes.

We also refer to the combination of NodeId and EndpointId as the *address*. As nodes are also endpoints, they implicitly have an address.

Nodes and endpoints may also be identified by a *name* that resolves into references to these elements. See `Names`_, `References`_.

Connecting nodes through links creates a mesh of nodes that can mutually exchange information utilizing the above concepts. The boundary of the mesh is at the sole discretion of the deployment and may span from a single application into a connected cloud environment.
Can you explain the use case of "only" connecting nodes via links? I understand the use case behind connecting endpoints of different nodes, but not the use case of "only" connecting nodes. Is it if a node, e.g. subscribes to a topic and the "link" then is the subscription?
Practical consideration when designing a network. Locally, endpoints attach to their node. Nodes interconnect through links with each other. So an endpoint initiates a message Send to the attached node with a target address (Node+Endpoint). Node resolves the address, if it is local: dispatch to local endpoint. Otherwise: Find a proper Route to the target node, utilize the Link referenced in this Route and dispatch through the link. Semantically, an endpoint communicates to another endpoint, the nodes and links in between are invisible.
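The dispatch rule described in this reply can be sketched as follows. This is a simplified in-process model under stated assumptions: `Node`, `Link`, `attach` and `send` are hypothetical names, a link is just an object reference to the peer node, and routes are a static table from target NodeId to the link to use.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Tuple

Address = Tuple[int, int]  # (NodeId, EndpointId), the "address" from the text


@dataclass
class Link:
    """Connection to one peer node; here simply a direct object reference."""
    peer: "Node"

    def dispatch(self, target: Address, message: bytes) -> None:
        self.peer.send(target, message)


@dataclass
class Node:
    node_id: int
    # Locally attached endpoints: EndpointId -> message handler.
    endpoints: Dict[int, Callable[[bytes], None]] = field(default_factory=dict)
    # Static routing table: target NodeId -> Link to use for the next hop.
    routes: Dict[int, Link] = field(default_factory=dict)

    def attach(self, endpoint_id: int, handler: Callable[[bytes], None]) -> None:
        self.endpoints[endpoint_id] = handler

    def send(self, target: Address, message: bytes) -> None:
        target_node, target_endpoint = target
        if target_node == self.node_id:
            # Local address: dispatch directly to the attached endpoint.
            self.endpoints[target_endpoint](message)
        else:
            # Remote address: pick the route and dispatch through its link.
            self.routes[target_node].dispatch(target, message)
```

A sending endpoint only ever supplies the target address; whether the message stays inside the node or crosses one or more links is decided by the nodes, which matches the statement that nodes and links are invisible to the endpoints.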
----------

Data types describe the inner structure of data entities known as values.
A specific data type will always have the identical memory layout, independent from compiler, operating system and controller architecture.
How should that be possible given the different alignment, padding, endianness, etc. of the different platforms, OSs, and so on? Or is "only" the serialized format meant?
It is possible. And your points are valid.
That is why we cannot use unspecified memory layouts as they are used by C++ or Rust. Instead, we need formalized layouts like C.
This again constrains the choices of data structures we can exchange. Not too much though.
Pointers and references are forbidden. Serialization is slow, often requires code generation and hence is bad.
Instead, data has to be coherent and relocatable for zero-copy. For collections this requires individual implementations, for primitives, tuple, array and struct we are good with C layout.
Mixed endianness is a show-stopper for zero-copy as it requires data conversion one way or the other. All mixed-endian designs with memory data exchange are messy and should be avoided at all cost. Meaning: Hardware requirement. We are, after all, software-defined here.
The consistent way to cope with mixed endianness is serialization. Expensive and not Zero-Copy. Unless we entertain the view that serializing into and from a buffer is zero-copy.
Examples to control the interpretation/layout of a respective memory location and therefore enable such a type system are e.g. Cap'n Proto or rkyv.
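The C-layout constraint discussed in this thread can be illustrated with Python's `ctypes`, used here only as a stand-in for a formalized layout. The `Sample` type and its fields are hypothetical. The sketch assumes explicit padding and a declared byte order, so that size and field offsets do not depend on implicit compiler decisions, and it contains no pointers, so the bytes remain valid wherever the buffer is mapped.

```python
import ctypes


class Sample(ctypes.LittleEndianStructure):
    """Hypothetical value type with C layout and explicit little-endian fields."""
    _fields_ = [
        ("sequence", ctypes.c_uint32),      # offset 0
        ("flags", ctypes.c_uint8),          # offset 4
        ("reserved", ctypes.c_uint8 * 3),   # offset 5, explicit padding
        ("timestamp_ns", ctypes.c_uint64),  # offset 8
    ]


def view_in_place(buffer: bytearray, offset: int = 0) -> Sample:
    """Interpret existing bytes as a Sample without copying them."""
    # from_buffer shares memory with `buffer`; writes through the returned
    # object are visible in the buffer and vice versa.
    return Sample.from_buffer(buffer, offset)
```

Reading such a value on the consumer side is then a matter of interpreting the bytes at a known offset, with no deserialization step. Collections would need dedicated relocatable implementations, as noted in the reply above.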
Lifetime
```````` | ||

Once created, the topic belongs to the communication framework which determines it's lifetime.
Typo "it's" should be "its"
Once created, the topic belongs to the communication framework which determines it's lifetime.
Once created, the topic belongs to the communication framework which determines its lifetime.
@olivembo for typos, could you please include the suggestion as shown above? This makes including the fix straightforward for the PR owner.

.. note::

   Instead of passing back the result from the procedure the caller may pass a result-return reference that is a remote procedure itself. This way the framework may have a straight-forward way of implementing a Future mechanism that completes upon reception of the response call.
Comma after "back the result from the procedure"
Instead of passing back the result from the procedure the caller may pass a result-return reference that is a remote procedure itself. This way the framework may have a straight-forward way of implementing a Future mechanism that completes upon reception of the response call.
Instead of passing back the result from the procedure, the caller may pass a result-return reference that is a remote procedure itself. This way the framework may have a straight-forward way of implementing a Future mechanism that completes upon reception of the response call.
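One possible reading of the note under review, sketched in-process: the caller registers a one-shot return procedure, passes its Id as the last argument, and receives a Future that completes when the callee invokes that Id. `Framework`, `register`, `invoke` and `call` are hypothetical names, and passing the return reference as the final argument is an assumption of this sketch.

```python
from concurrent.futures import Future
from typing import Callable, Dict


class Framework:
    """Hypothetical in-process stand-in for the communication framework."""

    def __init__(self) -> None:
        self._procedures: Dict[int, Callable[..., None]] = {}
        self._next_id = 1

    def register(self, procedure: Callable[..., None]) -> int:
        proc_id = self._next_id
        self._next_id += 1
        self._procedures[proc_id] = procedure
        return proc_id

    def invoke(self, proc_id: int, *args) -> None:
        # Fire-and-forget activation: remote procedures return nothing.
        self._procedures[proc_id](*args)

    def call(self, proc_id: int, *args) -> Future:
        future: Future = Future()

        def complete(result) -> None:
            # One-shot: the return procedure is withdrawn once it has fired.
            del self._procedures[return_ref]
            future.set_result(result)

        # The result-return reference is itself a remote procedure.
        return_ref = self.register(complete)
        self.invoke(proc_id, *args, return_ref)
        return future
```

A callee written against this sketch takes the return reference as its last parameter and answers with `framework.invoke(return_ref, result)`; the caller simply waits on the returned Future.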
`````````````````````` | ||

Attaching a name to a remote procedure means to publish the remote procedure.
The communication framework owns both the name and the namespace.
General question: If the communication framework owns everything (name, namespace of the procedure, topics, etc.), what happens if there is a failure in the communication framework? Isn't it then a single point of failure, so that FFI according to ISO 26262-1:2018 cannot be fulfilled?
--> Discussion point.

* **Topics**: A topic is the information carrier for **data**. A unique Id identifies a topic while a *data type* defines its memory layout. The topic carries zero or multiple *values*. A value represents a single instance of the data type. See `Topics`_, `Names`_, `Data Types`_.
* **Remote Procedures**: A remote procedure is the information carrier for **execution** progress. An Id handle identifies the remote procedure together with an ordered set of named *parameters*. Each parameter is defined by a data type. A caller of a remote procedure can cause its activation by invoking the remote procedure with passed *arguments*. An argument is a single value instance for a parameter. See `Remote Procedures`_, `Names`_, `Data Types`_.
* **Events**: An event is the information carrier for runtime **synchronization**. A unique Id identifies the event. It signals the change of state. There is no data conveyed with the event. See `Events`_, `Names`_.
What state changes when a corresponding event "occurs"?
The state is an abstraction here. The point to make is that an event shall occur. The meaning of this occurrence is the change of any related state. The state is only semantically connected to the event. Example: an interrupt is abstracted in an event that occurs/fires/triggers because some data was received which caused a state change in the receiving chip.
@vevmaki could you please resolve the conversation if your question is answered? Otherwise we can also add it to the agenda on the IPC CFT meeting.

From a perspective of safety, a node also encapsulates a single safety domain. Links provide the means for separating safety domains and thus allow for mixed criticality applications.

^^^^ End of Big Picture ^^^^
I am fine with the approach of endpoints and nodes as long as we are talking about network infrastructure. However, if we want to apply this to shared memory concepts for IPC I wonder whether a shared memory element is a node?
A node would be the communication dispatcher within a process. Here it does not matter if that is IPC or Serialized communication.
In iceoryx2 you also open a "node" and on the node "services" are defined.
So, my proposal is to stick to "Node" as a general communication hub thing.
```````````````````````````````` | ||

An event is designed to convey an immediate notification of the associated state change. However, for cases where a subscriber cannot react immediately, an event occurrence may be latched in a queue for deferred processing. This is called an 'Event Queue'. The framework may opt to offer event queues on top of immediate event propagation.
In case of queued events, are there ordering criteria like time...
Is there a chance to associate topics with RPCs or topics?
I assume queued items will have timestamps. My opinion: The timestamp is part of the data and not part of the queue item.
What do you mean by "associate" topics with RPCs?
I am fine when the sorting of elements in a queue is done via a timestamp of the element.
I am wondering if queued events can trigger functions. So an event element would have a timestamp and a reference to a function which will be executed by the queue's server.
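The two delivery modes under discussion, immediate propagation and latching into an event queue, can be sketched as below. The class and method names are hypothetical. The sketch assumes arrival order as the only ordering criterion and a bounded queue that drops the oldest occurrence when full; timestamps and function references, as raised in this thread, are deliberately left out because they are still open questions.

```python
from collections import deque
from typing import Callable, Deque, List


class EventQueue:
    """Bounded queue that latches event occurrences for deferred processing."""

    def __init__(self, capacity: int) -> None:
        # Assumption: when full, the oldest latched occurrence is dropped.
        self._pending: Deque["Event"] = deque(maxlen=capacity)

    def latch(self, event: "Event") -> None:
        self._pending.append(event)

    def drain(self) -> List["Event"]:
        """Return all latched occurrences in arrival order and clear the queue."""
        drained = list(self._pending)
        self._pending.clear()
        return drained


class Event:
    """Data-less notification with immediate and latched delivery."""

    def __init__(self) -> None:
        self._handlers: List[Callable[[], None]] = []
        self._queues: List[EventQueue] = []

    def subscribe(self, handler: Callable[[], None]) -> None:
        self._handlers.append(handler)

    def attach_queue(self, queue: EventQueue) -> None:
        self._queues.append(queue)

    def fire(self) -> None:
        # Immediate propagation first, then latch for deferred subscribers.
        for handler in self._handlers:
            handler()
        for queue in self._queues:
            queue.latch(self)
```

Because the event carries no data, the queue stores only the identity of the event that occurred; a subscriber that drains the queue later learns which events fired and in what order, nothing more.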
…e/score into hartmannnico-cr-ipc
Co-authored-by: Lars Bauhofer <155632781+qor-lb@users.noreply.github.com> Signed-off-by: Nico Hartmann <14351007+HartmannNico@users.noreply.github.com>
Consolidated into #229 by @HartmannNico & @LittleHuba.
After setting up the IPC CR in its initial version, merge the CR into main.