Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

4.2 fails with "Unexpected FDs during handshake" on 32 bit linux #813

Closed
sophie-h opened this issue May 25, 2024 · 7 comments · Fixed by #815
Closed

4.2 fails with "Unexpected FDs during handshake" on 32 bit linux #813

sophie-h opened this issue May 25, 2024 · 7 comments · Fixed by #815

Comments

@sophie-h
Copy link
Contributor

Upgrade from 4.0.1 to 4.2.2 seems to make zbus::ConnectionBuilder::unix_stream(unix_stream).p2p() fail with Handshake("Unexpected FDs during handshake") in my 32 bit CI.

I can add more details and tests if the issue is not obvious.

@zeenix
Copy link
Contributor

zeenix commented May 26, 2024

Interesting. Since you provided a recent good tag, I can bisect to find out what broke this but unfortunately I don't have a 32-bit machine handy so it might not be as easy. I guess I'll manually go through the commits to try to find the culprit.

I'll also see if we can easily add a 32 bit CI too.

@sophie-h
Copy link
Contributor Author

Turns out that it was probably by accident that the 32-bit CI failed. I got the same error with amd64 and aarch64 as well now. It looks like it's not deterministic. So far no error on zbus 4.1.2 appeared. I will try to reproduce this locally.

@zeenix
Copy link
Contributor

zeenix commented May 26, 2024

Thanks for the info. Just looking into the code right now.

zbus::ConnectionBuilder::unix_stream(unix_stream).p2p()

I thought it's our own p2p test that's failing but reading this again, it's against your code. Can you please point me to it?

@sophie-h
Copy link
Contributor Author

Spawning server: https://gitlab.gnome.org/sophie-h/glycin/-/blob/main/glycin/src/dbus.rs

Connecting to server: https://gitlab.gnome.org/sophie-h/glycin/-/blob/main/glycin-utils/src/instruction_handler.rs

Running the same tests I wasn't able to reproduce it on my system so far :/

@zeenix
Copy link
Contributor

zeenix commented May 26, 2024

Connecting to server: https://gitlab.gnome.org/sophie-h/glycin/-/blob/main/glycin-utils/src/instruction_handler.rs

Thanks. Now I know that at least there is one users of Anonymous auth as well. :) This is useful for #726. Although I'm sure switching to External auth will be easy and non-problematic. Also, you might be able to make use of our new support for pre-authenticated socket here (and I'm you sure can't reproduce this issue with that so it could be an easy workaround as a positive side-effect).

Anyway getting back to the topic.. I think I know the source of the issue. I know you won't like to hear this but it's the damn pipelining again. 🙈 I didn't think p2p case through when I made the server side be pipelining-friendly (note that I had to do this anyways for busctl or any systemd dbus client, regardless of whether or not zbus does it).

I'm looking at the code to see what's the best way to solve it but I'm confident it shouldn't be a biggie to resolve.

@zeenix
Copy link
Contributor

zeenix commented May 26, 2024

BTW, do we know which side gets the error? If not, can you please provide the logs with RUST_LOG=zbus=trace env set?

@zeenix
Copy link
Contributor

zeenix commented May 26, 2024

Anyway getting back to the topic.. I think I know the source of the issue. I know you won't like to hear this but it's the damn pipelining again. 🙈 I didn't think p2p case through when I made the server side be pipelining-friendly (note that I had to do this anyways for busctl or any systemd dbus client, regardless of whether or not zbus does it).

Actually after reading the code more, I realized it's not directly related to pipelinning itself but rather the size of the read buffer, which we changed from 64 to 1024. Moreover, while making the buffer larger does indeed make is a lot more likely for this issue to happen, there is no guaranteed that it won't happen if we were to revert to 64 bytes. I need to think of a good reliable solution here..

zeenix added a commit to zeenix/zbus that referenced this issue May 28, 2024
If the client pipelines the handshake or is super fast to send the first
message after the handshake, the server may receive FDs while still
reading the handshake commands from the client. This is not possible to
happen if the server is a bus impl but can happen in case of a pure p2p
server.

We fix this by not erroring out on receiving FDs from the socket during
the handshake but instead storing them in a list and processing them
later when we receive/process the associated message.

Unfortunately, this requires an API break but given that there is no way
to fix this without an API break and it's extremely unlikely that anyone
is using/implementing these brand new low-level trait methods in question,
directly, I think it's OK.

Fixes dbus2#813.
zeenix added a commit to zeenix/zbus that referenced this issue May 28, 2024
If the client pipelines the handshake or is super fast to send the first
message after the handshake, the server may receive FDs while still
reading the handshake commands from the client. This is not possible to
happen if the server is a bus impl but can happen in case of a pure p2p
server.

We fix this by not erroring out on receiving FDs from the socket during
the handshake but instead storing them in a list and processing them
later when we receive/process the associated message.

Unfortunately, this requires an API break but given that there is no way
to fix this without an API break and it's extremely unlikely that anyone
is using/implementing these brand new low-level trait methods in question,
directly, I think it's OK.

Fixes dbus2#813.
zeenix added a commit to zeenix/zbus that referenced this issue May 28, 2024
If the client pipelines the handshake or is super fast to send the first
message after the handshake, the server may receive FDs while still
reading the handshake commands from the client. This is not possible to
happen if the server is a bus impl but can happen in case of a pure p2p
server.

We fix this by not erroring out on receiving FDs from the socket during
the handshake but instead storing them in a list and processing them
later when we receive/process the associated message.

Unfortunately, this requires an API break but given that there is no way
to fix this without an API break and it's extremely unlikely that anyone
is using/implementing these brand new low-level trait methods in question,
directly, I think it's OK.

Fixes dbus2#813.
zeenix added a commit to zeenix/zbus that referenced this issue May 29, 2024
If the client pipelines the handshake or is super fast to send the first
message after the handshake, the server may receive FDs while still
reading the handshake commands from the client. This is not possible to
happen if the server is a bus impl but can happen in case of a pure p2p
server.

We fix this by not erroring out on receiving FDs from the socket during
the handshake but instead storing them in a list and processing them
later when we receive/process the associated message.

Unfortunately, this requires an API break but given that there is no way
to fix this without an API break and it's extremely unlikely that anyone
is using/implementing these brand new low-level trait methods in question,
directly, I think it's OK.

Fixes dbus2#813.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants