Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix utp outgoing FIN #7232

Merged
merged 9 commits into from
Jan 18, 2023
Merged

Fix utp outgoing FIN #7232

merged 9 commits into from
Jan 18, 2023

Conversation

arvidn
Copy link
Owner

@arvidn arvidn commented Jan 2, 2023

This is a series of patches to uTP cleaning up the handling of closing the streams. It addresses this comment: #7162 (comment)

The correct behavior when the other end sends a FIN packet is to record that the stream has been closed and at which sequence number it was closed. Then deliver any undelivered payload to the upper layer and finally deliver the eof error. The uTP layer should not respond with a FIN to close down the outgoing direction until the upper layer says to do so.

In the case of bittorrent, this will cause the upper layer to close its end of the connection as well.

These patches are separated to be somewhat independent. The last commit is the main one that makes the uTP layer not respond with a FIN, but instead let the upper layer close the socket before doing so.

With these patches, the socket is kept alive until un-acked data has been acked, and is kept open for one time-out period in order to ACK the other end's FIN packet as well, even after the socket has been closed by the upper layer.

The rules that have been made stricter are:

  • we now correctly check to ensure the upper layer doesn't send more data after closing it
  • the check for incoming payload at sequence numbers past the EOF seq nr was incorrect, and has been corrected
  • FIN packets are not supposed to contain payload (this can be deduced from libutp, but it's not well documented). With this patch, FIN packets with payload is supported, but logs a warning.
  • When closing a socket that has a pending nagle packet, that packet will be force-sent before the FIN packet. This ensures sequence numbers remain correct and that the FIN packet does not carry payload.

@arvidn arvidn changed the title Fix utp outgoing fin Fix utp outgoing FIN Jan 2, 2023
@arvidn
Copy link
Owner Author

arvidn commented Jan 2, 2023

@thrnz ping if you're interested in reviewing

@glassez
Copy link
Contributor

glassez commented Jan 2, 2023

@arvidn
Could it be related to fixing crash reported in qbittorrent/qBittorrent#18157 (comment)?

@arvidn
Copy link
Owner Author

arvidn commented Jan 2, 2023

While fixing this, I don't think I saw any misbehavior that could cause a crash or use-after-free. Just issues at the uTP protocol level.

@thrnz
Copy link
Contributor

thrnz commented Jan 2, 2023

I'm not too familiar with how the upper layer interacts with the uTP layer - the recent uTP performance improvements were mostly at the uTP level itself - but the reasoning behind these changes seems to make sense.

I can't spot anything obviously wrong. I might try throwing some live torrents at this later just to make sure there's no weirdness going on.

@Seeker2
Copy link

Seeker2 commented Jan 3, 2023

I feel compelled to mention a scenario I've been painfully experiencing for a long time that may tie into this:
#3542 (comment)

At the least, it should be one of the tests done against this.

@markmdscott
Copy link

I feel compelled to mention a scenario I've been painfully experiencing for a long time that may tie into this: #3542 (comment)

At the least, it should be one of the tests done against this.

#3542 was reportedly fixed by @thrnz 's recent set of uTP patches a little while ago.

@thrnz
Copy link
Contributor

thrnz commented Jan 4, 2023

I feel compelled to mention a scenario I've been painfully experiencing for a long time that may tie into this: #3542 (comment)

At the least, it should be one of the tests done against this.

Prior to 2.0.8, uTP streams weren't closing cleanly and were instead timing out. That might help explain the issue of lingering libtorrent based uTP peers, and should now hopefully be fixed, though doesn't explain those other issues.

I know of one more issue related to #3542 that probably still needs addressing - it helps explain incoming packet loss on Windows 7 in particular - though would be surprised if it resolved everything.

@Seeker2
Copy link

Seeker2 commented Jan 4, 2023

Prior to 2.0.8, uTP streams weren't closing cleanly and were instead timing out. That might help explain the issue of lingering libtorrent based uTP peers, and should now hopefully be fixed, though doesn't explain those other issues.

The long-lived uTP seed-to-seed connections has another issue with the unknown (to me!) traffic about 5 seconds after the connection is made. TCP seed-to-seed connections typically disconnect in under a second. Cleaning up the closing may help prevent a 30+ second delay but not a 4+ second one.

I know of one more issue related to #3542 that probably still needs addressing - it helps explain incoming packet loss on Windows 7 in particular - though would be surprised if it resolved everything.

I saw a test of uTP traffic both to and from Windows and Linux...and the results suggest only the source needed to be a Windows PC to have the extremely high packet loss result and low artificial speed limit on a per-peer basis. Even Windows 10 (or 11?) probably exhibits the same symptoms, but I haven't bothered to check it. The family PC that has Windows 10 on it...other family members may not appreciate me doing that level of testing on it!

Oh, has the 2.0.8 changes been back-ported to libtorrent v1.x as well?

@arvidn arvidn force-pushed the fix-utp-outgoing-fin branch from 1ba55da to 1cc49dc Compare January 6, 2023 01:55
@arvidn arvidn marked this pull request as ready for review January 6, 2023 01:55
@arvidn arvidn force-pushed the fix-utp-outgoing-fin branch 3 times, most recently from 69c51c1 to b69a736 Compare January 14, 2023 12:41
@arvidn
Copy link
Owner Author

arvidn commented Jan 15, 2023

I believe this patch is pretty solid now. Has anyone else tested it?

@thrnz
Copy link
Contributor

thrnz commented Jan 15, 2023

I've tried it out with live torrents as of the most recent push without noticing any obvious problems, though I guess any FIN issues might not be that noticeable to begin with.

Is the utp_fin_resends setting still used after these changes? It looks like we're now always giving up after the first timeout after sending the FIN, so the utp_fin_resends check might be redundant.

@rafi-d
Copy link

rafi-d commented Jan 16, 2023

without noticing any obvious problems

How do you see/check for how many re-transmits are done? Maybe adding adebug/logged counter of those can be of help?

@arvidn
Copy link
Owner Author

arvidn commented Jan 16, 2023

Is the utp_fin_resends setting still used after these changes? It looks like we're now always giving up after the first timeout after sending the FIN, so the utp_fin_resends check might be redundant.

good point. I'll take a look at this. I should probably restore that behavior.

@arvidn arvidn force-pushed the fix-utp-outgoing-fin branch from b69a736 to a2100e4 Compare January 16, 2023 15:45
if (m_out_eof
&& !m_nagle_packet
&& m_write_buffer_size == 0
&& state() == state_t::connected
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When send_fin() fails due to stalling, it still progresses to state_t::fin_sent. Should this check be state() == state_t::fin_sent (as it was prior to this patch) in order to pick it up?

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good point. Maybe the state shouldn't even be transitioned in case the socket is stalled, failing to send the FIN.

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it's not obvious to me that checking the socket state here is important either.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This bit might actually be able to be dropped entirely. If sending the FIN packet stalls, it gets marked for resend and ends up in m_outbuf, and I think should get resent in one of the previous send_pkt() calls above.

It looks like the second commit in #7153 might have been a bit wonky and tried to address something that was already solved by it's first commit, so the logic here no longer seems to makes sense.

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah. I hesitate dropping this until there are proper unit tests for these cases.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I'm understanding it correctly, the old conditions would never be met, so functionally nothing was happening here anyway. When state() == state_t::fin_sent, m_outbuf.at(m_seq_nr) would be the FIN packet. Once it's ACK'd and removed from m_outbuf, we'll no longer be in the fin_sent state. The assumption was that a stalled FIN packet wouldn't have made its way into m_outbuf (which now happens) and could be detected here and re-sent - it instead gets resent in the send_pkt() calls above along with any other packets needing to be resent.

I'm not sure what effect the new conditions here using the connected state instead of the fin_sent state might have.

Reproducing stalled FIN packets in a test would be ideal.

@arvidn arvidn merged commit 32e3698 into RC_2_0 Jan 18, 2023
@arvidn arvidn deleted the fix-utp-outgoing-fin branch January 18, 2023 01:18
@arvidn
Copy link
Owner Author

arvidn commented Jan 19, 2023

How do you see/check for how many re-transmits are done? Maybe adding adebug/logged counter of those can be of help?

I think the next step in actually making this uTP implementation robust would be to add unit tests (as opposed to the existing end-to-end simulations). Where exact packets and their timings are crafted and injected by the test itself. The incoming packets callback and the send-packet callback could already be hooked into by a test, the main missing interface is the clock. The utp code currently calls into the chrono clock directly right now, to ask what time it is. If the time could be overridden, I think that would enable proper unit tests.

Perhaps the timestamp could be passed in to the incoming packet callback.

@thrnz
Copy link
Contributor

thrnz commented Jan 19, 2023

How do you see/check for how many re-transmits are done? Maybe adding adebug/logged counter of those can be of help?

There's a utp_packet_resend performance counter that increments when a packet is resent.

I wonder if adding a 'utp stall' counter might be worthwhile. It could be handy for picking up too shallow buffers.

@rafi-d
Copy link

rafi-d commented Jan 19, 2023

Sounds like a good idea. Is this resend counter visible in the app?or only with the debugger?

@thrnz
Copy link
Contributor

thrnz commented Jan 20, 2023

Sounds like a good idea. Is this resend counter visible in the app?or only with the debugger?

There's some more info here about the available counters and gauges and how to access them.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants