Skip to content
This repository has been archived by the owner on Oct 28, 2021. It is now read-only.

Aleth should wait 2 seconds to close socket after sending disconnect to peer #5650

Closed
halfalicious opened this issue Jul 4, 2019 · 7 comments · Fixed by #5707
Closed

Aleth should wait 2 seconds to close socket after sending disconnect to peer #5650

halfalicious opened this issue Jul 4, 2019 · 7 comments · Fixed by #5707
Assignees

Comments

@halfalicious
Copy link
Contributor

halfalicious commented Jul 4, 2019

According to the RLPX handshake spec, one should give clients 2 seconds to disconnect after sending a disconnect packet:

Inform the peer that a disconnection is imminent; if received, a peer should disconnect immediately. When sending, well-behaved hosts give their peers a fighting chance (read: wait 2 seconds) to disconnect to before disconnecting themselves.

However, Aleth closes the socket immediately (via Session::drop). This also means that the disconnect packet may not be sent - this is because Session::disconnect (and all of the functions it calls) is executed on the network thread, whereas the async_write may not execute immediately in which case I suspect it will execute after Session::disconnect / Session::drop:

aleth/libp2p/Session.cpp

Lines 299 to 310 in c630593

void Session::disconnect(DisconnectReason _reason)
{
cnetdetails << "Disconnecting (our reason: " << reasonOf(_reason) << ") from " << m_logSuffix;
if (m_socket->ref().is_open())
{
RLPStream s;
prep(s, DisconnectPacket, 1) << (int)_reason;
sealAndSend(s);
}
drop(_reason);
}

Here's where the actual socket write takes place (in Session::write):

aleth/libp2p/Session.cpp

Lines 237 to 256 in c630593

auto self(shared_from_this());
ba::async_write(m_socket->ref(), ba::buffer(*out),
[this, self](boost::system::error_code ec, std::size_t /*length*/) {
// must check queue, as write callback can occur following dropped()
if (ec)
{
LOG(m_netLogger) << "Error sending: " << ec.message();
drop(TCPError);
return;
}
DEV_GUARDED(x_framing)
{
m_writeQueue.pop_front();
if (m_writeQueue.empty())
return;
}
write();
});
}

@twinstar26
Copy link
Contributor

Hey @halfalicious I am very much interested in solving this issue!

@halfalicious
Copy link
Contributor Author

@twinstar26 Sure thing, it's yours! 😄

@twinstar26
Copy link
Contributor

Hey @halfalicious so,

aleth/libp2p/Session.cpp

Lines 299 to 310 in c630593

void Session::disconnect(DisconnectReason _reason)
{
cnetdetails << "Disconnecting (our reason: " << reasonOf(_reason) << ") from " << m_logSuffix;
if (m_socket->ref().is_open())
{
RLPStream s;
prep(s, DisconnectPacket, 1) << (int)_reason;
sealAndSend(s);
}
drop(_reason);
}

Session::sealAndSend() is calling Session::send() which is calling Session::write() performing async_write(). And to solve this issue we need to sleep the current thread by 2 secs after async_write() has done sending disconnect packet.

An easy solution WITHOUT MODIFYING Session::write() might be to keep a check on m_writeQueue (if its empty or not. Empty indicates successful disconnect packet sent) inside of Session::disconnect().

By modifying Session::write():-
We can modify recursive callbacks of Session::write() or implement async_write() handler to see when we are done sending disconnect packet.

So which one to go by?? Do you have any other idea??

@halfalicious
Copy link
Contributor Author

halfalicious commented Jul 7, 2019

@twinstar26 :

Session::sealAndSend() is calling Session::send() which is calling Session::write() performing async_write(). And to solve this issue we need to sleep the current thread by 2 secs after async_write() has done sending disconnect packet.

Your call chain is correct (disconnect -> sealAndSend -> send -> write -> async_write). However, we cannot sleep the current thread because Session::write is called on the network thread so sleeping the thread would block all Aleth network I/O (so discovery , connecting to new peers, syncing with existing peers via the Ethereum Wire protocol, etc would all stop working) which you can imagine is pretty undesirable 😃 What we ultimately want to do is determine that 2 seconds have passed since we finished sending the disconnect packet without blocking the thread and the way we can do this is via boost deadline timers, particularly steady_timer. If you're not familiar with boost timers you can look at other parts of Aleth where we use them, for example the Host class uses one for its run loop (see Host::m_runTimer and Host::run). They can be a bit tricky so let me know if you have any questions ??

An easy solution WITHOUT MODIFYING Session::write() might be to keep a check on m_writeQueue (if its empty or not. Empty indicates successful disconnect packet sent) inside of Session::disconnect().

What would this check look like? How would we continuously execute this check until m_writeQueue is drained without polling?

By modifying Session::write():- We can modify recursive callbacks of Session::write() or implement async_write() handler to see when we are done sending disconnect packet.

We already have an async_write handler in Session::write:

aleth/libp2p/Session.cpp

Lines 238 to 256 in 71f9ba6

ba::async_write(m_socket->ref(), ba::buffer(*out),
[this, self](boost::system::error_code ec, std::size_t /*length*/) {
// must check queue, as write callback can occur following dropped()
if (ec)
{
LOG(m_netLogger) << "Error sending: " << ec.message();
drop(TCPError);
return;
}
DEV_GUARDED(x_framing)
{
m_writeQueue.pop_front();
if (m_writeQueue.empty())
return;
}
write();
});
}

I think that we can check the write queue in the handler and when it is empty we know that the disconnect has been sent so we can start the deadline timer.

I think that you also need to add a check in Session::write to detect that we're sending out a disconnect packet and to include this info in the async_write handler (since we only want to start the timer if we sent the disconnect packet). The packet type is the first field in the bytes array and the disconnect packet type is 0x1 so you should be able to do something like the following:

If (m_writeQueue[0][0] == DisconnectPacket)

Be sure to do this before the call to writeSingleFramePacket since that encodes the packet so I don't think the above check will work after that.

Additionally, we're going to need to avoid closing the socket immediately after sending the disconnect packet since we want to keep it open for 2 seconds. So, I propose the following additional changes to do this:

  1. Move the socket closure code out of Session::drop and into the handler for the new disconnect timer that you'll be creating.
  2. Add a check for m_dropped at the beginning of Session::send which will prevent any more packets being sent to the node that we're disconnecting from (this isn't technically required but it's part of being a good citizen and preventing the other client from wasting resources e.g. continuing to sync with a client that's disconnecting from it)
  3. Update Session::isConnected to return m_dropped rather than if the socket is open - this will prevent this session from being considered a valid peer session and being used in things like peer count computations and syncing while the disconnect is occurring. Again, not technically required but prevents us from wasting some resources on a peer that we're disconnecting from.

After you've made and tested these changes it would also be great if you could create a unit test which validates the new behavior, though we can chat about that when you've finished with your changes 😃

@gumb0
Copy link
Member

gumb0 commented Jul 8, 2019

I think that we can check the write queue in the handler and when it is empty we know that the disconnect has been sent so we can start the deadline timer.

I think that you also need to add a check in Session::write to detect that we're sending out a disconnect packet and to include this info in the async_write handler (since we only want to start the timer if we sent the disconnect packet).

Can we, to avoid complications, start the timer from drop() after sealAndSend() call?
This might be not exactly 2 seconds, but I think it'll be close enough. (Important thing is "to give a fighting chance", not to try to reach exactly 2 seconds with infinite precision (not possible in any case))

@halfalicious
Copy link
Contributor Author

I think that we can check the write queue in the handler and when it is empty we know that the disconnect has been sent so we can start the deadline timer.
I think that you also need to add a check in Session::write to detect that we're sending out a disconnect packet and to include this info in the async_write handler (since we only want to start the timer if we sent the disconnect packet).

Can we, to avoid complications, start the timer from drop() after sealAndSend() call?
This might be not exactly 2 seconds, but I think it'll be close enough. (Important thing is "to give a fighting chance", not to try to reach exactly 2 seconds with infinite precision (not possible in any case))

Presumably you mean to start the timer in Session::Disconnect (after sealAndSend), which would simplify things since we wouldn't need to detect the packet type?

cc @twinstar26

@gumb0
Copy link
Member

gumb0 commented Jul 9, 2019

Yes, right, I meant Session::disconnect.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
3 participants