Skip to content
This repository has been archived by the owner on Aug 23, 2020. It is now read-only.

Change: Rewrites networking code #1393

Merged
merged 102 commits into from
Jul 5, 2019

Conversation

luca-moser
Copy link
Member

@luca-moser luca-moser commented Mar 29, 2019

Description

Rewrites the networking code to use a single-threaded NIO selector loop managing read/writes from/to neighbors.

This PR introduces following fundamental changes:

  • Support for UDP connections has been removed
  • Config parameters: UDP_RECEIVER_PORT, TCP_RECEIVER_PORT,MAX_PEERS, DNS_REFRESHER_ENABLED, DNS_RESOLUTION_ENABLED have been removed
  • New config parameters:
    • NEIGHBORING_SOCKET_ADDRESS: defining the socket address to bind the TCP socket to
    • NEIGHBORING_SOCKET_PORT: defining the port of the TCP socket to use
    • RECONNECT_ATTEMPT_INTERVAL_SECONDS: defining the interval at which to try to reconnect
      to disconnected/non-added wanted neighbors.
    • AUTO_TETHERING_ENABLED: whether auto-tethering is enabled (this was previously controlled via TESTNET true), default is false (also in testnet mode).
    • MAX_NEIGHBORS: defining the max numbers of connected neighbors (basically renamed MAX_PEERS to keep the naming the same)
  • Newly received transactions are no longer sent back to the neighbor who sent it in the first place.
  • Transactions which were submitted by a client using broadcastTransactions are now part of the transaction processing pipeline and therefore stored on the node directly before being gossiped to other nodes. Prior those submitted transactions were echoed back by the neighbors and only then were they actually persisted on the node which originally broadcasted them.
  • An actual protocol is now implemented which uses type-length-value denoted messages.
  • It is now possible to connect to multiple neighbors originating from the same IP address, as the identity of a node is its IP address+server socket port
  • Neighbors are now only added after their domain name could be resolved. Reconnects now mean trying to resolve the domain name and also connecting the socket. Reconnects are done in the RECONNECT_ATTEMPT_INTERVAL_SECONDS defined interval. Neighbors to which the connection was closed, are put into "reconnect pool" until they are explicitly removed by removeNeighbors.
  • Gossiping transaction messages are now dynamic and take 341-1650 bytes of data.
  • the getNeighbors API call now also returns the domain with which the neighbor was added and whether the neighbor is actually connected.
  • The requested transaction hash of a transaction gossip packet is now fixed to 49 bytes, no matter in what mode the node runs in.
  • A neighbor communicates his used coordinator address in the initial handshake packet, which if it doesn't match the node's own used coordinator address, will drop the connection. This prevents cross-pollination of nodes running in different networks. The Nodes will also communicate their used minimum weight magnitude, which in case they don't match, will also drop the connection.
  • BCTCurl is used within the transaction processing pipeline which batches up to 64 transactions to hash them at the same time.

Fixes # (issue)
#1376
#1377
#1380
#1381
#1383
#1379
#1388

Type of change

  • Breaking change (fix or feature that would cause existing functionality to not work as expected)

How Has This Been Tested?

Currently through a private test net which uses compass. Transactions gossiping seems to work as visualizations of the Tangle look the same on different nodes and the transaction count is the same.

Furthermore unit tests have been written which test the functionality of each newly added component.

Further testing must be undertaken before this PR is merged/used on mainnet.

Checklist:

Please delete items that are not relevant.

  • My code follows the style guidelines for this project
  • I have performed a self-review of my own code
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • I have added tests that prove my fix is effective or that my feature works
  • New and existing unit tests pass locally with my changes

@luca-moser luca-moser added C-Network E-Performance Epic - Improving throughput of IRI labels Mar 29, 2019
@luca-moser luca-moser changed the title Rewrites networking code Change: Rewrites networking code Mar 29, 2019
@luca-moser luca-moser force-pushed the network-rewrite branch 3 times, most recently from 107f77a to aa15794 Compare March 29, 2019 20:26
import java.util.Iterator;
import java.util.LinkedHashMap;

public class FIFOCache<K, V> {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

would be great to move away from this non-generic impl. and replace it with a known Impl.
(it originally was an LRU cache with special requirements, but it's not the case anymore.)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Opened #1456

I was going to write no need to do it now, but I see you partially did it :-P

@iotaledger iotaledger deleted a comment Mar 31, 2019
@iotaledger iotaledger deleted a comment Mar 31, 2019
@iotaledger iotaledger deleted a comment Mar 31, 2019
@iotaledger iotaledger deleted a comment Mar 31, 2019
@iotaledger iotaledger deleted a comment Mar 31, 2019
@iotaledger iotaledger deleted a comment Mar 31, 2019
@iotaledger iotaledger deleted a comment Mar 31, 2019
@iotaledger iotaledger deleted a comment Mar 31, 2019
@iotaledger iotaledger deleted a comment Mar 31, 2019
@iotaledger iotaledger deleted a comment Mar 31, 2019
@iotaledger iotaledger deleted a comment Mar 31, 2019
@iotaledger iotaledger deleted a comment Mar 31, 2019
@iotaledger iotaledger deleted a comment Mar 31, 2019
@iotaledger iotaledger deleted a comment Mar 31, 2019
@iotaledger iotaledger deleted a comment Mar 31, 2019
@iotaledger iotaledger deleted a comment Mar 31, 2019
@iotaledger iotaledger deleted a comment Mar 31, 2019
@iotaledger iotaledger deleted a comment Mar 31, 2019
@iotaledger iotaledger deleted a comment Mar 31, 2019
@iotaledger iotaledger deleted a comment Mar 31, 2019
@iotaledger iotaledger deleted a comment Mar 31, 2019
@iotaledger iotaledger deleted a comment Mar 31, 2019
@iotaledger iotaledger deleted a comment Jun 17, 2019
@iotaledger iotaledger deleted a comment Jun 17, 2019
Copy link
Contributor

@GalRogozinski GalRogozinski left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Review of batched BCT hashing
few changes please

public void run() {
running.set(true);
List<HashRequest> reqs = new ArrayList<>();
exit:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

delete this please

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed

running.set(true);
List<HashRequest> reqs = new ArrayList<>();
exit:
while (running.get()) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Usually we like to do it is by writing while (!Thread.currentThread().isInterrupted())
This way when we stop the Executor Service running this, it works well :-)

Please change

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed

try {
firstReq = reqQueue.take();
} catch (InterruptedException e) {
break;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So this is what stops it instead
I will be nice to have a log message of shutdown in case we don't have it now elsewhere

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

By the way please do:
Thread.currentThread.interrupt();

For an explanation see:
https://daniel.mitterdorfer.name/articles/2015/handling-interruptedexception/

To make it short, the interrupt flag was cleared and this is not good, because we want to stop the thread

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed

try {
newReq = reqQueue.poll(batchTimeoutMilliSec, TimeUnit.MILLISECONDS);
} catch (InterruptedException e) {
break exit;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So you want to have an interrupt that doesn't make the thread exit
why is that?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not needed anymore. Changed.

Copy link
Contributor

@GalRogozinski GalRogozinski left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After reviewing BCT Curl I realized i overlooked a few things

*/
private void addStage(String name, BlockingQueue<ProcessingContext> queue,
com.iota.iri.network.pipeline.Stage stage) {
stagesThreadPool.submit(new Thread(() -> {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can do () -> { instead of writing new Thread

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is made on purpose in order to pass a name of the thread. I can't do that with a lambda.

com.iota.iri.network.pipeline.Stage stage) {
stagesThreadPool.submit(new Thread(() -> {
try {
while (!shutdown.get()) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Change to Thread.currentThread.isInterrupted() please

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed

}
}
} catch (InterruptedException e) {

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
Thread.currentThread().interrupt();

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed

long reconnectAttemptTimeout = TimeUnit.SECONDS
.toMillis(networkConfig.getReconnectAttemptIntervalSeconds());

while (!shutdown.get()) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can also do Thread.currentThread.isInterrupted() here and save a variable

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed

* <li>[00000111] denotes that this node supports protocol versions 1, 2 and 3.</li>
* <li>[01101110] denotes that this node supports protocol versions 2, 3, 4, 6 and 7.</li>
* <li>[01101110, 01010001] denotes that this node supports protocol versions 2, 3, 4, 6, 7, 9, 13 and 15.</li>
* <li>[01101110, 01010001, 00010001] denotes that this node supports protocol versions 2, 3, 4, 6, 7, 9, 13, 15,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

May I ask what prompted to change from Big Endian to Little Endian?
It wasn't enough optimizing it for space, but you also want to optimize the handshake for CPUs?

I just find it even less readable and more confusing...

Copy link
Member Author

@luca-moser luca-moser Jun 30, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

May I ask what prompted to change from Big Endian to Little Endian?

The endianess is still the same, I just swapped the bits (version) from starting from left to start from the right. The byte order is still the same as before.

I've changed it in reverse order because usually one shifts bits to the right and it allows us to check whether a position's bit was set by comparing it with 1 instead of 0b10000000 (which makes more sense imho)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, it confused me because I started reading right to left, then jumping right again with my eyes (reminded me of Little Endian, but yeah I shouldn't have used that term)

I guess as long as the documentation is fine I don't care so much wither way, either though I personally rather reading left to right the entire thing at the cost of seeing 0b10000000 in the code.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
C-Network E-Performance Epic - Improving throughput of IRI
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants