Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

IllegalArgumentException appears in the log #5849

Closed
zeusoo001 opened this issue Jun 6, 2024 · 13 comments · Fixed by #5873
Closed

IllegalArgumentException appears in the log #5849

zeusoo001 opened this issue Jun 6, 2024 · 13 comments · Fixed by #5873
Assignees
Labels

Comments

@zeusoo001
Copy link
Contributor

Software Versions

GreatVoyage-v4.7.5(Cleobulus)
OS : Linux
JVM : Oracle Corporation 1.8.0_161 amd64

Expected behaviour

No exceptions should occur.

Actual behaviour

In issue 5847 , I saw an IllegalArgumentException thrown in the posted log.

00:31:44.126 INFO  [peerClient-9] [DB](Manager.java:1926) HeadNumber: 62292059, syncBeginNumber: 62292040, solidBlockNumber: 62292041.
00:31:44.126 INFO  [peerClient-9] [net](SyncService.java:197) Get block chain summary, low: 62292040, highNoFork: 62292059, high: 62292059, realHigh: 62294059
--
lowestBlockNum: 0

java.lang.IllegalArgumentException: Comparison method violates its general contract!
	at java.util.TimSort.mergeHi(TimSort.java:899)
	at java.util.TimSort.mergeAt(TimSort.java:516)
	at java.util.TimSort.mergeForceCollapse(TimSort.java:457)
	at java.util.TimSort.sort(TimSort.java:254)
	at java.util.Arrays.sort(Arrays.java:1512)
	at java.util.ArrayList.sort(ArrayList.java:1464)
	at java.util.Collections$SynchronizedList.sort(Collections.java:2463)
	at org.tron.core.net.peer.PeerManager.sortPeers(PeerManager.java:97)
	at org.tron.core.net.service.handshake.HandshakeService.processHelloMessage(HandshakeService.java:121)
	at org.tron.core.net.P2pEventHandlerImpl.processMessage(P2pEventHandlerImpl.java:177)
	at org.tron.core.net.P2pEventHandlerImpl.onMessage(P2pEventHandlerImpl.java:140)
	at org.tron.p2p.connection.ChannelManager.handMessage(ChannelManager.java:261)
	at org.tron.p2p.connection.ChannelManager.processMessage(ChannelManager.java:207)
	at org.tron.p2p.connection.socket.MessageHandler.decode(MessageHandler.java:51)
	at io.netty.handler.codec.ByteToMessageDecoder.decodeRemovalReentryProtection(ByteToMessageDecoder.java:510)
	at io.netty.handler.codec.ByteToMessageDecoder.callDecode(ByteToMessageDecoder.java:449)
	at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:279)
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379)
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365)
	at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:357)
	at io.netty.handler.codec.ByteToMessageDecoder.fireChannelRead(ByteToMessageDecoder.java:327)
	at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:299)
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379)
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365)
	at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:357)
	at io.netty.channel.ChannelInboundHandlerAdapter.channelRead(ChannelInboundHandlerAdapter.java:93)
	at org.tron.p2p.stats.TrafficStats$TrafficStatHandler.channelRead(TrafficStats.java:36)
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379)
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365)
	at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:357)
	at io.netty.handler.timeout.IdleStateHandler.channelRead(IdleStateHandler.java:286)
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379)
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365)
	at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:357)
	at io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1410)
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379)
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365)
--
01:13:59.687 INFO  [peerClient-4] [net](ChannelManager.java:148) Add peer /46.4.94.252:18888, total channels: 35
0
@halibobo1205
Copy link
Contributor

@ss334452 Do you know what can cause this issue?

@zeusoo001
Copy link
Contributor Author

@halibobo1205 This problem should be due to multi-threaded concurrent access, which causes the peer sorting value to be changed during sorting.

peer.getChannel().updateAvgLatency(
        System.currentTimeMillis() - peer.getChannel().getStartTime());
PeerManager.sortPeers();


public static synchronized void sortPeers() {
  peers.sort(Comparator.comparingDouble(c -> c.getChannel().getAvgLatency()));
}

For example, a node establishes connections with peer A and peer B at the same time. When processing a hello message, when peer A thread calls the sortPeers method, peer B thread calls the updateAvgLatency method, causing the sorting value to be changed.

@jwrct
Copy link
Contributor

jwrct commented Jun 12, 2024

@zeusoo001 Has the probability of this concurrency issue been evaluated? Are there any good methods to reproduce this issue?

@zeusoo001
Copy link
Contributor Author

zeusoo001 commented Jun 12, 2024

@jwrct The probability of this concurrency issue occurring is extremely low. Currently, it can only be reproduced through scripts.

@xxo1shine
Copy link
Contributor

@zeusoo001 Is this exception impact on the system greatly ?

@zeusoo001
Copy link
Contributor Author

@ss334452 It will only cause the handshake to fail and the connection to be disconnected, without affecting other connections and functions. Since the probability is extremely small, it has almost no impact on the system.

@xxo1shine
Copy link
Contributor

@zeusoo001 Is there any good solution to this concurrency problem?

@zeusoo001
Copy link
Contributor Author

zeusoo001 commented Jun 19, 2024

@ss334452 The two logics can be bundled together to solve the concurrency problem. The modification is as follows:

    peer.setHelloMessageReceive(msg);
    peer.getChannel().updateAvgLatency(
            System.currentTimeMillis() - peer.getChannel().getStartTime());
    PeerManager.sortPeers();
    peer.onConnect();

public static synchronized void sortPeers(PeerConnection peer) {
    peers.sort(Comparator.comparingDouble(c -> c.getChannel().getAvgLatency()));
}

change to

    peer.setHelloMessageReceive(msg);
    PeerManager.sortPeers(peer);
    peer.onConnect();

public static synchronized void sortPeers(PeerConnection peer) {
    try {
      peer.getChannel().updateAvgLatency(
        System.currentTimeMillis() - peer.getChannel().getStartTime());
      peers.sort(Comparator.comparingDouble(c -> c.getChannel().getAvgLatency()));
    }catch (Exception e) {
      logger.warn("Sort peers failed. {}", e.getMessage());
    }
  }

The impact of sorting failure is minimal, so there is no need to handle exceptions specially.

@xxo1shine
Copy link
Contributor

@zeusoo001 Very good, I agree with this solution.

@fyyhtx
Copy link
Contributor

fyyhtx commented Jun 20, 2024

@zeusoo001 Why is there a try-catch exception handling in your modified code? Is it because even after this modification, there will still be concurrency issues?

@zeusoo001
Copy link
Contributor Author

zeusoo001 commented Jun 20, 2024

@fyyhtx Yes, libp2p channel keepalive will also call this method updateAvgLatency.

@fyyhtx
Copy link
Contributor

fyyhtx commented Jun 20, 2024

@zeusoo001 Is the goal of the modification to address concurrency issues or to handle exceptions? If it is for handling exceptions, you can simply try catching the exception thrown by sort, what do you think? Please confirm how much impact a failed sort would have.

@zeusoo001
Copy link
Contributor Author

@fyyhtx I think your solution is OK. The failure of sortPeers will only affect this sorting and will not have any impact on the system. In addition, the probability of concurrent problems is extremely low. I can refer to your modification plan. The code modification is as follows:

  public static synchronized void sortPeers(PeerConnection peer) {
    try {
      peers.sort(Comparator.comparingDouble(c -> c.getChannel().getAvgLatency()));
    }catch (Exception e) {
      logger.warn("Sort peers failed. {}", e.getMessage());
    }
  }

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants