Commands fail to be popped from stack #1564
Comments
Are you able to capture a debug log?
Thanks for the very fast response! Yes - I meant to post the debug log. Here is a snippet from when the error is happening. If you would like more detail / a different spot, just let me know!

```
2020-12-21 16:46:28,642 [XNIO-1 task-1] DEBUG io.lettuce.core.RedisChannelHandler - dispatching command SubscriptionCommand [type=GET, output=ValueOutput [output=null, error='null'], commandType=io.lettuce.core.protocol.Command]
2020-12-21 16:46:28,642 [XNIO-1 task-1] DEBUG i.l.c.m.UpstreamReplicaConnectionProvider - getConnectionAsync(READ)
2020-12-21 16:46:28,642 [XNIO-1 task-2] DEBUG io.lettuce.core.RedisChannelHandler - dispatching command SubscriptionCommand [type=GET, output=ValueOutput [output=null, error='null'], commandType=io.lettuce.core.protocol.Command]
2020-12-21 16:46:28,642 [XNIO-1 task-2] DEBUG i.l.c.m.UpstreamReplicaConnectionProvider - getConnectionAsync(READ)
2020-12-21 16:46:28,643 [XNIO-1 task-1] DEBUG io.lettuce.core.RedisChannelHandler - dispatching command SubscriptionCommand [type=GET, output=ValueOutput [output=null, error='null'], commandType=io.lettuce.core.protocol.Command]
2020-12-21 16:46:28,643 [XNIO-1 task-2] DEBUG io.lettuce.core.RedisChannelHandler - dispatching command SubscriptionCommand [type=GET, output=ValueOutput [output=null, error='null'], commandType=io.lettuce.core.protocol.Command]
2020-12-21 16:46:28,643 [XNIO-1 task-1] DEBUG i.l.core.protocol.DefaultEndpoint - [channel=0xa017a14e, /10.10.10.152:37900 -> host-2.account.0001.use1.cache.amazonaws.com/172.23.24.33:6379, epid=0x7] write() writeAndFlush command SubscriptionCommand [type=GET, output=ValueOutput [output=null, error='null'], commandType=io.lettuce.core.protocol.Command]
2020-12-21 16:46:28,643 [XNIO-1 task-2] DEBUG i.l.core.protocol.DefaultEndpoint - [channel=0xa017a14e, /10.10.10.152:37900 -> host-2.account.0001.use1.cache.amazonaws.com/172.23.24.33:6379, epid=0x7] write() writeAndFlush command SubscriptionCommand [type=GET, output=ValueOutput [output=null, error='null'], commandType=io.lettuce.core.protocol.Command]
2020-12-21 16:46:28,643 [XNIO-1 task-1] DEBUG i.l.core.protocol.DefaultEndpoint - [channel=0xa017a14e, /10.10.10.152:37900 -> host-2.account.0001.use1.cache.amazonaws.com/172.23.24.33:6379, epid=0x7] write() done
2020-12-21 16:46:28,643 [XNIO-1 task-2] DEBUG i.l.core.protocol.DefaultEndpoint - [channel=0xa017a14e, /10.10.10.152:37900 -> host-2.account.0001.use1.cache.amazonaws.com/172.23.24.33:6379, epid=0x7] write() done
2020-12-21 16:46:28,643 [lettuce-epollEventLoop-11-7] DEBUG i.l.core.protocol.CommandHandler - [channel=0xa017a14e, /10.10.10.152:37900 -> host-2.account.0001.use1.cache.amazonaws.com/172.23.24.33:6379, chid=0x7] write(ctx, SubscriptionCommand [type=GET, output=ValueOutput [output=null, error='null'], commandType=io.lettuce.core.protocol.Command], promise)
2020-12-21 16:46:28,643 [lettuce-epollEventLoop-11-7] DEBUG i.l.core.protocol.CommandEncoder - [channel=0xa017a14e, /10.10.10.152:37900 -> host-2.account.0001.use1.cache.amazonaws.com/172.23.24.33:6379] writing command SubscriptionCommand [type=GET, output=ValueOutput [output=null, error='null'], commandType=io.lettuce.core.protocol.Command]
2020-12-21 16:46:28,646 [lettuce-epollEventLoop-11-7] DEBUG i.l.core.protocol.CommandHandler - [channel=0xa017a14e, /10.10.10.152:37900 -> host-2.account.0001.use1.cache.amazonaws.com/172.23.24.33:6379, chid=0x7] write(ctx, SubscriptionCommand [type=GET, output=ValueOutput [output=null, error='null'], commandType=io.lettuce.core.protocol.Command], promise)
2020-12-21 16:46:28,646 [lettuce-epollEventLoop-11-7] DEBUG i.l.core.protocol.CommandEncoder - [channel=0xa017a14e, /10.10.10.152:37900 -> host-2.account.0001.use1.cache.amazonaws.com/172.23.24.33:6379] writing command SubscriptionCommand [type=GET, output=ValueOutput [output=null, error='null'], commandType=io.lettuce.core.protocol.Command]
2020-12-21 16:46:28,741 [lettuce-epollEventLoop-11-7] DEBUG i.l.core.protocol.CommandHandler - [channel=0xa017a14e, /10.10.10.152:37900 -> host-2.account.0001.use1.cache.amazonaws.com/172.23.24.33:6379, chid=0x7] Received: 2057 bytes, 4 commands in the stack
2020-12-21 16:46:28,741 [lettuce-epollEventLoop-11-7] DEBUG i.l.core.protocol.CommandHandler - [channel=0xa017a14e, /10.10.10.152:37900 -> host-2.account.0001.use1.cache.amazonaws.com/172.23.24.33:6379, chid=0x7] Stack contains: 4 commands
2020-12-21 16:46:28,741 [lettuce-epollEventLoop-11-7] DEBUG i.l.core.protocol.RedisStateMachine - Decode done, empty stack: true
2020-12-21 16:46:28,741 [lettuce-epollEventLoop-11-7] DEBUG i.l.core.protocol.CommandHandler - [channel=0xa017a14e, /10.10.10.152:37900 -> host-2.account.0001.use1.cache.amazonaws.com/172.23.24.33:6379, chid=0x7] Completing command LatencyMeteredCommand [type=GET, output=ValueOutput [output=java.nio.HeapByteBuffer[pos=0 lim=2048 cap=2048], error='null'], commandType=io.lettuce.core.RedisPublisher$SubscriptionCommand]
2020-12-21 16:46:28,745 [lettuce-epollEventLoop-11-7] DEBUG i.l.core.protocol.CommandHandler - [channel=0xa017a14e, /10.10.10.152:37900 -> host-2.account.0001.use1.cache.amazonaws.com/172.23.24.33:6379, chid=0x7] Received: 2057 bytes, 3 commands in the stack
2020-12-21 16:46:28,746 [lettuce-epollEventLoop-11-7] DEBUG i.l.core.protocol.CommandHandler - [channel=0xa017a14e, /10.10.10.152:37900 -> host-2.account.0001.use1.cache.amazonaws.com/172.23.24.33:6379, chid=0x7] Stack contains: 3 commands
2020-12-21 16:46:28,746 [lettuce-epollEventLoop-11-7] DEBUG i.l.core.protocol.RedisStateMachine - Decode done, empty stack: true
2020-12-21 16:46:28,746 [lettuce-epollEventLoop-11-7] DEBUG i.l.core.protocol.CommandHandler - [channel=0xa017a14e, /10.10.10.152:37900 -> host-2.account.0001.use1.cache.amazonaws.com/172.23.24.33:6379, chid=0x7] Completing command LatencyMeteredCommand [type=GET, output=ValueOutput [output=java.nio.HeapByteBuffer[pos=0 lim=2048 cap=2048], error='null'], commandType=io.lettuce.core.RedisPublisher$SubscriptionCommand]
2020-12-21 16:46:31,643 [parallel-8] ERROR c.t.sp.Controller - JT 100
java.util.concurrent.TimeoutException: Did not observe any item or terminal signal within 3000ms in 'map' (and no fallback has been configured)
	at reactor.core.publisher.FluxTimeout$TimeoutMainSubscriber.handleTimeout(FluxTimeout.java:294)
	at reactor.core.publisher.FluxTimeout$TimeoutMainSubscriber.doTimeout(FluxTimeout.java:279)
	at reactor.core.publisher.FluxTimeout$TimeoutTimeoutSubscriber.onNext(FluxTimeout.java:418)
	at reactor.core.publisher.FluxOnErrorResume$ResumeSubscriber.onNext(FluxOnErrorResume.java:79)
	at reactor.core.publisher.MonoDelay$MonoDelayRunnable.run(MonoDelay.java:119)
	at reactor.core.scheduler.SchedulerTask.call(SchedulerTask.java:68)
	at reactor.core.scheduler.SchedulerTask.call(SchedulerTask.java:28)
	at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
	at java.base/java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:304)
	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
	at java.base/java.lang.Thread.run(Thread.java:834)
2020-12-21 16:46:31,645 [parallel-8] ERROR c.t.sp.Controller - JT 100
java.util.concurrent.TimeoutException: Did not observe any item or terminal signal within 3000ms in 'map' (and no fallback has been configured)
	at reactor.core.publisher.FluxTimeout$TimeoutMainSubscriber.handleTimeout(FluxTimeout.java:294)
	at reactor.core.publisher.FluxTimeout$TimeoutMainSubscriber.doTimeout(FluxTimeout.java:279)
	at reactor.core.publisher.FluxTimeout$TimeoutTimeoutSubscriber.onNext(FluxTimeout.java:418)
	at reactor.core.publisher.FluxOnErrorResume$ResumeSubscriber.onNext(FluxOnErrorResume.java:79)
	at reactor.core.publisher.MonoDelay$MonoDelayRunnable.run(MonoDelay.java:119)
	at reactor.core.scheduler.SchedulerTask.call(SchedulerTask.java:68)
	at reactor.core.scheduler.SchedulerTask.call(SchedulerTask.java:28)
	at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
	at java.base/java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:304)
	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
	at java.base/java.lang.Thread.run(Thread.java:834)
```

Also, I just attempted to remove the deserializer and instead work off of `return new ReactiveRedisTemplate<>(factory, RedisSerializationContext.byteArray());`
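The `TimeoutException` in the trace above is Reactor's `FluxTimeout` firing after 3000 ms because the Redis response never arrived and no fallback was configured. As a plain-JDK analogy (this is an illustration, not Lettuce or Reactor code), the same failure shape is a future that is never completed being awaited with a timeout:

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical sketch: a dispatched command whose response never comes back.
// The caller's timeout fires, mirroring Reactor's
// "Did not observe any item or terminal signal within 3000ms".
public class StuckResponseDemo {

    public static Class<?> awaitStuckCommand() {
        // Stands in for a Redis GET whose response is never decoded.
        CompletableFuture<String> pendingResponse = new CompletableFuture<>();
        try {
            // Analogous to .timeout(Duration.ofMillis(3000)) with no fallback,
            // using a short timeout so the demo finishes quickly.
            pendingResponse.orTimeout(50, TimeUnit.MILLISECONDS).get();
            return null; // unreachable: nothing ever completes the future
        } catch (ExecutionException e) {
            return e.getCause().getClass(); // java.util.concurrent.TimeoutException
        } catch (InterruptedException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(awaitStuckCommand().getSimpleName()); // TimeoutException
    }
}
```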
FYI - I just tested removing Spring's facade:

```java
val client = RedisClient.create();
val nodes = Arrays.asList(
        RedisURI.create("redis://host.account.ng.0001.use1.cache.amazonaws.com:6379"),
        RedisURI.create("redis://host-1.account.0001.use1.cache.amazonaws.com:6379"),
        RedisURI.create("redis://host-2.account.0001.use1.cache.amazonaws.com:6379"));
val connection = MasterReplica.connect(client, ByteArrayCodec.INSTANCE, nodes);
connection.setReadFrom(ReadFrom.UPSTREAM_PREFERRED);
instance = connection.reactive();
```

Same issue. It is not Spring.
I believe that the issue might be the value we are storing. In particular, if the first character byte is 62 signed / ASCII
I tried with shorter values and I couldn't reproduce. Swapping the Any thoughts on why
The smallest value that I could get to reproduce is:
or bytes As I got the key smaller and smaller, it felt harder and harder to reproduce. This seems to imply it might be a timing issue that is exacerbated by my internet and VPN connection.
Thanks for digging into the issue, indeed, a lingering
You are the best --- testing now. Thanks!
You can grab the artifact from oss.sonatype.org (see https://github.com/lettuce-io/lettuce-core#binariesdownload)
@mp911de , you are great. Thank you for the quick response and the pointers! I tested locally and I can no longer reproduce. Thanks again!
Bug Report
We have started seeing issues with Lettuce hanging and not returning any results. We believe this might be caused by a race condition in cancelling a request. We can reproduce manually every time; however, we are having trouble creating a script that reproduces it.
The environments that we have tested with are Spring Boot 3.7.6, 4.0.0, and 4.0.1. This app is connected to an AWS ElastiCache Redis cluster with one replica.
The other change on our side was an increase in value size from 0.5 KB to 2 KB and a codec deserialization change.
Current Behavior
Commands not processed or responses mixed up
On a web request to Spring that accesses Redis, we issue a simple `mget` or `get`. Most of the time this works. However, for some values, some of the time, the request errors out with a timeout. After this, the application enters a mostly-failed state with **Redis commands timing out or returning the wrong responses.** We note the following:

1. Before, we were seeing logs of `Stack contains: 1 commands`. Now, this number becomes `Stack contains: 2 commands` or higher.
2. If we send another request at the same time while the first request is "stuck," the first request completes. However, the second request is now "stuck." The first request receives another Redis command's response. For example, when working with two keys, the responses for both keys sometimes get mixed up. It is as if the new request pops out an older response.

Our testing procedure is to load up the app and hit it with a web browser. When we reload two tabs at once a couple of times, we enter this error state.
On the first error, we sometimes are able to capture a Mono error of:
Input Code
API implementation of:
Serialization code of:
Expected behavior/code
We expect all requests to act independently of other requests. A request should never be able to see another request's response, and a request should not require another request to complete. If our user code violates a contract, we should see an error message with details on why that is happening.
Environment
Possible Solution
From the debug messages and the stack trace, it appears that a previous command which failed to complete, for whatever reason, is not removed from the stack. Lettuce needs to clean up better on failures.
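As a toy model of why a lingering command is so damaging, assume (as the `Stack contains: N commands` messages suggest) that responses are matched to commands purely by FIFO position on an in-flight queue. The classes below are hypothetical, not Lettuce internals: if a cancelled command stays queued while its response never arrives, every later response completes the wrong command, and the last caller hangs until it times out.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Toy model of pipelined command/response matching. A command cancelled by the
// caller but left on the queue shifts every subsequent response by one slot.
public class CommandQueueSketch {

    static final class Command {
        final String key;
        String response;    // set when a decoded response "completes" it
        boolean cancelled;
        Command(String key) { this.key = key; }
    }

    public static List<String> run() {
        Deque<Command> inFlight = new ArrayDeque<>();
        Command a = new Command("a");
        Command b = new Command("b");
        Command c = new Command("c");
        inFlight.addLast(a);
        inFlight.addLast(b);
        inFlight.addLast(c);

        // The caller times out and cancels 'a', but the buggy path never
        // removes it from the queue. The server also never answers 'a'.
        a.cancelled = true;

        // Responses for 'b' and 'c' arrive in wire order; each one completes
        // whatever command happens to be at the head of the queue.
        for (String wireResponse : new String[] { "value-b", "value-c" }) {
            inFlight.pollFirst().response = wireResponse;
        }

        // 'a' (cancelled) swallowed b's payload, 'b' got c's payload, and 'c'
        // was never completed at all -- it will hang until its timeout fires.
        List<String> observed = new ArrayList<>();
        observed.add(b.key + "=" + b.response);
        observed.add(c.key + "=" + c.response);
        return observed;
    }

    public static void main(String[] args) {
        System.out.println(run()); // [b=value-c, c=null]
    }
}
```

This reproduces both reported symptoms at once: one request sees another request's response, and the following request is left "stuck."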
Additional context
I have a feeling that this is caused by the increased value size that we have in Redis. My guess is that we are not, or Lettuce is not, dealing with the ByteBuffers right. Maybe we need to mark them consumed or copy the data out more explicitly? Or maybe Lettuce needs to ensure that the ByteBuffer is not reused?
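To make the buffer-reuse speculation concrete, here is a self-contained plain-JDK sketch (not Lettuce internals): if a decoded value is kept as a view that aliases a shared read buffer instead of being copied out, refilling that buffer for the next response silently rewrites the earlier "value."

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Illustrates why decoded bytes must be copied out of a reused network buffer.
public class BufferReuseDemo {

    // Unsafe: the returned view still aliases the shared buffer's memory.
    static ByteBuffer decodeNoCopy(ByteBuffer shared, int len) {
        ByteBuffer view = shared.duplicate();
        view.limit(len);
        return view;
    }

    // Safe: copies the bytes, so later writes to the buffer cannot affect them.
    static byte[] decodeWithCopy(ByteBuffer shared, int len) {
        byte[] copy = new byte[len];
        shared.duplicate().get(copy, 0, len);
        return copy;
    }

    static String readView(ByteBuffer view) {
        byte[] bytes = new byte[view.remaining()];
        view.duplicate().get(bytes);
        return new String(bytes, StandardCharsets.US_ASCII);
    }

    public static String[] run() {
        ByteBuffer shared = ByteBuffer.allocate(16);
        shared.put("AAAA".getBytes(StandardCharsets.US_ASCII)).flip();

        ByteBuffer aliased = decodeNoCopy(shared, 4);
        byte[] copied = decodeWithCopy(shared, 4);

        // The "network layer" reuses the buffer for the next response.
        shared.clear();
        shared.put("BBBB".getBytes(StandardCharsets.US_ASCII)).flip();

        return new String[] {
                readView(aliased),                            // corrupted: "BBBB"
                new String(copied, StandardCharsets.US_ASCII) // intact:    "AAAA"
        };
    }

    public static void main(String[] args) {
        String[] r = run();
        System.out.println(r[0] + " " + r[1]); // BBBB AAAA
    }
}
```

Larger values make this kind of aliasing bug more likely to surface, since a 2 KB payload is more likely than a 0.5 KB one to be overwritten before the consumer reads it, which would fit the observed correlation with value size.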
Since others are not seeing this, I reckon it has to do with our code and/or the contract between our code and Lettuce.