-
Notifications
You must be signed in to change notification settings - Fork 1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Improperly decoding command responses #1512
Comments
for good measure I added a listener to the connection that would log when it got a push message (which I now found out was a thing). Confirming what I thought above I'm seeing a number of "push" messages appear in my logs with garbage data. I wouldn't expect to see any since I'm still using Redis 5 |
Thanks for the report. Can you provide a minimal reproducer that is able to reproduce the issue? Since our tests are green, we probably miss a specific arrangement that leads to the failure. |
Try adding this test to CommandHandlerUnitTests.java. It passes on 5.3.x branch but fails on main (the assert after the second channelRead fails since it threw away that read and did not complete the LRANGE as expected and as done on previous versions). So actually the more specific issue is that it occurs when the last line of the buffer is incomplete but also starts with '>'. It leads to the line not being consumed and pushing the read index ahead, resulting in the next channelRead to think it got a PUSH message. In the below example the LRANGE should return 4 values, but i split the redis response across two channel reads where the final line of the first channelRead is not yet complete (it is expecting a value of 4 bytes but only 3 are found at that point, the remaining byte comes in at the start of the next channelRead) @Test
void shouldHandleIncompleteResponses() throws Exception {
Command<String, String, List<String>> lrangeCommand = new Command<>(
CommandType.LRANGE, new ValueListOutput<>(StringCodec.UTF8),
new CommandArgs<>(StringCodec.UTF8).addKey("lrangeKey").add(0).add(1));
Command<String, String, List<KeyValue<String, String>>> hmgetCommand = new Command<>(
CommandType.HMGET, new KeyValueListOutput<>(StringCodec.UTF8, Arrays.asList("KEY1", "KEY2", "KEY3")),
new CommandArgs<>(StringCodec.UTF8).addKeys("hmgetKey", "KEY1", "KEY2", "KEY3"));
ChannelPromise channelPromise = new DefaultChannelPromise(channel, ImmediateEventExecutor.INSTANCE);
channelPromise.setSuccess();
sut.channelRegistered(context);
sut.channelActive(context);
List<Command<String, String, ?>> commands = Arrays.asList(lrangeCommand, hmgetCommand);
sut.write(context, commands, channelPromise);
assertThat(stack.size()).isEqualTo(2);
// the LRANGE response comes back across two channelReads
sut.channelRead(context, Unpooled.wrappedBuffer("*4\r\n$3\r\nONE\r\n$4\r\n>TW".getBytes()));
assertThat(stack.size()).isEqualTo(2);
sut.channelRead(context, Unpooled.wrappedBuffer("O\r\n$5\r\nTHREE\r\n$4\r\nFOUR\r\n".getBytes()));
assertThat(stack.size()).isEqualTo(1);
assertThat(lrangeCommand.get()).isNotNull();
assertThat(lrangeCommand.get().size()).isEqualTo(4);
// the HMGET response comes back in another read
sut.channelRead(context, Unpooled.wrappedBuffer("*3\r\n$4\r\nVAL1\r\n$4\r\nVAL2\r\n$4\r\nVAL3\r\n".getBytes()));
assertThat(stack.isEmpty()).isTrue();
assertThat(hmgetCommand.get()).isNotNull();
assertThat(hmgetCommand.get().size()).isEqualTo(3);
} |
I'm also wondering if this issue was the cause of #1443 since it manifests itself as things getting out of sync |
Thanks a lot. Let me investigate why this is. |
With the introduction of push message handling, we introduced this regression. The decode loop doesn't consider the partial decode progress and falls into the push decode code path. |
We now verify that no partial decode is in progress when entering the code path to decode push messages. Previously, a partial read could leave the decode buffer at a read position that started with > which is the byte marker for push decode. Without considering the partial decode progress, subsequent decodes used push decode which left commands in the stack without properly decoding the actual command.
We now verify that no partial decode is in progress when entering the code path to decode push messages. Previously, a partial read could leave the decode buffer at a read position that started with > which is the byte marker for push decode. Without considering the partial decode progress, subsequent decodes used push decode which left commands in the stack without properly decoding the actual command.
That's fixed now. Care to retest against the latest snapshot? |
looks fixed in the snapshot build. thanks! |
Bug Report
Current Behavior
I'm upgrading our version of lettuce from 5.2.0 to 6.0.0 and immediately have encountered an issue which makes the new version unusable for us. Our application is only using HMGET (for 10 keys) and LRANGE (for lists of exactly 100 values) commands. With the new version of the library I am getting exceptions from lettuce indicating that it is either trying to parse more data than was actually requested for the particular command, or apparently waiting for more data to be received that never comes (due to however it is improperly parsing the response data and getting out of alignment, so the command inevitably times out and is canceled). Although all my list data only has 100 values each I can occasionally see through debug/trace logging that lettuce has parsed a list of 150 items (or more, or less than 100). I've also seen it attempting to parse more than 10 keys for my HMGET and then throwing this exception:
Debugging it makes it look like it has actually read all 10 keys for the HMGET but for whatever reason attempts to read an 11th even though the keys iterator has nothing left to match it to. I have trace logging enabled and can see the responses from lettuce and that they do contain the proper number of elements, so whatever piece of code that then deals with processing the response seems to be getting garbled. In other cases I'll see an LRANGE complete with 10 values, making it look like it consumed an HMGET response and skipped over the actual LRANGE response, or some combination of things.
I'm somewhat at a loss of where to go next with debugging this issue since everything works fine on 5.2.0. The application is big and complicated so I don't exactly have a sample app to reproduce it. I'm not seeing exceptions come up from any other parts of the code (so I don't see any exceptions+ coming from the codec).
Is it possible that when lettuce receives data within CommandHandler.channelRead that it is not saving it to its internal buffer correctly (and so throwing away some of the earlier data it received that did not represent the output of a full command)? I do see lines like:
In the above the stack began with 2 LRANGE requests. Trace logging (not included) shows that both responses were received of 100 items each. The first response was split across two channel reads (one of 512 bytes and the other of 494), while the second response came wholly in a 1006 byte response. At that point since it had a full response it went and processed the first command on the stack, the first LRANGE, but associated 150 values with its output (even though there were 100). In between an HMGET request was made and the response was received which was then used to complete the second LRANGE with 10 values.
I've noticed that in the first LRANGE response of 512 bytes, the last line in that response (which is one of the values in the list) coincidentally begins with '>', the same marker to indicate a PUSH type message in the state machine. Maybe that would cause it to treat this as
isPushDecode(buffer) == true
and therefore not log"{} Stack contains: {} commands"
for the second channelRead but still process those bytes into a pushOutput command (that would then be thrown away instead of used on the actual LRANGE command).Environment
The text was updated successfully, but these errors were encountered: