This repository was archived by the owner on Nov 20, 2024. It is now read-only.

Streams get flagged for re-partitioning even when unnecessary. #412

Closed
TimWillard opened this issue Jul 19, 2018 · 6 comments

@TimWillard

TimWillard commented Jul 19, 2018

@sobychacko
Contributor

@Tim-Willard, Unlike the map calls you mentioned on the linked PR above, these particular map calls are necessary as we don't know at that point if the keys are used downstream in the processor. If you come up with some alternatives, please consider contributing through a PR. If you can elaborate more on the use case, we may be able to come up with some ideas to consider.

@TimWillard
Author

@sobychacko Looking specifically at the second map call I listed, I don't understand what it is accomplishing. The adapt method, as far as I can tell, has two cases. Case 1 is when we are not using native decoding, and thus we must run bindingTarget through deserializeOnInbound(...), which applies a .branch() and a .mapValues() call. This makes sense, and does not repartition. Case 2 is when we are using native decoding, in which case I believe Spring should do nothing and simply return bindingTarget. Instead, in this case we do a map which modifies neither the key nor the value, and appears to accomplish nothing except triggering a repartition I see no use for.

I guess what I'm asking is: if you replace line 55 with return bindingTarget, what goes wrong? The non-native decoding case works without re-partitioning, why does this case require a map call at all?
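For intuition on why the two code paths differ, here is a self-contained toy model of Kafka Streams' repartition flag (hypothetical class and method names; this is not the real Kafka Streams API). Because map() may return a new key, Streams must assume the existing partitioning is invalid and flags the result; mapValues() provably cannot touch the key, so the flag is inherited unchanged.

```java
// Toy model (NOT the real Kafka Streams API) of the repartition flag.
class ToyKStream {
    final boolean repartitionRequired;

    ToyKStream(boolean repartitionRequired) {
        this.repartitionRequired = repartitionRequired;
    }

    // map() can produce a new key, so the result is always flagged,
    // even if the mapper happens to leave the key untouched.
    ToyKStream map() {
        return new ToyKStream(true);
    }

    // mapValues() only transforms the value; the key (and thus the
    // partitioning) is unchanged, so the flag carries over as-is.
    ToyKStream mapValues() {
        return new ToyKStream(this.repartitionRequired);
    }

    // A key-based operation (join, groupByKey, ...) on a flagged stream
    // forces creation of an internal repartition topic.
    boolean joinCreatesRepartitionTopic() {
        return repartitionRequired;
    }
}
```

This is the crux of the issue: the binder's map() call flags every inbound stream, while the non-native-decoding path's mapValues() would not.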

@sobychacko
Contributor

@Tim-Willard Sorry for responding late. The return bindingTarget.map... statement on line 55 that you are mentioning is only invoked at initial bootstrap time by the binder framework. The statement itself does not run per record at runtime, but the lambda inside the map operation does. By changing the statement to simply return bindingTarget, it would just return the KStream proxy object at startup, and at runtime the flow would be entirely broken, with nothing flowing downstream. The map operation forces the data to go to the next step in the topology through the binding. What kind of issues are you encountering from the repartitioning initiated by this map call? I don't see an obvious way to fix this, as the call is needed internally by the binder. There may be workarounds, but I am afraid they would be more disruptive to the framework.

@TimWillard
Author

@sobychacko The issue I'm encountering is predominantly one of performance. If I want to (for example) join 3 streams together and store the result in a KTable to query, that should only generate 1 internal topic. These map calls cause 3 additional topics to be made (and thus messages written to) that are unnecessary and wouldn't be created if I wasn't using the binder.

It's not something that can't be worked around, but for a simple process like the above the binder is introducing more overhead than the actual work being done (in terms of network traffic and stored messages), and I don't see why it is required. Nothing I know about Kafka Streams suggests why a map operation is required to force the data to the next step in the topology (especially when the other code path uses branch and mapValues operations, which don't cause the repartition). The only thing I was able to gather from what you said is that there is a problem with directly returning the wrapper that is passed to adapt, and that calling map returns the unwrapped KStream. However, calling mapValues should accomplish the same thing without the unintended side effects.
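A toy cost model of the three-stream join scenario described above (hypothetical names; not the binder's actual code). Each inbound stream that enters a key-based join while flagged for repartitioning costs one internal repartition topic, so three map()-wrapped inputs mean three extra topics on top of the join's own state, while mapValues()-wrapped inputs would cost none.

```java
// Toy cost model: how many internal repartition topics a join over
// several inbound streams creates, given each stream's repartition flag.
class RepartitionCost {
    // One repartition topic per flagged input feeding the join;
    // unflagged inputs are joined on their original partitioning.
    static int repartitionTopics(boolean[] flaggedInputs) {
        int topics = 0;
        for (boolean flagged : flaggedInputs) {
            if (flagged) {
                topics++;
            }
        }
        return topics;
    }
}
```

With the binder's map() on all three inputs this yields three repartition topics; with mapValues() (or returning the stream unchanged) it would yield zero, which is the overhead the report is about.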

@sobychacko
Contributor

@Tim-Willard We will keep this issue open to track it, as it pertains to performance. Will give you an update soon.

@sobychacko
Contributor

@Tim-Willard Apologies again for the long delay in fixing the issue. It is fixed now upstream.
