perf(consumers): Route messages based on kafka headers #2176
Context
The current errors and transactions consumers decide whether a message should be processed only after the message has been deserialized. For example, in the transactions consumer this happens at https://github.com/getsentry/snuba/blob/master/snuba/datasets/transactions_processor.py#L78-L79. It is done this way because error and transaction messages are on the same topic.
Improvement
This change introduces a new message pre-filter that looks at the Kafka headers and decides whether the message should be dropped.
With getsentry/sentry#29618 merged, the filtering can be done based on Kafka headers. This should improve performance, since the consumers no longer need to deserialize a message to determine whether to drop it.
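To illustrate the idea, here is a minimal sketch of a header-based pre-filter. It is not the actual KafkaHeaderFilter from this PR: the class name `HeaderPreFilter`, the header key `is_transaction`, and the values `b"1"` / `b"0"` are hypothetical, and headers are assumed to arrive as a sequence of `(key, bytes)` pairs.

```python
from dataclasses import dataclass
from typing import Sequence, Tuple

# Assumption: Kafka headers are exposed as (key, raw bytes value) pairs.
Headers = Sequence[Tuple[str, bytes]]


@dataclass(frozen=True)
class HeaderPreFilter:
    """Hypothetical pre-filter: drop a message when a header has a given value."""

    header_key: str
    drop_value: bytes

    def should_drop(self, headers: Headers) -> bool:
        # Only the headers are inspected; the payload is never deserialized.
        for key, value in headers:
            if key == self.header_key:
                return value == self.drop_value
        # Header missing: keep the message so a later stage can still decide.
        return False


# Hypothetical wiring: each consumer drops the other consumer's messages.
errors_filter = HeaderPreFilter("is_transaction", b"1")
transactions_filter = HeaderPreFilter("is_transaction", b"0")
```

In this sketch a message without the header is kept rather than dropped, so the existing check after deserialization would still apply to messages from producers that do not set the header yet.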
Tests
Added unit tests for the KafkaHeaderFilter. Also ran the code with some additional logging and saw that the messages dropped by the errors and transactions consumers were mutually exclusive, and that every offset was dropped by at least one consumer:
00:18:53 transaction-consumer | 2021-10-29 00:18:53,784 Message with offset 21886 dropped
00:18:53 transaction-consumer | 2021-10-29 00:18:53,784 Message with offset 21887 dropped
00:18:53 consumer | 2021-10-29 00:18:53,797 Message with offset 21888 dropped
00:18:53 transaction-consumer | 2021-10-29 00:18:53,894 Message with offset 21889 dropped
00:18:53 transaction-consumer | 2021-10-29 00:18:53,901 Message with offset 21890 dropped
00:18:53 transaction-consumer | 2021-10-29 00:18:53,912 Message with offset 21891 dropped
00:18:53 consumer | 2021-10-29 00:18:53,917 Message with offset 21892 dropped
00:18:54 consumer | 2021-10-29 00:18:54,035 Message with offset 21893 dropped
00:18:54 transaction-consumer | 2021-10-29 00:18:54,063 Message with offset 21894 dropped
00:18:54 transaction-consumer | 2021-10-29 00:18:54,081 Message with offset 21895 dropped
00:18:54 consumer | 2021-10-29 00:18:54,174 Message with offset 21896 dropped
00:18:54 transaction-consumer | 2021-10-29 00:18:54,234 Message with offset 21897 dropped
00:18:54 consumer | 2021-10-29 00:18:54,284 Message with offset 21898 dropped
00:18:54 transaction-consumer | 2021-10-29 00:18:54,324 Message with offset 21899 dropped
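The manual check on the logs could be automated along these lines. This is only a sketch: the regular expression assumes the log line layout shown above, the sample is the first four lines of that output, and the helper names `dropped_offsets` and `exclusive_and_complete` are hypothetical.

```python
import re
from collections import defaultdict
from typing import Dict, Iterable, Set

# Assumed layout: "<time> <process> | <timestamp> Message with offset <n> dropped"
LINE_RE = re.compile(
    r"^\S+\s+(?P<process>\S+)\s+\|.*Message with offset (?P<offset>\d+) dropped"
)


def dropped_offsets(lines: Iterable[str]) -> Dict[str, Set[int]]:
    """Group the dropped offsets by the process that logged them."""
    dropped: Dict[str, Set[int]] = defaultdict(set)
    for line in lines:
        match = LINE_RE.match(line.strip())
        if match:
            dropped[match["process"]].add(int(match["offset"]))
    return dict(dropped)


def exclusive_and_complete(dropped: Dict[str, Set[int]], a: str, b: str) -> bool:
    """True if no offset is dropped by both processes and no offset is skipped."""
    first, second = dropped.get(a, set()), dropped.get(b, set())
    union = first | second
    if not union:
        return False
    expected = set(range(min(union), max(union) + 1))
    return first.isdisjoint(second) and union == expected


SAMPLE = [
    "00:18:53 transaction-consumer | 2021-10-29 00:18:53,784 Message with offset 21886 dropped",
    "00:18:53 transaction-consumer | 2021-10-29 00:18:53,784 Message with offset 21887 dropped",
    "00:18:53 consumer | 2021-10-29 00:18:53,797 Message with offset 21888 dropped",
    "00:18:53 transaction-consumer | 2021-10-29 00:18:53,894 Message with offset 21889 dropped",
]

result = exclusive_and_complete(
    dropped_offsets(SAMPLE), "consumer", "transaction-consumer"
)
```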