
(sentry-metrics): Metrics indexer consumer #28431

Merged: 19 commits merged into master from metrics/SNS-397 on Sep 27, 2021
Conversation

@MeredithAnya (Member) commented Sep 7, 2021

Metrics Indexer Consumer:
Messages produced by Relay into the ingest-metrics topic have a metric name along with any number of tag key/value string pairs associated with the metric. The snuba topic snuba-metrics (which can be changed here) expects integers instead of strings.

The real indexer (which will be implemented later) will store the string-to-int mapping in Postgres, but for now this just uses a mock indexer to do the conversion.

In this PR the consumer consumes messages from the ingest-metrics topic, translates the payload to have ints instead of strings, and then produces to the snuba-metrics topic so that snuba can store the data.
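
To make the translation concrete, here is a minimal sketch (not the PR's actual code): the payload field names and the indexer's record(org_id, string) -> int method are assumptions for illustration only.

def translate(message, indexer):
    # Replace the metric name and every tag key/value string with the integer
    # ids assigned by the indexer (field names are hypothetical).
    org_id = message["org_id"]
    translated = dict(message)
    translated["metric_id"] = indexer.record(org_id, translated.pop("name"))
    translated["tags"] = {
        indexer.record(org_id, key): indexer.record(org_id, value)
        for key, value in message["tags"].items()
    }
    return translated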

RedisMockIndexer
The temporary Redis indexer can be enabled by changing the following in conf/server.py:

SENTRY_METRICS_INDEXER = "sentry.sentry_metrics.indexer.redis_mock.RedisMockIndexer"

It also uses a bulk_record method to get and set all the strings (metric name, tag keys and values) for a message at once, as sketched below.
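
For illustration only, here is a rough sketch of what a Redis-backed bulk_record could look like, assuming a single Redis hash of string-to-int mappings per organization and a counter for assigning new ids; the actual RedisMockIndexer in this PR may use different keys and signatures.

import redis

class RedisMockIndexerSketch:
    def __init__(self):
        self.client = redis.Redis()

    def bulk_record(self, org_id, strings):
        key = f"indexer:{org_id}"
        # Fetch all existing string -> int mappings in one round trip.
        existing = self.client.hmget(key, strings)
        results, new = {}, {}
        for string, value in zip(strings, existing):
            if value is None:
                # Assign a fresh id for strings we have not seen before.
                value = self.client.incr("indexer:next-id")
                new[string] = value
            results[string] = int(value)
        if new:
            self.client.hset(key, mapping=new)
        return results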

@MeredithAnya MeredithAnya changed the title WIP(metrics): Metrics indexer consumer WIP(sentry-metrics): Metrics indexer consumer Sep 7, 2021
@MeredithAnya (Member Author)

@fpacifici @jjbayer (cc @jan-auer): some questions/concerns/thoughts I have:

  • I'm not sure if we want to use the BatchKafkaConsumer, or if we should write our own consumer, like what is done for the QuerySubscriptionConsumer.
  • I don't know exactly how best to make the dummy indexer work for what the product needs to build on top of this, prior to adding the actual indexer.
  • Is the UseCase actually needed for the indexer? Or is that just information that needs to be passed along later to the metrics product data model?

@jjbayer (Member) commented Sep 8, 2021

  • I don't know exactly how best to make the dummy indexer work for what the product needs to build on top of this, prior to adding the actual indexer.

For release health we cannot really mock tag values, because the release tag may have any value. Maybe we can use a redis key-value lookup as the simplest possible indexer implementation?

  • Is the UseCase actually needed for the indexer? Or is that just information that needs to be passed along later to the metrics product data model?

I think it's not necessary from a functional perspective -- but maybe for partitioning?

@MeredithAnya (Member Author)

  • I don't know exactly how best to make the dummy indexer work for what the product needs to build on top of this, prior to adding the actual indexer.

For release health we cannot really mock tag values, because the release tag may have any value. Maybe we can use a redis key-value lookup as the simplest possible indexer implementation?

  • Is the UseCase actually needed for the indexer? Or is that just information that needs to be passed along later to the metrics product data model?

I think it's not necessary from a functional perspective -- but maybe for partitioning?

Updates:

  • Added Redis per @jjbayer's suggestion, assuming it's an acceptable implementation to get things unblocked
  • Changed the interface to have only record and reverse_resolve, where record both looks up the value and records it if it's not already there (sketched below)
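
Roughly, the resulting interface looks like the following sketch; the parameter names and types are assumptions, not the PR's exact signatures.

from typing import Optional

class StringIndexerSketch:
    def record(self, org_id: int, string: str) -> int:
        # Look up the integer id for a string, creating it if it does not exist.
        raise NotImplementedError

    def reverse_resolve(self, org_id: int, id: int) -> Optional[str]:
        # Return the original string for an id, if one has been recorded.
        raise NotImplementedError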

@fpacifici (Contributor) left a comment


Thanks, this is a step in the right direction.
Will review again tomorrow with some more details on how to produce in an efficient manner.
I am not sure about the goal of that pubsub class that does not allow us to set a callback.

@click.option("--topic", default="ingest-metrics", help="Topic to get subscription updates from.")
@batching_kafka_options("metrics-consumer")
@configuration
def metrics_consumer(**options):
Contributor

Is this going to start the consumer by default? I don't think we need it yet by default.

@MeredithAnya (Member Author) Sep 17, 2021

I've had to manually run sentry run metrics-consumer after starting the sentry devserver, so I don't think this starts by default.

@lynnagara (Member) Sep 21, 2021

How come we are adding a separate command rather than using the single ingest-consumer command, which runs the rest of the ingest consumers? We could still temporarily omit metrics from --all-consumer-types and only run it if explicitly called with the metrics consumer type, if that was the concern. Curious if there is another reason.

@MeredithAnya (Member Author)

@lynnagara (cc @fpacifici) the ingest-consumer command ends up using the IngestConsumerWorker, whereas we want to use the MetricsIndexerWorker. I felt it was easier to keep these separate for now than to refactor the ingest-consumer command. It seems like this could easily be changed down the line if we wanted, but I'm open to changing it now if people feel strongly.

Comment on lines 49 to 53
snuba_metrics_publisher = KafkaPublisher(
kafka_config.get_kafka_producer_cluster_options(cluster_name),
asynchronous=False,
)
snuba_metrics_publisher.publish(snuba_metrics["topic"], json.dumps(message))
@fpacifici (Contributor) Sep 10, 2021

This would flush for every message, which is not ideal considering you have batches of messages and you can flush only once per batch, before committing on the ingest topic.
We should also set the delivery callback for each message.

I would have a look at this approach to see how to use the callback
https://github.com/getsentry/cdc/blob/8643ee7a5bf491755c46169c6841131521d34b6c/cdc/producer.py#L41-L161
You probably do not need all that complexity.
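
For reference, a minimal sketch of that pattern with confluent-kafka (assuming a confluent_kafka.Producer; this is not the PR's code, and the topic name is taken from the description above):

from confluent_kafka import Producer

producer = Producer({"bootstrap.servers": "localhost:9092"})
delivery_errors = []

def delivery_callback(error, message):
    # Invoked from flush()/poll() once the broker acknowledges (or rejects) the message.
    if error is not None:
        delivery_errors.append(error)

def flush_batch(batch):
    # Produce asynchronously: messages are buffered and sent in the background.
    for payload in batch:
        producer.produce("snuba-metrics", payload, on_delivery=delivery_callback)
    # Flush once per batch; only commit the ingest offsets if every delivery succeeded.
    producer.flush()
    if delivery_errors:
        raise RuntimeError(f"{len(delivery_errors)} messages failed delivery")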

Contributor

Some follow up on what a consumer should do:

When writing a consumer like this that needs to achieve a high throughput, there are a few elements to take into account and a few requirements:

  • We cannot auto commit, because we would be committing before processing the message. If there is an error in processing, the message is lost. This is taken into account, as we do not auto commit.
  • We don't need to commit after every offset. This means less load on the broker and network.
  • We must not lose messages. So if an error happens at any point during processing, we cannot commit the entire batch but only the portion that we sent to Kafka.
  • Producing is asynchronous, so we need to wait for the callback before being sure that the message is persisted in a Kafka topic: https://docs.confluent.io/clients-confluent-kafka-python/current/overview.html#asynchronous-writes
  • We should really avoid duplicates, as they would not be deduplicated in ClickHouse since metrics are stored in a pre-aggregated way. So if there is an error during processing that causes the consumer to crash, we should have committed up to the last acknowledged message (acknowledged meaning that we did receive the callback from the producer).
  • We should keep producing asynchronously and not flush every message individually, as that would have a real impact on throughput.
  • Exactly-once semantics are technically not achievable, as nobody can deduplicate messages. But we can make duplication extremely rare (basically only in case the Kafka commit fails multiple times, or the consumer crashes from running out of memory after flushing but before being able to commit).

So there are a few ways to do that:

  • Simple batching consumer. Do the processing phase, then have the batch flush send all the messages; at the end, flush and wait for the callbacks. Only commit to Kafka the offset of the last callback received (sketched below).
  • Something like the cdc link above. Keep producing messages as soon as they are processed and periodically commit the last offset we got the callback for.
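
A minimal sketch of the first option, assuming confluent-kafka and a hypothetical translate() helper for the indexing step; offsets are tracked via the delivery callbacks so that we only ever commit up to the last acknowledged message:

from confluent_kafka import Consumer, Producer, TopicPartition

consumer = Consumer({
    "bootstrap.servers": "localhost:9092",
    "group.id": "metrics-indexer",
    "enable.auto.commit": False,  # commit manually, only after delivery is confirmed
})
consumer.subscribe(["ingest-metrics"])
producer = Producer({"bootstrap.servers": "localhost:9092"})

acked_offsets = {}  # (topic, partition) -> highest offset whose delivery was confirmed

def process_batch(batch):
    for message in batch:
        payload = translate(message.value())  # hypothetical indexing/translation step
        def on_delivery(error, _produced, source=message):
            if error is None:
                key = (source.topic(), source.partition())
                acked_offsets[key] = max(acked_offsets.get(key, -1), source.offset())
        producer.produce("snuba-metrics", payload, on_delivery=on_delivery)
    producer.flush()  # wait for every delivery callback of this batch
    # Commit only up to the last acknowledged offset of each partition.
    offsets = [
        TopicPartition(topic, partition, offset + 1)
        for (topic, partition), offset in acked_offsets.items()
    ]
    if offsets:
        consumer.commit(offsets=offsets, asynchronous=False)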

@jjbayer (Member) left a comment

This looks great! I did a manual end-to-end test locally and everything works as expected.


def resolve(self, organization: Organization, use_case: UseCase, string: str) -> Optional[int]:
Member

Contrary to what I said previously, I think it would make sense to keep the resolve method. It's already in use here, and if the indexer entries ever get a TTL, it would not make sense to prolong the retention every time the indexer is queried from the product side.

@nikhars (Member) commented Sep 24, 2021

LGTM. Nice job.

@fpacifici (Contributor) left a comment

Good first step. Thanks

on_delivery=self.callback,
)

messages_left = self.__producer.flush(5.0)
Contributor

Please, as a follow-up PR: we will want a metric here to measure how long we wait.
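
Something along these lines could work as that follow-up (a sketch assuming Sentry's sentry.utils.metrics timing helper is available in this module; the metric name is hypothetical), replacing the flush line quoted above:

import time

from sentry.utils import metrics

start = time.monotonic()
messages_left = self.__producer.flush(5.0)
metrics.timing("metrics_consumer.producer_flush_duration", time.monotonic() - start)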

@fpacifici (Contributor)

Please figure out what is wrong with CI before merging

@MeredithAnya MeredithAnya merged commit d393562 into master Sep 27, 2021
@MeredithAnya MeredithAnya deleted the metrics/SNS-397 branch September 27, 2021 16:33
jjbayer added a commit that referenced this pull request Sep 28, 2021
vuluongj20 pushed a commit that referenced this pull request Sep 30, 2021
* WIP(metrics): Metrics indexer consumer

* use redis for mock indexer

* add bulk_record and async producing

* make redis mock indexer separate file

* fix type errors

* add comment

* remove UseCase and updates tests

* update more tests

* clean up part I

* mini cleanup

* add basic tests

* all org_ids are ints

* missed one

* more clean up

* consumer test

* rename test file

* lil updates

* attempt to fix tests

* try dis tho
vuluongj20 pushed a commit that referenced this pull request Sep 30, 2021
@github-actions github-actions bot locked and limited conversation to collaborators Oct 13, 2021