
Is it possible to collect producer batch request latency and error information? #130

Closed
beanliu opened this issue Feb 8, 2022 · 10 comments

@beanliu commented Feb 8, 2022

The best I have found so far is HookProduceBatchWritten and HookBrokerWrite, but neither is quite right:

  • HookBrokerWrite: there is no way to tell whether the request written was a produce request (bytesWritten is a strong signal, but not a reliable one).
  • HookProduceBatchWritten: lacks latency and error information.

Maybe I'm missing something in the docs, but it would be great to have request latency and error metrics for producer monitoring.
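
For reference, a minimal sketch of the two hooks being discussed, with signatures as documented in the kgo package at the time; the hook type and its registration are illustrative, not part of the library:

```go
package main

import (
	"time"

	"github.com/twmb/franz-go/pkg/kgo"
)

// metricsHooks is an illustrative type implementing both hook interfaces.
type metricsHooks struct{}

// OnBrokerWrite fires for every request written to any broker. Latency
// (writeWait + timeToWrite) and err are available, but the hook is not
// produce-specific.
func (metricsHooks) OnBrokerWrite(meta kgo.BrokerMetadata, key int16, bytesWritten int, writeWait, timeToWrite time.Duration, err error) {
}

// OnProduceBatchWritten fires per produce batch written, but only with
// sizing metrics: no latency, no error.
func (metricsHooks) OnProduceBatchWritten(meta kgo.BrokerMetadata, topic string, partition int32, metrics kgo.ProduceBatchMetrics) {
}

func main() {
	cl, err := kgo.NewClient(kgo.WithHooks(metricsHooks{}))
	if err != nil {
		panic(err)
	}
	defer cl.Close()
}
```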

@twmb (Owner) commented Feb 9, 2022

HookBrokerE2E can be used by recording the latency when key == 0 (the same works with HookBrokerWrite, but E2E gives a more robust latency). This is the latency from writing the bytes to the wire through reading the response, but not processing it.
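
A minimal sketch of that approach, assuming the kgo.HookBrokerE2E interface and the kgo.BrokerE2E duration fields and Err helper as documented at the time; the print statements stand in for a real metrics sink:

```go
package main

import (
	"fmt"

	"github.com/twmb/franz-go/pkg/kgo"
)

type produceE2EHook struct{}

func (produceE2EHook) OnBrokerE2E(meta kgo.BrokerMetadata, key int16, e2e kgo.BrokerE2E) {
	if key != 0 { // 0 is the Kafka protocol key for produce requests
		return
	}
	// Latency from writing the request bytes through reading the
	// response bytes; response processing is not included.
	latency := e2e.WriteWait + e2e.TimeToWrite + e2e.ReadWait + e2e.TimeToRead
	if err := e2e.Err(); err != nil {
		fmt.Printf("produce to broker %d failed after %v: %v\n", meta.NodeID, latency, err)
		return
	}
	fmt.Printf("produce to broker %d took %v\n", meta.NodeID, latency)
}

func main() {
	cl, err := kgo.NewClient(kgo.WithHooks(produceE2EHook{}))
	if err != nil {
		panic(err)
	}
	defer cl.Close()
}
```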

I could potentially add latency to ProduceBatchMetrics as well. This could be any form of latency: from writing through reading the bytes just before processing, or through processing itself (probably both options would be available). This would be an easier way to hook into latency, but it would have a tiny bit of overhead from the time the bytes are actually read to just before the data is processed.

So in summary:

  • It's possible to get some latency today by hooking in and filtering on the key field, since produce requests use key 0 (as in the sketch above).
  • This issue is a good feature request for adding more latency metrics to ProduceBatchMetrics.

I'm planning to spend some time this weekend on a recent batch of feature requests.

@beanliu (Author) commented Feb 9, 2022

💯 cool, thanks.

@twmb (Owner) commented Feb 13, 2022

Looking at this further, I think it makes sense to keep latency in BrokerE2E, which is specific to the request lifecycle itself, rather than in ProduceBatchMetrics, which is specific to an individual batch within a produce request. Adding it to ProduceBatchMetrics could erroneously make it look like every batch in a request had its own latency, when in fact the entire request, and all batches within it, share one single latency.

Do you want a per-topic / per-partition latency breakdown, or is per-request latency good enough? Per-record latency can currently be gleaned from the timestamp field within a record (set once the record is buffered) and the time at which the promise is called.

How would per-batch latency be used / beneficial in comparison to per-request and potentially per-record?
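
A minimal sketch of the per-record measurement just described: kgo stamps Record.Timestamp when the record is buffered, so the promise can compute buffered-to-acknowledged latency. The broker address, topic, and payload here are placeholders:

```go
package main

import (
	"context"
	"fmt"
	"time"

	"github.com/twmb/franz-go/pkg/kgo"
)

func main() {
	cl, err := kgo.NewClient(kgo.SeedBrokers("localhost:9092")) // placeholder broker
	if err != nil {
		panic(err)
	}
	defer cl.Close()

	rec := &kgo.Record{Topic: "events", Value: []byte("payload")} // placeholder topic
	cl.Produce(context.Background(), rec, func(r *kgo.Record, err error) {
		if err != nil {
			fmt.Println("produce failed:", err)
			return
		}
		// r.Timestamp was set when the record was buffered, so this
		// spans buffering through the produce response, per record.
		fmt.Println("record latency:", time.Since(r.Timestamp))
	})
	cl.Flush(context.Background()) // wait for the promise to run
}
```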

@twmb (Owner) commented Feb 13, 2022

Closing for the moment (to help me track some stuff), but this can be reopened if we decide more latency should be added somewhere else.

twmb closed this as completed Feb 13, 2022
@beanliu (Author) commented Feb 13, 2022

Agreed that adding latency info to ProduceBatchMetrics would be confusing, and per-request latency is good enough in my case. If finer granularity is ever needed, though, it makes sense to collect it from the record timestamp provided in the produce promise.

Minor suggestion: it would be great to have all of this related information documented together in one place.

@twmb (Owner) commented Feb 14, 2022

Where do you think it could be documented? doc/hooks.md? doc/monitoring.md?

@beanliu (Author) commented Feb 14, 2022

> Where do you think it could be documented? doc/hooks.md? doc/monitoring.md?

They both look good to me, as long as it is reachable from a related section of the https://github.com/twmb/franz-go page.

@twmb (Owner) commented Feb 28, 2022

Sorry about the delay -- v1.4 should be out this week. I think there's one more thing to add to it. I'll reopen this for now so that it can be properly closed when v1.4 is released (now that I have a v1.4 tracking issue).

twmb reopened this Feb 28, 2022
@twmb (Owner) commented Mar 1, 2022

30b9f1f

To be merged into master soon.

@twmb (Owner) commented Mar 1, 2022

Merged!

twmb closed this as completed Mar 1, 2022