Create a metrics query endpoint #2736
Comments
I think this isn't something that belongs in the Jaeger core. We already have quite a few things to do on the tracing front, and there are capable UIs out there serving as metrics dashboards. For instance, when used with Istio, Kiali can get this kind of information. Instead of having to use a metrics backend for this, we could brainstorm on how we can use our trace information to provide this kind of information. We've talked a bit in the past about trace analytics with notebooks and/or Spark, and I think this is the path we should explore.
@jpkrohling thanks for your thoughts. In fact, the proposal is to use our trace information to provide this kind of information. This is described in more detail in the linked proposal document. Sorry it wasn't clear in this issue description; I've updated it. The idea is that the OTEL collector will aggregate metrics from span data via a new spanmetrics processor (WIP) and export them, through a Prometheus exporter, to a Prometheus-compatible backend like M3 or, of course, Prometheus. This data can then be used to enrich the user experience of Jaeger UI: for example, sorting services by relevance based on error rate, latency or call count, and potentially annotating the DAG with latency or error information. IMHO, using another tool like notebooks/Spark involves a context switch instead of a seamless experience within Jaeger UI itself. BTW, the intention is to give new and even experienced users the ability to quickly dive into the most relevant traces and hit the ground running with Jaeger UI, and to make good use of the empty real estate on the opening search screen. I think that in-depth or advanced analysis of trace data belongs in the realm of notebooks and Spark, and I believe these satisfy a different group of use cases, where ML can reveal interesting insights from trace data. In my mind, however, this proposal is intended to be quite simple and opinionated towards the specific use cases that improve the user experience of Jaeger UI. It should not try to replicate powerful metrics dashboard products like Grafana or Kiali (which couples users to Istio), nor play the role of a trace analysis tool.
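To illustrate what "aggregating metrics from span data" means, here is a minimal sketch of how a spanmetrics-style processor might accumulate R.E.D. (rate, errors, duration) statistics per service and operation. This is not the OTEL collector's actual spanmetrics implementation; the Span shape, the bucket boundaries and all names below are hypothetical, chosen only for illustration.

```python
from bisect import bisect_left
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class Span:
    # Hypothetical minimal span shape; real OTLP spans carry many more fields.
    service: str
    operation: str
    duration_ms: float
    is_error: bool


# Hypothetical latency histogram upper bounds in ms (Prometheus-style "le" buckets).
BUCKETS_MS = [2, 10, 50, 250, 1000]


@dataclass
class RedStats:
    calls: int = 0
    errors: int = 0
    latency_sum_ms: float = 0.0
    # One counter per finite bucket plus a final +Inf bucket (non-cumulative here).
    bucket_counts: list = field(default_factory=lambda: [0] * (len(BUCKETS_MS) + 1))


def aggregate(spans):
    """Aggregate spans into per-(service, operation) R.E.D. statistics.

    Intended to see every span, i.e. to run before any sampling decision,
    so that the counts are not reduced by sampling.
    """
    stats = defaultdict(RedStats)
    for s in spans:
        st = stats[(s.service, s.operation)]
        st.calls += 1
        st.errors += int(s.is_error)
        st.latency_sum_ms += s.duration_ms
        # bisect_left picks the first bucket whose upper bound is >= the duration.
        st.bucket_counts[bisect_left(BUCKETS_MS, s.duration_ms)] += 1
    return dict(stats)
```

For example, two "GET /" spans on frontend taking 5 ms and 300 ms (the second one failing) yield calls=2, errors=1 and one count each in the 10 ms and 1000 ms buckets. Counters like these are what a Prometheus exporter would expose as time series for the backend to store.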
Nice proposal; sorry for not reading it properly first. I think I need some time to process it, as my initial reaction is the same: we might be going down a path that would deviate from our main purpose (tracing). If we were to derive this information from the storage that we already use, I'd be fine with it, but getting this data from a metrics backend sounds weird, especially as the source of information was already trace data in the first place. @kevinearls do you think this type of query would be possible with InfluxDB IOx?
@jpkrohling to be honest, I don't really know enough about IOx yet to say either way. |
I don't view this as a "metrics" solution, but as a way to navigate to interesting traces through aggregate views. Most of these aggregates naturally take the form of time series, hence the need for a storage backend that can store time series.
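As a small illustration of why such aggregates are naturally time series, the sketch below buckets call start times (Unix seconds) into fixed windows and counts calls per window, producing one (window start, count) point per window. The function name and the 60-second default window are illustrative choices, not part of the proposal.

```python
from collections import Counter


def calls_per_window(start_times_s, window_s=60):
    """Count calls per fixed time window.

    Returns a sorted list of (window_start_s, count) points, i.e. a simple
    call-count time series of the kind a time-series backend would store.
    """
    counts = Counter(int(t // window_s) * window_s for t in start_times_s)
    return sorted(counts.items())
```

Each point depends only on the calls inside its window, which is why a store built for time-stamped numeric points fits these views better than re-scanning raw traces on every request.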
This is actually why I asked above about IOx. If our own storage can answer this, we don't need to require people to convert traces to data points only to come back to Jaeger later. |
@jpkrohling If you take the data from the tracing storage, you are not necessarily providing an accurate view of the system state, since sampling limits the amount of trace data in the storage.
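The sampling point can be made concrete. Counting stored traces recovers the true call volume only if each trace's sampling probability p is known, in which case each stored trace stands for 1/p calls (a Horvitz-Thompson style estimate). The sketch below assumes, hypothetically, that the stored data exposes that probability per trace; with rate-limiting or adaptive samplers the effective probability depends on traffic and may not be recorded, in which case no such estimate can be made.

```python
def estimate_total_calls(sampling_probabilities):
    """Estimate the true number of calls from stored (sampled) traces.

    Each stored trace contributes 1/p, where p is the probability with which
    it was sampled. Raises ValueError if any p is unknown (None) or invalid,
    e.g. when a rate-limiting sampler did not record an effective probability.
    """
    total = 0.0
    for p in sampling_probabilities:
        if p is None or not 0 < p <= 1:
            raise ValueError(f"unknown or invalid sampling probability: {p!r}")
        total += 1.0 / p
    return total
```

With 10 stored traces each sampled at p = 0.1, the estimate is about 100 calls, whereas the raw count from storage would be 10. Metrics computed in the collector before sampling avoid this extrapolation entirely.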
Not sure about the other arguments, but you are right about the sampling. I think we'll talk about storage tomorrow, so, we can discuss more about this after that. But I think this is looking good. |
As an end user, I have been trying for some time to get aggregated statistics/metrics for the very same data presented under "Trace Statistics" for a trace. I also tried Grafana. I checked the metrics exposed under the name prometheus; they did not exist. I checked the OpenTelemetry metrics, both from the javaagent metrics exporter and those exposed by the OTel collector... no luck. In that sense, I value the motivation of @albertteoh here. I cannot argue with the points of view in the responses. Still, if this were available in the UI or in Prometheus, it would be very beneficial. When metrics are discussed, the main metrics produced by tracing itself are almost unreachable. They are there. They are so valuable. But we cannot view them. These trace/span-derived metrics are what distinguishes a commercial APM product. We can see the trace itself in Grafana, but the treasure inside is locked. And if it were available in Jaeger UI, it would be very practical for many newcomers to observability.
@jaegertracing/jaeger-maintainers and community, I wanted to check in to see whether we're okay to go ahead with this proposal, or whether there are any outstanding questions/concerns that still need addressing.
+1 |
go go go! |
+1. This is great functionality and a great addition to the Jaeger project. Doing it at the collector with tail-based sampling makes a lot of sense. The data itself can live in any (query-compatible) storage; I see this as a pluggable feature, like the dependency diagram (calculated by Spark).
@albertteoh can we close this one? |
@pavolloffay yup, I've closed it. We're tracking progress in #2954. |
Requirement - what kind of business use case are you trying to solve?
Follow-up to: #2574
To integrate operational R.E.D. (rate, errors, duration) metrics into Jaeger, sourced from the OTEL collector's spanmetrics processor, which opens up potential use cases such as:
Though not the focus of this issue, the following mockup (courtesy of @Danafrid) aims to communicate what a potential use case of an operational homepage may look like:
Problem - what in Jaeger blocks you from solving the requirement?
The ability to query a metrics backend from Jaeger UI.
Proposal - what do you suggest to solve the problem or improve the existing situation?
More details can be found in the proposal document.
Add an /api/metrics endpoint to the existing jaeger-query service, or introduce a new service if that is the preferred approach. This endpoint will relay the request to a configured PromQL-compatible metrics backend such as Prometheus or M3DB.
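To make the relay idea concrete, here is a minimal Python sketch of the translation step such an endpoint might perform: building PromQL for the R.E.D. metrics of one service and forwarding it to the backend's range-query API (GET /api/v1/query_range, part of the Prometheus HTTP API). The metric names calls_total and latency_bucket, the labels service_name and status_code, and the STATUS_CODE_ERROR value are assumptions about what a spanmetrics-style processor emits, not confirmed names; the function names are hypothetical.

```python
import json
import urllib.request
from urllib.parse import urlencode


def red_queries(service, window="5m"):
    """Build hypothetical PromQL for rate, errors and p95 duration of one service.

    Assumes spanmetrics-style series named calls_total and latency_bucket,
    labelled with service_name and status_code; these names are assumptions.
    The service name is inserted unescaped, so a real endpoint must validate it.
    """
    sel = f'service_name="{service}"'
    return {
        "rate": f"sum(rate(calls_total{{{sel}}}[{window}]))",
        "errors": f'sum(rate(calls_total{{{sel},status_code="STATUS_CODE_ERROR"}}[{window}]))',
        "p95_duration": (
            f"histogram_quantile(0.95, sum(rate(latency_bucket{{{sel}}}[{window}])) by (le))"
        ),
    }


def build_relay_url(backend, promql, start, end, step):
    """Build a Prometheus HTTP API range-query URL (GET /api/v1/query_range)."""
    params = urlencode({"query": promql, "start": start, "end": end, "step": step})
    return f"{backend.rstrip('/')}/api/v1/query_range?{params}"


def relay(backend, promql, start, end, step, timeout=10):
    """Forward the query to the metrics backend and return its decoded JSON response."""
    url = build_relay_url(backend, promql, start, end, step)
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp)
```

Relaying PromQL to the backend keeps the query service thin, since any PromQL-compatible store (Prometheus, M3) can answer the same request; the trade-off is that the query layer is tied to PromQL and to the metric naming chosen by the collector.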
We would appreciate feedback from the community on this feature proposal.