Support for Retrieving Jenkins Build Duration Metrics via OpenTelemetry Plugin #972

Closed
miraccan00 opened this issue Oct 23, 2024 · 5 comments
Labels
enhancement New feature or request

Comments

@miraccan00

Hello OpenTelemetry Development Team,

I have been working on integrating Jenkins with OpenTelemetry to collect build metrics. While I've made some progress, I've encountered challenges in retrieving certain metrics and would like to seek your assistance or guidance.

Objectives:

  • Retrieve Build Duration Metrics for Jobs:

    I need to collect the build duration metrics for individual Jenkins jobs. This includes capturing the time each build takes to complete.

  • Calculate Average Build Duration for Specific Jobs:

    I aim to compute the average build duration over time for specific jobs to analyze performance trends.

Challenges:

  • Unified Data Collection:

    I wish to obtain all the metrics that the Jenkins Prometheus plugin provides but exclusively using the OpenTelemetry plugin. My goal is to avoid using multiple collectors and centralize all data collection through OpenTelemetry.

  • Plugin Support and Roadmap:

    It appears that the current OpenTelemetry Jenkins plugin may not support some of these metrics out of the box. If that's the case, I am willing to contribute to the plugin's development. I would greatly appreciate a roadmap, guidelines, or any documentation that could assist me in extending the plugin to support these metrics.

Additional Goals:

  • Grafana Dashboard Integration:

    Ultimately, I aim to create a Grafana dashboard that visualizes all the collected Jenkins metrics in one place, leveraging the data from OpenTelemetry.

Current Implementation:

I have created a repository with my current setup and attempts to achieve these objectives:

Request:

  • Support and Guidance:

    Could you please advise on whether the OpenTelemetry Jenkins plugin currently supports these metrics? If not, what would be the recommended approach to implement this functionality?

  • Collaboration Opportunity:

    If development is needed to add this feature, I am eager to contribute. Guidance on how to proceed or whom to collaborate with would be highly appreciated.

Thank you for your time and consideration. I look forward to your response and the possibility of enhancing the Jenkins OpenTelemetry integration together.

Best regards,

@cyrille-leclerc
Contributor

cyrille-leclerc commented Oct 25, 2024

Great suggestion!
For the build metrics, please see:

Longer term you are absolutely right, the Jenkins otel plugin should provide all the metrics needed by Jenkins admins and users.

I'm on PTO at the moment, I'll follow up asap.

@christophe-kamphaus-jemmic
Contributor

The opentelemetry-plugin also supports sending build traces to a tracing backend (for example Elasticsearch or Jaeger).
These traces can be queried to calculate metrics that can be displayed on a dashboard. Metrics derived this way are also called span metrics.
This is already possible in the current version of the plugin: use the span duration, grouped by the ci.pipeline.id attribute that is set on the root span of the build.
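
As a rough illustration of the span-metrics idea, here is a minimal Python sketch that groups root spans by ci.pipeline.id and averages their durations. The span dictionaries, the start_s/end_s field names, and the pipeline names are hypothetical; a real tracing backend exposes its own query API and span schema.

```python
from collections import defaultdict
from statistics import mean


def average_duration_by_pipeline(root_spans):
    """Average the duration (in seconds) of root spans per ci.pipeline.id."""
    durations = defaultdict(list)
    for span in root_spans:
        pipeline_id = span["attributes"]["ci.pipeline.id"]
        durations[pipeline_id].append(span["end_s"] - span["start_s"])
    return {pid: mean(values) for pid, values in durations.items()}


# Hypothetical root spans, shaped as if already exported from a tracing backend.
spans = [
    {"attributes": {"ci.pipeline.id": "folder/app-build"}, "start_s": 0.0, "end_s": 120.0},
    {"attributes": {"ci.pipeline.id": "folder/app-build"}, "start_s": 10.0, "end_s": 190.0},
    {"attributes": {"ci.pipeline.id": "folder/lib-build"}, "start_s": 5.0, "end_s": 65.0},
]

averages = average_duration_by_pipeline(spans)
print(averages)
```

Because each span is kept individually, the same data can also answer per-run questions, which an aggregated metric cannot.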

If you also want duration metrics for individual stages of a pipeline, you can get them by adding withSpanAttributes to your jobs.
cf. #952 (comment), #811 (comment)

In general, it's not a good idea to have very specific metrics (i.e. specific to a single job run) because of the cardinality issue that some metric backends suffer from (e.g. Prometheus). Usually metrics are used to aggregate data (counts, histograms, …) while traces/logs consider individual requests/events.
For traces/logs, it's possible to use sampling to reduce the amount of data that needs to be processed and stored. If a sampling rate of 100% is used, then any metric calculated from the traces should be accurate.
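
To make the cardinality concern concrete, the sketch below counts how many distinct time series a set of runs would produce under two labelling schemes. The label names (pipeline, result, run_number) and the sample data are made up for illustration and are not the plugin's actual attribute names.

```python
def count_time_series(runs, label_names):
    """Count distinct label combinations, i.e. the number of time series."""
    return len({tuple(run[name] for name in label_names) for run in runs})


# Hypothetical data: 300 runs spread over 3 pipelines.
runs = [
    {"pipeline": f"job-{i % 3}", "result": "SUCCESS", "run_number": i}
    for i in range(300)
]

per_pipeline = count_time_series(runs, ["pipeline", "result"])
per_run = count_time_series(runs, ["pipeline", "result", "run_number"])
print(per_pipeline, per_run)
```

With a per-run label, the series count grows with every build, whereas the per-pipeline count stays bounded by the number of pipelines and results.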

I think #959 is a great addition to the opentelemetry-plugin. It is fine since it aggregates the individual job runs for a given pipeline and gives administrators control over which pipelines should be monitored specifically.
What it does not allow is querying the exact build duration for a specific job run.
Having metrics specific to a job run would be problematic. The prometheus-plugin has such an option which is thankfully guarded by a configuration option, but it is global and does not allow filtering which jobs it applies to:
[Image: screenshot of the prometheus-plugin configuration option]

In my experience, if you want per-run metrics you are better off querying the traces.

@kuisathaverat added the enhancement (New feature or request) label on Oct 28, 2024
@cyrille-leclerc
Contributor

Please use the ci.pipeline.run.duration{ci.pipeline.id="<<pipeline full name>>", ci.pipeline.result="<<SUCCESS, UNSTABLE, FAILURE, NOT_BUILT, ABORTED>>"} histogram metric that we have just released.
ℹ Use otel.instrumentation.jenkins.run.metric.duration.allow_list and otel.instrumentation.jenkins.run.metric.duration.deny_list to specify the pipelines for which you want to capture the run duration. All other pipelines will be aggregated in the ci.pipeline.id="#other#" time series.

See documentation https://github.com/jenkinsci/opentelemetry-plugin/blob/main/docs/monitoring-metrics.md#build-duration
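
The allow/deny behaviour described above could be sketched as follows. This is only an illustration, not the plugin's code: the function name pipeline_label is hypothetical, and treating the lists as regular expressions matched against the full pipeline name, with the deny list taking precedence, is an assumption. The linked documentation is the authority on the exact semantics.

```python
import re

OTHER = "#other#"


def pipeline_label(pipeline_name, allow_pattern, deny_pattern=None):
    """Return the ci.pipeline.id value under which a run would be recorded.

    Assumption: both patterns are regexes matched against the whole name,
    and a deny match wins over an allow match.
    """
    if deny_pattern and re.fullmatch(deny_pattern, pipeline_name):
        return OTHER
    if allow_pattern and re.fullmatch(allow_pattern, pipeline_name):
        return pipeline_name
    return OTHER


# Hypothetical pipeline names and patterns.
print(pipeline_label("team-a/build", r"team-a/.*"))
print(pipeline_label("team-b/build", r"team-a/.*"))
print(pipeline_label("team-a/sandbox", r"team-a/.*", r".*/sandbox"))
```

Collapsing unlisted pipelines into a single label keeps the number of time series proportional to the size of the allow list rather than to the number of jobs on the controller.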

I'm marking your enhancement request as solved. Please open new enhancement requests if needed.

@miraccan00
Author

Thanks for addressing my enhancement request and providing the solution. I appreciate the prompt response and detailed guidance.

@cyrille-leclerc
Contributor

You're welcome!
