[podmanreceiver] Add metrics and resource metadata #30232
Conversation
```yaml
container.blockio.io_service_bytes_recursive.read:
  enabled: true
  description: "Number of bytes transferred from the disk by the container"
  extended_documentation: "[More docs]i/www.kernel.org/doc/Documentation/cgroup-v1/blkio-controller.txt)."
```
It seems that the format of the link is not valid. Should it be like this? `[More docs](https://www.kernel.org/doc/Documentation/cgroup-v1/blkio-controller.txt)`.
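For reference, a minimal sketch of the corrected entry; only the link changes, the other fields are as in the diff above:

```yaml
container.blockio.io_service_bytes_recursive.read:
  enabled: true
  description: "Number of bytes transferred from the disk by the container"
  # Corrected markdown link format, per the review suggestion above.
  extended_documentation: "[More docs](https://www.kernel.org/doc/Documentation/cgroup-v1/blkio-controller.txt)."
```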
Fixed in df281be
Could you also update the
```yaml
container.memory.usage.limit:
  enabled: true
  description: "Memory limit of the container."
  unit: 1
```
I think the memory-related metrics' unit should be `By`.
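A sketch of the entry with the suggested unit, assuming the UCUM byte symbol `By` (the other fields are copied from the diff above):

```yaml
container.memory.usage.limit:
  enabled: true
  description: "Memory limit of the container."
  # "By" is the UCUM symbol for bytes; "1" (dimensionless) was the original value.
  unit: By
```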
Makes sense, fixed in df281be
Definitely, thanks for the feedback. Added in df281be
```yaml
container.blockio.io_service_bytes_recursive.write:
  enabled: true
  description: "Number of bytes transferred to the disk by the container"
  extended_documentation: "[More docs]i/www.kernel.org/doc/Documentation/cgroup-v1/blkio-controller.txt)."
```
This line should also be fixed.
Fixed in 5f3528c. Thanks
```yaml
container.cpu.usage.system:
  enabled: true
  description: "System CPU usage."
  unit: ns
```
https://opentelemetry.io/docs/specs/semconv/system/system-metrics/#metric-systemcputime

I'm not sure if the cputime unit should be `ns` or `s`. @open-telemetry/collector-approvers, should this metric use `s` as the unit?
I think the cputime metric's unit is in seconds, as this is how many OSes report it (/proc/stat). But containers are handled by cgroup controllers, which allow for better precision. I would say we should not lose precision; instead, we should add container metrics to the semantic conventions.
Your comments also make sense. I will take this discussion to Slack to get more input. @rogercoll
The docker stats receiver collects `container.cpu.usage.system` as well: https://github.com/open-telemetry/opentelemetry-collector-contrib/blob/main/receiver/dockerstatsreceiver/metadata.yaml#L63-L67, and the unit is `ns`. IMO we should also use `ns` here to stay consistent between the two.
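For illustration, the podman entry kept consistent with the docker stats receiver might look like this; the `sum` block mirrors the `Sum | Int | Cumulative | true` row quoted from the dockerstats documentation further below, and the exact mdatagen field names are an assumption, not a quote from this PR:

```yaml
container.cpu.usage.system:
  enabled: true
  description: "System CPU usage."
  # ns matches the dockerstats receiver's unit for the same metric.
  unit: ns
  sum:
    value_type: int
    monotonic: true
    aggregation_temporality: cumulative
```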
> The docker stats receiver collects `container.cpu.usage.system` as well: https://github.com/open-telemetry/opentelemetry-collector-contrib/blob/main/receiver/dockerstatsreceiver/metadata.yaml#L63-L67, and the unit is `ns`. IMO we should also use `ns` here to stay consistent between the two.

Makes sense. I have already raised this with the other approvers in Slack; if they have no objections, I will approve this PR.
@rogercoll @mackjmr
Sounds good! Does it make sense to start working on the container's semantic convention PR? (Feel free to add me to the Slack thread, @rogercoll.)
No one has responded to that for a day, and as @mackjmr said, the docker stats receiver also takes `ns` as the unit, so I think this is OK to approve.
Co-authored-by: Mackenzie <63265430+mackjmr@users.noreply.github.com>
cmd/mdatagen/validate_test.go (Outdated)
"container.cpu.utilization": {"docker_stats", "kubeletstats"}, | ||
"container.cpu.usage.system": {"docker_stats", "podman_stats"}, | ||
"container.cpu.usage.percpu": {"docker_stats", "podman_stats"}, | ||
"container.cpu.usage.total": {"docker_stats", "podman_stats"}, |
Is it okay to report the same metric with different units? As discussed above, on the Docker stats receiver we have nanoseconds:

opentelemetry-collector-contrib/receiver/dockerstatsreceiver/documentation.md, lines 41 to 47 in facd369:

### container.cpu.usage.total

Total CPU time consumed.

| Unit | Metric Type | Value Type | Aggregation Temporality | Monotonic |
| ---- | ----------- | ---------- | ----------------------- | --------- |
| ns   | Sum         | Int        | Cumulative              | true      |

whereas this PR documents the same metric in seconds:

### container.cpu.usage.total

Total CPU time consumed.

| Unit | Metric Type | Value Type | Aggregation Temporality | Monotonic |
| ---- | ----------- | ---------- | ----------------------- | --------- |
| s    | Sum         | Int        | Cumulative              | true      |

Is that okay? If seconds is the right unit, shouldn't we use it on the Docker stats receiver as well?
I don't think so; I would not use seconds precision for the sake of convenience at the expense of precision. Instead, we should endeavor to establish distinct conventions specifically tailored to containers. Should we wait for the container semantic conventions in open-telemetry/semantic-conventions#282 (nanoseconds)?
> I don't think so; I would not use seconds precision for the sake of convenience at the expense of precision. Instead, we should endeavor to establish distinct conventions specifically tailored to containers.

My objection here is to the same metric having different units across receivers; I would expect them to have the same unit (whether that is nanoseconds or seconds is something I agree we can leave to open-telemetry/semantic-conventions#282 to decide).
This PR was marked stale due to lack of activity. It will be closed in 14 days.
Can I ask that the files in conflict get fixed up? Outside of that, I don't see any outstanding issues that would block this PR.
I think we should still wait for open-telemetry/semantic-conventions#282 to be merged first.
This PR was marked stale due to lack of activity. It will be closed in 14 days.
Just pinged jsuereth on open-telemetry/semantic-conventions#282 to merge the PR.
@rogercoll could you push this PR forward, since open-telemetry/semantic-conventions#282 is merged?
After the merge of open-telemetry/semantic-conventions#282, some additional changes will still be required. Those are breaking changes to all the current metrics; should we proceed with this PR (a no-op change) to include the metadata file, and implement the semantic convention alignment in another one?
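As an illustration of the kind of breaking change involved, assuming the semantic conventions settle on seconds (as the ns -> s note below suggests), the CPU entry would change roughly like this:

```yaml
container.cpu.usage.system:
  enabled: true
  description: "System CPU usage."
  # Breaking: previously ns; s follows the merged semantic conventions.
  unit: s
```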
My preference would be to split the changes to keep change sizes small. Are you okay with that, @fatsheep9146?
Agreed. @rogercoll could you fix the conflicts, so I can push this PR to be merged ASAP?
Done. I also updated the changelog file to a "breaking" change type due to the CPU precision fix (ns -> s).
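A sketch of what that changelog entry might look like, assuming the repository's .chloggen YAML format (the file name and exact field set are assumptions):

```yaml
# .chloggen/podman-metrics-metadata.yaml (hypothetical file name)
change_type: breaking
component: podmanreceiver
note: Add metrics and resource metadata; CPU time metrics now use seconds (previously ns).
issues: [28640]
```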
If we prefer a totally no-op PR, I could roll back the a5a0099 commit and push it in a different PR. Let me know what you think.
Description:
Link to tracking Issue: #28640
Testing: Previous tests preserved.
Documentation: