[podmanreceiver] Add metrics and resource metadata #30232

Merged (24 commits) on Apr 9, 2024

@rogercoll rogercoll commented Dec 28, 2023

Description:

  • Adds a "metadata.yaml" file to autogenerate metrics and resources.
  • [Update: not done in this PR] Fixes invalid network metrics: "rx -> input" and "tx -> output"

Link to tracking Issue: #28640

Testing: Previous tests preserved.

Documentation:

@rogercoll rogercoll requested a review from dmitryax as a code owner December 28, 2023 16:48
@github-actions github-actions bot added the cmd/mdatagen mdatagen command label Dec 28, 2023
container.blockio.io_service_bytes_recursive.read:
  enabled: true
  description: "Number of bytes transferred from the disk by the container"
  extended_documentation: "[More docs]i/www.kernel.org/doc/Documentation/cgroup-v1/blkio-controller.txt)."
Contributor

@fatsheep9146 fatsheep9146 Dec 29, 2023

It seems the link format is not valid. Should it be like this? [More docs](https://www.kernel.org/doc/Documentation/cgroup-v1/blkio-controller.txt).

Contributor Author

Fixed in df281be

@fatsheep9146
Contributor

Could you also update the metrics section in README.md to point users to documentation.md for the metrics supported by the podman receiver?

container.memory.usage.limit:
  enabled: true
  description: "Memory limit of the container."
  unit: 1
Contributor

I think the unit for the memory-related metrics should be By.

Contributor Author

Makes sense, fixed in df281be

@rogercoll
Contributor Author

> Could you also update the metrics section in README.md to point users to documentation.md for the metrics supported by the podman receiver?

Definitely, thanks for the feedback. Added in df281be

container.blockio.io_service_bytes_recursive.write:
  enabled: true
  description: "Number of bytes transferred to the disk by the container"
  extended_documentation: "[More docs]i/www.kernel.org/doc/Documentation/cgroup-v1/blkio-controller.txt)."
Contributor

This line should also be fixed

Contributor Author

Fixed in 5f3528c. Thanks

container.cpu.usage.system:
  enabled: true
  description: "System CPU usage."
  unit: ns
Contributor

https://opentelemetry.io/docs/specs/semconv/system/system-metrics/#metric-systemcputime
I'm not sure whether the CPU time unit should be ns or s. @open-telemetry/collector-approvers, should this metric use s as its unit?

Contributor Author

I think the cputime metric's unit is seconds because that is how many OSes report it (/proc/stat). Containers, however, are handled by cgroup controllers, which allow for better precision. I would rather not lose precision and instead add container metrics to the semantic conventions.
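
To make the precision concern concrete, here is a minimal Go sketch. It assumes a cumulative CPU counter expressed in nanoseconds and an integer-valued metric; the function names are hypothetical and are not part of the podman receiver.

```go
package main

import (
	"fmt"
	"time"
)

// nanosToWholeSeconds converts a cumulative CPU time in nanoseconds to an
// integer number of seconds, discarding any sub-second remainder.
// Hypothetical helper, for illustration only.
func nanosToWholeSeconds(ns uint64) uint64 {
	return ns / uint64(time.Second)
}

// nanosToFloatSeconds converts the same counter to fractional seconds,
// which keeps the sub-second part (within float64 precision).
// Hypothetical helper, for illustration only.
func nanosToFloatSeconds(ns uint64) float64 {
	return float64(ns) / float64(time.Second)
}

func main() {
	// Assumed sample value: 1.5 s of CPU time as a nanosecond counter.
	const sample uint64 = 1_500_000_000

	fmt.Println(nanosToWholeSeconds(sample)) // integer seconds: the half second is lost
	fmt.Println(nanosToFloatSeconds(sample)) // fractional seconds: the half second is kept
}
```

An integer metric in seconds cannot represent the sub-second part of the counter, whereas a nanosecond integer or a floating-point seconds value can.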

Contributor

Your comments also make sense. I will take this discussion to Slack to get more input. @rogercoll

Member

The docker stats receiver collects container.cpu.usage.system as well: https://github.com/open-telemetry/opentelemetry-collector-contrib/blob/main/receiver/dockerstatsreceiver/metadata.yaml#L63-L67, and the unit is ns. IMO we should also use ns here to stay consistent between the two.

Contributor

> The docker stats receiver collects container.cpu.usage.system as well: https://github.com/open-telemetry/opentelemetry-collector-contrib/blob/main/receiver/dockerstatsreceiver/metadata.yaml#L63-L67, and the unit is ns. IMO we should also use ns here to stay consistent between the two.

Makes sense. I have already raised this with the other approvers in Slack; if they have no objection, I will approve this PR.
@rogercoll @mackjmr

Contributor Author

Sounds good! Does it make sense to start working on the container semantic convention PR? (Feel free to add me to the Slack thread, @rogercoll.)

Contributor

No one has responded for a day. And as @mackjmr said, the docker stats receiver also uses ns as the unit, so I think this is OK to approve.

rogercoll and others added 2 commits January 3, 2024 17:21
Co-authored-by: Mackenzie <63265430+mackjmr@users.noreply.github.com>
"container.cpu.utilization": {"docker_stats", "kubeletstats"},
"container.cpu.usage.system": {"docker_stats", "podman_stats"},
"container.cpu.usage.percpu": {"docker_stats", "podman_stats"},
"container.cpu.usage.total": {"docker_stats", "podman_stats"},
Member

@mx-psi mx-psi Jan 29, 2024

Is it okay to report the same metric with different units?

As discussed above, on the Docker stats receiver we have nanoseconds:

### container.cpu.usage.total

Total CPU time consumed.

| Unit | Metric Type | Value Type | Aggregation Temporality | Monotonic |
| ---- | ----------- | ---------- | ----------------------- | --------- |
| ns | Sum | Int | Cumulative | true |

while here we use seconds:

### container.cpu.usage.total

Total CPU time consumed.

| Unit | Metric Type | Value Type | Aggregation Temporality | Monotonic |
| ---- | ----------- | ---------- | ----------------------- | --------- |
| s | Sum | Int | Cumulative | true |

Is that okay? If seconds is the right unit, shouldn't we use it on the Docker stats receiver as well?
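
The mismatch described in this comment can be detected mechanically. The following Go sketch is only an illustration under assumptions: the `unitsByReceiver` table and the `inconsistentMetrics` function are hypothetical and are not part of the collector's tooling. The sample entry copies the two units quoted in the tables above.

```go
package main

import (
	"fmt"
	"sort"
)

// unitsByReceiver maps metric name -> receiver name -> declared unit.
// Hypothetical data structure; the entry mirrors the two tables quoted
// in the review comment.
var unitsByReceiver = map[string]map[string]string{
	"container.cpu.usage.total": {
		"docker_stats": "ns",
		"podman_stats": "s",
	},
}

// inconsistentMetrics returns, in sorted order, the names of metrics whose
// declared unit differs between receivers.
func inconsistentMetrics(units map[string]map[string]string) []string {
	var out []string
	for metric, byReceiver := range units {
		seen := map[string]struct{}{}
		for _, unit := range byReceiver {
			seen[unit] = struct{}{}
		}
		// More than one distinct unit means the receivers disagree.
		if len(seen) > 1 {
			out = append(out, metric)
		}
	}
	sort.Strings(out)
	return out
}

func main() {
	fmt.Println(inconsistentMetrics(unitsByReceiver))
}
```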

Contributor Author

I don't think so; I would not use seconds for the sake of convenience at the expense of precision. Instead, we should endeavor to establish distinct conventions specifically tailored to containers. Should we wait for the container semantic convention in open-telemetry/semantic-conventions#282 (nanoseconds)?

Member

> I don't think so; I would not use seconds for the sake of convenience at the expense of precision. Instead, we should endeavor to establish distinct conventions specifically tailored to containers.

My objection here is to the use of different units for the same metric across the two receivers; I would expect them to have the same unit (whether that is nanoseconds or seconds is something I agree we can leave to open-telemetry/semantic-conventions#282 to decide).

Contributor

This PR was marked stale due to lack of activity. It will be closed in 14 days.

@MovieStoreGuy
Contributor

Could the files in conflict be fixed up? Outside of that, I don't see any outstanding issues that would block this PR.

@fatsheep9146
Contributor

fatsheep9146 commented Mar 4, 2024

> Could the files in conflict be fixed up? Outside of that, I don't see any outstanding issues that would block this PR.

I think we should still wait for open-telemetry/semantic-conventions#282 to be merged first.

@MovieStoreGuy

Contributor

This PR was marked stale due to lack of activity. It will be closed in 14 days.

@github-actions github-actions bot added the Stale label Mar 18, 2024
@fatsheep9146
Contributor

I just pinged jsuereth on open-telemetry/semantic-conventions#282 to merge the PR.

@fatsheep9146
Contributor

@rogercoll, could you move this PR forward now that open-telemetry/semantic-conventions#282 is merged?

@rogercoll
Contributor Author

rogercoll commented Mar 28, 2024

After the merge of open-telemetry/semantic-conventions#282, some additional changes will still be required:

  • IO: From container.blockio.io_service_bytes_recursive to container.disk.io
  • Network: From container.network.io.usage to container.network.io
  • Memory:
    • From container.memory.usage.total to container.memory.usage
    • Add container.memory.limit to semantic conventions
    • Add container.memory.utilization to semantic conventions
  • CPU:
    • From container.cpu.usage.system and container.cpu.usage.total to container.cpu.time
    • Add container.cpu.logical_number attribute to semantic conventions.
    • Add container.cpu.utilization to semantic conventions.

Those are breaking changes to all the current metrics. Should we proceed with this PR (a no-op change) to include the metadata file, and implement the semantic convention alignment in a separate one?
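
The renames listed in this comment can be summarized as a lookup table. The Go sketch below is illustrative only: `proposedRenames` and `renamed` are hypothetical names, the entries are copied from the list above, and nothing here reflects how a follow-up PR actually implements the alignment.

```go
package main

import "fmt"

// proposedRenames maps current metric names (or name prefixes, as written in
// the comment) to the semantic-convention names proposed there.
// Hypothetical table, for illustration only.
var proposedRenames = map[string]string{
	"container.blockio.io_service_bytes_recursive": "container.disk.io",
	"container.network.io.usage":                   "container.network.io",
	"container.memory.usage.total":                 "container.memory.usage",
	"container.cpu.usage.system":                   "container.cpu.time",
	"container.cpu.usage.total":                    "container.cpu.time",
}

// renamed returns the proposed name for a metric, or the name unchanged when
// the list does not mention it.
func renamed(name string) string {
	if n, ok := proposedRenames[name]; ok {
		return n
	}
	return name
}

func main() {
	fmt.Println(renamed("container.cpu.usage.total"))
	fmt.Println(renamed("container.memory.usage.limit"))
}
```

Note that two current CPU metrics collapse into the single name container.cpu.time, so the table cannot be inverted, which is one reason these changes are breaking.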

@MovieStoreGuy
Contributor

My preference would be to split the changes to keep change sizes small. Are you okay with that, @fatsheep9146?

@fatsheep9146
Contributor

> My preference would be to split the changes to keep change sizes small. Are you okay with that, @fatsheep9146?

I agree. @rogercoll, could you fix the conflicts so I can push this PR to be merged ASAP?

@rogercoll
Contributor Author

> I agree. @rogercoll, could you fix the conflicts so I can push this PR to be merged ASAP?

Done. I also updated the changelog file to a "breaking" change type due to the CPU precision fix (ns -> s).

@rogercoll
Contributor Author

If we prefer a fully no-op PR, I could roll back commit a5a0099 and push it in a separate PR. Let me know what you think.

@MovieStoreGuy MovieStoreGuy merged commit fe59abb into open-telemetry:main Apr 9, 2024
169 of 170 checks passed
@github-actions github-actions bot added this to the next release milestone Apr 9, 2024
@rogercoll rogercoll deleted the add_podman_metadata branch April 10, 2024 09:06