Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use upstream ecs-agent types for deserializing API responses #75

Merged
merged 1 commit into from
Oct 12, 2024

Conversation

isker
Copy link
Contributor

@isker isker commented Oct 5, 2024

This project rolls its own types for deserializing ECS task metadata and container stats responses. Maintaining these types can be tedious, as the documentation of these API endpoints is underspecified, in the sense that properties included in API responses are broadly available in the documentation, but their precise types (including whether properties are optional) are sometimes not. If we roll our own types, we are stuck reverse-engineering the specifics of the ECS API responses, which is made more tedious by the fact that these can differ between EC2 and Fargate.

Good news: rolling our own types is not necessary. The ECS Agent is open source and written in Go. We can depend on it as a library, purely to get at the structs that it uses for API responses. We were already doing a similar thing for docker stats; this is just more comprehensive. (The ECS Agent project itself depends on the same docker library we were using for these stats, and includes them in its API responses.)

Switching to these ECS Agent types has revealed situations in which our types were incorrect, silently relying on implicit type conversions of JSON values in Go's JSON deserializer. One of these situations was resulting in actually invalid data being served: ECS tasks on EC2 need not specify task-level resource limits at all, such that these properties are optional on the JSON response, but our types for them were not pointers, such that we were incorrectly reporting derived metrics as zero, when they really should not exist at all.

The downsides of doing this:

  • The ECS Agent project has no detectable Go module version. I am not an expert on this, but I think it's related to a single repo containing multiple Go modules. I don't think this is a big deal, as the existing docker dependency already did not have a Go module version.
  • We have to upgrade to go 1.21, as the ECS agent project declares that it requires it in its go.mod.
  • Binary size grows, about 12MB -> 18MB on my laptop. I am surprised that simply switching to use these structs blew up binary size so much, but I don't think binary size is taken very seriously in the Go ecosystem anyway, so I don't think this is a big problem.

I think these downsides are worth it in that, going forward, we can more reliably develop metrics derived from ECS Agent API responses, because we're using types that are much more likely to be correct.

I plan to add more metrics, and improve existing ones, using these types in the future.

I have validated these changes by recording output on EC2 and Fargate before and after this change here:
https://github.com/isker/ecs-exporter-cdk/tree/master/experiments/use-official-types. The resulting diffs are as expected.

@isker isker force-pushed the use-official-types branch 2 times, most recently from 6e72692 to 6c70c24 Compare October 5, 2024 20:44
.circleci/config.yml Outdated Show resolved Hide resolved
Copy link
Contributor

@SuperQ SuperQ left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great! I'm happy that there is a better, officially supported, way of parsing the responses/types.

A minor nit on the Go version changes.

.github/workflows/golangci-lint.yml Outdated Show resolved Hide resolved
go.mod Outdated Show resolved Hide resolved
@isker isker force-pushed the use-official-types branch from 6c70c24 to 1cbfbab Compare October 11, 2024 22:02
This project rolls its own types for deserializing ECS task metadata and
container stats responses. Maintaining these types can be tedious, as
the documentation of these API endpoints is underspecified, in the sense
that properties included in API responses are broadly available in the
documentation, but their precise types (including whether properties are
optional) are sometimes not. If we roll our own types, we are stuck
reverse-engineering the specifics of the ECS API responses, which is
made more tedious by the fact that these can differ between EC2 and
Fargate.

Good news: rolling our own types is not necessary. The ECS Agent is open
source and written in Go. We can depend on it as a library, purely to
get at the structs that it uses for API responses. We were already doing
a similar thing for docker stats; this is just more comprehensive. (The
ECS Agent project itself depends on the same docker library we were
using for these stats, and includes them in its API responses.)

Switching to these ECS Agent types has revealed situations in which our
types were incorrect, silently relying on implicit type conversions of
JSON values in Go's JSON deserializer. One of these situations was
resulting in actually invalid data being served: ECS tasks on EC2 need
not specify task-level resource limits at all, such that these
properties are optional on the JSON response, but our types for them
were not pointers, such that we were incorrectly reporting derived
metrics as zero, when they really should not exist at all.

The downsides of doing this:
- The ECS Agent project has no detectable Go module version. I am not an
expert on this, but I think it's related to a single repo containing
multiple Go modules. I don't think this is a big deal, as the existing
docker dependency already did not have a Go module version.
- We have to upgrade to go 1.21, as the ECS agent project declares that
it requires it in its go.mod.
- Binary size grows, about 12MB -> 18MB on my laptop. I am surprised
that simply switching to use these structs blew up binary size so much,
but I don't think binary size is taken very seriously in the Go
ecosystem anyway, so I don't think this is a big problem.

I think these downsides are worth it in that, going forward, we can
more reliably develop metrics derived from ECS Agent API responses,
because we're using types that are much more likely to be correct.

I plan to add more metrics, and improve existing ones, using these types
in the future.

I have validated these changes by recording output on EC2 and Fargate
before and after this change here:
https://github.com/isker/ecs-exporter-cdk/tree/master/experiments/use-official-types.
The resulting diffs are as expected.

Signed-off-by: Ian Kerins <git@isk.haus>
@isker isker force-pushed the use-official-types branch from 1cbfbab to 1b6f039 Compare October 11, 2024 22:04
@isker
Copy link
Contributor Author

isker commented Oct 12, 2024

Hmm, not sure what to do about that lint failure.

@SuperQ
Copy link
Contributor

SuperQ commented Oct 12, 2024

I'm going to ignore the lint error, since this will get fixed with the next sync of the lint config.

@SuperQ SuperQ merged commit cfe0be6 into prometheus-community:main Oct 12, 2024
3 of 4 checks passed
SuperQ added a commit that referenced this pull request Oct 13, 2024
* [CHANGE] Use upstream ecs-agent types for deserializing API responses #75
* [CHANGE] Update exporter boilerplate #77
* [ENHANCEMENT] Add additional metrics #53

Signed-off-by: SuperQ <superq@gmail.com>
@SuperQ SuperQ mentioned this pull request Oct 13, 2024
SuperQ added a commit that referenced this pull request Oct 13, 2024
* [CHANGE] Use upstream ecs-agent types for deserializing API responses #75
* [CHANGE] Update exporter boilerplate #77
* [ENHANCEMENT] Add additional metrics #53

Signed-off-by: SuperQ <superq@gmail.com>
@isker isker deleted the use-official-types branch October 17, 2024 11:57
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants