Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Added ability to poll for container stats instead of constant stream #1475

Closed

Conversation

jcbowman
Copy link
Contributor

@jcbowman jcbowman commented Jul 26, 2018

Summary

Allows for container stat polling instead of forcing constant stream. This allows for a significant decrease in CPU usage when a lot of containers are deployed.

Implementation details

There's 2 environment configuration variables that allow you to disable stream (default is to leave streaming turned on) and specify the polling interval (default is 15 seconds).

Testing

This was tested by replacing the ecs-agent container on a ECS host with over 150 containers. We've been using the polling code for the past month in both our prod and nonprod environments.

  • Builds on Linux (make release)
  • Builds on Windows (go build -out amazon-ecs-agent.exe ./agent)
  • Unit tests on Linux (make test) pass
  • Unit tests on Windows (go test -timeout=25s ./agent/...) pass
  • Integration tests on Linux (make run-integ-tests) pass
  • Integration tests on Windows (.\scripts\run-integ-tests.ps1) pass
  • Functional tests on Linux (make run-functional-tests) pass
  • Functional tests on Windows (.\scripts\run-functional-tests.ps1) pass

New tests cover the changes:

Description for the changelog

Feature - Allow for Container Stat Polling

Licensing

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

fixes #1422

@fenxiong
Copy link
Contributor

Hi @jcbowman ,
Thanks for creating this PR! We will take a look at it. Also, can you change the base branch to dev, since that's where we will merge PR into?

Thanks,
Feng

@fenxiong fenxiong requested a review from a team July 30, 2018 22:15
@jcbowman jcbowman changed the base branch from master to dev July 30, 2018 23:33
@jcbowman jcbowman force-pushed the add-polling-option-for-getting-metrics branch from 0d2a143 to 3205ef6 Compare July 31, 2018 01:46
@jcbowman
Copy link
Contributor Author

jcbowman commented Jul 31, 2018

Thanks for the guidance. I changed the base branch to dev and squashed my changes. For some reason one of the builds started failing after pointing to dev. At a first glance the failing test doesn't appear related to these changes - I'll try to take a deeper look at it soon.

@petderek
Copy link
Contributor

@jcbowman It looks like it was just a transient failure -- so no worries!

@jcbowman jcbowman force-pushed the add-polling-option-for-getting-metrics branch from 3205ef6 to 3163afa Compare July 31, 2018 16:56
@boynux
Copy link
Contributor

boynux commented Aug 7, 2018

Is there any ETA for this change to be merged and rolled out?

Copy link

@richardpen richardpen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the PR, I do have few comments.

@@ -271,6 +283,11 @@ func (cfg *Config) validateAndOverrideBounds() error {
cfg.TaskMetadataBurstRate = DefaultTaskMetadataBurstRate
}

if cfg.PollingMetricsWaitDuration < minimumPollingMetricsWaitDuration || cfg.PollingMetricsWaitDuration > maximumPollingMetricsWaitDuration {
seelog.Warnf("Invalid value for polling metrics wait duration, will be overridden with the default value: %s. Parsed value: %v, minimum value: %v, maximum value: %v.", DefaultPollingMetricsWaitDuration.String(), cfg.PollingMetricsWaitDuration, minimumPollingMetricsWaitDuration, maximumPollingMetricsWaitDuration)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: can you split this into multiple lines? Also, this was only required if PollMetrics is true.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I updated these to only run if PollMetrics is true and split this into two different checks

@jcbowman jcbowman force-pushed the add-polling-option-for-getting-metrics branch from f7728c7 to 1bd8433 Compare August 30, 2018 14:43
@jcbowman jcbowman force-pushed the add-polling-option-for-getting-metrics branch from 1bd8433 to 062d013 Compare August 30, 2018 14:46
@jcbowman
Copy link
Contributor Author

jcbowman commented Aug 30, 2018

@richardpen I think I've addressed the PR comments. I squashed my commits to hopefully make merging easier. Let me know if there is anything else I should address.

I've also deployed a version with the recent changes to one of our ECS servers - CPU is staying low and stats are showing up in CloudWatch. 👍

@adnxn
Copy link
Contributor

adnxn commented Oct 24, 2018

I've also deployed a version with the recent changes to one of our ECS servers - CPU is staying low and stats are showing up in CloudWatch.

Oh nice, this is great to hear.

@jcbowman, @boynux: We'd like to get this PR moving. Please address the branch conflicts and also unit tests appear to be failing. You can validate unit tests locally with make test target.

@yhlee-aws
Copy link
Contributor

closing this in favor of #1646, please track the progress of this change through the new PR

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants