
Add asg refresh and info modules #425

Conversation


@danquixote danquixote commented Feb 18, 2021

SUMMARY

Adds the ec2_asg_instance_refresh module and the related *_info module. These modules are intended to be used together: one to start or cancel an EC2 Auto Scaling group (ASG) instance refresh, the other to track the subsequent progress with the returned InstanceRefreshId. The *_info module can also be used to fetch multiple pages of refresh history using the NextToken.

ISSUE TYPE
  • New Module Pull Request
COMPONENT NAME

ec2_asg_instance_refresh
ec2_asg_instance_refreshes_info

ADDITIONAL INFORMATION

Return value of ec2_asg_instance_refresh:

{
    'instance_refresh_id': 'string'
}

Return value of ec2_asg_instance_refreshes_info:

{
    'instance_refreshes': [
        {
            'instance_refresh_id': '6507a3e5-4950-4503-8978-e9f2636efc09',
            'auto_scaling_group_name': 'ansible-test-hermes-63642726-asg',
            'status': 'Cancelled',
            'status_reason': 'Cancelled due to user request.',
            'start_time': '2021-02-04T03:39:40+00:00',
            'end_time': '2021-02-04T03:41:18+00:00',
            'percentage_complete': 0,
            'instances_to_update': 1
        }
    ],
    'next_token': 'string'
}
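
For reference, a minimal boto3 sketch of the underlying Auto Scaling API flow these modules wrap; the ASG name and region below are hypothetical placeholders, not values from this PR:

import boto3

client = boto3.client('autoscaling', region_name='us-east-1')

# Start a refresh; the returned InstanceRefreshId is what ec2_asg_instance_refresh
# reports as instance_refresh_id.
refresh_id = client.start_instance_refresh(AutoScalingGroupName='my-asg')['InstanceRefreshId']

# Track progress by ID; this is the data ec2_asg_instance_refreshes_info reports.
result = client.describe_instance_refreshes(
    AutoScalingGroupName='my-asg',
    InstanceRefreshIds=[refresh_id],
)
print(result['InstanceRefreshes'][0]['Status'])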


@ansibullbot ansibullbot added community_review has_issue integration tests/integration module module needs_triage new_contributor Help guide this first time contributor new_module New module new_plugin New plugin plugins plugin (any type) tests tests labels Feb 18, 2021
@pabelanger
Contributor

recheck

@danquixote
Author

Thank you, @pabelanger, I'll definitely re-check on my end. However, upon a rebuild, most of the errors I'm seeing are for "max retries" failures on other, existing modules, e.g.:

11:03 TASK [ec2_vpc_egress_igw : test failure with non-existent VPC ID] **************
11:09 An exception occurred during task execution. To see the full traceback, use -vvv. The error was: ClientError: An error occurred (RequestLimitExceeded) when calling the DescribeEgressOnlyInternetGateways operation (reached max retries: 4): Request limit exceeded.
11:09 fatal: [testhost]: FAILED! => {"boto3_version": "1.15.18", "botocore_version": "1.18.18", "changed": false, "error": {"code": "RequestLimitExceeded", "message": "Request limit exceeded."}, "msg": "Could not get list of existing Egress-Only Internet Gateways: An error occurred (RequestLimitExceeded) when calling the DescribeEgressOnlyInternetGateways operation (reached max retries: 4): Request limit exceeded.", "resource_actions": ["ec2:DescribeEgressOnlyInternetGateways"], "response_metadata": {"http_headers": {"cache-control": "no-cache, no-store", "connection": "close", "date": "Thu, 18 Feb 2021 21:55:15 GMT", "server": "AmazonEC2", "strict-transport-security": "max-age=31536000; includeSubDomains", "transfer-encoding": "chunked", "x-amzn-requestid": "38036d5b-d398-46e5-bc51-e0ee9a2160ae"}, "http_status_code": 503, "max_attempts_reached": true, "request_id": "38036d5b-d398-46e5-bc51-e0ee9a2160ae", "retry_attempts": 4}}
11:09 ...ignoring
11:09
11:09 TASK [ec2_vpc_egress_igw : assert failure with non-existent VPC ID] ************
11:10 fatal: [testhost]: FAILED! => {
11:10     "assertion": "result.error.code == \"InvalidVpcID.NotFound\"",
11:10     "changed": false,
11:10     "evaluated_to": false,
11:10     "msg": "Assertion failed"
11:10 }
11:10

@danquixote
Author

danquixote commented Feb 19, 2021

test /rebuild_failed

@ansibullbot ansibullbot added needs_revision This PR fails CI tests or a maintainer has requested a review/revision of the PR and removed community_review labels Feb 19, 2021
@danquixote danquixote force-pushed the add_asg_refresh_and_info_modules branch from 89cd027 to 467a287 Compare February 19, 2021 05:48
@tremble
Contributor

tremble commented Feb 19, 2021

RequestLimitExceeded means that you're hitting the API limits. We have a decorator which should help with that.
#421 has an example of how to apply the decorator.

Short example:

# Request a retry-wrapped client, then pass aws_retry=True on each call:
ec2 = module.client('ec2', retry_decorator=AWSRetry.jittered_backoff(retries=10))
ec2.describe_instances(InstanceIds=['i-123456789'], aws_retry=True)

We've generally found that the default boto3 retry process is insufficient.

If you need to use pagination, there's also more information in:
https://docs.ansible.com/ansible/devel/dev_guide/platforms/aws_guidelines.html#api-throttling-rate-limiting-and-pagination
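
A minimal sketch of combining the retry decorator with manual NextToken pagination, assuming an AnsibleAWSModule-based module and the AWSRetry import path used by the collections at the time; 'my-asg' is a placeholder name:

from ansible_collections.amazon.aws.plugins.module_utils.ec2 import AWSRetry

client = module.client('autoscaling', retry_decorator=AWSRetry.jittered_backoff(retries=10))

# Collect every page of refresh history by following NextToken until it is absent.
refreshes = []
next_token = None
while True:
    params = {'AutoScalingGroupName': 'my-asg'}
    if next_token:
        params['NextToken'] = next_token
    response = client.describe_instance_refreshes(aws_retry=True, **params)
    refreshes.extend(response['InstanceRefreshes'])
    next_token = response.get('NextToken')
    if not next_token:
        break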

@danquixote danquixote force-pushed the add_asg_refresh_and_info_modules branch 2 times, most recently from b37ac83 to 1a7385a Compare February 21, 2021 07:55
Contributor

@tremble tremble left a comment


Thanks for taking the time to write these modules. Some suggestions inline.

Additionally, some of your documentation is a little terse (or possibly includes copy-and-paste artifacts). Please see
https://docs.ansible.com/ansible/latest/dev_guide/developing_modules_documenting.html#documentation-block for more information about what's expected in the documentation blocks.

@danquixote danquixote force-pushed the add_asg_refresh_and_info_modules branch from 059a6e9 to 1a7385a Compare February 23, 2021 05:07
@danquixote
Author

danquixote commented Feb 23, 2021

@tremble Thank you very much for your feedback on these two modules. Apologies, I initially only saw your comment about the 'retries' issue and only addressed that in my recent rebuilds. I'll take all your other suggestions into account as well.

I'm seeing a lot of other folks' builds failing in intermittent ways, and I wasn't sure if it's Shippable running up against their own AWS account's API limits?

Also, in running my own builds, I've noticed instances in Shippable where a simple "true/false" assertion task step would take over five minutes, or where a Docker teardown would cause a build to 'hang' for a while (in addition to the retries-related issues for existing modules).

@tremble
Contributor

tremble commented Feb 23, 2021

I'm seeing a lot of other folks' builds failing in intermittent ways, and I wasn't sure if it's Shippable running up against their own AWS account's API limits?

When the tests are run in Shippable, they run in an AWS account managed by the Ansible team.

ansible-test tries to be clever and only run tests against code that's changed, but when changes are made to module_utils or to things like the groups list, this triggers a full CI run. That in turn can trigger around 24 parallel sets of tests and starts bumping up against the AWS account's API limits.

In general we see four types of flake:

  1. API rate limits (the retry decorator helps here)
  2. Race conditions because some of Amazon's APIs will return 'success' before the change finishes propagating (there are 'waiters' in some of the modules to start dealing with this)
  3. Resource limits: a few tests don't clean up consistently, and as more tests are added the ordering changes, which can result in different total requirements. Getting these adjusted is just a case of speaking nicely to the Cloud team to get the limits raised when we spot them.
  4. Conflicts between tests: it is (or in some cases, was, when the module was written) difficult to name/tag certain resources so that the 6 copies of the same test running in parallel don't try to overwrite each other. The worst offenders for this are EIPs, but the other gotcha is hardcoding names rather than using '{{ resource_prefix }}'.

Also, in running my own builds, I've noticed instances in Shippable where a simple "true/false" assertion task step would take over five minutes, or where a Docker teardown would cause a build to 'hang' for a while (in addition to the retries-related issues for existing modules).

Yeah, that's just the downside of running in an environment we don't fully control. In a perfect world the CI nodes would have the Docker containers pre-downloaded. The Ansible Cloud team has plans to move to a Zuul instance controlled entirely by the wider Ansible team (see also the big warning at the top of the Shippable pages: Shippable is being decommissioned).

@danquixote
Author

Thank you for all the details, @tremble . Good to know some of the limitations of the current testing setup. I've been going over your code-review feedback, and am working on getting a passing build.

@danquixote danquixote force-pushed the add_asg_refresh_and_info_modules branch from dd081bb to 9076255 Compare March 3, 2021 01:34
@danquixote
Author

Hi @tremble. Thank you again for your code review and feedback. I thought your comments/fixes were pretty straightforward, so I implemented them and then tried to rebuild several times over the course of the last week. However, I'm still getting some unexpected failures, which all seem to be related to retries in tests for existing modules, or simply to some sections timing out past the 45-minute mark (e.g.: https://app.shippable.com/github/ansible-collections/community.aws/runs/1708/summary/console).

I'd still love to get these two modules into the collection, but I'm not sure what else I should do on my end to facilitate the process. I don't want to keep 'clogging up' the Shippable builds with my repeated failures, hoping one will make it under the 45-minute mark. However, I'm not sure what needs to be done in relation to these new 'asg-refresh' modules or their related tests, as all the failures seem to be coming from other, existing modules (unless I'm mistaken?).

@danquixote danquixote requested a review from tremble March 4, 2021 05:00
@danquixote danquixote force-pushed the add_asg_refresh_and_info_modules branch from b0126fc to 39d0cd9 Compare March 11, 2021 05:59
@danquixote danquixote force-pushed the add_asg_refresh_and_info_modules branch from 39d0cd9 to 2703899 Compare March 19, 2021 00:40
@danquixote
Author

@tremble My apologies, I'm just coming back to this code after some time. I think I may also have made a mistake in trying to rebuild without replying to each code change you mentioned in your 'changes requested' review; I just changed the code itself, and I'm not sure whether that was sufficient. Anyhow, I'm still getting some intermittent failures, and I wasn't sure whether it would be easier to close this PR and open a new, cleaner one. Please let me know what you think would be best.

@danquixote
Author

Hi @tremble. I see my builds are still failing, and I'm not sure if I have a regression. Also, due to my mistakes in the review process, I think it's better if I clean up my code a bit and open a new PR. Thank you again for the review; I will incorporate it into a cleaner, new PR. Hopefully that will pass the build.

@danquixote danquixote closed this Mar 19, 2021
alinabuzachis pushed a commit to alinabuzachis/community.aws that referenced this pull request May 25, 2022
Add example for amazon.aws.aws_secret with region and aws_profile parameters (ansible-collections#425)

SUMMARY

Added a new example for amazon.aws.aws_secret that includes use of the region and aws_profile parameters.
Resolves ansible-collections#416.

ISSUE TYPE
  • Docs Pull Request

COMPONENT NAME

amazon.aws.aws_secret

Reviewed-by: Abhijeet Kasurde <None>
Reviewed-by: Mark Chappell <None>
Reviewed-by: None <None>