
Add asg refresh and info modules #425

Conversation


@danquixote danquixote commented Feb 18, 2021

SUMMARY

Adds the ec2_asg_instance_refresh module and the related *_info module. These modules are intended to be used together: one to start or cancel an EC2 Auto Scaling group (ASG) instance refresh, the other to track the subsequent progress with the returned InstanceRefreshId. The *_info module can also be used to fetch multiple pages of refresh history using the NextToken.

ISSUE TYPE
  • New Module Pull Request
COMPONENT NAME

ec2_asg_instance_refresh
ec2_asg_instance_refreshes_info

ADDITIONAL INFORMATION

Return value of ec2_asg_instance_refresh:

{
    'instance_refresh_id': 'string'
}

Return value of ec2_asg_instance_refreshes_info:

{
    'instance_refreshes': [
        {
            'instance_refresh_id': '6507a3e5-4950-4503-8978-e9f2636efc09',
            'auto_scaling_group_name': 'ansible-test-hermes-63642726-asg',
            'status': 'Cancelled',
            'status_reason': 'Cancelled due to user request.',
            'start_time': '2021-02-04T03:39:40+00:00',
            'end_time': '2021-02-04T03:41:18+00:00',
            'percentage_complete': 0,
            'instances_to_update': 1
        }
    ],
    'next_token': 'string'
}
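
For reference, a minimal boto3 sketch of the underlying Auto Scaling API flow these modules wrap; the ASG name and region below are hypothetical placeholders, not values from this PR:

import boto3

client = boto3.client('autoscaling', region_name='us-east-1')

# Start a refresh; the returned InstanceRefreshId is what ec2_asg_instance_refresh
# reports as instance_refresh_id.
refresh_id = client.start_instance_refresh(AutoScalingGroupName='my-asg')['InstanceRefreshId']

# Track progress by ID; this is the data ec2_asg_instance_refreshes_info reports.
result = client.describe_instance_refreshes(
    AutoScalingGroupName='my-asg',
    InstanceRefreshIds=[refresh_id],
)
print(result['InstanceRefreshes'][0]['Status'])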


@ansibullbot ansibullbot added community_review has_issue integration tests/integration module module needs_triage new_contributor Help guide this first time contributor new_module New module new_plugin New plugin plugins plugin (any type) tests tests labels Feb 18, 2021
@pabelanger
Contributor

recheck

@danquixote
Author

Thank you, @pabelanger, I'll definitely re-check on my end. However, upon a rebuild, most of the errors I'm seeing are for "max retries" failures on other, existing modules, e.g.:

11:03 TASK [ec2_vpc_egress_igw : test failure with non-existent VPC ID] **************
11:09 An exception occurred during task execution. To see the full traceback, use -vvv. The error was: ClientError: An error occurred (RequestLimitExceeded) when calling the DescribeEgressOnlyInternetGateways operation (reached max retries: 4): Request limit exceeded.
11:09 fatal: [testhost]: FAILED! => {"boto3_version": "1.15.18", "botocore_version": "1.18.18", "changed": false, "error": {"code": "RequestLimitExceeded", "message": "Request limit exceeded."}, "msg": "Could not get list of existing Egress-Only Internet Gateways: An error occurred (RequestLimitExceeded) when calling the DescribeEgressOnlyInternetGateways operation (reached max retries: 4): Request limit exceeded.", "resource_actions": ["ec2:DescribeEgressOnlyInternetGateways"], "response_metadata": {"http_headers": {"cache-control": "no-cache, no-store", "connection": "close", "date": "Thu, 18 Feb 2021 21:55:15 GMT", "server": "AmazonEC2", "strict-transport-security": "max-age=31536000; includeSubDomains", "transfer-encoding": "chunked", "x-amzn-requestid": "38036d5b-d398-46e5-bc51-e0ee9a2160ae"}, "http_status_code": 503, "max_attempts_reached": true, "request_id": "38036d5b-d398-46e5-bc51-e0ee9a2160ae", "retry_attempts": 4}}
11:09 ...ignoring
11:09
11:09 TASK [ec2_vpc_egress_igw : assert failure with non-existent VPC ID] ************
11:10 fatal: [testhost]: FAILED! => {
11:10     "assertion": "result.error.code == \"InvalidVpcID.NotFound\"",
11:10     "changed": false,
11:10     "evaluated_to": false,
11:10     "msg": "Assertion failed"
11:10 }
11:10

@danquixote
Author

danquixote commented Feb 19, 2021

test /rebuild_failed

@ansibullbot ansibullbot added needs_revision This PR fails CI tests or a maintainer has requested a review/revision of the PR and removed community_review labels Feb 19, 2021
@danquixote danquixote force-pushed the add_asg_refresh_and_info_modules branch from 89cd027 to 467a287 Compare February 19, 2021 05:48
@tremble
Contributor

tremble commented Feb 19, 2021

RequestLimitExceeded means that you're hitting the API limits. We have a decorator which should help with that.
#421 has an example of how to apply the decorator.

Short example:

# Request a retry-wrapped client, then pass aws_retry=True on each call:
ec2 = module.client('ec2', retry_decorator=AWSRetry.jittered_backoff(retries=10))
ec2.describe_instances(InstanceIds=['i-123456789'], aws_retry=True)

We've generally found that the default boto3 retry process is insufficient.

If you need to use pagination, there's also more information in:
https://docs.ansible.com/ansible/devel/dev_guide/platforms/aws_guidelines.html#api-throttling-rate-limiting-and-pagination
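
A minimal sketch of combining the retry decorator with manual NextToken pagination, assuming an AnsibleAWSModule-based module and the AWSRetry import path used by the collections at the time; 'my-asg' is a placeholder name:

from ansible_collections.amazon.aws.plugins.module_utils.ec2 import AWSRetry

client = module.client('autoscaling', retry_decorator=AWSRetry.jittered_backoff(retries=10))

# Collect every page of refresh history by following NextToken until it is absent.
refreshes = []
next_token = None
while True:
    params = {'AutoScalingGroupName': 'my-asg'}
    if next_token:
        params['NextToken'] = next_token
    response = client.describe_instance_refreshes(aws_retry=True, **params)
    refreshes.extend(response['InstanceRefreshes'])
    next_token = response.get('NextToken')
    if not next_token:
        break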

@danquixote danquixote force-pushed the add_asg_refresh_and_info_modules branch 2 times, most recently from b37ac83 to 1a7385a Compare February 21, 2021 07:55
Contributor

@tremble tremble left a comment


Thanks for taking the time to write these modules. Some suggestions inline.

Additionally, some of your documentation is a little terse (or possibly includes copy-and-paste artifacts). Please see
https://docs.ansible.com/ansible/latest/dev_guide/developing_modules_documenting.html#documentation-block for more information about what's expected in the documentation blocks.

@danquixote danquixote force-pushed the add_asg_refresh_and_info_modules branch from 059a6e9 to 1a7385a Compare February 23, 2021 05:07
@danquixote
Author

danquixote commented Feb 23, 2021

@tremble Thank you very much for your feedback on these two modules. Apologies, I initially only saw your comment about the 'retries' issue and only addressed that in my recent rebuilds. I'll take all your other suggestions into account as well.

I'm seeing a lot of other folks' builds failing in intermittent ways, and I wasn't sure if it's Shippable running up against their own AWS account's API limits?

Also, in running my own builds, I've noticed instances in Shippable where a simple "true/false" assertion task step would take over five minutes, or where a Docker teardown would cause a build to 'hang' for a while (in addition to the retries-related issues for existing modules).

@tremble
Contributor

tremble commented Feb 23, 2021

I'm seeing a lot of other folks' builds failing in intermittent ways, and I wasn't sure if it's Shippable running up against their own AWS account's API limits?

When the tests are run in Shippable, they run in an AWS account managed by the Ansible team.

ansible-test tries to be clever and only run tests against code that's changed, but when changes are made to module_utils or to things like the groups list, this triggers a full CI run. That in turn can trigger around 24 parallel sets of tests and starts bumping up against the AWS account's API limits.

In general we see four types of flake:

  1. API rate limits (the retry decorator helps here)
  2. Race conditions because some of Amazon's APIs will return 'success' before the change finishes propagating (there are 'waiters' in some of the modules to start dealing with this)
  3. Resource limits: a few tests don't clean up consistently, and as more tests are added the ordering changes, which can result in different total requirements. Getting these adjusted is just a case of speaking nicely to the Cloud team to get the limits raised when we spot them.
  4. Conflicts between tests: it is (or in some cases, was, when the module was written) difficult to name/tag certain resources so that the 6 copies of the same test running in parallel don't try to overwrite each other. The worst offenders for this are EIPs, but the other gotcha is hardcoding names rather than using '{{ resource_prefix }}'.

Also, in running my own builds, I've noticed instances in Shippable where a simple "true/false" assertion task step would take over five minutes, or where a Docker teardown would cause a build to 'hang' for a while (in addition to the retries-related issues for existing modules).

Yeah, that's just the downside of running in an environment we don't fully control. In a perfect world the CI nodes would have the Docker containers pre-downloaded. The Ansible Cloud team has plans to move to a Zuul instance controlled entirely by the wider Ansible team (see also the big warning at the top of the Shippable pages: Shippable is being decommissioned).

@danquixote
Author

Thank you for all the details, @tremble . Good to know some of the limitations of the current testing setup. I've been going over your code-review feedback, and am working on getting a passing build.

@danquixote danquixote force-pushed the add_asg_refresh_and_info_modules branch from dd081bb to 9076255 Compare March 3, 2021 01:34
@danquixote
Author

Hi @tremble. Thank you again for your code review and feedback. I thought your comments/fixes were pretty straightforward, so I implemented them and then tried to rebuild several times over the course of the last week. However, I'm still getting some unexpected failures, which all seem to be related to retries in tests for existing modules, or simply to some sections timing out past the 45-minute mark (e.g.: https://app.shippable.com/github/ansible-collections/community.aws/runs/1708/summary/console).

I'd still love to get these two modules into the collection, but I'm not sure what else I should do on my end to facilitate the process. I don't want to keep 'clogging up' the Shippable builds with my repeated failures, hoping one will make it under the 45-minute mark. However, I'm not sure what needs to be done in relation to these new 'asg-refresh' modules or their related tests, as all the failures seem to be coming from other, existing modules (unless I'm mistaken?).

@danquixote danquixote requested a review from tremble March 4, 2021 05:00
@danquixote danquixote force-pushed the add_asg_refresh_and_info_modules branch from b0126fc to 39d0cd9 Compare March 11, 2021 05:59
@danquixote danquixote force-pushed the add_asg_refresh_and_info_modules branch from 39d0cd9 to 2703899 Compare March 19, 2021 00:40
@danquixote
Author

@tremble My apologies, I'm just coming back to this code after some time. I think I may also have made a mistake in trying to rebuild without replying to each code change you mentioned in your 'changes requested' review; I just changed the code itself, and I'm not sure whether that was sufficient. Anyhow, I'm still getting some intermittent failures, and I wasn't sure whether it would be easier to close this PR and open a new, cleaner one. Please let me know what you think would be best.

@danquixote
Author

Hi @tremble. I see my builds are still failing, and I'm not sure if I have a regression. Also, due to my mistakes in the review process, I think it's better if I clean up my code a bit and open a new PR. Thank you again for the review; I will incorporate it into a cleaner, new PR. Hopefully that will pass the build.

@danquixote danquixote closed this Mar 19, 2021
alinabuzachis pushed a commit to alinabuzachis/community.aws that referenced this pull request May 25, 2022
Add example for amazon.aws.aws_secret with region and aws_profile parameters (ansible-collections#425)

SUMMARY

Added a new example for amazon.aws.aws_secret that includes use of the region and aws_profile parameters.
Resolves ansible-collections#416.

ISSUE TYPE
  • Docs Pull Request

COMPONENT NAME

amazon.aws.aws_secret

Reviewed-by: Abhijeet Kasurde <None>
Reviewed-by: Mark Chappell <None>
Reviewed-by: None <None>