Merge pull request #487 from /issues/486
fixes #486 - 9.0.0 release
jantman authored Sep 22, 2020
2 parents a6eef26 + 93680d5 commit e11bd0a
Showing 18 changed files with 873 additions and 123 deletions.
25 changes: 25 additions & 0 deletions CHANGES.rst
@@ -1,6 +1,31 @@
Changelog
=========

.. _changelog.9_0_0:

9.0.0 (2020-09-22)
------------------

**Important:** This release requires new IAM permissions: ``sts:GetCallerIdentity`` and ``cloudwatch:GetMetricData``.

**Important:** This release includes updates for major changes to ECS limits, including the renaming of some existing limits.

* `Issue #477 <https://github.com/jantman/awslimitchecker/issues/477>`__ - EC2 instances running on Dedicated Hosts (tenancy "host") or single-tenant hardware (tenancy "dedicated") do not count towards On-Demand Instances limits. They were previously being counted towards these limits; they are now excluded from the count. Thanks to `pritam2277 <https://github.com/pritam2277>`__ for reporting this issue and providing details and test data.
* `Issue #477 <https://github.com/jantman/awslimitchecker/issues/477>`__ - For all VPC resources that support the ``owner-id`` filter, supply that filter when describing them, set to the current account ID. This will prevent shared resources from other accounts from being counted against the limits. Thanks to `pritam2277 <https://github.com/pritam2277>`__ for reporting this issue and providing details and test data.
* `Issue #475 <https://github.com/jantman/awslimitchecker/issues/475>`__ - When an Alert Provider is used, only exit non-zero if an exception is encountered. Exit zero even if there are warnings and/or criticals. Thanks to `varuzam <https://github.com/varuzam>`__ for this feature request.
* `Issue #467 <https://github.com/jantman/awslimitchecker/issues/467>`__ - Fix the Service Quotas quota name for the VPC "NAT Gateways per AZ" limit. Thanks to `xRokco <https://github.com/xRokco>`__ for reporting this issue and providing the required fix.
* `Issue #457 <https://github.com/jantman/awslimitchecker/issues/457>`__ - In the required IAM permissions, replace ``support:*`` with the specific permissions that we need.
* `Issue #463 <https://github.com/jantman/awslimitchecker/issues/463>`__ - Updates for the major changes to ECS limits `in August 2020 <https://github.com/awsdocs/amazon-ecs-developer-guide/commit/3ba9bc24b3f667557f43a49b9001fea3538311ad#diff-d98743b56c4036e0baeb5e15901d2a73>`__. Thanks to `vincentclee <https://github.com/vincentclee>`__ for reporting this issue.

  * The ``EC2 Tasks per Service (desired count)`` limit has been replaced with ``Tasks per service``, which measures the desired count of tasks of all launch types (EC2 or Fargate). The default value of this limit has increased from 1,000 to 2,000.
  * The default value of ``Clusters`` has increased from 2,000 to 10,000.
  * The default value of ``Services per Cluster`` has increased from 1,000 to 2,000.
  * The ``Fargate Tasks`` limit has been removed.
  * The ``Fargate On-Demand resource count`` limit has been added, with a default quota value of 500. This limit measures the number of ECS tasks and EKS pods running concurrently on Fargate. The current usage for this metric is obtained from CloudWatch.
  * The ``Fargate Spot resource count`` limit has been added, with a default quota value of 500. This limit measures the number of ECS tasks running concurrently on Fargate Spot. The current usage for this metric is obtained from CloudWatch.

* Add internal helper method to :py:class:`~._AwsService` to get Service Quotas usage information from CloudWatch.

.. _changelog.8_1_0:

8.1.0 (2020-09-18)

8 changes: 7 additions and 1 deletion awslimitchecker/checker.py
@@ -648,8 +648,14 @@ def get_required_iam_policy(self):
:rtype: dict
"""
required_actions = [
'cloudwatch:GetMetricData',
'servicequotas:ListServiceQuotas',
'support:*',
'support:DescribeTrustedAdvisorCheckRefreshStatuses',
'support:DescribeTrustedAdvisorCheckResult',
'support:DescribeTrustedAdvisorCheckSummaries',
'support:DescribeTrustedAdvisorChecks',
'support:RefreshTrustedAdvisorCheck',
'sts:GetCallerIdentity',
'trustedadvisor:Describe*',
'trustedadvisor:RefreshCheck'
]
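
The hunk above replaces the blanket `support:*` grant with five specific Trusted Advisor actions and adds `cloudwatch:GetMetricData` and `sts:GetCallerIdentity`. As an illustration only, the sketch below wraps that base action list into an IAM policy document. The helper name `build_policy` is hypothetical, and the single `Allow` statement on `Resource: *` is an assumption; the hunk shows only the base list, not the full output of `get_required_iam_policy()`.

```python
import json

# Base action list as it stands after this commit (copied from the hunk above).
REQUIRED_ACTIONS = [
    'cloudwatch:GetMetricData',
    'servicequotas:ListServiceQuotas',
    'support:DescribeTrustedAdvisorCheckRefreshStatuses',
    'support:DescribeTrustedAdvisorCheckResult',
    'support:DescribeTrustedAdvisorCheckSummaries',
    'support:DescribeTrustedAdvisorChecks',
    'support:RefreshTrustedAdvisorCheck',
    'sts:GetCallerIdentity',
    'trustedadvisor:Describe*',
    'trustedadvisor:RefreshCheck',
]


def build_policy(actions):
    """Hypothetical helper: wrap a list of actions in one IAM policy statement.

    Assumes a single Allow statement on all resources; a real deployment
    may want to scope or split statements differently.
    """
    return {
        'Version': '2012-10-17',
        'Statement': [{
            'Effect': 'Allow',
            'Resource': '*',
            # de-duplicate and sort so the policy diff stays stable
            'Action': sorted(set(actions)),
        }],
    }


if __name__ == '__main__':
    print(json.dumps(build_policy(REQUIRED_ACTIONS), indent=2))
```

Sorting the actions is a presentation choice so that regenerated policies compare cleanly; IAM itself does not care about action order.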

2 changes: 2 additions & 0 deletions awslimitchecker/runner.py
@@ -540,6 +540,8 @@ def console_entry_point(self):
)
else:
alerter.on_success(duration=time.time() - start_time)
# with alert provider, always exit zero
raise SystemExit(0)
raise SystemExit(res)
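
This hunk makes the runner exit 0 whenever an alert provider is configured, instead of falling through to `raise SystemExit(res)`. Per the changelog entry for Issue #475, only an exception should still produce a non-zero exit in that mode. A minimal sketch of the rule as a pure function follows; `choose_exit_code` and its arguments are hypothetical names, and exception handling is deliberately left out on the assumption that exceptions propagate on their own and cause a non-zero exit.

```python
def choose_exit_code(res, alerter_configured):
    """Hypothetical sketch of the 9.0.0 exit-code rule.

    res: the status the runner would otherwise exit with (non-zero when
         warning or critical thresholds were crossed).
    alerter_configured: whether an Alert Provider is in use.

    Exceptions are not handled here; they are assumed to propagate and
    produce a non-zero exit by themselves.
    """
    if alerter_configured:
        # warnings/criticals were already delivered to the alert provider
        return 0
    # no alert provider: surface threshold problems via the exit code
    return res
```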


101 changes: 101 additions & 0 deletions awslimitchecker/services/base.py
@@ -39,6 +39,8 @@

import abc
import logging
import boto3
from datetime import datetime, timedelta
from awslimitchecker.connectable import Connectable

logger = logging.getLogger(__name__)
@@ -90,6 +92,28 @@ def __init__(self, warning_threshold, critical_threshold,
self.limits = {}
self.limits = self.get_limits()
self._have_usage = False
self._current_account_id = None
self._cloudwatch_client = None

@property
def current_account_id(self):
"""
Return the numeric Account ID for the account that we are currently
running against.
:return: current account ID
:rtype: str
"""
if self._current_account_id is not None:
return self._current_account_id
kwargs = dict(self._boto3_connection_kwargs)
sts = boto3.client('sts', **kwargs)
logger.info(
"Connected to STS in region %s", sts._client_config.region_name
)
cid = sts.get_caller_identity()
self._current_account_id = cid['Account']
return cid['Account']

@abc.abstractmethod
def find_usage(self):
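
The new `current_account_id` property calls STS `GetCallerIdentity` once and caches the result on the instance. The changelog says this account ID feeds the `owner-id` filter for VPC resources (Issue #477), but that part of the change is not visible in this diff. The sketch below shows the memoization pattern with a fake STS client so it runs offline; `FakeSTS`, `AccountAware` and `owner_filters` are hypothetical, and the filter shape assumes the standard EC2 `Filters=[{'Name': ..., 'Values': [...]}]` format.

```python
class FakeSTS:
    """Offline stand-in for a boto3 STS client; counts calls to show caching."""

    def __init__(self, account_id):
        self.account_id = account_id
        self.calls = 0

    def get_caller_identity(self):
        self.calls += 1
        return {
            'Account': self.account_id,
            'Arn': 'arn:aws:iam::%s:user/example' % self.account_id,
        }


class AccountAware:
    """Hypothetical class mirroring the memoized current_account_id property."""

    def __init__(self, sts_client):
        self._sts = sts_client
        self._current_account_id = None

    @property
    def current_account_id(self):
        if self._current_account_id is None:
            # only the first access makes an STS call
            resp = self._sts.get_caller_identity()
            self._current_account_id = resp['Account']
        return self._current_account_id

    def owner_filters(self):
        # EC2 Describe* filter restricting results to resources this account owns
        return [{'Name': 'owner-id', 'Values': [self.current_account_id]}]
```

With a real EC2 client, the filter list would be passed as, for example, `ec2_client.describe_vpcs(Filters=svc.owner_filters())`, so VPCs shared from other accounts are not counted.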
@@ -278,3 +302,80 @@ def _update_service_quotas(self):
)
if val is not None:
lim._set_quotas_limit(val)

def _cloudwatch_connection(self):
"""
Return a connected CloudWatch client instance. ONLY to be used by
:py:meth:`_get_cloudwatch_usage_latest`.
"""
if self._cloudwatch_client is not None:
return self._cloudwatch_client
kwargs = dict(self._boto3_connection_kwargs)
if self._max_retries_config is not None:
kwargs['config'] = self._max_retries_config
self._cloudwatch_client = boto3.client('cloudwatch', **kwargs)
logger.info(
"Connected to cloudwatch in region %s",
self._cloudwatch_client._client_config.region_name
)
return self._cloudwatch_client

def _get_cloudwatch_usage_latest(
self, dimensions, metric_name='ResourceCount', period=60
):
"""
Given some metric dimensions, return the value of the latest data point
for the ``AWS/Usage`` metric specified.
:param dimensions: list of dicts; dimensions for the metric
:type dimensions: list
:param metric_name: AWS/Usage metric name to get
:type metric_name: str
:param period: metric period
:type period: int
:return: return the metric value (float or int), or None if it cannot
be retrieved
:rtype: ``float, int or None``
"""
conn = self._cloudwatch_connection()
kwargs = dict(
MetricDataQueries=[
{
'Id': 'id',
'MetricStat': {
'Metric': {
'Namespace': 'AWS/Usage',
'MetricName': metric_name,
'Dimensions': dimensions
},
'Period': period,
'Stat': 'Average'
}
}
],
StartTime=datetime.utcnow() - timedelta(hours=1),
EndTime=datetime.utcnow(),
ScanBy='TimestampDescending',
MaxDatapoints=1
)
try:
logger.debug('Querying CloudWatch GetMetricData: %s', kwargs)
resp = conn.get_metric_data(**kwargs)
except Exception as ex:
logger.error(
'Error querying CloudWatch GetMetricData for AWS/Usage %s: %s',
metric_name, ex
)
return 0
results = resp.get('MetricDataResults', [])
if len(results) < 1 or len(results[0]['Values']) < 1:
logger.warning(
'No data points found for AWS/Usage metric %s with dimensions '
'%s; using value of zero!', metric_name, dimensions
)
return 0
logger.debug(
'CloudWatch metric query returned value of %s with timestamp %s',
results[0]['Values'][0], results[0]['Timestamps'][0]
)
return results[0]['Values'][0]
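
`_get_cloudwatch_usage_latest` asks GetMetricData for at most one data point from the last hour, newest first, and treats a missing data point as zero. Note that, although its docstring mentions `None`, the helper as committed returns `0` both on API errors and on empty results. The sketch below separates request building from response parsing so each can be checked without AWS; `build_usage_query`, `latest_value` and the `now` parameter are hypothetical, and it computes the current time once as a timezone-aware UTC value (the commit calls `datetime.utcnow()` twice).

```python
from datetime import datetime, timedelta, timezone


def build_usage_query(dimensions, metric_name='ResourceCount', period=60,
                      now=None):
    """Build GetMetricData kwargs for one AWS/Usage metric over the last hour."""
    if now is None:
        now = datetime.now(timezone.utc)
    return dict(
        MetricDataQueries=[{
            'Id': 'id',
            'MetricStat': {
                'Metric': {
                    'Namespace': 'AWS/Usage',
                    'MetricName': metric_name,
                    'Dimensions': dimensions,
                },
                'Period': period,
                'Stat': 'Average',
            },
        }],
        StartTime=now - timedelta(hours=1),
        EndTime=now,
        ScanBy='TimestampDescending',  # newest data point first
        MaxDatapoints=1,
    )


def latest_value(resp):
    """Return the newest value in a GetMetricData response, or 0 if none."""
    results = resp.get('MetricDataResults', [])
    if not results or not results[0].get('Values'):
        return 0
    return results[0]['Values'][0]
```

With boto3, `boto3.client('cloudwatch').get_metric_data(**build_usage_query(dims))` would send the request, and `latest_value()` would read the response.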

12 changes: 12 additions & 0 deletions awslimitchecker/services/ec2.py
@@ -314,6 +314,12 @@ def _instance_usage(self):
logger.info("Spot instance found (%s); skipping from "
"Running On-Demand Instances count", inst.id)
continue
if inst.placement.get('Tenancy', 'default') != 'default':
logger.info(
'Skipping instance %s with Tenancy %s',
inst.id, inst.placement['Tenancy']
)
continue
if inst.state['Name'] in ['stopped', 'terminated']:
logger.debug("Ignoring instance %s in state %s", inst.id,
inst.state['Name'])
@@ -347,6 +353,12 @@ def _instance_usage_vcpu(self, ris):
logger.info("Spot instance found (%s); skipping from "
"Running On-Demand Instances count", inst.id)
continue
if inst.placement.get('Tenancy', 'default') != 'default':
logger.info(
'Skipping instance %s with Tenancy %s',
inst.id, inst.placement['Tenancy']
)
continue
if inst.state['Name'] in ['stopped', 'terminated']:
logger.debug("Ignoring instance %s in state %s", inst.id,
inst.state['Name'])
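
Both EC2 hunks add the same rule: instances whose placement tenancy is anything other than `default` (that is, `dedicated` or `host`) are skipped when counting On-Demand usage. The sketch below restates the filtering order on plain dicts shaped like `DescribeInstances` items, rather than the boto3 resource objects the commit uses. `counts_toward_on_demand` is hypothetical, and detecting Spot via `InstanceLifecycle == 'spot'` is an assumption, since the commit's own Spot check is outside the visible hunk.

```python
def counts_toward_on_demand(instance):
    """Hypothetical check: should this instance count toward On-Demand limits?

    ``instance`` is assumed to look like an EC2 DescribeInstances item
    (keys 'InstanceLifecycle', 'Placement', 'State').
    """
    # Spot instances have their own limits (assumed detection method)
    if instance.get('InstanceLifecycle') == 'spot':
        return False
    # Dedicated Hosts ('host') and single-tenant hardware ('dedicated') are excluded
    if instance.get('Placement', {}).get('Tenancy', 'default') != 'default':
        return False
    # stopped/terminated instances do not consume running-instance capacity
    if instance['State']['Name'] in ('stopped', 'terminated'):
        return False
    return True
```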

72 changes: 46 additions & 26 deletions awslimitchecker/services/ecs.py
@@ -71,16 +71,43 @@ def find_usage(self):
for lim in self.limits.values():
lim._reset_usage()
self._find_usage_clusters()
self._find_usage_fargate()
self._have_usage = True
logger.debug("Done checking usage.")

def _find_usage_fargate(self):
"""
Find the usage for Fargate, via CloudWatch.
"""
self.limits['Fargate On-Demand resource count']._add_current_usage(
self._get_cloudwatch_usage_latest(
[
{'Name': 'Type', 'Value': 'Resource'},
{'Name': 'Resource', 'Value': 'OnDemand'},
{'Name': 'Service', 'Value': 'Fargate'},
{'Name': 'Class', 'Value': 'None'},
],
),
aws_type='AWS::ECS::TaskDefinition'
)
self.limits['Fargate Spot resource count']._add_current_usage(
self._get_cloudwatch_usage_latest(
[
{'Name': 'Type', 'Value': 'Resource'},
{'Name': 'Resource', 'Value': 'Spot'},
{'Name': 'Service', 'Value': 'Fargate'},
{'Name': 'Class', 'Value': 'None'},
],
),
aws_type='AWS::ECS::TaskDefinition'
)

def _find_usage_clusters(self):
"""
Find the ECS service usage for clusters. Calls
:py:meth:`~._find_usage_one_cluster` for each cluster.
"""
count = 0
fargate_task_count = 0
paginator = self.conn.get_paginator('list_clusters')
for page in paginator.paginate():
for cluster_arn in page['clusterArns']:
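
`_find_usage_fargate` builds two nearly identical `AWS/Usage` dimension lists that differ only in the `Resource` value (`OnDemand` or `Spot`). Note that `Class` is the literal string `'None'`, not Python's `None`. A possible factoring, shown only as a sketch; the helper `fargate_usage_dimensions` and the `FARGATE_LIMIT_DIMENSIONS` mapping are hypothetical and not part of the commit.

```python
def fargate_usage_dimensions(resource):
    """Return AWS/Usage ResourceCount dimensions for a Fargate resource type.

    resource: 'OnDemand' or 'Spot', matching the two limits added in 9.0.0.
    """
    if resource not in ('OnDemand', 'Spot'):
        raise ValueError('unexpected Fargate resource: %r' % resource)
    return [
        {'Name': 'Type', 'Value': 'Resource'},
        {'Name': 'Resource', 'Value': resource},
        {'Name': 'Service', 'Value': 'Fargate'},
        {'Name': 'Class', 'Value': 'None'},  # literal string, as in the diff
    ]


# limit name -> dimensions, mirroring the two calls in _find_usage_fargate
FARGATE_LIMIT_DIMENSIONS = {
    'Fargate On-Demand resource count': fargate_usage_dimensions('OnDemand'),
    'Fargate Spot resource count': fargate_usage_dimensions('Spot'),
}
```

Looping over such a mapping would keep the limit names and their CloudWatch dimensions in one place if more Fargate usage metrics are added later.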
@@ -101,21 +128,7 @@ def _find_usage_clusters(self):
aws_type='AWS::ECS::Service',
resource_id=cluster['clusterName']
)
# Note: 'statistics' is not always present in API responses,
# even if requested. As far as I can tell, it's omitted if
# a cluster has no Fargate tasks.
for stat in cluster.get('statistics', []):
if stat['name'] != 'runningFargateTasksCount':
continue
logger.debug(
'Found %s Fargate tasks in cluster %s',
stat['value'], cluster_arn
)
fargate_task_count += int(stat['value'])
self._find_usage_one_cluster(cluster['clusterName'])
self.limits['Fargate Tasks']._add_current_usage(
fargate_task_count, aws_type='AWS::ECS::Task'
)
self.limits['Clusters']._add_current_usage(
count, aws_type='AWS::ECS::Cluster'
)
@@ -127,7 +140,7 @@ def _find_usage_one_cluster(self, cluster_name):
:param cluster_name: name of the cluster to find usage for
:type cluster_name: str
"""
tps_lim = self.limits['EC2 Tasks per Service (desired count)']
tps_lim = self.limits['Tasks per service']
paginator = self.conn.get_paginator('list_services')
for page in paginator.paginate(
cluster=cluster_name, launchType='EC2'
@@ -136,8 +149,6 @@ def _find_usage_one_cluster(self, cluster_name):
svc = self.conn.describe_services(
cluster=cluster_name, services=[svc_arn]
)['services'][0]
if svc['launchType'] != 'EC2':
continue
tps_lim._add_current_usage(
svc['desiredCount'],
aws_type='AWS::ECS::Service',
@@ -160,7 +171,7 @@ def get_limits(self):
limits['Clusters'] = AwsLimit(
'Clusters',
self,
2000,
10000,
self.warning_threshold,
self.critical_threshold,
limit_type='AWS::ECS::Cluster',
@@ -176,29 +187,38 @@ def get_limits(self):
limits['Services per Cluster'] = AwsLimit(
'Services per Cluster',
self,
1000,
2000,
self.warning_threshold,
self.critical_threshold,
limit_type='AWS::ECS::Service'
)
limits['EC2 Tasks per Service (desired count)'] = AwsLimit(
'EC2 Tasks per Service (desired count)',
limits['Tasks per service'] = AwsLimit(
'Tasks per service',
self,
1000,
2000,
self.warning_threshold,
self.critical_threshold,
limit_type='AWS::ECS::TaskDefinition',
limit_subtype='EC2'
)
limits['Fargate Tasks'] = AwsLimit(
'Fargate Tasks',
limits['Fargate On-Demand resource count'] = AwsLimit(
'Fargate On-Demand resource count',
self,
50,
500,
self.warning_threshold,
self.critical_threshold,
limit_type='AWS::ECS::TaskDefinition',
limit_subtype='Fargate'
)
limits['Fargate Spot resource count'] = AwsLimit(
'Fargate Spot resource count',
self,
500,
self.warning_threshold,
self.critical_threshold,
limit_type='AWS::ECS::TaskDefinition',
limit_subtype='FargateSpot'
)
self.limits = limits
return limits
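
Because this release renames `EC2 Tasks per Service (desired count)` to `Tasks per service` and drops `Fargate Tasks`, any limit overrides keyed by the old names would no longer match. The helper below is a hypothetical migration sketch, not part of awslimitchecker; it assumes overrides are kept as a flat dict mapping ECS limit name to value.

```python
import logging

logger = logging.getLogger(__name__)

# ECS limit name changes in 9.0.0, taken from the diff above.
RENAMED = {'EC2 Tasks per Service (desired count)': 'Tasks per service'}
REMOVED = {'Fargate Tasks'}


def migrate_ecs_overrides(overrides):
    """Return a copy of ECS limit overrides using the 9.0.0 limit names.

    overrides: dict of limit name -> value (hypothetical shape).
    Overrides for removed limits are dropped with a warning, because they
    have no one-to-one replacement.
    """
    migrated = {}
    for name, value in overrides.items():
        if name in REMOVED:
            logger.warning('Dropping override for removed ECS limit %r', name)
            continue
        migrated[RENAMED.get(name, name)] = value
    return migrated
```

Dropping `Fargate Tasks` rather than mapping it onto `Fargate On-Demand resource count` is deliberate: the new limit also counts EKS pods and uses a different default, so carrying an old value across could be misleading.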
