-
Notifications
You must be signed in to change notification settings - Fork 3
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Include region/partition criteria for test instance selection #933
base: tests_strategy_improvement
Are you sure you want to change the base?
Conversation
f'feature combinations: {self.feature_combinations}' | ||
# mash will try to test all the possible features supported in the | ||
# available test regions for each partition | ||
for partition, test_regions in self.partitions: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we need to do some smoke testing with cn partition in particular. If this will start to cause a lot of unnecessary false positives. The networking in CN region can be troublesome which leads to false job failures.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure I follow your comment.
With the code as in this PR, all the instance features are tested in every partition, and that could be a problem, as you say, in CN region if tests fail per the unreliable behavior of the partition itself. Furthermore, a exception is raised if some feature combination is NOT possible to be tested in some partition.
Giving it a second thought, I was wondering if it would be better, to avoid too much job false failures, to try to test the features in a best effort
basis (removing this exception raise and just writing a warning log if some feature cannot be tested in some partition) for us to control what is tested or not by means of the test instance catalog config.
That way we'd control, for example, if SEV is tested in CN or not, just removing the feature as supported in the CN test instance catalog... and we'd just a warning in the log stating SEV cannot be tested in CN partition with the current config...
wdyt?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The trouble with only treating failures as warning level is that something could slip through the cracks. Technically the partitions are different reference implementations and thus there's a chance an image fails on one and no on another. Probably the best we can do is to note this concern when we deploy. That way if we notice an uptick in false failures we are aware of the reason.
mash/services/test/ec2_job.py
Outdated
@@ -84,6 +85,13 @@ def run_job(self): | |||
) | |||
return | |||
|
|||
self.partitions = get_partition_test_regions(self.test_regions) | |||
if not self.partitions: | |||
msg = 'At least one partition is required for tests.' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is really a configuration issue, i.e. the partition
key missing in the config. The message should express that this needs to be fixed in the configuration.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed.
mash/services/test/ec2_job.py
Outdated
self.cloud_architecture == 'aarch64': | ||
# Skip test aarch64 images in China and GovCloud. | ||
# There are no aarch64 based instance types available. | ||
continue |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is no longer correct, gov-cloud has M6g
and T4g
[1] we should fix this, meaning push aarch64 images to the gov-cloud regions. T4g
is also available in China [2]
[1] https://docs.aws.amazon.com/ec2/latest/instancetypes/ec2-instance-regions.html
[2] https://www.amazonaws.cn/en/ec2/instance-types/
mash/services/test/ec2_job.py
Outdated
# for the partition that can cover a single feat combination | ||
msg = ( | ||
f'No instances in the instance catalog for {partition}' | ||
f' partition can cover the any feature combination: ' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
s/any//
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed.
This PR includes the partition/region as a criteria for the selection of the instances that will be used for the tests.
The mechanism will try to test all the instance feature combinations possible that can be covered for each partition with the test instance catalog configured. If no test can be executed in some partition (with exceptions as arm64 images in
aws-cn
oraws-us-gov
partitions) an exception is raised (as there's some config issue in the instance catalog).I still left the actual test execution commented. I'll test it and include it in the following PR.
Also updated the provided configuration example with actual values of instances for each region.