Include region/partition criteria for test instance selection #933

apozsuse · 2024-12-20T16:45:02Z

This PR includes the partition/region as a criteria for the selection of the instances that will be used for the tests.

The mechanism will try to test all the instance feature combinations possible that can be covered for each partition with the test instance catalog configured. If no test can be executed in some partition (with exceptions as arm64 images in aws-cn or aws-us-gov partitions) an exception is raised (as there's some config issue in the instance catalog).

I still left the actual test execution commented. I'll test it and include it in the following PR.

Also updated the provided configuration example with actual values of instances for each region.

mash/services/test/ec2_job.py

smarlowucf · 2024-12-30T21:15:04Z

mash/services/test/ec2_job.py

-                f'feature combinations: {self.feature_combinations}'
+        # mash will try to test all the possible features supported in the
+        # available test regions for each partition
+        for partition, test_regions in self.partitions:


I think we need to do some smoke testing with cn partition in particular. If this will start to cause a lot of unnecessary false positives. The networking in CN region can be troublesome which leads to false job failures.

Not sure I follow your comment.
With the code as in this PR, all the instance features are tested in every partition, and that could be a problem, as you say, in CN region if tests fail per the unreliable behavior of the partition itself. Furthermore, a exception is raised if some feature combination is NOT possible to be tested in some partition.

Giving it a second thought, I was wondering if it would be better, to avoid too much job false failures, to try to test the features in a best effort basis (removing this exception raise and just writing a warning log if some feature cannot be tested in some partition) for us to control what is tested or not by means of the test instance catalog config.
That way we'd control, for example, if SEV is tested in CN or not, just removing the feature as supported in the CN test instance catalog... and we'd just a warning in the log stating SEV cannot be tested in CN partition with the current config...

wdyt?

The trouble with only treating failures as warning level is that something could slip through the cracks. Technically the partitions are different reference implementations and thus there's a chance an image fails on one and no on another. Probably the best we can do is to note this concern when we deploy. That way if we notice an uptick in false failures we are aware of the reason.

rjschwei · 2025-01-02T20:31:48Z

mash/services/test/ec2_job.py

@@ -84,6 +85,13 @@ def run_job(self):
            )
            return

+        self.partitions = get_partition_test_regions(self.test_regions)
+        if not self.partitions:
+            msg = 'At least one partition is required for tests.'


This is really a configuration issue, i.e. the partition key missing in the config. The message should express that this needs to be fixed in the configuration.

rjschwei · 2025-01-02T20:40:52Z

mash/services/test/ec2_job.py

+                        self.cloud_architecture == 'aarch64':
+                    # Skip test aarch64 images in China and GovCloud.
+                    # There are no aarch64 based instance types available.
+                    continue


This is no longer correct, gov-cloud has M6g and T4g [1] we should fix this, meaning push aarch64 images to the gov-cloud regions. T4g is also available in China [2]

[1] https://docs.aws.amazon.com/ec2/latest/instancetypes/ec2-instance-regions.html
[2] https://www.amazonaws.cn/en/ec2/instance-types/

rjschwei · 2025-01-02T20:41:21Z

mash/services/test/ec2_job.py

+                # for the partition that can cover a single feat combination
+                msg = (
+                    f'No instances in the instance catalog for {partition}'
+                    f' partition can cover the any feature combination: '


Include region/partition criteria for test instance selection

341c563

apozsuse requested review from rjschwei and smarlowucf December 20, 2024 16:45

smarlowucf reviewed Dec 30, 2024

View reviewed changes

rjschwei requested changes Jan 2, 2025

View reviewed changes

Address some PR comments

c938d9b

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Include region/partition criteria for test instance selection #933

Include region/partition criteria for test instance selection #933

apozsuse commented Dec 20, 2024

smarlowucf Dec 30, 2024

apozsuse Jan 14, 2025

smarlowucf Jan 14, 2025

rjschwei Jan 2, 2025

apozsuse Jan 14, 2025

rjschwei Jan 2, 2025

rjschwei Jan 2, 2025

apozsuse Jan 14, 2025

Include region/partition criteria for test instance selection #933

Are you sure you want to change the base?

Include region/partition criteria for test instance selection #933

Conversation

apozsuse commented Dec 20, 2024

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment