Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Feature Request: Support for Security Groups in FSX Lustre Settings #5818

Open
sheppard-brandon-bah opened this issue Nov 3, 2023 · 13 comments

Comments

@sheppard-brandon-bah
Copy link

sheppard-brandon-bah commented Nov 3, 2023

Required Info:

  • AWS ParallelCluster version 3.7.2

I am blocked from using the Parallel Cluster CLI in enterprise-provided AWS accounts and in most of the government-provided AWS accounts I have encountered due to the creation of the FSX Cluster Security Group and Security Group Rules that are part of the FSX Lustre Cluster deployment. Due to security restrictions in place on these types of accounts, account-level IAM users are blocked from creating their our own Security Groups and Rules. Instead, a set of standard groups/rules are provided for each account.

In order to enable use of the Parallel Cluster CLI within enterprise and government AWS environments, can we please have the option to configure existing Security Groups to be used in the FSX Lustre Cluster deployment over forcing the creation of a new group with no option to add one?

An example config snippet below...

  SharedStorage:
    MountDir: /workspace
    Name: fsx-workspace
    StorageType: FsxLustre
    FsxLustreSettings:
      StorageCapacity: 1200
      ImportedFileChunkSize: 1024
      ExportPath: s3://pcluster-training-workspace
      ImportPath: s3://pcluster-training-workspace
      SecurityGroupIds: 
         - sg-xxxxx
         - sg-xxxxx
@sheppard-brandon-bah
Copy link
Author

I will add that I have attempted to add the updates myself but there appears to be some magic behind validating the config options that is esacping me at the moment...
{ "configurationValidationErrors": [ { "level": "ERROR", "type": "ConfigSchemaValidator", "message": "[('SharedStorage', {0: {'FsxLustreSettings': {'SecurityGroupId': ['Unknown field.']}}})]" } ], "message": "Invalid cluster configuration." }
Would also take help here if that is quickest path to getting this feature in place.

@sheppard-brandon-bah
Copy link
Author

I will add that I have attempted to add the updates myself but there appears to be some magic behind validating the config options that is esacping me at the moment... { "configurationValidationErrors": [ { "level": "ERROR", "type": "ConfigSchemaValidator", "message": "[('SharedStorage', {0: {'FsxLustreSettings': {'SecurityGroupId': ['Unknown field.']}}})]" } ], "message": "Invalid cluster configuration." } Would also take help here if that is quickest path to getting this feature in place.

Figured this out, there is a conversion that happens from CamelCase to all lowercase with underscores. Moving on and will report back on further issues.

@sheppard-brandon-bah
Copy link
Author

Gotten a little further. Was not able to stop the creation of the SG even when removing the reference to the function that creates them. Not sure what I am missing but maybe someone can please help me better understand so I can finish this up.

In this image, I have hard-coded the security group passed in through the config in the fsx cloud formation template vars.
cluster_stack py_w_hardcoded_sg

But in the resulting deployment, the rendered template still has a ref to the Security Group that gets auto-created with the deployment.
cfn_tmplate_w_ref_to_sg_to_be_created

Any thoughts on how/why this is happening or what I am missing is appreciated.

@sheppard-brandon-bah
Copy link
Author

It was some kind of local caching issue in my virtual environment. After deleting and recreating, was able to get a new error that I am looking into.

@sheppard-brandon-bah
Copy link
Author

I was able to add the needed functionality. Will open a PR soon to get this added into a release if accepted.

@sheppard-brandon-bah
Copy link
Author

Reopening this issue to track the PR with code updates for FSX security groups.

@sheppard-brandon-bah
Copy link
Author

I have successfully tested this code in CLI version 3.7.2. When testing these updates against the develop code, I get this error...
image
Can you please advise as to how I can test this code change against the develop branch. Thank you!

@enrico-usai
Copy link
Contributor

HI @sheppard-brandon-bah , you're facing the "Cannot find official ParallelCluster AMI" message because we didn't release yet ParallelCluster 3.8.0 (develop) AMIs, so the CLI is not able to find them.

To test a cluster creation with develop code you first need to create a 3.8.0 ami with pcluster build-image command and then use it in the cluster configuration as CustomAmi.

Anyway, before investing more time on it, give us the time to review your patch and understand if it's possible to include it within the product. Thanks for your inputs.

@sheppard-brandon-bah
Copy link
Author

I expected that to be the response and was hoping you all would take a look without me needing to go down that path, so thank you for doing just that and look forward to hearing back. :)

@hanwen-pcluste
Copy link
Contributor

Hello Brandon,

While we keep looking at the patch, an easy way to unblock you is to use external FSx file systems instead of Pcluster managed FSx file systems. You can create a FSx file system before creating the cluster. Then use FileSystemId in the cluster configuration file.

Thank you,
Hanwen

@sheppard-brandon-bah
Copy link
Author

Hello Hanwen, I really appreciate the suggested workaround. I will keep that in my back pocket in case it is needed at some point.

I don't think I was very clear, so my fault, but to be clear, we are not blocked currently. We have included the patched code in the source for 3.7.2, which has AMIs available for us to use. The customized PC CLI is working as expected and we are able to spin up HPC clusters with FSX filesystems using existing SGs for the filesytem, on demand as needed.

But still, would be nice to see the patch merged so we do not have to maintain a custom fork of the repo to get the functionality we need from Parallel Cluster CLI.

Thank you!

@hanwen-pcluste
Copy link
Contributor

Understood! In our team, any changes to CLI parameters have to go through design review. So it will take some time before we include your PR.

Thank you for the contribution,
Hanwen

@sheppard-brandon-bah
Copy link
Author

Appreciate the heads up and happy to chip in where I can. :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants