S3 bucket script #624
Conversation
…into s3_bucket_script
I tested on my Ubuntu machine, and it's also working there. I'll check with Bryan from TechOps to see if he has any security concerns with running the container like this.
Reviewed 1 of 3 files at r1, all commit messages.
Reviewable status: 1 of 3 files reviewed, 3 unresolved discussions (waiting on @benjaminking)
.devcontainer/Dockerfile
line 26 at r1 (raw file):
    curl \
    unzip
RUN apt-get update
I'm not sure it's necessary to call apt-get update again, since it's already called right before this, unless the package index needs to be refreshed after certain packages have been installed.
.devcontainer/Dockerfile
line 28 at r1 (raw file):
RUN apt-get update
RUN apt-get install --no-install-recommends -y \
    less \
I don't see these two packages used anywhere. Are they just being added as helpful developer tools? If so, I think they'd be useful, especially since I use nano quite a bit, but it would be cleaner to move them into the larger apt-get install section directly above.
.devcontainer/Dockerfile
line 54 at r1 (raw file):
CMD ["bash"] # Set up the S3 bucket RUN apt update
Is this separated out because it needs to use apt instead of apt-get, or could it be added to the longer apt-get install earlier in the Dockerfile?
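Not blocking, but to illustrate what these three comments are pointing toward, here is a hedged sketch of a single consolidated install step (as it might appear inside one Dockerfile RUN instruction). The exact package list, including whatever the S3 setup at line 54 installs, is an assumption:

```sh
# Hypothetical consolidation: one package-index refresh, one install, one cleanup.
# curl/unzip come from the existing block, less/nano are the developer tools under
# discussion, and fuse3/rclone stand in for whatever the S3 bucket setup needs.
apt-get update && \
    apt-get install --no-install-recommends -y \
        curl \
        unzip \
        less \
        nano \
        fuse3 \
        rclone && \
    rm -rf /var/lib/apt/lists/*
```

Folding everything into one layer would also sidestep the apt vs. apt-get question, since apt-get works fine for the S3 setup packages as well.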
I updated the Dockerfile based on your comments. Firstly, … Also, for reasons I don't understand, if you want to install …
Reviewed 1 of 3 files at r1, 1 of 1 files at r2, all commit messages.
Reviewable status: all files reviewed, 1 unresolved discussion (waiting on @benjaminking)
.devcontainer/Dockerfile
line 48 at r2 (raw file):
ENV SIL_NLP_CACHE_PROJECT_DIR=/root/.cache/silnlp/projects

# Set up the S3 bucket
RUN mkdir -p /silnlp
To be consistent with the readme instructions, which mount at ~/S, it's best to mount it at /root/S. I double checked in my docker container, and the ~ is equivalent to the /root folder.
I've just pushed a commit that changes the mount point to …

To register the new AppArmor profile, you need to run …

I believe that Docker will ignore the AppArmor profile on systems that don't support AppArmor (or don't have it installed), so there shouldn't be any impact on non-Linux users. The AppArmor profile is also the same as the default Docker profile, except that it allows the container to perform mounts.
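Since the exact command was stripped out of the comment above, here is a hedged sketch of the usual way to register a custom profile and point Docker at it. The profile path, profile name (docker-silnlp), and image name are placeholders, not what is actually in this PR:

```sh
# Load (or replace) the custom AppArmor profile in the kernel; path/name are hypothetical.
sudo apparmor_parser -r -W /etc/apparmor.d/docker-silnlp

# Start the container under that profile instead of Docker's default docker-default profile,
# with the device and capability typically needed for FUSE mounts inside a container.
docker run --security-opt apparmor=docker-silnlp \
    --device /dev/fuse --cap-add SYS_ADMIN \
    silnlp-dev-image
```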
Reviewed 4 of 4 files at r3, all commit messages.
Reviewable status: all files reviewed, 1 unresolved discussion (waiting on @benjaminking)
LGTM, but I will let someone who has a better understanding of these things approve it.
Reviewable status: all files reviewed, 1 unresolved discussion (waiting on @benjaminking)
@benjaminking Sorry this was in my backlog to review for so long. I've tested it and it's working for me. Now that we have an AppArmor profile, would we like to go ahead and extend this beyond dev containers to the AQuA/Cheetah servers as well? This might help avoid the checkpoint upload failures on those servers.
Reviewed 3 of 4 files at r3, 1 of 1 files at r4, all commit messages.
Reviewable status: all files reviewed, 1 unresolved discussion (waiting on @benjaminking)
That's OK. It's been on my backlog as well. I think the next step for us is to install the AppArmor profile on one of our servers, modify the Docker setup in …
I actually already installed the AppArmor profile on the AQuA and Cheetah servers about an hour ago to prepare for testing on them, using the command you listed. So it should be all set for any changes you'd like to make to clearml_connection.py.
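For anyone double-checking the servers, a quick way to confirm the profile actually loaded (the profile name here is again an assumed placeholder):

```sh
# List loaded AppArmor profiles and look for the custom one.
sudo aa-status | grep docker-silnlp
```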
Excellent. I will try to test out creating a Docker container using the new profile.
I was able to successfully create a Docker container with the new AppArmor profile and run a task with it. My next step is to try to automatically connect to the S3 bucket from inside the container.
It looks like my attempts at connecting to the S3 bucket from ClearML are failing because the … (There are workarounds I can probably use, but I wasn't sure what the status was regarding S3 and ClearML.)
@TaperChipmunk32 I think we were still supporting AWS for a little while longer, so it might be good to add the section back to rclone.conf? Also, I thought that IDX might be using their own AWS S3 bucket with SILNLP, and if that's the case it would be good to leave the support in. @benjaminking We are switching over to MinIO next week, and support for it was recently added to master, so it might also make sense to try adapting your changes to MinIO.
Good point about IDX; if they do still use AWS, we can leave AWS support in. I'll just need to update the warning message and add AWS back to rclone.conf in a separate PR.
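If the AWS section does come back, a sketch of what restoring it could look like, written as a shell snippet to stay close to the existing setup scripts. The remote name, region, and placeholder keys are all assumptions:

```sh
# Hypothetical: re-add an AWS S3 remote to rclone.conf with placeholder credentials
# that a setup script can later substitute with sed, the same way the MinIO script does.
cat >> ~/.config/rclone/rclone.conf <<'EOF'
[aws]
type = s3
provider = AWS
region = us-east-1
access_key_id = xxxxxxxxxxxxxxxx
secret_access_key = xxxxxxxxxxxxxxxx
EOF
```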
Reviewed 1 of 1 files at r4, all commit messages.
Reviewable status: all files reviewed, 1 unresolved discussion (waiting on @benjaminking)
@ddaspit I just realized something from our discussions on the …
@benjaminking I can add a commit to switch this PR over to using MinIO and B2 instead of AWS, if you would like me to.
Yes, please feel free to change it over to using MinIO and Backblaze.
Support should be in for MinIO now.
@TaperChipmunk32 That is a good point. I would love to be able to strip out all of the S3 code. We should discuss that option further. It might be the way forward for #685.
Reviewed 2 of 2 files at r5, all commit messages.
Reviewable status: all files reviewed, 1 unresolved discussion (waiting on @benjaminking)
I am looking again into alternative ways to automatically mount the bucket on the AQuA servers, which is the last major hurdle before we can merge this in.
…into s3_bucket_script
Merge in order to rebase s3_bucket_script branch
I was able to get the AQuA machines to automatically connect to the MinIO bucket on ClearML jobs. We should be able to look into merging this soon, if everyone is comfortable with the way I implemented this.
Reviewed 1 of 2 files at r5, 5 of 5 files at r7, all commit messages.
Reviewable status: all files reviewed, 2 unresolved discussions (waiting on @benjaminking)
minio_bucket_setup.sh
line 5 at r7 (raw file):
sed -i -e "s#access_key_id = x*#access_key_id = $MINIO_ACCESS_KEY#" ~/.config/rclone/rclone.conf
sed -i -e "s#secret_access_key = x*#secret_access_key = $MINIO_SECRET_KEY#" ~/.config/rclone/rclone.conf
Does this script need to have the endpoint variable as well? Also, this script looks very similar to s3_bucket_setup.sh. Do they serve different use cases?
I don't think it should need the endpoint variable. My understanding is that users of the dev container will have a value of … I had actually been intending for …
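For illustration only, here is how an endpoint substitution could sit alongside the existing key substitutions if it ever turns out to be needed. The MINIO_ENDPOINT variable and the placeholder patterns are assumptions rather than anything in this PR:

```sh
#!/bin/bash
# Replace the placeholder credentials in rclone.conf with values from the environment.
sed -i -e "s#access_key_id = x*#access_key_id = $MINIO_ACCESS_KEY#" ~/.config/rclone/rclone.conf
sed -i -e "s#secret_access_key = x*#secret_access_key = $MINIO_SECRET_KEY#" ~/.config/rclone/rclone.conf

# Hypothetical: only rewrite the endpoint when MINIO_ENDPOINT is set, so dev-container
# users keep the default endpoint already present in rclone.conf.
if [ -n "$MINIO_ENDPOINT" ]; then
    sed -i -e "s#^endpoint = .*#endpoint = $MINIO_ENDPOINT#" ~/.config/rclone/rclone.conf
fi
```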
Remove old s3_bucket_setup.sh script.
Reviewed 2 of 2 files at r8, all commit messages.
Reviewable status: all files reviewed, 2 unresolved discussions (waiting on @benjaminking)
Reviewable status: 5 of 8 files reviewed, 1 unresolved discussion (waiting on @mshannon-sil)
.devcontainer/Dockerfile
line 48 at r2 (raw file):
Previously, mshannon-sil wrote…
To be consistent with the readme instructions, which mount at ~/S, it's best to mount it at /root/S. I double checked in my docker container, and the ~ is equivalent to the /root folder.
Done. (Just trying to wrap up all of the unresolved items on Reviewable so we can merge.)
This PR modifies the previous S3 bucket script to support automatic mounting of the S3 bucket in a dev container for both WSL2-based Ubuntu systems and standalone Ubuntu systems. It splits the logic of the previous script into two portions: … and the s3_bucket_setup.sh script, which is run as a post-install command.

The contents of the S3 bucket should now be accessible at /silnlp upon creation of the dev container. The major change required to allow this to run on standalone Ubuntu systems is to run the Docker container with a different AppArmor profile. For simplicity, I configured the container to run in "unconfined" mode, but if more fine-grained security is desired (or if ClearML requires it), we should be able to create a custom AppArmor profile that we distribute in the SILNLP repo.

This has been tested on both WSL2-based Ubuntu (in Windows 11) and in standalone Linux Mint, an Ubuntu-based distro.
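As a rough illustration of the post-install step described above (the remote name, bucket name, and exact flags are placeholders and assumptions, not what the script actually uses):

```sh
# Hypothetical post-install command: mount the bucket at /silnlp in the background.
# --allow-other requires user_allow_other to be enabled in /etc/fuse.conf.
mkdir -p /silnlp
rclone mount silnlp_remote:silnlp-bucket /silnlp --daemon --allow-other --vfs-cache-mode writes
```

In "unconfined" mode, the container itself would be launched with something like --security-opt apparmor=unconfined --cap-add SYS_ADMIN --device /dev/fuse so that the FUSE mount is permitted.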