Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ENH: S3 backend #38

Open
wants to merge 1 commit into
base: v0.4.0
Choose a base branch
from

Conversation

jvkersch
Copy link
Contributor

@jvkersch jvkersch commented May 5, 2020

This is a first cut of an S3 backend for cwl-tes. It works pretty much the same way as the FTP backend: resources get uploaded to S3 and URLs in the workflow are rewritten to point to S3.

Activating the backend happens through the --remote-storage-url parameter: if this is a HTTP/S URL then it's assumed that we're talking to an S3-like system, and the URL should be of the form http(s)://<endpoint>/<bucket>. S3 credentials are retrieved from the environment.

Implementation-wise, the biggest change is in the addition of the s3.py module, which adds another subclass of the StdFsAccess class. The other changes are in the main.py module, where the biggest changes are in dispatching on the kind of remote storage URL, as well as making things reasonably polymorphic, so that the worker functions that do the uploading and parsing don't have to care about the type of storage backend.

I wanted to share this at this point to get some input about design and future directions. I've only tried this with Minio, and while I've tried to leave the original FTP interface intact, I've not yet tested this in practice. I've also paid very little attention to unit tests (but would be glad to remedy this).

Very glad to hear comments and suggestions!

@kellrott
Copy link
Contributor

kellrott commented May 5, 2020

Awesome!
We will try to get this tested and reviewed. This week is bit busy, so it might take a little time.

@jvkersch
Copy link
Contributor Author

jvkersch commented May 6, 2020

Thanks for the quick reply! I would be happy to improve on this, just let me know.

@kellrott
Copy link
Contributor

kellrott commented Jun 6, 2020

Initial testing, got a bytes to str error. Probably because I'm working Python 3.7.7

Traceback (most recent call last):
  File "./cwl-tes", line 11, in <module>
    cwl_tes.main.main(sys.argv[1:])
  File "/Users/ellrott/workspaces/cwl-tes/cwl_tes/main.py", line 192, in main
    make_fs_access = _create_s3_fs_access_factory(parsed_args)
  File "/Users/ellrott/workspaces/cwl-tes/cwl_tes/main.py", line 129, in _create_s3_fs_access_factory
    parsed_args.remote_storage_url)
  File "/Users/ellrott/workspaces/cwl-tes/cwl_tes/s3.py", line 42, in parse_s3_endpoint_url
    url = "s3:/" + parse.path
TypeError: can only concatenate str (not "bytes") to str

@kellrott
Copy link
Contributor

kellrott commented Jun 6, 2020

Also, add minio to requirements.txt

@kellrott kellrott mentioned this pull request Jun 6, 2020
@kellrott kellrott changed the base branch from master to v0.4.0 June 2, 2022 17:12
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants