Fix s3 output #73
Conversation
Fantastic fix, this will make the workflow much easier. Is it worth updating the readme so that it details how to run it like this?

```
~/nextflow run APHA-CSU/btb-seq -with-docker aphacsubot/btb-seq -r Fixs3Output --reads='s3://s3-csu-001/SB4030/M02410_5271/*_{S*_R1,S*_R2}*.fastq.gz' --outdir='s3://s3-staging-area/RichardEllis/'
```
I'm actually getting an error in
Looks like the error is coming in on lines 17/18/19. Perhaps something to do with running this inside docker rather than natively?
I think the error is due to a python/git issue and is mentioned here: gitpython-developers/GitPython#1016 (comment). I have a new branch which uses the Nextflow `workflow` property to capture the commit ID rather than the python script. There will be a PR for that soon.
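A minimal sketch of what the workflow-property approach could look like (illustrative only; `workflow.commitId` is Nextflow's built-in revision metadata, populated when the pipeline is pulled from a repository):

```nextflow
// workflow.commitId holds the git revision Nextflow checked out for this
// run (null when launched from a local working directory), so the commit
// can be captured without any GitPython call inside the container.
println "Pipeline commit: ${workflow.commitId}"
```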
Ah ok, good to have a handle on that. Perhaps we shouldn't merge this in until that bug is fixed? Probably best to only have working code in the
Use nextflow workflow property for commitID
Now added the fix for the commit ID as well. I think the readme likely needs a bigger overhaul, so I'll save those changes for a future PR.
bin/combineCsv.py (Outdated)

```diff
 date_out = date.today().strftime('%d%b%y')
 user = getpass.getuser()
 scriptpath = os.path.dirname(os.path.abspath(__file__))
 repo = git.Repo(scriptpath, search_parent_directories=True)
-commit = repo.head.object.__str__()
+#commit = repo.head.object.__str__()
```
remove the commented line?
Looks good. If it works I'm happy with it. Perhaps just remove the commented line (L19).
Good spot - I had forgotten to remove it - I don't like to delete anything until it's fully tested.
There has been an issue where, if the input data directory was set as an s3 URI, the final `CombineOutput` process would fail, meaning that the results table was never generated. This is linked to the issue I raised with nextflow here: nextflow-io/nextflow#2502 (comment). After a bit of digging and lots of testing, it turned out the issue was caused by the source data parent directory being recognised as an s3 object when collected as the parameter `params.DataDir` in line 78 of bTB-WGS_process.nf. The process couldn't be initiated with this for some reason, but could if it was a local path. I still think this is a nextflow bug, but I have fixed the issue in our pipeline by defining a variable from `params.DataDir` (turning what was perceived as a path into a string), which can then be used as an alternate input for the `CombineOutput` process.

This fixes things so that an EC2 instance can run the pipeline without separately copying input files to local storage, or the output files back to s3. We can now run on a naïve instance with a simple command (tested on my ranch machine):

```
~/nextflow run APHA-CSU/btb-seq -with-docker aphacsubot/btb-seq -r Fixs3Output --reads='s3://s3-csu-001/SB4030/M02410_5271/*_{S*_R1,S*_R2}*.fastq.gz' --outdir='s3://s3-staging-area/RichardEllis/'
```

The only requirements are to have nextflow and docker installed. Nextflow will pull the repo (I have specified the branch with `-r`) and use the docker image specified with `-with-docker`. I think this should simplify reprocessing and hopefully make batch implementation much simpler. This is now much closer to nextflow's ideal modus operandi.
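The path-to-string workaround described above can be sketched roughly as follows (DSL1-style, with illustrative names; the actual change lives in bTB-WGS_process.nf and the `combineCsv.py` invocation here is hypothetical):

```nextflow
// Interpolating the parameter into a GString turns what Nextflow
// resolved as an s3 path object back into a plain string, which the
// process accepts as an input value.
DataDirStr = "${params.DataDir}"   // plain string, not an s3 object

process CombineOutput {
    input:
        val datadir from DataDirStr   // string input initiates fine

    """
    combineCsv.py ${datadir}
    """
}
```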