-
Notifications
You must be signed in to change notification settings - Fork 1
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Update: Include sample_name
IRIDA-Next input column
#28
Conversation
A big change was updating the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks really great @sgsutcliffe . Thanks so much 😄 .
I have a few small comments and I still have to test this out through IRIDA Next, but it looks really good.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great job Steven! Thanks for the reminders and suggestions to the sample_name
functionality!! 😺 Just a few comments/questions.
workflows/staramr.nf
Outdated
ch_input = Channel.fromSamplesheet("input") | ||
.map { meta, contigs -> | ||
// Remove characters from meta.irida_id that could cause issues in processes | ||
meta.irida_id = meta.irida_id.replaceAll(/[;\\#><|]/, '_') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will changing the irida_id
(by replacing characters with '_') affect the ability to map the files and metadata back to the correct IRIDA Next sample on the platform? Specifically, I'm concerned about how this might impact the creation of the iridanext.output.json.gz
as iridanext.config relies on irida_id
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's probably not the best solution -- as you rightly suggest it could in theory cause some headaches. I wanted to avoid a chance of code injection in my bash command that uses meta.irida_id
. In the case of staramrnf
because we modify the meta.irida_id
before any of the files to be included in the iridanext.output.json.gz
it doesn't affect the iridanext.output.json.gz
. As to running on IRIDA-Next in theory it shouldn't be a problem because the it seems these characters are not used in the IRIDA-Next ID see here but I will need to confirm this 100% before I am done with this PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, echoing that we shouldn't be changing IRIDA IDs as it will cause problems if it (now or later with changes) finds it way back to IRIDA. I'd feel more comfortable putting the restrictions in at the samplesheet level (schema_input.json
).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reverted the change here 1d829b3
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks so much @sgsutcliffe for all your great work on this. I just have one requested change (and another optional change related to adding a pattern to the JSON Schema for the IRIDA ID).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks great. Thanks so much for all your work on this Steven 😄
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks great to me Steven!
Modified the template for input samplesheet.csv file to include the
sample_name
column in addition tosample
in-line with changes to IRIDA-Next update as seen with the speciesabundance pipeline. What this means is that the output files and thesample
name will be changed tosample_name
if asample_name
is called. Ifstaramrnf
is being locally then thesample_name
can be left blank.Made a few changes:
-
sample_name
special characters will be replaced with"-"
- If no
sample_name
is supplied in the columnsample
will be used- To avoid repeat values for
sample_name
allsample_name
values will be suffixed with the index of theinput
samplesheet.csv- Tests to check that the variety of different
sample_names
work with thePR checklist
nf-core lint
).nextflow run . -profile test,docker --outdir <OUTDIR>
).docs/usage.md
is updated.docs/output.md
is updated.CHANGELOG.md
is updated.README.md
is updated (including new tool citations and authors/contributors).