Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Update: Include sample_name IRIDA-Next input column #28

Merged
merged 8 commits into from
Sep 18, 2024

Conversation

sgsutcliffe
Copy link
Collaborator

@sgsutcliffe sgsutcliffe commented Sep 9, 2024

Modified the template for input samplesheet.csv file to include the sample_name column in addition to sample in-line with changes to IRIDA-Next update as seen with the speciesabundance pipeline. What this means is that the output files and the sample name will be changed to sample_name if a sample_name is called. If staramrnf is being locally then the sample_name can be left blank.

Made a few changes:
- sample_name special characters will be replaced with "-"
- If no sample_name is supplied in the column sample will be used
- To avoid repeat values for sample_name all sample_name values will be suffixed with the index of the input samplesheet.csv
- Tests to check that the variety of different sample_names work with the

PR checklist

  • This comment contains a description of changes (with reason).
  • If you've fixed a bug or added code that should be tested, add tests!
  • Make sure your code lints (nf-core lint).
  • Ensure the test suite passes (nextflow run . -profile test,docker --outdir <OUTDIR>).
  • Usage Documentation in docs/usage.md is updated.
  • Output Documentation in docs/output.md is updated.
  • CHANGELOG.md is updated.
  • README.md is updated (including new tool citations and authors/contributors).

@sgsutcliffe
Copy link
Collaborator Author

A big change was updating the assets/samplsheet.csv and tests/test_samplesheet.csv so I have changed the files to point to the local files rather than the URL in the repository. A temporary change until the PR is merged, then will update to the dev branch URL

Copy link
Member

@apetkau apetkau left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks really great @sgsutcliffe . Thanks so much 😄 .

I have a few small comments and I still have to test this out through IRIDA Next, but it looks really good.

@sgsutcliffe
Copy link
Collaborator Author

It has been tested in IRIDA-Next and appears to be working!
image
image

Copy link

@kylacochrane kylacochrane left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great job Steven! Thanks for the reminders and suggestions to the sample_name functionality!! 😺 Just a few comments/questions.

ch_input = Channel.fromSamplesheet("input")
.map { meta, contigs ->
// Remove characters from meta.irida_id that could cause issues in processes
meta.irida_id = meta.irida_id.replaceAll(/[;\\#><|]/, '_')

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will changing the irida_id (by replacing characters with '_') affect the ability to map the files and metadata back to the correct IRIDA Next sample on the platform? Specifically, I'm concerned about how this might impact the creation of the iridanext.output.json.gz as iridanext.config relies on irida_id.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's probably not the best solution -- as you rightly suggest it could in theory cause some headaches. I wanted to avoid a chance of code injection in my bash command that uses meta.irida_id. In the case of staramrnf because we modify the meta.irida_id before any of the files to be included in the iridanext.output.json.gz it doesn't affect the iridanext.output.json.gz. As to running on IRIDA-Next in theory it shouldn't be a problem because the it seems these characters are not used in the IRIDA-Next ID see here but I will need to confirm this 100% before I am done with this PR.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, echoing that we shouldn't be changing IRIDA IDs as it will cause problems if it (now or later with changes) finds it way back to IRIDA. I'd feel more comfortable putting the restrictions in at the samplesheet level (schema_input.json).

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reverted the change here 1d829b3

Copy link
Member

@apetkau apetkau left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks so much @sgsutcliffe for all your great work on this. I just have one requested change (and another optional change related to adding a pattern to the JSON Schema for the IRIDA ID).

Copy link
Member

@apetkau apetkau left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks great. Thanks so much for all your work on this Steven 😄

Copy link

@kylacochrane kylacochrane left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great to me Steven!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants