Galaxy wrapper for MFAssignR::FindRecalSeries tool #581

Merged
merged 8 commits into from
Sep 13, 2024
Changes from 6 commits
40 changes: 32 additions & 8 deletions tools/mfassignr/help.xml
@@ -32,9 +32,10 @@ The recommended workflow how to run the MFAssignR package is as follows:
(3) Use IsoFiltR() to identify potential 13C and 34S isotope masses.
(4) Using the S/N threshold, and the two data frames output from IsoFiltR(), run MFAssignCHO() to assign MF with C, H, and O to assess the mass accuracy.
(5) Use RecalList() to generate a list of the potential recalibrant series.
(6) After choosing recalibrant series, use Recal() to recalibrate the mass lists.
(7) Assign MF to the recalibrated mass list using MFAssign().
(8) Check the output plots from MFAssign() to evaluate the quality of the assignments.
(6) Choose the most suitable recalibrant series using FindRecalSeries().
(7) After choosing recalibrant series, use Recal() to recalibrate the mass lists.
(8) Assign MF to the recalibrated mass list using MFAssign().
(9) Check the output plots from MFAssign() to evaluate the quality of the assignments.

For detailed documentation on the individual steps please see the individual tool wrappers.
</token>
@@ -49,8 +50,8 @@ KMDnoise is a Kendrick Mass Defect (KMD) approach for the noise estimation. It s

Output:

- noise estimate - (this noise level can then be multiplied by the user chosen value (3, 6, 10) in order to set the signal to noise cut for formula assignment.)
- KMD plot - bounds of the noise estimation area are highlighted in red
- noise estimate - this noise level can then be multiplied by the user chosen value (3, 6, 10) in order to set the signal to noise cut for formula assignment.
- KMD plot - bounds of the noise estimation area are highlighted in red.
</token>

<token name="@HISTNOISE_HELP@">
@@ -64,7 +65,7 @@ HistNoise function creates a histogram using natural log of the intensity, which
Output:

- noise estimate - this noise level can then be multiplied by the user chosen value in order to set the signal to noise cut for formula assignment
- Histogram - shows where the cut is being applied123
- Histogram - shows where the cut is being applied

</token>
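Both noise tools return a level that is meant to be multiplied by a user-chosen factor (3, 6 or 10) to set the signal-to-noise cut for formula assignment. A minimal Python sketch of that arithmetic follows. The `apply_sn_cut` helper and the (mass, intensity) tuple format are hypothetical and not part of MFAssignR. Keeping a peak that sits exactly at the cut is also an assumption.

```python
def apply_sn_cut(peaks, noise, multiplier):
    """Keep peaks whose intensity reaches noise * multiplier.

    peaks: list of (mass, intensity) tuples (hypothetical input format).
    noise: noise estimate, e.g. from KMDNoise or HistNoise.
    multiplier: user-chosen S/N factor such as 3, 6 or 10.
    Assumption: a peak exactly at the cut is kept (>=).
    """
    cut = noise * multiplier  # intensity threshold corresponding to the S/N cut
    return [(mass, intensity) for mass, intensity in peaks if intensity >= cut]


if __name__ == "__main__":
    example_peaks = [(150.0552, 2.0e5), (200.1023, 9.0e5), (250.0877, 4.5e6)]
    kept = apply_sn_cut(example_peaks, noise=1.0e5, multiplier=6)
    print(kept)  # → [(200.1023, 900000.0), (250.0877, 4500000.0)]
```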

@@ -118,7 +119,7 @@ Output:
MFAssignR - RecalList
=============================

This tool is the fifth step of the MFAssignR workflow (MFAssignCHO -> RecalList -> Recal)
This tool is the fifth step of the MFAssignR workflow (MFAssignCHO -> RecalList -> FindRecalSeries)

RecalList() function identifies the homologous series that could be used for recalibration. Its input is the output of the MFAssign() or MFAssignCHO() function. It returns a dataframe that contains the CH2 homologous series with more than 3 members.

@@ -127,11 +128,34 @@ Output:
- Dataframe that contains the CH2 homologous series that contain more than 3 members.
</token>
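RecalList() keeps CH2 homologous series with more than 3 members. The Python sketch below illustrates only the grouping idea and is not MFAssignR's implementation. It keys CcHhOo formulas by oxygen count and by twice the double bond equivalent (2C - H + 2). Both values stay constant when a CH2 unit is added. Groups with more than 3 members are kept. The `ch2_series` helper and the (C, H, O) tuple input are hypothetical.

```python
from collections import defaultdict


def ch2_series(formulas, min_members=4):
    """Group CcHhOo formulas into candidate CH2 homologous series.

    formulas: iterable of (C, H, O) integer tuples (hypothetical input format).
    Adding CH2 changes neither the O count nor the DBE, so (O, 2*DBE) is used
    as the series key. min_members=4 means "more than 3 members".
    Note: this does not check that the members are consecutive.
    """
    groups = defaultdict(list)
    for c, h, o in formulas:
        dbe2 = 2 * c - h + 2  # twice the DBE of a CHO formula, kept as an integer
        groups[(o, dbe2)].append((c, h, o))
    return {
        key: sorted(members)
        for key, members in groups.items()
        if len(members) >= min_members
    }


if __name__ == "__main__":
    example = [(10, 12, 4), (11, 14, 4), (12, 16, 4), (13, 18, 4), (10, 14, 5)]
    for key, members in ch2_series(example).items():
        print(key, members)
    # → (4, 10) [(10, 12, 4), (11, 14, 4), (12, 16, 4), (13, 18, 4)]
```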

<token name="@FINDRECALSERIES_HELP@">
MFAssignR - FindRecalSeries
=============================

This tool is the sixth step of the MFAssignR workflow (RecalList -> FindRecalSeries -> Recal)

This function takes as input the CH2 homologous recalibration series provided by the RecalList function and tries to find the most suitable combination of series for recalibration based on the following criteria:

(1) Series should cover the full mass spectral range,
(2) Series should be optimally long and, combined, should have a “Tall Peak” at least every 100 m/z,
(3) Abundance score: the higher, the better,
(4) Peak score: the closer to 0, the better,
(5) Peak Distance: the closer to 1, the better,
(6) Series Score: the closer to this value, the better.

Combinations of series (5 series per combination by default) are assembled, scores are computed for the other metrics (for peak proximity and peak
distance, an inverted score is computed) and these scores are summed. Finally, either the series from the best combination or the top 10 unique series with the highest scores are returned.

Output:

- Dataframe of the n (combination size) or 10 most suitable recalibrant series.
</token>
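The combination search described in the FindRecalSeries help can be sketched in Python as follows. This is a simplified illustration, not the FindRecalSeries() code, and it rests on three assumptions. Each hypothetical series record carries an m/z span and a single precomputed score. Coverage is the percentage of [global_min, global_max] covered by the union of the spans. The Tall Peak and abundance filters are not modelled.

```python
from itertools import combinations


def coverage(spans, global_min, global_max):
    """Percentage of [global_min, global_max] covered by the union of (lo, hi) spans."""
    # clip each span to the instrument range and drop spans entirely outside it
    clipped = sorted(
        (max(lo, global_min), min(hi, global_max))
        for lo, hi in spans
        if hi > global_min and lo < global_max
    )
    covered = 0.0
    cur_lo = cur_hi = None
    for lo, hi in clipped:
        if cur_hi is not None and lo <= cur_hi:
            cur_hi = max(cur_hi, hi)  # overlapping span: extend the current interval
        else:
            if cur_hi is not None:
                covered += cur_hi - cur_lo  # close the previous merged interval
            cur_lo, cur_hi = lo, hi
    if cur_hi is not None:
        covered += cur_hi - cur_lo
    return 100.0 * covered / (global_max - global_min)


def best_combination(series, size, global_min, global_max, coverage_threshold):
    """Return (names, total score) of the best combination meeting the coverage threshold.

    series: dict name -> {"span": (lo, hi), "score": float} (hypothetical structure).
    Returns (None, -inf) if no combination covers enough of the range.
    """
    best_names, best_score = None, float("-inf")
    for names in combinations(sorted(series), size):
        spans = [series[n]["span"] for n in names]
        if coverage(spans, global_min, global_max) < coverage_threshold:
            continue  # skip combinations that leave too much of the range uncovered
        total = sum(series[n]["score"] for n in names)  # summed per-series scores
        if total > best_score:
            best_names, best_score = names, total
    return best_names, best_score


if __name__ == "__main__":
    demo = {
        "A": {"span": (100.0, 250.0), "score": 3.0},
        "B": {"span": (240.0, 500.0), "score": 2.0},
        "C": {"span": (100.0, 200.0), "score": 5.0},
    }
    print(best_combination(demo, 2, 100.0, 500.0, 90))  # → (('B', 'C'), 7.0)
```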

<token name="@RECAL_HELP@">
MFAssignR - Recal
=============================

This tool is the sixth step of the MFAssignR workflow (RecalList -> Recal -> MFAssign)
This tool is the seventh step of the MFAssignR workflow (FindRecalSeries -> Recal -> MFAssign)

Recal() function recalibrates the 'Mono' and 'Iso' outputs from the IsoFiltR() function and prepares a dataframe containing the chosen recalibrants. It also outputs a plot for the qualitative assessment of recalibrants. The input to the function is the output from MFAssign() or MFAssignCHO().

24 changes: 22 additions & 2 deletions tools/mfassignr/macros.xml
@@ -1,5 +1,5 @@
<macros>
<token name="@TOOL_VERSION@">1.0.3</token>
<token name="@TOOL_VERSION@">1.1.1</token>
<xml name="requirements">
<requirements>
<requirement type="package" version="@TOOL_VERSION@">r-mfassignr</requirement>
@@ -96,6 +96,26 @@
help= "Upper limit of molecular mass to be assigned."/>
</xml>

<xml name="findrecalseries_param">
<param name="input_file" type="data" format="tabular" label="Input data"
help= "Recalibration series, output from RecalList"/>
<param name="global_min" type="float" label="Global min"
help= "A lower bound of the instrument m/z range."/>
<param name="global_max" type="float" label="Global max"
help= "An upper bound of the instrument m/z range."/>
<param name="number_of_combinations" type="integer" label="Number of combinations"
help= "Number of series in each combination." value="5"/>
<param name="abundance_score_threshold" type="float" label="Abundance score threshold" value="100"
help= "Threshold for filtering series by abundance score. Series with higher values are better."/>
<param name="peak_distance_threshold" type="float" label="Peak distance threshold" value="2"
help= "Threshold for the peak distance parameter. Series with a peak distance closer to 1 are better."/>
<param name="coverage_threshold" type="integer" label="Coverage threshold"
help= "Percentage of the m/z range that should be covered by the selected series." value="90"/>
<param name="fill_series" type="boolean" truevalue="TRUE" falsevalue="FALSE" label="Return the top 10 unique series"
help= "If TRUE, the top 10 unique series are returned; otherwise only the series from the best combination are returned." checked="false"/>
</xml>

<xml name="recal_param">
<param name="input_file" type="data" format="tabular" label="Input data (Output from MFAssign)"
help= "Input data frame, the output from MFAssign or MFAssignCHO"/>
@@ -104,7 +124,7 @@
<param name="peaks" type="data" format="tabular" label="Peaks dataframe (Mono from IsoFiltR)"
help= "Peaks data frame, the Mono output from IsoFiltR"/>
<param name="isopeaks" type="data" format="tabular" label="Isopeaks dataframe (Iso from IsoFiltR)"
optional="true" help= "Isopeaks data frame, the Mono output from IsoFiltR"/>
help= "Isopeaks data frame, the Iso output from IsoFiltR"/>
<expand macro="ionmode_param" />
<expand macro="noise_threshold_params" />
<param name="mzRange" type="float" label="Mass windows used for the segmented recalibration" value="30"
64 changes: 64 additions & 0 deletions tools/mfassignr/mfassignr_findRecalSeries.xml
@@ -0,0 +1,64 @@
<tool id="mfassignr_findRecalSeries" name="MFAssignR FindRecalSeries" version="@TOOL_VERSION@+galaxy0" profile="23.0">
<description>Selects the most suitable series for recalibration</description>
<macros>
<import>macros.xml</import>
<import>help.xml</import>
</macros>
<edam_topics>
<edam_topic>topic_3172</edam_topic>
</edam_topics>
<edam_operations>
<edam_operation>operation_3627</edam_operation>
</edam_operations>
<expand macro="creator" />
<expand macro="refs" />

<expand macro="requirements" />
<command detect_errors="exit_code"><![CDATA[
Rscript -e 'packageVersion("MFAssignR")' &&
Rscript '${mfassignr_findrecalseries}'
]]>
</command>
<configfiles>
<configfile name="mfassignr_findrecalseries"><![CDATA[
library(dplyr)
df <- read.table("$input_file", header=TRUE, sep="\t")
result <- MFAssignR::FindRecalSeries(
df = df,
global_min = $global_min,
global_max = $global_max,
number_of_combinations = $number_of_combinations,
abundance_score_threshold = $abundance_score_threshold,
peak_distance_threshold = $peak_distance_threshold,
coverage_threshold = $coverage_threshold,
fill_series = $fill_series
) |> dplyr::rename(Series = series)

write.table(result, file="$final_series", sep="\t", row.names=FALSE)
]]></configfile>
</configfiles>
<inputs>
<expand macro="findrecalseries_param"/>
</inputs>
<outputs>
<data name="final_series" format="tabular" label="Final recalibration series"/>
</outputs>
<tests>
<test>
<param name="input_file" value="recallist/recal_series.tabular"/>
<param name="global_min" value="100"/>
<param name="global_max" value="500"/>
<param name="abundance_score_threshold" value="50"/>
<param name="number_of_combinations" value="3"/>
<param name="coverage_threshold" value="50"/>
<param name="fill_series" value="TRUE"/>
<output name="final_series" file="findrecalseries/selected_series.tabular"/>
</test>
</tests>
<help>
@FINDRECALSERIES_HELP@

@GENERAL_HELP@
</help>
<expand macro="citations"/>
</tool>
3 changes: 2 additions & 1 deletion tools/mfassignr/mfassignr_isofiltr.xml
@@ -1,7 +1,8 @@
<tool id="mfassignr_isofiltr" name="MFAssignR IsoFiltR" version="@TOOL_VERSION@+galaxy1" profile="23.0">
<tool id="mfassignr_isofiltr" name="MFAssignR IsoFiltR" version="@TOOL_VERSION@+galaxy0" profile="23.0">
<description>Separates likely isotopic masses from monoisotopic masses in a mass list</description>
<macros>
<import>macros.xml</import>
<import>help.xml</import>
</macros>
<edam_topics>
<edam_topic>topic_3172</edam_topic>
4 changes: 2 additions & 2 deletions tools/mfassignr/mfassignr_recal.xml
@@ -86,8 +86,8 @@
<assert_contents>
<has_size size="91080" delta="200"/>
</assert_contents>
</output>
</test>
</output>
</test>
</tests>
<help><![CDATA[
@RECAL_HELP@
2 changes: 1 addition & 1 deletion tools/mfassignr/mfassignr_recallist.xml
@@ -33,7 +33,7 @@
<data name="recal_series" format="tabular" label="Recalibration series by ${tool.name} on ${on_string}"/>
</outputs>
<tests>
<test>
<test>
<param name="input_file" value="mfassigncho/unambig.tabular" />
<output name="recal_series" file="recallist/recal_series.tabular" lines_diff="100"/>
</test>
2 changes: 1 addition & 1 deletion tools/mfassignr/mfassignr_snplot.xml
@@ -1,4 +1,4 @@
<tool id="mfassignr_snplot" name="MFAssignR SNplot" version="@TOOL_VERSION@+galaxy1" profile="23.0">
<tool id="mfassignr_snplot" name="MFAssignR SNplot" version="@TOOL_VERSION@+galaxy0" profile="23.0">
<description>Noise level assessment using the SNplot function.</description>
<macros>
<import>macros.xml</import>
@@ -0,0 +1,6 @@
"Series" "total_abundance" "total_series_length" "peak_proximity" "peak_distance_proximity" "series_id" "sum_score"
"O_H_7" 437.136255030871 504.562 129.612788237483 2723.59808058946 "O_H_7 O2_H_6 O2_H_11" 3794.90912385781
"O2_H_6" 437.136255030871 504.562 129.612788237483 2723.59808058946 "O_H_7 O2_H_6 O2_H_11" 3794.90912385781
"O2_H_11" 437.136255030871 504.562 129.612788237483 2723.59808058946 "O_H_7 O2_H_6 O2_H_11" 3794.90912385781
"O4_H_11" 943.304144088114 392.438 134.36084248065 1826.47532994759 "O2_H_6 O2_H_11 O4_H_11" 3296.57831651636
"O3_H_12" 330.037060987448 364.407 135.12153276257 1826.47538570915 "O2_H_6 O2_H_11 O3_H_12" 2656.04097945917
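In the expected test output for FindRecalSeries, sum_score equals the sum of total_abundance, total_series_length, peak_proximity and peak_distance_proximity for each row. The short Python check below confirms that relationship using values copied from three of the output rows. The `recomputed_sum` helper is hypothetical.

```python
import math

# Values copied from the expected FindRecalSeries test output (three distinct rows).
ROWS = [
    {"Series": "O_H_7", "total_abundance": 437.136255030871,
     "total_series_length": 504.562, "peak_proximity": 129.612788237483,
     "peak_distance_proximity": 2723.59808058946, "sum_score": 3794.90912385781},
    {"Series": "O4_H_11", "total_abundance": 943.304144088114,
     "total_series_length": 392.438, "peak_proximity": 134.36084248065,
     "peak_distance_proximity": 1826.47532994759, "sum_score": 3296.57831651636},
    {"Series": "O3_H_12", "total_abundance": 330.037060987448,
     "total_series_length": 364.407, "peak_proximity": 135.12153276257,
     "peak_distance_proximity": 1826.47538570915, "sum_score": 2656.04097945917},
]

SCORE_COLUMNS = ("total_abundance", "total_series_length",
                 "peak_proximity", "peak_distance_proximity")


def recomputed_sum(row):
    """Sum the four per-metric score columns of one output row."""
    return sum(row[col] for col in SCORE_COLUMNS)


if __name__ == "__main__":
    for row in ROWS:
        # the file stores about 15 significant digits, so compare with a tolerance
        ok = math.isclose(recomputed_sum(row), row["sum_score"], rel_tol=1e-12)
        print(row["Series"], ok)  # prints True for each of the three rows
```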