Commit
Merge pull request #1 from hgb-bin-proteomics/develop
v1.0.0 release
Showing 7 changed files with 667 additions and 1 deletion.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,45 @@ | ||
# This workflow will install Python dependencies, run tests and lint with a variety of Python versions | ||
# Reference workflow provided by (c) GitHub | ||
# For more information see: https://help.github.com/actions/language-and-framework-guides/using-python-with-github-actions | ||
|
||
name: msannika_fdr | ||
|
||
on: | ||
push: | ||
branches: [ master ] | ||
pull_request: | ||
branches: [ master ] | ||
|
||
jobs: | ||
build: | ||
|
||
runs-on: ubuntu-latest | ||
strategy: | ||
matrix: | ||
python-version: ['3.7', '3.8', '3.9', '3.10', '3.11', '3.12'] | ||
|
||
steps: | ||
- uses: actions/checkout@v3 | ||
- name: Set up Python ${{ matrix.python-version }} | ||
uses: actions/setup-python@v3 | ||
with: | ||
python-version: ${{ matrix.python-version }} | ||
- name: Copy scripts and data to "/tests" | ||
run: | | ||
cp msannika_fdr.py tests | ||
cp data/DSSO_Crosslinks.xlsx . | ||
cp data/DSSO_CSMs.xlsx . | ||
- name: Install dependencies | ||
run: | | ||
python -m pip install --upgrade pip | ||
python -m pip install flake8 pytest | ||
if [ -f requirements.txt ]; then pip install -r requirements.txt; fi | ||
- name: Lint with flake8 | ||
run: | | ||
# stop the build if there are Python syntax errors or undefined names | ||
flake8 . --count --select=E9,F63,F7,F82 --show-source --statistics | ||
# exit-zero treats all errors as warnings. The GitHub editor is 127 chars wide | ||
flake8 . --count --exit-zero --max-complexity=10 --max-line-length=127 --statistics | ||
- name: Test with pytest | ||
run: | | ||
pytest tests/tests.py |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1 +1,190 @@ | ||
# MSAnnika_FDR | ||
![workflow_state](https://github.com/hgb-bin-proteomics/MSAnnika_FDR/workflows/msannika_fdr/badge.svg) | ||
|
||
# MS Annika FDR | ||
|
||
A script and functions to group and validate [MS Annika](https://github.com/hgb-bin-proteomics/MSAnnika) | ||
results. The main use case would be for re-validating results after filtering or | ||
merging results from different MS Annika runs. | ||
|
||
## Usage | ||
|
||
- Install Python 3.7+: [https://www.python.org/downloads/](https://www.python.org/downloads/) | ||
- Install requirements: `pip install -r requirements.txt` | ||
- Export MS Annika results from Proteome Discoverer to Microsoft Excel format. | ||
- Run `python msannika_fdr.py filename.xlsx -fdr 0.01` (see below for more examples). | ||
- The script may take a few minutes, depending on the number of CSMs/crosslinks to process. | ||
- Done! | ||
|
||
## Examples | ||
|
||
`msannika_fdr.py` takes one positional and one optional argument. The first | ||
argument always has to be the filename(s) of the MS Annika result file(s). You | ||
may specify any number of result files. Keep in mind, however, that | ||
`msannika_fdr.py` will process these files separately. If you want to merge | ||
several result files, check out [MS Annika Combine Results](https://github.com/hgb-bin-proteomics/MSAnnika_Combine_Results). | ||
For demonstration purposes we will use the files supplied in the `/data` folder: | ||
- `DSSO_Crosslinks.xlsx` contains unvalidated crosslinks from an MS Annika | ||
search. | ||
- `DSSO_CSMs.xlsx` contains unvalidated CSMs from an MS Annika search. | ||
|
||
The following is a valid `msannika_fdr.py` call: | ||
|
||
```bash | ||
python msannika_fdr.py DSSO_Crosslinks.xlsx | ||
``` | ||
|
||
This will not do anything because no FDR was given. You should see in the output | ||
that the script skipped the file. However, doing the same with a CSM file | ||
results in a different output: | ||
|
||
```bash | ||
python msannika_fdr.py DSSO_CSMs.xlsx | ||
``` | ||
|
||
This will group the CSMs into crosslinks by sequence and position, and you | ||
should see the file `DSSO_CSMs_crosslinks.xlsx` generated. | ||
|
||
If you supply the optional argument `-fdr` or `--false_discovery_rate` and the | ||
desired FDR as a floating point number, the results will be validated: | ||
|
||
```bash | ||
python msannika_fdr.py DSSO_Crosslinks.xlsx -fdr 0.01 | ||
``` | ||
|
||
This will validate the input crosslinks at an estimated 1% FDR and generate a | ||
file called `DSSO_Crosslinks_validated.xlsx` containing only crosslinks that pass | ||
the estimated 1% FDR threshold. Note that the following command will produce the | ||
same output (FDR values >= 1 will automatically be divided by 100): | ||
|
||
```bash | ||
python msannika_fdr.py DSSO_Crosslinks.xlsx -fdr 1 | ||
``` | ||
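
The percentage-or-fraction rule described above can be illustrated with a minimal sketch. The helper name `normalize_fdr` is hypothetical and is not taken from `msannika_fdr.py`; it only encodes the stated rule that values >= 1 are divided by 100.

```python
def normalize_fdr(fdr: float) -> float:
    """Interpret values >= 1 as percentages and values below 1 as fractions.

    Hypothetical helper: it mirrors the rule stated in the README,
    not necessarily how msannika_fdr.py implements it.
    """
    if fdr >= 1:
        return fdr / 100
    return fdr


# Both spellings of "1% FDR" end up as the same fraction.
print(normalize_fdr(1))
print(normalize_fdr(0.01))
```

Under this rule, `-fdr 1` and `-fdr 0.01` are treated identically, which is why the two commands above produce the same output.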
|
||
Validating a CSMs file works the same way: | ||
|
||
```bash | ||
python msannika_fdr.py DSSO_CSMs.xlsx -fdr 0.01 | ||
``` | ||
|
||
This will validate the input CSMs at an estimated 1% FDR and generate a | ||
file `DSSO_CSMs_validated.xlsx` containing only CSMs that pass the estimated 1% | ||
FDR threshold. Furthermore, it will group the input CSMs to crosslinks and | ||
output them to the file `DSSO_CSMs_crosslinks.xlsx`, then validate those | ||
crosslinks at an estimated 1% FDR and store the result in | ||
`DSSO_CSMs_crosslinks_validated.xlsx`. | ||
|
||
You can also supply several files to the script like this: | ||
|
||
```bash | ||
python msannika_fdr.py DSSO_CSMs.xlsx DSSO_Crosslinks.xlsx -fdr 0.01 | ||
``` | ||
|
||
This will process the input files separately and sequentially and produce the | ||
files as mentioned above: | ||
- `DSSO_Crosslinks_validated.xlsx` | ||
- `DSSO_CSMs_validated.xlsx` | ||
- `DSSO_CSMs_crosslinks.xlsx` | ||
- `DSSO_CSMs_crosslinks_validated.xlsx` | ||
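
The naming pattern of the output files listed above can be summarised in a short sketch. The function `expected_outputs` and its parameters are hypothetical; how the script itself decides whether a file contains CSMs or crosslinks is not covered here, so that information is passed in explicitly.

```python
from pathlib import Path
from typing import List


def expected_outputs(filename: str, is_csm_file: bool, fdr_given: bool) -> List[str]:
    """List the output file names the README describes for one input file.

    Hypothetical helper for illustration only. It returns bare file names
    and ignores any directory part of the input path.
    """
    stem = Path(filename).stem
    outputs = []
    if fdr_given:
        # Any validated input yields "<name>_validated.xlsx".
        outputs.append(f"{stem}_validated.xlsx")
    if is_csm_file:
        # CSMs are always grouped to crosslinks, with or without an FDR.
        outputs.append(f"{stem}_crosslinks.xlsx")
        if fdr_given:
            # The grouped crosslinks are validated as well.
            outputs.append(f"{stem}_crosslinks_validated.xlsx")
    return outputs


print(expected_outputs("DSSO_CSMs.xlsx", is_csm_file=True, fdr_given=True))
print(expected_outputs("DSSO_Crosslinks.xlsx", is_csm_file=False, fdr_given=True))
```

A crosslink file without an FDR yields an empty list, matching the case where the script skips the file.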
|
||
## Parameters | ||
|
||
```python | ||
""" | ||
DESCRIPTION: | ||
A script to group and validate results from MS Annika searches. | ||
USAGE: | ||
msannika_fdr.py f [f ...] | ||
[-fdr FDR][--false_discovery_rate FDR] | ||
[-h][--help] | ||
[--version] | ||
positional arguments: | ||
f MS Annika result files in Microsoft Excel format (.xlsx) | ||
to process. | ||
optional arguments: | ||
-fdr FDR, --false_discovery_rate FDR | ||
False discovery rate to validate results for. Supports | ||
both percentage input (e.g. 1) or fraction input (e.g. | ||
0.01). By default not set and the input results will | ||
just be grouped to crosslinks (if CSMs as input) or | ||
nothing will be done (if crosslinks as input). | ||
Default: None | ||
-h, --help show this help message and exit | ||
--version show program's version number and exit | ||
""" | ||
``` | ||
|
||
## Function Documentation | ||
|
||
If you want to integrate the MS Annika FDR calculation into your own scripts, | ||
you can import the following functions as given: | ||
|
||
```python | ||
import pandas as pd | ||
|
||
crosslinks = pd.read_excel("DSSO_Crosslinks.xlsx") | ||
csms = pd.read_excel("DSSO_CSMs.xlsx") | ||
|
||
# Grouping CSMs to crosslinks | ||
from msannika_fdr import MSAnnika_CSM_Grouper | ||
Crosslinks_grouped_from_CSMs = MSAnnika_CSM_Grouper.group(csms) | ||
|
||
# The function signature of MSAnnika_CSM_Grouper.group is: | ||
def group(data: pd.DataFrame) -> pd.DataFrame: | ||
"""code omitted""" | ||
return | ||
|
||
# Validating CSMs for 0.01 FDR | ||
from msannika_fdr import MSAnnika_CSM_Validator | ||
Validated_CSMs = MSAnnika_CSM_Validator.validate(csms, 0.01) | ||
|
||
# The function signature of MSAnnika_CSM_Validator.validate is: | ||
def validate(data: pd.DataFrame, fdr: float) -> pd.DataFrame: | ||
"""code omitted""" | ||
return | ||
|
||
# Validating Crosslinks for 0.01 FDR | ||
from msannika_fdr import MSAnnika_Crosslink_Validator | ||
Validated_Crosslinks = MSAnnika_Crosslink_Validator.validate(crosslinks, 0.01) | ||
|
||
# The function signature of MSAnnika_Crosslink_Validator.validate is: | ||
def validate(data: pd.DataFrame, fdr: float) -> pd.DataFrame: | ||
"""code omitted""" | ||
return | ||
``` | ||
|
||
## Known Issues | ||
|
||
[List of known issues](https://github.com/hgb-bin-proteomics/MSAnnika_FDR/issues) | ||
|
||
## Citing | ||
|
||
If you are using the MS Annika FDR script please cite: | ||
``` | ||
MS Annika 2.0 Identifies Cross-Linked Peptides in MS2–MS3-Based Workflows at High Sensitivity and Specificity | ||
Micha J. Birklbauer, Manuel Matzinger, Fränze Müller, Karl Mechtler, and Viktoria Dorfer | ||
Journal of Proteome Research 2023 22 (9), 3009-3021 | ||
DOI: 10.1021/acs.jproteome.3c00325 | ||
``` | ||
|
||
If you are using MS Annika please cite: | ||
``` | ||
MS Annika 2.0 Identifies Cross-Linked Peptides in MS2–MS3-Based Workflows at High Sensitivity and Specificity | ||
Micha J. Birklbauer, Manuel Matzinger, Fränze Müller, Karl Mechtler, and Viktoria Dorfer | ||
Journal of Proteome Research 2023 22 (9), 3009-3021 | ||
DOI: 10.1021/acs.jproteome.3c00325 | ||
``` | ||
or | ||
``` | ||
MS Annika: A New Cross-Linking Search Engine | ||
Georg J. Pirklbauer, Christian E. Stieger, Manuel Matzinger, Stephan Winkler, Karl Mechtler, and Viktoria Dorfer | ||
Journal of Proteome Research 2021 20 (5), 2560-2569 | ||
DOI: 10.1021/acs.jproteome.0c01000 | ||
``` | ||
|
||
## License | ||
|
||
- [MIT](https://github.com/hgb-bin-proteomics/MSAnnika_FDR/blob/master/LICENSE) | ||
|
||
## Contact | ||
|
||
- [micha.birklbauer@fh-hagenberg.at](mailto:micha.birklbauer@fh-hagenberg.at) |
Binary file not shown.
Binary file not shown.