The acronym stands for Mass Spectrometry File Toolbox. This is a library for accessing some common mass-spectrometry/proteomics data formats from Java:
- mzML
- mzXML
- pepXML/pep.xml
- protXML/prot.xml
- mzIdentML
- cef (Agilent)
- GPMdb XML
This library is what drives BatMass.
Please cite the following paper if you used MSFTBX or BatMass in your work:
Avtonomov D.M. et al: J. Proteome Res. June 16, 2016. DOI: 10.1021/acs.jproteome.6b00021
Latest version on Maven Central
<dependency>
<groupId>com.github.chhh</groupId>
<artifactId>msftbx</artifactId>
<version>1.8.8</version>
</dependency>
- To get started, follow the tutorial: http://www.batmass.org/tutorial/data-access-layer/#parsing-lc-ms-data-mzml-mzxml-files
- Check out a fully working example repo: https://github.com/chhh/msftbx-examples
- The example compiles and runs with a single command and requires only Java to be installed.
- Parsers for mzML/mzXML with unified API
- Very fast, multi-threaded
- Rich standardized API for contents of those files (scan and run meta-info, not just spectra).
- msNumpress compression support for mzML
- Automated LC/MS run structure determination:
- Data structures for parent-child relationship between spectra
- Indexes for scans based on scan numbers, retention times both globally and for each MS level separately
- Convenient methods to get next-previous scans at the same MS level
- Tolerant to malformed data
- Can handle MS2 scan tags nested inside MS1 scans
- Tolerant to missing or broken file index
- Reindexing on the fly
- Memory management
- Automated spectra parsing on demand
- You can parse just the structure of an LC/MS run without the spectral data; the memory footprint in this case is very small. Spectra are parsed only when they are requested.
- Soft referencing of spectral data for GC
- Tracking of which loaded data is not being used by any components with automated unloading.
- Upcoming support for Thermo RAW files on Windows
- pepXML parser and writer
- protXML parser and writer
- mzIdentML parser
- GPMdb XML files parser
- Agilent .cef files parser
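The memory-management items above (parsing spectra on demand and soft-referencing them so the garbage collector can reclaim them) can be illustrated with a small sketch. This is not MSFTBX's implementation: `LazySpectrum`, its `parser` supplier and its method names are hypothetical stand-ins, and only `java.lang.ref.SoftReference` from the JDK is used.

```java
import java.lang.ref.SoftReference;
import java.util.function.Supplier;

/**
 * Hypothetical holder (not an MSFTBX class): the m/z array is parsed on
 * first access and kept behind a SoftReference, so the GC may drop it
 * under memory pressure and it is simply parsed again on the next access.
 */
public class LazySpectrum {
    private final int scanNum;
    // Stands in for "read and decode this scan's peaks from the file".
    private final Supplier<double[]> parser;
    private SoftReference<double[]> cache = new SoftReference<>(null);
    private int parseCount = 0;

    public LazySpectrum(int scanNum, Supplier<double[]> parser) {
        this.scanNum = scanNum;
        this.parser = parser;
    }

    public int getScanNum() {
        return scanNum;
    }

    /** Returns the m/z values, parsing them only if they are not cached. */
    public synchronized double[] getMzs() {
        double[] mzs = cache.get();
        if (mzs == null) {
            mzs = parser.get();
            parseCount++;
            cache = new SoftReference<>(mzs);
        }
        return mzs;
    }

    /** True while the parsed data is still reachable through the cache. */
    public synchronized boolean isLoaded() {
        return cache.get() != null;
    }

    /** Explicitly drops the cached data, e.g. when no component uses it. */
    public synchronized void unload() {
        cache.clear();
    }

    /** Number of times the parser has actually been invoked. */
    public synchronized int getParseCount() {
        return parseCount;
    }

    public static void main(String[] args) {
        LazySpectrum s = new LazySpectrum(1, () -> new double[] {100.0, 200.5, 300.25});
        System.out.println("loaded before access: " + s.isLoaded());
        double[] mzs = s.getMzs(); // triggers the first parse
        System.out.println("peaks: " + mzs.length + ", parses: " + s.getParseCount());
        s.unload();
        s.getMzs(); // parsed again after unloading
        System.out.println("parses after reload: " + s.getParseCount());
    }
}
```

The design choice illustrated here is that the run structure (scan numbers and other meta-info) can stay in memory cheaply, while the bulky peak arrays are treated as a cache that can always be rebuilt from the file.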
Get pre-built jars from Maven Central.
cd ./MSFileToolbox && mvn clean package
This produces a jar with just the library, msftbx-X.X.X.jar, as well as one large jar, msftbx-X.X.X-jar-with-dependencies.jar.
The latter can be used as is; it includes all the needed dependencies.
NetBeans Module: Open the root directory in NetBeans as a project. You will see the MSFTBX module suite, which consists of 3 modules: MSFileToolbox Module, which is the main thing; MSFileToolbox Libx, which holds the dependencies; and Auto Update (MSFTBX), the update center for NetBeans Platform projects (you definitely don't need this).
- SLF4J
- Google Guava
- Apache Commons Pool 2
- OboParser from Biojava's submodule Ontology
- Javolution Core (slightly modified, sources are here, this modified dependency is published on Maven Central)
When dealing with mzIdentML files (.mzid) you will encounter AbstractParamType. In the definition of mzIdentML both cvParam and userParam inherit from it, and both can be stored in the same list. Thus, when you get such a list, you'll need to cast manually to the concrete type like so:
List<AbstractParamType> paramGroup = blabla.getParamGroup();
for (AbstractParamType param : paramGroup) {
    if (param instanceof CVParamType) {
        CVParamType p = (CVParamType) param;
        // do something with cvParam
    } else if (param instanceof UserParamType) {
        UserParamType p = (UserParamType) param;
        // do something with userParam
    }
}
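If you only need one kind of param from such a mixed list, the cast can be wrapped in a small generic helper. The sketch below is self-contained and therefore uses simplified stand-in classes (`AbstractParam`, `CvParam`, `UserParam`) rather than the real MSFTBX types; the helper `ofType`, the class `ParamSplitter` and the accession values are all hypothetical and chosen for illustration only.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class ParamSplitter {
    // Simplified stand-ins for the mzIdentML param hierarchy (assumed shapes,
    // not the actual MSFTBX classes).
    static abstract class AbstractParam {
        final String name;
        AbstractParam(String name) { this.name = name; }
    }

    static class CvParam extends AbstractParam {
        final String accession;
        CvParam(String accession, String name) {
            super(name);
            this.accession = accession;
        }
    }

    static class UserParam extends AbstractParam {
        UserParam(String name) { super(name); }
    }

    /** Collects the elements of a mixed list that are instances of the given type. */
    static <T> List<T> ofType(List<?> params, Class<T> type) {
        List<T> out = new ArrayList<>();
        for (Object p : params) {
            if (type.isInstance(p)) {
                out.add(type.cast(p)); // checked cast, same effect as the manual (T) cast
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Placeholder accessions, not real controlled-vocabulary terms.
        List<AbstractParam> params = Arrays.asList(
                new CvParam("EX:0000001", "example score"),
                new UserParam("free-text note"),
                new CvParam("EX:0000002", "example mass"));

        for (CvParam p : ofType(params, CvParam.class)) {
            System.out.println(p.accession + " " + p.name);
        }
        for (UserParam p : ofType(params, UserParam.class)) {
            System.out.println(p.name);
        }
    }
}
```

With the real library, the same helper should work on the list returned by `getParamGroup()` by passing `CVParamType.class` or `UserParamType.class`, assuming those classes are on the classpath.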
- Make MSFTBX Java 9 compatible. JAXB dependencies included.
- Incompatible change from previous versions: the PepXml, ProtXml and MzIdentMl parsers now use Doubles instead of Floats everywhere. Existing code that uses the old Float properties might break.