MR-MSPOLYGRAPH: A MapReduce Implementation of a Hybrid Spectral Library-Database Search Method for Large-scale Peptide Identification
A MapReduce-based implementation called MR-MSPolygraph for parallelizing peptide identification from mass spectrometry data is presented. The underlying serial method, MSPolygraph, uses a novel hybrid approach to match an experimental spectrum against a combination of a protein sequence database and a spectral library. Our MapReduce implementation can run on any Hadoop cluster environment. Experimental results demonstrate that, relative to the serial version, MR-MSPolygraph reduces the time to solution from weeks to hours when processing tens of thousands of experimental spectra. Speedup and other related performance studies are also reported on a 400-core Hadoop cluster using spectral data sets from environmental microbial communities as inputs.
Downloads & Instructions
To download the sequential version of the software, please visit http://omics.pnl.gov/software/MSPolygraph.php
|Source code for MR-MSPolygraph|.zip|
|Source code for MR-MSPolygraph.|
|A comprehensive microbial database containing 2.65 million protein sequences.|
|Contains a spectral library for S. oneidensis MR-1 (1,752 peptides) along with its parameter files.|
|Contains 1,000 experimental spectra derived from Synechococcus sp. PCC 7002.|
|Contains other parameter files (*.frq, *.dat) which will be required during runtime.|
|This is the file that is supplied as the input argument to polygraph. It needs to be present in the working directory from which the program is run. Edit this file before use.|
|Spectral index file|index_1k.dat|
|This file should contain the paths to all experimental spectral *.dta files that need to be matched. Edit this file before use.|
|Hadoop run script|script|
|This shell script shows an example hadoop command that can be used to run mspolygraph_mr. Edit this file before use.|
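As a minimal sketch of preparing the spectral index file, the Python snippet below collects every *.dta file under a directory and writes one absolute path per line. The one-path-per-line layout and the function name write_spectral_index are assumptions made for illustration; check the supplied index_1k.dat for the exact format that is expected.

```python
from pathlib import Path


def write_spectral_index(spectra_dir, index_path):
    """Write the absolute path of every *.dta file under spectra_dir.

    Assumption: the index file holds one path per line. This layout is
    not confirmed by the documentation above.
    Returns the number of paths written.
    """
    # Search recursively and sort so the output is reproducible.
    dta_files = sorted(p.resolve() for p in Path(spectra_dir).rglob("*.dta"))
    with open(index_path, "w") as out:
        for path in dta_files:
            out.write(f"{path}\n")
    return len(dta_files)
```

For example, write_spectral_index("spectra", "index.dat") would list every .dta file found under a hypothetical spectra directory; both names are placeholders.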
Download verification: md5sum checksums for all files above
(Our web server currently supports compressed file downloads only in .zip format. To extract an archive from a Linux console, please use the unzip command.)
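Where the md5sum command is not available, a downloaded archive can be checked against its published checksum with a short Python sketch such as the one below. The function names md5_of_file and matches_checksum are hypothetical helpers, and the expected value has to be copied from the checksum list for the file in question.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hexadecimal MD5 digest of the file at path."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        # Read in chunks so large archives are not loaded into memory at once.
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_checksum(path, expected_hex):
    """Compare a file's MD5 digest with an expected value, ignoring case and surrounding whitespace."""
    return md5_of_file(path) == expected_hex.strip().lower()
```

A mismatch usually indicates an incomplete or corrupted download, in which case the file should be fetched again before unzipping.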
Please cite the following two source papers for this work:
1) A. Kalyanaraman, W.R. Cannon, B. Latt, and D.J. Baxter (2011). MapReduce implementation of a hybrid spectral library-database search method for large-scale peptide identification. Bioinformatics, in press, doi: 10.1093/bioinformatics/btr523.
2) W.R. Cannon, M. Rawlins, D.J. Baxter, S.J. Callister, M.S. Lipton, and D.A. Bryant (2011). Large improvements in MS/MS based peptide identification rates using a hybrid analysis. Journal of Proteome Research, 10(5):2306-2317.
A. Kalyanaraman: < a n a n t h @ e e c s . w s u . e d u >
School of Electrical Engineering and Computer Science, Washington State University, Pullman, WA 99164-2752.
W.R. Cannon: < w i l l i a m . c a n n o n @ p n n l . g o v >
Computational Biology and Bioinformatics Group, Pacific Northwest National Laboratory, Richland, WA 99352.
Funding from NSF IIS 0916463.