This article is part of the supplement: Selected articles from the IEEE International Conference on Bioinformatics and Biomedicine 2011: Proteome Science

Open Access Proceedings

An unsupervised machine learning method for assessing quality of tandem mass spectra

Wenjun Lin1, Jianxin Wang2, Wen-Jun Zhang1,3 and Fang-Xiang Wu1,3*

Author Affiliations

1 Division of Biomedical Engineering, University of Saskatchewan, 57 Campus Dr., Saskatoon, S7N 5A9, Canada

2 School of Information Science and Engineering, Central South University, Changsha, P.R.China

3 Department of Mechanical Engineering, University of Saskatchewan, 57 Campus Dr., Saskatoon, S7N 5A9, Canada

Proteome Science 2012, 10(Suppl 1):S12  doi:10.1186/1477-5956-10-S1-S12

Published: 21 June 2012

Abstract

Background

In a single proteomic project, tandem mass spectrometers can produce hundreds of millions of tandem mass spectra. However, the majority of these spectra are of poor quality, and searching them for peptides wastes time. Quality assessment before database search is therefore very useful in the pipeline of protein identification via tandem mass spectra, especially for reducing search time and decreasing false identifications. Most existing methods for quality assessment are supervised machine learning methods based on a number of features that describe the quality of tandem mass spectra. These methods need training datasets in which the quality of every spectrum is known, and such datasets are usually unavailable for new data.

Results

This study proposes an unsupervised machine learning method for quality assessment of tandem mass spectra that requires no training dataset. The proposed method estimates the conditional probability that a spectrum is of high quality from the quality assessments based on individual features. The probabilities are estimated by solving a constrained optimization problem. An efficient algorithm is developed to solve this problem and is proved to converge. Experimental results on two datasets show that searching only the tandem spectra judged to be of high quality by the proposed method saves about 56% and 62% of database searching time, respectively, while losing only a small number of high-quality spectra.
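The trade-off reported above, search time saved against high-quality spectra lost, can be illustrated with a minimal Python sketch. It is not the authors' algorithm: it assumes the per-spectrum probabilities of being high quality have already been estimated, and it treats the fraction of spectra skipped as a rough proxy for search time saved. The function names, the 0.5 threshold, and the toy values are all hypothetical.

```python
from typing import List, Sequence, Tuple


def select_high_quality(probs: Sequence[float], threshold: float = 0.5) -> List[int]:
    """Return indices of spectra whose estimated probability of being
    high quality is at least `threshold` (a hypothetical cutoff)."""
    return [i for i, p in enumerate(probs) if p >= threshold]


def filtering_summary(
    probs: Sequence[float],
    is_high_quality: Sequence[bool],
    threshold: float = 0.5,
) -> Tuple[float, float]:
    """Return (fraction of spectra skipped, fraction of truly
    high-quality spectra lost) for a given threshold.

    Assumption: search time scales roughly with the number of spectra
    searched, so the skipped fraction approximates the time saved.
    """
    kept = set(select_high_quality(probs, threshold))
    n_total = len(probs)
    skipped_fraction = (n_total - len(kept)) / n_total

    n_high = sum(1 for h in is_high_quality if h)
    n_lost = sum(1 for i, h in enumerate(is_high_quality) if h and i not in kept)
    lost_fraction = n_lost / n_high if n_high else 0.0
    return skipped_fraction, lost_fraction


# Toy data (made up for illustration, not taken from the paper).
probs = [0.9, 0.2, 0.1, 0.7, 0.3, 0.05, 0.8, 0.4, 0.15, 0.6]
labels = [True, False, False, True, True, False, True, False, False, True]

skipped, lost = filtering_summary(probs, labels)
print(f"skipped {skipped:.0%} of spectra, lost {lost:.0%} of high-quality spectra")
```

Raising the threshold skips more spectra but risks discarding more high-quality ones, which is the balance the reported 56% and 62% savings reflect.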

Conclusions

The results indicate that the proposed method performs well in assessing the quality of tandem mass spectra and that the way the conditional probabilities are estimated is effective.