Proteome Science
Research
Classification-based comparison of pre-processing methods for interpretation of mass spectrometry generated clinical datasets
Wouter Wegdam1*, Perry D Moerland2*, Marrije R Buist1, Emiel Ver Loren van Themaat2, Boris Bleijlevens3, Huub CJ Hoefsloot4, Chris G de Koster3,4 and Johannes MFG Aerts3
1 Department of Gynaecologic Oncology, Academic Medical Center, University of Amsterdam, Amsterdam, the Netherlands
2 Bioinformatics Laboratory, Department of Clinical Epidemiology, Biostatistics and Bioinformatics, Academic Medical Center, University of Amsterdam, Amsterdam, the Netherlands
3 Clinical Proteomics Group, Department of Medical Biochemistry, Academic Medical Center, University of Amsterdam, Amsterdam, the Netherlands
4 Swammerdam Institute for Life Sciences, University of Amsterdam, Amsterdam, the Netherlands
* Contributed equally
Proteome Science 2009, 7:19. doi:10.1186/1477-5956-7-19
Abstract
Background
Mass spectrometry is increasingly being used to discover proteins or protein profiles associated with disease. The experimental design of mass spectrometry studies has come under close scrutiny, and the importance of strict protocols for sample collection is now understood. However, the question of how best to process the large quantities of data generated remains unanswered. The main challenges for the analysis are the choice of proper pre-processing and classification methods. While these two issues have been investigated in isolation, we propose to use the classification of patient samples as a clinically relevant benchmark for the evaluation of pre-processing methods.
Results
Two in-house generated clinical SELDI-TOF MS datasets are used in this study as examples of high-throughput mass spectrometry data. We perform a systematic comparison of two commonly used pre-processing methods, as implemented in Ciphergen ProteinChip Software and in the Cromwell package. With respect to reproducibility, Ciphergen and Cromwell pre-processing are largely comparable. We find that the overlap between the peaks detected by Ciphergen ProteinChip Software and those detected by Cromwell is large, especially for the more stringent peak detection settings. Moreover, the similarity of the estimated intensities between matched peaks is high.
We evaluate the pre-processing methods using five different classification methods. Classification is done in a double cross-validation protocol using repeated random sampling to obtain an unbiased estimate of classification accuracy. Neither pre-processing method significantly outperforms the other across all peak detection settings evaluated.
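Double (nested) cross-validation keeps model selection separate from performance estimation: an inner loop tunes the classifier using only the outer training samples, and the outer loop measures accuracy on samples that took no part in that tuning. The following is a minimal sketch of the general protocol with repeated random sampling, not the authors' implementation. It assumes a k-nearest-neighbour classifier as a stand-in (not necessarily one of the five methods used in the study), a hypothetical grid of k values, and hypothetical 2/3 training fractions and repeat counts for both loops; all function names are illustrative.

```python
import math
import random
from collections import Counter


def knn_predict(train_x, train_y, x, k):
    # Majority vote among the k training samples closest to x (Euclidean).
    order = sorted(range(len(train_x)), key=lambda i: math.dist(train_x[i], x))
    return Counter(train_y[i] for i in order[:k]).most_common(1)[0][0]


def accuracy(train_x, train_y, test_x, test_y, k):
    hits = sum(knn_predict(train_x, train_y, x, k) == y
               for x, y in zip(test_x, test_y))
    return hits / len(test_y)


def random_split(n, train_frac, rng):
    # Repeated random sampling: a fresh random train/test partition per call.
    idx = list(range(n))
    rng.shuffle(idx)
    cut = int(round(train_frac * n))
    return idx[:cut], idx[cut:]


def double_cv(x, y, k_grid=(1, 3, 5), outer_reps=20, inner_reps=10,
              train_frac=2 / 3, seed=0):
    """Return the mean outer-loop accuracy; k is chosen in the inner loop
    using the outer training set only."""
    rng = random.Random(seed)
    outer_scores = []
    for _ in range(outer_reps):
        tr, te = random_split(len(y), train_frac, rng)
        tr_x, tr_y = [x[i] for i in tr], [y[i] for i in tr]
        # Inner loop: score each k on random splits of the outer training set.
        mean_acc = {}
        for k in k_grid:
            scores = []
            for _ in range(inner_reps):
                a, b = random_split(len(tr_y), train_frac, rng)
                scores.append(accuracy([tr_x[i] for i in a], [tr_y[i] for i in a],
                                       [tr_x[i] for i in b], [tr_y[i] for i in b], k))
            mean_acc[k] = sum(scores) / len(scores)
        best_k = max(k_grid, key=lambda k: mean_acc[k])
        # Outer evaluation on held-out samples never seen during selection.
        outer_scores.append(accuracy(tr_x, tr_y, [x[i] for i in te],
                                     [y[i] for i in te], best_k))
    return sum(outer_scores) / len(outer_scores)


# Synthetic two-class example (made-up data, not from the study).
data_rng = random.Random(1)
x = ([(data_rng.gauss(0, 1), data_rng.gauss(0, 1)) for _ in range(30)]
     + [(data_rng.gauss(5, 1), data_rng.gauss(5, 1)) for _ in range(30)])
y = [0] * 30 + [1] * 30
print(double_cv(x, y))  # should be close to 1.0 for such well-separated classes
```

Averaging over many random outer splits, rather than using a single split, reduces the variance of the accuracy estimate; because the outer test samples never influence the choice of k, the estimate is not optimistically biased by the selection step.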
Conclusion
We use classification of patient samples as a clinically relevant benchmark for the evaluation of pre-processing methods. Both pre-processing methods lead to similar classification results on an ovarian cancer dataset and a Gaucher disease dataset. However, the choice of pre-processing parameter settings leads to large differences in classification accuracy and is therefore of crucial importance. We advocate evaluating methods over a range of parameter settings when comparing pre-processing methods. Our analysis also demonstrates that reliable classification results can be obtained on clinical samples by combining strict sample handling with a well-defined classification protocol.