Open Access Research

Classification-based comparison of pre-processing methods for interpretation of mass spectrometry generated clinical datasets

Wouter Wegdam1*, Perry D Moerland2*, Marrije R Buist1, Emiel Ver Loren van Themaat2, Boris Bleijlevens3, Huub CJ Hoefsloot4, Chris G de Koster3,4 and Johannes MFG Aerts3

1 Department of Gynaecologic Oncology, Academic Medical Center, University of Amsterdam, Amsterdam, the Netherlands

2 Bioinformatics Laboratory, Department of Clinical Epidemiology, Biostatistics and Bioinformatics, Academic Medical Center, University of Amsterdam, Amsterdam, the Netherlands

3 Clinical Proteomics Group, Department of Medical Biochemistry, Academic Medical Center, University of Amsterdam, Amsterdam, the Netherlands

4 Swammerdam Institute for Life Sciences, University of Amsterdam, Amsterdam, the Netherlands

* Contributed equally

Proteome Science 2009, 7:19 doi:10.1186/1477-5956-7-19

Published: 14 May 2009

Additional files

Additional file 1:

Variance analysis (ovarian cancer dataset). Boxplots of the coefficient of variation (CV, standard deviation/mean peak intensity). Left panel: CV for all combinations of pre-processing method (Ciphergen: cyan, Cromwell: red) and peak selection setting (A, B, C) for the CM10 chip. Right panel: the same for the Q10 chip.

Format: PDF, 25 KB

Additional file 2:

Variance analysis (Gaucher dataset). Boxplots of the coefficient of variation (CV, standard deviation/mean peak intensity). CV for all combinations of pre-processing method (Ciphergen: cyan, Cromwell: red) and peak selection setting (A, B, C).

Format: PDF, 18 KB
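
As an illustration of the quantity plotted in Additional files 1 and 2, the following minimal Python sketch computes the coefficient of variation of a single peak as standard deviation divided by mean intensity. The function name and the intensity values are hypothetical, and the use of the sample standard deviation is an assumption, since the captions do not say which estimator was used.

```python
from statistics import mean, stdev


def coefficient_of_variation(intensities):
    """Return standard deviation / mean for one peak's intensities.

    Assumption: sample standard deviation (n - 1 denominator).
    """
    m = mean(intensities)
    if m == 0:
        raise ValueError("CV is undefined for a zero mean")
    return stdev(intensities) / m


# Hypothetical intensities of one peak across replicate spectra
# (made-up numbers, not taken from either dataset).
peak = [10.0, 12.0, 11.0, 9.0]
cv = coefficient_of_variation(peak)
```

A boxplot such as those in the additional files would then summarise one such CV value per detected peak for each pre-processing method and peak selection setting.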

Additional file 3:

Cumulative plot of significance of detected peaks (ovarian cancer dataset, CM10). For each combination of pre-processing method and peak selection settings, the cumulative percentage of peaks with a p-value smaller than the value on the x-axis is shown. The p-value of a peak is based on a t-test between the normalized intensities of the cancer and the control group.

Format: PDF, 31 KB

Additional file 4:

Cumulative plot of significance of detected peaks (ovarian cancer dataset, Q10). For each combination of pre-processing method and peak selection settings, the cumulative percentage of peaks with a p-value smaller than the value on the x-axis is shown. The p-value of a peak is based on a t-test between the normalized intensities of the cancer and the control group.

Format: PDF, 31 KB

Additional file 5:

Cumulative plot of significance of detected peaks (Gaucher dataset). For each combination of pre-processing method and peak selection settings, the cumulative percentage of peaks with a p-value smaller than the value on the x-axis is shown. The p-value of a peak is based on a t-test between the normalized intensities of the Gaucher and the control group.

Format: PDF, 31 KB
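
The cumulative curves described in Additional files 3 to 5 can be sketched as follows. The example assumes the per-peak p-values have already been computed (for instance by a t-test, which is not reproduced here); the function name, the p-values and the thresholds are hypothetical and chosen only for illustration.

```python
def cumulative_percentage(p_values, thresholds):
    """For each threshold, return the percentage of peaks whose
    p-value is strictly smaller than that threshold."""
    n = len(p_values)
    if n == 0:
        raise ValueError("at least one p-value is required")
    return [100.0 * sum(p < t for p in p_values) / n for t in thresholds]


# Hypothetical p-values for five detected peaks (not from the paper).
p_values = [0.001, 0.02, 0.04, 0.3, 0.8]

# Thresholds play the role of the x-axis of the cumulative plot.
curve = cumulative_percentage(p_values, [0.01, 0.05, 0.5, 1.0])
# curve == [20.0, 60.0, 80.0, 100.0]
```

Under this reading, a curve that rises faster at small thresholds indicates a pre-processing combination that yields a larger share of peaks with low p-values.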

Additional file 6:

Comparison of classifiers and pre-processing methods (ovarian cancer dataset). Average classification accuracy on 1,000 test sets (size of training sets: 14, size of test sets: 14) for each specific combination of pre-processing method and peak selection settings.

Format: XLS, 24 KB

Additional file 7:

Comparison of classifiers and pre-processing methods (Gaucher dataset). Average classification accuracy on 500 test sets (size of training sets: 27, size of test sets: 12) for each specific combination of pre-processing method and peak selection settings.

Format: XLS, 19 KB

Additional file 8:

Comparison of classifiers and pre-processing methods (ovarian cancer dataset). Each combination of chip type, pre-processing method and peak selection was ranked by its average classification accuracy on 1,000 test sets (size of training sets: 14, size of test sets: 14) for each classifier. The heatmap gives a colour coding of the ranks from 1 (highest accuracy, red) to 18 (lowest accuracy, light yellow). Columns of the heatmap are ranked by their average rank over all classifiers, with Ciphergen pre-processing using setting C and the combined CM10/Q10 data getting the highest rank. Classifiers are ordered by their average rank over all pre-processing combinations, with DLDA being the best ranked classifier.

Format: PDF, 76 KB

Additional file 9:

Comparison of classifiers and pre-processing methods (Gaucher dataset). Each combination of pre-processing method and peak selection was ranked by its average classification accuracy on 500 test sets (size of training sets: 27, size of test sets: 12) for each classifier. The heatmap gives a colour coding of the ranks from 1 (highest accuracy, red) to 6 (lowest accuracy, light yellow). Columns of the heatmap are ranked by their average rank over all classifiers, with Ciphergen pre-processing using setting A getting the highest rank. Classifiers are ordered by their average rank over all pre-processing combinations, with SVM being the best ranked classifier.

Format: PDF, 44 KB
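
The ranking behind the heatmaps in Additional files 8 and 9 can be illustrated with a short Python sketch: for each classifier, combinations are ranked by average accuracy (rank 1 = highest), and combinations are then ordered by their average rank over all classifiers. The function names, the combination labels and all accuracy values below are hypothetical, and ties are broken by input order, which is an assumption; the captions do not describe tie handling.

```python
def rank_by_accuracy(accuracies):
    """Map each combination to its rank, 1 = highest accuracy.

    Assumption: ties are broken by input order.
    """
    ordered = sorted(accuracies, key=accuracies.get, reverse=True)
    return {name: i + 1 for i, name in enumerate(ordered)}


def average_ranks(table):
    """table maps classifier -> {combination -> average accuracy}.

    Returns combination -> mean rank over all classifiers.
    """
    per_classifier = [rank_by_accuracy(row) for row in table.values()]
    combos = list(per_classifier[0])
    return {
        c: sum(ranks[c] for ranks in per_classifier) / len(per_classifier)
        for c in combos
    }


# Made-up accuracies for illustration only (not results from the paper).
table = {
    "DLDA": {"Ciphergen-A": 0.80, "Ciphergen-C": 0.85, "Cromwell-A": 0.70},
    "SVM": {"Ciphergen-A": 0.78, "Ciphergen-C": 0.79, "Cromwell-A": 0.72},
}

avg = average_ranks(table)
# Heatmap columns would be ordered from best (lowest) average rank.
best_first = sorted(avg, key=avg.get)
```

The same procedure applied to the transposed table would give the ordering of classifiers by their average rank over all pre-processing combinations.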
