This article is part of the supplement: Selected articles from the IEEE International Conference on Bioinformatics and Biomedicine 2011: Proteome Science

Open Access Proceedings

Intrinsic disorder in putative protein sequences

Uros Midic1* and Zoran Obradovic2

Author Affiliations

1 Fels Institute for Cancer Research & Molecular Biology, Temple University School of Medicine, 3307 N. Broad St, Philadelphia, PA 19140, USA

2 Center for Data Analytics and Biomedical Informatics, Temple University, Room 303 Wachman Hall, 1805 N. Broad St, Philadelphia, PA 19122, USA

Proteome Science 2012, 10(Suppl 1):S19  doi:10.1186/1477-5956-10-S1-S19

Published: 21 June 2012

Abstract

Background

Intrinsically disordered proteins (IDPs) and regions (IDRs) perform a variety of crucial biological functions despite lacking stable tertiary structure under physiological conditions in vitro. State-of-the-art sequence-based predictors of intrinsic disorder achieve per-residue accuracies over 80%. In a genome-wide study of intrinsic disorder in the human genome, we observed a large difference in predicted disorder content between confirmed and putative human proteins. We investigated the hypothesis that this discrepancy is an artifact: that incorrectly annotated parts of the putative protein sequences exhibit some similarities to confirmed IDRs, which leads to high predicted disorder content.

Methods

To test this hypothesis, we trained a predictor to discriminate sequences of real proteins from synthetic sequences that mimic the errors of gene-finding algorithms. We developed a procedure to create synthetic peptide sequences by translating non-coding regions of genomic sequences and by translating coding regions with incorrect codon alignment.
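As an illustration of what "translation with incorrect codon alignment" can mean, the following minimal sketch translates a DNA string in a chosen reading frame using the standard genetic code. It is not the authors' procedure: the function name `translate`, the `frame` parameter, and the toy sequence are assumptions made for this example, and the abstract does not say how stop codons or partial codons were handled.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order (assumption: no
# organism-specific code table is needed for this illustration).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}


def translate(dna: str, frame: int = 0) -> str:
    """Translate `dna` starting at offset `frame` (0, 1, or 2).

    Stop codons are emitted as '*'. A trailing partial codon is ignored.
    Codons containing characters other than T, C, A, G become 'X'.
    """
    if frame not in (0, 1, 2):
        raise ValueError("frame must be 0, 1, or 2")
    dna = dna.upper()
    peptide = []
    for i in range(frame, len(dna) - 2, 3):
        peptide.append(CODON_TABLE.get(dna[i:i + 3], "X"))
    return "".join(peptide)


# Toy coding sequence (hypothetical, for illustration only).
cds = "ATGGCCAAATGA"
in_frame = translate(cds, frame=0)   # correct codon alignment
shifted_1 = translate(cds, frame=1)  # misaligned by one nucleotide
shifted_2 = translate(cds, frame=2)  # misaligned by two nucleotides
```

Applying the same function to a non-coding stretch of genomic DNA would give the other kind of synthetic sequence mentioned above; how such stretches are selected is left open here.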

Results

Application of the developed predictor to putative human protein sequences showed that they contain a substantial fraction of incorrectly assigned regions. These regions are predicted to have higher levels of disorder content than correctly assigned regions. This partially, albeit not completely, explains the observed discrepancy in predicted disorder content between confirmed and putative human proteins.
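The comparison described here can be sketched as follows, assuming that disorder content is the fraction of residues whose per-residue disorder score reaches a threshold, and that each residue also carries a flag saying whether it lies in a region predicted to be incorrectly assigned. The 0.5 threshold, the function names, and the toy values are assumptions for illustration, not data or code from the study.

```python
from typing import Sequence, Tuple


def disorder_content(scores: Sequence[float], threshold: float = 0.5) -> float:
    """Fraction of residues with a disorder score >= threshold (0.0 if empty)."""
    if not scores:
        return 0.0
    return sum(s >= threshold for s in scores) / len(scores)


def content_by_region(
    scores: Sequence[float],
    flagged: Sequence[bool],
    threshold: float = 0.5,
) -> Tuple[float, float]:
    """Return (disorder content of flagged residues, of unflagged residues)."""
    if len(scores) != len(flagged):
        raise ValueError("scores and flagged must have the same length")
    in_flagged = [s for s, f in zip(scores, flagged) if f]
    in_rest = [s for s, f in zip(scores, flagged) if not f]
    return (
        disorder_content(in_flagged, threshold),
        disorder_content(in_rest, threshold),
    )


# Toy per-residue values (hypothetical, for illustration only).
toy_scores = [0.9, 0.8, 0.2, 0.1, 0.7, 0.3]
toy_flagged = [True, True, False, False, True, False]
flagged_content, rest_content = content_by_region(toy_scores, toy_flagged)
```

Computing the two fractions separately shows whether the flagged regions account for the excess predicted disorder, which is the kind of comparison the results summarize.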

Conclusions

Our findings provide the first evidence that the current practice of predicting disorder content in putative sequences should be reconsidered, as such estimates may be biased.