<?xml version='1.0'?>
<!DOCTYPE art SYSTEM 'http://www.biomedcentral.com/xml/article.dtd'>
<art>
   <ui>1477-5956-5-20</ui>
   <ji>1477-5956</ji>
   <fm>
      <dochead>Research</dochead>
      <bibl>
         <title>
            <p>Surface antigens and potential virulence factors from parasites detected by comparative genomics of perfect amino acid repeats</p>
         </title>
         <aug>
            <au id="A1">
               <snm>Fankhauser</snm>
               <fnm>Niklaus</fnm>
               <insr iid="I1"/>
               <email>niklaus.fankhauser@izb.unibe.ch</email>
            </au>
            <au id="A2">
               <snm>Nguyen-Ha</snm>
               <fnm>Tien-Minh</fnm>
               <insr iid="I1"/>
               <email>nguyenha@hispeed.ch</email>
            </au>
            <au id="A3">
               <snm>Adler</snm>
               <fnm>Jo&#235;l</fnm>
               <insr iid="I2"/>
               <email>joel.adler@phbern.ch</email>
            </au>
            <au id="A4" ca="yes">
               <snm>M&#228;ser</snm>
               <fnm>Pascal</fnm>
               <insr iid="I1"/>
               <email>pascal.maeser@izb.unibe.ch</email>
            </au>
         </aug>
         <insg>
            <ins id="I1">
               <p>University of Bern, Institute of Cell Biology, Baltzerstrasse 4, CH-3012 Bern, Switzerland</p>
            </ins>
            <ins id="I2">
               <p>P&#228;dagogische Hochschule Bern, Gertrud Woker Strasse 5, CH-3012 Bern, Switzerland</p>
            </ins>
         </insg>
         <source>Proteome Science</source>
         <issn>1477-5956</issn>
         <pubdate>2007</pubdate>
         <volume>5</volume>
         <issue>1</issue>
         <fpage>20</fpage>
         <url>http://www.proteomesci.com/content/5/1/20</url>
         <xrefbib>
            <pubidlist>
               <pubid idtype="pmpid">18096064</pubid>
               <pubid idtype="doi">10.1186/1477-5956-5-20</pubid>
            </pubidlist>
         </xrefbib>
      </bibl>
      <history>
         <rec>
            <date>
               <day>28</day>
               <month>6</month>
               <year>2007</year>
            </date>
         </rec>
         <acc>
            <date>
               <day>20</day>
               <month>12</month>
               <year>2007</year>
            </date>
         </acc>
         <pub>
            <date>
               <day>20</day>
               <month>12</month>
               <year>2007</year>
            </date>
         </pub>
      </history>
      <cpyrt>
         <year>2007</year>
         <collab>Fankhauser et al; licensee BioMed Central Ltd.</collab>
         <note>This is an Open Access article distributed under the terms of the Creative Commons Attribution License (<url>http://creativecommons.org/licenses/by/2.0</url>), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.</note>
      </cpyrt>
      <abs>
         <sec>
            <st>
               <p>Abstract</p>
            </st>
            <sec>
               <st>
                  <p>Background</p>
               </st>
               <p>Many parasitic organisms, eukaryotes as well as bacteria, possess surface antigens with amino acid repeats. Making up the interface between host and pathogen, such repetitive proteins may be virulence factors involved in immune evasion or cytoadherence. They find immunological applications in serodiagnostics and vaccine development. Here we use proteins that contain perfect repeats as a basis for comparative genomics between parasitic and free-living organisms.</p>
            </sec>
            <sec>
               <st>
                  <p>Results</p>
               </st>
               <p>We have developed Reptile (<url>http://reptile.unibe.ch</url>), a program for proteome-wide probabilistic description of perfect repeats in proteins. Parasite proteomes varied widely in their proportion of repeat-containing proteins. Interestingly, the percentage of highly repetitive proteins correlated well with mean protein length in parasite proteomes, but not at all in the proteomes of free-living eukaryotes. Reptile combined with programs for the prediction of transmembrane domains and GPI-anchoring resulted in an effective tool for in silico identification of potential surface antigens and virulence factors from parasites.</p>
            </sec>
            <sec>
               <st>
                  <p>Conclusion</p>
               </st>
               <p>Systematic surveys for perfect amino acid repeats allowed basic comparisons between free-living and parasitic organisms that were directly applicable to the prediction of proteins of serological and parasitological importance. An on-line tool is available at <url>http://genomics.unibe.ch/dora</url>.</p>
            </sec>
         </sec>
      </abs>
   </fm>
   <meta>
      <classifications>
         <classification type="bmc" subtype="user_supplied_xml" id="endnote"/>
      </classifications>
   </meta>
   <bdy>
      <sec>
         <st>
            <p>Background</p>
         </st>
         <p>Repetitive amino acid subsequences in polypeptides are of interest regarding the function as well as the evolution of proteins. At least 14% of all proteins contain internal repeats, the proportion being somewhat lower in prokaryote and higher in eukaryote proteomes <abbrgrp><abbr bid="B1">1</abbr></abbrgrp>. Multicellular eukaryotes, in particular, possess numerous adhesion proteins of repetitive nature in the extracellular matrix. Other highly repetitive proteins are those of the cytoskeleton <abbrgrp><abbr bid="B1">1</abbr><abbr bid="B2">2</abbr></abbrgrp>. Typical motifs involved in protein-protein interaction are the tetratricopeptide repeat (34 aa), armadillo (47 aa), ankyrin (33 aa), and the leucine-rich repeat (about 20 aa) <abbrgrp><abbr bid="B3">3</abbr></abbrgrp>. Several tools are available for the detection of repeats in proteins: Radar <abbrgrp><abbr bid="B4">4</abbr><abbr bid="B5">5</abbr></abbrgrp>, Repro <abbrgrp><abbr bid="B6">6</abbr><abbr bid="B7">7</abbr></abbrgrp>, Internal Repeats Finder <abbrgrp><abbr bid="B8">8</abbr><abbr bid="B9">9</abbr></abbrgrp>, TRIPS <abbrgrp><abbr bid="B10">10</abbr><abbr bid="B11">11</abbr></abbrgrp>, Trust <abbrgrp><abbr bid="B12">12</abbr><abbr bid="B13">13</abbr></abbrgrp>, Davros <abbrgrp><abbr bid="B14">14</abbr></abbrgrp>, RepSeq <abbrgrp><abbr bid="B15">15</abbr><abbr bid="B16">16</abbr></abbrgrp>, REP <abbrgrp><abbr bid="B2">2</abbr><abbr bid="B17">17</abbr></abbrgrp>, Repper <abbrgrp><abbr bid="B18">18</abbr><abbr bid="B19">19</abbr></abbrgrp>, and ProtRepeatsDB <abbrgrp><abbr bid="B20">20</abbr><abbr bid="B21">21</abbr></abbrgrp>. Apart from simply counting repetitive occurrences of amino acid subsequences in polypeptides, repeats can be detected by self-alignment or &#8211; if they are evenly distributed &#8211; by Fourier transform. Here we present Reptile, a simple tool for quantitative proteome-wide surveys of perfect amino acid repeats, and its use for the prediction of surface antigens and virulence factors from parasites.</p>
         <p>Pathogenic bacteria as well as eukaryotic parasites often possess surface proteins of repetitive nature, presumably to protect themselves against their hosts' defence responses <abbrgrp><abbr bid="B22">22</abbr><abbr bid="B23">23</abbr></abbrgrp>. Examples are the procyclins of the sleeping sickness parasite <it>Trypanosoma brucei </it>with either over twenty Glu-Pro (EP-type) or five Gly-Pro-Glu-Glu-Thr (GPEET-type) repeats <abbrgrp><abbr bid="B24">24</abbr><abbr bid="B25">25</abbr></abbrgrp>, the circumsporozoite protein of the malaria parasite <it>Plasmodium falciparum </it>with around forty Asn-Ala-Asn-Pro (NANP) repeats <abbrgrp><abbr bid="B26">26</abbr></abbrgrp>, or SdrE from <it>Staphylococcus aureus</it>, a determinant of staphylococcal sepsis with 83 Ser-Glu (SE) repeats <abbrgrp><abbr bid="B27">27</abbr></abbrgrp>. Such short, perfect repeats are usually very immunogenic. They may serve for serological diagnostics &#8211; the presence of repeat-directed antibodies in the serum indicating infection &#8211; as is the case with PfHRP2 <abbrgrp><abbr bid="B28">28</abbr></abbrgrp>, a malaria antigen with over fifty Ala-His-His (AHH) repeats. Repetitive amino acid sequences also find applications in synthetic vaccines <abbrgrp><abbr bid="B29">29</abbr></abbrgrp>. Furthermore, repeat-containing proteins from parasites may be virulence factors involved in immune evasion, cytoadherence, stress resistance, or biofilm formation <abbrgrp><abbr bid="B30">30</abbr><abbr bid="B31">31</abbr><abbr bid="B32">32</abbr><abbr bid="B33">33</abbr><abbr bid="B34">34</abbr><abbr bid="B35">35</abbr></abbrgrp>. The completion of the genome sequencing projects for <it>P. falciparum</it>, <it>T. brucei</it>, <it>Leishmania major</it>, and other parasites now permits systematic approaches to repeat-containing proteins. Here we identify all proteins from pathogens that contain repeats and use them for comparative genomics between parasitic and non-parasitic species. All data and programs are freely accessible via the world-wide web.</p>
      </sec>
      <sec>
         <st>
            <p>Results and Discussion</p>
         </st>
         <sec>
            <st>
               <p>Probabilistic description of perfect repeats with Reptile</p>
            </st>
            <p>In order to scan whole proteomes for repeat-containing proteins, we created the tool Reptile. It uses a "brute-force" algorithm that detects all perfect repeats and enables direct calculation of a P-value. For each input sequence, Reptile generates all possible substrings from length 2 to a user-defined maximum (the default is 20) and counts their occurrences. After removing redundant repeats that are contained within longer ones, the repeated sequences are returned in order of ascending P-value. The probability P of finding at least n repeats of length r in a random sequence of length L (with nr &#8804; L &#8804; n20<sup>r</sup>) is approximated by P*, the number of possible sequences that contain the desired repeat divided by the total number of possible sequences (20<sup>L</sup>).</p>
            <p>
               <display-formula id="M1">
                  <m:math name="1477-5956-5-20-i1" xmlns:m="http://www.w3.org/1998/Math/MathML">
                     <m:semantics>
                        <m:mrow>
                           <m:msup>
                              <m:mtext>P</m:mtext>
                              <m:mo>&#8727;</m:mo>
                           </m:msup>
                           <m:mo stretchy="false">(</m:mo>
                           <m:mtext>n,r,L</m:mtext>
                           <m:mo stretchy="false">)</m:mo>
                           <m:mo>=</m:mo>
                           <m:mfrac>
                              <m:mrow>
                                 <m:msup>
                                    <m:mrow>
                                       <m:mn>20</m:mn>
                                    </m:mrow>
                                    <m:mtext>r</m:mtext>
                                 </m:msup>
                                 <m:msup>
                                    <m:mrow>
                                       <m:mn>20</m:mn>
                                    </m:mrow>
                                    <m:mrow>
                                       <m:mtext>L-nr</m:mtext>
                                    </m:mrow>
                                 </m:msup>
                              </m:mrow>
                              <m:mrow>
                                 <m:msup>
                                    <m:mrow>
                                       <m:mn>20</m:mn>
                                    </m:mrow>
                                    <m:mtext>L</m:mtext>
                                 </m:msup>
                              </m:mrow>
                           </m:mfrac>
                           <m:mrow>
                              <m:mo>(</m:mo>
                              <m:mrow>
                                 <m:mtable>
                                    <m:mtr>
                                       <m:mtd>
                                          <m:mrow>
                                             <m:mtext>L-nr+n</m:mtext>
                                          </m:mrow>
                                       </m:mtd>
                                    </m:mtr>
                                    <m:mtr>
                                       <m:mtd>
                                          <m:mtext>n</m:mtext>
                                       </m:mtd>
                                    </m:mtr>
                                 </m:mtable>
                              </m:mrow>
                              <m:mo>)</m:mo>
                           </m:mrow>
                           <m:mo>=</m:mo>
                           <m:msup>
                              <m:mrow>
                                 <m:mn>20</m:mn>
                              </m:mrow>
                              <m:mrow>
                                 <m:mtext>-r</m:mtext>
                                 <m:mo stretchy="false">(</m:mo>
                                 <m:mtext>n-</m:mtext>
                                 <m:mn>1</m:mn>
                                 <m:mo stretchy="false">)</m:mo>
                              </m:mrow>
                           </m:msup>
                           <m:mrow>
                              <m:mo>(</m:mo>
                              <m:mrow>
                                 <m:mtable>
                                    <m:mtr>
                                       <m:mtd>
                                          <m:mrow>
                                             <m:mtext>L-n</m:mtext>
                                             <m:mo stretchy="false">(</m:mo>
                                             <m:mtext>r-</m:mtext>
                                             <m:mn>1</m:mn>
                                             <m:mo stretchy="false">)</m:mo>
                                          </m:mrow>
                                       </m:mtd>
                                    </m:mtr>
                                    <m:mtr>
                                       <m:mtd>
                                          <m:mtext>n</m:mtext>
                                       </m:mtd>
                                    </m:mtr>
                                 </m:mtable>
                              </m:mrow>
                              <m:mo>)</m:mo>
                           </m:mrow>
                        </m:mrow>
                     </m:semantics>
                  </m:math>
               </display-formula>
            </p>
            <p>Where 20<sup>r </sup>is the number of possible repeat sequences, 20<sup>L-nr </sup>the number of possible sequences around the repeats, and the binomial coefficient the number of ways to place the n repeats in a sequence of length L. P* is an overestimate because sequences with more than n repeats are counted more than once. Correcting for this gives the exact formula for P:</p>
            <p>
               <display-formula id="M2">
                  <graphic file="1477-5956-5-20-i2.gif"/>
               </display-formula>
            </p>
            <p>Where <it>i </it>counts from n to the maximal number of repeats (L/r), switching signs with every increment according to the inclusion-exclusion principle <abbrgrp><abbr bid="B36">36</abbr></abbrgrp>. For practical purposes, calculating P*, the first summand of P, is sufficient since further summands decrease rapidly with increasing number of repeats. Reptile returns all repeats below a user-defined cut-off P-value (the default is 10<sup>-5</sup>, corresponding to an expectation of one false positive in 100,000 sequences). Direct repeats are marked. Because the P-value is independent of the actual sequence of a repeat, Reptile also returns a measure of whether a detected repeat consists of rare or frequent amino acids. This "amino acid abundance measure" (AM) is defined as follows:</p>
            <p>
               <display-formula id="M3">
                  <m:math name="1477-5956-5-20-i3" xmlns:m="http://www.w3.org/1998/Math/MathML">
                     <m:semantics>
                        <m:mrow>
                           <m:mtext>AM</m:mtext>
                           <m:mo stretchy="false">(</m:mo>
                           <m:mtext>repeat</m:mtext>
                           <m:mo stretchy="false">)</m:mo>
                           <m:mo>=</m:mo>
                           <m:msub>
                              <m:mrow>
                                 <m:mi>log</m:mi>
                                 <m:mo>&#8289;</m:mo>
                              </m:mrow>
                              <m:mrow>
                                 <m:mn>10</m:mn>
                              </m:mrow>
                           </m:msub>
                           <m:mo stretchy="false">(</m:mo>
                           <m:msup>
                              <m:mrow>
                                 <m:mn>20</m:mn>
                              </m:mrow>
                              <m:mtext>r</m:mtext>
                           </m:msup>
                           <m:mstyle displaystyle="true">
                              <m:munderover>
                                 <m:mo>&#8719;</m:mo>
                                 <m:mrow>
                                    <m:mtext>i=1</m:mtext>
                                 </m:mrow>
                                 <m:mtext>r</m:mtext>
                              </m:munderover>
                              <m:mrow>
                                 <m:msub>
                                    <m:mi>f</m:mi>
                                    <m:mtext>i</m:mtext>
                                 </m:msub>
                              </m:mrow>
                           </m:mstyle>
                           <m:mo stretchy="false">)</m:mo>
                        </m:mrow>
                     </m:semantics>
                  </m:math>
               </display-formula>
            </p>
            <p>Where r is the length of the repeat and <it>f</it><sub>i </sub>is the frequency of the amino acid at position i of the repeat in the corresponding proteome (or in the set of sequences submitted by the user). AM is symmetric about zero: negative values indicate that a repeat consists predominantly of rare amino acids, positive values that it consists predominantly of frequent ones. Reptile is available on-line <abbrgrp><abbr bid="B37">37</abbr></abbrgrp> and accepts batch input of up to 50,000 sequences in any of the commonly used formats.</p>
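            <p>As an illustration only, the two quantities defined above can be evaluated in a few lines of Python. This is a minimal sketch and not the Reptile source code: the function names log10_p_star and abundance_measure are hypothetical, P* is computed in log space (an assumption made here to avoid floating-point underflow for long repeats), and the amino acid frequencies are assumed to be supplied as a dictionary keyed by one-letter code.</p>
            <p><![CDATA[
```python
from math import comb, log10


def log10_p_star(n, r, L):
    """log10 of P*(n, r, L) = 20**(-r*(n-1)) * C(L - n*(r-1), n).

    n: number of repeats, r: repeat length, L: sequence length.
    P* is an upper bound on P and is only informative for n >= 2.
    """
    if n * r > L:
        # n non-overlapping repeats of length r do not fit into L residues
        return float("-inf")
    placements = comb(L - n * (r - 1), n)  # ways to place the n repeats
    return log10(placements) - r * (n - 1) * log10(20)


def abundance_measure(repeat, freq):
    """AM(repeat) = log10(20**r * product of f_i over the r positions).

    freq maps one-letter amino acid codes to their frequency in the
    proteome (or in the user-supplied set of sequences).
    """
    r = len(repeat)
    return r * log10(20) + sum(log10(freq[aa]) for aa in repeat)
```
]]></p>
            <p>With uniform frequencies (<it>f</it><sub>i </sub>= 1/20 for every residue) AM evaluates to zero, and for n = 2, r = 2, L = 4 the sketch returns log<sub>10</sub>(20<sup>-2</sup>), since exactly 400 of the 20<sup>4</sup> sequences of length four consist of the same dipeptide twice.</p>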
            <p>Compared to other repeat-prediction programs (Table <tblr tid="T1">1</tblr>), the main strengths of Reptile are its quantitative assessment of the detected repeats and its exhaustive detection of short perfect repeats such as those that occur in antigens from parasites. Reptile will spot all recurring subsequences in a given protein, from length two up to the chosen maximum (twenty by default), even if they are dispersed. In contrast to programs implementing self-alignment, however, Reptile does not properly recognize degenerate repeats. Proteins harbouring degenerate repeats also exhibit low P-values and will not go unnoticed, but Reptile will report several shorter perfect units contained within the repeat rather than the basic repetitive unit itself. Other programs (Table <tblr tid="T1">1</tblr>) should be used when studying large repeat regions or imperfect, diverging repeats.</p>
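            <p>The brute-force search described above can be sketched as follows. This is an illustration under stated assumptions rather than the Reptile implementation: the function names perfect_repeats and drop_contained are hypothetical, occurrences are counted without overlap (in keeping with the condition nr &#8804; L), and the rule used to discard repeats contained within longer ones is only one plausible reading of the redundancy filter.</p>
            <p><![CDATA[
```python
from collections import Counter


def perfect_repeats(seq, max_len=20, min_len=2):
    """Return {substring: count} for substrings occurring at least twice.

    Counts are non-overlapping occurrences (an assumption of this sketch).
    """
    seen = Counter()
    for r in range(min_len, min(max_len, len(seq)) + 1):
        for start in range(len(seq) - r + 1):
            seen[seq[start:start + r]] += 1  # word count in a hash table
    # str.count returns the number of non-overlapping occurrences
    counts = {word: seq.count(word) for word, k in seen.items() if k >= 2}
    return {word: n for word, n in counts.items() if n >= 2}


def drop_contained(repeats):
    """Assumed redundancy rule: drop a word that is contained in a longer
    word repeated at least as often."""
    return {
        w: n
        for w, n in repeats.items()
        if not any(
            len(v) > len(w) and w in v and m >= n
            for v, m in repeats.items()
        )
    }
```
]]></p>
            <p>Applied to a sequence of five consecutive GPEET units, this sketch counts GPEET five times and discards the dipeptide GP, which occurs equally often but is contained within GPEET.</p>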
            <tbl id="T1">
               <title>
                  <p>Table 1</p>
               </title>
               <caption>
                  <p>Comparison of programs for the detection of repetitive subsequences in proteins</p>
               </caption>
               <tblbdy cols="7">
                  <r>
                     <c ca="left">
                        <p>
                           <b>Program</b>
                        </p>
                     </c>
                     <c ca="left">
                        <p>
                           <b>Method used</b>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <b>Detection of degenerate repeats</b>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <b>Calculation of a P-value</b>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <b>Analysis of whole proteomes</b>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <b>Hits found in SwissProt (%)</b>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <b>Detection of <it>T. brucei </it>procyclin</b>
                           <sup>1</sup>
                        </p>
                     </c>
                  </r>
                  <r>
                     <c cspan="7">
                        <hr/>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Reptile</p>
                     </c>
                     <c ca="left">
                        <p>Hashing<sup>2</sup></p>
                     </c>
                     <c ca="center">
                        <p>No</p>
                     </c>
                     <c ca="center">
                        <p>Yes</p>
                     </c>
                     <c ca="center">
                        <p>Yes</p>
                     </c>
                     <c ca="center">
                        <p>15<sup>3</sup></p>
                     </c>
                     <c ca="center">
                        <p>Yes</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>REP [2]</p>
                     </c>
                     <c ca="left">
                        <p>Profiles of known repeats</p>
                     </c>
                     <c ca="center">
                        <p>Yes</p>
                     </c>
                     <c ca="center">
                        <p>No</p>
                     </c>
                     <c ca="center">
                        <p>No</p>
                     </c>
                     <c ca="center">
                        <p>1.1</p>
                     </c>
                     <c ca="center">
                        <p>No</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>RADAR [5]</p>
                     </c>
                     <c ca="left">
                        <p>Alignment</p>
                     </c>
                     <c ca="center">
                        <p>Yes</p>
                     </c>
                     <c ca="center">
                        <p>No</p>
                     </c>
                     <c ca="center">
                        <p>No</p>
                     </c>
                     <c ca="center">
                        <p>28</p>
                     </c>
                     <c ca="center">
                        <p>Yes</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>REPRO [7]</p>
                     </c>
                     <c ca="left">
                        <p>Alignment</p>
                     </c>
                     <c ca="center">
                        <p>Yes</p>
                     </c>
                     <c ca="center">
                        <p>No</p>
                     </c>
                     <c ca="center">
                        <p>No</p>
                     </c>
                     <c ca="center">
                        <p>n.a.</p>
                     </c>
                     <c ca="center">
                        <p>Yes</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Internal Repeats Finder [8]</p>
                     </c>
                     <c ca="left">
                        <p>Alignment</p>
                     </c>
                     <c ca="center">
                        <p>Yes</p>
                     </c>
                     <c ca="center">
                        <p>Yes</p>
                     </c>
                     <c ca="center">
                        <p>No</p>
                     </c>
                     <c ca="center">
                        <p>14</p>
                     </c>
                     <c ca="center">
                        <p>No</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>TRIPS [9]</p>
                     </c>
                     <c ca="left">
                        <p>Fourier transform</p>
                     </c>
                     <c ca="center">
                        <p>Yes</p>
                     </c>
                     <c ca="center">
                        <p>No</p>
                     </c>
                     <c ca="center">
                        <p>No</p>
                     </c>
                     <c ca="center">
                        <p>12</p>
                     </c>
                     <c ca="center">
                        <p>No</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>RepSeq [10]</p>
                     </c>
                     <c ca="left">
                        <p>Hashing</p>
                     </c>
                     <c ca="center">
                        <p>Yes</p>
                     </c>
                     <c ca="center">
                        <p>Yes</p>
                     </c>
                     <c ca="center">
                        <p>Yes</p>
                     </c>
                     <c ca="center">
                        <p>n.a.</p>
                     </c>
                     <c ca="center">
                        <p>Yes</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>ProtRepeatsDB [11]</p>
                     </c>
                     <c ca="left">
                        <p>Mixed</p>
                     </c>
                     <c ca="center">
                        <p>Yes</p>
                     </c>
                     <c ca="center">
                        <p>Yes</p>
                     </c>
                     <c ca="center">
                        <p>Yes</p>
                     </c>
                     <c ca="center">
                        <p>n.a.</p>
                     </c>
                     <c ca="center">
                        <p>Yes</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Repper [12]</p>
                     </c>
                     <c ca="left">
                        <p>Fourier transform</p>
                     </c>
                     <c ca="center">
                        <p>Yes</p>
                     </c>
                     <c ca="center">
                        <p>No</p>
                     </c>
                     <c ca="center">
                        <p>No</p>
                     </c>
                     <c ca="center">
                        <p>n.a.</p>
                     </c>
                     <c ca="center">
                        <p>No</p>
                     </c>
                  </r>
               </tblbdy>
               <tblfn>
                  <p><sup>1</sup>The <it>T. brucei </it>surface protein (GenBank accession <ext-link ext-link-type="gen" ext-link-id="AAK62893">AAK62893</ext-link>) with five GPEET repeats [25] was used for benchmarking.</p>
                  <p><sup>2</sup>Word count using a hash table.</p>
                  <p><sup>3</sup>Using P &lt; 0.001 (same as for Internal Repeats Finder).</p>
               </tblfn>
            </tbl>
         </sec>
         <sec>
            <st>
               <p>Genome-wide surveys for highly repetitive proteins</p>
            </st>
            <p>We defined highly repetitive proteins as proteins that contain perfect repeats with a P-value below 10<sup>-10</sup>. Reptile was used to screen for such proteins in predicted proteomes from fully sequenced genomes. The median proportion of highly repetitive proteins was 2.7% in eukaryote proteomes and 0.43% in prokaryotes, confirming the notion <abbrgrp><abbr bid="B1">1</abbr></abbrgrp> that eukaryotes possess more repetitive proteins than bacteria (p &lt; 0.0001, Mann-Whitney test). Proteins with more repeats tend to be longer. In eukaryotic proteomes, the percentage of highly repetitive proteins correlated to some degree with the mean protein length (Spearman coefficient r<sub>S </sub>= 0.51, p = 0.011). When distinguishing free-living from (endo)parasitic eukaryotes (Table <tblr tid="T2">2</tblr>), it was evident that the correlation was caused entirely by the latter. Obligate parasites exhibited a good correlation between the percentage of highly repetitive proteins and mean protein length (r<sub>S </sub>= 0.82, p = 0.003), while free-living eukaryotes showed no correlation at all (Figure <figr fid="F1">1</figr>). The finding that the percentage of highly repetitive proteins predicts average protein length only in parasite proteomes reflects the significance of repeat-containing proteins for survival in the host, possibly counterbalanced by a selective pressure on parasites for shorter proteins <abbrgrp><abbr bid="B38">38</abbr></abbrgrp>.</p>
            <tbl id="T2">
               <title>
                  <p>Table 2</p>
               </title>
               <caption>
                  <p>Eukaryotic proteomes analyzed</p>
               </caption>
               <tblbdy cols="4">
                  <r>
                     <c ca="left">
                        <p>
                           <b>Organism</b>
                        </p>
                     </c>
                     <c ca="left">
                        <p>
                           <b>Kingdom</b>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <b>Type</b>
                        </p>
                     </c>
                     <c ca="right">
                        <p>
                           <b>Proteins</b>
                        </p>
                     </c>
                  </r>
                  <r>
                     <c cspan="4">
                        <hr/>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>
                           <it>Homo sapiens</it>
                        </p>
                     </c>
                     <c ca="left">
                        <p>Metazoa</p>
                     </c>
                     <c ca="center">
                        <p>F</p>
                     </c>
                     <c ca="right">
                        <p>38220</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>
                           <it>Mus musculus</it>
                        </p>
                     </c>
                     <c ca="left">
                        <p>Metazoa</p>
                     </c>
                     <c ca="center">
                        <p>F</p>
                     </c>
                     <c ca="right">
                        <p>35593</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>
                           <it>Arabidopsis thaliana</it>
                        </p>
                     </c>
                     <c ca="left">
                        <p>Viridiplantae</p>
                     </c>
                     <c ca="center">
                        <p>F</p>
                     </c>
                     <c ca="right">
                        <p>34554</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>
                           <it>Caenorhabditis elegans</it>
                        </p>
                     </c>
                     <c ca="left">
                        <p>Metazoa</p>
                     </c>
                     <c ca="center">
                        <p>F</p>
                     </c>
                     <c ca="right">
                        <p>22431</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>
                           <it>Drosophila melanogaster</it>
                        </p>
                     </c>
                     <c ca="left">
                        <p>Metazoa</p>
                     </c>
                     <c ca="center">
                        <p>F</p>
                     </c>
                     <c ca="right">
                        <p>16239</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>
                           <it>Brachydanio rerio</it>
                        </p>
                     </c>
                     <c ca="left">
                        <p>Metazoa</p>
                     </c>
                     <c ca="center">
                        <p>F</p>
                     </c>
                     <c ca="right">
                        <p>15647</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>
                           <it>Anopheles gambiae</it>
                        </p>
                     </c>
                     <c ca="left">
                        <p>Metazoa</p>
                     </c>
                     <c ca="center">
                        <p>F</p>
                     </c>
                     <c ca="right">
                        <p>13486</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>
                           <it>Dictyostelium discoideum</it>
                        </p>
                     </c>
                     <c ca="left">
                        <p>Protozoa</p>
                     </c>
                     <c ca="center">
                        <p>F</p>
                     </c>
                     <c ca="right">
                        <p>13017</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>
                           <it>Rattus norvegicus</it>
                        </p>
                     </c>
                     <c ca="left">
                        <p>Metazoa</p>
                     </c>
                     <c ca="center">
                        <p>F</p>
                     </c>
                     <c ca="right">
                        <p>11987</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>
                           <it>Yarrowia lipolytica</it>
                        </p>
                     </c>
                     <c ca="left">
                        <p>Fungi</p>
                     </c>
                     <c ca="center">
                        <p>F</p>
                     </c>
                     <c ca="right">
                        <p>6525</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>
                           <it>Saccharomyces cerevisiae</it>
                        </p>
                     </c>
                     <c ca="left">
                        <p>Fungi</p>
                     </c>
                     <c ca="center">
                        <p>F</p>
                     </c>
                     <c ca="right">
                        <p>5810</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>
                           <it>Kluyveromyces lactis</it>
                        </p>
                     </c>
                     <c ca="left">
                        <p>Fungi</p>
                     </c>
                     <c ca="center">
                        <p>F</p>
                     </c>
                     <c ca="right">
                        <p>5326</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>
                           <it>Schizosaccharomyces pombe</it>
                        </p>
                     </c>
                     <c ca="left">
                        <p>Fungi</p>
                     </c>
                     <c ca="center">
                        <p>F</p>
                     </c>
                     <c ca="right">
                        <p>5009</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>
                           <it>Entamoeba histolytica</it>
                        </p>
                     </c>
                     <c ca="left">
                        <p>Protozoa</p>
                     </c>
                     <c ca="center">
                        <p>P</p>
                     </c>
                     <c ca="right">
                        <p>9772</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>
                           <it>Giardia duodenalis</it>
                        </p>
                     </c>
                     <c ca="left">
                        <p>Protozoa</p>
                     </c>
                     <c ca="center">
                        <p>P</p>
                     </c>
                     <c ca="right">
                        <p>9646</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>
                           <it>Trypanosoma brucei</it>
                        </p>
                     </c>
                     <c ca="left">
                        <p>Protozoa</p>
                     </c>
                     <c ca="center">
                        <p>P</p>
                     </c>
                     <c ca="right">
                        <p>9210</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>
                           <it>Leishmania major</it>
                        </p>
                     </c>
                     <c ca="left">
                        <p>Protozoa</p>
                     </c>
                     <c ca="center">
                        <p>P</p>
                     </c>
                     <c ca="right">
                        <p>8010</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>
                           <it>Cryptococcus neoformans</it>
                        </p>
                     </c>
                     <c ca="left">
                        <p>Fungi</p>
                     </c>
                     <c ca="center">
                        <p>P</p>
                     </c>
                     <c ca="right">
                        <p>6569</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>
                           <it>Plasmodium falciparum</it>
                        </p>
                     </c>
                     <c ca="left">
                        <p>Protozoa</p>
                     </c>
                     <c ca="center">
                        <p>P</p>
                     </c>
                     <c ca="right">
                        <p>5283</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>
                           <it>Theileria parva</it>
                        </p>
                     </c>
                     <c ca="left">
                        <p>Protozoa</p>
                     </c>
                     <c ca="center">
                        <p>P</p>
                     </c>
                     <c ca="right">
                        <p>4071</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>
                           <it>Cryptosporidium hominis</it>
                        </p>
                     </c>
                     <c ca="left">
                        <p>Protozoa</p>
                     </c>
                     <c ca="center">
                        <p>P</p>
                     </c>
                     <c ca="right">
                        <p>3886</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>
                           <it>Theileria annulata</it>
                        </p>
                     </c>
                     <c ca="left">
                        <p>Protozoa</p>
                     </c>
                     <c ca="center">
                        <p>P</p>
                     </c>
                     <c ca="right">
                        <p>3790</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>
                           <it>Encephalitozoon cuniculi</it>
                        </p>
                     </c>
                     <c ca="left">
                        <p>Fungi</p>
                     </c>
                     <c ca="center">
                        <p>P</p>
                     </c>
                     <c ca="right">
                        <p>1909</p>
                     </c>
                  </r>
               </tblbdy>
               <tblfn>
                  <p>F, free-living; P, endoparasitic.</p>
               </tblfn>
            </tbl>
            <fig id="F1">
               <title>
                  <p>Figure 1</p>
               </title>
               <caption>
                  <p>Comparative genomics of repeat-containing proteins</p>
               </caption>
               <text>
                  <p><b>Comparative genomics of repeat-containing proteins</b>. Double logarithmic plot of the percentage of highly repetitive (P &lt; 10<sup>-10</sup>) proteins vs. mean protein length of eukaryotic proteomes. Ag, <it>A. gambiae</it>; At, <it>A. thaliana</it>; Br, <it>B. rerio</it>; Ce, <it>C. elegans</it>; Dd, <it>D. discoideum</it>; Dm, <it>D. melanogaster</it>; Hs, <it>H. sapiens</it>; Kl, <it>K. lactis</it>; Mm, <it>M. musculus</it>; Rn, <it>R. norvegicus</it>; Sc, <it>S. cerevisiae</it>; Sp, <it>S. pombe</it>; Yl, <it>Y. lipolytica</it>; Ch, <it>C. hominis</it>; Cn, <it>C. neoformans</it>; Ec, <it>E. cuniculi</it>; Eh, <it>E. histolytica</it>; Gd, <it>G. duodenalis</it>; Lm, <it>L. major</it>; Pf, <it>P. falciparum</it>; Ta, <it>T. annulata</it>; Tb, <it>T. brucei</it>; Tp, <it>T. parva</it>; r<sub>S</sub>, Spearman coefficient.</p>
               </text>
               <graphic file="1477-5956-5-20-1"/>
            </fig>
            <p>The eukaryote with the largest proportion of highly repetitive proteins, <it>Plasmodium falciparum </it>with 28%, and that with the smallest one, <it>Encephalitozoon cuniculi </it>with 0.42%, were both obligate parasites. The same applied to prokaryotes, where the highest proportions of highly repetitive proteins were exhibited by <it>Mycobacterium bovis </it>(3.0%), <it>M. tuberculosis </it>(2.9%) and <it>Parachlamydia </it>sp. (2.7%), and the lowest ones by <it>Bacillus anthracis </it>(Porton strain, 0.02%) and <it>Streptococcus pyogenes </it>(SSI strain, 0.05%). It must be noted, however, that the available bacterial genome sequences are biased towards pathogenic species. The most repetitive protein from eukaryotes was a hypothetical protein from the sleeping sickness parasite <it>T. brucei</it>, followed by the 11-1 gene product from <it>P. falciparum</it>, a known malaria antigen of more than 1 MDa in size <abbrgrp><abbr bid="B39">39</abbr></abbrgrp>. The most repetitive prokaryotic protein was a predicted cell wall surface anchor family member from <it>Streptococcus pneumoniae</it>, the leading cause of pneumonia. Table <tblr tid="T3">3</tblr> summarizes these and other highly repetitive proteins identified from pathogens, with emphasis on sequences with experimentally verified expression. The genome-wide surveys yielded other known virulence factors such as proteophosphoglycans of <it>Leishmania </it><abbrgrp><abbr bid="B40">40</abbr></abbrgrp> or PGRS (polymorphic GC-rich repetitive sequence) proteins of <it>Mycobacterium</it>, which are antituberculosis vaccine candidates <abbrgrp><abbr bid="B41">41</abbr></abbrgrp>. The presence of avirulence proteins from phytopathogenic bacteria among the most repetitive proteins suggests that repeats may also serve to specifically trigger host defence responses. The ice nucleation proteins of plant pathogens are also remarkably repetitive. Table <tblr tid="T3">3</tblr> also shows examples of previously undescribed proteins. The complete datasets on repeat-containing proteins from 49 eukaryotes and 193 prokaryotes are accessible online in the archive REPository <abbrgrp><abbr bid="B37">37</abbr></abbrgrp>.</p>
            <tbl id="T3">
               <title>
                  <p>Table 3</p>
               </title>
               <caption>
                  <p>A selection of the most repetitive proteins from pathogens</p>
               </caption>
               <tblbdy cols="5">
                  <r>
                     <c ca="left">
                        <p>
                           <b>Name, accession</b>
                        </p>
                     </c>
                     <c ca="left">
                        <p>
                           <b>Sp</b>
                        </p>
                     </c>
                     <c ca="right">
                        <p>
                           <b>L</b>
                        </p>
                     </c>
                     <c ca="left">
                        <p>
                           <b>Repeat</b>
                        </p>
                     </c>
                     <c ca="right">
                        <p>
                           <b>pP</b>
                        </p>
                     </c>
                  </r>
                  <r>
                     <c cspan="5">
                        <hr/>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Hypothetical protein, Tb927.1.1740</p>
                     </c>
                     <c ca="left">
                        <p>Tb</p>
                     </c>
                     <c ca="right">
                        <p>7154</p>
                     </c>
                     <c ca="left">
                        <p>132 &#215; LAEESQQHTARSEADIDE</p>
                     </c>
                     <c ca="right">
                        <p>2806</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Gene 11-1 protein*, Q8I6U6</p>
                     </c>
                     <c ca="left">
                        <p>Pf</p>
                     </c>
                     <c ca="right">
                        <p>10589</p>
                     </c>
                     <c ca="left">
                        <p>967 &#215; EEV</p>
                     </c>
                     <c ca="right">
                        <p>2457</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Conserved protein, LmjF29.0110</p>
                     </c>
                     <c ca="left">
                        <p>Lm</p>
                     </c>
                     <c ca="right">
                        <p>3418</p>
                     </c>
                     <c ca="left">
                        <p>146 &#215; AEEQARR</p>
                     </c>
                     <c ca="right">
                        <p>1080</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Proteophosphoglycan-like, LmjF35.0550</p>
                     </c>
                     <c ca="left">
                        <p>Lm</p>
                     </c>
                     <c ca="right">
                        <p>2425</p>
                     </c>
                     <c ca="left">
                        <p>105 &#215; SSSSSAPSA</p>
                     </c>
                     <c ca="right">
                        <p>1052</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Putative antigen*, Tb04.29M18.750</p>
                     </c>
                     <c ca="left">
                        <p>Tb</p>
                     </c>
                     <c ca="right">
                        <p>4455</p>
                     </c>
                     <c ca="left">
                        <p>66 &#215; NEQYETLQRTNAA</p>
                     </c>
                     <c ca="right">
                        <p>958</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Gb4*, Tb09.160.1200</p>
                     </c>
                     <c ca="left">
                        <p>Tb</p>
                     </c>
                     <c ca="right">
                        <p>8214</p>
                     </c>
                     <c ca="left">
                        <p>35 &#215; VVIIDCRLGSLLIDYKVI</p>
                     </c>
                     <c ca="right">
                        <p>701</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Hypothetical protein, Chro.50162</p>
                     </c>
                     <c ca="left">
                        <p>Ch</p>
                     </c>
                     <c ca="right">
                        <p>1589</p>
                     </c>
                     <c ca="left">
                        <p>84 &#215; KKDAP</p>
                     </c>
                     <c ca="right">
                        <p>407</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Hypothetical protein, Q8I455</p>
                     </c>
                     <c ca="left">
                        <p>Pf</p>
                     </c>
                     <c ca="right">
                        <p>2349</p>
                     </c>
                     <c ca="left">
                        <p>67 &#215; LKEEER</p>
                     </c>
                     <c ca="right">
                        <p>389</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Interspersed repeat antigen*, Q8I486</p>
                     </c>
                     <c ca="left">
                        <p>Pf</p>
                     </c>
                     <c ca="right">
                        <p>1720</p>
                     </c>
                     <c ca="left">
                        <p>67 &#215; QEPVT</p>
                     </c>
                     <c ca="right">
                        <p>313</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Putative antigen 332*, Q8IHN3</p>
                     </c>
                     <c ca="left">
                        <p>Pf</p>
                     </c>
                     <c ca="right">
                        <p>5507</p>
                     </c>
                     <c ca="left">
                        <p>144 &#215; EEI</p>
                     </c>
                     <c ca="right">
                        <p>274</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Cell wall surface anchor family, Q97P71</p>
                     </c>
                     <c ca="left">
                        <p>Spn</p>
                     </c>
                     <c ca="right">
                        <p>4776</p>
                     </c>
                     <c ca="left">
                        <p>1074 &#215; SAS</p>
                     </c>
                     <c ca="right">
                        <p>3418</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Cell surface SD repeat protein, Q88XB6</p>
                     </c>
                     <c ca="left">
                        <p>Lpl</p>
                     </c>
                     <c ca="right">
                        <p>3360</p>
                     </c>
                     <c ca="left">
                        <p>796 &#215; DS</p>
                     </c>
                     <c ca="right">
                        <p>1619</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Hypothetical protein, Q8E473</p>
                     </c>
                     <c ca="left">
                        <p>Sag</p>
                     </c>
                     <c ca="right">
                        <p>1310</p>
                     </c>
                     <c ca="left">
                        <p>106 &#215; TSAS</p>
                     </c>
                     <c ca="right">
                        <p>447</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Putative peptidoglycan-bound, Q8Y697</p>
                     </c>
                     <c ca="left">
                        <p>Lmo</p>
                     </c>
                     <c ca="right">
                        <p>903</p>
                     </c>
                     <c ca="left">
                        <p>78 &#215; ADADA</p>
                     </c>
                     <c ca="right">
                        <p>403</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Avirulence protein, Q5GYF3</p>
                     </c>
                     <c ca="left">
                        <p>Xor</p>
                     </c>
                     <c ca="right">
                        <p>1790</p>
                     </c>
                     <c ca="left">
                        <p>20 &#215; ETVQRLLPVLCQDHGLTP</p>
                     </c>
                     <c ca="right">
                        <p>401</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Serine/threonine-rich antigen, Q99QY4</p>
                     </c>
                     <c ca="left">
                        <p>Sau</p>
                     </c>
                     <c ca="right">
                        <p>2271</p>
                     </c>
                     <c ca="left">
                        <p>163 &#215; STS</p>
                     </c>
                     <c ca="right">
                        <p>391</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>PE-PGRS family, PG54_MYCTU</p>
                     </c>
                     <c ca="left">
                        <p>Mtu</p>
                     </c>
                     <c ca="right">
                        <p>1901</p>
                     </c>
                     <c ca="left">
                        <p>136 &#215; GAG</p>
                     </c>
                     <c ca="right">
                        <p>326</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Structural toxin RtxA, Q5X7A6</p>
                     </c>
                     <c ca="left">
                        <p>Lpn</p>
                     </c>
                     <c ca="right">
                        <p>7679</p>
                     </c>
                     <c ca="left">
                        <p>29 &#215; RFEDDGPVV</p>
                     </c>
                     <c ca="right">
                        <p>247</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Ice nucleation protein, Q8PD38</p>
                     </c>
                     <c ca="left">
                        <p>Xca</p>
                     </c>
                     <c ca="right">
                        <p>1333</p>
                     </c>
                     <c ca="left">
                        <p>52 &#215; GYGST</p>
                     </c>
                     <c ca="right">
                        <p>242</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>PPE family protein, Q6MX44</p>
                     </c>
                     <c ca="left">
                        <p>Mtu</p>
                     </c>
                     <c ca="right">
                        <p>3300</p>
                     </c>
                     <c ca="left">
                        <p>95 &#215; NTG</p>
                     </c>
                     <c ca="right">
                        <p>184</p>
                     </c>
                  </r>
               </tblbdy>
               <tblfn>
                  <p>Eukaryotic proteins (top) whose expression is confirmed by the presence of expressed sequence tags (EST) in GenBank are marked with an asterisk. L, length; pP, negative logarithm of the P-value; Sp, species (Ch, <it>C. hominis</it>; Lm, <it>L. major</it>; Pf, <it>P. falciparum</it>; Tb, <it>T. brucei</it>; Lmo, <it>Listeria monocytogenes</it>; Lpl, <it>Lactobacillus plantarum</it>; Lpn, <it>Legionella pneumophila</it>; Mtu, <it>M. tuberculosis</it>; Sag, <it>Streptococcus agalactiae</it>; Sau, <it>Staphylococcus aureus</it>; Spn, <it>S. pneumoniae</it>; Xca, <it>Xanthomonas campestris</it>; Xor, <it>X. oryzae</it>).</p>
               </tblfn>
            </tbl>
         </sec>
         <sec>
            <st>
               <p>Amino acid composition of the repeats</p>
            </st>
            <p>To further characterize the repeats, we investigated which amino acids are over- or underrepresented in repeats with P &lt; 10<sup>-10 </sup>compared to the rest of the respective proteome. Overall, the amino acid composition of the repeats was more biased in eukaryotes than in bacteria (Figure <figr fid="F2">2</figr>). Small amino acids occurred more frequently in the repeats than large ones in both eukaryotes and prokaryotes. Hydrophobic residues were underrepresented in the repeats, with the exception of leucine, which was even overrepresented in bacterial repeats (p &lt; 0.0001, two-tailed Wilcoxon signed rank test). Strongly overrepresented in the repeats were alanine (p &lt; 0.0001) in bacteria and serine (p = 0.0001) in eukaryotes (Figure <figr fid="F2">2</figr>). Thus "cheap" amino acids seem to be preferred over energetically expensive ones. Interestingly, asparagine tended to be overrepresented in the repeats from eukaryotes (p = 0.057) but not in those from bacteria (Figure <figr fid="F2">2</figr>), suggesting that asparagines might be preferentially glycosylated in repeats. Contrary to expectation, though, the probability of an asparagine lying within an N-glycosylation consensus sequence was significantly lower in repeats than in non-repetitive sequences (Figure <figr fid="F3">3</figr>). This was the case for free-living eukaryotes (p = 0.004) as well as for parasites (p = 0.027). The only exception was <it>T. brucei</it>, where the likelihood of an asparagine lying within an N-glycosylation consensus sequence was three-fold higher in repetitive than in non-repetitive sequences (Figure <figr fid="F3">3</figr>).</p>
            <fig id="F2">
               <title>
                  <p>Figure 2</p>
               </title>
               <caption>
                  <p>Amino acid composition of the repeats</p>
               </caption>
               <text>
                  <p><b>Amino acid composition of the repeats</b>. For each amino acid, the frequency in the repeats of P &lt; 10<sup>-10</sup> is plotted vs. its frequency in the remainder of the proteome (r<sub>S</sub>, Spearman coefficient). Data are pooled for bacteria (n = 193) and eukaryotes (n = 49). The small diamonds at 0.05 mark the frequency expected if all 20 amino acids were equally frequent; the diagonal represents equal frequency in the repeats and in the remainder of the respective proteome. Complete data tables including standard deviations are provided as a supplementary file [Additional file <supplr sid="S1">1</supplr>].</p>
               </text>
               <graphic file="1477-5956-5-20-2"/>
            </fig>
            <fig id="F3">
               <title>
                  <p>Figure 3</p>
               </title>
               <caption>
                  <p>Potential N-glycosylation sites in the repeats</p>
               </caption>
               <text>
                  <p><b>Potential N-glycosylation sites in the repeats</b>. The percentage of asparagines that are in an N-glycosylation consensus (Asn-X-Ser/Thr, where X is any residue except Pro) is plotted for repeats of P &lt; 10<sup>-10</sup> and for the remainders of the respective proteomes. Bars indicate the median. The organism with 30% of the asparagines in its repeats in an N-glycosylation consensus is <it>T. brucei</it>.</p>
               </text>
               <graphic file="1477-5956-5-20-3"/>
            </fig>
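            <p>The scan for potential N-glycosylation sites can be sketched with a regular expression that encodes the consensus given in the legend of Figure <figr fid="F3">3</figr>. This is a minimal illustration, not the implementation used for Dora; the function name and the choice of a lookahead (so that overlapping sites are counted separately) are assumptions.</p>
            <p><![CDATA[
```python
import re

# Consensus from the Figure 3 legend: Asn, any residue except Pro, then Ser or Thr.
# The lookahead consumes only the Asn, so overlapping sites such as "NNSS"
# are each counted.
SEQUON = re.compile(r"N(?=[^P][ST])")


def sequon_fraction(seq):
    """Fraction of asparagines in seq that lie in the N-glycosylation consensus.

    Returns None when the sequence contains no asparagine.
    """
    seq = seq.upper()
    n_total = seq.count("N")
    if n_total == 0:
        return None
    return len(SEQUON.findall(seq)) / n_total


# Repeat unit of the T. brucei hypothetical protein Tb11.02.2360 (Table 4)
unit = "TAVTDVNDNNSANTSNEDE"
print(sequon_fraction(unit))

# A proline directly after the asparagine excludes the site
print(sequon_fraction("ANPT"))
```
]]></p>
            <p>Applying such a function separately to the repeats and to the non-repetitive remainder of each proteome would yield the two percentages that are compared per organism in Figure <figr fid="F3">3</figr>.</p>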
            <suppl id="S1">
               <title>
                  <p>Additional file 1</p>
               </title>
               <text>
                  <p>Amino acid frequencies. Additional file <supplr sid="S1">1</supplr> is an MS Excel file with two tables on separate worksheets. Table <tblr tid="T1">1</tblr> contains the amino acid frequencies in the predicted proteomes of 49 eukaryotes and 193 bacteria; Table <tblr tid="T2">2</tblr> contains the amino acid frequencies in the repeats of P &lt; 10<sup>-10</sup> of the same proteomes.</p>
               </text>
               <file name="1477-5956-5-20-S1.xls">
                  <p>Click here for file</p>
               </file>
            </suppl>
         </sec>
         <sec>
            <st>
               <p>Prediction of repetitive surface antigens</p>
            </st>
            <p>To predict which of the repeat-containing proteins are located at the cell surface, Reptile was combined with Phobius <abbrgrp><abbr bid="B42">42</abbr></abbrgrp>, a program for prediction of transmembrane domains and N-terminal export signals, and GPI-SOM <abbrgrp><abbr bid="B43">43</abbr></abbrgrp>, a program that predicts C-terminal GPI (glycosylphosphatidyl-inositol) anchor attachment sites. The three programs were run over all available proteomes predicted from completely sequenced genomes. The identified repeats were scanned for potential N-glycosylation sites. The combined output was stored in a relational database called Dora, the <ul>d</ul>atabase <ul>o</ul>f <ul>r</ul>epetitive <ul>a</ul>ntigens, as outlined in Figure <figr fid="F4">4</figr>. At present, Dora contains data on 1,123,238 proteins from 242 different proteomes (49 of them eukaryotic). A web interface <abbrgrp><abbr bid="B44">44</abbr></abbrgrp> allows user-defined Boolean searches (Figure <figr fid="F4">4</figr>). With Dora, genome-wide prediction of potential surface antigens and virulence factors is straightforward. A search for repetitive membrane proteins in <it>P. falciparum </it>or <it>T. brucei </it>(Table <tblr tid="T4">4</tblr>) indeed returned important surface antigens and virulence factors. For malaria these were circumsporozoite protein (CSP), merozoite surface proteins (MSP), erythrocyte membrane proteins (EMP), glycophorin-binding proteins (GBP), apical membrane/erythrocyte binding antigen (MAEBL), ring-infected erythrocyte surface antigen (RESA), and mature parasite-infected erythrocyte surface antigen (MESA). For <it>T. brucei </it>they were the procyclins, cysteine-rich acidic membrane protein (CRAM), invariant surface glycoproteins (ISG), and even the variant surface glycoproteins (VSG), which contain a significant number of dipeptide repeats (mostly AA; to our knowledge the repetitive nature of VSG was not previously recognized). In addition to these known proteins there was a large number of uncharacterized ones, particularly from <it>P. falciparum</it>, which possesses hundreds of extremely repetitive transmembrane proteins (not shown; please refer to Dora).</p>
            <tbl id="T4">
               <title>
                  <p>Table 4</p>
               </title>
               <caption>
                  <p>Repetitive membrane proteins of <it>P. falciparum </it>(top) and <it>T. brucei </it>(bottom)</p>
               </caption>
               <tblbdy cols="4">
                  <r>
                     <c ca="left">
                        <p>
                           <b>Name, accession</b>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <b>Topology</b>
                        </p>
                     </c>
                     <c ca="left">
                        <p>
                           <b>Repeat</b>
                        </p>
                     </c>
                     <c ca="right">
                        <p>
                           <b>pP</b>
                        </p>
                     </c>
                  </r>
                  <r>
                     <c cspan="4">
                        <hr/>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Hypothetical protein, Q8IJ50</p>
                     </c>
                     <c ca="center">
                        <p>GPI</p>
                     </c>
                     <c ca="left">
                        <p>16 &#215; EESHNFYNPTH</p>
                     </c>
                     <c ca="right">
                        <p>184</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Circumsporozoite protein, Q7K740</p>
                     </c>
                     <c ca="center">
                        <p>GPI</p>
                     </c>
                     <c ca="left">
                        <p>38 &#215; ANPN</p>
                     </c>
                     <c ca="right">
                        <p>145</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Merozoite surface protein 8, Q8I476</p>
                     </c>
                     <c ca="center">
                        <p>GPI</p>
                     </c>
                     <c ca="left">
                        <p>32 &#215; NN</p>
                     </c>
                     <c ca="right">
                        <p>29</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Liver stage antigen, Q8IJ44</p>
                     </c>
                     <c ca="center">
                        <p>1 TM</p>
                     </c>
                     <c ca="left">
                        <p>45 &#215; AKEKLQEQQSDLEQER</p>
                     </c>
                     <c ca="right">
                        <p>839</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Erythrocyte membrane protein 3, O96124</p>
                     </c>
                     <c ca="center">
                        <p>1 TM</p>
                     </c>
                     <c ca="left">
                        <p>61 &#215; QQNTGLKNTP</p>
                     </c>
                     <c ca="right">
                        <p>665</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Trophozoite antigen, Q8IFL9</p>
                     </c>
                     <c ca="center">
                        <p>1 TM</p>
                     </c>
                     <c ca="left">
                        <p>60 &#215; NHKSD</p>
                     </c>
                     <c ca="right">
                        <p>287</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Glycophorin-binding protein, Q8I6U8</p>
                     </c>
                     <c ca="center">
                        <p>1 TM</p>
                     </c>
                     <c ca="left">
                        <p>10 &#215; DPEGQIMREYAADPEYRKHL</p>
                     </c>
                     <c ca="right">
                        <p>213</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>MAEBL, Q8IHP3</p>
                     </c>
                     <c ca="center">
                        <p>1 TM</p>
                     </c>
                     <c ca="left">
                        <p>19 &#215; EEKKKADELKK</p>
                     </c>
                     <c ca="right">
                        <p>213</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>PF70 exoantigen, Q8IK15</p>
                     </c>
                     <c ca="center">
                        <p>3 TM</p>
                     </c>
                     <c ca="left">
                        <p>8 &#215; TKKPSKYTMNLDSPLLKGSS</p>
                     </c>
                     <c ca="right">
                        <p>165</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>MESA, Q8I492</p>
                     </c>
                     <c ca="center">
                        <p>1 TM</p>
                     </c>
                     <c ca="left">
                        <p>94 &#215; KE</p>
                     </c>
                     <c ca="right">
                        <p>97</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>PfEMP1, Q8I519</p>
                     </c>
                     <c ca="center">
                        <p>1 TM</p>
                     </c>
                     <c ca="left">
                        <p>16 &#215; GGGGGS</p>
                     </c>
                     <c ca="right">
                        <p>77</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>RESA, Q8IHN1</p>
                     </c>
                     <c ca="center">
                        <p>1 TM</p>
                     </c>
                     <c ca="left">
                        <p>33 &#215; EEN</p>
                     </c>
                     <c ca="right">
                        <p>63</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Hypothetical protein, Tb11.02.2360</p>
                     </c>
                     <c ca="center">
                        <p>GPI</p>
                     </c>
                     <c ca="left">
                        <p>11 &#215; TAVTDVNDNNSANTSNEDE</p>
                     </c>
                     <c ca="right">
                        <p>229</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Hypothetical protein, Tb11.1550</p>
                     </c>
                     <c ca="center">
                        <p>GPI</p>
                     </c>
                     <c ca="left">
                        <p>12 &#215; IIAHYC</p>
                     </c>
                     <c ca="right">
                        <p>68</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Procyclin (EP-type), Tb10.6k15.0020</p>
                     </c>
                     <c ca="center">
                        <p>GPI</p>
                     </c>
                     <c ca="left">
                        <p>29 &#215; PE</p>
                     </c>
                     <c ca="right">
                        <p>46</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Hypothetical protein, Tb927.7.360</p>
                     </c>
                     <c ca="center">
                        <p>GPI</p>
                     </c>
                     <c ca="left">
                        <p>3 &#215; DKEKTERTEVEEVPKKDPEG</p>
                     </c>
                     <c ca="right">
                        <p>45</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Procyclin (GPEET-type), Tb927.6.510</p>
                     </c>
                     <c ca="center">
                        <p>GPI</p>
                     </c>
                     <c ca="left">
                        <p>6 &#215; EETGP</p>
                     </c>
                     <c ca="right">
                        <p>24</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>VSG, Tb10.v4.0209</p>
                     </c>
                     <c ca="center">
                        <p>GPI</p>
                     </c>
                     <c ca="left">
                        <p>19 &#215; AA</p>
                     </c>
                     <c ca="right">
                        <p>13</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>CRAM, Tb10.6k15.3510</p>
                     </c>
                     <c ca="center">
                        <p>1 TM</p>
                     </c>
                     <c ca="left">
                        <p>80 &#215; ITGDCNETDDC</p>
                     </c>
                     <c ca="right">
                        <p>1050</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Hypothetical protein, Tb927.3.5530</p>
                     </c>
                     <c ca="center">
                        <p>2 TM</p>
                     </c>
                     <c ca="left">
                        <p>49 &#215; RLRAEEE</p>
                     </c>
                     <c ca="right">
                        <p>337</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Hypothetical protein, Tb10.61.0660</p>
                     </c>
                     <c ca="center">
                        <p>3 TM</p>
                     </c>
                     <c ca="left">
                        <p>12 &#215; NEEVPAGVSARRGGVAMSF</p>
                     </c>
                     <c ca="right">
                        <p>241</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Procyclic surface glycoprotein, Tb10.26.0790</p>
                     </c>
                     <c ca="center">
                        <p>2 TM</p>
                     </c>
                     <c ca="left">
                        <p>5 &#215; YGQPPPPQ</p>
                     </c>
                     <c ca="right">
                        <p>31</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Invariant surface glycoprotein, Tb927.5.350</p>
                     </c>
                     <c ca="center">
                        <p>1 TM</p>
                     </c>
                     <c ca="left">
                        <p>18 &#215; EA</p>
                     </c>
                     <c ca="right">
                        <p>12</p>
                     </c>
                  </r>
               </tblbdy>
               <tblfn>
                  <p>TM, transmembrane domain; GPI, glycosylphosphatidyl-inositol anchor; pP, negative logarithm of the P-value. See text for full protein names.</p>
               </tblfn>
            </tbl>
            <fig id="F4">
               <title>
                  <p>Figure 4</p>
               </title>
               <caption>
                  <p>Flowchart to Dora, database of repetitive antigens</p>
               </caption>
               <text>
                  <p><b>Flowchart to Dora, database of repetitive antigens</b>. Reptile, Phobius [42], and GPI-SOM [43] are integrated into an automated pipeline for the classification of proteins (top). The data are stored in a database that is accessible on-line [44] via the depicted interface (bottom). This allows user-defined Boolean queries for repeat-containing surface proteins.</p>
               </text>
               <graphic file="1477-5956-5-20-4"/>
            </fig>
            <p>New specific and robust tests are urgently needed for the diagnosis of sleeping sickness, malaria, tuberculosis, and other neglected diseases <abbrgrp><abbr bid="B45">45</abbr><abbr bid="B46">46</abbr></abbrgrp>. Since PCR is not applicable in the field, serology (i.e. the detection of parasite-specific antibodies) remains the principal method of detection for many tropical diseases. Dora provides a convenient portal for the identification of candidate antigens for serological tests. In addition, it can be helpful for the selection of vaccine candidates. Dora returns the hits in Fasta format, which is suitable for subsequent bioinformatic analyses.</p>
         </sec>
      </sec>
      <sec>
         <st>
            <p>Conclusion</p>
         </st>
         <p>Reptile's simple algorithm allows large-scale and quantitative description of perfect amino acid repeats. Originally designed to scan parasite proteomes for potential antigens and virulence factors, Reptile detects any protein of repetitive nature and thereby complements existing tools which work by self-alignment. Parasite proteomes vary considerably regarding the proportion of repetitive proteins, in contrast to those of free-living eukaryotes which all contain around 3% highly repetitive (P &lt; 10<sup>-10</sup>) proteins. Furthermore, the proportion of highly repetitive proteins correlates with mean protein length in parasites but not in the proteomes of free-living eukaryotes, illustrating the importance of amino acid repeats for parasites.</p>
         <p>Scanning the predicted proteomes of parasites for amino acid repeats returned a large number of interesting proteins. Particularly useful was the combination of Reptile with prediction of glycosylation sites, export signals, transmembrane domains and GPI-anchor attachment sites, carried out on more than one million proteins from 242 different organisms. All data are accessible on-line via Dora, database of repetitive antigens. The approach was validated against <it>T. brucei </it>and <it>P. falciparum</it>, where a Dora search returned the known surface antigens, virulence factors, and vaccine candidates plus many new, so far uncharacterized proteins.</p>
      </sec>
      <sec>
         <st>
            <p>Methods</p>
         </st>
         <sec>
            <st>
               <p>Proteome files</p>
            </st>
            <p>Predicted proteome files were obtained from the Integr8 database of the European Bioinformatics Institute <abbrgrp><abbr bid="B47">47</abbr></abbrgrp>. The download was automated with a Python script that periodically checks for newly available proteomes and for updates to previously downloaded proteome files.</p>
         </sec>
         <sec>
            <st>
               <p>Statistics</p>
            </st>
            <p>Statistical tests were performed with Prism 4.0 (GraphPad Software). Since the percentages of repeats in proteomes as well as the frequencies of amino acids were not normally distributed, non-parametric tests were used: Mann-Whitney test <abbrgrp><abbr bid="B48">48</abbr></abbrgrp>, Wilcoxon signed rank test <abbrgrp><abbr bid="B49">49</abbr></abbrgrp>, and Spearman correlation <abbrgrp><abbr bid="B50">50</abbr></abbrgrp>.</p>
         </sec>
         <sec>
            <st>
               <p>Reptile</p>
            </st>
            <p>The repeat detection algorithm is described under Results. The program is written in C++ and the web interface in Perl-CGI. Reptile uses sreformat from the HMMer package <abbrgrp><abbr bid="B51">51</abbr></abbrgrp> to convert different input formats (Fasta, GenBank, EMBL, Swiss-Prot, PIR, GCG) to Fasta. Reptile runs on a VMware (virtual infrastructure) server. Availability and requirements:</p>
            <p>Project name: Reptile</p>
            <p>Project home page: <url>http://genomics.unibe.ch/software/reptile.tar.gz</url></p>
            <p>Operating systems: Linux, Unix</p>
            <p>Programming language: C++</p>
            <p>Licence: GNU GPL</p>
         </sec>
         <sec>
            <st>
               <p>Dora</p>
            </st>
            <p>A Python script periodically runs Reptile, GPI-SOM, and Phobius over all new or updated proteome files of Integr8. The results are stored in a MySQL database. For the sake of simplicity, only the repeat with the lowest P-value is stored for each protein. A Perl script is used to interconvert Fasta format and SQL. The web interface of Dora is written in PHP. The database and all the programs run on the VMware server of the Informatics Services of the University of Bern.</p>
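            <p>The storage rule and a Boolean search of the kind described above can be sketched as follows. This is an illustration only: it uses SQLite from the Python standard library in place of MySQL, and the table layout, column names, and function names are hypothetical rather than Dora's actual schema. Accessions, repeat units, and pP values are taken from Table <tblr tid="T4">4</tblr>, except for one invented weaker repeat that demonstrates the filtering.</p>
            <p><![CDATA[
```python
import sqlite3


def build_db(rows):
    """Create an in-memory database from
    (accession, species, topology, repeat_unit, copies, pP) tuples."""
    con = sqlite3.connect(":memory:")
    con.execute(
        "CREATE TABLE protein ("
        "accession TEXT PRIMARY KEY, species TEXT, topology TEXT, "
        "repeat_unit TEXT, copies INTEGER, pP REAL)"
    )
    best = {}
    for row in rows:
        acc, pP = row[0], row[5]
        # Keep only the most significant repeat per protein:
        # the lowest P-value corresponds to the highest pP.
        if acc not in best or pP > best[acc][5]:
            best[acc] = row
    con.executemany(
        "INSERT INTO protein VALUES (?, ?, ?, ?, ?, ?)", list(best.values())
    )
    con.commit()
    return con


def search(con, species, topology, min_pP):
    """Boolean AND search: species, membrane topology, and minimal pP."""
    cur = con.execute(
        "SELECT accession, repeat_unit, copies, pP FROM protein "
        "WHERE species = ? AND topology = ? AND pP >= ? "
        "ORDER BY pP DESC",
        (species, topology, min_pP),
    )
    return cur.fetchall()


rows = [
    ("Q7K740", "Pf", "GPI", "ANPN", 38, 145.0),
    ("Q7K740", "Pf", "GPI", "NV", 3, 2.0),  # invented weaker repeat
    ("Q8I476", "Pf", "GPI", "NN", 32, 29.0),
    ("Q8IJ44", "Pf", "1 TM", "AKEKLQEQQSDLEQER", 45, 839.0),
    ("Tb10.v4.0209", "Tb", "GPI", "AA", 19, 13.0),
]

con = build_db(rows)
hits = search(con, "Pf", "GPI", 10)
for hit in hits:
    print(hit)
```
]]></p>
            <p>Storing a single repeat per protein keeps the table at one row per accession, at the cost of hiding secondary repeats in proteins that contain more than one repetitive region.</p>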
         </sec>
      </sec>
      <sec>
         <st>
            <p>Competing interests</p>
         </st>
         <p>The author(s) declare that they have no competing interests.</p>
      </sec>
      <sec>
         <st>
            <p>Authors' contributions</p>
         </st>
         <p>NF developed all software and generated all the data. TN designed the MySQL database and created the user interface of Dora. JA derived the formula for the calculation of the P-value. NF and PM conceived the study, wrote the manuscript, and designed the figures. All authors read and approved the final manuscript.</p>
      </sec>
   </bdy>
   <bm>
      <ack>
         <sec>
            <st>
               <p>Acknowledgements</p>
            </st>
            <p>We wish to thank the Informatikdienste of the University of Bern for resources and support. This work was supported by the Swiss National Science Foundation, the Roche Research Foundation, and Biomedizin-Naturwissenschaft-Forschung Bern (TN).</p>
         </sec>
      </ack>
      <refgrp>
         <bibl id="B1">
            <title>
               <p>A census of protein repeats</p>
            </title>
            <aug>
               <au>
                  <snm>Marcotte</snm>
                  <fnm>EM</fnm>
               </au>
               <au>
                  <snm>Pellegrini</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Yeates</snm>
                  <fnm>TO</fnm>
               </au>
               <au>
                  <snm>Eisenberg</snm>
                  <fnm>D</fnm>
               </au>
            </aug>
            <source>J Mol Biol</source>
            <pubdate>1999</pubdate>
            <volume>293</volume>
            <fpage>151</fpage>
            <lpage>160</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1006/jmbi.1999.3136</pubid>
                  <pubid idtype="pmpid" link="fulltext">10512723</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B2">
            <title>
               <p>Homology-based method for identification of protein repeats using statistical significance estimates</p>
            </title>
            <aug>
               <au>
                  <snm>Andrade</snm>
                  <fnm>MA</fnm>
               </au>
               <au>
                  <snm>Ponting</snm>
                  <fnm>CP</fnm>
               </au>
               <au>
                  <snm>Gibson</snm>
                  <fnm>TJ</fnm>
               </au>
               <au>
                  <snm>Bork</snm>
                  <fnm>P</fnm>
               </au>
            </aug>
            <source>J Mol Biol</source>
            <pubdate>2000</pubdate>
            <volume>298</volume>
            <fpage>521</fpage>
            <lpage>537</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1006/jmbi.2000.3684</pubid>
                  <pubid idtype="pmpid" link="fulltext">10772867</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B3">
            <title>
               <p>Protein repeats: structures, functions, and evolution</p>
            </title>
            <aug>
               <au>
                  <snm>Andrade</snm>
                  <fnm>MA</fnm>
               </au>
               <au>
                  <snm>Perez-Iratxeta</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Ponting</snm>
                  <fnm>CP</fnm>
               </au>
            </aug>
            <source>J Struct Biol</source>
            <pubdate>2001</pubdate>
            <volume>134</volume>
            <fpage>117</fpage>
            <lpage>131</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1006/jsbi.2001.4392</pubid>
                  <pubid idtype="pmpid" link="fulltext">11551174</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B4">
            <title>
               <p>Rapid automatic detection and alignment of repeats in protein sequences</p>
            </title>
            <aug>
               <au>
                  <snm>Heger</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Holm</snm>
                  <fnm>L</fnm>
               </au>
            </aug>
            <source>Proteins</source>
            <pubdate>2000</pubdate>
            <volume>41</volume>
            <fpage>224</fpage>
            <lpage>237</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1002/1097-0134(20001101)41:2&lt;224::AID-PROT70>3.0.CO;2-Z</pubid>
                  <pubid idtype="pmpid" link="fulltext">10966575</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B5">
            <title>
               <p>Radar</p>
            </title>
            <url>http://www.ebi.ac.uk/Radar</url>
         </bibl>
         <bibl id="B6">
            <title>
               <p>Repro</p>
            </title>
            <url>http://ibivu.cs.vu.nl/programs/reprowww</url>
         </bibl>
         <bibl id="B7">
            <title>
               <p>The REPRO server: finding protein internal sequence repeats through the Web</p>
            </title>
            <aug>
               <au>
                  <snm>George</snm>
                  <fnm>RA</fnm>
               </au>
               <au>
                  <snm>Heringa</snm>
                  <fnm>J</fnm>
               </au>
            </aug>
            <source>Trends Biochem Sci</source>
            <pubdate>2000</pubdate>
            <volume>25</volume>
            <fpage>515</fpage>
            <lpage>517</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1016/S0968-0004(00)01643-1</pubid>
                  <pubid idtype="pmpid" link="fulltext">11203383</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B8">
            <title>
               <p>A fast algorithm for genome-wide analysis of proteins with repeated sequences</p>
            </title>
            <aug>
               <au>
                  <snm>Pellegrini</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Marcotte</snm>
                  <fnm>EM</fnm>
               </au>
               <au>
                  <snm>Yeates</snm>
                  <fnm>TO</fnm>
               </au>
            </aug>
            <source>Proteins</source>
            <pubdate>1999</pubdate>
            <volume>35</volume>
            <fpage>440</fpage>
            <lpage>446</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1002/(SICI)1097-0134(19990601)35:4&lt;440::AID-PROT7>3.0.CO;2-Y</pubid>
                  <pubid idtype="pmpid" link="fulltext">10382671</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B9">
            <title>
               <p>Internal Repeats Finder</p>
            </title>
            <url>http://nihserver.mbi.ucla.edu/Repeats</url>
         </bibl>
         <bibl id="B10">
            <title>
               <p>Amino acid repeat patterns in protein sequences: their diversity and structural-functional implications</p>
            </title>
            <aug>
               <au>
                  <snm>Katti</snm>
                  <fnm>MV</fnm>
               </au>
               <au>
                  <snm>Sami-Subbu</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Ranjekar</snm>
                  <fnm>PK</fnm>
               </au>
               <au>
                  <snm>Gupta</snm>
                  <fnm>VS</fnm>
               </au>
            </aug>
            <source>Protein Sci</source>
            <pubdate>2000</pubdate>
            <volume>9</volume>
            <fpage>1203</fpage>
            <lpage>1209</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">2144659</pubid>
                  <pubid idtype="pmpid" link="fulltext">10892812</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B11">
            <title>
               <p>TRIPS</p>
            </title>
            <url>http://www.ncl-india.org/trips</url>
         </bibl>
         <bibl id="B12">
            <title>
               <p>Tracking repeats using significance and transitivity</p>
            </title>
            <aug>
               <au>
                  <snm>Szklarczyk</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Heringa</snm>
                  <fnm>J</fnm>
               </au>
            </aug>
            <source>Bioinformatics</source>
            <pubdate>2004</pubdate>
            <volume>20 Suppl 1</volume>
            <fpage>I311</fpage>
            <lpage>I317</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1093/bioinformatics/bth911</pubid>
                  <pubid idtype="pmpid" link="fulltext">15262814</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B13">
            <title>
               <p>Trust</p>
            </title>
            <url>http://zeus.cs.vu.nl/programs/trustwww/</url>
         </bibl>
         <bibl id="B14">
            <title>
               <p>Toward the detection and validation of repeats in protein structure</p>
            </title>
            <aug>
               <au>
                  <snm>Murray</snm>
                  <fnm>KB</fnm>
               </au>
               <au>
                  <snm>Taylor</snm>
                  <fnm>WR</fnm>
               </au>
               <au>
                  <snm>Thornton</snm>
                  <fnm>JM</fnm>
               </au>
            </aug>
            <source>Proteins</source>
            <pubdate>2004</pubdate>
            <volume>57</volume>
            <fpage>365</fpage>
            <lpage>380</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1002/prot.20202</pubid>
                  <pubid idtype="pmpid" link="fulltext">15340924</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B15">
            <title>
               <p>RepSeq--a database of amino acid repeats present in lower eukaryotic pathogens</p>
            </title>
            <aug>
               <au>
                  <snm>Depledge</snm>
                  <fnm>DP</fnm>
               </au>
               <au>
                  <snm>Lower</snm>
                  <fnm>RP</fnm>
               </au>
               <au>
                  <snm>Smith</snm>
                  <fnm>DF</fnm>
               </au>
            </aug>
            <source>BMC Bioinformatics</source>
            <pubdate>2007</pubdate>
            <volume>8</volume>
            <fpage>122</fpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">1854910</pubid>
                  <pubid idtype="pmpid" link="fulltext">17428323</pubid>
                  <pubid idtype="doi">10.1186/1471-2105-8-122</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B16">
            <title>
               <p>RepSeq</p>
            </title>
            <url>http://www.repseq.gugbe.com</url>
         </bibl>
         <bibl id="B17">
            <title>
               <p>REP</p>
            </title>
            <url>http://www.embl-heidelberg.de/~andrade/papers/rep/search.html</url>
         </bibl>
         <bibl id="B18">
            <title>
               <p>REPPER--repeats and their periodicities in fibrous proteins</p>
            </title>
            <aug>
               <au>
                  <snm>Gruber</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Soding</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Lupas</snm>
                  <fnm>AN</fnm>
               </au>
            </aug>
            <source>Nucleic Acids Res</source>
            <pubdate>2005</pubdate>
            <volume>33</volume>
            <fpage>W239</fpage>
               <lpage>W243</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">1160166</pubid>
                  <pubid idtype="pmpid" link="fulltext">15980460</pubid>
                  <pubid idtype="doi">10.1093/nar/gki405</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B19">
            <title>
               <p>Repper</p>
            </title>
            <url>http://toolkit.tuebingen.mpg.de/repper</url>
         </bibl>
         <bibl id="B20">
            <title>
               <p>ProtRepeatsDB: a database of amino acid repeats in genomes</p>
            </title>
            <aug>
               <au>
                  <snm>Kalita</snm>
                  <fnm>MK</fnm>
               </au>
               <au>
                  <snm>Ramasamy</snm>
                  <fnm>G</fnm>
               </au>
               <au>
                  <snm>Duraisamy</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Chauhan</snm>
                  <fnm>VS</fnm>
               </au>
               <au>
                  <snm>Gupta</snm>
                  <fnm>D</fnm>
               </au>
            </aug>
            <source>BMC Bioinformatics</source>
            <pubdate>2006</pubdate>
            <volume>7</volume>
            <fpage>336</fpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">1538635</pubid>
                  <pubid idtype="pmpid" link="fulltext">16827924</pubid>
                  <pubid idtype="doi">10.1186/1471-2105-7-336</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B21">
            <title>
               <p>ProtRepeatsDB</p>
            </title>
            <url>http://bioinfo.icgeb.res.in/repeats</url>
         </bibl>
         <bibl id="B22">
            <title>
               <p>Parasite defense mechanisms for evasion of host attack; a review</p>
            </title>
            <aug>
               <au>
                  <snm>Leid</snm>
                  <fnm>RW</fnm>
               </au>
               <au>
                  <snm>Suquet</snm>
                  <fnm>CM</fnm>
               </au>
               <au>
                  <snm>Tanigoshi</snm>
                  <fnm>L</fnm>
               </au>
            </aug>
            <source>Vet Parasitol</source>
            <pubdate>1987</pubdate>
            <volume>25</volume>
            <fpage>147</fpage>
            <lpage>162</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1016/0304-4017(87)90101-4</pubid>
                  <pubid idtype="pmpid">3307120</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B23">
            <title>
               <p>Leucine-rich repeats in host-pathogen interactions</p>
            </title>
            <aug>
               <au>
                  <snm>Kedzierski</snm>
                  <fnm>L</fnm>
               </au>
               <au>
                  <snm>Montgomery</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Curtis</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Handman</snm>
                  <fnm>E</fnm>
               </au>
            </aug>
            <source>Arch Immunol Ther Exp (Warsz)</source>
            <pubdate>2004</pubdate>
            <volume>52</volume>
            <fpage>104</fpage>
            <lpage>112</lpage>
            <xrefbib>
               <pubid idtype="pmpid">15179324</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B24">
            <title>
               <p>Expression of a polypeptide containing a dipeptide repeat is confined to the insect stage of Trypanosoma brucei</p>
            </title>
            <aug>
               <au>
                  <snm>Roditi</snm>
                  <fnm>I</fnm>
               </au>
               <au>
                  <snm>Carrington</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Turner</snm>
                  <fnm>M</fnm>
               </au>
            </aug>
            <source>Nature</source>
            <pubdate>1987</pubdate>
            <volume>325</volume>
            <fpage>272</fpage>
            <lpage>274</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1038/325272a0</pubid>
                  <pubid idtype="pmpid" link="fulltext">3808022</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B25">
            <title>
               <p>Multiple procyclin isoforms are expressed differentially during the development of insect forms of Trypanosoma brucei</p>
            </title>
            <aug>
               <au>
                  <snm>Vassella</snm>
                  <fnm>E</fnm>
               </au>
               <au>
                  <snm>Acosta-Serrano</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Studer</snm>
                  <fnm>E</fnm>
               </au>
               <au>
                  <snm>Lee</snm>
                  <fnm>SH</fnm>
               </au>
               <au>
                  <snm>Englund</snm>
                  <fnm>PT</fnm>
               </au>
               <au>
                  <snm>Roditi</snm>
                  <fnm>I</fnm>
               </au>
            </aug>
            <source>J Mol Biol</source>
            <pubdate>2001</pubdate>
            <volume>312</volume>
            <fpage>597</fpage>
            <lpage>607</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1006/jmbi.2001.5004</pubid>
                  <pubid idtype="pmpid" link="fulltext">11575917</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B26">
            <title>
               <p>DNA cloning of Plasmodium falciparum circumsporozoite gene: amino acid sequence of repetitive epitope</p>
            </title>
            <aug>
               <au>
                  <snm>Enea</snm>
                  <fnm>V</fnm>
               </au>
               <au>
                  <snm>Ellis</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Zavala</snm>
                  <fnm>F</fnm>
               </au>
               <au>
                  <snm>Arnot</snm>
                  <fnm>DE</fnm>
               </au>
               <au>
                  <snm>Asavanich</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Masuda</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Quakyi</snm>
                  <fnm>I</fnm>
               </au>
               <au>
                  <snm>Nussenzweig</snm>
                  <fnm>RS</fnm>
               </au>
            </aug>
            <source>Science</source>
            <pubdate>1984</pubdate>
            <volume>225</volume>
            <fpage>628</fpage>
            <lpage>630</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1126/science.6204384</pubid>
                  <pubid idtype="pmpid" link="fulltext">6204384</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B27">
            <title>
               <p>Virulent combinations of adhesin and toxin genes in natural populations of Staphylococcus aureus</p>
            </title>
            <aug>
               <au>
                  <snm>Peacock</snm>
                  <fnm>SJ</fnm>
               </au>
               <au>
                  <snm>Moore</snm>
                  <fnm>CE</fnm>
               </au>
               <au>
                  <snm>Justice</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Kantzanou</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Story</snm>
                  <fnm>L</fnm>
               </au>
               <au>
                  <snm>Mackie</snm>
                  <fnm>K</fnm>
               </au>
               <au>
                  <snm>O'Neill</snm>
                  <fnm>G</fnm>
               </au>
               <au>
                  <snm>Day</snm>
                  <fnm>NP</fnm>
               </au>
            </aug>
            <source>Infect Immun</source>
            <pubdate>2002</pubdate>
            <volume>70</volume>
            <fpage>4987</fpage>
            <lpage>4996</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">128268</pubid>
                  <pubid idtype="pmpid" link="fulltext">12183545</pubid>
                  <pubid idtype="doi">10.1128/IAI.70.9.4987-4996.2002</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B28">
            <title>
               <p>Diagnosis of malaria by detection of Plasmodium falciparum HRP-2 antigen with a rapid dipstick antigen-capture assay</p>
            </title>
            <aug>
               <au>
                  <snm>Beadle</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Long</snm>
                  <fnm>GW</fnm>
               </au>
               <au>
                  <snm>Weiss</snm>
                  <fnm>WR</fnm>
               </au>
               <au>
                  <snm>McElroy</snm>
                  <fnm>PD</fnm>
               </au>
               <au>
                  <snm>Maret</snm>
                  <fnm>SM</fnm>
               </au>
               <au>
                  <snm>Oloo</snm>
                  <fnm>AJ</fnm>
               </au>
               <au>
                  <snm>Hoffman</snm>
                  <fnm>SL</fnm>
               </au>
            </aug>
            <source>Lancet</source>
            <pubdate>1994</pubdate>
            <volume>343</volume>
            <fpage>564</fpage>
            <lpage>568</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1016/S0140-6736(94)91520-2</pubid>
                  <pubid idtype="pmpid">7906328</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B29">
            <title>
               <p>The vaccine is dead--long live the vaccine</p>
            </title>
            <aug>
               <au>
                  <snm>Snounou</snm>
                  <fnm>G</fnm>
               </au>
               <au>
                  <snm>Renia</snm>
                  <fnm>L</fnm>
               </au>
            </aug>
            <source>Trends Parasitol</source>
            <pubdate>2007</pubdate>
            <volume>23</volume>
            <fpage>129</fpage>
            <lpage>132</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1016/j.pt.2007.02.001</pubid>
                  <pubid idtype="pmpid" link="fulltext">17300988</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B30">
            <title>
               <p>MAAP: Malarial adhesins and adhesin-like proteins predictor</p>
            </title>
            <aug>
               <au>
                  <snm>Ansari</snm>
                  <fnm>FA</fnm>
               </au>
               <au>
                  <snm>Kumar</snm>
                  <fnm>N</fnm>
               </au>
               <au>
                  <snm>Bala Subramanyam</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Gnanamani</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Ramachandran</snm>
                  <fnm>S</fnm>
               </au>
            </aug>
            <source>Proteins</source>
            <pubdate>2007</pubdate>
            <xrefbib>
               <pubid idtype="pmpid" link="fulltext">17879344</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B31">
            <title>
               <p>The surface protein Srr-1 of Streptococcus agalactiae binds human keratin 4 and promotes adherence to epithelial HEp-2 cells</p>
            </title>
            <aug>
               <au>
                  <snm>Samen</snm>
                  <fnm>U</fnm>
               </au>
               <au>
                  <snm>Eikmanns</snm>
                  <fnm>BJ</fnm>
               </au>
               <au>
                  <snm>Reinscheid</snm>
                  <fnm>DJ</fnm>
               </au>
               <au>
                  <snm>Borges</snm>
                  <fnm>F</fnm>
               </au>
            </aug>
            <source>Infect Immun</source>
            <pubdate>2007</pubdate>
            <xrefbib>
               <pubid idtype="pmpid" link="fulltext">17709412</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B32">
            <title>
               <p>Enterococcal leucine-rich repeat-containing protein involved in virulence and host inflammatory response</p>
            </title>
            <aug>
               <au>
                  <snm>Brinster</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Posteraro</snm>
                  <fnm>B</fnm>
               </au>
               <au>
                  <snm>Bierne</snm>
                  <fnm>H</fnm>
               </au>
               <au>
                  <snm>Alberti</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Makhzami</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Sanguinetti</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Serror</snm>
                  <fnm>P</fnm>
               </au>
            </aug>
            <source>Infect Immun</source>
            <pubdate>2007</pubdate>
            <volume>75</volume>
            <fpage>4463</fpage>
            <lpage>4471</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1128/IAI.00279-07</pubid>
                  <pubid idtype="pmpid" link="fulltext">17620355</pubid>
                  <pubid idtype="pmcid">1951196</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B33">
            <title>
               <p>EtMIC4: a microneme protein from Eimeria tenella that contains tandem arrays of epidermal growth factor-like repeats and thrombospondin type-I repeats</p>
            </title>
            <aug>
               <au>
                  <snm>Tomley</snm>
                  <fnm>FM</fnm>
               </au>
               <au>
                  <snm>Billington</snm>
                  <fnm>KJ</fnm>
               </au>
               <au>
                  <snm>Bumstead</snm>
                  <fnm>JM</fnm>
               </au>
               <au>
                  <snm>Clark</snm>
                  <fnm>JD</fnm>
               </au>
               <au>
                  <snm>Monaghan</snm>
                  <fnm>P</fnm>
               </au>
            </aug>
            <source>Int J Parasitol</source>
            <pubdate>2001</pubdate>
            <volume>31</volume>
            <fpage>1303</fpage>
            <lpage>1310</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1016/S0020-7519(01)00255-7</pubid>
                  <pubid idtype="pmpid" link="fulltext">11566298</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B34">
            <title>
               <p>Adhesion of outer membrane proteins containing tandem repeats of Anaplasma and Ehrlichia species (Rickettsiales: Anaplasmataceae) to tick cells</p>
            </title>
            <aug>
               <au>
                  <snm>de la Fuente</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Garcia-Garcia</snm>
                  <fnm>JC</fnm>
               </au>
               <au>
                  <snm>Barbet</snm>
                  <fnm>AF</fnm>
               </au>
               <au>
                  <snm>Blouin</snm>
                  <fnm>EF</fnm>
               </au>
               <au>
                  <snm>Kocan</snm>
                  <fnm>KM</fnm>
               </au>
            </aug>
            <source>Vet Microbiol</source>
            <pubdate>2004</pubdate>
            <volume>98</volume>
            <fpage>313</fpage>
            <lpage>322</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1016/j.vetmic.2003.11.001</pubid>
                  <pubid idtype="pmpid" link="fulltext">15036540</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B35">
            <title>
               <p>The formation of Escherichia coli curli amyloid fibrils is mediated by prion-like peptide repeats</p>
            </title>
            <aug>
               <au>
                  <snm>Cherny</snm>
                  <fnm>I</fnm>
               </au>
               <au>
                  <snm>Rockah</snm>
                  <fnm>L</fnm>
               </au>
               <au>
                  <snm>Levy-Nissenbaum</snm>
                  <fnm>O</fnm>
               </au>
               <au>
                  <snm>Gophna</snm>
                  <fnm>U</fnm>
               </au>
               <au>
                  <snm>Ron</snm>
                  <fnm>EZ</fnm>
               </au>
               <au>
                  <snm>Gazit</snm>
                  <fnm>E</fnm>
               </au>
            </aug>
            <source>J Mol Biol</source>
            <pubdate>2005</pubdate>
            <volume>352</volume>
            <fpage>245</fpage>
            <lpage>252</lpage>
            <xrefbib>
               <pubid idtype="pmpid" link="fulltext">16083908</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B36">
            <title>
               <p>Inclusion-exclusion principle</p>
            </title>
            <url>http://en.wikipedia.org/wiki/Inclusion-exclusion_principle</url>
         </bibl>
         <bibl id="B37">
            <title>
               <p>Reptile</p>
            </title>
            <url>http://reptile.unibe.ch</url>
         </bibl>
         <bibl id="B38">
            <title>
               <p>Genome sequence and gene compaction of the eukaryote parasite Encephalitozoon cuniculi</p>
            </title>
            <aug>
               <au>
                  <snm>Katinka</snm>
                  <fnm>MD</fnm>
               </au>
               <au>
                  <snm>Duprat</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Cornillot</snm>
                  <fnm>E</fnm>
               </au>
               <au>
                  <snm>Metenier</snm>
                  <fnm>G</fnm>
               </au>
               <au>
                  <snm>Thomarat</snm>
                  <fnm>F</fnm>
               </au>
               <au>
                  <snm>Prensier</snm>
                  <fnm>G</fnm>
               </au>
               <au>
                  <snm>Barbe</snm>
                  <fnm>V</fnm>
               </au>
               <au>
                  <snm>Peyretaillade</snm>
                  <fnm>E</fnm>
               </au>
               <au>
                  <snm>Brottier</snm>
                  <fnm>P</fnm>
               </au>
               <au>
                  <snm>Wincker</snm>
                  <fnm>P</fnm>
               </au>
               <au>
                  <snm>Delbac</snm>
                  <fnm>F</fnm>
               </au>
               <au>
                  <snm>El Alaoui</snm>
                  <fnm>H</fnm>
               </au>
               <au>
                  <snm>Peyret</snm>
                  <fnm>P</fnm>
               </au>
               <au>
                  <snm>Saurin</snm>
                  <fnm>W</fnm>
               </au>
               <au>
                  <snm>Gouy</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Weissenbach</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Vivares</snm>
                  <fnm>CP</fnm>
               </au>
            </aug>
            <source>Nature</source>
            <pubdate>2001</pubdate>
            <volume>414</volume>
            <fpage>450</fpage>
            <lpage>453</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1038/35106579</pubid>
                  <pubid idtype="pmpid" link="fulltext">11719806</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B39">
            <title>
               <p>The gene product of the Plasmodium falciparum 11.1 locus is a protein larger than one megadalton</p>
            </title>
            <aug>
               <au>
                  <snm>Petersen</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Nelson</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Leech</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Jensen</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Wollish</snm>
                  <fnm>W</fnm>
               </au>
               <au>
                  <snm>Scherf</snm>
                  <fnm>A</fnm>
               </au>
            </aug>
            <source>Mol Biochem Parasitol</source>
            <pubdate>1990</pubdate>
            <volume>42</volume>
            <fpage>189</fpage>
            <lpage>195</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1016/0166-6851(90)90161-E</pubid>
                  <pubid idtype="pmpid">2270101</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B40">
            <title>
               <p>Proteophosphoglycans of Leishmania</p>
            </title>
            <aug>
               <au>
                  <snm>Ilg</snm>
                  <fnm>T</fnm>
               </au>
            </aug>
            <source>Parasitol Today</source>
            <pubdate>2000</pubdate>
            <volume>16</volume>
            <fpage>489</fpage>
            <lpage>497</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1016/S0169-4758(00)01791-9</pubid>
                  <pubid idtype="pmpid" link="fulltext">11063860</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B41">
            <title>
               <p>The PGRS domain of Mycobacterium tuberculosis PE_PGRS Rv1759c antigen is an efficient subunit vaccine to prevent reactivation in a murine model of chronic tuberculosis</p>
            </title>
            <aug>
               <au>
                  <snm>Campuzano</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Aguilar</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>Arriaga</snm>
                  <fnm>K</fnm>
               </au>
               <au>
                  <snm>Leon</snm>
                  <fnm>JC</fnm>
               </au>
               <au>
                  <snm>Salas-Rangel</snm>
                  <fnm>LP</fnm>
               </au>
               <au>
                  <snm>Gonzalez-y-Merchand</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Hernandez-Pando</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Espitia</snm>
                  <fnm>C</fnm>
               </au>
            </aug>
            <source>Vaccine</source>
            <pubdate>2007</pubdate>
            <volume>25</volume>
            <fpage>3722</fpage>
            <lpage>3729</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1016/j.vaccine.2006.12.042</pubid>
                  <pubid idtype="pmpid" link="fulltext">17399860</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B42">
            <title>
               <p>A combined transmembrane topology and signal peptide prediction method</p>
            </title>
            <aug>
               <au>
                  <snm>Kall</snm>
                  <fnm>L</fnm>
               </au>
               <au>
                  <snm>Krogh</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Sonnhammer</snm>
                  <fnm>EL</fnm>
               </au>
            </aug>
            <source>J Mol Biol</source>
            <pubdate>2004</pubdate>
            <volume>338</volume>
            <fpage>1027</fpage>
            <lpage>1036</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1016/j.jmb.2004.03.016</pubid>
                  <pubid idtype="pmpid" link="fulltext">15111065</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B43">
            <title>
               <p>Identification of GPI anchor attachment signals by a Kohonen self-organizing map</p>
            </title>
            <aug>
               <au>
                  <snm>Fankhauser</snm>
                  <fnm>N</fnm>
               </au>
               <au>
                  <snm>M&#228;ser</snm>
                  <fnm>P</fnm>
               </au>
            </aug>
            <source>Bioinformatics</source>
            <pubdate>2005</pubdate>
            <volume>21</volume>
            <fpage>1846</fpage>
            <lpage>1852</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1093/bioinformatics/bti299</pubid>
                  <pubid idtype="pmpid" link="fulltext">15691858</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B44">
            <title>
               <p>Dora</p>
            </title>
            <url>http://genomics.unibe.ch/dora</url>
         </bibl>
         <bibl id="B45">
            <title>
               <p>Neglected tests for neglected patients</p>
            </title>
            <aug>
               <au>
                  <snm>Usdin</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Guillerm</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Chirac</snm>
                  <fnm>P</fnm>
               </au>
            </aug>
            <source>Nature</source>
            <pubdate>2006</pubdate>
            <volume>441</volume>
            <fpage>283</fpage>
            <lpage>284</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1038/441283a</pubid>
                  <pubid idtype="pmpid" link="fulltext">16710396</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B46">
            <title>
               <p>FIND diagnostics</p>
            </title>
            <url>http://www.finddiagnostics.org</url>
         </bibl>
         <bibl id="B47">
            <title>
               <p>The Integr8 project--a resource for genomic and proteomic data</p>
            </title>
            <aug>
               <au>
                  <snm>Pruess</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Kersey</snm>
                  <fnm>P</fnm>
               </au>
               <au>
                  <snm>Apweiler</snm>
                  <fnm>R</fnm>
               </au>
            </aug>
            <source>In Silico Biol</source>
            <pubdate>2005</pubdate>
            <volume>5</volume>
            <fpage>179</fpage>
            <lpage>185</lpage>
            <url>ftp://ftp.ebi.ac.uk/pub/databases/integr8/</url>
            <xrefbib>
               <pubid idtype="pmpid" link="fulltext">15972013</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B48">
            <title>
               <p>Mann Whitney test</p>
            </title>
            <url>http://en.wikipedia.org/wiki/Mann-Whitney_U</url>
         </bibl>
         <bibl id="B49">
