Novel protein domains and repeats in Drosophila melanogaster: insights into structure, function, and evolution

Genome Res. 2001 Dec;11(12):1996-2008. doi: 10.1101/gr.198701.

Abstract

Sequence database searching methods such as BLAST are invaluable for predicting molecular function on the basis of sequence similarities among single regions of proteins. Searches of whole databases, however, are not optimized to detect multiple homologous regions within a single polypeptide. Here we have used the prospero algorithm to perform self-comparisons of all predicted Drosophila melanogaster gene products. Predicted repeats, and their homologs from all species, were analyzed further to detect hitherto unappreciated evolutionary relationships. Results included the identification of novel tandem repeats in the human X-linked retinitis pigmentosa type-2 gene product, repeated segments in cystinosin, which is associated with a defect in cystine transport, and 'nested' homologous domains in dysferlin, whose gene is mutated in limb girdle muscular dystrophy. Novel signaling domain families were found that may regulate the microtubule-based cytoskeleton and ubiquitin-mediated proteolysis, respectively. Two families of glycosyl hydrolases were shown to contain internal repetitions that hint at their evolution via a piecemeal, modular approach. In addition, three fruit fly genes were found to contain tandem exons that appear to have arisen via internal duplication. These findings demonstrate how completely sequenced genomes can be exploited to further understand the relationships between molecular structure, function, and evolution.
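
The idea of comparing a protein sequence against itself to find internal repeats can be illustrated with a minimal sketch. This is not the prospero program and does not reproduce its scoring or statistics. It is a plain Smith-Waterman-style local alignment restricted to cells above the main diagonal, so that the trivial match of the sequence with itself is excluded. The function name `best_internal_repeat`, the identity scoring scheme (match +2, mismatch -1, gap -2), and the toy sequence are all assumptions made for illustration.

```python
def best_internal_repeat(seq, match=2, mismatch=-1, gap=-2):
    """Best local alignment of seq against itself, off the main diagonal.

    Hypothetical helper with an assumed identity scoring scheme; a real
    analysis would use a substitution matrix and a significance estimate.
    Returns (score, (start_i, start_j), (end_i, end_j)), 1-based inclusive.
    """
    n = len(seq)
    # score[i][j]: best local alignment ending at seq[i-1] and seq[j-1]
    score = [[0] * (n + 1) for _ in range(n + 1)]
    # start[i][j]: 1-based (i, j) at which that alignment begins
    start = [[None] * (n + 1) for _ in range(n + 1)]
    best = (0, None, None)

    for i in range(1, n + 1):
        # Only j > i is filled in: this skips the self-match on the diagonal
        # and the mirror-image copy of every hit below it.
        for j in range(i + 1, n + 1):
            sub = match if seq[i - 1] == seq[j - 1] else mismatch
            candidates = [
                # extend diagonally, or open a new alignment at (i, j)
                (score[i - 1][j - 1] + sub, start[i - 1][j - 1] or (i, j)),
                # gap in the second copy
                (score[i - 1][j] + gap, start[i - 1][j]),
                # gap in the first copy
                (score[i][j - 1] + gap, start[i][j - 1]),
            ]
            s, origin = max(candidates, key=lambda c: c[0])
            if s > 0:  # local alignment: non-positive scores reset to zero
                score[i][j] = s
                start[i][j] = origin
                if s > best[0]:
                    best = (s, origin, (i, j))
    return best


# Toy sequence (made up): a 10-residue segment repeated after a GGG linker.
toy = "MKTAYIAKQRGGGMKTAYIAKQR"
print(best_internal_repeat(toy))
```

On the toy sequence this reports a score of 20 for residues 1-10 aligned with residues 14-23. The sketch reports only the single best hit and does not stop the two aligned segments from overlapping, so it is a starting point rather than a repeat-detection method in its own right.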

MeSH terms

  • Amino Acid Sequence / genetics
  • Amino Acid Transport Systems, Neutral
  • Animals
  • Antigens, Differentiation, B-Lymphocyte / chemistry
  • Antigens, Differentiation, B-Lymphocyte / genetics
  • Antigens, Differentiation, B-Lymphocyte / physiology
  • Aspartate-tRNA Ligase / chemistry
  • Aspartate-tRNA Ligase / genetics
  • Aspartate-tRNA Ligase / physiology
  • Cystinosis / genetics
  • Drosophila Proteins / chemistry*
  • Drosophila Proteins / genetics
  • Drosophila Proteins / physiology*
  • Drosophila melanogaster / chemistry*
  • Drosophila melanogaster / enzymology
  • Drosophila melanogaster / genetics
  • Evolution, Molecular*
  • Exons / genetics
  • Eye Proteins*
  • GTP-Binding Proteins
  • Gene Duplication
  • Glycoproteins*
  • Glycoside Hydrolases / chemistry
  • Glycoside Hydrolases / genetics
  • Glycoside Hydrolases / physiology
  • Histocompatibility Antigens Class II / chemistry
  • Histocompatibility Antigens Class II / genetics
  • Histocompatibility Antigens Class II / physiology
  • Humans
  • Insect Proteins / chemistry
  • Insect Proteins / genetics
  • Insect Proteins / physiology
  • Intracellular Signaling Peptides and Proteins
  • Membrane Proteins / chemistry
  • Membrane Proteins / genetics
  • Membrane Proteins / physiology
  • Membrane Transport Proteins
  • Molecular Sequence Data
  • Muscular Dystrophies / genetics
  • Protein Structure, Secondary
  • Protein Structure, Tertiary
  • Proteins / chemistry
  • Proteins / genetics
  • Proteins / physiology
  • Repetitive Sequences, Amino Acid*
  • Retinitis Pigmentosa / genetics
  • Signal Transduction / genetics
  • Species Specificity
  • Tandem Repeat Sequences

Substances

  • Amino Acid Transport Systems, Neutral
  • Antigens, Differentiation, B-Lymphocyte
  • CTNS protein, human
  • Drosophila Proteins
  • Eye Proteins
  • Glycoproteins
  • Histocompatibility Antigens Class II
  • Insect Proteins
  • Intracellular Signaling Peptides and Proteins
  • Membrane Proteins
  • Membrane Transport Proteins
  • Proteins
  • RP2 protein, human
  • invariant chain
  • tim protein, Drosophila
  • Glycoside Hydrolases
  • GTP-Binding Proteins
  • Aspartate-tRNA Ligase