Structural organization of the human type VII collagen gene (COL7A1), composed of more exons than any previously characterized gene

Genomics. 1994 May 1;21(1):169-79. doi: 10.1006/geno.1994.1239.

Abstract

The human type VII collagen (COL7A1) gene is the locus for mutations in at least some cases of dystrophic epidermolysis bullosa. Here we describe the entire intron/exon organization of COL7A1, which is shown to have 118 exons, more than any previously described gene. Despite this complexity, COL7A1 is compact. Consisting of 31,132 bp from transcription start site to polyadenylation site, it is only about three times the size of type VII collagen mRNA. Thus, COL7A1 introns are small. A 71-nucleotide COL7A1 intron is the smallest intron yet reported in a collagen gene, and only one COL7A1 intron is greater than 1 kb in length. All exons in the COL7A1 triple helix coding region that do not begin with sequences corresponding to imperfections of the triple helix begin with intact codons for Gly residues of Gly-X-Y repeats. This is reminiscent of the structure of fibrillar rather than other nonfibrillar collagen genes. In addition, the COL7A1 triple helix coding region contains many exons of recurring sizes (e.g., 25 exons are 36 bp, 12 exons are 45 bp, 8 exons are 63 bp), suggesting an evolutionary origin distinct from those of other nonfibrillar collagen genes. Sequences from the 5' portion of COL7A1 are presented along with the 3766-bp intergenic sequence, which separates COL7A1 from the upstream gene encoding the core I protein of the cytochrome bc1 complex. The COL7A1 promoter region is found to lack extensive homologies with promoter regions of other genes expressed primarily in skin.
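
The arithmetic behind two of the abstract's claims can be checked with a short sketch. First, each recurring triple-helix exon size (36, 45 and 63 bp) is a whole multiple of 9 bp, the length of one Gly-X-Y repeat (three codons). An exon that begins on an intact Gly codon and has such a length therefore encodes a whole number of repeats. Second, "about three times the size of the mRNA" implies roughly two thirds of the 31,132-bp gene is intronic, spread over 117 introns. The mRNA length is not stated in the abstract, so the sketch approximates exonic sequence as gene length / 3. That ratio, and the names `gly_xy_repeats` and `rough_mean_intron_bp`, are illustrative assumptions, not the authors' analysis.

```python
# Rough arithmetic behind two claims in the COL7A1 abstract (illustrative sketch).
# Exon/gene figures are quoted from the abstract; the mRNA length is NOT given
# there and is ASSUMED here to be gene length / 3 ("about three times").

GENE_LENGTH_BP = 31_132         # transcription start site to polyadenylation site
EXON_COUNT = 118
INTRON_COUNT = EXON_COUNT - 1   # introns lie between consecutive exons

# Recurring triple-helix exon sizes from the abstract: size in bp -> number of exons
RECURRING_EXONS = {36: 25, 45: 12, 63: 8}

BP_PER_TRIPLET = 9  # one Gly-X-Y repeat = 3 codons = 9 nucleotides


def gly_xy_repeats(exon_bp: int) -> int:
    """Whole Gly-X-Y repeats encoded by an exon that starts on a Gly codon (hypothetical helper)."""
    if exon_bp % BP_PER_TRIPLET != 0:
        raise ValueError(f"{exon_bp} bp is not a whole number of Gly-X-Y repeats")
    return exon_bp // BP_PER_TRIPLET


def rough_mean_intron_bp(gene_bp: int, introns: int, gene_to_mrna_ratio: float = 3.0) -> float:
    """Approximate mean intron length, ASSUMING exonic sequence ~= gene_bp / ratio."""
    exonic_bp = gene_bp / gene_to_mrna_ratio
    return (gene_bp - exonic_bp) / introns


if __name__ == "__main__":
    total_bp = 0
    for size, count in RECURRING_EXONS.items():
        total_bp += size * count
        print(f"{count:2d} exons x {size} bp -> {gly_xy_repeats(size)} Gly-X-Y repeats each")
    print(f"recurring exons: {total_bp} bp = {total_bp // BP_PER_TRIPLET} Gly-X-Y repeats")
    mean_intron = rough_mean_intron_bp(GENE_LENGTH_BP, INTRON_COUNT)
    print(f"rough mean intron length: {mean_intron:.0f} bp")
```

Under these assumptions, the 45 recurring exons together encode 216 Gly-X-Y repeats (1,944 bp), and the mean intron comes out at roughly 177 bp. That figure is only an order-of-magnitude check. It is consistent with the abstract's statements that the smallest intron is 71 nucleotides and that only one intron exceeds 1 kb.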

Publication types

  • Research Support, Non-U.S. Gov't
  • Research Support, U.S. Gov't, P.H.S.

MeSH terms

  • Amino Acid Sequence
  • Base Sequence
  • Cloning, Molecular
  • Collagen / genetics*
  • Epidermolysis Bullosa Dystrophica / genetics
  • Exons*
  • Genes
  • Humans
  • Molecular Sequence Data
  • RNA Splicing
  • Repetitive Sequences, Nucleic Acid

Substances

  • Collagen

Associated data

  • GENBANK/L23982