Mathematical Biology Seminar

Mark Yandell
Department of Human Genetics, Eccles Institute of Human Genetics
University of Utah

"Using Annotated Genomes as Resources for Explorations of Gene Structure, Function and Evolution"

We have developed an open-source software library designed to facilitate the use of genome annotations as substrates for computation and experimentation. We call the library 'CGL', an acronym for Comparative Genomics Library, and pronounce it 'Seagull'. CGL provides an informatics infrastructure for a lab, department or research institute engaged in the large-scale analysis of genomes and their annotations. To date we have employed CGL in two major projects: one largely experimental in nature, the other computational.

The goal of the experimental project has been to obtain an accurate estimate of the true number of protein-coding genes in D. melanogaster. Gene number has been one of the most hotly debated aspects of annotation in every genome sequenced to date, and D. melanogaster is no exception. To resolve this issue, we have collected over 12,000 additional gene predictions from a variety of sources and used CGL to coordinate a large, PCR-based prediction validation project. Our results demonstrate the utility of CGL as a tool for genome management, and speak to the completeness and accuracy of a key genome's annotations.

The second project is comparative in its focus. We have used CGL to carry out a very large-scale study of the evolution of gene structure using eleven different animal genomes and their associated annotations, perhaps the largest comparative genomics analysis to date. CGL makes it possible to compare annotations to one another in new ways, and thus opens new perspectives on genes and their evolution. Our results provide a glimpse of how gene structures have evolved and diversified over roughly the last 500 million years of animal evolution.