
Proceedings Paper

Mining biological databases for candidate disease genes
Author(s): Terry A. Braun; Todd Scheetz; Gregg Lewis Webster; Thomas L. Casavant

Paper Abstract

The Human Genome Project (HGP), the publicly funded effort to determine the complete nucleotide sequence of the human genome, has so far produced more than 93% of the genome's 3 billion nucleotides in a preliminary 'draft' format. In addition, several valuable sources of information have been developed as direct and indirect results of the HGP. These include the sequencing of model organisms (rat, mouse, fly, and others), gene discovery projects (ESTs and full-length), and new technologies and resources for expression analysis (micro-arrays or gene chips). These resources are invaluable to researchers identifying the functional genes of the genome, which are transcribed and translated into the transcriptome and proteome, both of which potentially contain orders of magnitude more complexity than the genome itself. Preliminary analyses of these data identified approximately 30,000 - 40,000 human 'genes.' However, the bulk of the effort still remains: to identify the functional and structural elements contained within the transcriptome and proteome, and to associate function in the transcriptome and proteome with genes. A fortuitous consequence of the HGP is the existence of hundreds of databases of biological information that may contain data relevant to the identification of disease-causing genes. The task of mining these databases for information on candidate genes is a commercial application of enormous potential. We are developing a system to acquire and mine data from specific databases to aid our efforts to identify disease genes. A high-speed cluster of Linux workstations is used to analyze sequences and perform distributed sequence alignments as part of our data mining and processing. This system has been used to mine GeneMap99 sequences within specific genomic intervals to identify potential candidate disease genes associated with Bardet-Biedl Syndrome (BBS).

Paper Details

Date Published: 27 July 2001
PDF: 12 pages
Proc. SPIE 4528, Commercial Applications for High-Performance Computing, (27 July 2001); doi: 10.1117/12.434869
Author Affiliations
Terry A. Braun, Univ. of Iowa (United States)
Todd Scheetz, Univ. of Iowa (United States)
Gregg Lewis Webster, Univ. of Iowa (United States)
Thomas L. Casavant, Univ. of Iowa (United States)

Published in SPIE Proceedings Vol. 4528:
Commercial Applications for High-Performance Computing
Howard Jay Siegel, Editor
