RESEARCH INTERESTS

Virology, computational biology and molecular evolution: genomic analysis of RNA viruses and cellular multiple gene families, development and analysis of software tools.

What is Bioinformatics?

Bioinformatics is the generation of new knowledge from existing data. This type of research includes the development and testing of software tools necessary to generate new knowledge from primary source information deposited in databases and the literature.

What is in silico Research?

All of our research is conducted in silico, i.e., within the computer environment. There is no experimental laboratory component to our research. However, many of the results of our analyses are predictions that can be tested by laboratory experimentation.

What Does the McClure Laboratory do for a Living?

Given the pace at which biological sequence and structural information is deposited into international databases, it is imperative that 21st century scientists train in both discovery and hypothesis-based research that fully utilizes these growing data resources. The McClure Laboratory has been engaged in Bioinformatic analyses for over a decade and we are well qualified to provide state-of-the-art Bioinformatic training.

Our in silico research is dedicated to the Bioinformatic analyses of RNA-based life forms, i.e., all RNA viruses, and the development and testing of the necessary tools for these studies. Our research is an interplay between empirically derived biological data sources, bioinformatic tools, and human decision making in the creation of new knowledge about the evolution, structure, and function of RNA-based life forms.

There are two components to our research agenda. The first is analytical and concentrated in, but not limited to, two distinct areas of sequence information: all Retroid agents, the genetic agents that replicate or transpose themselves via an RNA intermediate (e.g., HIV); and the viral order Mononegavirales (e.g., Ebola, Measles, Rabies). The second is methodology-based, involving the testing and development of software tools used to analyze the data.

Analytical Studies

All levels and ways of analyzing and thinking about sequences are used to derive the maximum information regarding evolution, structure, and function of genomes, genes and gene products that are critical for the lifecycles of both RNA-based life forms and their hosts. Much of our initial work revolved around the identification of protein functions in the newly sequenced genomes of Retroid agents. Using a combination of database-search and multiple alignment algorithms, we successfully described the residues involved in catalysis for the reverse transcriptase, ribonuclease H, aspartic acid protease and integrase proteins; these predictions were later validated by crystallographic analysis.

More recently we have studied the evolution of an auxiliary function found in some retroviruses, and in other viruses found throughout the three domains of life. These studies address issues pertaining to genome and gene architecture, as well as the horizontal transfer of genes from hosts to pathogens (see Baldo and McClure, 1999).

Some of our ongoing work includes the construction of hidden Markov models (HMMs; see technical section) representing the sequence information of all proteins encoded by the Retroid agents. Upon completion these models will provide a new approach to the study of highly divergent sequence information.

Why Are Retroid Agents of Interest?

The Retroid agents provide the largest model system of related genomes available to study and analyze the evolution of genes and genomes. They are ubiquitous in Eukaryotes, comprising 50-90% of the genomic information in some cases. The relationship between Retroid and host genomes is complex. In some cases, Retroid agents are involved in pathogenicity, while in others they have co-evolved to become integral to their hosts' survival. Given the ubiquity of these agents in Eukaryotes, studies in comparative genomics will necessarily include the analysis of the relationships among and between Retroid and host genomes.

A subset of our collection of sequences from the viral order Mononegavirales was recently used to address the evolution of single genes that encode multiple functions via RNA-editing and frame-shifting, as found among various lineages of the rhabdo- and paramyxovirus families (see Jordan, Sutter and McClure).

We are also constructing HMMs representing the three proteins of the replication/transcription complex of the order Mononegavirales. Modeling such highly divergent protein data will aid in further understanding the details of the lifecycles of these medically important viruses.

Methodology Studies

The second component of our research agenda focuses on the comparison and development of software tools for the identification and analysis of biologically informative patterns. Many existing software tools still require human pattern recognition skills for identification and refinement of patterns found among highly divergent sequences. Our early work established benchmark sequence data to evaluate the ability of multiple alignment algorithms to identify the ordered-series-of-motifs that convey functional and structural integrity in a given protein class (McClure, Vasi, and Fitch, 1994).
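One way to picture this kind of evaluation is the short Python sketch below. It is illustrative only and is not the scoring procedure of the cited study: it assumes an alignment is given as gapped strings and that the start of a known motif in each ungapped sequence is already annotated, and it reports the fraction of sequences whose motif start lands in the most common alignment column. The function names, the toy sequences and the motif positions are all hypothetical.

```python
from collections import Counter


def column_of_residue(aligned_seq, residue_index, gap_chars="-."):
    """Return the alignment column holding the residue_index-th (0-based) residue."""
    seen = -1
    for col, ch in enumerate(aligned_seq):
        if ch not in gap_chars:
            seen += 1
            if seen == residue_index:
                return col
    raise ValueError("residue index is beyond the end of the sequence")


def motif_alignment_score(alignment, motif_starts):
    """Fraction of sequences whose motif start sits in the most common column."""
    columns = [
        column_of_residue(alignment[name], start)
        for name, start in motif_starts.items()
    ]
    _, count = Counter(columns).most_common(1)[0]
    return count / len(columns)


# Toy data (made up for illustration; not taken from the benchmark sets).
alignment = {
    "seqA": "MK-YVDDLLA",
    "seqB": "MKAYVDDILA",
    "seqC": "M-YVDD-LLA",
}
# 0-based position of the motif's first residue in each ungapped sequence.
motif_starts = {"seqA": 2, "seqB": 3, "seqC": 1}

score = motif_alignment_score(alignment, motif_starts)
print(f"Sequences with the motif in a shared column: {score:.2f}")
```

In this toy case the motif is placed in the same column for two of the three sequences. A fuller evaluation would repeat the calculation for every motif in the ordered series and for every alignment method under comparison.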

Most recently we have used these data to evaluate the efficiency of motif identification by a variety of new methods (Hudak and McClure, 1999). We also participated in proof-of-concept studies using a hidden Markov model approach to the multiple protein sequence problem (Baldi, Chauvin, Hunkapiller and McClure, 1994). In the last few years we have developed a better strategy of HMM construction for the representation of highly divergent protein sequence information (see McClure, Kowalski, and Hudak, 1998, and McClure and Kowalski, 1999). Our efforts to develop and test meaningful approaches to sequence analyses are an ongoing part of our research agenda.


Available data sets: glob12.fasta, kin12.fasta, pro12.fasta, rh12.fasta.
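The file names suggest that these data sets are in FASTA format. As a minimal sketch, assuming standard FASTA conventions (a header line beginning with ">" followed by one or more sequence lines), the files could be read in Python as shown below. The parse_fasta function and the example records are hypothetical and do not reflect the actual contents of the files.

```python
def parse_fasta(text):
    """Parse FASTA-formatted text into a dict mapping record name to sequence."""
    records = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Use the first word of the header as the record name.
            header = line[1:].split()
            name = header[0] if header else f"record_{len(records) + 1}"
            records[name] = []
        elif name is not None:
            records[name].append(line)
    return {n: "".join(parts) for n, parts in records.items()}


# Made-up records for illustration only.
example = """>seq1 toy record
MKVLA
AGG
>seq2
MKILA
"""

records = parse_fasta(example)
lengths = {n: len(s) for n, s in records.items()}
print(lengths)
```

With a local copy of one of the data sets, the same function could be applied to the file contents, for example parse_fasta(open("glob12.fasta").read()).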

COMPUTATIONAL BIOLOGY CONFERENCES