
The Genome Analysis Unit
The recent, explosive growth of genomic and proteomic data has dramatically
changed the face of biomedical research. The formal scientific discipline
of bioinformatics has emerged to address the formidable challenges
associated with storing, analyzing, and integrating genetic and
other biological information through computer technology. While
opening many doors for new areas of investigation, bioinformatics
and its associated deluge of data also present many new challenges
that bench researchers are often ill equipped to face. Many bioinformatics
approaches are now necessary components of modern molecular biology
research, and just as changes in sequencing technology moved
the field of molecular biology away from sequencing
a single gene and toward sequencing an entire genome, so bioinformatics
is now more frequently being used to address issues involving large
classes or families of genes rather than single genes or proteins.
At the CCR, the Genome Analysis Unit (GAU) was created to address
some of the issues resulting from the explosive growth in genomic
data and to provide a central resource to enhance the research productivity
of CCR scientists. The GAU serves this function through a variety
of avenues, such as collaborative projects, developing general-purpose
web tools, and presenting and organizing training seminars. This
article focuses on two of its projects: 1) the NCI Bioinformatics
Community Resource, which is a rating guide for using particular
web-based tools, and 2) a collaborative project in which bioinformatics
was used to help bench researchers locate a small non-coding regulatory
RNA.
The NCI Bioinformatics Community Resource
While the Internet has provided researchers with unprecedented
access to repositories of data, literature, analysis tools, and
other assorted research information, making effective use of these
tools is far from straightforward. In fact, “information overload”
is a major problem. Simply keeping track of all the resources can
be a full-time job. For example, “The Molecular Biology Database
Collection” (Nucleic Acids Res, 2005, vol. 33 [Database
issue]: D5-D24) listed no fewer than 719 different databases
ranging from the well-known databases, such as GenBank (an annotated
collection of all publicly available nucleotide and protein sequences),
to lesser-known systems such as the Aptamer database (a collection
of small RNA or DNA molecules capable of binding ligands, ranging
from small organic compounds to whole organisms)! Not only is the
number of databases overwhelming, but making efficient and effective
use of them is made more difficult by the fact that different viewing
and analysis tools may exist at different sites. To address this
issue, the GAU has recently launched the NCI Bioinformatics Community
Resource (NBCR) (http://genome.nci.nih.gov/nbcr).
The NBCR is a repository (database) of links to an array of bioinformatics
resources useful in the analysis of DNA and protein sequence data.
It is designed to be a community-managed resource: researchers are invited
to provide meaningful reviews of the listed sites and suggestions
for new sites. The goal is to construct a rating guide via peer
review of those resources that may prove valuable to the NCI community
and to provide direction on how best to navigate the ever-growing
sea of information associated with those resources. We expect that
the NBCR will be a uniquely valuable tool via a rating scheme that
reflects “real-world” utility.
Collaborative Research and Custom Tool Development
During the past year, the GAU has been involved in a number of
successful collaborative projects with several CCR scientists. The
focus of these projects has included custom oligo design for microarray
chip production, development of simple web-based tools for identification
and extraction of promoter regions, development of gene annotation
and DNA codon modification tools, a genome-wide analysis of human
promoter regions, and the search for small regulatory RNA (sRNA)
candidates in both eukaryotic and prokaryotic organisms. These collaborations
have been quite successful in moving NCI science ahead. One such
collaborative project is detailed below.
Identification of Tandem Duplicate
Regulatory Small RNAs in Pseudomonas aeruginosa
Involved in Iron Homeostasis
Wilderman PJ*, Sowa NA, FitzGerald DJ, FitzGerald PC, Gottesman S, Ochsner UA*,
and Vasil ML*. Proc Natl Acad Sci U S A 101: 9792-7, 2004.
* University of Colorado Health Sciences Center, Denver, CO
National Cancer Institute, National Institutes of Health, Bethesda, MD
Small non-coding RNAs (sRNA) are located predominantly in the intergenic
(IG) regions of bacterial genomes. One of the challenges in understanding
their contribution to gene regulation has been simply locating them.
Previously, Massé and Gottesman (Proc Natl Acad Sci
U S A 99: 4620-5, 2002) demonstrated that the expression of a
Fur-regulated sRNA (RyhB) is responsible for the regulation of an
assortment of genes in Escherichia coli that are expressed
under iron-replete conditions. Sequence homologs of these sRNAs
were also identified in other Enterobacteriaceae (e.g., Salmonella,
Klebsiella, and Shigella). However, this sequence
homology did not extend to the genus Pseudomonas. Because
the vast majority of the sRNAs that have been described in E.
coli are encoded in IG regions and no microarray chips are available
that cover the IGs of Pseudomonas, a different approach was
needed. Thus, a RyhB functional homolog was sought by querying all
the IG regions, derived from the whole genome sequence (GenBank
id: NC_002516), of the PAO1 strain of P. aeruginosa for two
predicted properties of such a functional homolog: regulation by
a Fur box, and a rho-independent transcription terminator. This
analysis yielded only three candidates. Two of the candidates (now
termed PrrF1 and PrrF2) were located in tandem between the genes
PA4704 and phuW (PA4705). Microarray and expression
studies, as well as gene deletion experiments, demonstrated that
both members of this tandem pair are Fur- and iron-regulated, and
that they are functional, but not sequence, homologs of RyhB. Moreover,
while homology searches found two putative prrF sequence
homologs in P. putida, P. fluorescens, and P. syringae,
they are considerably distal to each other in these organisms in
contrast to their tandem location in P. aeruginosa. We conclude
from this study that this type of bioinformatics approach is likely
to be successful in finding other sRNAs regulated by any well-defined
regulatory protein in any sequenced organism that is known to use
rho-independent terminators.
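The search strategy described above (scan every intergenic region for a Fur box followed by a rho-independent terminator) can be illustrated with a minimal Python sketch. It is not the method used in the published study. The Fur box pattern is the 19-bp E. coli consensus, used here only as a stand-in; the terminator test is a deliberately crude heuristic (a short GC-rich hairpin followed by a run of T's); and all function names, thresholds, and the coordinate convention (0-based, half-open gene intervals) are assumptions made for illustration.

```python
# Assumption: the 19-bp E. coli Fur box consensus, used as a stand-in pattern.
FUR_BOX = "GATAATGATAATCATTATC"
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def intergenic_regions(genome, genes):
    """Yield (start, end, sequence) for every gap between annotated genes.

    `genes` is a list of (start, end) pairs, 0-based and half-open.
    """
    prev_end = 0
    for start, end in sorted(genes):
        if start > prev_end:
            yield prev_end, start, genome[prev_end:start]
        prev_end = max(prev_end, end)
    if prev_end < len(genome):
        yield prev_end, len(genome), genome[prev_end:]


def find_fur_box(seq, max_mismatches=4):
    """Return the index of the first window close to the consensus, or -1."""
    n = len(FUR_BOX)
    for i in range(len(seq) - n + 1):
        mismatches = sum(a != b for a, b in zip(seq[i:i + n], FUR_BOX))
        if mismatches <= max_mismatches:
            return i
    return -1


def find_terminator(seq, stem=5, min_loop=3, max_loop=8, min_gc=0.6, t_run=4):
    """Return the index of the first GC-rich hairpin followed by a T run, or -1.

    Hypothetical heuristic: a perfect inverted repeat of `stem` bases, a loop
    of min_loop..max_loop bases, then `t_run` consecutive T's (U's in the RNA)
    within the next t_run + 2 bases.
    """
    for i in range(len(seq) - 2 * stem - min_loop - t_run + 1):
        left = seq[i:i + stem]
        if (left.count("G") + left.count("C")) / stem < min_gc:
            continue
        for loop in range(min_loop, max_loop + 1):
            j = i + stem + loop
            right = seq[j:j + stem]
            if len(right) < stem:
                break
            if right == reverse_complement(left):
                tail = seq[j + stem:j + stem + t_run + 2]
                if "T" * t_run in tail:
                    return i
    return -1


def srna_candidates(genome, genes):
    """List (start, end, strand) for intergenic regions with both features.

    Only the first Fur box on each strand is considered, and the terminator
    must lie downstream of it on that strand.
    """
    hits = []
    for start, end, seq in intergenic_regions(genome, genes):
        for strand, s in (("+", seq), ("-", reverse_complement(seq))):
            box = find_fur_box(s)
            if box < 0:
                continue
            if find_terminator(s[box + len(FUR_BOX):]) >= 0:
                hits.append((start, end, strand))
    return hits
```

Calling srna_candidates(genome, genes) with a genome string and a list of gene coordinates returns the intergenic regions that pass both filters. In practice the Fur box would more likely be scored with a position weight matrix, and the terminator with a dedicated prediction tool rather than a perfect-repeat test.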
In conclusion, the above-mentioned project is just one example
where, by bridging the gap between molecular biology and bioinformatics,
the GAU has collaborated with CCR scientists to produce a successful
outcome not as easily achieved by either partner alone.
NCI scientists wishing to contact the GAU can do so by sending
an email to pcf@helix.nih.gov
or by visiting our web site at http://genome.nci.nih.gov.
Peter FitzGerald, PhD
Head, Genome Analysis Unit
Office of Science and Technology Partnerships
Office of the Director
NCI-Bethesda, Bldg. 37/Rm. 2012
Tel: 301-402-3044
Fax: 301-402-3044
pcf@helix.nih.gov