Fusion Gene Transcripts in Expressed Sequence Tags Database
Chromosome translocations can be discovered by cytogenetic experiments, but it is difficult to tell if a fusion gene has been created by the translocation and, if so, to identify it. Here we describe a procedure for identifying fusion genes by an analysis of the expressed sequence tags (EST) database. ESTs are short (~500 bp) sequences of randomly selected cDNAs prepared from a variety of tissues. The current database holds more than 6 million human ESTs, about half of which are from cancer tissues or derived cancer cell lines. The ESTs from fusion genes in this database can be identified because they map to two different locations in the human genome. A complicating factor is that many such chimeric transcripts in the EST database are cloning artifacts generated during the cDNA library construction process. However, these can be separated from genuine fusion gene transcripts because the fusion point usually occurs in an exon for the former, whereas it usually occurs at an exon-exon boundary for the latter. We developed a semi-automatic procedure for systematic identification of fusion gene transcripts in the mRNA and EST databases based on these principles. Using this procedure, we could identify 118 mRNAs and 196 ESTs as fusion gene transcript sequences, from a total of 237 putative fusion genes. Among the mRNA sequences, 96 were previously annotated as fusion transcripts, including most of the BCR/ABL1 fusion transcript sequences. The procedure also identified 177 novel fusion gene candidates. We experimentally verified one of these, the IRA1/RGS17 fusion, which was supported by three independent EST clones (Figure 1). A reverse transcriptase (RT)-PCR experiment using an mRNA sample from the MCF7 breast cancer cell line yielded a clear band with the correct size. 
A fluorescence in situ hybridization (FISH) experiment using two BAC clones, one containing the IRA1 gene and the other the RGS17 gene, detected a derivative chromosome, most likely the previously identified t(3;6)(q26;q25)del(3)(p14). The 5′-UTR exon 1 of IRA1 on 3q26.32 is fused with the start codon-bearing exon 2 of RGS17 on 6q25.2. The RGS17 protein belongs to the GTPase-activating proteins that act as regulators of G-protein signaling. Components of the G-protein-coupled receptor signaling pathways, including RGS proteins, are known to be involved in many cancers and are considered potential therapeutic targets.

Figure 1. Prediction and verification of the IRA1/RGS17 fusion resulting from a chromosome translocation. A) Schematic representation of the IRA1/RGS17 fusion. Boxes represent the exons, and broken lines the introns. The fusion event is indicated by an arc. Arrows indicate the transcription start sites. Exons are numbered as they occur in the original genes. Primers for the reverse transcriptase (RT)-PCR reaction are indicated (T530 and T531). ORFs (open reading frames) are marked with grey boxes. B) RT-PCR detection of the fusion transcripts in MCF7 cells. The fusion gene transcripts for the previously known BCAS4/BCAS3 fusion and the predicted IRA1/RGS17 fusion were detected in the cells. β-actin (ACTB) was used as the positive control. The product sizes of ACTB, BCAS4/BCAS3, and IRA1/RGS17 are 600, 328, and 367 bp, respectively. C) Detection of the 3;6 translocation in MCF7 cells by a fluorescence in situ hybridization (FISH) experiment. A representative result is presented. The IRA1 gene (red) and the RGS17 gene (green) are on chromosomes 3 and 6, respectively. In addition to two copies each of chromosomes 3 and 6, a 3;6 translocation was detected (white arrow).

We expect to collect more fusion gene candidates in the future as the EST database continues to expand. A large collection of cancer-related gene fusions, obtained through a combination of computational prediction and experimental verification, should present a new opportunity to uncover novel molecular mechanisms of carcinogenesis.
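
The filtering principle described above (a transcript that maps to two genomic locations is a fusion candidate only if its fusion point falls at exon boundaries rather than inside an exon) can be illustrated with a minimal Python sketch. This is not the authors' semi-automatic procedure; it is a simplified illustration under stated assumptions. The names `Exon`, `Junction`, `find_exon`, and `classify_junction`, the `tolerance` parameter, the labels returned, and all gene names and coordinates are hypothetical. The sketch also assumes both genes lie on the plus strand, so the 5′ part should end at an exon end and the 3′ part should begin at an exon start.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Exon:
    gene: str
    chrom: str
    start: int  # 1-based, inclusive
    end: int    # 1-based, inclusive


@dataclass(frozen=True)
class Junction:
    """Fusion point of a transcript whose two parts map to different loci."""
    chrom_5p: str
    pos_5p: int  # last genomic base covered by the 5' part
    chrom_3p: str
    pos_3p: int  # first genomic base covered by the 3' part


def find_exon(exons: List[Exon], chrom: str, pos: int) -> Optional[Exon]:
    """Return the first annotated exon containing the position, if any."""
    for exon in exons:
        if exon.chrom == chrom and exon.start <= pos <= exon.end:
            return exon
    return None


def classify_junction(junction: Junction, exons: List[Exon],
                      tolerance: int = 2) -> str:
    """Label a chimeric transcript by where its fusion point falls.

    Assumption: plus-strand genes only. `tolerance` (hypothetical) allows
    for small alignment uncertainty around the exon edges.
    """
    donor = find_exon(exons, junction.chrom_5p, junction.pos_5p)
    acceptor = find_exon(exons, junction.chrom_3p, junction.pos_3p)
    if donor is None or acceptor is None:
        return "unannotated"
    if donor.gene == acceptor.gene:
        return "same_gene"
    at_donor_edge = abs(junction.pos_5p - donor.end) <= tolerance
    at_acceptor_edge = abs(junction.pos_3p - acceptor.start) <= tolerance
    if at_donor_edge and at_acceptor_edge:
        return "candidate_fusion"   # fusion point at an exon-exon boundary
    return "likely_artifact"        # fusion point inside an exon


# Invented annotation: two genes on different chromosomes.
EXONS = [
    Exon("GENE_A", "chr3", 1000, 1200),
    Exon("GENE_B", "chr6", 5000, 5300),
]

boundary_hit = Junction("chr3", 1200, "chr6", 5000)
mid_exon_hit = Junction("chr3", 1100, "chr6", 5150)

print(classify_junction(boundary_hit, EXONS))
print(classify_junction(mid_exon_hit, EXONS))
```

In this toy input, the first junction joins the end of a GENE_A exon to the start of a GENE_B exon and is kept as a candidate, while the second falls in the middle of both exons and is flagged as a probable cloning artifact. Because the source describes these tendencies as "usually" holding, a label produced this way is only a filter for prioritizing candidates, which would still need experimental verification such as RT-PCR or FISH.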