April 2006
Genome-scale Profiling of Gene Expression in Hepatocellular Carcinoma: Classification and Survival Prediction
Gene expression profiling has previously been applied to specific aspects of hepatocellular carcinoma (HCC). We investigated whether variations in gene expression at diagnosis would permit the identification of distinct subclasses of HCC patients with different prognoses. We applied three independent but complementary approaches to data analysis to uncover subclasses of HCC and the underlying biological differences between them.

First, we applied unsupervised classification methods based solely on gene expression patterns. Hierarchical clustering of the data and a multidimensional scaling (MDS) plot both revealed two subclasses of HCC that were strongly associated with the length of patient survival (Figure 1).

Second, we applied five independent prediction algorithms to determine whether gene expression patterns could be used to predict survival. HCC patients were randomly divided into two groups of nearly equal size: a training set (n = 45), used to develop the HCC classifiers, and a validation set (n = 44), used to evaluate them. Briefly, we first identified the genes most differentially expressed between the two clusters in the training set. The number of genes in each classifier was then optimized to minimize misclassification errors during leave-one-out cross-validation of the tumors in the training set. When applied to the validation set, all five models successfully separated patients with poor survival (cluster/subclass A) from patients with longer survival (cluster/subclass B). These results demonstrated not only a strong association between gene expression patterns and patient survival but also the robust reproducibility of these gene expression-based predictors.

Third, a univariate Cox regression model was used to identify individual genes whose expression is highly correlated with the length of survival.
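The gene-count optimization in the second approach can be illustrated with a minimal sketch. It is not the authors' implementation: the nearest-centroid classifier, the mean-difference gene ranking, the toy data, and all function names are assumptions made for illustration, and the five algorithms actually used are not specified here. The point it shows is that genes are re-ranked inside every leave-one-out fold, so the held-out tumor never influences gene selection.

```python
from statistics import mean


def rank_genes(samples, labels):
    """Rank gene indices by absolute difference in mean expression
    between classes "A" and "B" (a simplified stand-in for a t-test)."""
    def gap(g):
        a = [s[g] for s, y in zip(samples, labels) if y == "A"]
        b = [s[g] for s, y in zip(samples, labels) if y == "B"]
        return abs(mean(a) - mean(b))
    return sorted(range(len(samples[0])), key=gap, reverse=True)


def centroid_predict(train, labels, genes, sample):
    """Assign the sample to the class whose centroid is nearest,
    using squared Euclidean distance over the selected genes only."""
    best_label, best_dist = None, float("inf")
    for label in ("A", "B"):
        members = [s for s, y in zip(train, labels) if y == label]
        centroid = [mean(s[g] for s in members) for g in genes]
        dist = sum((sample[g] - c) ** 2 for g, c in zip(genes, centroid))
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label


def loocv_errors(samples, labels, n_genes):
    """Count leave-one-out misclassifications for a classifier of n_genes genes."""
    errors = 0
    for i in range(len(samples)):
        train = samples[:i] + samples[i + 1:]
        train_labels = labels[:i] + labels[i + 1:]
        # Gene selection happens inside the fold to avoid information leakage.
        genes = rank_genes(train, train_labels)[:n_genes]
        if centroid_predict(train, train_labels, genes, samples[i]) != labels[i]:
            errors += 1
    return errors


def best_gene_count(samples, labels, candidates):
    """Pick the candidate size with the fewest errors; ties go to the smaller size."""
    return min(candidates, key=lambda k: (loocv_errors(samples, labels, k), k))


# Hypothetical toy data: 6 tumors x 3 genes; only gene 0 separates the classes.
SAMPLES = [
    [5.0, 1.0, 2.0], [5.2, 3.0, 1.0], [4.8, 2.0, 3.0],
    [1.0, 1.2, 2.1], [1.2, 2.9, 0.9], [0.8, 2.1, 3.2],
]
LABELS = ["A", "A", "A", "B", "B", "B"]

if __name__ == "__main__":
    print(best_gene_count(SAMPLES, LABELS, [1, 2, 3]))
```

In practice the selected gene count would then be fixed and the classifier applied once to the independent validation set, as described above.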
Application of survival-associated genes for subclass prediction was highly accurate: averaged gene expression indices from the 406 selected “survival genes” were sufficient to segregate the two subclasses even without the use of sophisticated prediction models.

Figure 1. Classification of hepatocellular carcinoma (HCC) based on a genome-wide survey of gene expression. A) Hierarchical clustering of 91 HCC tumors. Genes whose expression ratio differed at least 2-fold from the reference in at least 9 tissues were selected for hierarchical analysis (4,187 gene features). B) Multidimensional scaling (MDS) plot of HCC tissues using the 4,187 genes. The MDS plot was based on a matrix of Pearson correlation coefficients from the complete pairwise comparison of all experiments. It displays the position of each HCC tissue in three-dimensional Euclidean space, with the distance between HCC tissues reflecting their approximate degree of correlation. Red and green balls represent HCC tissues in cluster A and cluster B, respectively. C) Kaplan-Meier plot of overall survival of HCC patients grouped on the basis of gene expression profiling.

Knowledge-based annotation of the 406 survival genes provided insight into the underlying biological differences between the two subclasses of HCC. Among the several biological groups of survival genes, the cell proliferation group was the best predictor of an unfavorable outcome. Expression of typical cell proliferation markers such as PCNA and of cell cycle regulators such as CDK4, CCNB1, CCNA2, and CKS2 was greater in subclass A than in subclass B. Not surprisingly, many genes that are expressed more highly in subclass A are anti-apoptotic. Interestingly, higher expression of genes involved in ubiquitination and sumoylation was also observed in subclass A. The ubiquitin system is often deregulated in cancers.
In HCC, the degree of ubiquitination is highly correlated with cell proliferation and patient survival, and it has also been proposed as a possible predictive marker for recurrence of human HCC. In addition, enhanced activation of ubiquitin-dependent protein degradation may account for the deregulation of cell cycle control and faster cell proliferation in the poor survival group (subclass A). This result is highly concordant with our recent study in mouse models (Lee JS et al. Comparative functional genomics to identify the best-fit mouse cancer models for studying human HCC. Nat Genet 36: 1306-11, 2004), in which we found that the ubiquitination index is much higher in mouse models that mimic poor human prognosis (subclass A).

The severity of HCC and the lack of good diagnostic markers and treatment strategies have made the disease a major challenge. Systematic analysis of gene expression patterns provides insight into the biology and pathogenesis of HCC. Our results indicate that HCC prognosis can be predicted from the gene expression profiles of the primary tumors. Because microarray-based measurement of gene expression reflects the abundance of expressed mRNA and proteins in HCC, a limited set of quantitative RT-PCR and/or immunohistochemical staining assays may be sufficient to predict the prognosis of patients at the time of diagnosis.
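For readers who want to see the preprocessing described in the Figure 1 caption in concrete terms, the gene filter and the Pearson correlation matrix can be sketched as follows. This is a minimal illustration under stated assumptions, not the analysis pipeline used in the study: it assumes expression ratios are stored as log2 values, it counts a 2-fold change in either direction, and the function names are hypothetical.

```python
import math


def passes_filter(log2_ratios, min_fold=2.0, min_tissues=9):
    """Keep a gene if its ratio to the reference differs by at least
    min_fold (up or down, an assumption) in at least min_tissues tissues."""
    threshold = math.log2(min_fold)
    hits = sum(1 for r in log2_ratios if abs(r) >= threshold)
    return hits >= min_tissues


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length profiles.
    Raises ZeroDivisionError if either profile has zero variance."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def correlation_matrix(profiles):
    """Complete pairwise Pearson correlations between tissue profiles,
    where each profile lists the filtered genes for one tissue."""
    return [[pearson(p, q) for q in profiles] for p in profiles]
```

A matrix of this kind would then be handed to an MDS routine, which places tissues with highly correlated profiles close together in the low-dimensional plot.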