April 2006
Volume 5

Center for Cancer Research: Frontiers in Science
   

Cell Biology/Genomics

Genome-scale Profiling of Gene Expression in Hepatocellular Carcinoma: Classification and Survival Prediction

Lee J-S, Chu I-S, Heo J, Calvisi DF, Sun Z, Roskams T, Durnez A, Demetris AJ, and Thorgeirsson SS. Classification and prediction of survival in hepatocellular carcinoma by gene expression profiling. Hepatology 40: 667–76, 2004.

Much is known about the sequential cellular changes that precede the formation of hepatocellular carcinoma (HCC) and the etiologic agents (i.e., hepatitis B virus [HBV] and hepatitis C virus [HCV] infection and alcohol) responsible for the majority of HCC cases. Nevertheless, the molecular pathogenesis of HCC is not well understood. Although clinical information and pathological classification have greatly improved the information available at diagnosis on survival and treatment options, many issues remain unresolved. For example, no staging system reliably separates patients with early and intermediate-to-advanced HCC into groups that are homogeneous with respect to prognosis. This is important because the natural course of early HCC is unknown and the progression of intermediate and advanced HCC is quite heterogeneous. Thus, improving the classification of HCC patients would at minimum improve the application of currently available treatment modalities and at best provide new treatment strategies.

While gene expression profiling technology has previously been applied to some specific aspects of HCC, we investigated the possibility that variations in gene expression of HCC at diagnosis would permit the identification of distinct subclasses of HCC patients with different prognoses. We applied three independent but complementary approaches for data analysis to uncover subclasses of HCC and the underlying biological differences between them. First, unsupervised classification methods based solely on gene expression patterns were applied. Hierarchical clustering of the data as well as a multidimensional scaling (MDS) plot revealed two subclasses of HCC strongly associated with the length of patients’ survival (Figure 1). Second, we applied five independent prediction algorithms to determine whether gene expression patterns could be used to predict survival. HCC patients were randomly divided into two groups of nearly equal size: a training set (n = 45) that was used to develop the HCC classifiers, and a validation set (n = 44) that was used to evaluate them. Briefly, we first identified the genes most differentially expressed between the two clusters in the training set. The number of genes in the classifiers was then optimized to minimize misclassification errors during leave-one-out cross-validation of the tumors in the training set. When applied to the validation set, all five models successfully separated patients with poor survival (cluster/subclass A) from patients with longer survival (cluster/subclass B). These results demonstrated not only a strong association between gene expression patterns and the survival of patients but also the robust reproducibility of these gene expression-based predictors. Third, a univariate Cox regression model was used to identify individual genes whose expression is highly correlated with the length of survival. Application of survival-associated genes for subclass prediction was highly accurate, as illustrated by the fact that averaged gene expression indices from the selected 406 “survival genes” were sufficient to segregate the two subclasses even without the use of sophisticated prediction models.
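The training-set procedure described above (rank genes by differential expression between the two clusters, then choose the classifier size that minimizes leave-one-out cross-validation errors) can be illustrated with a small sketch. This is not the study's implementation: the five prediction algorithms are not named here, so a simple nearest-centroid classifier stands in for them, genes are ranked by the absolute difference in class means, and the function names and toy data are hypothetical.

```python
from statistics import mean

def rank_genes(samples, labels):
    """Rank gene indices by |mean(A) - mean(B)|, largest first (assumed criterion)."""
    scores = []
    for g in range(len(samples[0])):
        a = [s[g] for s, y in zip(samples, labels) if y == "A"]
        b = [s[g] for s, y in zip(samples, labels) if y == "B"]
        scores.append((abs(mean(a) - mean(b)), g))
    # Sort by descending score; break ties by gene index for determinism.
    return [g for _, g in sorted(scores, key=lambda t: (-t[0], t[1]))]

def nearest_centroid_predict(train_x, train_y, genes, sample):
    """Assign the sample to the class whose centroid is closest (squared Euclidean)."""
    best_label, best_dist = None, float("inf")
    for label in ("A", "B"):
        members = [x for x, y in zip(train_x, train_y) if y == label]
        centroid = [mean(m[g] for m in members) for g in genes]
        dist = sum((sample[g] - c) ** 2 for g, c in zip(genes, centroid))
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label

def loocv_errors(samples, labels, n_genes):
    """Count misclassifications when each tumor is left out in turn.

    Genes are re-ranked inside every fold so the held-out tumor never
    influences gene selection.
    """
    errors = 0
    for i in range(len(samples)):
        train_x = samples[:i] + samples[i + 1:]
        train_y = labels[:i] + labels[i + 1:]
        genes = rank_genes(train_x, train_y)[:n_genes]
        if nearest_centroid_predict(train_x, train_y, genes, samples[i]) != labels[i]:
            errors += 1
    return errors

def choose_gene_count(samples, labels, candidates):
    """Pick the candidate size with the fewest errors (smallest size on ties)."""
    return min(candidates, key=lambda k: (loocv_errors(samples, labels, k), k))

# Hypothetical toy data: 6 tumors x 3 genes (log expression ratios).
# Gene 0 separates the clusters; genes 1 and 2 are noise.
samples = [
    [2.0, 0.1, -0.3],
    [1.8, -0.2, 0.4],
    [2.2, 0.3, 0.0],
    [-2.0, 0.0, 0.2],
    [-1.9, 0.2, -0.1],
    [-2.1, -0.1, 0.3],
]
labels = ["A", "A", "A", "B", "B", "B"]

print(choose_gene_count(samples, labels, [1, 2, 3]))
```

Re-ranking the genes within each fold is a deliberate choice: selecting genes on all tumors first and cross-validating afterwards would let the held-out tumor leak into the classifier and understate the error.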


Figure 1. Classification of hepatocellular carcinoma (HCC) based on a genome-wide survey of gene expression. A) Hierarchical clustering of 91 HCC tumors. Genes whose expression ratio differed at least 2-fold from the reference in at least 9 tissues were selected for hierarchical analysis (4,187 gene features). B) Multidimensional scaling (MDS) plot of HCC tissues using the 4,187 genes. MDS plotting was based on a matrix of Pearson correlation coefficients from the complete pair-wise comparison of all experiments. The MDS plot displays the position of each HCC tissue in three-dimensional Euclidean space, with the distance between HCC tissues reflecting their approximate degree of correlation. Red and green balls represent HCC tissues in cluster A and cluster B, respectively. C) Kaplan-Meier plot of overall survival of HCC patients grouped on the basis of gene expression profiling.
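The gene-selection rule and the correlation matrix mentioned in the caption can be sketched as follows. This is a minimal illustration under stated assumptions, not the software used for the figure: expression values are assumed to be log2 ratios relative to the reference, "2-fold difference" is read as a change in either direction, correlation is converted to a dissimilarity as 1 - r (a common convention that the caption does not specify), and all names and data are hypothetical.

```python
import math

def filter_genes(log2_ratios, min_fold=2.0, min_samples=9):
    """Keep genes with at least `min_fold` change (up or down) in >= `min_samples` tissues.

    log2_ratios maps a gene name to its list of log2(tumor / reference) values.
    """
    cutoff = math.log2(min_fold)
    return {
        gene: values
        for gene, values in log2_ratios.items()
        if sum(1 for v in values if abs(v) >= cutoff) >= min_samples
    }

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length vectors.

    Raises ZeroDivisionError if either vector is constant.
    """
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def correlation_distance_matrix(profiles):
    """Pairwise 1 - r between tissue profiles (one expression vector per tissue)."""
    n = len(profiles)
    return [
        [1.0 - pearson(profiles[i], profiles[j]) for j in range(n)]
        for i in range(n)
    ]

# Hypothetical toy input: two genes measured in three tissues.
toy = {"gene_1": [1.5, -1.2, 0.1], "gene_2": [0.2, 0.3, -0.4]}
print(sorted(filter_genes(toy, min_samples=2)))
```

A matrix like the one returned by `correlation_distance_matrix` is the kind of input that hierarchical clustering or MDS routines typically accept; with the thresholds in the caption, the defaults `min_fold=2.0` and `min_samples=9` would apply.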

Information obtained from knowledge-based annotation of the 406 survival genes provided insight into the underlying biological differences between the two subclasses of HCC. Among the several biological groups of survival genes, the cell proliferation group was the best predictor of an unfavorable outcome of the disease. Expression of typical cell proliferation markers such as PCNA, and of cell cycle regulators such as CDK4, CCNB1, CCNA2, and CKS2, was greater in subclass A than in subclass B. Not surprisingly, many genes that are expressed more highly in subclass A are anti-apoptotic. Interestingly, higher expression of genes involved in ubiquitination and sumoylation was observed in subclass A. The ubiquitin system is often deregulated in cancers. In HCC, the degree of ubiquitination is highly correlated with cell proliferation and survival of patients and has also been proposed as a possible predictive marker for recurrence of human HCC. Also, enhanced activation of ubiquitin-dependent protein degradation may account for deregulation of cell cycle control and faster cell proliferation in the poor survival group (subclass A). This result is highly concordant with our recent study with mouse models (Lee JS et al. Comparative functional genomics to identify the best-fit mouse cancer models for studying human HCC. Nat Genet 36: 1306–11, 2004), in which we found that the ubiquitination index is much higher in mouse models that mimic poor human prognosis (subclass A).

The severity of HCC and the lack of good diagnostic markers and treatment strategies have rendered the disease a major challenge. Systematic analysis of gene expression patterns provides insight into the biology and pathogenesis of HCC. Our results indicate that HCC prognosis can be predicted from the gene expression profiles of the primary tumors. Since the microarray-based measurement of gene expression reflects the abundance of expressed mRNA and proteins in the HCC, a limited set of quantitative RT-PCR and/or immunohistochemical staining assays may be sufficient to predict the prognosis of patients at the time of diagnosis.

Ju-Seog Lee, PhD
Research Fellow
leeju@mail.nih.gov

Snorri S. Thorgeirsson, MD, PhD
Principal Investigator
Laboratory of Experimental Carcinogenesis
NCI-Bethesda, Bldg. 37/Rm 4146
Tel: 301-496-5688
Fax: 301-496-0734
snorri_s_thorgeirsson@nih.gov