Hepatocellular carcinoma (HCC) is a primary liver malignancy that mainly occurs in patients with chronic liver disease and cirrhosis. Risk factors for HCC include hepatitis B virus (HBV) infection. However, the specific role of HBV infection in HCC development is not yet completely understood. To reveal the effects of HBV on HCC, we compared the genes related to HCC in HBV-infected patients with those related to HCC in uninfected patients.
We encoded the genes recorded in databases for these two types of HCC using enrichment scores of Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway terms. A random forest classifier was used to distinguish the two types, and a series of feature selection approaches was used to select the optimal features. Novel HBV-associated and -non-associated HCC genes were then predicted with the classifier built on these optimal features. A shortest-path algorithm was also used to find all genes lying on the shortest paths that connect the known related genes.
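The shortest-path step can be illustrated with a minimal sketch. It assumes an undirected, unweighted gene interaction network, which may differ from the network actually used in this study, and all gene names and function names below are hypothetical. A gene is counted as a shortest-path gene if it lies on at least one shortest path between two known related genes, that is, if its distances to the two known genes add up to the distance between them.

```python
from collections import deque
from itertools import combinations


def build_graph(edges):
    """Build an undirected adjacency map from (gene, gene) pairs."""
    graph = {}
    for a, b in edges:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph


def bfs_distances(graph, source):
    """Return hop distances from source to every reachable gene."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbor in graph.get(node, ()):
            if neighbor not in dist:
                dist[neighbor] = dist[node] + 1
                queue.append(neighbor)
    return dist


def shortest_path_genes(graph, known_genes):
    """Collect genes on any shortest path between two known genes.

    Known genes themselves are excluded from the result.
    """
    known = [g for g in known_genes if g in graph]
    dist = {g: bfs_distances(graph, g) for g in known}
    found = set()
    for a, b in combinations(known, 2):
        if b not in dist[a]:
            continue  # the two known genes are not connected
        d_ab = dist[a][b]
        for node in graph:
            # node is on a shortest a-b path iff d(a, node) + d(node, b) == d(a, b)
            if node in dist[a] and node in dist[b]:
                if dist[a][node] + dist[b][node] == d_ab:
                    found.add(node)
    return found - set(known)


# Toy network with hypothetical gene names; K1, K2, K3 play the known genes.
edges = [("K1", "A"), ("A", "K2"), ("K1", "B"), ("B", "K2"),
         ("K2", "C"), ("C", "K3"), ("C", "D")]
graph = build_graph(edges)
candidates = shortest_path_genes(graph, ["K1", "K2", "K3"])
print(sorted(candidates))
```

In this toy network, gene D is attached to the network but lies on no shortest path between known genes, so it is not reported. For a weighted network, the breadth-first search would need to be replaced by a weighted shortest-path method such as Dijkstra's algorithm.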
A total of 54 features that differed between HBV-associated and -non-associated HCC genes were identified. In total, 1236 and 881 novel related genes were predicted for HBV-associated and -non-associated HCC, respectively. By integrating the predicted genes and the shortest-path genes in their gene interaction network, we identified 679 common genes involved in the two types of HCC.
We identified genetic features that differ significantly between the two types of HCC. We also predicted related genes for each type based on its specific features. Finally, we determined the common genes and features involved in both types of HCC.