The Second Affiliated Hospital of Nanchang University

Online prediction tool for hepatocellular carcinoma in patients with hepatitis B-related cirrhosis and low-level AFP

Hepatocellular carcinoma (HCC) is the main type of primary liver cancer; it is highly malignant and has a poor prognosis. China has the largest number of liver cancer patients in the world, most of them middle-aged men. Hepatitis B virus (HBV) infection is the main cause, and many patients are already at an intermediate or advanced stage when diagnosed [1]. Although therapeutic options for liver cancer continue to expand, outcomes for intermediate- and advanced-stage disease remain unsatisfactory, whereas early hepatectomy can achieve a 5-year survival rate of 60%~70% [2]. Early diagnosis and timely treatment are therefore the key to improving the prognosis and survival of patients with liver cancer.

Serum tumor marker testing and liver imaging are the main means of diagnosing liver cancer in clinical practice. Serum tumor marker testing is non-invasive, simple, objective and repeatable, and it is widely used for screening, diagnosis, monitoring and management of liver cancer. Alpha-fetoprotein (AFP) is the most widely used serum marker of HCC. At present, the common screening method for groups at high risk of liver cancer is serum AFP testing combined with abdominal ultrasound every six months. However, AFP has a high false-negative rate and poor sensitivity for early liver cancer and for small tumors. Combined testing of multiple serum tumor markers can improve the accuracy of HCC diagnosis and reduce the rate of missed diagnoses. Many tumor markers for liver cancer are already available, and new ones continue to be discovered and applied [4]. However, there is no unified standard for combined testing panels across regions and hospitals, and it remains unclear which combination both keeps costs acceptable for patients and improves diagnostic performance.

Parameter explanation:

  1. Gender: Gender (male, female)
  2. Hb: Hemoglobin (unit: g/L)
  3. Fbg: Fibrinogen (unit: g/L)
  4. AFP: Alpha-fetoprotein (unit: ng/mL)
  5. CEA: Carcinoembryonic antigen (unit: ng/mL)
  6. GPT: Glutamic pyruvic transaminase (unit: U/L)
  7. ALP: Alkaline phosphatase (unit: U/L)
  8. DB: Direct bilirubin (unit: μmol/L)
  9. K: Potassium (unit: μmol/L)
  10. Ca: Calcium (unit: μmol/L)
  11. Age: Age (unit: years)
  12. HBP: Hypertension (present or absent)

1、 Model introduction:

XGBoost (eXtreme Gradient Boosting) is an improved algorithm based on the GBDT (Gradient Boosting Decision Tree) algorithm. The traditional GBDT model uses only the first derivative during optimization, whereas XGBoost performs a second-order Taylor expansion of the cost function and adds a regularization term to it for better performance. As an ensemble learning algorithm, it combines the predictions of weak regression trees, which are added to the model sequentially so as to maximize predictive performance while limiting model complexity. XGBoost also borrows feature (column) subsampling from random forests, which reduces computation and makes the model less prone to overfitting. Because of these characteristics, XGBoost has drawn increasing attention for building prediction models and identifying risk in the medical field.
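
To make the role of the second derivative and the regularization term more concrete, the following is a minimal pure-Python sketch of the standard XGBoost leaf-weight and split-gain formulas for the logistic loss. It is an illustration only, not the code behind this tool: the four toy labels, the function names and the `gamma` split penalty are assumptions introduced here, and only `reg_lambda = 1` is taken from the model parameters on this page.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def grad_hess(y_true, margin):
    """First (g) and second (h) derivative of the logistic loss w.r.t. the raw margin."""
    p = sigmoid(margin)
    return p - y_true, p * (1.0 - p)

def leaf_weight(G, H, reg_lambda=1.0):
    """Optimal leaf value under the second-order approximation: w* = -G / (H + lambda)."""
    return -G / (H + reg_lambda)

def leaf_score(G, H, reg_lambda=1.0):
    """Structure score of a leaf: G^2 / (H + lambda)."""
    return G * G / (H + reg_lambda)

def split_gain(left, right, reg_lambda=1.0, gamma=0.0):
    """Loss reduction from splitting a node into `left` and `right`.

    `left` and `right` are (G, H) tuples: summed gradients and Hessians.
    `gamma` is a per-split penalty (an assumption here; not listed on this page).
    """
    GL, HL = left
    GR, HR = right
    return 0.5 * (
        leaf_score(GL, HL, reg_lambda)
        + leaf_score(GR, HR, reg_lambda)
        - leaf_score(GL + GR, HL + HR, reg_lambda)
    ) - gamma

# Hypothetical toy data: two positive and two negative cases, all starting at margin 0.
labels = [1, 1, 0, 0]
stats = [grad_hess(y, 0.0) for y in labels]

# Candidate split: the positives go left, the negatives go right.
left = (sum(g for g, _ in stats[:2]), sum(h for _, h in stats[:2]))
right = (sum(g for g, _ in stats[2:]), sum(h for _, h in stats[2:]))

gain = split_gain(left, right, reg_lambda=1.0)
w_left = leaf_weight(*left, reg_lambda=1.0)
w_right = leaf_weight(*right, reg_lambda=1.0)
```

In this toy case the left leaf receives a positive weight and the right leaf a negative one, and a larger `reg_lambda` shrinks both weights toward zero, which is how the regularization term discourages overly confident trees.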

2、 Training sample:

Sample size: the training set contained 5611 cases and was subjected to 50-fold cross-validation; the test set contained 1396 cases.
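
For orientation, the figures above correspond to 7007 cases in total, of which roughly 20% (1396 / 7007) are held out for testing. The sketch below shows what a 50-fold partition of 5611 training cases looks like in terms of fold sizes. It is a simplified illustration: the helper `kfold_indices` is hypothetical, the folds are contiguous, and no shuffling or stratification is applied, since the page does not say how the folds were actually drawn.

```python
def kfold_indices(n_samples, n_folds):
    """Yield (train_idx, val_idx) pairs for contiguous, near-equal folds."""
    base, extra = divmod(n_samples, n_folds)
    start = 0
    for k in range(n_folds):
        # The first `extra` folds take one additional sample each.
        size = base + (1 if k < extra else 0)
        val_idx = list(range(start, start + size))
        train_idx = list(range(0, start)) + list(range(start + size, n_samples))
        yield train_idx, val_idx
        start += size

n_train, n_test, n_folds = 5611, 1396, 50

folds = list(kfold_indices(n_train, n_folds))
val_sizes = [len(val_idx) for _, val_idx in folds]

# Share of all cases held out as the test set.
test_fraction = n_test / (n_train + n_test)
```

With these numbers each validation fold holds 112 or 113 cases, so every model fitted during cross-validation is trained on about 5500 cases and validated on a comparatively small fold.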

3、 Model parameters:

objective: binary:logistic

learning_rate: 0.1

max_depth: 3

min_child_weight: 2

reg_lambda: 1
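
As a rough illustration of how these settings fit together, the sketch below collects them in a dictionary and shows how a `binary:logistic` model turns summed tree outputs into a probability. The key names follow the naming used on this page, which matches XGBoost's scikit-learn style parameters. The leaf values and the `predict_probability` helper are hypothetical, and the number of boosting rounds is not stated on this page, so none is assumed.

```python
import math

# Hyperparameters copied from the list above.
params = {
    "objective": "binary:logistic",  # output is a probability for the positive class
    "learning_rate": 0.1,            # shrinks the contribution of each new tree
    "max_depth": 3,                  # at most three levels of splits per tree
    "min_child_weight": 2,           # minimum summed Hessian required in a leaf
    "reg_lambda": 1,                 # L2 penalty on leaf weights
}

def predict_probability(leaf_values, base_margin=0.0):
    """Sum the per-tree leaf values into a raw margin, then apply the sigmoid."""
    margin = base_margin + sum(leaf_values)
    return 1.0 / (1.0 + math.exp(-margin))

# Hypothetical unshrunk leaf weights that one patient lands in, one per tree.
raw_leaf_weights = [0.8, 0.5, -0.2]

# The learning rate scales each tree's contribution before it is added to the margin.
shrunk = [params["learning_rate"] * w for w in raw_leaf_weights]
prob = predict_probability(shrunk)
```

A small learning rate such as 0.1 means that each individual tree moves the predicted probability only slightly, so the final estimate depends on many shallow trees rather than on any single one.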