Machine Learning Approaches for Binary Classification to Discover Liver Diseases using Clinical Data

Published 25 Apr 2021 in stat.ML, cs.LG, and stat.AP | (2104.12055v2)

Abstract: For a medical diagnosis, health professionals use different kinds of pathological ways to make a decision for medical reports in terms of patients' medical condition. In the modern era, because of the advantage of computers and technologies, one can collect data and visualize many hidden outcomes from them. Statistical machine learning algorithms based on specific problems can assist one to make decisions. Data-driven machine learning algorithms can be used to validate existing methods and help researchers to suggest potential new decisions. In this paper, multiple imputation by chained equations was applied to deal with missing data, and Principal Component Analysis to reduce the dimensionality. To reveal significant findings, data visualizations were implemented. We presented and compared many binary classifier machine learning algorithms (Artificial Neural Network, Random Forest, Support Vector Machine) which were used to classify blood donors and non-blood donors with hepatitis, fibrosis and cirrhosis diseases. From the data published in UCI-MLR [1], all mentioned techniques were applied to find one better method to classify blood donors and non-blood donors (hepatitis, fibrosis, and cirrhosis) that can help health professionals in a laboratory to make better decisions. Our proposed ML-method showed a better accuracy score (e.g. 98.23% for SVM). Thus, it improved the quality of classification.

Summary

- The paper demonstrates that SVM achieved 98.23% accuracy, outperforming ANN and RF in liver disease detection.
- It employs MICE for robust missing data handling and PCA for effective dimensionality reduction of clinical features.
- The study underscores the potential of ML models to enhance clinical diagnostic precision and operational efficiency.

The paper "Machine Learning Approaches for Binary Classification to Discover Liver Diseases using Clinical Data" presents an empirical study of how well several machine learning (ML) algorithms perform binary classification of liver disease from clinical data. Its focus is on using computational techniques to support the diagnosis of liver-related conditions.

Methodology:

Data Acquisition and Preprocessing:
- The dataset utilized is sourced from the University of California Irvine Machine Learning Repository (UCI-MLR) and involves clinical data from blood donors and non-donors with hepatitis, fibrosis, and cirrhosis.
- The dataset comprises 615 observations and 14 attributes, including biochemical markers like Albumin (ALB), Alkaline Phosphatase (ALP), Bilirubin (BIL), and Gamma Glutamyl-Transferase (GGT).
Handling Missing Data:
- The study employs Multiple Imputation by Chained Equations (MICE) to handle missing data. MICE is implemented with predictive mean matching, which fills each missing value with an observed value taken from cases whose predicted values are similar, cycling through the variables in a chain of regression models. The approach assumes that the data are Missing At Random (MAR).
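
As a rough illustration of chained-equation imputation, the sketch below uses scikit-learn's `IterativeImputer`, which is inspired by MICE but is not the paper's implementation: it draws from a Bayesian ridge posterior rather than using predictive mean matching. The function name `multiple_impute`, the number of imputations, and the synthetic data are assumptions made for this example.

```python
import numpy as np
from sklearn.experimental import enable_iterative_imputer  # noqa: F401  (exposes IterativeImputer)
from sklearn.impute import IterativeImputer


def multiple_impute(X, n_imputations=5, max_iter=10):
    """Return a list of completed copies of X, one per imputation.

    sample_posterior=True makes each run draw imputed values at random,
    so different seeds give different completed datasets.
    """
    completed = []
    for seed in range(n_imputations):
        imputer = IterativeImputer(
            max_iter=max_iter, sample_posterior=True, random_state=seed
        )
        completed.append(imputer.fit_transform(X))
    return completed


# Synthetic stand-in for lab values (NOT the UCI data): 3 correlated columns.
rng = np.random.default_rng(0)
base = rng.normal(size=(200, 1))
X_full = np.hstack([base + 0.1 * rng.normal(size=(200, 1)) for _ in range(3)])

# Remove about 10% of the entries at random to mimic missing measurements.
X_missing = X_full.copy()
mask = rng.random(X_full.shape) < 0.10
X_missing[mask] = np.nan

imputations = multiple_impute(X_missing)
print(len(imputations), imputations[0].shape)
```

In a full multiple-imputation analysis, the downstream model would typically be fitted on each completed dataset and the results pooled, rather than relying on a single completed copy.
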
Dimensionality Reduction:
- Principal Component Analysis (PCA) is used to reduce the feature space to the directions of greatest variance, which lowers the computational cost of the later steps.
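
A minimal NumPy sketch of PCA via the singular value decomposition may make this step concrete. The 95% variance threshold, the function name `pca_reduce`, and the synthetic data are assumptions for illustration; the summary does not state how many components the paper retained.

```python
import numpy as np


def pca_reduce(X, variance_to_keep=0.95):
    """Project X onto the fewest principal components whose cumulative
    explained variance reaches variance_to_keep."""
    # Standardize each column, since lab markers are on different scales.
    Z = (X - X.mean(axis=0)) / X.std(axis=0)
    # Rows of Vt are the principal directions; S holds the singular values.
    _, S, Vt = np.linalg.svd(Z, full_matrices=False)
    explained = S**2 / np.sum(S**2)
    # Smallest k whose cumulative explained variance reaches the threshold.
    k = int(np.searchsorted(np.cumsum(explained), variance_to_keep)) + 1
    k = min(k, len(S))
    return Z @ Vt[:k].T, explained[:k]


# Synthetic data (NOT the UCI data): 10 features driven by 2 latent factors.
rng = np.random.default_rng(0)
latent = rng.normal(size=(300, 2))
loadings = rng.normal(size=(2, 10))
X = latent @ loadings + 0.05 * rng.normal(size=(300, 10))

scores, explained = pca_reduce(X)
print(scores.shape, explained.round(3))
```

The returned component scores are mutually uncorrelated, so a downstream classifier sees fewer, non-redundant inputs.
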
Classification Techniques:
- The paper evaluates the performance of different ML classifiers, specifically the Support Vector Machine (SVM), Artificial Neural Network (ANN), and Random Forests (RF), to distinguish between blood donors and non-donors suffering from liver ailments.
- Key classifiers function as follows:
  - SVM separates the classes with a maximum-margin hyperplane in a high-dimensional feature space.
  - ANN uses layers of interconnected units, loosely modeled on biological neurons, for pattern recognition.
  - RF constructs an ensemble of de-correlated decision trees to tackle classification through majority voting.
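
The comparison of the three classifier families can be sketched with scikit-learn as follows. This is an illustrative setup under stated assumptions, not the paper's configuration: the data are synthetic (generated with `make_classification`), and the kernel, hidden layer size, number of trees, and train/test split are arbitrary choices.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic two-class data standing in for donors vs. patients.
X, y = make_classification(
    n_samples=600, n_features=10, n_informative=5, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0
)

# SVM and ANN are scale-sensitive, so they get a standardization step.
models = {
    "SVM": make_pipeline(StandardScaler(), SVC(kernel="rbf")),
    "ANN": make_pipeline(
        StandardScaler(),
        MLPClassifier(hidden_layer_sizes=(16,), max_iter=2000, random_state=0),
    ),
    "RF": RandomForestClassifier(n_estimators=200, random_state=0),
}

# Fit each model on the training split and score it on the held-out split.
accuracy = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    accuracy[name] = model.score(X_test, y_test)
    print(f"{name}: test accuracy = {accuracy[name]:.3f}")
```

The ranking of the models on synthetic data says nothing about their ranking on the clinical dataset; the sketch only shows how such a comparison can be organized.
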

Results and Evaluation:

- The study reports an SVM accuracy of 98.23%, the highest among the classifiers compared. SVM also compares favorably on sensitivity and specificity, reaching complete sensitivity.
- Evaluation metrics include precision, accuracy, sensitivity, and specificity, analyzed via confusion matrices and Receiver Operating Characteristic (ROC) curves.
- Feature importance, ranked by mean decrease in accuracy, identifies ALT (Alanine Aminotransferase), AST (Aspartate Aminotransferase), ALP, and GGT as the most significant predictors.
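
The four evaluation metrics all derive from the counts in a binary confusion matrix: true positives (TP), false positives (FP), true negatives (TN), and false negatives (FN). The sketch below computes them in plain Python; the function name `binary_metrics` and the toy labels are made up for illustration and are not the paper's results.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Compute accuracy, sensitivity, specificity, and precision
    from paired true and predicted labels."""
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive:
            if truth == positive:
                tp += 1
            else:
                fp += 1
        else:
            if truth == positive:
                fn += 1
            else:
                tn += 1

    def ratio(num, den):
        # Return NaN instead of raising when a class is absent.
        return num / den if den else float("nan")

    return {
        "accuracy": ratio(tp + tn, tp + tn + fp + fn),
        "sensitivity": ratio(tp, tp + fn),  # true positive rate (recall)
        "specificity": ratio(tn, tn + fp),  # true negative rate
        "precision": ratio(tp, tp + fp),
    }


# Toy labels: 3 diseased (1) and 7 healthy (0); one healthy case is flagged.
y_true = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
metrics = binary_metrics(y_true, y_pred)
print(metrics)
```

In this toy case every diseased sample is detected, so sensitivity is 1.0, while the single false positive lowers specificity to 6/7 and precision to 0.75. Complete sensitivity therefore does not by itself imply a perfect classifier.
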

Discussion:

The empirical results show that machine learning models can support medical diagnosis by classifying clinical data into categories that identify liver diseases. While SVM was the most accurate classifier, ANN also achieved high accuracy but required more computation time. The research endorses the integration of ML into clinical settings to enhance diagnostic precision and operational efficiency.

The study also points to directions for future research, such as multinomial classification of the individual conditions, which the limited sample size for some of them makes difficult, and the use of alternative ML techniques to cross-check the findings.

Conclusion:

While the study reaffirms the potential of ML models like SVM and RF in diagnosing liver diseases effectively, it advocates using these tools as supplementary aids to human expertise rather than replacements. The results underscore the capability of data-driven methods in healthcare, emphasizing the potential time and cost savings in diagnostic processes.
