Convolutional XGBoost in Brain MRI Tumor Detection
- Convolutional XGBoost is a hybrid neural-symbolic model combining DenseNet-121 feature extraction with XGBoost classification for early brain tumor detection.
- It reduces overall model complexity and mitigates overfitting by leveraging dropout, regularization, and tree-based boosting on imbalanced MRI datasets.
- Empirical results demonstrate enhanced accuracy, sensitivity, and faster convergence compared to traditional CNN-only architectures in neuro-oncology.
Convolutional XGBoost (C-XGBoost) is a hybrid neural-symbolic architecture specifically developed for the early detection of brain tumors in MRI classification tasks. It integrates a convolutional neural network (CNN) based on DenseNet-121 for automated feature extraction with an extreme gradient boosting (XGBoost) classifier to perform final decision making. The model is designed to reduce overall complexity, mitigate overfitting, and offer enhanced robustness when handling imbalanced or noisy medical imaging datasets, such as those commonly encountered in neuro-oncological applications (Babayomi et al., 2023).
1. CNN Feature Extractor Design
The C-XGBoost pipeline uses as its backbone a DenseNet-121 CNN, with all architectural layers maintained as per the standard DenseNet-121 (cf. Huang et al., 2017). The configuration processes MRI scans represented as 224×224×3 input arrays. Initial layers (up to layer 200) remain frozen during training and include a succession of convolutional, batch normalization, and pooling layers, followed by dense blocks and transition layers:
- conv1: 7×7 convolution, 64 filters, stride 2, batch normalization, ReLU;
- 3×3 max-pooling, stride 2;
- Four dense blocks (6, 12, 24, 16 bottleneck layers respectively, growth rate 32) separated by transitions (1×1 convolution + 2×2 average pooling);
- Dropout (rate 0.8) after the DenseNet backbone;
- GlobalAveragePooling2D, resulting in a 1024-dimensional feature vector per image;
- Additional dropout (rate 0.8);
- Dense layer with 3 softmax units and weight regularization.
Batch normalization is intrinsic within each DenseNet convolutional layer. The CNN is trained using the Adam optimizer with categorical cross-entropy loss over the three tumor classes (glioma, meningioma, pituitary).
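The classification head described above (global average pooling followed by a 3-unit softmax layer) can be sketched framework-free in NumPy. The 7×7×1024 final feature-map shape is the standard DenseNet-121 output for 224×224 inputs; the feature maps and dense weights here are random stand-ins, purely for illustration, and dropout is omitted since it is the identity at inference time:

```python
import numpy as np

rng = np.random.default_rng(0)

def global_average_pool(feature_maps):
    # Collapse each (H, W) map to its mean: (H, W, C) -> (C,)
    return feature_maps.mean(axis=(0, 1))

def softmax(z):
    z = z - z.max()                # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum()

# Stand-in for DenseNet-121's final feature maps on one 224x224x3 input.
feature_maps = rng.random((7, 7, 1024))

features = global_average_pool(feature_maps)   # 1024-dim vector per image
W = rng.normal(scale=0.01, size=(1024, 3))     # illustrative dense weights
b = np.zeros(3)
probs = softmax(features @ W + b)              # glioma / meningioma / pituitary

print(probs.shape, round(float(probs.sum()), 6))
```

The same 1024-dimensional `features` vector is what the truncated network later hands to XGBoost.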
2. Feature Vector Transformation and XGBoost Interface
Upon completion of CNN training, the network is truncated at the GlobalAveragePooling2D layer, effectively discarding the terminal fully connected and softmax layers. For an image x_i, the CNN feature extractor outputs a vector f(x_i) ∈ ℝ^1024. The entire dataset is transformed into a feature matrix F ∈ ℝ^(N×1024), where N denotes the sample count. This matrix, without further dimensionality reduction, feeds directly into the XGBoost classifier, which operates on floating-point vectors with associated integer class labels (Babayomi et al., 2023).
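Stacking the per-image feature vectors into the matrix handed to XGBoost is straightforward; a minimal sketch with random stand-in features (N = 5 hypothetical images, `extract_features` standing in for the truncated CNN):

```python
import numpy as np

rng = np.random.default_rng(1)

def extract_features(image):
    # Stand-in for the truncated CNN: returns a 1024-dim feature vector.
    return rng.random(1024)

images = [rng.random((224, 224, 3)) for _ in range(5)]
F = np.stack([extract_features(img) for img in images])  # shape (N, 1024)
y = np.array([0, 1, 2, 0, 1])    # integer class labels for the N images

print(F.shape)  # (5, 1024)
```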
3. Loss Functions and Objective Formalism
CNN Loss
The CNN component is trained to minimize the categorical cross-entropy

L_CE = −Σ_c y_{i,c} log(ŷ_{i,c})

for each sample i, where y_{i,c} is the true class indicator and ŷ_{i,c} the predicted softmax output for class c.
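As a quick numeric check of the loss above (one-hot target and softmax output with illustrative values):

```python
import numpy as np

def categorical_cross_entropy(y_true, y_pred, eps=1e-12):
    # -sum_c y_c * log(p_c) for a single sample; eps guards against log(0)
    return float(-np.sum(y_true * np.log(y_pred + eps)))

y_true = np.array([0.0, 1.0, 0.0])   # one-hot true class
y_pred = np.array([0.1, 0.8, 0.1])   # softmax output
loss = categorical_cross_entropy(y_true, y_pred)
print(round(loss, 4))  # -log(0.8) ≈ 0.2231
```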
XGBoost Objective
For multiclass classification, XGBoost minimizes a regularized log-loss summed over all boosting iterations t:

Obj = Σ_i l(y_i, ŷ_i) + Σ_k Ω(f_k),   with   Ω(f) = γT + (λ/2) Σ_j w_j²,

where Ω(f) incorporates penalties for tree complexity (number of leaves T, leaf weights w_j, hyperparameters γ and λ). Tree growth and node splits are computed from the first and second derivatives of the loss (g_i, h_i), with the optimal leaf weight w_j* = −G_j/(H_j + λ) and the split gain defined analytically as in the standard XGBoost formalism.
4. Training Procedure and Hyperparameterization
The C-XGBoost training workflow is defined by the following sequential stages:
- Load and split the dataset into 90% training (with augmentation) and 10% test partitions.
- Preprocess images: extract the intensity matrix from .mat files, normalize to [0, 1], resize to 224×224, convert to BGR, standardize to zero mean/unit variance.
- Augment the training set, generating three transformed variants (rotation, flip, zoom) per original.
- Train the CNN with batch size 1 for up to 25 epochs, applying early stopping if the validation loss does not improve for five epochs.
- Extract 1024-dimensional CNN features for all samples in the training set.
- Train an XGBoost classifier with the following parameters:
| Parameter | Value |
|---|---|
| Objective | multi:softmax |
| Learning rate | 0.1 |
| Max depth | 15 |
| n_estimators | 500 |
| γ, λ, subsample | default |
- Evaluate the classifier on the held-out test set.
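Assuming the scikit-learn style xgboost API, the tabled configuration maps onto it roughly as follows; the fit/predict calls are shown but commented out, since they require the xgboost package and the extracted CNN feature matrix:

```python
# Parameters taken from the table above; unspecified ones stay at defaults.
xgb_params = {
    "objective": "multi:softmax",
    "learning_rate": 0.1,
    "max_depth": 15,
    "n_estimators": 500,
}

# Hypothetical usage (names F_train, y_train, F_test are placeholders):
# import xgboost as xgb
# clf = xgb.XGBClassifier(**xgb_params)
# clf.fit(F_train, y_train)       # F_train: (N, 1024) float CNN features
# y_pred = clf.predict(F_test)
print(sorted(xgb_params))
```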
5. Dataset Characteristics
The model was assessed on a public brain MRI repository (Figshare DOI:10.6084/m9.figshare.1512427) containing 3,064 images in Matlab (.mat) format. The dataset is approximately balanced across three classes: glioma, meningioma, and pituitary tumors. Preprocessing consists of pixel normalization, resizing to 224×224 with BGR channels, and standardization. Data augmentation ensures greater robustness and simulates imaging variability. The official split is 90% training (with augmented images included), 10% test.
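The preprocessing chain (normalize → resize → three-channel → standardize) can be sketched with NumPy alone; nearest-neighbour indexing stands in for a proper interpolation routine, and a synthetic array replaces a real .mat intensity matrix:

```python
import numpy as np

def preprocess(intensity, size=224):
    img = intensity.astype(np.float64)
    img = (img - img.min()) / (img.max() - img.min() + 1e-8)  # -> [0, 1]
    h, w = img.shape
    rows = np.arange(size) * h // size      # nearest-neighbour resize
    cols = np.arange(size) * w // size
    img = img[rows][:, cols]                # -> (224, 224)
    img = np.stack([img] * 3, axis=-1)      # grayscale -> 3 channels (BGR order)
    return (img - img.mean()) / (img.std() + 1e-8)  # zero mean / unit variance

x = preprocess(np.random.default_rng(2).random((512, 512)))
print(x.shape)  # (224, 224, 3)
```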
6. Performance Metrics and Comparative Results
Evaluation on the held-out test subset demonstrates a quantitative performance advantage for C-XGBoost over a baseline end-to-end CNN:
| Metric | CNN only | C-XGBoost |
|---|---|---|
| Accuracy | 98.80% | 99.02% |
| F1-score | 0.97 | 0.98 |
| Sensitivity | 87.4% | 91.5% |
| Specificity | 95.2% | 97.4% |
| AUC | – (not reported) | – (not reported) |
Normalized confusion matrices indicate a reduction in misclassification rate for the C-XGBoost model. Training/validation loss curves show improved generalization, with lower validation loss and higher accuracy on unseen data (Babayomi et al., 2023).
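Sensitivity and specificity of the kind reported above are derived per class from the confusion matrix; a sketch with a hypothetical 3-class matrix (rows = true class, columns = predicted class; counts are invented for illustration):

```python
import numpy as np

def per_class_metrics(cm):
    # cm[i, j]: number of samples of true class i predicted as class j
    cm = np.asarray(cm, dtype=float)
    tp = np.diag(cm)
    fn = cm.sum(axis=1) - tp
    fp = cm.sum(axis=0) - tp
    tn = cm.sum() - tp - fn - fp
    sensitivity = tp / (tp + fn)   # per-class recall
    specificity = tn / (tn + fp)
    return sensitivity, specificity

cm = [[95, 3, 2],    # glioma
      [4, 90, 6],    # meningioma
      [1, 2, 97]]    # pituitary
sens, spec = per_class_metrics(cm)
print(sens.round(3), spec.round(3))
```

Macro-averaging these per-class values gives single sensitivity/specificity figures comparable to the table.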
7. Model Complexity and Robustness Considerations
Offloading the final classification stage to XGBoost eliminates large fully connected CNN head layers, significantly reducing the number of trainable parameters. The use of XGBoost introduces robust tree-based regularization (γ, λ) and early stopping, directly addressing overfitting. XGBoost’s boosting mechanism emphasizes samples that are misclassified by previous trees, naturally improving learning with imbalanced datasets. Tree-based methods also exhibit higher resilience to missing or noisy features compared to a purely neural softmax output. Empirical results show that C-XGBoost achieves faster convergence, higher overall accuracy, and a narrower train-validation generalization gap compared to a CNN-only architecture (Babayomi et al., 2023).