
Convolutional XGBoost in Brain MRI Tumor Detection

Updated 21 January 2026
  • Convolutional XGBoost is a hybrid deep-learning/gradient-boosting model combining DenseNet-121 feature extraction with XGBoost classification for early brain tumor detection.
  • It reduces overall model complexity and mitigates overfitting by leveraging dropout, regularization, and tree-based boosting on imbalanced MRI datasets.
  • Empirical results demonstrate enhanced accuracy, sensitivity, and faster convergence compared to traditional CNN-only architectures in neuro-oncology.

Convolutional XGBoost (C-XGBoost) is a hybrid deep-learning/gradient-boosting architecture developed for the early detection of brain tumors in MRI classification tasks. It integrates a convolutional neural network (CNN) based on DenseNet-121 for automated feature extraction with an extreme gradient boosting (XGBoost) classifier for the final decision. The model is designed to reduce overall complexity, mitigate overfitting, and remain robust on imbalanced or noisy medical imaging datasets, such as those commonly encountered in neuro-oncological applications (Babayomi et al., 2023).

1. CNN Feature Extractor Design

The C-XGBoost pipeline uses a DenseNet-121 CNN as its backbone, with all layers kept as in the standard DenseNet-121 (Huang et al., 2017). The network processes MRI scans represented as 224×224×3 input arrays. The initial layers (up to layer 200) remain frozen during training; the architecture comprises a succession of convolutional, batch-normalization, and pooling layers, followed by dense blocks and transition layers:

  • conv1: 7×7 convolution, 64 filters, stride 2, batch normalization, ReLU;
  • 3×3 max-pooling, stride 2;
  • Four dense blocks (6, 12, 24, 16 bottleneck layers respectively, growth rate 32) separated by transitions (1×1 convolution + 2×2 average pooling);
  • Dropout (rate 0.8) after the DenseNet backbone;
  • GlobalAveragePooling2D, resulting in a 1024-dimensional feature vector per image;
  • Additional dropout (rate 0.8);
  • Dense layer with 3 softmax units and L2 regularization (λ = 1e-4).

Batch normalization is intrinsic to each DenseNet convolutional layer. The CNN is trained with the Adam optimizer (β₁ = 0.9, β₂ = 0.999), an initial learning rate of 1e-3, and categorical cross-entropy loss over the three tumor classes (glioma, meningioma, pituitary).
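Under the stated configuration, the extractor-plus-head can be sketched in Keras (an assumption — the authors' exact framework and freezing code are not published here; the layer stack and hyperparameters follow the description above, and `weights=None` avoids a pretrained-weight download):

```python
# Sketch of the C-XGBoost CNN component described above (illustrative only).
import tensorflow as tf
from tensorflow.keras import layers, models, regularizers

base = tf.keras.applications.DenseNet121(
    include_top=False, input_shape=(224, 224, 3), weights=None)
for layer in base.layers[:200]:       # freeze the first 200 layers
    layer.trainable = False

model = models.Sequential([
    base,                             # DenseNet-121 backbone
    layers.Dropout(0.8),
    layers.GlobalAveragePooling2D(),  # -> 1024-dim feature vector
    layers.Dropout(0.8),
    layers.Dense(3, activation="softmax",
                 kernel_regularizer=regularizers.l2(1e-4)),
])
model.compile(
    optimizer=tf.keras.optimizers.Adam(learning_rate=1e-3,
                                       beta_1=0.9, beta_2=0.999),
    loss="categorical_crossentropy",
    metrics=["accuracy"])
```

`model.summary()` would show the frozen backbone followed by the three-way softmax head described above.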

2. Feature Vector Transformation and XGBoost Interface

Upon completion of CNN training, the network is truncated at the GlobalAveragePooling2D layer, discarding the terminal fully connected and softmax layers. For an image x_k, the CNN feature extractor outputs a feature vector f_k ∈ ℝ^1024. The entire dataset is transformed into a feature matrix X ∈ ℝ^{N×1024}, where N denotes the sample count. This matrix, without further dimensionality reduction, feeds directly into the XGBoost classifier, which operates on floating-point vectors with associated integer class labels (Babayomi et al., 2023).
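The truncation step can be sketched as follows (a sketch: a tiny stand-in backbone replaces the trained DenseNet-121 so the example is self-contained and fast; only the shapes mirror the pipeline above):

```python
# Truncate a trained network at GlobalAveragePooling2D and export the
# feature matrix X for XGBoost (shapes as in the pipeline above).
import numpy as np
import tensorflow as tf
from tensorflow.keras import layers, models

full = models.Sequential([
    layers.Input(shape=(224, 224, 3)),
    layers.Conv2D(1024, 3, strides=8, activation="relu"),  # stand-in backbone
    layers.GlobalAveragePooling2D(name="gap"),             # f_k in R^1024
    layers.Dense(3, activation="softmax"),                 # head to discard
])

# Keep everything up to (and including) the pooling layer.
extractor = models.Model(inputs=full.input,
                         outputs=full.get_layer("gap").output)

images = np.random.rand(4, 224, 224, 3).astype("float32")  # stand-in MRIs
X = extractor.predict(images, verbose=0)                   # shape (N, 1024)
```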

3. Loss Functions and Objective Formalism

CNN Loss

The CNN component is trained to minimize categorical cross-entropy:

L_\text{CNN} = -\sum_{c=1}^{3} y_{i,c} \log \hat{y}_{i,c}

for each sample i, where y_{i,c} is the true class indicator and ŷ_{i,c} the predicted softmax probability.
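A quick numerical check of this loss (numpy sketch; the sample values are illustrative):

```python
# Categorical cross-entropy for one sample, as defined above.
import numpy as np

def cnn_loss(y_true, y_pred, eps=1e-12):
    """L_CNN = -sum_c y_c * log(y_hat_c); eps guards against log(0)."""
    return -np.sum(y_true * np.log(y_pred + eps))

y_true = np.array([0.0, 1.0, 0.0])   # one-hot true class
y_pred = np.array([0.1, 0.8, 0.1])   # softmax output
loss = cnn_loss(y_true, y_pred)      # = -log(0.8) ≈ 0.223
```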

XGBoost Objective

For multiclass classification, XGBoost minimizes the regularized log-loss summed over all boosting iterations tt:

L^{(t)} = \sum_{i=1}^{n} l(y_i, \hat{y}_i^{(t)}) + \sum_{k=1}^{t} \Omega(f_k)

where \Omega(f) = \gamma T + \frac{1}{2}\lambda \sum_{j=1}^{T} w_j^2 penalizes tree complexity (number of leaves T, leaf weights w_j, hyperparameters γ and λ). Tree growth and node splits are driven by the first and second derivatives of the loss, with the optimal leaf weight and split gain given analytically by the standard XGBoost formalism.
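Concretely, writing g_i and h_i for the first and second derivatives of l with respect to the previous-round prediction, and G_j = \sum_{i \in I_j} g_i, H_j = \sum_{i \in I_j} h_i over the instance set I_j of leaf j, the standard closed forms are:

w_j^* = -\frac{G_j}{H_j + \lambda}

\text{Gain} = \frac{1}{2}\left[ \frac{G_L^2}{H_L + \lambda} + \frac{G_R^2}{H_R + \lambda} - \frac{(G_L + G_R)^2}{H_L + H_R + \lambda} \right] - \gamma

where L and R index the left and right children of a candidate split; a split is kept only if its gain exceeds zero after the γ penalty.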

4. Training Procedure and Hyperparameterization

The C-XGBoost training workflow is defined by the following sequential stages:

  1. Load and split the dataset into 90% training (with augmentation) and 10% test partitions.
  2. Preprocess images: extract the intensity matrix from the .mat files, normalize to [0, 1], resize to 224×224, convert to BGR, and standardize to zero mean/unit variance.
  3. Augment the training set, generating three transformed variants (rotation, flip, zoom) per original.
  4. Train the CNN with batch size 1 for up to 25 epochs, deploying early stopping if validation loss does not improve for five epochs.
  5. Extract 1024-dimensional CNN features for all samples in the training set.
  6. Train an XGBoost classifier with the following parameters:
     Parameter         Value
     Objective         multi:softmax
     Learning rate     0.1
     Max depth         15
     n_estimators      500
     γ, λ, subsample   default
  7. Evaluate the classifier on the held-out test set.

5. Dataset Characteristics

The model was assessed on a public brain MRI repository (Figshare DOI:10.6084/m9.figshare.1512427) containing 3,064 images in Matlab (.mat) format. The dataset is approximately balanced across three classes: glioma, meningioma, and pituitary tumors. Preprocessing consists of pixel normalization, resizing to 224×224 with BGR channels, and standardization. Data augmentation ensures greater robustness and simulates imaging variability. The official split is 90% training (with augmented images included), 10% test.
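The preprocessing steps can be sketched with numpy alone (assumptions: nearest-neighbour indexing stands in for proper interpolation, a random array stands in for a real .mat slice, and replicating the grayscale plane across three channels substitutes for the BGR conversion):

```python
# Normalize -> resize to 224x224 -> 3 channels -> standardize, as described.
import numpy as np

def preprocess(slice2d, size=224):
    img = slice2d.astype(np.float32)
    img = (img - img.min()) / (img.max() - img.min() + 1e-8)  # -> [0, 1]
    rows = np.arange(size) * img.shape[0] // size             # nearest-
    cols = np.arange(size) * img.shape[1] // size             # neighbour resize
    img = img[np.ix_(rows, cols)]
    img = np.stack([img, img, img], axis=-1)                  # 3 channels
    return (img - img.mean()) / (img.std() + 1e-8)            # zero mean/unit var

x = preprocess(np.random.rand(512, 512))   # shape (224, 224, 3)
```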

6. Performance Metrics and Comparative Results

Evaluation on the held-out test subset demonstrates a quantitative performance advantage for C-XGBoost over a baseline end-to-end CNN:

Metric        CNN only       C-XGBoost
Accuracy      98.80%         99.02%
F1-score      0.97           0.98
Sensitivity   87.4%          91.5%
Specificity   95.2%          97.4%
AUC           not reported   not reported

Normalized confusion matrices indicate a reduction in misclassification rate for the C-XGBoost model. Training/validation loss curves show improved generalization, with lower validation loss and higher accuracy on unseen data (Babayomi et al., 2023).
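For reference, sensitivity and specificity of the kind tabulated above can be recovered from a multiclass confusion matrix as follows (numpy sketch; the matrix is illustrative, not the paper's):

```python
# Macro-averaged sensitivity and specificity from a KxK confusion matrix
# (rows = true class, columns = predicted class).
import numpy as np

def sens_spec(cm):
    cm = np.asarray(cm, dtype=float)
    tp = np.diag(cm)
    fn = cm.sum(axis=1) - tp
    fp = cm.sum(axis=0) - tp
    tn = cm.sum() - tp - fn - fp
    sensitivity = (tp / (tp + fn)).mean()   # macro recall
    specificity = (tn / (tn + fp)).mean()   # macro true-negative rate
    return sensitivity, specificity

cm = [[95,  3,  2],    # illustrative 3-class matrix
      [ 4, 90,  6],
      [ 1,  2, 97]]
sens, spec = sens_spec(cm)   # sens = 0.94, spec = 0.97
```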

7. Model Complexity and Robustness Considerations

Offloading the final classification stage to XGBoost eliminates large fully connected CNN head layers, significantly reducing the number of trainable parameters. XGBoost introduces robust tree-based regularization (γ, λ) and early stopping, directly addressing overfitting. Its boosting mechanism emphasizes samples misclassified by previous trees, which naturally improves learning on imbalanced datasets. Tree-based methods also exhibit higher resilience to missing or noisy features than a purely neural softmax output. Empirical results show that C-XGBoost achieves faster convergence, higher overall accuracy, and a narrower train-validation generalization gap than a CNN-only architecture (Babayomi et al., 2023).

References

  • Babayomi et al., 2023.
