Collision Cross-Entropy Explained
- Collision cross-entropy is a loss function defined as the negative log of the collision probability between two probability distributions, aligning model predictions with soft targets.
- It suppresses training influence from maximally uncertain (uniform) targets, thereby improving robustness to noise compared to Shannon's cross-entropy.
- This loss function shows strong performance in deep clustering and weakly-supervised learning, aided by an efficient EM-based pseudo-label optimization.
Collision cross-entropy is a loss function designed for scenarios where class labels are soft (i.e., represented by categorical probability distributions), offering distinct advantages over the traditional Shannon cross-entropy, especially in contexts of ambiguous or uncertain targets. It is defined as the negative logarithm of the probability that two independent samples—one drawn from the predicted distribution and one from the target distribution—collide in class assignment. This formulation leads to robust learning dynamics in supervised classification with noisy labels and in deep clustering with self-labeling frameworks, providing improved stability and performance under label uncertainty (Zhang et al., 2023).
1. Formal Definition and Mathematical Properties
Given two categorical distributions $y = (y_1, \dots, y_K)$ and $p = (p_1, \dots, p_K)$ over $K$ classes, the collision cross-entropy (of order 2) is defined as
$$H_2(y, p) \;=\; -\log \sum_{k=1}^{K} y_k\, p_k.$$
This expression computes the negative log of the total "collision probability"—the chance that a label drawn independently from $y$ and a label drawn independently from $p$ coincide. Minimizing the collision cross-entropy with respect to the model predictions $p$ is equivalent to maximizing this collision probability, driving alignment between the predicted and target distributions (Zhang et al., 2023).
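As a concrete illustration, here is a minimal NumPy sketch of this definition (the function name and the `eps` smoothing term are our own choices, not from the paper):

```python
import numpy as np

def collision_cross_entropy(y, p, eps=1e-12):
    """Collision cross-entropy H2(y, p) = -log(sum_k y_k * p_k).

    y: soft target distribution, p: predicted distribution (each sums to 1).
    eps guards against log(0) for disjoint supports.
    """
    y, p = np.asarray(y, dtype=float), np.asarray(p, dtype=float)
    return -np.log(np.dot(y, p) + eps)

# A target aligned with the prediction yields a low loss;
# a misaligned one yields a high loss.
p = np.array([0.7, 0.2, 0.1])
aligned = collision_cross_entropy(np.array([0.8, 0.1, 0.1]), p)
misaligned = collision_cross_entropy(np.array([0.05, 0.05, 0.9]), p)
assert aligned < misaligned
```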
2. Relationship to Shannon’s Cross-Entropy and One-hot Labels
For one-hot target labels (i.e., $y_c = 1$ for some class $c$ and $y_k = 0$ for all $k \neq c$), the collision cross-entropy reduces to the standard cross-entropy: $H_2(y, p) = -\log p_c$, which exactly matches Shannon's cross-entropy for such labels. Therefore, in traditional settings with hard labels, collision cross-entropy and Shannon's cross-entropy are equivalent (Zhang et al., 2023).
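This equivalence is easy to verify numerically; a small sketch with example values of our own:

```python
import numpy as np

# For a one-hot target, collision CE equals Shannon CE: both are -log p_c.
p = np.array([0.6, 0.3, 0.1])          # model prediction
y = np.array([0.0, 1.0, 0.0])          # hard label for class c = 1

collision_ce = -np.log(np.dot(y, p))   # -log(sum_k y_k p_k) = -log p_1
shannon_ce = -np.sum(y * np.log(p))    # -sum_k y_k log p_k   = -log p_1
assert np.isclose(collision_ce, shannon_ce)
```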
However, the distinction becomes significant for soft targets. When $y$ is uniform ($y_k = 1/K$ for all $k$), $H_2(y, p) = -\log \sum_k p_k / K = \log K$ is constant, and its gradient with respect to $p$ is zero. Thus, training gradients vanish for data points with maximally uncertain targets; these points do not influence parameter updates. In contrast, Shannon's cross-entropy retains dependence on $p$ for soft labels, causing the model to mimic or "copy" the uncertainty in soft targets, potentially amplifying noise or label ambiguity (Zhang et al., 2023).
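The vanishing-gradient behavior on uniform targets can be checked directly: the collision loss is the constant log K regardless of the prediction, while Shannon's cross-entropy still varies with it. A small numerical sketch:

```python
import numpy as np

K = 4
y_uniform = np.full(K, 1.0 / K)

# Collision CE is constant (log K) for a uniform target, whatever p is,
# so such points contribute zero gradient.
for p in [np.array([0.97, 0.01, 0.01, 0.01]), np.full(K, 0.25)]:
    assert np.isclose(-np.log(np.dot(y_uniform, p)), np.log(K))

# Shannon CE still varies with p, pulling predictions toward uniform.
shannon = lambda y, p: -np.sum(y * np.log(p))
confident = shannon(y_uniform, np.array([0.97, 0.01, 0.01, 0.01]))
flat = shannon(y_uniform, np.full(K, 0.25))
assert confident > flat
```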
3. Symmetry and Behavior with Soft Labels
A distinctive property of collision cross-entropy is symmetry in its arguments: $H_2(y, p) = H_2(p, y)$. This symmetry is not shared by Shannon's cross-entropy in general ($H(y, p) \neq H(p, y)$), and has practical importance in unsupervised and self-labeling clustering setups, where both pseudo-labels and predictions are variable distributions (Zhang et al., 2023).
For soft categorical targets, the collision cross-entropy is selective: it disregards points where maximum uncertainty is present (i.e., uniform targets), effectively suppressing the contribution of ambiguous data to the training signal. This behavior is advantageous in settings where label uncertainty reflects lack of information rather than inherent ambiguity in the data (Zhang et al., 2023).
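The symmetry claim can be verified numerically; a sketch with arbitrary example distributions of our own:

```python
import numpy as np

y = np.array([0.5, 0.3, 0.2])
p = np.array([0.1, 0.6, 0.3])

h2 = lambda a, b: -np.log(np.dot(a, b))       # collision cross-entropy
shannon = lambda a, b: -np.sum(a * np.log(b))  # Shannon cross-entropy

assert np.isclose(h2(y, p), h2(p, y))          # symmetric
assert not np.isclose(shannon(y, p), shannon(p, y))  # asymmetric
```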
4. Application in Deep Clustering and EM Optimization
Collision cross-entropy has particular utility in deep clustering, especially where pseudo-labels (latent targets) are estimated jointly with model parameters. A typical self-labeling loss uses
$$L \;=\; \sum_{i} H_2(y_i, p_i) \;+\; \lambda\, KL(\bar{y}\,\|\,u),$$
where $y_i$ are pseudo-labels, $p_i$ are model predictions, $u$ is the uniform prior, $\bar{y}$ is the mean pseudo-label vector, and the $KL$ term enforces balance across clusters.
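A sketch of how such a self-labeling objective might be computed over a minibatch (function and variable names are our own, and the weight `lam` is illustrative, not the paper's setting):

```python
import numpy as np

def self_labeling_loss(Y, P, lam=1.0, eps=1e-12):
    """Sketch: sum_i H2(y_i, p_i) + lam * KL(mean(Y) || uniform).

    Y: (N, K) pseudo-labels, P: (N, K) model predictions; rows sum to 1.
    """
    N, K = Y.shape
    collision = -np.log(np.sum(Y * P, axis=1) + eps).sum()  # sum_i H2(y_i, p_i)
    y_bar = Y.mean(axis=0)                                  # mean pseudo-label
    u = np.full(K, 1.0 / K)                                 # uniform prior
    fairness = np.sum(y_bar * np.log((y_bar + eps) / u))    # KL(y_bar || u)
    return collision + lam * fairness
```

With perfectly confident, balanced pseudo-labels matching the predictions, both terms are near zero; concentrating all pseudo-labels on one cluster raises the fairness penalty.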
An efficient EM (Expectation-Maximization) algorithm is derived for pseudo-label estimation with collision cross-entropy. The E-step introduces hidden cluster supports and applies Jensen's inequality for a tractable surrogate, while the M-step optimizes the pseudo-label assignments per sample via root-finding in a single variable. This approach achieves significant computational efficiency: each E- and M-step operates in $O(NK)$ time per minibatch (where $N$ is the batch size and $K$ the number of classes), and empirical convergence is reported within 10–20 EM iterations, which is substantially faster (by factors of 10–100) than projected-gradient descent on the simplex (Zhang et al., 2023).
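For intuition only, the sketch below runs EM on the collision term for a single sample with the fairness term dropped; in that simplified case the M-step has a closed form (no root-finding is needed), and the iterates converge to a one-hot label at the prediction's argmax. This illustrates the E/M structure, not the paper's full coupled algorithm:

```python
import numpy as np

def em_pseudo_label(p, y0, iters=20):
    """Illustrative EM minimizing H2(y, p) over y alone.

    E-step: hidden cluster supports alpha_k proportional to y_k * p_k
    (Jensen's inequality gives the surrogate bound).
    M-step: without the fairness term, the optimal update is y <- alpha.
    """
    y = y0.copy()
    for _ in range(iters):
        alpha = y * p          # E-step (unnormalized supports)
        y = alpha / alpha.sum()  # M-step (closed form on the simplex)
    return y

p = np.array([0.5, 0.3, 0.2])
y = em_pseudo_label(p, np.full(3, 1 / 3.0))
# Without the balancing term, the optimum is one-hot at argmax p.
assert y.argmax() == p.argmax()
```

In the paper's full objective, the fairness term couples samples, which is why the M-step there requires single-variable root-finding rather than this closed form.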
5. Empirical Performance and Robustness
Collision cross-entropy demonstrates enhanced robustness to label noise and soft targets across multiple experimental domains:
- Supervised learning with label corruption: On the Natural Scene dataset, the accuracy of collision cross-entropy remains above 80% up to a 30% synthetic label corruption rate, while Shannon’s cross-entropy degrades rapidly beyond that level (Zhang et al., 2023).
- Clustering with fixed features: Using ResNet-50 features, collision cross-entropy outperforms k-means and other losses on benchmarks such as STL10 (92.3% vs. 85.2%), CIFAR10 (73.5% vs. 67.8%), CIFAR100-20 (43.7% vs. 43.0%), and MNIST (58.4% vs. 47.6%) (Zhang et al., 2023).
- End-to-end deep clustering: With a VGG-4 backbone, collision cross-entropy achieves 95.11% accuracy on MNIST, compared to IIC (82.5%) and MIADM (78.9%). On STL10, performance with self-augmentation is 25.98% (±1.1%), with substantial improvements over reference losses (Zhang et al., 2023).
- Transfer with pretext-trained features: Improvements are observed for both CIFAR10 (83.27% vs. 81.8%) and STL10 (78.12% vs. 75.5%), with corresponding ARI/NMI gains (Zhang et al., 2023).
- Weakly-supervised classification: The method yields gains in low-label regimes; for instance, on STL10, accuracy increases from 26.1% (seed labels only) to 27.2% using collision cross-entropy (Zhang et al., 2023).
In every evaluated case, collision cross-entropy matches or outperforms Shannon CE and other information-theoretic objectives (IIC, IMSAT, MIADM), exhibiting higher stability under label uncertainty (Zhang et al., 2023).
6. Comparison with Other Information-Theoretic Losses
Collision cross-entropy distinguishes itself from existing information-theoretic losses in several dimensions. Unlike Shannon's cross-entropy, it does not reinforce uncertainty if the target itself is uncertain, making it an attractive loss under weak supervision or noisy label conditions. In deep clustering, where both pseudo-labels and predictions may be ambiguous, its symmetry and selective disregard of maximally uncertain data are advantageous (Zhang et al., 2023). Empirical comparisons show superior or comparable performance to IIC, IMSAT, and MIADM losses across both classification and clustering benchmarks.
7. Significance and Practical Considerations
The design of collision cross-entropy addresses specific pathologies encountered when training with soft labels, especially the tendency of Shannon’s cross-entropy to propagate uncertainty from noisy or ambiguous supervision into the model. By coupling zero-gradient behavior on fully uncertain targets with symmetric properties suitable for self-labeling, collision cross-entropy offers a robust, computationally efficient, and theoretically grounded alternative for deep learning tasks involving uncertain or estimated label distributions (Zhang et al., 2023). The associated EM procedure further enhances practicality by accelerating pseudo-label optimization in clustering settings. A plausible implication is improved model generalization and resilience to annotation noise in weakly labeled or self-supervised machine learning pipelines.