TFFM: Topological Feature Fusion
- TFFM is a neural module that fuses standard feature representations with topological descriptors using dual-branch designs and persistent homology.
- It employs mechanisms like squeeze-and-excitation and soft attention to seamlessly integrate local numerical and global structural features.
- Empirical validations show TFFM enhances accuracy in image classification, improves retinal segmentation metrics, and counters over-smoothing in graph neural networks.
The Topological Feature Fusion Module (TFFM) is a class of neural network modules designed to bridge the gap between standard feature representations and topological or structural information. By design, TFFMs integrate local or numerical features with abstract or global connectivity descriptors, using topological data analysis, graph reasoning, or hierarchical feature fusion. TFFMs have been instantiated across diverse domains, including image classification, graph neural network (GNN) topology optimization, and topology-aware image segmentation, with each instantiation coupling local and global information flows in a domain-specific way.
1. Multimodal and Topology-Preserving Motivation
Conventional convolutional neural networks (CNNs) operate on local pixel patterns and multi-scale receptive fields, excelling at feature abstraction from visual data. However, they lack sensitivity to global or topological structures such as connectivity, loops, or object persistence, which are critical in domains like high-dimensional scientific data or biomedical images. In contrast, topological data analysis (TDA) captures multi-scale connectivity via persistent homology, offering robustness to noise and invariance to diffeomorphisms, at the cost of discarding fine-scale texture details (Han et al., 2024). Analogously, standard GNNs can suffer from over-smoothing, losing distinctive node features when stacking multiple layers, thus motivating wiring schemes that selectively preserve hierarchy and diversity in learned representations (Wei et al., 2021).
TFFMs address these deficiencies through dual- or multi-branch architectures. Each branch targets a complementary feature modality—one extracting local/numerical descriptors, another encoding structural or topological attributes. Fusion mechanisms, typically involving channel concatenation and attention-based reweighting, combine these heterogeneous representations, enabling downstream classifiers or decoders to exploit both numerically fine and topologically robust patterns.
2. Architectures and Mathematical Formulation
2.1 Image + TDA Fusion Modules
In image classification, TFFMs are realized by a bifurcated architecture (Han et al., 2024):
- CNN branch: Performs standard convolutional abstraction, generating a feature tensor $F_{\mathrm{CNN}}$.
- TDA branch: Computes the persistent homology of the input, projecting it through a persistence image (PI) pipeline. Given an image $I$, the birth–death pairs $(b_i, d_i)$ for each feature in homology dimension $k$ are mapped to birth–persistence coordinates $(b_i, d_i - b_i)$. Each point is convolved with a 2D Gaussian to obtain a continuous PI. Separate PIs for each homology dimension are stacked as channels and processed by a small CNN to yield $F_{\mathrm{TDA}}$.
- Fusion and Attention: The outputs are concatenated along the channel dimension, $F_{\mathrm{fused}} = [F_{\mathrm{CNN}} \,\|\, F_{\mathrm{TDA}}]$, and passed through a Squeeze-and-Excitation (SE) block to compute adaptive channel weights $w = \sigma(W_2\,\delta(W_1 z))$, where $z$ is the globally pooled channel descriptor, $\delta$ is a ReLU, and $\sigma$ a sigmoid; the reweighted tensor is $w \odot F_{\mathrm{fused}}$.
The result is pooled or flattened for final classification.
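The rasterization step of the TDA branch can be sketched in a few lines. This is a minimal persistence-image implementation; the Gaussian bandwidth, grid extent, and persistence-based weighting below are common choices for PIs, not parameters reported by the source:

```python
import numpy as np

def persistence_image(pairs, resolution=16, sigma=0.1, extent=(0.0, 1.0)):
    """Rasterize birth-death pairs into a persistence image (PI).

    Each (birth, death) pair is mapped to birth-persistence coordinates
    (b, d - b) and smeared with an isotropic 2D Gaussian onto a fixed grid,
    weighted by persistence so short-lived (noisy) features count less.
    """
    lo, hi = extent
    grid = np.linspace(lo, hi, resolution)
    bx, py = np.meshgrid(grid, grid)  # birth axis, persistence axis
    pi = np.zeros((resolution, resolution))
    for b, d in pairs:
        p = d - b  # persistence of this topological feature
        pi += p * np.exp(-((bx - b) ** 2 + (py - p) ** 2) / (2 * sigma ** 2))
    return pi

# Toy diagram: one long-lived feature and one short-lived (noise-like) one.
pairs = [(0.1, 0.8), (0.4, 0.45)]
pi = persistence_image(pairs)
```

In the full module, one such PI per homology dimension would be stacked channel-wise and fed to the small CNN producing $F_{\mathrm{TDA}}$.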
2.2 Selection–Fusion–Aggregation in GNNs
In GNN topology search frameworks, TFFM manifests as the SFA (“Selection, Fusion, Aggregation”) block (Wei et al., 2021):
- Selection: For layer $l$, candidate predecessor features $\{h^{(0)}, \dots, h^{(l-1)}\}$ are projected via selector functions (typically parameterized soft “ZERO” or “IDENTITY” switches).
- Fusion: The set of selected inputs is merged using a learnable function chosen from a pool of operations such as sum, mean, max, concatenation, and attention (the last being soft-attention fusion over inputs); this yields the fused representation $h^{(l)}_{\mathrm{fuse}}$.
- Aggregation: Finally, a standard GNN kernel (e.g., GCN, GAT, SAGE, GIN) is applied to $h^{(l)}_{\mathrm{fuse}}$ and the adjacency matrix $A$, producing the output $h^{(l)}$ for the layer.
Differentiable neural architecture search is employed to learn architecture logits for selection, fusion, and aggregation, resulting in dataset-specific, depth-adaptive wiring.
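A minimal sketch of one SFA step, with illustrative shapes; a sigmoid gate stands in for the soft ZERO/IDENTITY switch and a plain GCN kernel for the aggregation, whereas the paper learns these choices via architecture search:

```python
import numpy as np

def softmax(x):
    x = np.asarray(x, dtype=float)
    e = np.exp(x - x.max())
    return e / e.sum()

def sfa_block(prev_feats, sel_logits, attn_logits, W_agg, A_hat):
    """One Selection-Fusion-Aggregation step (illustrative, not the
    paper's exact parameterization).

    prev_feats : list of (N, d) feature matrices from earlier layers
    sel_logits : per-input logits for a soft ZERO/IDENTITY switch
    attn_logits: logits for soft-attention fusion over the selected inputs
    W_agg      : (d, d) weight of a GCN-style aggregation
    A_hat      : (N, N) normalized adjacency
    """
    # Selection: sigmoid gate interpolates each input between ZERO and IDENTITY.
    gates = 1.0 / (1.0 + np.exp(-np.asarray(sel_logits, dtype=float)))
    selected = [g * h for g, h in zip(gates, prev_feats)]
    # Fusion: soft-attention weighted sum over the selected inputs.
    alpha = softmax(attn_logits)
    fused = sum(a * h for a, h in zip(alpha, selected))
    # Aggregation: a GCN kernel applied to the fused features.
    return np.maximum(A_hat @ fused @ W_agg, 0.0)

rng = np.random.default_rng(0)
N, d = 6, 4
prev = [rng.normal(size=(N, d)) for _ in range(3)]
A_hat = np.eye(N)  # trivial normalized adjacency, for the demo only
W = rng.normal(size=(d, d))
out = sfa_block(prev, sel_logits=[2.0, -2.0, 0.0],
                attn_logits=[0.5, 0.1, 0.4], W_agg=W, A_hat=A_hat)
```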
2.3 Latent Graph Reasoning for Topology Preservation
In topology-aware medical image segmentation, TFFMs are implemented by:
- Latent graph construction: Decoder features are mapped to a latent graph via scale-adaptive pooling, yielding node features $X \in \mathbb{R}^{N \times C}$.
- Adjacency construction: Top-$k$ nearest neighbors by cosine similarity, i.e., $A_{ij} = 1$ if node $j$ is among the $k$ most cosine-similar neighbors of node $i$.
- Message passing: Masked, multi-head Graph Attention Networks (GATs) update the node features to $X'$.
- Graph-to-grid fusion: The GAT-refined features are returned to the feature map, concatenated with upsampled auxiliary features, and convolved before channel/spatial attention refinement and gated residual fusion (Ahmed et al., 27 Jan 2026).
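The top-k cosine-similarity adjacency underlying the latent-graph step can be sketched as follows (node count, feature width, and k are illustrative, not values from the source):

```python
import numpy as np

def topk_cosine_adjacency(X, k=3):
    """Connect each node to its k most cosine-similar neighbors
    (self excluded), producing a binary (N, N) adjacency matrix."""
    Xn = X / np.linalg.norm(X, axis=1, keepdims=True)
    S = Xn @ Xn.T                          # pairwise cosine similarities
    np.fill_diagonal(S, -np.inf)           # exclude self-loops from top-k
    N = X.shape[0]
    A = np.zeros((N, N))
    idx = np.argsort(-S, axis=1)[:, :k]    # k most similar nodes per row
    rows = np.repeat(np.arange(N), k)
    A[rows, idx.ravel()] = 1.0
    return A

X = np.random.default_rng(0).normal(size=(8, 4))   # toy node features
A = topk_cosine_adjacency(X, k=3)
```

The resulting mask restricts the GAT attention to each node's k most similar neighbors, which is what makes the subsequent message passing "masked".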
3. Fusion and Reweighting Mechanisms
Attention-based fusion is central to TFFM across domains:
- Channel-wise Squeeze-and-Excitation: Following concatenation, channel-statistics are globally pooled and reweighted through two fully connected layers and a sigmoid, with learned weights applied to each fused feature channel (Han et al., 2024).
- Learnable gating in segmentation: Gated residual connections interpolate between original and topology-refined features, with gate parameters learned via a sigmoid-activated fusion network (Ahmed et al., 27 Jan 2026).
- Softmax-parameterized op selection: In GNN TFFM, architecture search assigns temperature-controlled soft fusion/selection weights, discretized post-search (Wei et al., 2021).
This fusion ensures that the network adaptively emphasizes informative modalities per dataset, depth, or task.
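As an illustration of the first mechanism, here is a minimal squeeze-and-excitation fusion over two concatenated feature maps; the spatial size, channel width, and reduction ratio are assumptions for the demo, not values from the source:

```python
import numpy as np

def se_fuse(f_a, f_b, W1, W2):
    """Concatenate two (H, W, C) feature maps channel-wise, then reweight
    channels with a squeeze-and-excitation block: global average pool ->
    FC + ReLU -> FC + sigmoid -> per-channel scaling."""
    f = np.concatenate([f_a, f_b], axis=-1)    # (H, W, 2C)
    z = f.mean(axis=(0, 1))                    # squeeze: (2C,) channel stats
    hidden = np.maximum(W1 @ z, 0.0)           # excitation FC 1 + ReLU
    w = 1.0 / (1.0 + np.exp(-(W2 @ hidden)))   # excitation FC 2 + sigmoid
    return f * w                               # broadcast channel weights

rng = np.random.default_rng(1)
C, r = 4, 2                                    # channels per branch, reduction
f_cnn = rng.normal(size=(5, 5, C))
f_tda = rng.normal(size=(5, 5, C))
W1 = rng.normal(size=(2 * C // r, 2 * C))
W2 = rng.normal(size=(2 * C, 2 * C // r))
out = se_fuse(f_cnn, f_tda, W1, W2)
```

Because the sigmoid keeps every weight in (0, 1), the block rescales rather than zeroes channels, letting gradients still flow to the less-favored branch.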
4. Training Objectives and Optimization
TFFM-equipped networks utilize standard loss formulations for their respective tasks:
- Image classification: Cross-entropy loss is applied to the classifier output after fusion. No explicit topological regularizer is used; the topology guidance enters through the fused representation only (Han et al., 2024).
- Topology-aware segmentation: A hybrid loss combining Tversky loss (with parameters $\alpha, \beta$ modulating false negatives/positives) and soft clDice loss (topology alignment via soft skeletonization) is employed, as a weighted sum of the form $\mathcal{L} = \lambda_1 \mathcal{L}_{\mathrm{Tversky}} + \lambda_2 \mathcal{L}_{\mathrm{clDice}}$.
- GNN topology search: The architecture search is supervised by task-specific loss (e.g., node classification cross-entropy), with architecture and weights optimized jointly via Adam and temperature-annealed softmax (Wei et al., 2021).
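The Tversky component of the segmentation loss has a standard soft form, sketched below; the alpha/beta defaults are illustrative, and the soft clDice term is omitted here because it requires differentiable soft skeletonization:

```python
import numpy as np

def tversky_loss(pred, target, alpha=0.7, beta=0.3, eps=1e-7):
    """Soft Tversky loss on probability maps: alpha weights false
    negatives, beta false positives; alpha = beta = 0.5 recovers the
    soft Dice loss."""
    tp = (pred * target).sum()
    fn = ((1 - pred) * target).sum()
    fp = (pred * (1 - target)).sum()
    return 1.0 - (tp + eps) / (tp + alpha * fn + beta * fp + eps)

y_true = np.array([0.0, 1.0, 1.0, 0.0])
y_soft = np.array([0.1, 0.9, 0.8, 0.2])
loss = tversky_loss(y_soft, y_true)
```

Setting alpha above beta, as in the defaults here, penalizes missed vessel pixels more heavily than spurious ones, which is the usual choice for thin-structure segmentation.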
5. Empirical Validation and Performance
5.1 Image Classification and Clustering
On datasets including Intel Image, CelebA Gender, and Chinese Calligraphy Styles, TFFM-equipped CNNs yield consistent performance gains:
- VGG16 accuracy rises from 79.95% to 97.45% (+17.5%) on Calligraphy.
- DenseNet121 accuracy improves from 93.86% to 94.71% (+0.85%) on Gender; on Intel Image: +0.98%.
- GoogLeNet accuracy increases on Intel Image from 88.69% to 91.23% (+2.54%).
Ablation demonstrates that persistence image fusion plus SE attention (full TFFM) secures maximal gains. t-SNE feature visualizations confirm improved separability via tighter intra-class clusters and larger inter-class margins (Han et al., 2024).
5.2 Retinal Vessel Segmentation
On Fundus-AVSeg, integrating TFFM into U-Net++ leads to:
- Dice: 90.97%
- HD95: 3.50 px
- clDice: 85.55%
- Number of predicted topological components reduced by ~38%.
Ablation reveals that TFFM alone drastically curtails topological error (Betti0–Err down ~39.5%), while adding soft clDice makes centerlines more coherent. Zero-shot tests on multiple retinal datasets (DRIVE, CHASEDB1, HRF, RETA, STARE) confirm robust topology preservation (Ahmed et al., 27 Jan 2026).
5.3 GNN Topology and Graph Learning
On eight graph datasets (homophily and heterophily), TFFM (as SFA-block in F²GNN) achieves top-3 mean accuracy ranks, e.g.:
- Cora dataset: F²GAT achieves 88.31% ± 0.12% accuracy,
- Outperforming human-designed and prior NAS baselines by 20–30% average error reduction on homophily graphs.
TFFM adaptively fuses early-layer and deep features, countering over-smoothing and broadening the spectrum of feasible topology-aware architectures (Wei et al., 2021).
6. Domain-Specific Adaptations and Integrations
TFFMs are designed to dovetail seamlessly with established backbones:
- CNNs: TFFMs are inserted post-final convolutional block, prior to pooling/classification layers, ensuring matching spatial dimensions for channel-wise fusion (Han et al., 2024).
- U-Net++/EfficientNet: TFFMs are embedded at each decoder stage, immediately following skip-concatenation and upsampling, fusing graph-refined features before subsequent convolutions (Ahmed et al., 27 Jan 2026).
- GNNs: Each layer's standard aggregation is replaced by a TFFM SFA-block, which fully controls the “wiring” and fusion among prior layers (Wei et al., 2021).
This modularity allows TFFMs to be retrofitted into pre-existing topologies as plug-and-play units.
7. Impact on Representation Learning and Open Challenges
TFFMs represent a principled mechanism for uniting local, global, and topological reasoning in neural architectures, achieving improvements in both accuracy and topological preservation over standard approaches. Empirically, they consistently produce more structurally coherent segmentations and more discriminative feature embeddings, supported by quantitative gains in accuracy, cluster separation, and Betti-0 reductions.
A plausible implication is that further advances might involve automating the selection and weighting of topological and local features at higher orders, or extending TFFM principles to multimodal and self-supervised contexts. Cross-domain performance, adaptivity to diverse graph/feature distributions, and theoretical understanding of fusion dynamics remain open research directions.
Key References:
- "Research on fusing topological data analysis with convolutional neural network" (Han et al., 2024)
- "Designing the Topology of Graph Neural Networks: A Novel Feature Fusion Perspective" (Wei et al., 2021)
- "TFFM: Topology-Aware Feature Fusion Module via Latent Graph Reasoning for Retinal Vessel Segmentation" (Ahmed et al., 27 Jan 2026)