Local Continuity Module (LCM)

Updated 2 December 2025
  • LCM is a module that integrates domain-specific local correspondence and compact modeling to capture fine-grained spatial and inter-image relationships.
  • In co-salient object detection, LCM uses multi-stage pairwise correlation and 3D convolutions to fuse local and global features, significantly improving accuracy.
  • For point cloud masked modeling, LCM leverages a locally constrained encoder and a Mamba-based decoder to reduce computational cost while boosting reconstruction fidelity.

The Local Continuity Module (LCM) designates two distinct, high-impact architectural strategies for modeling fine-grained local relationships: (1) Local Correspondence Modeling in co-salient object detection, and (2) Locally Constrained Compact Models for efficient masked point modeling. Both lines of work replace or augment standard attention frameworks with domain-specific locality-aware components to encode spatial or inter-image affinities, achieving improvements in both accuracy and computational efficiency. The principal designs are exemplified by the LCM in GLNet for co-salient object detection (Cong et al., 2022), and the Locally Constrained Compact Model for point-cloud masked modeling (Zha et al., 2024).

1. LCM in Co-Salient Object Detection: Architecture and Operations

In the context of co-salient object detection (CoSOD), the Local Correspondence Modeling (LCM) module is a core component of the global-and-local collaborative learning architecture (GLNet), engineered to explicitly capture local inter-image correspondence for robust co-saliency prediction (Cong et al., 2022).

The LCM operates on a feature map $F^n_{ia} \in \mathbb{R}^{C \times H \times W}$ for each image $n$ in a group of $N$ images, typically with $C=512$ for VGG16-based backbones. For each image $k$, LCM computes pairwise local correspondences with all other images $j \neq k$ via a multi-stage Pairwise Correlation Transformation (PCT):

  • Subspace Mapping: A $1 \times 1$ convolution projects each $F^n_{ia}$ to $U^n \in \mathbb{R}^{C \times H \times W}$.
  • Affinity Estimation: $U^k$ and $U^j$ are each reshaped to $\mathbb{R}^{C \times HW}$; their affinity matrix $A^{k,j} = (U^k)^\top U^j \in \mathbb{R}^{HW \times HW}$ is computed as a transposed matrix product and measures pixel-wise similarity.
  • Score Pooling and Normalization: For each pair $(k, j)$, the maximum responses of $A^{k,j}$ are pooled into a score vector $s^{k,j}$, and softmax normalization yields a weighting map $w^{k,j} \in \mathbb{R}^{H \times W}$.
  • Feature Fusion with Attention: The weighting map is broadcast over channels and fused with the input feature through a residual attention connection: $\tilde{F}^{k,j} = F^k_{ia} \odot w^{k,j} + F^k_{ia}$.
  • Inter-image Aggregation: The $N-1$ local maps for each $k$ are stacked and passed through stacked 3D convolutions, yielding the local inter-image descriptor $F^k_{local}$.
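
The PCT steps above can be sketched in NumPy. This is a minimal illustration under stated assumptions, not the GLNet implementation: the function names, the `proj` matrix standing in for the 1×1 convolution weights, the choice to pool the maximum over the pixels of image j, and the residual form of the fusion are all assumptions. The 3D convolutions and attention blocks are omitted.

```python
import numpy as np

def softmax(x):
    # Numerically stable softmax over a 1-D score vector.
    shifted = x - x.max()
    e = np.exp(shifted)
    return e / e.sum()

def correlation_weight(feat_k, feat_j, proj):
    """feat_k, feat_j: (C, H, W). proj: (C, C), a stand-in for 1x1 conv weights."""
    c, h, w = feat_k.shape
    # Subspace mapping: a 1x1 convolution is a per-pixel linear map over channels.
    u_k = proj @ feat_k.reshape(c, h * w)   # (C, HW)
    u_j = proj @ feat_j.reshape(c, h * w)   # (C, HW)
    # Affinity estimation: pixel-to-pixel similarity between the two images.
    affinity = u_k.T @ u_j                  # (HW, HW)
    # Score pooling (assumed axis): best match in image j for each pixel of image k.
    score = affinity.max(axis=1)            # (HW,)
    return softmax(score).reshape(h, w)

def pairwise_correlation(feat_k, feat_j, proj):
    # Residual attention fusion (assumed form): weighted feature plus the input.
    weight = correlation_weight(feat_k, feat_j, proj)
    return feat_k * weight[None, :, :] + feat_k

def stack_pairwise_maps(feats, k, proj):
    """feats: (N, C, H, W). Returns the N-1 fused maps for image k: (N-1, C, H, W)."""
    return np.stack([pairwise_correlation(feats[k], feats[j], proj)
                     for j in range(len(feats)) if j != k])
```

The stacked array has the layout a 3D convolution would consume, with the pair index acting as the depth axis.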

Internal attention mechanisms (SE-based channel attention and CBAM-style spatial attention) refine both fusion and local context. The main components are two 3D convolutions, a $1 \times 1$ convolution, and the attention modules, totaling roughly 10–11M parameters per image $k$.
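
SE-style channel attention follows a squeeze, excite, rescale pattern. The sketch below shows that generic pattern in NumPy; the weight shapes and the reduction ratio are hypothetical, and the exact configuration used in GLNet is not given here.

```python
import numpy as np

def se_channel_attention(feat, w_reduce, w_expand):
    """feat: (C, H, W). w_reduce: (C // r, C). w_expand: (C, C // r).

    r is a hypothetical reduction ratio chosen by the caller.
    """
    # Squeeze: global average pooling gives one descriptor per channel.
    squeezed = feat.mean(axis=(1, 2))                    # (C,)
    # Excite: bottleneck MLP with ReLU, then a sigmoid gate in (0, 1).
    hidden = np.maximum(w_reduce @ squeezed, 0.0)        # (C // r,)
    gate = 1.0 / (1.0 + np.exp(-(w_expand @ hidden)))    # (C,)
    # Rescale: each channel is multiplied by its gate.
    return feat * gate[:, None, None]
```

Because the gate lies strictly between 0 and 1, the block can only attenuate channels relative to one another; it never changes the spatial layout of the feature map.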

2. LCM Contribution to Global-and-Local Feature Fusion

The LCM's output provides fine-grained pairwise local descriptors, which are fused with the global group-level features from the Global Correspondence Modeling (GCM) module. Fusion occurs via the Global-and-Local Correspondence Aggregation (GLA):

  • The local and global features are combined and fused using a 3D convolution, ReLU, and subsequent channel and spatial attention operations.
  • This yields the final inter-image feature that is passed to downstream co-saliency prediction.

Ablation studies show that removing the LCM degrades performance: on the Cosal2015 dataset, the higher-is-better score drops and the lower-is-better error measure increases, indicating a substantial loss of co-saliency discrimination, especially in groups with strong intra-class variance (Cong et al., 2022).

3. LCM in Point Cloud Modeling: Locally Constrained Compact Model Design

Separately, the Locally Constrained Compact Model (LCM) for masked point modeling (Zha et al., 2024) establishes a locality-driven alternative to quadratic-complexity Transformer frameworks, targeting redundancy reduction and linear scaling.

The architecture consists of two principal modules:

  • Locally Constrained Compact Encoder (LCCE): Replaces global self-attention with local aggregation layers. Each patch token finds its $K$ nearest neighbors via geometric KNN on patch centers and aggregates local structure using concatenation and local MLPs, followed by channel-wise max-pooling. The static neighbor graph is shared across all encoder layers, enforcing locality and continuity.
  • Locally Constrained Mamba-Based Decoder (LCMD): Integrates a linear-time State-Space Model (SSM, as in Mamba) with a local-constrained feed-forward network (LCFFN). The decoder preserves mutual information for masked patch reconstruction by ensuring only geometric neighbors communicate, achieving robustness to patch ordering and high reconstruction fidelity.
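
The encoder's local aggregation can be sketched as follows. This is an illustrative NumPy version under stated assumptions, not the authors' code: the function names, the two-layer ReLU MLP, the choice to concatenate each center token with its neighbor token, and the brute-force neighbor search are all hypothetical.

```python
import numpy as np

def knn_graph(centers, k):
    """centers: (P, 3) patch centers. Returns (P, k) neighbor indices.

    Brute-force search; each patch is its own first neighbor (distance 0).
    """
    diff = centers[:, None, :] - centers[None, :, :]
    dist2 = (diff ** 2).sum(axis=-1)                 # (P, P)
    return np.argsort(dist2, axis=1)[:, :k]

def local_aggregation(tokens, neighbors, w1, w2):
    """tokens: (P, D). neighbors: (P, k). w1: (2D, Hd). w2: (Hd, D)."""
    neigh = tokens[neighbors]                                   # (P, k, D)
    center = np.broadcast_to(tokens[:, None, :], neigh.shape)   # (P, k, D)
    # Concatenate center and neighbor features (assumed pairing).
    pair = np.concatenate([center, neigh], axis=-1)             # (P, k, 2D)
    hidden = np.maximum(pair @ w1, 0.0)                         # local MLP with ReLU
    out = hidden @ w2                                           # (P, k, D)
    # Channel-wise max-pooling over the k neighbors.
    return out.max(axis=1)                                      # (P, D)
```

The graph is computed once from the patch centers and can be reused by every layer, which is what makes it static. Each output token depends only on its k neighbors, so a layer costs O(P·k) token interactions rather than the O(P²) of global self-attention.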

This design compresses a Point-MAE backbone to 2.7M parameters (from 22.1M) and 1.3G FLOPs (from 4.8G), while increasing ScanObjectNN OBJ-BG accuracy from 92.67% to 94.51% and also improving ScanNetV2 AP (Zha et al., 2024).

4. Mathematical Operations in LCM Modules

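  • Affinity matrix between images $k$ and $j$, with $U^k, U^j \in \mathbb{R}^{C \times HW}$ the reshaped subspace features:

$$A^{k,j} = (U^k)^\top U^j \in \mathbb{R}^{HW \times HW}$$

  • Global score and weighting:

$$s^{k,j}_p = \max_q A^{k,j}_{p,q}, \qquad w^{k,j} = \mathrm{softmax}(s^{k,j})$$

  • Fusion:

$$\tilde{F}^{k,j} = F^k_{ia} \odot w^{k,j} + F^k_{ia}$$

  • 3D Convolutional Aggregation:

$$F^k_{local} = \mathrm{Conv3D}\big(\mathrm{Conv3D}\big(\mathrm{stack}_{j \neq k}\, \tilde{F}^{k,j}\big)\big)$$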

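  • Local Aggregation Layer:

For each token $x_i$ with geometric neighbor set $\mathcal{N}_K(i)$:

$$h_{ij} = \mathrm{MLP}([x_i \,;\, x_j]), \quad j \in \mathcal{N}_K(i)$$

$$x_i' = \max_{j \in \mathcal{N}_K(i)} h_{ij}$$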

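  • Mutual Information Guarantee (Decoder): The mutual information about the masked patches $X_m$ preserved by the Mamba SSM decoder output $Z_{\mathrm{SSM}}$ is at least that of a Transformer decoder output $Z_{\mathrm{Tr}}$:

$$I(X_m; Z_{\mathrm{SSM}}) \ge I(X_m; Z_{\mathrm{Tr}})$$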

due to the data processing inequality and the linear nature of the SSM.
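
The "linear nature" referred to above can be illustrated with a plain discrete linear state-space recurrence. This is a generic, non-selective SSM rather than Mamba itself, which makes its parameters input-dependent; the matrices `A`, `B`, `C`, the zero initial state, and all shapes are hypothetical.

```python
import numpy as np

def linear_ssm_scan(x, A, B, C):
    """x: (T, D_in). A: (S, S). B: (S, D_in). C: (D_out, S). Returns (T, D_out).

    Implements h_t = A h_{t-1} + B x_t and y_t = C h_t with h_0 = 0.
    """
    h = np.zeros(A.shape[0])
    outputs = []
    for x_t in x:            # one pass over the sequence: cost linear in T
        h = A @ h + B @ x_t  # state update
        outputs.append(C @ h)
    return np.stack(outputs)
```

With fixed matrices and a zero initial state, the output is a linear function of the input sequence, and the single pass over the tokens gives a cost that grows linearly with sequence length instead of quadratically.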

5. Computational Characteristics and Ablation Insights

Both forms of LCM confine feature interactions to local neighborhoods; the quantified savings below come from the point-cloud model.

  • Parameter Efficiency: In point cloud MPM, LCM reduces parameter count by about 88% (from 22.1M to 2.7M) and FLOPs by about 73% (from 4.8G to 1.3G).
  • Accuracy Gains: LCM-Point-MAE outperforms Transformer-based Point-MAE by 1.84 percentage points on OBJ-BG (92.67% to 94.51%), with gains also reported on OBJ-ONLY and PB-T50-RS.
  • Key Hyperparameters: Local aggregation with the chosen neighbor count $K$ provides the best accuracy/resource tradeoff; geometric KNN matches or slightly outperforms dynamic/feature-space KNN.
  • Ablations: In the point-cloud domain, including both local aggregation and FFN in the encoder yields the best PB-T50-RS accuracy, but local aggregation is the dominant driver, accounting for most of the improvement.
  • Decoder Variants: The Mamba+LCFFN configuration yields the highest accuracy on ScanObjectNN PB-T50-RS.

| Model/Setting    | Params             | FLOPs | Acc./Metric   |
| Transformer (PM) | 22.1M              | 4.8G  | OBJ-BG 92.67% |
| LCM (PM)         | 2.7M               | 1.3G  | OBJ-BG 94.51% |
| GLNet w/ LCM     | 10–11M per image k | -     | -             |
| GLNet w/o LCM    | <10M               | -     | -             |
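
The relative savings implied by the two (PM) rows above can be checked with a few lines of arithmetic. The helper name is hypothetical; the inputs are the table values.

```python
def relative_reduction(before, after):
    """Percentage by which `after` is smaller than `before`."""
    return 100.0 * (before - after) / before

param_cut = relative_reduction(22.1, 2.7)   # parameters, in millions
flop_cut = relative_reduction(4.8, 1.3)     # FLOPs, in G
acc_gain = 94.51 - 92.67                    # OBJ-BG accuracy, percentage points

print(f"params: -{param_cut:.1f}%  FLOPs: -{flop_cut:.1f}%  OBJ-BG: +{acc_gain:.2f}")
# prints: params: -87.8%  FLOPs: -72.9%  OBJ-BG: +1.84
```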

6. Theoretical Significance and Limitations

LCM-based designs replace non-local self-attention with constrained, neighborhood-preserving aggregation, underpinned by the principle that most relevant contextual information in highly structured domains (e.g., 3D space, local object saliency) is localized. In point cloud modeling, information-theoretic analysis shows that the locally constrained Mamba decoder retains at least as much mutual information about masked regions as a Transformer, while relying on linear operations.

A plausible implication is that for structured data with clear geometric or semantic neighborhoods, LCM-like modules can deliver superior efficiency–accuracy profiles compared to transformer-based paradigms, provided domain knowledge about locality is available. However, for modeling long-range dependencies or highly non-local relationships, purely local architectures may require auxiliary modules or hybrid fusion.

7. Impact and Current Use

LCMs improve model performance and make inference or pretraining feasible on larger, more realistic inputs without the quadratic cost of classical attention. In image co-saliency detection, LCM enables fine-grained correspondence learning between images, overcoming limitations of global feature pooling. In point cloud masked modeling, the Locally Constrained Compact Model supports scalable pretraining and robust transfer across 3D tasks, with empirical evidence showing parameter reductions of up to about 88% with no loss, and sometimes an improvement, in downstream accuracy (Cong et al., 2022; Zha et al., 2024). In both cases, architectural modularity allows integration with global contextual modeling, supporting a hierarchy of correspondence cues.
