CoLa: Chinese Character Decomposition with Compositional Latent Components

Published 4 Jun 2025 in cs.CV | (2506.03798v1)

Abstract: Humans can decompose Chinese characters into compositional components and recombine them to recognize unseen characters. This reflects two cognitive principles: Compositionality, the idea that complex concepts are built on simpler parts; and Learning-to-learn, the ability to learn strategies for decomposing and recombining components to form new concepts. These principles provide inductive biases that support efficient generalization. They are critical to Chinese character recognition (CCR) in solving the zero-shot problem, which results from the common long-tail distribution of Chinese character datasets. Existing methods have made substantial progress in modeling compositionality via predefined radical or stroke decomposition. However, they often ignore the learning-to-learn capability, limiting their ability to generalize beyond human-defined schemes. Inspired by these principles, we propose a deep latent variable model that learns Compositional Latent components of Chinese characters (CoLa) without relying on human-defined decomposition schemes. Recognition and matching can be performed by comparing compositional latent components in the latent space, enabling zero-shot character recognition. The experiments illustrate that CoLa outperforms previous methods in both character the radical zero-shot CCR. Visualization indicates that the learned components can reflect the structure of characters in an interpretable way. Moreover, despite being trained on historical documents, CoLa can analyze components of oracle bone characters, highlighting its cross-dataset generalization ability.

Abstract PDF Upgrade to Chat

Summary

The paper introduces a deep latent variable model for zero-shot Chinese character recognition by automatically decomposing characters into compositional latent components.
It employs an ELBO framework that optimizes reconstruction and prediction, achieving a 47% accuracy improvement on HWDB zero-shot tests.
The method demonstrates robust cross-dataset generalization, making it applicable in historical script analysis and future character recognition tasks.

CoLa: Chinese Character Decomposition with Compositional Latent Components

Introduction

The paper "CoLa: Chinese Character Decomposition with Compositional Latent Components" (2506.03798) introduces a novel approach to Chinese character recognition (CCR) leveraging the cognitive principles of compositionality and learning-to-learn. Traditional methods often rely on predefined radical or stroke decomposition, which can limit generalization to novel characters. Instead, CoLa uses deep latent variable models to learn compositional latent components directly from data, bypassing human-defined decomposition schemes.

Figure 1: Different types of Chinese character recognition methods explored, highlighting CoLa's approach of automatic decomposition.

Methodology

CoLa employs a deep latent variable model to decompose characters into compositional latent components within a latent space. This model supports zero-shot recognition by comparing these latent components from input characters with those from template images. Unlike previous methods that are constrained by predefined decomposition rules, CoLa automatically learns to decompose characters by reconstructing visual features derived from a pretrained teacher encoder, aligning components in an interpretable manner.

Figure 2: The generative process of CoLa, showing extraction, decoding, and prediction mechanisms.

The model's probabilistic framework involves an evidence lower bound (ELBO) that combines reconstruction, prediction, and regularization terms to optimize performance. Effective recognition is achieved by minimizing the reconstruction error and maximizing the likelihood of correct class prediction. CoLa's design inherently supports permutations in component order, allowing flexibility in how decomposition is approached.

Experimental Results

CoLa's performance was evaluated on multiple datasets, including handwritten (HWDB) and printed Chinese characters, as well as historical documents. Significant improvements were observed in zero-shot settings, with the model achieving superior accuracy over existing radical and stroke-based methods. For instance, CoLa improved recognition accuracy by approximately 47% in character zero-shot settings on HWDB.

Figure 3: Visualization of compositional latent components, comparing these to human-defined radical decompositions.

Furthermore, CoLa has demonstrated robust cross-dataset generalization capabilities, analyzing characters from oracle bone scripts without retraining on these significantly different scripts. This highlights its potential applicability in historical script analysis, demonstrating a profound capacity for component recognition across diverse datasets.

Implications and Future Work

CoLa represents a significant advancement in the field of CCR by circumventing the limitations of human-defined decomposition schemes. The ability to automatically discover and utilize compositional latent components opens new avenues for applications across domain recognition tasks. However, future work should address CoLa's performance in complex scene images where background noise and resolution variability could affect decomposition accuracy.

Conclusion

"CoLa: Chinese Character Decomposition with Compositional Latent Components" (2506.03798) provides a compelling approach to zero-shot Chinese character recognition by leveraging a deep latent variable framework to learn decompositional structures. This method not only surpasses traditional approaches in performance but also extends the possibility of cross-dataset applications, particularly in historical contexts. These findings pave the way for further innovation in character recognition and similar fields requiring nuanced component-based interpretation.

Markdown Report Issue