KeyNMF: Transformer-Enhanced Topic Modelling

Updated 18 February 2026
  • KeyNMF is a topic modelling framework that integrates transformer-based contextual embeddings with non-negative matrix factorization for both static and dynamic analysis.
  • It constructs a non-negative keyword–document matrix using cosine similarity and applies multiplicative update rules to optimize factorization performance.
  • Demonstrated on Chinese diaspora media, KeyNMF achieves a strong balance between topic diversity and external coherence, enabling clear analysis of information dynamics.

KeyNMF is a topic modelling framework that integrates transformer-based contextual embeddings with stable non-negative matrix factorization (NMF), designed for both static and dynamic modelling of topical information in large text corpora, particularly in the context of Chinese diaspora media. The approach optimizes topical coherence and diversity while enabling the quantitative analysis of information dynamics over time (Kristensen-McLachlan et al., 2024).

1. Mathematical Formulation of Static KeyNMF

KeyNMF operates on a corpus of $D$ documents with a vocabulary of $V$ candidate keywords. Each document $d$ is embedded as a vector $x_d \in \mathbb{R}^E$ and each candidate keyword $w$ as $v_w \in \mathbb{R}^E$ via a pre-trained transformer encoder. A non-negative keyword–document matrix $M \in \mathbb{R}_+^{D \times V}$ is constructed such that

$$M_{dw} = \begin{cases} \max(\cos(x_d, v_w), 0) & \text{if } w \in K_d \\ 0 & \text{otherwise} \end{cases}$$

where $K_d$ is the set of the top $N$ words most similar to $x_d$ by cosine similarity.
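The construction above can be sketched in a few lines of NumPy. Random vectors stand in for the transformer embeddings, and `build_keyword_matrix` is an illustrative name, not the paper's implementation:

```python
import numpy as np

def build_keyword_matrix(doc_emb, word_emb, top_n):
    """Construct the non-negative keyword-document matrix M.

    doc_emb:  (D, E) array of document embeddings
    word_emb: (V, E) array of candidate-keyword embeddings
    top_n:    number of nearest keywords kept per document (N)
    """
    # Cosine similarity between every document and every keyword
    d_norm = doc_emb / np.linalg.norm(doc_emb, axis=1, keepdims=True)
    w_norm = word_emb / np.linalg.norm(word_emb, axis=1, keepdims=True)
    sim = d_norm @ w_norm.T                       # (D, V)

    M = np.zeros_like(sim)
    for d in range(sim.shape[0]):
        top = np.argsort(sim[d])[-top_n:]         # indices of the N most similar words
        M[d, top] = np.maximum(sim[d, top], 0.0)  # clip negatives: M must be non-negative
    return M

# Toy example: random embeddings standing in for transformer output
rng = np.random.default_rng(0)
M = build_keyword_matrix(rng.normal(size=(4, 8)), rng.normal(size=(20, 8)), top_n=5)
```

Each row of `M` has at most $N$ non-zero entries, so the matrix stays sparse even for large vocabularies.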

The model seeks a low-rank non-negative factorization $M \approx W H$, with $W \in \mathbb{R}_+^{D \times K}$ and $H \in \mathbb{R}_+^{K \times V}$, by minimizing

$$L(W, H) = \|M - W H\|^2_F + \lambda_W \|W\|^2_F + \lambda_H \|H\|^2_F$$

where the regularization terms are optional ($\lambda_W = \lambda_H = 0$ in practice).

2. Algorithmic Procedure for Model Fitting

Parameter optimization is conducted via block-coordinate descent using multiplicative update rules, following principles similar to Cichocki & Phan. The algorithm iteratively alternates updates of HH and WW as follows:

Input: keyword matrix M, topics K, max_iters, tol
Initialize W ← random_+(D×K), H ← random_+(K×V)
for iter in 1…max_iters:
  H ← H ⊙ (Wᵀ M) / (Wᵀ W H + λ_H H)
  W ← W ⊙ (M Hᵀ) / (W H Hᵀ + λ_W W)
  Compute objective L_new
  if |L_old − L_new| / L_old < tol: break
  L_old ← L_new
return W, H
Element-wise multiplication and division are denoted by “⊙” and “/”. Convergence is determined by the relative decrease in $L$ falling below a threshold (e.g., $10^{-4}$) or by reaching the maximum number of iterations (e.g., 200 or 300).
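A minimal NumPy sketch of these multiplicative updates (the function name `keynmf_fit` and the small `eps` guard against division by zero are illustrative additions, not part of the published implementation):

```python
import numpy as np

def keynmf_fit(M, K, max_iters=300, tol=1e-4, lam_w=0.0, lam_h=0.0, seed=0):
    """Block-coordinate descent with multiplicative updates for M ≈ W H."""
    rng = np.random.default_rng(seed)
    D, V = M.shape
    W = rng.random((D, K))
    H = rng.random((K, V))
    eps = 1e-12                # guards the denominators against division by zero
    L_old = np.inf
    for _ in range(max_iters):
        H *= (W.T @ M) / (W.T @ W @ H + lam_h * H + eps)
        W *= (M @ H.T) / (W @ H @ H.T + lam_w * W + eps)
        L_new = np.linalg.norm(M - W @ H) ** 2
        if abs(L_old - L_new) / L_old < tol:
            break
        L_old = L_new
    return W, H

# Factorize a small random non-negative matrix into 5 topics
rng = np.random.default_rng(1)
M = rng.random((30, 50))
W, H = keynmf_fit(M, K=5)
```

Because the updates only multiply by non-negative factors, `W` and `H` remain non-negative throughout, which is what licenses the topic-weight interpretation used later.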

3. Dynamic Extension and Temporal Information Dynamics

For modelling temporal progression, the corpus is divided into $T$ time slices. Given submatrices $M_t$ and $W_t$ for each time slice $t$, the method first learns a global $W, H$ across all data, then fixes $W_t$ (the rows of $W$ belonging to the documents in slice $t$) and solves for a slice-specific $H_t$ by minimizing $\|M_t - W_t H_t\|_F^2$.

Topic activation over time is quantified by

$$I_{tj} = \sum_{d \in t} W_{dj}, \qquad \hat{P}_{tj} = \frac{I_{tj}}{\sum_{j'=1}^{K} I_{tj'}}$$

The L1-normalized $\{\hat{P}_{tj}\}$ form pseudo-probability distributions for entropy-based novelty and resonance analysis, enabling the detection and interpretation of real-world event signals in media streams.
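The aggregation and normalization step can be sketched as follows (`topic_activations` and the toy slice assignment are illustrative):

```python
import numpy as np

def topic_activations(W, slice_ids):
    """Aggregate document-topic weights into per-slice activations I_tj and
    L1-normalised pseudo-probabilities P-hat_tj, as defined above.

    W:         (D, K) document-topic matrix
    slice_ids: length-D array assigning each document to a time slice
    """
    slices = np.unique(slice_ids)
    I = np.vstack([W[slice_ids == t].sum(axis=0) for t in slices])  # (T, K)
    P = I / I.sum(axis=1, keepdims=True)                            # each row sums to 1
    return I, P

rng = np.random.default_rng(2)
W = rng.random((12, 4))              # 12 documents, 4 topics
slice_ids = np.repeat([0, 1, 2], 4)  # 3 time slices of 4 documents each
I, P = topic_activations(W, slice_ids)
```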

4. Experimental Workflow and Hyperparameters

The experimental pipeline includes:

  • Data collection: Scraping five Chinese-language diaspora news sites every six hours between late April and mid-June 2024.
  • Preprocessing: Article body extraction, tokenization using jieba, stopword removal.
  • Embedding: Both documents and candidate words are embedded with paraphrase-multilingual-MiniLM-L12-v2 (sequence truncation: 128 tokens).
  • Modelling parameters:
    • Number of nearest keywords per document: $N = 15$.
    • Topics per site: $K \in \{10, 25, 50\}$ (fitted individually).
    • Window size for novelty/resonance: 12 time-points (≈3 days).
    • Smoothing span: 56.
    • NMF solver: max 300 iterations, tolerance $1 \times 10^{-4}$.
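Assuming the standard entropy-based definitions of windowed novelty and resonance (novelty as the mean KL divergence from the preceding window, resonance as novelty minus transience; the paper's exact formulation may differ in detail), the 12-time-point window above can be applied like so:

```python
import numpy as np

def kl(p, q, eps=1e-12):
    """KL divergence D(p || q) between two discrete distributions."""
    p, q = p + eps, q + eps
    return float(np.sum(p * np.log(p / q)))

def novelty_resonance(P, w):
    """Windowed novelty and resonance for topic distributions P of shape (T, K).

    Novelty at t: mean divergence from the w previous distributions.
    Transience at t: mean divergence from the w following distributions.
    Resonance = novelty - transience. Edges without a full window are NaN.
    """
    T = P.shape[0]
    nov = np.full(T, np.nan)
    res = np.full(T, np.nan)
    for t in range(w, T - w):
        nov[t] = np.mean([kl(P[t], P[t - j]) for j in range(1, w + 1)])
        tra = np.mean([kl(P[t], P[t + j]) for j in range(1, w + 1)])
        res[t] = nov[t] - tra
    return nov, res

rng = np.random.default_rng(3)
P = rng.dirichlet(np.ones(10), size=40)  # 40 time points, 10 topics
nov, res = novelty_resonance(P, w=12)
```

A 12-time-point window at 6-hour scraping intervals corresponds to the ≈3-day horizon listed above.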

5. Comparative Evaluation and Performance Metrics

KeyNMF was benchmarked against S³ (Kardos et al. 2024), Top2Vec, BERTopic, two Contextualized Topic Models (CTM), classical NMF, and LDA using the following metrics:

  • Diversity ($d$): Proportion of unique words across the top descriptors of all topics.
  • Internal coherence ($C_{\mathrm{in}}$): Average pairwise cosine similarity between topic words in embedding space.
  • External coherence ($C_{\mathrm{ex}}$): Consistency measured against paraphrase-multilingual MiniLM embeddings.
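A toy illustration of the diversity metric, assuming it is computed as the proportion of unique words among the top descriptors of all topics (the word lists here are invented for illustration):

```python
def topic_diversity(topics):
    """Diversity d: fraction of unique words across all topics' top descriptors."""
    all_words = [w for topic in topics for w in topic]
    return len(set(all_words)) / len(all_words)

# "eu" appears in two topics, so 8 of 9 word slots are unique
d = topic_diversity([["election", "eu", "vote"],
                     ["xi", "visit", "paris"],
                     ["eu", "putin", "russia"]])
```

A value of 1.0 means no word is shared between topics; heavily overlapping topics push $d$ toward $1/K$.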

On all five Chinese news corpora analyzed, KeyNMF outperformed classical NMF and LDA, and was competitive with state-of-the-art contextual models. An example from the Chinanews corpus summarizes typical results:

Model      d      C_in   C_ex
KeyNMF     0.93   0.29   0.63
Top2Vec    0.78   0.14   0.71
BERTopic   0.91   0.16   0.47
NMF        0.74   0.27   0.57
LDA        0.61   0.19   0.57

KeyNMF achieves the highest balance between diversity and coherence, with particularly strong external coherence, indicating robust alignment between the learned keyword-term matrices and transformer embedding space.

6. Empirical Insights: 2024 European Parliament Election Case Study

Dynamic KeyNMF with $K = 10$ revealed interpretable information flow and persistence around major political events. The analysis of news corpora showed that:

  • Xi Jinping’s European tour (May 5–10) induced spikes in novelty and resonance, with surge topics such as “Paris / state visit” and “President / Xi Jinping.”
  • Putin’s state visit to China (May 16–17) produced pronounced novelty and resonance peaks, tied to “China News Service” and “Russia / Ukraine / Putin.”
  • EU parliamentary elections (June 6–9) showed increased novelty and resonance before and after the election, with dominant topics varying by site (e.g., “EU Parliament,” “Spanish PM,” “UK elections,” “Europe overview”).

These findings demonstrate that the joint use of KeyNMF and novelty/resonance analysis detects significant information flows and relates them to concrete topics and real-world events.

7. Limitations and Prospects for Extension

Several limitations of KeyNMF were noted:

  • Contextual embeddings are truncated to 128 tokens; this may omit subtleties in longer documents.
  • The dynamic extension maintains a fixed global WW; future work could allow both WW and HH to evolve smoothly (e.g., with temporal regularization).
  • Absence of an explicit probabilistic interpretation; a possible direction is to develop a Bayesian NMF variant.
  • Topic interpretability and causal modeling would benefit from richer metadata (e.g., author, location) and more extensive qualitative analysis.
  • Further research is needed to link detected topical dynamics to persuasive framing and influence operations.

In summary, KeyNMF constitutes a robust and extensible framework for transformer-aware, non-negative topic modelling and dynamic information flow analysis, as demonstrated in the large-scale study of Chinese diaspora media during sensitive political periods (Kristensen-McLachlan et al., 2024).
