KeyNMF: Transformer-Enhanced Topic Modelling

Updated 18 February 2026
  • KeyNMF is a topic modelling framework that integrates transformer-based contextual embeddings with non-negative matrix factorization for both static and dynamic analysis.
  • It constructs a non-negative keyword–document matrix using cosine similarity and applies multiplicative update rules to optimize factorization performance.
  • Demonstrated on Chinese diaspora media, KeyNMF achieves a strong balance between topic diversity and external coherence, enabling clear analysis of information dynamics.

KeyNMF is a topic modelling framework that integrates transformer-based contextual embeddings with stable non-negative matrix factorization (NMF), designed for both static and dynamic modelling of topical information in large text corpora, particularly in the context of Chinese diaspora media. The approach optimizes topical coherence and diversity while enabling the quantitative analysis of information dynamics over time (Kristensen-McLachlan et al., 2024).

1. Mathematical Formulation of Static KeyNMF

KeyNMF operates on a corpus of $D$ documents with a vocabulary of $V$ candidate keywords. Each document $d$ is embedded as a vector $x_d \in \mathbb{R}^E$ and each candidate keyword $w$ as $v_w \in \mathbb{R}^E$ via a pre-trained transformer encoder. A non-negative keyword–document matrix $M \in \mathbb{R}_+^{D \times V}$ is constructed such that

$$M_{dw} = \begin{cases} \max(\cos(x_d, v_w), 0) & \text{if } w \in K_d \\ 0 & \text{otherwise} \end{cases}$$

where $K_d$ is the set of the top $N$ words most similar to $x_d$ by cosine similarity.
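The construction above can be sketched in a few lines of NumPy. Random vectors stand in for the transformer embeddings, and `build_keyword_matrix` is an illustrative name, not the paper's implementation:

```python
import numpy as np

def build_keyword_matrix(doc_emb, word_emb, top_n):
    """Construct the non-negative keyword-document matrix M.

    doc_emb:  (D, E) array of document embeddings
    word_emb: (V, E) array of candidate-keyword embeddings
    top_n:    number of nearest keywords kept per document (N)
    """
    # Cosine similarity between every document and every keyword
    d_norm = doc_emb / np.linalg.norm(doc_emb, axis=1, keepdims=True)
    w_norm = word_emb / np.linalg.norm(word_emb, axis=1, keepdims=True)
    sim = d_norm @ w_norm.T                       # (D, V)

    M = np.zeros_like(sim)
    for d in range(sim.shape[0]):
        top = np.argsort(sim[d])[-top_n:]         # indices of the N most similar words
        M[d, top] = np.maximum(sim[d, top], 0.0)  # clip negatives: M must be non-negative
    return M

# Toy example: random embeddings standing in for transformer output
rng = np.random.default_rng(0)
M = build_keyword_matrix(rng.normal(size=(4, 8)), rng.normal(size=(20, 8)), top_n=5)
```

Each row of `M` has at most $N$ non-zero entries, so the matrix stays sparse even for large vocabularies.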

The model seeks a low-rank non-negative factorization $M \approx W H$, with $W \in \mathbb{R}_+^{D \times K}$ and $H \in \mathbb{R}_+^{K \times V}$, by minimizing

$$L(W, H) = \|M - W H\|^2_F + \lambda_W \|W\|^2_F + \lambda_H \|H\|^2_F$$

where the regularization terms are optional ($\lambda_W = \lambda_H = 0$ in practice).

2. Algorithmic Procedure for Model Fitting

Parameter optimization is conducted via block-coordinate descent using multiplicative update rules, following principles similar to Cichocki & Phan. The algorithm iteratively alternates updates of HH and WW as follows:

Input: keyword matrix M, topics K, max_iters, tol
Initialize W ← random_+(D×K), H ← random_+(K×V)
for iter in 1…max_iters:
  H ← H ⊙ (Wᵀ M) / (Wᵀ W H + λ_H H)
  W ← W ⊙ (M Hᵀ) / (W H Hᵀ + λ_W W)
  Compute objective L_new
  if |L_old − L_new| / L_old < tol: break
  L_old ← L_new
return W, H
Element-wise multiplication and division are denoted by “⊙” and “/”. Convergence is determined by the relative decrease in $L$ falling below a threshold (e.g., $10^{-4}$) or by reaching the maximum number of iterations (e.g., 200 or 300).
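A minimal NumPy sketch of these multiplicative updates (the function name `keynmf_fit` and the small `eps` guard against division by zero are illustrative additions, not part of the published implementation):

```python
import numpy as np

def keynmf_fit(M, K, max_iters=300, tol=1e-4, lam_w=0.0, lam_h=0.0, seed=0):
    """Block-coordinate descent with multiplicative updates for M ≈ W H."""
    rng = np.random.default_rng(seed)
    D, V = M.shape
    W = rng.random((D, K))
    H = rng.random((K, V))
    eps = 1e-12                # guards the denominators against division by zero
    L_old = np.inf
    for _ in range(max_iters):
        H *= (W.T @ M) / (W.T @ W @ H + lam_h * H + eps)
        W *= (M @ H.T) / (W @ H @ H.T + lam_w * W + eps)
        L_new = np.linalg.norm(M - W @ H) ** 2
        if abs(L_old - L_new) / L_old < tol:
            break
        L_old = L_new
    return W, H

# Factorize a small random non-negative matrix into 5 topics
rng = np.random.default_rng(1)
M = rng.random((30, 50))
W, H = keynmf_fit(M, K=5)
```

Because the updates only multiply by non-negative factors, `W` and `H` remain non-negative throughout, which is what licenses the topic-weight interpretation used later.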

3. Dynamic Extension and Temporal Information Dynamics

For modelling temporal progression, the corpus is divided into $T$ time slices. Given submatrices $M_t$ and $W_t$ for each time slice $t$, the method first learns a global $W, H$ across all data, then fixes $W_t$ (the rows of $W$ belonging to the documents in slice $t$) and solves for a slice-specific $H_t$ by minimizing $\|M_t - W_t H_t\|_F^2$.

Topic activation over time is quantified by

$$I_{tj} = \sum_{d \in t} W_{dj}, \qquad \hat{P}_{tj} = \frac{I_{tj}}{\sum_{j'=1}^{K} I_{tj'}}$$

The L1-normalized $\{\hat{P}_{tj}\}$ form pseudo-probability distributions for entropy-based novelty and resonance analysis, enabling the detection and interpretation of real-world event signals in media streams.
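The aggregation and normalization step can be sketched as follows (`topic_activations` and the toy slice assignment are illustrative):

```python
import numpy as np

def topic_activations(W, slice_ids):
    """Aggregate document-topic weights into per-slice activations I_tj and
    L1-normalised pseudo-probabilities P-hat_tj, as defined above.

    W:         (D, K) document-topic matrix
    slice_ids: length-D array assigning each document to a time slice
    """
    slices = np.unique(slice_ids)
    I = np.vstack([W[slice_ids == t].sum(axis=0) for t in slices])  # (T, K)
    P = I / I.sum(axis=1, keepdims=True)                            # each row sums to 1
    return I, P

rng = np.random.default_rng(2)
W = rng.random((12, 4))              # 12 documents, 4 topics
slice_ids = np.repeat([0, 1, 2], 4)  # 3 time slices of 4 documents each
I, P = topic_activations(W, slice_ids)
```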

4. Experimental Workflow and Hyperparameters

The experimental pipeline includes:

  • Data collection: Scraping five Chinese-language diaspora news sites every six hours between late April and mid-June 2024.
  • Preprocessing: Article body extraction, tokenization using jieba, stopword removal.
  • Embedding: Both documents and candidate words are embedded with paraphrase-multilingual-MiniLM-L12-v2 (sequence truncation: 128 tokens).
  • Modelling parameters:
    • Number of nearest keywords per document: $N = 15$.
    • Topics per site: $K \in \{10, 25, 50\}$ (fitted individually).
    • Window size for novelty/resonance: 12 time-points (≈3 days).
    • Smoothing span: 56.
    • NMF solver: max 300 iterations, tolerance $1 \times 10^{-4}$.
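Assuming the standard entropy-based definitions of windowed novelty and resonance (novelty as the mean KL divergence from the preceding window, resonance as novelty minus transience; the paper's exact formulation may differ in detail), the 12-time-point window above can be applied like so:

```python
import numpy as np

def kl(p, q, eps=1e-12):
    """KL divergence D(p || q) between two discrete distributions."""
    p, q = p + eps, q + eps
    return float(np.sum(p * np.log(p / q)))

def novelty_resonance(P, w):
    """Windowed novelty and resonance for topic distributions P of shape (T, K).

    Novelty at t: mean divergence from the w previous distributions.
    Transience at t: mean divergence from the w following distributions.
    Resonance = novelty - transience. Edges without a full window are NaN.
    """
    T = P.shape[0]
    nov = np.full(T, np.nan)
    res = np.full(T, np.nan)
    for t in range(w, T - w):
        nov[t] = np.mean([kl(P[t], P[t - j]) for j in range(1, w + 1)])
        tra = np.mean([kl(P[t], P[t + j]) for j in range(1, w + 1)])
        res[t] = nov[t] - tra
    return nov, res

rng = np.random.default_rng(3)
P = rng.dirichlet(np.ones(10), size=40)  # 40 time points, 10 topics
nov, res = novelty_resonance(P, w=12)
```

A 12-time-point window at 6-hour scraping intervals corresponds to the ≈3-day horizon listed above.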

5. Comparative Evaluation and Performance Metrics

KeyNMF was benchmarked against S³ (Kardos et al. 2024), Top2Vec, BERTopic, two Contextualized Topic Models (CTM), classical NMF, and LDA using the following metrics:

  • Diversity ($d$): Proportion of unique words across the top descriptors of all topics.
  • Internal coherence ($C_{\mathrm{in}}$): Average pairwise cosine similarity between topic words in embedding space.
  • External coherence ($C_{\mathrm{ex}}$): Consistency measured against paraphrase-multilingual MiniLM embeddings.
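A toy illustration of the diversity metric, assuming it is computed as the proportion of unique words among the top descriptors of all topics (the word lists here are invented for illustration):

```python
def topic_diversity(topics):
    """Diversity d: fraction of unique words across all topics' top descriptors."""
    all_words = [w for topic in topics for w in topic]
    return len(set(all_words)) / len(all_words)

# "eu" appears in two topics, so 8 of 9 word slots are unique
d = topic_diversity([["election", "eu", "vote"],
                     ["xi", "visit", "paris"],
                     ["eu", "putin", "russia"]])
```

A value of 1.0 means no word is shared between topics; heavily overlapping topics push $d$ toward $1/K$.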

On all five Chinese news corpora analyzed, KeyNMF outperformed classical NMF and LDA, and was competitive with state-of-the-art contextual models. An example from the Chinanews corpus summarizes typical results:

Model      d      C_in   C_ex
KeyNMF     0.93   0.29   0.63
Top2Vec    0.78   0.14   0.71
BERTopic   0.91   0.16   0.47
NMF        0.74   0.27   0.57
LDA        0.61   0.19   0.57

KeyNMF achieves the highest balance between diversity and coherence, with particularly strong external coherence, indicating robust alignment between the learned keyword-term matrices and transformer embedding space.

6. Empirical Insights: 2024 European Parliament Election Case Study

Dynamic KeyNMF with $K = 10$ revealed interpretable information flow and persistence around major political events. The analysis of news corpora showed that:

  • Xi Jinping’s European tour (May 5–10) induced spikes in novelty and resonance, with surge topics such as “Paris / state visit” and “President / Xi Jinping.”
  • Putin’s state visit to China (May 16–17) produced pronounced novelty and resonance peaks, tied to “China News Service” and “Russia / Ukraine / Putin.”
  • EU parliamentary elections (June 6–9) showed increased novelty and resonance before and after the election, with dominant topics varying by site (e.g., “EU Parliament,” “Spanish PM,” “UK elections,” “Europe overview”).

These findings demonstrate that the joint use of KeyNMF and novelty/resonance analysis detects significant information flows and relates them to concrete topics and real-world events.

7. Limitations and Prospects for Extension

Several limitations of KeyNMF were noted:

  • Contextual embeddings are truncated to 128 tokens; this may omit subtleties in longer documents.
  • The dynamic extension maintains a fixed global WW; future work could allow both WW and HH to evolve smoothly (e.g., with temporal regularization).
  • Absence of an explicit probabilistic interpretation; a possible direction is to develop a Bayesian NMF variant.
  • Topic interpretability and causal modeling would benefit from richer metadata (e.g., author, location) and more extensive qualitative analysis.
  • Further research is needed to link detected topical dynamics to persuasive framing and influence operations.

In summary, KeyNMF constitutes a robust and extensible framework for transformer-aware, non-negative topic modelling and dynamic information flow analysis, as demonstrated in the large-scale study of Chinese diaspora media during sensitive political periods (Kristensen-McLachlan et al., 2024).
