
Knowledge Dispersion Index (KDI)

Updated 10 February 2026
  • KDI is a quantitative metric that operationalizes knowledge dispersion through precise mathematical models applied to LLM responses, organizational metrics, and citation structures.
  • It systematically measures diversity and reliability by analyzing response variance, intellectual capital flows, and influence mapping in scientific networks.
  • Empirical analyses show KDI’s effectiveness in benchmarking model accuracy and forecasting scholarly impact, supporting robust model selection and strategic insights.

The Knowledge Dispersion Index (KDI) refers to a class of quantitative metrics designed to capture the spread, diversity, or reliability of knowledge within a system, model, or organization. Multiple KDI formulations exist in the research literature, each contingent on the underlying domain—LLM response evaluation, intellectual capital, scientific influence, field-mapping, or information-theoretic model comparison. All share the goal of operationalizing the concept of “dispersion” as it applies to knowledge, information flow, or citation structure.

1. KDI for LLM Response Dispersion

The most recent instantiation of KDI is as a black-box metric for benchmarking domain-specific knowledge in LLMs without requiring labeled datasets or access to model internals (II, 2024). Here, the KDI—termed “response dispersion”—quantifies the variability in an LLM’s answers to repeated prompts within a fixed domain. The precise definition is as follows:

  • Run N independent generations of an LLM on a fixed opinion-style prompt within a domain (e.g., N=100, prompt asks for a favorite single-word topic in a category).
  • Embed each single-word response $t_i$ into a vector in $\mathbb{R}^d$ using one of:
    • OpenAI’s text-embedding-3-large (d≈1536)
    • Reference-Sentence-Similarity (RSS) embeddings (d=N, based on pairwise string similarity).
  • Stack the responses into an embedding matrix $E\in\mathbb{R}^{N\times d}$.
  • Compute the singular value decomposition $E = U\Sigma V^\top$, with singular values $\sigma_1\ge\sigma_2\ge\dots$
  • Let $\tau=0.95$ (95% of total variance). Define $\mathit{Var}(k)=\frac{\sum_{i=1}^{k}\sigma_i^2}{\sum_{i=1}^{m}\sigma_i^2}$ with $m=\min(N,d)$. The KDI is $k^* = \min\{k:\mathit{Var}(k)\ge \tau\}$.

Thus, KDI specifies the number of principal directions required to account for 95% (or another chosen threshold) of the response space variability for a topic+model pair.
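The steps above can be sketched numerically. This is a minimal sketch using NumPy; the toy matrices below stand in for real response embeddings (the actual pipeline would embed N single-word LLM responses first):

```python
import numpy as np

def kdi_response_dispersion(E, tau=0.95):
    """Smallest number of singular directions needed to reach a
    fraction tau of the total variance of the embedding matrix E."""
    s = np.linalg.svd(E, compute_uv=False)      # singular values, descending
    var = np.cumsum(s**2) / np.sum(s**2)        # Var(k) for k = 1..min(N, d)
    return int(np.argmax(var >= tau) + 1)       # min k with Var(k) >= tau

# Toy example: 100 "responses" embedded in R^8.
rng = np.random.default_rng(0)
low  = rng.normal(size=(100, 1)) @ rng.normal(size=(1, 8))  # rank-1: answers collapse
high = rng.normal(size=(100, 8))                            # full-rank: diverse answers
print(kdi_response_dispersion(low), kdi_response_dispersion(high))
```

The rank-1 matrix needs a single direction (KDI = 1), while the full-rank matrix needs nearly all eight, illustrating the low-KDI-means-concentrated-knowledge reading used in the benchmarking results below.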

Empirical findings indicate an average Spearman rank correlation of approximately $-0.59$ between KDI and gold-standard QA accuracy across models and domains (lower KDI, signifying less diverse answers, corresponds to higher QA accuracy). For pairwise model selection, KDI matches or approximates accuracy-based rankings in roughly 74–89% of cases, depending on the chosen tolerance for accuracy trade-off (II, 2024).

2. KDI in Organizational Intellectual Capital

KDI was originally introduced to quantify knowledge flow and intellectual capital at the micro- (organizational) and macro- (economic sector) levels (Dhillon, 2011). The construction is based on aggregating normalized indicators:

  • Micro-KDI: Weighted sum of up to 23 human- and function-oriented organizational metrics (e.g., patents pending, % R&D in basic research, IT capacity). Raw values $M_i$ are normalized as $m_i=(M_i-\min_i)/(\max_i-\min_i)$; then $KDI_{micro}=\sum_{i=1}^{23} w_i m_i$ with weights $w_i\ge 0$ such that $\sum w_i=1$.
  • Macro-KDI: Aggregates flow metrics for industrial (inter-firm technical knowledge) and consumer (public broadcast and media dissemination) sectors. Let $I_{industrial}, I_{consumer}$ be suitable flow indices; then $KDI_{macro} = \beta I_{industrial} + (1-\beta) I_{consumer}$.
  • Total KDI: $KDI_{total} = \alpha\, KDI_{micro} + (1-\alpha)\, KDI_{macro}$, with analyst-chosen weights $\alpha,\beta\in[0,1]$.
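The aggregation is straightforward arithmetic. A minimal sketch, using three hypothetical indicators (the framework allows up to 23) with made-up raw values, sector bounds, and weights:

```python
def normalize(M, M_min, M_max):
    """m_i = (M_i - min_i) / (max_i - min_i), per indicator."""
    return (M - M_min) / (M_max - M_min)

def kdi_micro(indicators, weights):
    """indicators: list of (raw value, sector min, sector max); weights sum to 1."""
    assert abs(sum(weights) - 1.0) < 1e-9
    return sum(w * normalize(M, lo, hi)
               for w, (M, lo, hi) in zip(weights, indicators))

def kdi_macro(I_industrial, I_consumer, beta):
    return beta * I_industrial + (1 - beta) * I_consumer

def kdi_total(micro, macro, alpha):
    return alpha * micro + (1 - alpha) * macro

# Hypothetical example: patents pending, % R&D in basic research, IT capacity.
micro = kdi_micro([(12, 0, 50), (0.30, 0.0, 1.0), (40, 10, 90)],
                  weights=[0.5, 0.3, 0.2])
total = kdi_total(micro, kdi_macro(0.6, 0.4, beta=0.7), alpha=0.5)
```

Because every indicator is min-max normalized and all weights are convex, the micro, macro, and total indices each stay in $[0,1]$.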

Theoretical extension involves mapping knowledge flow as a network graph with nodes, capacity constraints, and self-correcting equilibria (modeled analogously to max-flow/min-cut). Robustness is assessed by studying network “super-families” and propagation under perturbations (Dhillon, 2011).

3. KDI via Influence Dispersion in Citation Networks

Another formalization of KDI is tied to the topological structure of a scientific paper’s influence in citation networks, notably through the Influence Dispersion Tree (IDT) model (Mohapatra et al., 2019). The construction is as follows:

  • Build the IDT for a focal paper $P$ with $n$ citing papers $C_P$, assigning edges so as to maximize tree depth (parent selection favors the farthest ancestor).
  • Compute the Influence Dispersion Index (IDI): $IDI(P) = \sum_{\ell\in\text{Leaves}(T_P)} \text{dist}_T(P,\ell)$.
  • The Normalized Influence Divergence (NID) is defined as $NID(P) = \frac{IDI(P)-n}{IDI^{max}(n)-n}$, where $IDI^{max}(n)$ is the maximal dispersion attainable by a tree on $n$ citing papers.
  • The Knowledge Dispersion Index is then $KDI(P) = 1 - NID(P)$, so $KDI\in[0,1]$, with higher values denoting balanced dispersion (depth $\approx$ breadth $\approx \sqrt{n}$).
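The IDI step can be sketched on a toy tree. In the sketch below the IDT is given directly as a child map, and $IDI^{max}(n)$ is passed in as an assumed input, since its closed form from the paper is not reproduced in this summary:

```python
def influence_dispersion_index(children, root):
    """IDI(P): sum of root-to-leaf distances in the influence dispersion tree.
    `children` maps each node to its list of children."""
    total, stack = 0, [(root, 0)]
    while stack:
        node, depth = stack.pop()
        kids = children.get(node, [])
        if not kids:
            total += depth              # leaf: add its distance from the root
        for k in kids:
            stack.append((k, depth + 1))
    return total

def kdi_citation(idi, n, idi_max):
    """KDI(P) = 1 - NID(P), with NID = (IDI - n) / (IDI_max - n)."""
    return 1.0 - (idi - n) / (idi_max - n)

# Toy IDT for a paper P with n = 4 citing papers:
#   P -> A -> C  and  P -> B -> D  (two root-to-leaf paths of length 2)
tree = {"P": ["A", "B"], "A": ["C"], "B": ["D"]}
idi = influence_dispersion_index(tree, "P")   # leaves C and D, each at depth 2
```

Here $IDI(P) = 2 + 2 = 4 = n$, the minimal value, so $NID = 0$ and $KDI = 1$ for any $IDI^{max}(n) > n$: the toy tree is maximally balanced.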

Empirical analysis on large bibliometric corpora shows that KDI is more predictive than simple citation counts in forecasting future impact and identifying highly influential papers (Mohapatra et al., 2019).

4. KDI as Diversity-Coherence Composite in Knowledge Mapping

KDI is also used as a composite metric integrating cognitive diversity and coherence, especially in knowledge integration and diffusion studies (Rafols, 2014). The method proceeds as:

  • Diversity (Rao–Stirling): $D_{RS} = \sum_{i=1}^N\sum_{j=1}^N p_i p_j d_{ij}$, where $p_i$ is the categorical share and $d_{ij}$ the cognitive distance (often $1$ minus cosine similarity).
  • Coherence: $C_{RS} = \sum_{i=1}^N\sum_{j=1}^N \ell_{ij} d_{ij}$, where $\ell_{ij}$ indicates relational intensity (e.g., citation, co-occurrence).
  • Composite KDI: $KDI = w_D(D_{RS}/D_{RS}^{max}) + w_C(C_{RS}/C_{RS}^{max})$, with normalized terms and $w_D+w_C=1$.
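These sums are direct matrix expressions. A minimal sketch with NumPy; the three-category shares, distance matrix, intensity matrix, and normalization bases below are all hypothetical:

```python
import numpy as np

def rao_stirling_diversity(p, d):
    """D_RS = sum_ij p_i p_j d_ij over category shares p and distances d."""
    p = np.asarray(p, dtype=float)
    return float(p @ d @ p)

def coherence(L, d):
    """C_RS = sum_ij l_ij d_ij, with l_ij the relational intensity."""
    return float(np.sum(L * d))

def kdi_composite(D, C, D_max, C_max, w_D=0.5):
    """Weighted combination of normalized diversity and coherence; w_D + w_C = 1."""
    return w_D * (D / D_max) + (1.0 - w_D) * (C / C_max)

p = [0.5, 0.3, 0.2]                       # category shares (sum to 1)
d = np.array([[0.0, 0.4, 0.9],            # cognitive distances (1 - cosine sim.)
              [0.4, 0.0, 0.6],
              [0.9, 0.6, 0.0]])
L = np.array([[0, 2, 1],                  # citation intensities between categories
              [2, 0, 3],
              [1, 3, 0]], dtype=float)
D = rao_stirling_diversity(p, d)
C = coherence(L, d)
score = kdi_composite(D, C, D_max=1.0, C_max=10.0)
```

Since $d_{ii}=0$, only cross-category pairs contribute, which is why a portfolio concentrated in one category yields diversity near zero.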

The approach is sensitive to the categorization scheme and the definition of cognitive distance. Visualization overlays map diversity and coherence values onto “basemaps” of science or technology (Rafols, 2014).

5. KDI as Dispersion Index in Information Theory

In information theory, KDI describes the variance of the pointwise Kullback–Leibler (KL) divergence between two probability distributions $f$ and $g$ (Buono et al., 2021). The definition is:

$$\mathrm{VarK}(f\!:\!g) = \int f(x)\,\log^2\!\frac{f(x)}{g(x)}\,dx \;-\; \left[\int f(x)\,\log\frac{f(x)}{g(x)}\,dx\right]^2.$$

This quantity measures the spread (reliability) of the instantaneous KL divergence and is used to supplement the mean divergence in model selection via a mean–variance trading rule. KDI is always nonnegative, vanishes iff $f=g$ almost surely, and is not a metric. It is particularly recommended when models have similar average divergence but differ in the variance of fit (Buono et al., 2021).
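For distributions with finite support the integrals become sums, and the quantity is easy to compute directly. A minimal sketch, assuming $f$ and $g$ share the same (fully positive) support:

```python
import numpy as np

def var_kl(f, g):
    """Variance of the pointwise log-ratio log(f/g) under f (discrete case):
    E_f[log^2(f/g)] - (E_f[log(f/g)])^2."""
    f, g = np.asarray(f, float), np.asarray(g, float)
    log_ratio = np.log(f / g)
    mean = np.sum(f * log_ratio)           # the KL divergence D(f || g)
    second = np.sum(f * log_ratio**2)      # second moment of the log-ratio
    return second - mean**2

f = np.array([0.5, 0.3, 0.2])
g = np.array([0.4, 0.4, 0.2])
v = var_kl(f, g)       # positive: the pointwise divergence is not constant
```

Calling `var_kl(f, f)` returns exactly zero, matching the vanishing condition above, and a positive value signals that the fit quality of `g` varies across the sample space even if its average divergence is small.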

6. Cross-Contextual Properties and Implementation Considerations

Despite domain diversity, KDIs share systematic features:

  • Normalization and Intelligibility: KDIs are either directly interpretable indices (e.g., dimension counts, network statistics) or are normalized to $[0,1]$ for comparison.
  • Embedding and Representation: Construction may depend on data embedding (e.g., response embeddings for LLMs, vectorized categories for knowledge mapping, trees for citation analysis).
  • Robustness: KDI metrics must be stress-tested against category choice, sample size, and parametrization (e.g., choice of threshold $\tau$ or normalization bases).
  • Empirical Utility: Where validated, KDIs serve as surrogates or complements to more direct but expensive measures (e.g., QA accuracy, human-judged impact, manual diversity analyses).
  • Limiting Factors: Predictive power and robustness are context sensitive—e.g., KDI may underperform in domains with inherently high answer variability or ill-defined conceptual distances.

7. Summary Table: Major KDI Variants

| Context | Definition/Formula | Application Domain |
| --- | --- | --- |
| LLM Response Dispersion | $\min\{k: \mathit{Var}(k)\ge \tau\}$ in SVD of response embeddings | LLM benchmarking (II, 2024) |
| Organizational Knowledge | Weighted sum of normalized micro/macro indicators | Intellectual capital (Dhillon, 2011) |
| Citation Influence | $1 -$ Normalized Influence Divergence (NID) | Scholarly impact (Mohapatra et al., 2019) |
| Diversity–Coherence Mapping | $w_D D_{RS}/D_{RS}^{max} + w_C C_{RS}/C_{RS}^{max}$ | Science mapping (Rafols, 2014) |
| Information-Theoretic | $\mathrm{VarK}(f\!:\!g)$, the variance of $\log(f/g)$ | Model selection (Buono et al., 2021) |

Each KDI variant is anchored in a precise mathematical formulation and an explicit workflow, as detailed in the cited works. Adherence to these domain-specific definitions is necessary for valid computation, interpretation, and comparative analysis.
