Demographic Salience Score Overview
- Demographic Salience Score (DSS) is a metric family that quantifies the prominence and retention of demographic attributes in network data and generated summaries.
- The framework employs entropy calculations, normalization, and entity matching to provide interpretable, mathematically grounded measures across identity-graph and LLM summary contexts.
- Empirical analyses using DSS reveal significant discrepancies in demographic representation, offering a tool for diagnosing bias in social media follow patterns and biomedical summarization.
The Demographic Salience Score (DSS) is a family of metrics designed to quantify the prominence and retention of demographic characteristics within relational or generated data. It has been applied most notably to the analysis of social media follow patterns across identity dimensions (Fulay et al., 2023) and to the evaluation of demographic fidelity in LLM summaries of biomedical evidence (Aghaebe et al., 8 Nov 2025). DSS frameworks formalize salience as the degree to which a particular demographic dimension stands out or is preserved. They offer direct, mathematically interpretable measures for both internal (ego-centric) and external (audience-centric) perspectives, as well as for entity retention and hallucination within generative workflows.
1. Formalization and Mathematical Foundations
DSS is contextually instantiated for two distinct data modalities:
Identity-Graph DSS (Fulay et al., 2023):
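- Ego-centric DSS: For a user $u$ and a dimension $d$, $\mathrm{DSS}^{\mathrm{ego}}_{u,d}$ reflects the normalized salience of $d$ among $u$'s followees $F_u \subseteq I$, where $I$ is the influencer set.
- Audience-centric DSS: For an influencer $i$, $\mathrm{DSS}^{\mathrm{aud}}_{i,d}$ is the mean of $\mathrm{DSS}^{\mathrm{ego}}_{u,d}$ over all audience members $u \in A_i$ who follow $i$: $\mathrm{DSS}^{\mathrm{aud}}_{i,d} = \frac{1}{|A_i|}\sum_{u \in A_i} \mathrm{DSS}^{\mathrm{ego}}_{u,d}$.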
For categorical dimensions (e.g., race, gender), the process is:
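- Count the followee-category distribution: $p_{u,d}(c) = \frac{|\{f \in F_u : \mathrm{cat}_d(f) = c\}|}{|F_u|}$ for each category $c$ of $d$.
- Calculate its entropy: $H_{u,d} = -\sum_{c} p_{u,d}(c) \log p_{u,d}(c)$.
- Z-normalize and negate, so that a more concentrated (lower-entropy) followee mix yields a higher score: $\mathrm{DSS}^{\mathrm{ego}}_{u,d} = -\frac{H_{u,d} - \mu_d}{\sigma_d}$,
where $\mu_d$ and $\sigma_d$ are the mean and standard deviation of $H_{u,d}$ over all sampled users $u$.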
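The categorical procedure can be sketched in Python as follows. This is a minimal illustration, not the authors' implementation. It assumes that each followee already carries a category label for the dimension, that entropy uses the natural logarithm, and that the z-normalization uses the population standard deviation. The source does not say which standard deviation is used.

```python
import math
from collections import Counter
from statistics import mean, pstdev


def category_entropy(followee_categories):
    """Shannon entropy (natural log) of one user's followee-category distribution."""
    counts = Counter(followee_categories)
    total = sum(counts.values())
    return -sum((n / total) * math.log(n / total) for n in counts.values())


def ego_dss_categorical(user_followee_categories):
    """Ego-centric DSS for a categorical dimension (sketch).

    user_followee_categories: dict mapping user id -> list of category labels,
    one label per followee (e.g. each followee's gender).
    Returns dict user id -> -(H_u - mean) / std, with entropies z-scored over
    all users; lower entropy (a more concentrated mix) gives a higher score.
    Assumes at least two users whose entropies differ, so std > 0.
    """
    entropies = {u: category_entropy(c) for u, c in user_followee_categories.items()}
    values = list(entropies.values())
    mu = mean(values)
    sigma = pstdev(values)  # population std: an assumption, not specified by the source
    return {u: -(h - mu) / sigma for u, h in entropies.items()}
```

Z-normalization is invariant to a positive rescaling of the entropies, so the choice of logarithm base does not change the resulting scores.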
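For partially-tagged dimensions (e.g., religion, politics, LGBTQIA+), only some influencers carry a tag. The score is therefore based on the share of tagged followees, $r_{u,d} = \frac{|\{f \in F_u : f \text{ tagged for } d\}|}{|F_u|}$, which is z-scored over all users as $\mathrm{DSS}^{\mathrm{ego}}_{u,d} = \frac{r_{u,d} - \mu_d}{\sigma_d}$.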
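A corresponding sketch for partially-tagged dimensions carries the same caveats. The inputs `user_followees` and `tagged` are hypothetical, and users without followees are skipped. No negation is applied in this sketch, because a higher tagged share already means higher salience.

```python
from statistics import mean, pstdev


def ego_dss_tagged(user_followees, tagged):
    """Z-score the share of each user's followees that carry a tag for one dimension.

    user_followees: dict user id -> list of followee (influencer) ids
    tagged: set of influencer ids tagged for the dimension (hypothetical tag table)
    """
    ratios = {
        u: sum(f in tagged for f in fs) / len(fs)
        for u, fs in user_followees.items()
        if fs  # skip users with no followees to avoid division by zero
    }
    values = list(ratios.values())
    mu, sigma = mean(values), pstdev(values)  # population std, as in the categorical sketch
    return {u: (r - mu) / sigma for u, r in ratios.items()}
```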
Entity-Retention DSS in Document Generation (Aghaebe et al., 8 Nov 2025):
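- Entity Retention Score (ERS): the proportion of gold demographic entities that are retained in the generated summary.
- Hallucination Penalty (HP): a penalty for summary entities with no gold counterpart. An entity counts as matched if it equals a gold entity as an exact string or reaches a cosine-similarity threshold $\tau$.
- Over-length Penalty (OP): a penalty for over-long, over-generated output.
- Adjusted Hallucination: the hallucination penalty adjusted by the over-length penalty.
- Raw DSS: ERS combined with the adjusted hallucination penalty.
- Normalized DSS: the raw score normalized and clipped to $[0, 1]$.
The weighting parameters are tunable, and a normalization constant makes scores comparable across cases.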
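The retention and hallucination components can be sketched as below. The matching rule follows the description above: an exact string match, or cosine similarity at or above a threshold `tau`. Several parts are assumptions rather than the authors' choices:

- case-insensitive string comparison;
- the toy embedding dictionary `emb`;
- the placeholder default `tau=0.8`;
- the proportional forms of ERS and HP.

`simple_dss` is a hypothetical simplification that clips ERS minus HP to $[0, 1]$ and omits the over-length penalty and adjustment terms. It reproduces the GPT-4.1 rows of the results table in Section 3 but not the other rows.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def matches(gen, gold, emb, tau):
    """Exact (case-insensitive) string match, else embedding cosine >= tau."""
    if gen.lower() == gold.lower():
        return True
    return gen in emb and gold in emb and cosine(emb[gen], emb[gold]) >= tau


def ers_hp(gold, generated, emb, tau=0.8):
    """Return (ERS, HP) for one summary.

    gold: non-empty list of gold demographic entities from the reference
    generated: list of entities extracted from the summary
    emb: dict entity -> embedding vector (hypothetical; any encoder could fill it)
    tau: similarity threshold (0.8 is a placeholder, not the source's value)
    """
    retained = [g for g in gold if any(matches(s, g, emb, tau) for s in generated)]
    unmatched = [s for s in generated if not any(matches(s, g, emb, tau) for g in gold)]
    ers = len(retained) / len(gold)
    hp = len(unmatched) / len(generated) if generated else 0.0
    return ers, hp


def simple_dss(ers, hp):
    """Hypothetical simplified DSS: clip(ERS - HP, 0, 1), omitting OP and adjustments."""
    return min(1.0, max(0.0, ers - hp))
```

Consider toy vectors that place "elderly" close to "older adults". A summary listing "elderly" and "prisoners", scored against the gold entities "older adults" and "midlife women", gets ERS = 0.5 and HP = 0.5. Its simplified DSS is therefore 0.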
2. Computational Pipeline and Workflow
Social Graph DSS (Fulay et al., 2023):
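- Step 1: Sample an influencer set $I$ and an audience set $U$ of users who meet a minimum engagement threshold.
- Step 2: Tag influencers via semi-automated intersection of Wikipedia categories and external lists for partially-tagged dimensions.
- Step 3: Construct the binary followee matrix $F \in \{0,1\}^{|U| \times |I|}$ by crawling or querying the platform.
- Step 4: For each user $u$ and dimension $d$:
  - Categorical: build $p_{u,d}$, compute the entropy $H_{u,d}$, and z-score (negated) to obtain $\mathrm{DSS}^{\mathrm{ego}}_{u,d}$.
  - Tagged: compute the ratio $r_{u,d}$ and z-score it to obtain $\mathrm{DSS}^{\mathrm{ego}}_{u,d}$.
- Step 5: For each influencer $i$, average the ego-centric scores of its audience $A_i$ to obtain $\mathrm{DSS}^{\mathrm{aud}}_{i,d}$.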
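Step 5 reduces to a column-wise average over the followee matrix. This minimal sketch assumes that the matrix is a list of 0/1 rows, one per audience user, aligned with a list of precomputed ego-centric scores. `audience_dss` is a hypothetical helper name.

```python
def audience_dss(follow, ego_dss):
    """Audience-centric DSS for each influencer (one per matrix column).

    follow: list of rows; follow[u][i] == 1 if audience user u follows influencer i
    ego_dss: ego-centric DSS of each audience user, in the same row order
    Returns one value per influencer, or None if it has no followers in the sample.
    """
    n_influencers = len(follow[0]) if follow else 0
    result = []
    for i in range(n_influencers):
        scores = [ego_dss[u] for u, row in enumerate(follow) if row[i]]
        result.append(sum(scores) / len(scores) if scores else None)
    return result
```

For example, `audience_dss([[1, 0, 1], [1, 1, 0], [0, 1, 1]], [1.5, -0.5, 0.0])` averages the scores of each column's followers and returns `[0.5, -0.25, 0.75]`.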
LLM Summary DSS (Aghaebe et al., 8 Nov 2025):
- Step 1: Extract gold entities from reference abstracts (regex, LLM-assisted NER).
- Step 2: Extract entities from generated summaries by identical pipeline.
- Step 3: Compute ERS (retention), HP (hallucination), OP (overlength) per summary.
- Step 4: Aggregate DSS scores per review and age stratum.
Pseudocode formalizes the procedure for reproducibility and scaling in automated pipelines.
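Step 4 of the LLM summary pipeline can be sketched as a group-by over per-summary scores. Two choices here are assumptions: the `(review_id, age_stratum, dss)` record layout, and averaging as the aggregate. The source does not specify the statistic.

```python
from collections import defaultdict
from statistics import mean


def aggregate_dss(records):
    """Average per-summary DSS within each (review_id, age_stratum) group.

    records: iterable of (review_id, age_stratum, dss) tuples (hypothetical layout)
    """
    groups = defaultdict(list)
    for review_id, stratum, dss in records:
        groups[(review_id, stratum)].append(dss)
    return {key: mean(vals) for key, vals in groups.items()}
```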
3. Empirical DSS Analysis and Interpretation
Identity-Based DSS (Fulay et al., 2023)
Across influencers:
- Race: salience is markedly higher in ego-centric profiles than in audience-centric profiles (the mean difference, in z-units, is statistically significant).
- Religion and politics: audiences display higher salience than the influencers themselves, with both mean differences statistically significant.
- Gender and LGBTQIA+: the distributions are strongly right-skewed, and some influencer cliques (notably athletes) have near-exclusively same-gender followerships.
Significance is robust under bootstrap, paired t-test, Wilcoxon, and Kolmogorov–Smirnov (KS) tests with Bonferroni correction.
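The resampling side of this analysis can be illustrated with a paired percentile-bootstrap confidence interval for the mean ego-minus-audience difference. The sketch widens the interval by using the Bonferroni-adjusted level `alpha / n_tests`. It is not the authors' code: the resample count, the seed and the percentile method are assumptions. The parametric and rank-based tests are available in SciPy as `scipy.stats.ttest_rel`, `scipy.stats.wilcoxon` and `scipy.stats.ks_2samp`.

```python
import random


def paired_bootstrap_ci(ego, audience, n_boot=2000, alpha=0.05, n_tests=1, seed=0):
    """Mean paired difference (ego - audience) with a percentile bootstrap CI.

    Bonferroni: the error budget alpha is divided across n_tests comparisons,
    so each interval is reported at level 1 - alpha / n_tests.
    """
    diffs = [e - a for e, a in zip(ego, audience)]
    n = len(diffs)
    rng = random.Random(seed)  # fixed seed for reproducible resamples
    boot_means = sorted(sum(rng.choices(diffs, k=n)) / n for _ in range(n_boot))
    adj = alpha / n_tests
    lo = boot_means[int(adj / 2 * n_boot)]
    hi = boot_means[min(n_boot - 1, int((1 - adj / 2) * n_boot))]
    return sum(diffs) / n, (lo, hi)
```

In this sketch, a difference counts as significant for a dimension when its interval excludes zero.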
LLM Summary DSS (Aghaebe et al., 8 Nov 2025)
Reported scores by age group and model:
| Age group | Model | ERS | HP | Omission | DSS |
|---|---|---|---|---|---|
| Adults | GPT-4.1 Nano | 0.81 | 0.12 | 0.19 | 0.69 |
| Adults | Qwen-2.5 | 0.78 | 0.74 | 0.22 | 0 |
| Adults | Longformer | 0.45 | 0.18 | 0.50 | 0.27 |
| Children | GPT-4.1 Nano | 0.84 | 0.12 | 0.16 | 0.72 |
| Children | Qwen-2.5 | 0.97 | 0.58 | 0.02 | 0 |
| Children | Longformer | 0.91 | 0.33 | 0.09 | 0.63 |
| Older adults | GPT-4.1 Nano | 0.92 | 0.14 | 0.08 | 0.78 |
| Older adults | Qwen-2.5 | 0.98 | 0.11 | 0.02 | 0.79 |
| Older adults | Longformer | 0.95 | 0.07 | 0.05 | 0.78 |
Key observations:
- Highest DSS values: very high fidelity (older adults are best preserved).
- DSS of $0.5$–$0.8$: moderate fidelity (children).
- Lowest DSS values: poor fidelity (adults are under-represented, with frequent omission and hallucination).
- Qwen-2.5 produces high entity counts but also high hallucination rates, which nullifies its retention gains (DSS = 0 for adults and children).
- GPT-4.1 Nano shows the most balanced profile: low hallucination (HP of 0.12–0.14) and DSS between 0.69 and 0.78 across all three age groups.
4. Conceptual Significance and Use Cases
DSS provides:
- An interpretable score for demographic prominence, facilitating direct comparison across user groups or model outputs.
- A mechanism for identifying "bridging" influencers: a large gap between ego-centric and audience-centric DSS indicates a capacity to channel diverse perspectives (e.g., Dolly Parton as a bridge to women, Allen Iverson to communities of color).
- In generative systems, DSS quantifies the preservation of demographic specificity, supporting diagnostic and regulatory practices in biomedical evidence synthesis.
A plausible implication is the adoption of DSS as a diagnostic tool for identifying representational bias, evaluating fairness interventions, and guiding post-hoc review protocols where demographic fidelity is critical (e.g., medical guideline summarization, social platform diversity analysis).
5. Limitations and Recommendations
Limitations:
- Coverage: DSS as implemented covers only selected axes: race, gender, religion, politics and LGBTQIA+ (influencer graphs), or age (LLM summaries). Latent or confounding attributes (age, occupation, nationality) are acknowledged but not measured.
- Entity extraction and tagging: Reliance on Wikipedia categories and curated lists introduces selection and coverage bias.
- Proxy nature: Following patterns and summary entity retention are noisy proxies; causal inference regarding underlying divergence drivers is explicitly out of scope.
- Generalization: weighting and normalization choices (e.g., the penalty weights and the DSS normalization constant) may require contextual tuning.
Recommendations:
- Incorporate DSS alongside standard metrics (BLEU, BERTScore, FactCC) to detect representational bias.
- Use demographic-aware prompting with caution; slot-filling or two-stage pipelines may help control over-generation and hallucination.
- Apply post-hoc entity-to-gold matching and review protocols, flagging summaries whose DSS falls below a chosen threshold.
- Extend LLM-summary DSS to additional dimensions (e.g., gender, race), and explore demographic-anchored objectives in model training and fine-tuning.
- Expand real-time monitoring, feedback dashboards, and intersectional/dynamic DSS analytics for high-stakes intervention.
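The flagging recommendation amounts to a threshold filter over per-summary scores. In this minimal sketch the threshold is a reviewer-chosen parameter, not a value from the source, and `flag_for_review` is a hypothetical name.

```python
def flag_for_review(summary_scores, threshold):
    """Return ids of summaries whose DSS is below `threshold`, lowest score first.

    summary_scores: dict summary id -> normalized DSS in [0, 1]
    """
    flagged = [(dss, sid) for sid, dss in summary_scores.items() if dss < threshold]
    return [sid for _, sid in sorted(flagged)]
```

Sorting by score puts the least faithful summaries at the front of the review queue.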
6. Illustrative Examples and Practical Impact
Social Identity Bridging: Dolly Parton's ego-centric gender DSS differs from the mean gender DSS of her audience by $3.3$ z-units, exemplifying her role as a conduit for women's representation.
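The gap computation behind this example can be sketched as follows. The inputs are dictionaries of precomputed ego-centric and audience-centric scores per influencer. Ranking by absolute gap while keeping the sign is an assumption about how bridging candidates would be surfaced.

```python
def bridging_gaps(ego, audience):
    """Signed gap (ego-centric minus audience-centric DSS) per influencer.

    ego, audience: dicts influencer id -> DSS for one dimension
    Returns (influencer, gap) pairs, largest absolute gap first;
    influencers missing from either dict are skipped.
    """
    gaps = {i: ego[i] - audience[i] for i in ego if i in audience}
    return sorted(gaps.items(), key=lambda kv: abs(kv[1]), reverse=True)
```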
Generative Model Hallucination: In a child-focused review, Qwen retained most gold age entities (high ERS) but fabricated "prisoners" as a descriptor (high HP), which pulled its DSS down. Longformer's omission of "midlife women" in adult reviews yielded low ERS and low DSS. For older adults, all models demonstrated high retention (ERS of 0.92–0.98 in the table above).
These examples suggest that DSS can reveal both overt and latent representational gaps in machine-generated text and in social attention patterns. They can thereby guide interventions for better demographic coverage and fidelity.
7. Extensions, Future Directions, and Contextual Integration
Proposed DSS extensions involve:
- Incorporating additional data modalities (retweet/mention graphs), developing intersectional salience analytics (e.g., race × gender), and supporting time-varying, event-driven salience monitoring.
- Broadening the schema to encompass further axes of sensitive information, tailored to the demands of fairness-aware systems, federated social analysis, and biomedical NLP.
- Application in evaluating pipeline-level bias, guiding influencer network gatekeeper identification, diversity interventions, and real-time demographic shift reporting.
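One possible way to prototype intersectional salience is to reuse the entropy construction with joint (race, gender) categories. This is a hypothetical extension, not a method from the source. The per-dimension tag dictionaries are assumed inputs. The resulting joint entropies would then be z-normalized and negated as in the categorical case.

```python
import math
from collections import Counter


def joint_entropy(followees, race, gender):
    """Entropy (natural log) of the joint (race, gender) category among one user's followees.

    race, gender: dicts followee id -> category label (hypothetical tag tables);
    followees missing either tag are skipped.
    """
    pairs = Counter((race[f], gender[f]) for f in followees if f in race and f in gender)
    total = sum(pairs.values())
    if total == 0:
        return 0.0
    return -sum((n / total) * math.log(n / total) for n in pairs.values())
```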
Collectively, the Demographic Salience Score forms an evidentiary and analytical backbone for quantitative demographic analysis in both network-centric and generative AI workflows, enabling systematic diagnosis and remediation of under-represented group bias and facilitating more equitable information ecosystems.