Embedding Democratic Values into Social Media AIs via Societal Objective Functions

Published 26 Jul 2023 in cs.HC and cs.AI | (2307.13912v3)

Abstract: Can we design AI systems that rank our social media feeds to consider democratic values such as mitigating partisan animosity as part of their objective functions? We introduce a method for translating established, vetted social scientific constructs into AI objective functions, which we term societal objective functions, and demonstrate the method with application to the political science construct of anti-democratic attitudes. Traditionally, we have lacked observable outcomes to use to train such models, however, the social sciences have developed survey instruments and qualitative codebooks for these constructs, and their precision facilitates translation into detailed prompts for LLMs. We apply this method to create a democratic attitude model that estimates the extent to which a social media post promotes anti-democratic attitudes, and test this democratic attitude model across three studies. In Study 1, we first test the attitudinal and behavioral effectiveness of the intervention among US partisans (N=1,380) by manually annotating (alpha=.895) social media posts with anti-democratic attitude scores and testing several feed ranking conditions based on these scores. Removal (d=.20) and downranking feeds (d=.25) reduced participants' partisan animosity without compromising their experience and engagement. In Study 2, we scale up the manual labels by creating the democratic attitude model, finding strong agreement with manual labels (rho=.75). Finally, in Study 3, we replicate Study 1 using the democratic attitude model instead of manual labels to test its attitudinal and behavioral impact (N=558), and again find that the feed downranking using the societal objective function reduced partisan animosity (d=.25). This method presents a novel strategy to draw on social science theory and methods to mitigate societal harms in social media AIs.

References (112)
  1. Douglas J Ahler and Gaurav Sood. 2018. The parties in our heads: Misperceptions about party composition and their consequences. The Journal of Politics 80, 3 (2018), 964–981.
  2. The Welfare Effects of Social Media. American Economic Review 110, 3 (March 2020), 629–76. https://doi.org/10.1257/aer.20190658
  3. Carolina Are. 2020. How Instagram’s algorithm is censoring women and vulnerable users but helping online abusers. Feminist media studies 20, 5 (2020), 741–744.
  4. Constitutional AI: Harmlessness from AI Feedback. arXiv:2212.08073 [cs.CL]
  5. Chris Bail. 2022. Breaking the social media prism: How to make our platforms less polarizing. Princeton University Press.
  6. Exposure to ideologically diverse news and opinion on Facebook. Science 348, 6239 (2015), 1130–1132.
  7. Digital Technology and Democratic Theory. University of Chicago Press. https://doi.org/10.7208/chicago/9780226748603.001.0001
  8. Embedding Societal Values into Social Media Algorithms. Journal of Online Trust and Safety 2, 1 (2023).
  9. Gobo: A System for Exploring User Control of Invisible Algorithms in Social Media. In Conference Companion Publication of the 2019 on Computer Supported Cooperative Work and Social Computing (Austin, TX, USA) (CSCW ’19). Association for Computing Machinery, New York, NY, USA, 151–155. https://doi.org/10.1145/3311957.3359452
  10. Monika Bickert. 2018. Publishing Our Internal Enforcement Guidelines and Expanding Our Appeals Process. https://about.fb.com/news/2018/04/comprehensive-community-standards/
  11. Reuben Binns. 2017. Fairness in Machine Learning: Lessons from Political Philosophy. CoRR abs/1712.03586 (2017). arXiv:1712.03586 http://arxiv.org/abs/1712.03586
  12. Cross-Country Trends in Affective Polarization. Working Paper 26669. National Bureau of Economic Research. https://doi.org/10.3386/w26669
  13. Overperception of moral outrage in online social networks inflates beliefs about intergroup hostility. Nature human behaviour (2023), 1–11.
  14. Language Models are Few-Shot Learners. Advances in neural information processing systems 33 (2020), 1877–1901.
  15. Controlling polarization in personalization: An algorithmic framework. In Proceedings of the conference on fairness, accountability, and transparency. 160–169.
  16. Uthsav Chitra and Christopher Musco. 2020. Analyzing the impact of filter bubbles on social network polarization. In Proceedings of the 13th International Conference on Web Search and Data Mining. 115–123.
  17. How algorithmic popularity bias hinders or promotes quality. Scientific reports 8, 1 (2018), 15951.
  18. James Price Dillard and Lijiang Shen. 2005. On the nature of reactance and its role in persuasive health communication. Communication monographs 72, 2 (2005), 144–168.
  19. The Augmented Social Scientist: Using Sequential Transfer Learning to Annotate Millions of Texts with Human-Level Accuracy. Sociological Methods & Research (2022). https://doi.org/10.1177/00491241221134526
  20. Correcting misperceptions of out-partisans decreases American legislators’ support for undemocratic practices. Proceedings of the National Academy of Sciences 120, 23 (2023), e2301836120.
  21. Dean Eckles. 2022. Algorithmic transparency and assessing effects of algorithmic ranking. https://doi.org/10.31235/osf.io/c8za6
  22. Will the crowd game the algorithm? Using layperson judgments to combat misinformation on social media by downranking distrusted sources. In Proceedings of the 2020 CHI conference on human factors in computing systems. 1–11.
  23. Political sectarianism in America. Science 370, 6516 (2020), 533–536.
  24. Measuring the reach of “fake news” and online disinformation in Europe. Australasian Policing 10, 2 (2018).
  25. Erin D Foster and Ariel Deardorff. 2017. Open science framework (OSF). Journal of the Medical Library Association: JMLA 105, 2 (2017), 203.
  26. Tarleton Gillespie. 2018. Custodians of the Internet: Platforms, content moderation, and the hidden decisions that shape social media. Yale University Press.
  27. Asymmetric ideological segregation in exposure to political news on Facebook. Science 381, 6656 (2023), 392–398.
  28. Algorithmic content moderation: Technical and political challenges in the automation of platform governance. Big Data & Society 7, 1 (2020), 2053951719897945. https://doi.org/10.1177/2053951719897945
  29. How accurate are survey responses on social media and politics? Political Communication 36, 2 (2019), 241–258.
  30. How do social media feed algorithms affect attitudes and behavior in an election campaign? Science 381, 6656 (2023), 398–404.
  31. Reshares on social media amplify political news but do not detectably affect beliefs or opinions. Science 381, 6656 (2023), 404–408.
  32. Trans time: Safety, privacy, and content warnings on a transgender-specific social media site. Proceedings of the ACM on Human-Computer Interaction 4, CSCW2 (2020), 1–27.
  33. Evaluating Large Language Models in Generating Synthetic HCI Research Data: A Case Study. In Proceedings of the 2023 CHI Conference on Human Factors in Computing Systems (Hamburg, Germany) (CHI ’23). Association for Computing Machinery, New York, NY, USA, Article 433, 19 pages. https://doi.org/10.1145/3544548.3580688
  34. Psychological well-being and social media use: a meta-analysis of associations between social media use and depression, anxiety, loneliness, eudaimonic, hedonic and social well-being. Anxiety, Loneliness, Eudaimonic, Hedonic and Social Well-Being (March 9, 2022) (2022).
  35. The Oxford handbook of Internet studies.
  36. Interventions to reduce partisan animosity. Nature Human Behaviour 6, 9 (2022), 1194–1205.
  37. Anna Lauren Hoffmann. 2019. Where fairness fails: data, algorithms, and the limits of antidiscrimination discourse. Information, Communication & Society 22, 7 (2019), 900–915. https://doi.org/10.1080/1369118X.2019.1573912
  38. Distilling Step-by-Step! Outperforming Larger Language Models with Less Training Data and Smaller Model Sizes. In Findings of the Association for Computational Linguistics: ACL 2023. Association for Computational Linguistics, Toronto, Canada, 8003–8017. https://aclanthology.org/2023.findings-acl.507
  39. Is ChatGPT Better than Human Annotators? Potential and Limitations of ChatGPT in Explaining Implicit Hate Speech. In Companion Proceedings of the ACM Web Conference 2023 (Austin, TX, USA) (WWW ’23 Companion). Association for Computing Machinery, New York, NY, USA, 294–297. https://doi.org/10.1145/3543873.3587368
  40. Algorithmic amplification of politics on Twitter. Proceedings of the National Academy of Sciences 119, 1 (2022), e2025334119.
  41. The origins and consequences of affective polarization in the United States. Annual review of political science 22 (2019), 129–146.
  42. Shanto Iyengar and Sean J. Westwood. 2015. Fear and Loathing across Party Lines: New Evidence on Group Polarization. American Journal of Political Science 59, 3 (2015), 690–707. https://doi.org/10.1111/ajps.12152
  43. Dietmar Jannach and Gediminas Adomavicius. 2016. Recommendations with a purpose. In Proceedings of the 10th ACM conference on recommender systems. 7–10.
  44. Understanding Effects of Algorithmic vs. Community Label on Perceived Accuracy of Hyper-partisan Misinformation. Proceedings of the ACM on Human-Computer Interaction 6, CSCW2 (2022), 1–27.
  45. Exploiting Programmatic Behavior of LLMs: Dual-Use Through Standard Security Attacks. arXiv:2302.05733 [cs.CR]
  46. Jon Keegan. 2016. Blue Feed, Red Feed. http://graphics.wsj.com/blue-feed-red-feed/
  47. Junsol Kim and Byungkyu Lee. 2023. AI-Augmented Surveys: Leveraging Large Language Models for Opinion Prediction in Nationally Representative Surveys. arXiv:2305.09620 [cs.CL]
  48. How affective polarization undermines support for democratic norms. Public Opinion Quarterly 85, 2 (2021), 663–677.
  49. Affective polarization or partisan disdain? Untangling a dislike for the opposing party from a dislike of partisanship. Public Opinion Quarterly 82, 2 (2018), 379–390.
  50. The Challenge of Understanding What Users Want: Inconsistent Preferences and Engagement Optimization. In Proceedings of the 23rd ACM Conference on Economics and Computation (Boulder, CO, USA) (EC ’22). Association for Computing Machinery, New York, NY, USA, 29. https://doi.org/10.1145/3490486.3538365
  51. Large Language Models are Zero-Shot Reasoners. In Advances in Neural Information Processing Systems, Vol. 35. 22199–22213.
  52. Rotating Online Behavior Change Interventions Increases Effectiveness But Also Increases Attrition. Proc. ACM Hum.-Comput. Interact. 2, CSCW, Article 95 (nov 2018), 25 pages. https://doi.org/10.1145/3274364
  53. Resolving content moderation dilemmas between free speech and harmful misinformation. Proceedings of the National Academy of Sciences 120, 7 (2023), e2210666120.
  54. Social media use and depressive symptoms among United States adolescents. Journal of Adolescent Health 68, 3 (2021), 572–579.
  55. TurkPrime.com: A versatile crowdsourcing data acquisition platform for the behavioral sciences. Behavior research methods 49, 2 (2017), 433–442.
  56. Personalized news recommendation based on click behavior. In Proceedings of the 15th international conference on Intelligent user interfaces. 31–40.
  57. A systematic review of worldwide causal and correlational evidence on digital media and democracy. Nature human behaviour 7, 1 (2023), 74–101.
  58. Li Lucy and David Bamman. 2021. Gender and Representation Bias in GPT-3 Generated Stories. In Proceedings of the Third Workshop on Narrative Understanding. Association for Computational Linguistics, Virtual, 48–55. https://doi.org/10.18653/v1/2021.nuse-1.5
  59. Christoph Lutz. 2022. Inequalities in Social Media Use and their Implications for Digital Methods Research. 679–690.
  60. Hybrid media consumption: How tweeting during a televised political debate influences the vote decision. In Proceedings of the 17th ACM conference on Computer supported cooperative work & social computing. 1422–1432.
  61. From Optimizing Engagement to Measuring Value. In Proceedings of the 2021 ACM Conference on Fairness, Accountability, and Transparency (Virtual Event, Canada) (FAccT ’21). Association for Computing Machinery, New York, NY, USA, 714–722. https://doi.org/10.1145/3442188.3445933
  62. Twitter’s Algorithm: Amplifying Anger, Animosity, and Affective Polarization. arXiv preprint arXiv:2305.16941 (2023).
  63. Emily Moyer-Gusé and Robin L Nabi. 2010. Explaining the effects of narrative in an entertainment television program: Overcoming resistance to persuasion. Human communication research 36, 1 (2010), 26–52.
  64. Luke Munn. 2020. Angry by design: Toxic communication and technical architectures. Humanities and Social Sciences Communications 7, 1 (2020), 1–11.
  65. Encouraging reading of diverse political viewpoints with a browser widget. In Proceedings of the international AAAI conference on web and social media, Vol. 7. 419–428.
  66. Sean A Munson and Paul Resnick. 2010. Presenting diverse political opinions: how and how much. In Proceedings of the SIGCHI conference on human factors in computing systems. 1457–1466.
  67. StereoSet: Measuring stereotypical bias in pretrained language models. In Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers). Association for Computational Linguistics, Online, 5356–5371. https://doi.org/10.18653/v1/2021.acl-long.416
  68. Arvind Narayanan. 2023. Understanding Social Media Recommendation Algorithms. https://knightcolumbia.org/content/understanding-social-media-recommendation-algorithms.
  69. Social media is polarized, social media is polarized: towards a new design agenda for mitigating polarization. In Proceedings of the 2018 designing interactive systems conference. 957–970.
  70. (Re) Design to Mitigate Political Polarization: Reflecting Habermas’ ideal communication space in the United States of America and Finland. Proceedings of the ACM on Human-computer Interaction 3, CSCW (2019), 1–25.
  71. Like-minded sources on Facebook are prevalent but not polarizing. Nature 620, 7972 (2023), 137–144.
  72. OpenAI. 2023. GPT-4 Technical Report. arXiv:2303.08774 [cs.CL]
  73. Training Language Models to Follow Instructions with Human Feedback. Advances in Neural Information Processing Systems 35 (2022), 27730–27744.
  74. Training language models to follow instructions with human feedback. arXiv:2203.02155 [cs.CL]
  75. Aviv Ovadya and Luke Thorburn. 2023. Bridging Systems: Open Problems for Countering Destructive Divisiveness across Ranking, Recommenders, and Governance. arXiv preprint arXiv:2301.09976 (2023).
  76. Predicting the importance of newsfeed posts and social network friends. In Proceedings of the AAAI Conference on Artificial Intelligence, Vol. 24. 1419–1424.
  77. Nitish Pahwa. 2021. Facebook Asked Users What Content Was “Good” or “Bad for the World.” Some of the Results Were Shocking. https://slate.com/technology/2021/11/facebook-good-bad-for-the-world-gftw-bftw.html
  78. Automated Annotation with Generative AI Requires Validation. arXiv:2306.00176 [cs.CL]
  79. Fábio Perez and Ian Ribeiro. 2022. Ignore Previous Prompt: Attack Techniques For Language Models. https://doi.org/10.48550/ARXIV.2211.09527
  80. Jay Peters. 2022. Twitter makes it harder to choose the old reverse-chronological feed. https://www.theverge.com/2022/3/10/22971307/twitter-home-timeline-algorithmic-reverse-chronological-feed
  81. Pew Research Center. 2019. Partisan Antipathy: More Intense, More Personal. Technical Report. Washington, D.C. https://www.pewresearch.org/politics/2019/10/10/the-partisan-landscape-and-views-of-the-parties/
  82. Google Transparency Report. 2023. YouTube Community Guidelines enforcement. https://transparencyreport.google.com/youtube-policy/removals
  83. Antecedents of support for social media content moderation and platform regulation: the role of presumed effects on self and others. Information, Communication & Society 25, 11 (2022), 1632–1649.
  84. Users choose to engage with more partisan news than they are exposed to on Google Search. Nature (2023), 1–7.
  85. Digital inequalities and why they matter. Information, Communication & Society 18, 5 (2015), 569–582. https://doi.org/10.1080/1369118X.2015.1012532
  86. Fred Rowland. 2011. The filter bubble: what the internet is hiding from you. portal: Libraries and the Academy 11, 4 (2011), 1009–1011.
  87. Whose Opinions Do Language Models Reflect? arXiv:2303.17548 [cs.CL]
  88. Perspective-taking to reduce affective polarization on social media. In Proceedings of the International AAAI Conference on Web and Social Media, Vol. 16. 885–895.
  89. Michael Scharkow and Marko Bachl. 2017. How measurement error in content analysis and self-reported media use leads to minimal media effect findings in linkage analyses: A simulation study. Political Communication 34, 3 (2017), 323–343.
  90. Nick Seaver. 2017. Algorithms as culture: Some tactics for the ethnography of algorithmic systems. Big data & society 4, 2 (2017), 2053951717738104.
  91. Designing political deliberation environments to support interactions in the public sphere. In Proceedings of the 33rd Annual ACM Conference on Human Factors in Computing Systems. 3167–3176.
  92. The Woman Worked as a Babysitter: On Biases in Language Generation. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP). Association for Computational Linguistics, Hong Kong, China, 3407–3412. https://doi.org/10.18653/v1/D19-1339
  93. Charles Percy Snow. 1959. Two cultures. Science 130, 3373 (1959), 419–419.
  94. Building Human Values into Recommender Systems: An Interdisciplinary Synthesis. arXiv preprint arXiv:2207.10192 (2022).
  95. Cass R Sunstein. 2001. Republic.com.
  96. Cass R Sunstein. 2015. Partyism. U. Chi. Legal F. (2015), 1.
  97. Distilling Task-Specific Knowledge from BERT into Simple Neural Networks. arXiv:1903.12136 [cs.CL]
  98. The YouTube Team. 2019. The Four Rs of Responsibility, Part 1: Removing harmful content. https://blog.youtube/inside-youtube/the-four-rs-of-responsibility-remove/
  99. Petter Törnberg. 2022. How digital media drive affective polarization through partisan sorting. Proceedings of the National Academy of Sciences 119, 42 (2022), e2207159119.
  100. Manipulating Twitter Through Deletions. In Proceedings of the International AAAI Conference on Web and Social Media, Vol. 16. 1029–1039.
  101. Twitter Transparency. 2021. Rules Enforcement. https://transparency.twitter.com/en/reports/rules-enforcement.html
  102. David van Mill. 2021. Freedom of Speech. In The Stanford Encyclopedia of Philosophy (Spring 2021 ed.), Edward N. Zalta (Ed.). Metaphysics Research Lab, Stanford University.
  103. Social media use and risky behaviors in adolescents: A meta-analysis. Journal of Adolescence 79 (2020), 258–274.
  104. Megastudy identifying effective interventions to strengthen Americans’ democratic attitudes. (2023).
  105. Want To Reduce Labeling Cost? GPT-3 Can Help. In Findings of the Association for Computational Linguistics: EMNLP 2021. Association for Computational Linguistics, Punta Cana, Dominican Republic, 4195–4205. https://doi.org/10.18653/v1/2021.findings-emnlp.354
  106. Magdalena Wojcieszak and Benjamin R Warner. 2020. Can interparty contact reduce affective polarization? A systematic test of different forms of intergroup contact. Political Communication 37, 6 (2020), 789–811.
  107. Returning is believing: Optimizing long-term user engagement in recommender systems. In Proceedings of the 2017 ACM on Conference on Information and Knowledge Management. 1927–1936.
  108. Supporting Qualitative Analysis with Large Language Models: Combining Codebook with GPT-3 for Deductive Coding. In Companion Proceedings of the 28th International Conference on Intelligent User Interfaces (Sydney, NSW, Australia) (IUI ’23 Companion). Association for Computing Machinery, New York, NY, USA, 75–78. https://doi.org/10.1145/3581754.3584136
  109. Kai-Cheng Yang and Filippo Menczer. 2023. Large language models can rate news outlet credibility. arXiv:2304.00228 [cs.CL]
  110. Effects of credibility indicators on social media news sharing intent. In Proceedings of the 2020 chi conference on human factors in computing systems. 1–14.
  111. Value-Sensitive Algorithm Design: Method, Case Study, and Lessons. Proc. ACM Hum.-Comput. Interact. 2, CSCW, Article 194 (nov 2018), 23 pages. https://doi.org/10.1145/3274463
  112. Can Large Language Models Transform Computational Social Science? arXiv:2305.03514 [cs.CL]

Summary

  • The paper presents a societal objective function that integrates democratic values into social media algorithms to reduce partisan animosity.
  • It leverages manual ratings and GPT-4 to operationalize social scientific constructs at scale with a high correlation (ρ = .75).
  • Experimental studies demonstrate that democratic attitude feeds lower polarization without negatively impacting user engagement.

Embedding Democratic Values into Social Media AIs via Societal Objective Functions

Introduction

The paper "Embedding Democratic Values into Social Media AIs via Societal Objective Functions" (2307.13912) proposes a novel approach to integrate democratic values into social media algorithms. The authors address the problem of social media platforms exacerbating partisan animosity due to their engagement-based ranking systems. To combat this, they develop a societal objective function that incorporates social scientific constructs, specifically focusing on reducing anti-democratic attitudes and partisan animosity.

Methodology

The methodology involves three main steps: identifying well-established social scientific constructs related to democratic values, operationalizing these constructs through manual ratings, and scaling up the ratings using LLMs such as GPT-4 (Figure 1).

Figure 1: Steps of our societal objective function method: (1) Identify a well-established social scientific construct, (2) Operationalize the construct with manual rating methods, (3) Scale up the ratings with algorithmic methods using an LLM.
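
To make step (3) concrete, the following is a minimal sketch of how a codebook-derived prompt could be sent to an LLM to score a single post, assuming the `openai` Python client and an OpenAI-style chat completions endpoint; the prompt wording, subscale keys, and JSON output handling are illustrative stand-ins rather than the authors' published prompt.

```python
# Minimal sketch: scoring one post on the eight anti-democratic attitude
# subscales with an LLM. Assumes the `openai` Python client (reads
# OPENAI_API_KEY from the environment); prompt text and parsing are
# illustrative, not the paper's exact codebook prompt.
import json
from openai import OpenAI

client = OpenAI()

SUBSCALES = [
    "partisan_animosity",
    "support_for_undemocratic_practices",
    "support_for_partisan_violence",
    "support_for_undemocratic_candidates",
    "opposition_to_bipartisanship",
    "social_distrust",
    "social_distance",
    "biased_evaluation_of_politicized_facts",
]

def score_post(post_text: str) -> dict:
    """Ask the model for a 1-3 rating per subscale (3 = strongly promotes)."""
    prompt = (
        "Rate how much the following social media post promotes each attitude "
        "on a 1-3 scale (1 = not at all, 3 = strongly promotes). Attitudes: "
        + ", ".join(SUBSCALES)
        + ". Reply with only a JSON object mapping each attitude to an integer.\n\n"
        + "Post: " + post_text
    )
    response = client.chat.completions.create(
        model="gpt-4",
        messages=[{"role": "user", "content": prompt}],
        temperature=0,  # deterministic scoring
    )
    # Assumes the model returns bare JSON, as the prompt requests.
    return json.loads(response.choices[0].message.content)

scores = score_post("Example post text goes here.")
print(scores, sum(scores.values()))  # per-subscale ratings and the total score
```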

The authors conduct three studies to test this approach:

  1. Study 1: Social media posts were manually annotated with anti-democratic attitude scores, and several feed ranking conditions based on these scores were tested with US partisans (N=1,380). Feeds that removed or downranked high-scoring posts significantly reduced partisan animosity compared to a traditional engagement-based feed.
  2. Study 2: This study scaled up the manual annotation by using GPT-4 to automate the democratic attitude ratings. The automated ratings showed a high correlation (ρ = .75) with the manual labels, making it feasible to apply the approach at scale (see the sketch after this list).
  3. Study 3: The final study replicated the intervention using the automated model from GPT-4, confirming similar reductions in partisan animosity as seen in Study 1.
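
As a rough illustration of the agreement check in Study 2, here is a minimal sketch comparing manual totals against model-estimated totals with Spearman's rank correlation; the score lists are placeholder data, not the study's labels.

```python
# Minimal sketch: agreement between manual labels and model scores via
# Spearman's rho (the paper reports rho = .75). Placeholder data only.
from scipy.stats import spearmanr

manual_totals = [8, 12, 9, 17, 10, 21, 8, 14]   # human-coded total per post
model_totals  = [8, 13, 9, 16, 11, 20, 9, 15]   # LLM-estimated total per post

rho, p_value = spearmanr(manual_totals, model_totals)
print(f"Spearman's rho = {rho:.2f} (p = {p_value:.3f})")
```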

Results

The results of these studies suggest that democratic attitude feeds, particularly those using downranking and remove-and-replace strategies, were effective in reducing partisan animosity without negatively impacting user engagement (Figure 2).

Figure 2: Summary of our seven feed ranking conditions indicating the influence of democratic attitude feeds versus traditional methods.

Figure 3: Website Interface of Democratic Attitude Feeds showing interfaces for different feed conditions.

The findings illustrate a promising method to directly integrate societal values into algorithmic objectives, providing a clear pathway for embedding complex social constructs into AI systems.

Discussion

The implications of this research are vast, spanning both practical and theoretical domains. Practically, the work suggests a feasible approach for social media platforms to mitigate societal harms associated with polarization by embedding pro-democratic values directly into their algorithmic structures. Theoretically, it extends the existing literature on algorithmic design by blending social scientific constructs with algorithmic objectives through LLMs, offering a template for similar applications across various societal values.

Importantly, the study highlights the potential for LLMs to replicate complex human judgments at scale, facilitating broader implementation of socially conscious AI systems. However, it also underscores the need for ongoing evaluation and refinement of these models to ensure their alignment with established social values and ethical standards (Figure 4).

Figure 4: Means of Partisan Animosity Across Conditions (Divided by Parties), demonstrating the effectiveness of democratic attitude feeds.

Conclusion

This research bridges the gap between social science and AI by demonstrating a viable method to encode democratic values into social media AIs, offering a tool to reduce partisan animosity while maintaining user engagement. The societal objective functions developed in this work set a precedent for integrating diverse social science constructs into technological systems, promoting algorithms that are not only engaging but also socially responsible. Future research may explore the application of this approach to other areas, such as mental health and cultural diversity, using their respective societal constructs.

Explain it Like I'm 14

Overview

This paper asks a big question: can we design social media to support healthy democracy? The authors show a new way to teach the AI systems that sort our feeds to care about democratic values—especially reducing “partisan animosity,” which means disliking and attacking people from the other political party. They turn ideas from social science into clear goals the algorithm can follow, then test whether feeds built around those goals make people less hostile without making the app less enjoyable.

Key Objectives and Questions

The paper focuses on two simple goals:

  • Can a social media algorithm lower partisan animosity and anti‑democratic attitudes?
  • Can it do this without hurting how much people like or use the platform?

It also asks:

  • Can we convert trusted social science measures into instructions an AI understands?
  • Will an AI that follows these values work as well as careful human reviewers?

How the Research Was Done

Think of a social media feed as a playlist that an algorithm constantly reorders. Today, most feeds are sorted to maximize clicks and likes (engagement). That can push drama and conflict to the top, which can increase political anger.

The authors designed a different “goal” for the feed: reduce anti-democratic attitudes. To do that, they used a proven checklist from political science, then taught both human reviewers and an AI to score posts using that checklist.

Step 1: Manually score posts (human judges)

  • The team collected real political posts from Facebook using a public tool called CrowdTangle.
  • Two trained reviewers read each post and rated how much it encouraged eight anti‑democratic attitudes, using a simple 1–3 scale (3 = strongly promotes it).
  • The reviewers agreed with each other a lot (a reliability score called Krippendorff’s alpha was 0.895), which means the checklist was clear and consistent.
  • They then re‑ranked feeds so posts that seemed to increase anti‑democratic attitudes appeared lower or were removed/replaced.
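
Agreement statistics like the alpha reported above can be computed with the third-party `krippendorff` Python package (an assumed tool here, not one the paper names); the ratings below are placeholder data arranged as one row per coder.

```python
# Minimal sketch: Krippendorff's alpha for two coders rating the same posts on
# an ordinal 1-3 scale. Assumes `pip install krippendorff`; ratings are made up.
import krippendorff

ratings = [
    [1, 3, 2, 1, 2, 3, 1, 1],  # coder A, one rating per post
    [1, 3, 2, 1, 1, 3, 1, 2],  # coder B, same posts
]

alpha = krippendorff.alpha(reliability_data=ratings,
                           level_of_measurement="ordinal")
print(f"Krippendorff's alpha = {alpha:.3f}")
```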

The eight attitudes they measured were:

  • Partisan animosity (hostility toward the other party)
  • Support for undemocratic practices (like rejecting election results)
  • Support for partisan violence
  • Support for undemocratic candidates
  • Opposition to bipartisan cooperation
  • Social distrust (mistrust in people generally)
  • Social distance (avoiding people from the other party)
  • Biased evaluation of politicized facts (rejecting facts that help the other side)
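
Given the eight attitudes above and the 1–3 scale from Step 1, a post's overall score can be formed by simple summation (range 8–24); the example ratings and flagging threshold below are hypothetical.

```python
# Minimal sketch: summing the eight 1-3 subscale ratings into one total score
# per post (range 8 to 24). Ratings and threshold are hypothetical.
example_ratings = {
    "partisan_animosity": 3,
    "support_for_undemocratic_practices": 1,
    "support_for_partisan_violence": 1,
    "support_for_undemocratic_candidates": 1,
    "opposition_to_bipartisanship": 2,
    "social_distrust": 2,
    "social_distance": 1,
    "biased_evaluation_of_politicized_facts": 2,
}

total_score = sum(example_ratings.values())       # 13 for this example
FLAG_THRESHOLD = 12                               # hypothetical cutoff
print(total_score, total_score > FLAG_THRESHOLD)  # 13 True
```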

Step 2: Teach an AI to score posts (AI “judge” using a checklist)

  • Next, they turned the same checklist into instructions for an LLM (GPT‑4).
  • The AI rated each post on the eight attitudes (zero‑shot classification, meaning the AI used instructions without extra training data).
  • The AI’s scores matched human scores closely (Spearman’s rho = 0.75), which is strong agreement.

Step 3: Test the redesigned feeds with people

  • They ran online experiments with U.S. partisans (Democrats and Republicans).
  • Study 1: used human scores to build feeds; 1,380 people.
  • Study 3: used the AI’s scores to build feeds; 558 people.
  • They compared several feed designs:
    • Downranking: move posts that boost anti‑democratic attitudes lower in the feed.
    • Content warning: blur those posts with a warning.
    • Remove-and-replace: delete those posts and swap in posts that promote healthier attitudes.
    • Engagement-based: sort by likes/shares (a typical current feed).
    • Ideologically balanced: mix posts from both parties.
    • Chronological: show posts in time order.
    • Null control: show no feed (baseline).
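
A minimal sketch of the downranking design listed above: keep the engagement signal but subtract a penalty proportional to the anti-democratic attitude score, so high-scoring posts sink in the feed; the penalty weight and example posts are hypothetical, not the study's implementation.

```python
# Minimal sketch of a downranking feed: engagement still counts, but posts with
# high anti-democratic attitude scores are pushed down. Values are hypothetical.
from dataclasses import dataclass

@dataclass
class Post:
    text: str
    engagement: float       # likes/shares/comments signal
    attitude_score: float   # total anti-democratic attitude score (8-24)

def downrank(feed: list[Post], penalty: float = 2.0) -> list[Post]:
    """Sort so high engagement rises while high attitude scores sink."""
    return sorted(feed,
                  key=lambda p: p.engagement - penalty * p.attitude_score,
                  reverse=True)

feed = [
    Post("inflammatory take", engagement=70.0, attitude_score=22.0),
    Post("policy explainer", engagement=60.0, attitude_score=9.0),
    Post("local news update", engagement=40.0, attitude_score=8.0),
]
for post in downrank(feed):
    print(post.text)  # policy explainer, inflammatory take, local news update
```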

Main Findings and Why They Matter

  • Downranking and remove‑and‑replace feeds lowered partisan animosity compared to a standard engagement‑based feed.
    • Study 1 effect sizes: d = 0.20 (removal) and d = 0.25 (downranking).
    • Study 3 (with the AI model): downranking again reduced animosity (d = 0.25).
  • Importantly, people didn’t enjoy the feed less and didn’t engage less. This suggests you can make the feed kinder without making it boring.
  • They checked free‑speech worries: feeds that re‑ranked or removed posts did not make people feel their speech was being threatened.
  • The AI model could scale manual ratings while staying close to human judgments (rho = 0.75), which means the approach can work at the scale real platforms require.
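
The d values above are standardized mean differences (effect sizes); a common form is Cohen's d with a pooled standard deviation, sketched below with placeholder animosity scores rather than study data.

```python
# Minimal sketch: Cohen's d between two feed conditions using a pooled standard
# deviation. The animosity scores are placeholder data, not the study's data.
import statistics

engagement_feed = [55, 60, 62, 58, 65, 59, 61, 63]   # animosity, control feed
downranked_feed = [54, 59, 60, 57, 63, 58, 60, 62]   # animosity, treated feed

def cohens_d(a, b):
    var_a, var_b = statistics.variance(a), statistics.variance(b)
    pooled_sd = (((len(a) - 1) * var_a + (len(b) - 1) * var_b)
                 / (len(a) + len(b) - 2)) ** 0.5
    return (statistics.mean(a) - statistics.mean(b)) / pooled_sd

print(f"d = {cohens_d(engagement_feed, downranked_feed):.2f}")
```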

Why this matters: Social media algorithms usually chase engagement, which can reward angry, divisive content. This study shows that if we give the algorithm a new goal—protect democratic values—it can reduce hostility while keeping users satisfied. That’s a big deal for building healthier online spaces.

Implications and Potential Impact

This paper introduces “societal objective functions”—clear, testable goals for algorithms based on trusted social science. It shows that you can:

  • Translate complex values (like pro‑democracy attitudes) into a scorecard an AI can follow.
  • Re‑rank feeds using that scorecard to reduce harmful outcomes (like partisan animosity).
  • Scale the approach with AI so it could work in real products.

Beyond democracy, the same method could be used for other values people care about—such as well‑being, cultural diversity, or sustainability—by choosing the right social science measures and building them into the algorithm’s goals. In short, this is a practical path to make social media AIs not just engaging, but also better for society.

Knowledge Gaps

Knowledge gaps, limitations, and open questions

The paper presents a promising method to embed democratic values in social media AIs via societal objective functions, but it leaves several issues unresolved. Future research could address the following specific gaps:

  • External validity of interventions: Effects are tested in a simulated, isolated feed (PolitiFeed), not in live platform environments with real-time feedback loops, social network effects, and algorithmic interference (e.g., recommender learning, resharing cascades).
  • Short-term exposure only: Outcomes are measured immediately after a one-time exposure to 60 posts; the durability, decay, or cumulative effects of these interventions over days/weeks/months remain unknown.
  • Limited participant population: Studies focus on U.S. partisans (Democrats and Republicans). Effects for independents, non-partisans, other political identities, and in other countries and cultural contexts are untested.
  • Narrow content source and timeframe: Stimuli are public political Facebook Page posts collected from CrowdTangle over one month (Jan–Feb 2023). Generalization to personal posts, comments, multimodal content (images, video, audio), other platforms (X/Twitter, TikTok, YouTube, Instagram), and broader time windows is unexplored.
  • Construct-to-post validity: The eight anti-democratic attitude variables originate from survey measures of people; their translation to post-level “promotion of anti-democratic attitudes” is not validated against real-world outcomes (e.g., changes in behavior, cross-party interactions), raising questions about criterion and predictive validity.
  • Aggregation and weighting choices: The approach sums eight subscales with equal weights into a single score. No sensitivity analysis explores alternative weightings, multi-objective optimization (e.g., Pareto front), or context-dependent weighting (e.g., election periods, crisis events).
  • Error cost analysis: The paper does not quantify harms from false positives/negatives (e.g., mistakenly downranking legitimate reporting, satire, or minority political expression) or propose safeguards to mitigate misclassification damage.
  • Fairness and ideological balance: Potential asymmetric impacts on different ideologies, minority groups, or issue domains are not audited. It remains unclear whether the model disproportionately penalizes one party or specific rhetorical styles.
  • Cultural and linguistic generalization: Prompts and labels are in English; applicability to other languages and culturally specific political discourse is untested.
  • LLM dependence and robustness: The democratic attitude model relies on GPT-4 zero-shot prompts. Stability across model versions, vendor changes, prompt drift, reproducibility, and latency/cost constraints for production-scale deployment are not assessed.
  • Benchmarking against alternative modeling approaches: No comparison to supervised classifiers, rule-based systems, or hybrid human-in-the-loop pipelines; the relative accuracy, interpretability, and operational costs of alternatives are unknown.
  • Adversarial resilience: The model’s susceptibility to gaming (evasion through phrasing, coded language, irony, deliberate ambiguity) and its ability to detect coordinated manipulation campaigns are not evaluated.
  • Side effects on information diversity and civic engagement: Downranking/removal may reduce exposure to important but uncomfortable news, potentially affecting civic knowledge, issue salience, or engagement; these trade-offs are not measured.
  • Perceptions of censorship and legitimacy: While the study reports no increased perceived free-speech threats in the lab, real-world perception dynamics (media coverage, political elites’ framing, platform trust) could differ significantly and are untested.
  • Engagement and business metrics: Lab-based satisfaction and engagement proxies may not reflect long-term retention, creator incentives, revenue impacts, or ecosystem health on production platforms.
  • Governance and value pluralism: Who defines “democratic values,” how they evolve over time, and how conflicting societal values are arbitrated (e.g., free expression vs. harm reduction) are unresolved; participatory, multi-stakeholder governance mechanisms are not specified.
  • Operational integration: The paper does not address how to integrate societal objective functions with existing multi-objective ranking pipelines (e.g., engagement, watch time, quality), including constraint handling, online learning, and A/B testing infrastructure.
  • Legal and policy implications: The legal risks of remove-and-replace or downranking (e.g., transparency obligations, viewpoint discrimination concerns, jurisdictional differences) are not analyzed.
  • Dataset representativeness and size: The labeled inventory is 405 posts, sampled across engagement buckets. Coverage of niche communities, local politics, and low-engagement content is limited; effects may vary with different content distributions.
  • Annotation process limitations: Ratings are produced by two expert annotators with high agreement; generalizability to larger, more diverse annotator pools and the reproducibility of the codebook across teams are uncertain.
  • Heterogeneous treatment effects: Beyond “holds for conservatives and liberals,” finer-grained heterogeneity (e.g., age, media literacy, political knowledge, extremity, trust) and context-specific moderators are not explored.
  • Mechanisms of change: The psychological pathways by which re-ranking reduces animosity (e.g., reduced exposure to incitement, increased exposure to bridging content, altered affect) are not measured, limiting theory-building and intervention refinement.
  • Coverage of all eight subscales: The abstract emphasizes reductions in partisan animosity; impacts on other subscales (support for undemocratic practices, violence, etc.) are not consistently reported or decomposed, leaving unclear which sub-dimensions are most influenced.
  • Content warning UX design: The specific warning design, its interpretability, and potential backfire effects (e.g., curiosity, reactance) are not analyzed; alternative UX variants could produce different outcomes.
  • Supply constraints for “pro-democratic” content: Remove-and-replace feasibility depends on sufficient inventory; dynamic supply, topical relevance, and content freshness are not addressed.
  • Privacy and data rights: The implications of using LLMs to process user-generated content (data governance, consent, retention, cross-border data flows) are not discussed.
  • Multi-value extension roadmap: While the paper proposes broader societal objectives (e.g., wellbeing, diversity), concrete methodologies for value elicitation, construct selection, conflict resolution, and cross-value optimization remain open.
  • Synergy and interactions with existing interventions: Combined effects with ideologically balanced feeds, bridging-based ranking, or misinformation defenses are not tested; complementary or antagonistic interactions remain unknown.
  • Calibration and thresholding: The paper uses a 3-point scale per subscale summed to a total score; optimal thresholds for downranking/removal, confidence calibration, and uncertainty-aware decisions are not identified or evaluated.

Glossary

  • Affective polarization: Emotional hostility toward members of the opposing political group, distinct from disagreement on issues. "we focus on affective polarization instead of issue polarization"
  • Anti-democratic attitudes: A composite social science construct spanning eight variables that assess willingness to engage in good faith in the democratic process. "This measure, which was recently tested in a large study that received widespread attention~\cite{voelkel2023megastudy}, spans eight variables that describe willingness to engage in good faith in the democratic process: partisan animosity, support for undemocratic practices, support for partisan violence, support for undemocratic candidates, opposition to bipartisanship, social distrust, social distance, and biased evaluation of politicized facts."
  • Biased evaluation of politicized facts: Tendency to judge facts through a partisan lens that favors one’s own side. "biased evaluation of politicized facts"
  • Bridging-based ranking: A feed-ranking approach intended to surface content that builds trust across divides. "bridging-based ranking to build trust across divides"
  • Chronological feed: A feed ordered by time rather than algorithmic engagement. "a traditional engagement-based feed or a chronological feed"
  • Constitutional AI: A technique to steer AI models using a set of explicit principles. "such as Constitutional AI~\cite{bai2022constitutional} and reinforcement learning from human feedback (RLHF)~\cite{ouyang2022training} present technical methods for holistically steering AI models"
  • Content moderation: Policies and models for detecting and removing content that violates platform rules. "Content moderation models are a common strategy to ensure that content does not violate platform policies or community guidelines~\cite{gillespie2018custodians}"
  • Content Warning feed: A feed design that masks certain posts with warnings while preserving ranking. "Content Warning feed which mirrors designs commonly used by real-world social media platforms to mask harmful content"
  • CrowdTangle: A Meta-hosted tool for monitoring public Facebook posts and their engagement. "CrowdTangle, a tool hosted by Meta that allows external parties to monitor public posts on Facebook"
  • Democratic attitude feed: A re-ranked social media feed that optimizes for pro-democratic outcomes using a model’s scores. "The democratic attitude feeds incorporate our anti-democratic attitude model with either: (1) Downranking, (2) Content Warning, or (3) Remove-and-Replace feeds."
  • Democratic attitude model: An AI model that scores posts on their propensity to promote anti-democratic attitudes. "create a democratic attitude model that estimates the extent to which a social media post promotes anti-democratic attitudes"
  • Downranking: A ranking intervention that lowers the position of content deemed harmful or undesirable. "Removal (d = .20) and downranking feeds (d = .25) reduced participants' partisan animosity"
  • Echo chambers: Information environments that isolate users within ideologically homogeneous content. "echo chambers (or selective exposure)"
  • Engagement-Based feed: A feed ranked by interactions such as likes, shares, and comments. "Engagement-Based feed"
  • Feeling thermometer: A survey measure of warmth or favorability toward a group. "measure this with ratings of warmth on a feeling thermometer"
  • G*Power: Statistical software used to conduct power analyses for experimental design. "a priori power analysis using G*Power determined that a total sample of at least 1,369 participants would be needed"
  • Ideologically Balanced feed: A feed designed to present content from multiple political perspectives. "Ideologically Balanced feed"
  • Inter-coder reliability: A statistic assessing agreement between multiple human annotators. "The two independent coders achieved strong inter-coder reliability (Krippendorff's α = .895)"
  • Krippendorff's alpha: A reliability coefficient for measuring agreement among annotators. "Krippendorff's α = .895"
  • LLM: A class of AI systems trained on vast text corpora to perform language tasks. "LLMs such as GPT-4"
  • Megastudy: A large multi-intervention study conducted at scale to compare effects. "The Voelkel et al. megastudy found that almost all of their interventions successfully reduced partisan animosity"
  • Null control: An experimental condition where participants receive no treatment or stimulus. "a Null Control where no feed is shown"
  • Open Science Framework (OSF): A platform for preregistering studies and sharing research materials. "We pre-registered our research questions and hypotheses on Open Science Framework (OSF)"
  • Operationalize: To translate a theoretical construct into measurable procedures or variables. "Operationalize the construct with manual rating methods"
  • Partisan animosity: Negative thoughts, feelings, and behaviors toward the opposing political group. "A key outcome of interest in a healthy democracy is partisan animosity: negative thoughts, feelings and behaviours towards a political out-group."
  • Partisan sorting: A mechanism of polarization driven by exposure beyond local networks, rather than isolation. "partisan sorting, whereby polarization is not driven by isolation"
  • Partyism: Hostility and aversion toward a specific political party. "others introduce new terms or measurements such as 'partyism' to describe the hostility and aversion to a certain political party"
  • Reinforcement Learning from Human Feedback (RLHF): A method to align AI behavior using human-provided preferences. "reinforcement learning from human feedback (RLHF)~\cite{ouyang2022training}"
  • Remove-and-Replace feed: A feed where harmful posts are removed and replaced with pro-democratic content. "Remove-and-Replace feed (i.e., ranked by engagement, but anti-democratic posts are replaced with pro-democratic posts sourced from our dataset inventory (n = 405))"
  • Selective exposure: The tendency to consume information that aligns with existing beliefs. "echo chambers (or selective exposure)"
  • Societal objective function: A method to encode social science constructs as AI optimization targets. "We introduce the term societal objective function to refer to our method of translating well-established social science constructs into an AI objective function."
  • Spearman's rho: A nonparametric correlation coefficient based on rank-order. "Spearman's ρ = .75"
  • Systematic random sampling: A sampling technique selecting items at regular intervals from an ordered list. "We then use the systematic random sampling method to select a final inventory"
  • Zero-shot prompting: Instructing a model to perform a task without task-specific training examples. "zero-shot prompting with a LLM"
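
The systematic random sampling entry above is easy to illustrate; a minimal sketch below assumes a hypothetical ordered pool of candidate posts and targets the 405-post inventory size mentioned elsewhere in the paper.

```python
# Minimal sketch: systematic random sampling. Pick a random start, then take
# every k-th post from an ordered candidate pool. Pool size is hypothetical;
# 405 matches the inventory size reported in the paper.
import random

candidate_pool = [f"post_{i}" for i in range(2000)]  # ordered candidates
sample_size = 405
k = len(candidate_pool) // sample_size               # sampling interval

start = random.randrange(k)
sample = candidate_pool[start::k][:sample_size]
print(len(sample))  # 405
```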

Practical Applications

Immediate Applications

Below is a concise set of practical, deployable use cases that can be implemented with today’s tooling (LLMs, ranking systems, content moderation pipelines), drawing directly from the paper’s method (societal objective functions), model (democratic attitude scores), and empirical findings (downranking/removal reduced partisan animosity without harming engagement).

  • Social media feed re-ranking to mitigate partisan animosity
    • Sector: Software (social media/recommenders), Civic tech
    • What: Integrate the democratic attitude model as a scoring feature in ranking pipelines to downrank content that promotes anti-democratic attitudes; optionally use removal-and-replacement or content warnings for high-risk items.
    • Tools/Workflows: LLM-based scoring service; codebook-derived prompts; feature ingestion into multi-objective rankers; A/B testing with attitudinal outcome metrics; UI patterns for warnings and replacements.
    • Assumptions/Dependencies: Access to posts and ranking stack; LLM inference reliability and latency; calibration to local contexts; governance to address free-speech concerns; willingness to trade minor engagement shifts for societal benefits.
  • Brand safety and ad suitability filters
    • Sector: Advertising/Ad tech
    • What: Use the democratic attitude model to prevent ad placement adjacent to content that elevates anti-democratic attitudes, protecting brands and reducing systemic risks.
    • Tools/Workflows: Pre-bid/post-bid suitability checks; supply quality scoring; integration with existing brand safety taxonomies.
    • Assumptions/Dependencies: Platform inventory access; standardized thresholds; clear advertiser policies; potential partisan bias audits.
  • Content moderation triage and review prioritization
    • Sector: Trust & Safety
    • What: Complement policy-based moderation by triaging borderline content for human review using anti-democratic attitude scores; accelerate escalation for posts that score high on support for undemocratic practices or partisan violence.
    • Tools/Workflows: Queue prioritization; reviewer dashboards; policy decision logs; feedback loops to refine prompts.
    • Assumptions/Dependencies: Reviewer capacity; clarity on non-policy but socially risky content; safeguards against over-removal/censorship perceptions.
  • “Democracy-friendly mode” user toggle
    • Sector: Consumer product (social platforms), UX
    • What: Offer an opt-in setting that prioritizes pro-democratic content and downranks content associated with animosity, violence, or undemocratic norms—shown in the paper to reduce partisan animosity without hurting experience/engagement.
    • Tools/Workflows: User settings; on-device or server-side re-ranking; transparent explanations; preference logging.
    • Assumptions/Dependencies: User acceptance; explainability; impact monitoring; localized codebooks.
  • Creator-side “post risk scorecard” and rewrite assistant
    • Sector: Creator tools, Publishing, Newsrooms
    • What: Provide authors with pre-publication scores across the eight sub-dimensions (e.g., partisan animosity, support for violence) plus suggested edits to reduce inflammatory framing.
    • Tools/Workflows: LLM scoring + generative rewrite suggestions; CMS plugins; editorial QA checklists.
    • Assumptions/Dependencies: Editorial buy-in; potential tension with persuasive goals; transparency to avoid chilling effects.
  • News and aggregator recommendation safeguards
    • Sector: Media/News, Search and video platforms
    • What: Incorporate anti-democratic attitude scoring into news/homepage/video recommendations to avoid over-amplifying content that inflames out-group hostility while maintaining diversity of viewpoints.
    • Tools/Workflows: Multi-objective ranking (engagement + societal measures); slate composition constraints; periodic audits.
    • Assumptions/Dependencies: Balancing viewpoint diversity with toxicity reduction; measurement across formats (text, video thumbnails, headlines).
  • Academic and product experimentation using societal objective functions
    • Sector: Academia, R&D, Product research
    • What: Use the method to turn validated constructs into measurable objectives (e.g., wellbeing, trust, misinformation skepticism) and run controlled experiments on their feed-level impacts.
    • Tools/Workflows: Codebook-to-prompt pipelines; preregistration; multi-arm experiments; effect-size reporting (e.g., d≈.20–.25).
    • Assumptions/Dependencies: Construct validity; reproducibility across contexts; IRB/ethics oversight.
  • Regulatory risk assessments and dashboards
    • Sector: Policy/Compliance (e.g., EU DSA), Platform governance
    • What: Track platform-level systemic risk (partisan animosity, support for undemocratic practices) with periodic scoring and publish aggregated metrics for regulators/auditors.
    • Tools/Workflows: Risk dashboards; sampling methodologies; third-party audit APIs.
    • Assumptions/Dependencies: Legal frameworks; standardized reporting; auditor independence; potential politicization of metrics.
  • Browser extensions for personal feed hygiene
    • Sector: Daily life, Civic tech
    • What: Client-side re-ranking or overlays that blur/annotate posts flagged for anti-democratic attitudes, with user-adjustable sensitivity.
    • Tools/Workflows: Extension-based DOM analysis; LLM API calls; local caching; simple explainers.
    • Assumptions/Dependencies: API cost; latency; site terms of use; user privacy.
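
Several of the immediate use cases above boil down to consuming the model's score downstream of an LLM scoring service; as one concrete example, here is a minimal sketch of the moderation-triage idea, with hypothetical post IDs, scores, and escalation threshold.

```python
# Minimal sketch of score-driven review triage: order a human-review queue so
# posts with the highest anti-democratic attitude scores are reviewed first.
# Post IDs, scores, and the escalation threshold are hypothetical.
review_queue = [
    {"post_id": "p1", "attitude_score": 10},
    {"post_id": "p2", "attitude_score": 23},
    {"post_id": "p3", "attitude_score": 16},
]

ESCALATE_ABOVE = 20  # hypothetical fast-track threshold

prioritized = sorted(review_queue, key=lambda p: p["attitude_score"], reverse=True)
for post in prioritized:
    tag = "ESCALATE" if post["attitude_score"] > ESCALATE_ABOVE else "queue"
    print(post["post_id"], post["attitude_score"], tag)
```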

Long-Term Applications

The following applications require further research, scaling, governance, or standardization—extending the paper’s approach to broader values, larger populations, and durable policy/industry structures.

  • Multi-objective recommender systems that operationalize plural societal values
    • Sector: Software (recommenders), AI alignment
    • What: Formalize rankings that jointly optimize engagement, satisfaction, pro-democratic attitudes, wellbeing, diversity, and misinformation resilience; provide transparent trade-off controls and participatory value-setting.
    • Tools/Workflows: Value elicitation processes; Pareto optimization; fairness-aware constraints; stakeholder deliberation interfaces.
    • Assumptions/Dependencies: Consensus on values; governance legitimacy; robust longitudinal evaluation.
  • Standardized societal objective function libraries and benchmarks
    • Sector: Open-source, Standards bodies
    • What: Create canonical codebooks, prompt templates, datasets, and evaluation suites for constructs (democracy, mental health, extremism, climate misinformation), enabling cross-platform comparability.
    • Tools/Workflows: Shared repositories; versioned prompt packs; multilingual datasets; inter-rater and model-human agreement benchmarks.
    • Assumptions/Dependencies: Community coordination; funding; multilingual/cultural adaptation; avoiding lock-in to specific vendors.
  • Field deployments and longitudinal impact studies
    • Sector: Academia, Platforms, Public-interest research
    • What: Measure real-world, long-horizon effects (months/years) of value-aligned feeds on partisan animosity, civic engagement, and trust—beyond short lab exposures.
    • Tools/Workflows: Cohort tracking; quasi-experiments; synthetic controls; mixed methods (surveys + behavioral logs).
    • Assumptions/Dependencies: Data-sharing agreements; privacy-preserving analytics; attrition management; external shocks (elections).
  • Regulatory frameworks that mandate systemic risk mitigation via ranking objectives
    • Sector: Policy/Regulation (e.g., EU DSA, national regulators)
    • What: Codify expectations that large platforms assess, mitigate, and report societal harms; endorse auditability of objective functions; allow third-party oversight.
    • Tools/Workflows: Compliance APIs; independent auditors; policy toolkits for acceptable objectives; transparency reports.
    • Assumptions/Dependencies: Legal clarity; avoiding overreach or politicized enforcement; safe harbor for experimentation.
  • Creator-side “democracy linter” and style transfer models
    • Sector: Publishing, Generative AI
    • What: Train models to automatically suggest language reframing that preserves message content while reducing signals of animosity or support for undemocratic practices.
    • Tools/Workflows: Fine-tuned generative models; constrained optimization for tone; human-in-the-loop editorial review.
    • Assumptions/Dependencies: Avoiding homogenization/censorship; maintaining viewpoint diversity; measuring unintended effects.
  • Cross-cultural and multilingual adaptation of constructs
    • Sector: Global platforms, Localization
    • What: Translate codebooks and prompts to local political contexts (different parties, norms, histories), and validate with local experts and participants.
    • Tools/Workflows: Participatory co-design; local IRBs/ethics boards; regional evaluation cohorts; bias/accuracy analysis per locale.
    • Assumptions/Dependencies: Contextual nuance; varying legal regimes; model performance across languages and scripts.
  • Integration with foundation models and RLHF for value-aware feed AI
    • Sector: AI systems
    • What: Jointly train ranking AIs with societal objective signals (e.g., via reinforcement learning or constitutional objectives) so alignment is native, not bolted on.
    • Tools/Workflows: Policy reward models; offline RL from logged data; simulators for safe training; guardrails for gaming/feedback loops.
    • Assumptions/Dependencies: Robust reward design; avoidance of Goodhart’s law; monitoring for emergent behaviors.
  • Public APIs and third‑party auditor ecosystems
    • Sector: Platform governance, Civic tech
    • What: Expose anonymized, rate-limited scoring endpoints and datasets so civil society and researchers can independently assess platform impacts.
    • Tools/Workflows: Audit sandboxes; reproducible pipelines; certification programs; data trusts.
    • Assumptions/Dependencies: Privacy; legal liability; preventing adversarial exploitation.
  • Application to adjacent harms (health misinformation, extremism, climate denial)
    • Sector: Healthcare, Public safety, Environment
    • What: Extend the societal objective function method to validated constructs (e.g., evidence acceptance, extremist cues), enabling proactive mitigation in ranking.
    • Tools/Workflows: Domain-specific codebooks; expert panels; high-stakes evaluation; cross-sector partnerships.
    • Assumptions/Dependencies: Construct validity across domains; false positive costs; crisis/rapid response scenarios.
  • Organizational governance and ESG reporting on algorithmic impacts
    • Sector: Corporate governance, Finance/ESG
    • What: Board-level oversight and KPIs for societal outcomes; integrate “democratic attitude impact” into ESG disclosures and risk registers.
    • Tools/Workflows: Balanced scorecards; assurance audits; investor communications.
    • Assumptions/Dependencies: Materiality standards; consistent metrics; alignment with fiduciary duties.

Notes on general feasibility: The paper’s experiments found that downranking and removal/replacement reduced partisan animosity (d≈.20–.25) without degrading engagement or perceived experience, and LLM-based scoring correlated strongly with manual labels (ρ≈.75). Real-world deployment should account for generalization beyond U.S. partisans and short-term exposure; the method is adaptable to other values but requires robust validation, stakeholder governance, and transparency to maintain user trust and avoid unintended censorship or bias.

Open Problems

We found no open problems mentioned in this paper.
