Explicit Moral Foundation Dictionary (EMFD)

Updated 20 January 2026

EMFD is a crowdsourced lexicon that annotates over 3,000 English words with probabilistic moral foundation scores and sentiment values.
It employs a bag-of-words approach combined with embedding techniques to statistically measure moral loadings, virtue/vice polarity, and ambivalence in texts.
Empirical validations on social media data demonstrate its practical effectiveness in scaling moral content analysis and enhancing predictive models.

The Explicit Moral Foundation Dictionary (EMFD), often referenced as eMFD, is a crowdsourced lexical resource for the quantitative measurement of moral content in English-language texts. Rooted in Moral Foundations Theory (MFT), the EMFD annotates thousands of English lemmas with probabilistic associations to five or six canonical moral domains, together with sentiment (valence) information. The EMFD serves both as a standalone resource for dictionary-based analysis and as a foundational component of “vec-tionary” methods, which use distributed word representations for scalable, context-sensitive extraction of moral content (Duan et al., 2023; Gamage et al., 2023).

1. Theoretical Basis: Moral Foundations Theory and Lexical Coding

Moral Foundations Theory (MFT) posits that human moral reasoning is structured around a set of universal, evolutionarily derived foundations, each spanning a semantic continuum from virtue to vice. The canonical foundations are Care/Harm, Fairness/Cheating, Loyalty/Betrayal, Authority/Subversion, Sanctity/Degradation, and (in some formulations) Liberty/Oppression. Each foundation is operationalized as an axis in semantic space, with utterances mapped via lexical indicators onto one or more foundation dimensions. The EMFD builds on this framework by providing a high-coverage lexicon that encodes explicit probabilistic and valence information for thousands of English words (Duan et al., 2023; Gamage et al., 2023).

2. Construction and Structure of the EMFD

The EMFD (eMFD) comprises approximately 3,270 English lemmas. Its construction employed a crowdsourcing protocol in which annotators assessed target words within context, yielding estimates for:

$p_i \in [0, 1]$ : the probability that word $i$ evokes a given foundation, based on aggregated annotator judgments.
$v_i \in [-1, +1]$ : the average valence for word $i$ , derived from VADER sentiment analysis contextualized to the foundation.
$s_i = p_i \cdot v_i$ : an observed “relevance” score indicating the probability-weighted valence (Duan et al., 2023).

For each lemma $w$ , the EMFD tabulates a “probability vector” $P_w = (p_\text{care}, p_\text{fairness}, p_\text{loyalty}, p_\text{authority}, p_\text{sanctity})$ (for five-foundation schemes) and a sentiment vector $S_w = (s_\text{care}, s_\text{fairness}, s_\text{loyalty}, s_\text{authority}, s_\text{sanctity})$ with $s_F\in[-1,+1]$ . Words can have nonzero associations with multiple foundations and are not hierarchically organized. The EMFD is delivered as a flat lexicon, enabling straightforward integration into bag-of-words pipelines (Gamage et al., 2023).
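The flat record layout described above can be sketched in Python as follows. This is only an illustration under stated assumptions: the class name `LexiconEntry`, its field names, and the numbers attached to the word "protect" are hypothetical and are not taken from the released dictionary files.

```python
from dataclasses import dataclass

FOUNDATIONS = ("care", "fairness", "loyalty", "authority", "sanctity")


@dataclass(frozen=True)
class LexiconEntry:
    """One lemma with per-foundation probability and valence (hypothetical layout)."""
    lemma: str
    prob: dict      # foundation -> p in [0, 1]
    valence: dict   # foundation -> v in [-1, +1]

    def __post_init__(self):
        # Enforce the value ranges stated in the text.
        for f in FOUNDATIONS:
            if not 0.0 <= self.prob[f] <= 1.0:
                raise ValueError(f"p out of range for {f}")
            if not -1.0 <= self.valence[f] <= 1.0:
                raise ValueError(f"v out of range for {f}")

    def relevance(self, foundation):
        """Probability-weighted valence s = p * v for one foundation."""
        return self.prob[foundation] * self.valence[foundation]


# Made-up numbers, for illustration only.
protect = LexiconEntry(
    lemma="protect",
    prob={"care": 0.40, "fairness": 0.10, "loyalty": 0.15,
          "authority": 0.10, "sanctity": 0.05},
    valence={"care": 0.50, "fairness": 0.20, "loyalty": 0.30,
             "authority": 0.10, "sanctity": 0.00},
)
print(protect.relevance("care"))
```

Because a word may load on several foundations at once, each entry keeps one value per foundation rather than a single label.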

3. Algorithmic Use: Quantifying Moral Loadings

To operationalize the EMFD for corpus analysis, the typical workflow treats each text as a bag of words, matches tokens against the lexicon, and computes per-document metrics for each foundation. For a document $d$ with token set $W_d = \{w_1, \ldots, w_N\}$ , foundation $F$ is scored as:

$F_p(d) = \frac{1}{|W_d|} \sum_{w \in W_d} p_F(w)$

$F_\text{sent}(d) = \frac{1}{|W_d|} \sum_{w \in W_d} s_F(w)$

Here, $F_p(d)$ is interpretable as the mean probability of invoking foundation $F$ , and $F_\text{sent}(d)$ reflects the average sentiment of that foundation’s language within the document. The dominant foundation is determined as $F^* = \arg\max_F F_p(d)$ , with positive sentiment labeled “virtue” and negative sentiment “vice” (Gamage et al., 2023).
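The two averages and the dominant-foundation rule can be sketched as below. The mini-lexicon and its values are hypothetical, and the handling of unmatched tokens (they contribute zero but still count toward $|W_d|$) is an assumption; the source does not say whether the mean is taken over all tokens or only over matched ones.

```python
FOUNDATIONS = ("care", "fairness", "loyalty", "authority", "sanctity")

# Hypothetical entries: lemma -> (p_F values, s_F values), ordered as FOUNDATIONS.
TOY_LEXICON = {
    "protect": ((0.40, 0.10, 0.15, 0.10, 0.05), (0.50, 0.20, 0.30, 0.10, 0.00)),
    "cheat": ((0.10, 0.45, 0.10, 0.05, 0.05), (-0.30, -0.60, -0.20, -0.10, -0.10)),
}


def score_document(tokens, lexicon=TOY_LEXICON):
    """Compute F_p(d) and F_sent(d) per foundation, plus the dominant foundation."""
    if not tokens:
        raise ValueError("document has no tokens")
    n = len(tokens)
    f_p = dict.fromkeys(FOUNDATIONS, 0.0)
    f_sent = dict.fromkeys(FOUNDATIONS, 0.0)
    for tok in tokens:
        if tok not in lexicon:
            continue  # assumption: unmatched tokens add 0 but still count in |W_d|
        probs, sents = lexicon[tok]
        for name, p, s in zip(FOUNDATIONS, probs, sents):
            f_p[name] += p / n
            f_sent[name] += s / n
    dominant = max(FOUNDATIONS, key=f_p.get)  # F* = argmax_F F_p(d)
    sentiment = f_sent[dominant]
    polarity = "virtue" if sentiment > 0 else "vice" if sentiment < 0 else "neutral"
    return f_p, f_sent, dominant, polarity


f_p, f_sent, dominant, polarity = score_document(["we", "protect", "children"])
print(dominant, polarity)
```

Averaging over matched tokens only would give larger scores for short, sparsely matched texts, so the choice of denominator should be reported alongside any results.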

4. Vec-tionary Extension: Embedding-Augmented Moral Measurement

Traditional dictionary-based approaches, such as raw EMFD lookup, are limited by a fixed vocabulary and insensitivity to context. The vec-tionary methodology addresses these limitations by projecting the EMFD’s weighted word list into a high-dimensional embedding space (e.g., 300-dimensional Word2Vec) (Duan et al., 2023).

For each foundation, a latent unit-norm semantic axis $m^*\in\mathbb{R}^d$ is obtained by solving:

$m^* = \underset{\|m\|=1}{\arg\min}\, \sum_{i=1}^N (w_i^\top m - s_i)^2$

where $w_i$ is the unit-normalized embedding vector of EMFD word $i$ , $s_i$ is its foundation-specific relevance, $N$ is the number of EMFD words, and $d$ is the embedding dimension. The optimized axis $m^*$ captures the semantic continuum from vice to virtue for the foundation in embedding space. The full vocabulary can then be projected onto $m^*$ , expanding the coverage of the dictionary; thresholding or top-k selection yields an augmented lexicon. This approach enables moral content detection in texts containing words absent from the EMFD seed (Duan et al., 2023).
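One way to approach the unit-norm least-squares problem is projected gradient descent: take a gradient step on the squared error, then rescale back to unit length. The sketch below is a heuristic illustration in plain Python, not necessarily the optimizer used by Duan et al.; the function names, the step-size rule, and the synthetic 3-dimensional data are all assumptions made for the example.

```python
import math
import random


def normalize(v):
    """Rescale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0.0:
        raise ValueError("cannot normalize the zero vector")
    return [x / norm for x in v]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def fit_axis(word_vecs, scores, iters=2000):
    """Projected gradient descent for min_m sum_i (w_i.m - s_i)^2 with ||m|| = 1.

    word_vecs: unit-length vectors w_i; scores: relevance values s_i.
    """
    n, d = len(word_vecs), len(word_vecs[0])
    # For unit-length rows the gradient's Lipschitz constant is at most 2n,
    # so 1 / (2n) is a conservative step size.
    step = 1.0 / (2.0 * n)
    # Start from the score-weighted sum of the word vectors.
    m = normalize([sum(s * w[j] for w, s in zip(word_vecs, scores)) for j in range(d)])
    for _ in range(iters):
        residuals = [dot(w, m) - s for w, s in zip(word_vecs, scores)]
        grad = [2.0 * sum(r * w[j] for w, r in zip(word_vecs, residuals)) for j in range(d)]
        # Gradient step, then projection back onto the unit sphere.
        m = normalize([mj - step * gj for mj, gj in zip(m, grad)])
    return m


# Synthetic check: noiseless scores generated from a known axis.
rng = random.Random(0)
true_axis = normalize([rng.gauss(0, 1) for _ in range(3)])
vecs = [normalize([rng.gauss(0, 1) for _ in range(3)]) for _ in range(50)]
scores = [dot(w, true_axis) for w in vecs]
axis = fit_axis(vecs, scores)
print(axis, true_axis)
```

With real 300-dimensional embeddings a vectorized numerical library would be the natural choice; the pure-Python loops here only keep the example self-contained. Since the unit sphere is not a convex set, this procedure is a local method and its result can depend on the starting point.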

5. Metrics: Strength, Valence, and Ambivalence

The vec-tionary framework defines three key metrics for a text $T$ of $n$ tokens:

Strength ( $S$ ): Mean absolute cosine similarity to the foundation axis,

$S = \frac{1}{n} \sum_{i=1}^n |w_i \cdot m^*|$

quantifying overall moral salience.

Valence ( $V$ ): Mean signed similarity,

$V = \frac{1}{n} \sum_{i=1}^n (w_i \cdot m^*)$

indicating virtue–vice polarity.

Ambivalence ( $A$ ): Variance of signed projections,

$A = \frac{1}{n} \sum_{i=1}^n((w_i \cdot m^*) - V)^2$

capturing intra-textual tension between virtue and vice.
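The three definitions translate directly into code once the signed projections $w_i \cdot m^*$ are available. The following is a minimal sketch; the function name `moral_metrics` and the two-token input are hypothetical.

```python
def moral_metrics(projections):
    """Strength, valence and ambivalence from signed projections w_i . m*."""
    n = len(projections)
    if n == 0:
        raise ValueError("need at least one token projection")
    strength = sum(abs(x) for x in projections) / n
    valence = sum(projections) / n
    # Population variance (1/n divisor), matching the formula for A.
    ambivalence = sum((x - valence) ** 2 for x in projections) / n
    return strength, valence, ambivalence


# One strongly virtuous and one strongly vicious token.
strength, valence, ambivalence = moral_metrics([0.4, -0.4])
print(strength, valence, ambivalence)
```

In this toy input the two tokens cancel, so valence is zero while strength and ambivalence are both clearly positive. That is the situation ambivalence is meant to expose: a text that is morally loaded but pulls in opposite directions.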

The original eMFD supports only strength-like measures; the extension adds valence and ambivalence “for free,” enabling more nuanced characterization of moral discourse (Duan et al., 2023).

6. Empirical Evaluation and Validation

Benchmark evaluations have applied the EMFD and its vec-tionary enhancement to large-scale social media corpora. For example, Duan et al. (2023) evaluated these methods on 2.3 million COVID-19–era English tweets, using 2,000 tweets per foundation with human-annotated gold-standard virtue/vice labels. Tweets ranked by vec-tionary Strength produced rank-biased overlap (RBO) improvements of 10–50% over baseline eMFD scoring for Care/Harm, Authority/Subversion, and Loyalty/Betrayal, with no significant difference for Fairness/Cheating and Sanctity/Degradation.
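For readers unfamiliar with rank-biased overlap, the sketch below computes its truncated form: a geometrically weighted average of the overlap between the top-$d$ prefixes of two rankings. The source does not state which RBO variant or persistence parameter was used, so the function name, the default `p=0.9`, and the toy rankings are assumptions.

```python
def rbo_truncated(ranking_a, ranking_b, p=0.9, depth=None):
    """Truncated RBO: (1 - p) * sum_{d=1..k} p^(d-1) * |A_d & B_d| / d.

    Assumes each ranking lists distinct items, best first.
    """
    k = min(len(ranking_a), len(ranking_b))
    if depth is not None:
        k = min(k, depth)
    seen_a, seen_b = set(), set()
    total = 0.0
    for d in range(1, k + 1):
        seen_a.add(ranking_a[d - 1])
        seen_b.add(ranking_b[d - 1])
        agreement = len(seen_a & seen_b) / d  # overlap of the two top-d prefixes
        total += p ** (d - 1) * agreement
    return (1 - p) * total


human = ["t1", "t2", "t3"]
model = ["t2", "t1", "t3"]
print(rbo_truncated(human, model))
```

Smaller values of `p` concentrate the weight on the top of the rankings. Because the sum is truncated at depth `k`, two identical rankings score `1 - p**k` rather than exactly 1.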

Predictive validations using zero-inflated negative binomial models for retweet counts showed that adding (a) vec-tionary Strength, and then (b) Valence and (c) Ambivalence, significantly improved model fit ( $\Delta$ AIC $>$ 200k, $\Delta\chi^2$ $>$ 200k; full model: $\Delta\chi^2 \approx 668$ k over baseline).

In applied studies such as Gamage et al. (2023), the eMFDscore tool produced foundation-specific “moral loadings” across more than 100,000 Reddit posts, enabling statistical analysis of the distribution and sentiment of moral narratives across topics and time.

7. Practical Implementation and Usage Examples

Deployment of the EMFD and its extensions follows reproducible steps:

Preprocessing: Tokenize text, optionally remove stopwords.
Embedding mapping: Map each token to its embedding vector and normalize the vector to unit length.
Projection: For each foundation axis $m^*$ , compute cosine similarities $w_i \cdot m^*$ for all tokens.
Aggregation: Calculate Strength, Valence, and Ambivalence.
Interpretation: Read strength as overall moral content, valence as virtue–vice orientation, and ambivalence as the presence of conflicting signals.
Dictionary extension: Augment the lexicon by projecting all vocabulary words and applying thresholds or top- $k$ selection. Optionally, iterate the axis estimation with expanded seeds.
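The preprocessing, projection, and dictionary-extension steps above can be sketched as follows. Everything concrete here is a stand-in: the stopword list, the 2-dimensional `EMBEDDINGS` table, and `CARE_AXIS` are hypothetical toy values, whereas a real run would load pretrained vectors and a fitted axis. The extension step applies a threshold and then a top-$k$ cut, although the text presents these as alternatives.

```python
import math
import re

STOPWORDS = {"we", "must", "from", "the", "a", "of"}  # tiny illustrative list

# Hypothetical 2-d embeddings standing in for pretrained word vectors.
EMBEDDINGS = {
    "protect": [0.9, 0.1],
    "heal": [0.8, 0.3],
    "harm": [-0.9, 0.2],
    "table": [0.05, 1.0],
}
CARE_AXIS = [1.0, 0.0]  # stand-in for a fitted unit-norm axis m*


def unit(v):
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def project(word, axis=CARE_AXIS):
    """Cosine similarity between a word's embedding and the unit-norm axis."""
    return sum(a * b for a, b in zip(unit(EMBEDDINGS[word]), axis))


def token_projections(text):
    """Tokenize, drop stopwords and out-of-vocabulary tokens, project the rest."""
    tokens = [t for t in re.findall(r"[a-z']+", text.lower()) if t not in STOPWORDS]
    return [(t, project(t)) for t in tokens if t in EMBEDDINGS]


def extend_dictionary(seed_words, k=2, threshold=0.5):
    """Rank non-seed vocabulary by |projection|; keep the top k above threshold."""
    candidates = [(w, project(w)) for w in EMBEDDINGS if w not in seed_words]
    strong = [(w, s) for w, s in candidates if abs(s) >= threshold]
    return sorted(strong, key=lambda ws: abs(ws[1]), reverse=True)[:k]


print(token_projections("We must protect the table from harm"))
print(extend_dictionary({"protect"}))
```

Ranking by absolute projection keeps both virtue-side and vice-side words; the sign of each retained score then indicates which pole of the foundation the new word belongs to.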

Worked example (Duan et al., 2023): Given the tweet “We must protect vulnerable children from disease,” the Care/Harm axis $m_C$ yields token projections: “protect” (+0.22), “vulnerable” (+0.30), “children” (+0.15), “disease” (–0.05). Averaged over these four tokens, strength is 0.18 (mildly moral), valence is +0.155 (virtue-leaning), and ambivalence is 0.022 (low tension). The ambivalence figure matches a variance computed with an $n-1$ divisor; the $1/n$ formula of Section 5 gives about 0.017.
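The arithmetic behind this example can be checked by hand against the definitions in Section 5, assuming the four listed tokens are the only ones scored ( $n = 4$ ), as the reported strength implies. Here $x_i$ denotes the projection of token $i$; the last two lines contrast the $1/n$ divisor of Section 5 with an $n-1$ divisor.

```latex
\begin{align*}
S &= \tfrac{1}{4}\,(0.22 + 0.30 + 0.15 + 0.05) = 0.18 \\
V &= \tfrac{1}{4}\,(0.22 + 0.30 + 0.15 - 0.05) = 0.155 \\
\sum_{i=1}^{4} (x_i - V)^2 &= 0.065^2 + 0.145^2 + (-0.005)^2 + (-0.205)^2 = 0.0673 \\
A_{1/n} &= 0.0673 / 4 \approx 0.0168 \\
A_{1/(n-1)} &= 0.0673 / 3 \approx 0.0224
\end{align*}
```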

A plausible implication is that with minimal computational resources, researchers can apply EMFD-vec-tionary methods to arbitrary English corpora, robustly measuring foundation-specific moral content, even in previously unannotated domains.

Table: EMFD Data Structure Overview

Field	Description	Value Range
$p_i$	Probability word $i$ signals a moral foundation	$[0,1]$
$v_i$	Average valence of word $i$ for the foundation (– vice, + virtue)	$[-1, +1]$
$s_i = p_i v_i$	Observed relevance (probability-weighted valence)	$[-1, +1]$

The EMFD and its embedding-augmented extensions constitute essential infrastructure for computational moral psychology and automated text analysis, yielding both scalable coverage across diverse genres and granularity in the quantification of moral valence, vice/virtue polarity, and internal ambivalence of textual messages.

References

Duan et al. (2023). Constructing Vec-tionaries to Extract Message Features from Texts: A Case Study of Moral Appeals.

Gamage et al. (2023). Moral intuitions behind deepfake-related discussions in Reddit communities.
