Explicit Moral Foundation Dictionary (EMFD)

Updated 20 January 2026
  • EMFD is a crowdsourced lexicon that annotates over 3,000 English words with probabilistic moral foundation scores and sentiment values.
  • It employs a bag-of-words approach combined with embedding techniques to statistically measure moral loadings, virtue/vice polarity, and ambivalence in texts.
  • Empirical validations on social media data demonstrate its practical effectiveness in scaling moral content analysis and enhancing predictive models.

The Explicit Moral Foundation Dictionary (EMFD), often written eMFD, is a crowdsourced lexical resource for the quantitative measurement of moral content in English-language texts. Rooted in Moral Foundations Theory (MFT), the EMFD annotates thousands of English lemmas with probabilistic associations to five or six canonical moral domains, together with sentiment (valence) information. It serves both as a standalone resource for dictionary-based analysis and as a foundational component of “vec-tionary” methods, which use distributed word representations for scalable, context-sensitive moral content extraction (Duan et al., 2023; Gamage et al., 2023).

1. Theoretical Basis: Moral Foundations Theory and Lexical Coding

Moral Foundations Theory (MFT) posits that human moral reasoning is structured around a set of universal, evolutionarily derived foundations, each spanning a semantic continuum from virtue to vice. The canonical foundations are Care/Harm, Fairness/Cheating, Loyalty/Betrayal, Authority/Subversion, Sanctity/Degradation, and (sometimes) Liberty/Oppression. Each foundation is operationalized as an axis in semantic space, with utterances mapped via lexical indicators onto one or more foundation dimensions. The EMFD builds on this framework by providing a high-coverage lexicon that encodes explicit probabilistic and valence information for thousands of English words (Duan et al., 2023; Gamage et al., 2023).

2. Construction and Structure of the EMFD

The EMFD (eMFD) comprises approximately 3,270 English lemmas. Its construction employed a crowdsourcing protocol in which annotators assessed target words within context, yielding estimates for:

  • $p_i \in [0, 1]$: the probability that word $i$ evokes a given foundation, based on aggregated annotator judgments.
  • $v_i \in [-1, +1]$: the average valence for word $i$, derived from VADER sentiment analysis contextualized to the foundation.
  • $s_i = p_i \cdot v_i$: an observed “relevance” score indicating the probability-weighted valence (Duan et al., 2023).

For each lemma $w$, the EMFD tabulates a “probability vector” $P_w = (p_\text{care}, p_\text{fairness}, p_\text{loyalty}, p_\text{authority}, p_\text{sanctity})$ (for five-foundation schemes) and a sentiment vector $S_w = (s_\text{care}, s_\text{fairness}, s_\text{loyalty}, s_\text{authority}, s_\text{sanctity})$ with $s_F \in [-1, +1]$. Words can have nonzero associations with multiple foundations and are not hierarchically organized. The EMFD is delivered as a flat lexicon, enabling straightforward integration into bag-of-words pipelines (Gamage et al., 2023).
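The flat structure described above can be pictured as one record per lemma. The following is a minimal sketch, not the distributed file format: the class name `LexiconEntry`, the field names, and all numeric values are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import Dict

FOUNDATIONS = ("care", "fairness", "loyalty", "authority", "sanctity")


@dataclass(frozen=True)
class LexiconEntry:
    """One flat lexicon row (hypothetical layout, not the official schema)."""
    word: str
    prob: Dict[str, float]  # p_F in [0, 1] for each foundation F
    sent: Dict[str, float]  # s_F in [-1, +1] for each foundation F

    def __post_init__(self):
        # Check the value ranges stated in the text for every foundation.
        for f in FOUNDATIONS:
            if not 0.0 <= self.prob[f] <= 1.0:
                raise ValueError(f"p_{f} out of [0, 1] for {self.word!r}")
            if not -1.0 <= self.sent[f] <= 1.0:
                raise ValueError(f"s_{f} out of [-1, +1] for {self.word!r}")


# Illustrative values only; they are NOT taken from the actual EMFD.
entry = LexiconEntry(
    word="protect",
    prob={"care": 0.40, "fairness": 0.10, "loyalty": 0.15,
          "authority": 0.10, "sanctity": 0.05},
    sent={"care": 0.30, "fairness": 0.05, "loyalty": 0.10,
          "authority": 0.05, "sanctity": 0.00},
)

# A flat lexicon is then a plain word -> entry mapping with no hierarchy.
lexicon = {entry.word: entry}
```

Because a word carries a full vector per foundation, a single entry can contribute to several foundations at once, which is the multi-association property noted above.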

3. Algorithmic Use: Quantifying Moral Loadings

To operationalize the EMFD for corpus analysis, the typical workflow treats each text as a bag of words, matching tokens against the lexicon and computing per-document metrics for each foundation. For a document $d$ with token set $W_d = \{w_1, \ldots, w_N\}$, foundation $F$ is scored as:

$$F_p(d) = \frac{1}{|W_d|} \sum_{w \in W_d} p_F(w)$$

$$F_\text{sent}(d) = \frac{1}{|W_d|} \sum_{w \in W_d} s_F(w)$$

Here, $F_p(d)$ is interpretable as the mean probability of invoking foundation $F$, and $F_\text{sent}(d)$ reflects the average sentiment of that foundation’s language within the document. The dominant foundation is determined as $F^* = \arg\max_F F_p(d)$, with positive sentiment labeled “virtue” and negative sentiment “vice” (Gamage et al., 2023).
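The scoring rule above can be sketched in a few lines. This is an illustrative sketch under stated assumptions, not the eMFDscore implementation: the toy lexicon values are invented, tokens missing from the lexicon are dropped so that $W_d$ holds only matched tokens, and a sentiment of exactly zero is labeled "neutral" (the text only specifies the positive and negative cases).

```python
FOUNDATIONS = ("care", "fairness", "loyalty", "authority", "sanctity")

# Hypothetical toy lexicon; tuples are aligned with FOUNDATIONS.
# Values are invented for illustration and are NOT from the EMFD.
TOY_LEXICON = {
    "protect": {"p": (0.40, 0.10, 0.20, 0.10, 0.05),
                "s": (0.30, 0.05, 0.10, 0.05, 0.00)},
    "cheat":   {"p": (0.10, 0.50, 0.10, 0.05, 0.05),
                "s": (-0.05, -0.40, -0.05, 0.00, 0.00)},
}


def score_document(tokens, lexicon):
    """Return mean p_F and s_F over matched tokens plus the dominant foundation."""
    matched = [lexicon[t] for t in tokens if t in lexicon]
    if not matched:
        return None  # no lexicon hits, so no moral loading can be computed
    n = len(matched)
    f_p = {f: sum(e["p"][i] for e in matched) / n
           for i, f in enumerate(FOUNDATIONS)}
    f_sent = {f: sum(e["s"][i] for e in matched) / n
              for i, f in enumerate(FOUNDATIONS)}
    dominant = max(f_p, key=f_p.get)  # F* = argmax_F F_p(d)
    sentiment = f_sent[dominant]
    if sentiment > 0:
        polarity = "virtue"
    elif sentiment < 0:
        polarity = "vice"
    else:
        polarity = "neutral"  # assumption: the zero case is not specified
    return {"F_p": f_p, "F_sent": f_sent,
            "dominant": dominant, "polarity": polarity}


result = score_document("we must protect children".split(), TOY_LEXICON)
print(result["dominant"], result["polarity"])
```

Whether to average over all tokens or only matched tokens changes the scale of the scores but not the dominant foundation, so the choice mainly matters when comparing documents of very different lengths.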

4. Vec-tionary Extension: Embedding-Augmented Moral Measurement

Traditional dictionary-based approaches, such as raw EMFD lookup, are limited by a fixed vocabulary and insensitivity to context. The vec-tionary methodology addresses these limitations by projecting the EMFD’s weighted word list into a high-dimensional embedding space (e.g., 300-dimensional Word2Vec vectors) (Duan et al., 2023).

For each foundation, a latent unit-norm semantic axis $m^* \in \mathbb{R}^d$ is optimized by minimizing:

$$m^* = \underset{\|m\|=1}{\arg\min}\, \sum_{i=1}^N (w_i^\top m - s_i)^2$$

where $w_i$ is the unit vector for EMFD word $i$, and $s_i$ is its foundation-specific relevance. The optimized axis $m^*$ captures the semantic continuum of vice to virtue for the foundation in embedding space. The full vocabulary can then be projected onto $m^*$, expanding the coverage of the dictionary; thresholding or top-$k$ selection yields an augmented lexicon. This approach enables moral content detection in texts containing words absent from the EMFD seed (Duan et al., 2023).
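One way to approximate the constrained least-squares problem above is projected gradient descent: take a gradient step on the squared error, then renormalize $m$ to unit length. The source does not state which optimizer the authors used, so this is only a sketch with a swapped-in technique. The function name `fit_axis`, the step size, the iteration count, and the synthetic data are all assumptions.

```python
import numpy as np


def fit_axis(W, s, steps=500, lr=0.5, seed=0):
    """Approximately minimize sum_i (w_i^T m - s_i)^2 subject to ||m|| = 1.

    W: (N, d) array whose rows are unit-norm word vectors.
    s: (N,) array of foundation-specific relevance scores.
    Uses projected gradient descent (an assumed optimizer, not the authors').
    """
    rng = np.random.default_rng(seed)
    m = rng.normal(size=W.shape[1])
    m /= np.linalg.norm(m)                        # start on the unit sphere
    for _ in range(steps):
        grad = 2.0 * W.T @ (W @ m - s) / len(s)   # gradient of the mean squared error
        m = m - lr * grad                         # unconstrained gradient step
        m /= np.linalg.norm(m)                    # project back onto the unit sphere
    return m


# Synthetic check: scores generated from a known unit axis should be recovered.
rng = np.random.default_rng(42)
W = rng.normal(size=(200, 8))
W /= np.linalg.norm(W, axis=1, keepdims=True)     # unit-norm rows, as in the text
m_true = rng.normal(size=8)
m_true /= np.linalg.norm(m_true)
s = W @ m_true                                    # noiseless relevance scores

m_hat = fit_axis(W, s)
print("cosine(m_hat, m_true) =", float(m_hat @ m_true))
```

With real EMFD scores the fit will not be exact, since $s_i$ is not generated by any single axis; the residual then indicates how well one direction in embedding space explains the foundation's relevance scores.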

5. Metrics: Strength, Valence, and Ambivalence

The vec-tionary framework defines three key metrics for a text $T$ of $n$ tokens:

  • Strength ($S$): Mean absolute cosine similarity to the foundation axis,

$$S = \frac{1}{n} \sum_{i=1}^n |w_i \cdot m^*|$$

quantifying overall moral salience.

  • Valence ($V$): Mean signed similarity,

$$V = \frac{1}{n} \sum_{i=1}^n (w_i \cdot m^*)$$

indicating virtue–vice polarity.

  • Ambivalence ($A$): Variance of signed projections,

$$A = \frac{1}{n} \sum_{i=1}^n \left((w_i \cdot m^*) - V\right)^2$$

capturing intra-textual tension between virtue and vice.

The original eMFD only supports strength-like measures; the extension adds valence and ambivalence “for free,” enabling more nuanced characterization of moral discourse (Duan et al., 2023).
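The three metrics reduce to simple statistics over the per-token projections $w_i \cdot m^*$. A minimal sketch, assuming the projections have already been computed; the function name `moral_metrics` and the input values are invented for illustration.

```python
import numpy as np


def moral_metrics(projections):
    """Compute (Strength, Valence, Ambivalence) from signed token projections."""
    proj = np.asarray(projections, dtype=float)
    strength = float(np.mean(np.abs(proj)))              # S: mean |w_i . m*|
    valence = float(np.mean(proj))                       # V: mean signed projection
    ambivalence = float(np.mean((proj - valence) ** 2))  # A: variance with 1/n
    return strength, valence, ambivalence


# Invented projections: three virtue-leaning tokens and one vice-leaning token.
S, V, A = moral_metrics([0.4, 0.2, -0.3, 0.1])
print(S, V, A)  # approximately 0.25, 0.10, 0.065
```

A text can have high Strength and near-zero Valence at the same time, when strongly positive and strongly negative tokens cancel; Ambivalence is what distinguishes that case from a text with uniformly weak projections.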

6. Empirical Evaluation and Validation

Benchmark evaluations have applied the EMFD and its vec-tionary enhancement to large-scale social media corpora. For example, Duan et al. (2023) evaluated these methods on 2.3 million COVID-19–era English tweets, using 2,000 tweets per foundation with human-annotated gold-standard virtue/vice labels. Tweets ranked by vec-tionary Strength produced rank-biased overlap (RBO) improvements of 10–50% over baseline eMFD scoring for Care/Harm, Authority/Subversion, and Loyalty/Betrayal, with no significant difference for Fairness/Cheating and Sanctity/Degradation.

Predictive validations using zero-inflated negative binomial models for retweet counts showed that adding (a) vec-tionary Strength, and further (b) Valence and (c) Ambivalence, significantly improved predictive accuracy ($\Delta$AIC $>$ 200k, $\Delta\chi^2 >$ 200k; full model: $\Delta\chi^2 \approx 668$k over baseline).

In applied studies, such as Gamage et al. (2023), the eMFDscore tool produced foundation-specific “moral loadings” across more than 100,000 Reddit posts, enabling statistical analysis of the distribution and sentiment of moral narratives across topics and time.

7. Practical Implementation and Usage Examples

Deployment of the EMFD and its extensions follows reproducible steps:

  1. Preprocessing: Tokenize text, optionally remove stopwords.
  2. Embedding mapping: Map each token to its embedding vector and normalize the vector to unit length.
  3. Projection: For each foundation axis $m^*$, compute cosine similarities $w_i \cdot m^*$ for all tokens.
  4. Aggregation: Calculate Strength, Valence, and Ambivalence.
  5. Interpretation: Scores are interpreted as follows: strength (overall moral content), valence (virtue–vice orientation), and ambivalence (conflicting signals).
  6. Dictionary extension: Augment the lexicon by projecting all vocabulary words and applying thresholds or top-$k$ selection. Optionally, iterate the axis estimation with expanded seeds.
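Steps 1 to 3 and step 6 can be sketched end to end with toy data. Everything named here is hypothetical: the 3-dimensional `TOY_EMBEDDINGS`, the `STOPWORDS` set, and the stand-in axis are invented, whereas a real pipeline would load pretrained vectors (e.g., Word2Vec) and use an axis fitted on the EMFD seed words. Tokens without an embedding are skipped, which is an assumption of this sketch.

```python
import re
import numpy as np

# Hypothetical 3-d toy embeddings; real pipelines would load pretrained vectors.
TOY_EMBEDDINGS = {
    "protect":  np.array([0.9, 0.1, 0.0]),
    "children": np.array([0.6, 0.3, 0.1]),
    "disease":  np.array([-0.7, 0.2, 0.1]),
    "harm":     np.array([-0.8, 0.1, 0.1]),
    "table":    np.array([0.0, 0.1, 0.9]),
}
STOPWORDS = {"we", "must", "from", "the"}  # illustrative stopword list


def unit(v):
    """Scale a vector to unit length."""
    return v / np.linalg.norm(v)


def tokenize(text):
    """Step 1: lowercase, split into word tokens, drop stopwords."""
    return [t for t in re.findall(r"[a-z']+", text.lower()) if t not in STOPWORDS]


def project_tokens(tokens, embeddings, axis):
    """Steps 2-3: embed known tokens, normalize, take cosine with the unit axis."""
    return {t: float(unit(embeddings[t]) @ axis) for t in tokens if t in embeddings}


def extend_lexicon(embeddings, axis, k):
    """Step 6: rank the whole vocabulary by |projection| and keep the top k."""
    scores = {w: float(unit(v) @ axis) for w, v in embeddings.items()}
    return sorted(scores.items(), key=lambda kv: abs(kv[1]), reverse=True)[:k]


axis = unit(np.array([1.0, 0.0, 0.0]))  # stand-in for a fitted foundation axis
tokens = tokenize("We must protect vulnerable children from disease")
projections = project_tokens(tokens, TOY_EMBEDDINGS, axis)
top2 = extend_lexicon(TOY_EMBEDDINGS, axis, k=2)
print(projections)
print(top2)
```

Ranking by absolute projection keeps both strongly virtue-leaning and strongly vice-leaning words in the extended lexicon; ranking by signed projection instead would yield separate virtue and vice word lists.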

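Worked example (Duan et al., 2023): Given the tweet “We must protect vulnerable children from disease,” the Care/Harm axis $m_C$ yields token projections: “protect” (+0.22), “vulnerable” (+0.30), “children” (+0.15), “disease” (–0.05), with strength 0.18 (mildly moral), valence +0.155 (virtue-leaning), and ambivalence 0.022 (low tension). The reported ambivalence matches the sample variance of the four projections with an $n-1$ denominator; the $1/n$ definition in Section 5 gives approximately 0.017 for the same values.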

A plausible implication is that with minimal computational resources, researchers can apply EMFD-vec-tionary methods to arbitrary English corpora, robustly measuring foundation-specific moral content, even in previously unannotated domains.


Table: EMFD Data Structure Overview

| Field | Description | Value Range |
|---|---|---|
| $p_i$ | Probability that word $i$ signals a moral foundation | $[0, 1]$ |
| $v_i$ | Rated valence of word $i$ for the foundation (– vice, + virtue) | $[-1, +1]$ |
| $s_i = p_i v_i$ | Observed relevance (probability-weighted valence) | $[-1, +1]$ |

The EMFD and its embedding-augmented extensions provide core infrastructure for computational moral psychology and automated text analysis, offering both scalable coverage across diverse genres and fine-grained quantification of the moral strength, virtue/vice polarity, and internal ambivalence of textual messages.