Cross-Linguistic Priming
- Cross-Linguistic Priming is a phenomenon where exposure to a syntactic structure in one language increases the likelihood of using a similar structure in another language.
- It is quantified through controlled experiments that compare the reuse probabilities of syntactic frames under matching versus mismatching primes.
- Empirical results from human studies and neural models demonstrate robust priming effects, underscoring shared grammatical representations across languages.
Cross-linguistic priming refers to the phenomenon by which exposure to a syntactic structure in one language (the “prime”) increases the probability that a bilingual human—or a multilingual LLM—will subsequently produce or select the same abstract structure in another language (the “target”), even when the prime and target are semantically unrelated. This effect provides direct behavioral evidence for shared, language-independent grammatical representations and is a cornerstone methodology in both experimental psycholinguistics and computational modeling of bilingualism. In generative LLMs, cross-linguistic priming is operationalized as a robust, causal change in next-token probability or grammaticality judgments on target-language constructions given structurally matched primes from another language.
1. Foundations and Definitions
Cross-linguistic priming extends the tradition of structural priming, originally observed in monolingual contexts (Bock 1986; Pickering & Ferreira 2008), to the bilingual domain. In the human literature, such priming is measured by the increased probability of reusing a particular syntactic frame in the target language following exposure to a matching frame in the prime language, with rigorous controls to eliminate lexical or surface-form overlap as explanations.
Formally, a bilingual language model exhibits cross-linguistic priming if, for a prime sentence p in language L₁ and a target sentence t in language L₂, the following holds:

P(t | p_match) > P(t | p_mismatch)

Here, p_match and p_mismatch are primes in L₁ whose syntactic frames do and do not match that of t, respectively. The basic logic is that only abstract syntactic representations, not direct surface or lexical overlap, can account for such effects across languages (Arnett et al., 5 Mar 2025, Michaelov et al., 2023).
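The inequality can be checked directly from autoregressive log-probabilities. A minimal sketch, where `lm_score` is a hypothetical stand-in for any function that returns the total log-probability a language model assigns to a string (not an API from any specific library):

```python
def target_logprob(lm_score, prime: str, target: str) -> float:
    # log P(target | prime) via the chain rule:
    # log P(prime + target) - log P(prime).
    # `lm_score` is a placeholder for summed token log-probs from an LM.
    return lm_score(prime + " " + target) - lm_score(prime)

def exhibits_priming(lm_score, prime_match: str, prime_mismatch: str,
                     target: str) -> bool:
    # The criterion above: a structurally matched L1 prime raises the
    # probability of the L2 target relative to a mismatched prime.
    return (target_logprob(lm_score, prime_match, target)
            > target_logprob(lm_score, prime_mismatch, target))
```

In practice `lm_score` would wrap a multilingual model's token-level log-probabilities; the comparison logic stays the same.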
2. Experimental Paradigms and Quantification
Canonical experiments, both for humans and for LLMs, utilize minimal pairs or controlled syntactic alternations (e.g., double-object (DO) vs. prepositional-object (PO) datives; active vs. passive voice) across typologically varied language pairs (Arnett et al., 5 Mar 2025, Michaelov et al., 2023, Arnett et al., 2023, Zhang et al., 2024). Stimuli are typically constructed such that for each prime in language L₁, corresponding target sentences in L₂ differ only by argument structure or grammatical alternation.
Priming effect size is quantified by the difference in normalized target-structure probabilities conditioned on matching versus mismatching primes:

ΔP = Pₙ(S | prime_match) − Pₙ(S | prime_mismatch)

where, for S denoting a structure and S′ its alternative, the normalized probability is:

Pₙ(S) = P(S) / (P(S) + P(S′))

Statistical significance is assessed using linear mixed-effects models with prime type as a fixed effect and item as a random intercept, with corrections for multiple comparisons via False Discovery Rate (FDR) control (Arnett et al., 5 Mar 2025, Michaelov et al., 2023, Arnett et al., 2023).
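The two formulas translate directly into code. A minimal sketch, assuming the raw probabilities of the two alternants (e.g., DO vs. PO continuations of the same target preamble) have already been extracted from the model:

```python
def normalized_prob(p_structure: float, p_alternative: float) -> float:
    # Pn(S) = P(S) / (P(S) + P(S')): restrict probability mass to the
    # two alternants of the syntactic alternation.
    return p_structure / (p_structure + p_alternative)

def priming_effect(p_s_match: float, p_alt_match: float,
                   p_s_mismatch: float, p_alt_mismatch: float) -> float:
    # Delta-P = Pn(S | matching prime) - Pn(S | mismatching prime).
    return (normalized_prob(p_s_match, p_alt_match)
            - normalized_prob(p_s_mismatch, p_alt_mismatch))
```

A positive ΔP indicates facilitation of the primed structure; per-item values of this quantity feed the mixed-effects analysis.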
3. Mechanistic Evidence and Model Architectures
Transformer models, both autoregressive and encoder–decoder, outperform recurrent neural networks (RNNs) in replicating human-like cross-linguistic priming, particularly for typologically distant language pairs (e.g., Chinese–English). The Transformer's self-attention mechanism serves as a direct analog to cue-based retrieval, a cognitive theory in which syntactic memory is retrieved via content-addressable cues rather than sequential activation. Empirical results show priming accuracies of 33.33% for Transformers versus 25.84% for RNN-GRU models on Chinese–English dative and voice alternations (Zhang et al., 2024). Larger multilingual Transformers such as the XGLM and PolyLM families (1.7–7.5B parameters) display robust priming effects commensurate in magnitude and direction with those observed in human bilingual studies (Michaelov et al., 2023).
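A priming accuracy of the kind reported above can be operationalized as the fraction of trials on which the model prefers the primed alternant. A minimal sketch under that assumption (the cited studies may differ in detail; per-trial alternant probabilities are taken as given):

```python
def priming_accuracy(trials: list[tuple[float, float]]) -> float:
    # Each trial is (p_primed, p_alternative): the probabilities the model
    # assigns to the primed structure and its alternant after a prime of
    # the first structure. A "hit" means the primed structure wins.
    hits = sum(1 for p_primed, p_alt in trials if p_primed > p_alt)
    return hits / len(trials)
```

Comparing this statistic across architectures on matched stimuli yields contrasts like the Transformer-vs-GRU figures cited above.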
Layerwise and neuronwise interventions reveal that internal neural representations in models encode cross-linguistically shared syntax. Language-neuron overlap and logit lens analyses show that syntactic similarity between L₁ and L₂ leads to increased neuron sharing for L₂ and greater priming magnitudes (Issam et al., 29 Jan 2026).
4. Core Empirical Findings
Quantitative studies consistently find:
- Priming effect sizes (ΔP) in the L₁→L₂ direction (target English) are robust (ΔP ≈ 0.12–0.20 for typologically similar languages; p < .01 after FDR) (Arnett et al., 5 Mar 2025, Michaelov et al., 2023).
- Directional asymmetries: L₁→L₂ priming is consistently stronger than L₂→L₁, independent of acquisition order, reflecting target language properties rather than exposure (Arnett et al., 5 Mar 2025, Michaelov et al., 2023).
- Priming is modulated by syntactic and orthographic similarity. Dutch and Spanish show transfer gains in loss and priming effect that are not matched by Greek or Polish, with loss curves and effect sizes diminishing as typological distance increases (Arnett et al., 5 Mar 2025, Issam et al., 29 Jan 2026).
- Priming emerges rapidly after L₂ exposure, with statistically significant effects appearing after fewer than 1M L₂ tokens in controlled model pretraining (Arnett et al., 2023).
- Bidirectional facilitation occurs for grammatical structures, consistent with “shared syntax” accounts; priming of ungrammatical structures is seen only from a dominant L₁ into a less proficient L₂, supporting a “separate-but-connected” representation for non-overlapping syntax (Issam et al., 29 Jan 2026).
Table 1: Cross-linguistic priming effects in XGLM 4.5B (Michaelov et al., 2023)
| Study (L₁→L₂) | ΔP (Effect Size) | Significance |
|---|---|---|
| Dutch→English (PO) | 0.12 | p < .0001 * |
| Dutch→English (’s-Gen) | 0.09 | p < .0001 * |
| Greek→English (Passive) | 0.07 | p = 0.0485 * |
| Spanish→English (Passive) | 0.04 | n.s. |
| Polish→English (Passive) | 0.02 | n.s. |
*Significant after FDR correction.
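The FDR correction referenced in the table can be sketched with the standard Benjamini–Hochberg step-up procedure (a minimal implementation of the textbook algorithm; the cited studies presumably use equivalent library routines):

```python
def benjamini_hochberg(pvals: list[float], alpha: float = 0.05) -> list[bool]:
    # Step-up FDR control: sort p-values, find the largest rank k whose
    # p-value clears its threshold k * alpha / m, and reject the k smallest.
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    k = 0
    for rank, i in enumerate(order, start=1):
        if pvals[i] <= rank * alpha / m:
            k = rank
    reject = [False] * m
    for rank, i in enumerate(order, start=1):
        if rank <= k:
            reject[i] = True
    return reject  # reject[i] == True means comparison i is significant
```

Unlike Bonferroni, this controls the expected proportion of false discoveries rather than the familywise error rate, which is why borderline per-condition p-values can survive correction.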
5. Determinants and Modulators of Priming Strength
- Syntactic distance: Measured by non-shared WALS features, with closer languages showing more facilitation and distant languages showing interference or attenuated priming (Issam et al., 29 Jan 2026).
- Language dominance and L₂ proficiency: Manipulation of “age of exposure” (sequential step at which L₂ data is introduced) modulates priming. Greater L₁ entrenchment and lower L₂ proficiency reduce bidirectional priming, paralleling late versus early L₂ acquisition in humans (Issam et al., 29 Jan 2026).
- Orthography: Romanization of non-Latin scripts increases priming, demonstrating that orthographic overlap is a facilitator of cross-linguistic activation (Issam et al., 29 Jan 2026).
- Word order: Shared word order amplifies observed priming effects; however, it is not strictly necessary for priming to occur (Issam et al., 29 Jan 2026).
- Training curriculum: Catastrophic forgetting in sequential training abolishes priming for distant pairs, indicating a dependency on persistent, overlapping representation in continued bilingual exposure (Arnett et al., 5 Mar 2025).
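A syntactic distance of the kind described in the first bullet, counting non-shared typological features, can be sketched as follows (feature names and values here are illustrative placeholders, not real WALS codes):

```python
def syntactic_distance(features_l1: dict, features_l2: dict) -> float:
    # Proportion of typological features on which the two languages
    # disagree, computed over features coded for both languages.
    shared = [f for f in features_l1 if f in features_l2]
    if not shared:
        raise ValueError("no commonly coded features")
    mismatches = sum(1 for f in shared if features_l1[f] != features_l2[f])
    return mismatches / len(shared)
```

Under this measure, language pairs with lower distance would be expected to show larger priming effects, in line with the facilitation pattern reported above.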
6. Implications for Human Psycholinguistics and Computational Modeling
Cross-linguistic priming in neural models has close parallels to human psycholinguistic findings (Hartsuiker et al. 2004; Schoonbaert et al. 2007). Key implications include:
- Direct, causal evidence for abstract, shared grammatical representations aligns with “shared syntax” models of the bilingual mind (Arnett et al., 5 Mar 2025, Michaelov et al., 2023, Issam et al., 29 Jan 2026).
- Priming strength is constrained by both typological relatedness and the manner of training (simultaneous vs. sequential), mirroring real-world L₂ dominance and acquisition effects (Arnett et al., 5 Mar 2025, Issam et al., 29 Jan 2026).
- Transformers’ mechanisms are cognitively analogous to cue-based retrieval, challenging views that human syntax adaptation depends exclusively on sequential or recurrent processing (Zhang et al., 2024).
- Evidence from both behavioral LM outputs and internal representation analyses argues for underlying syntactic subspaces that are both separate and connectable depending on construction overlap (Issam et al., 29 Jan 2026).
- The rapid emergence of priming after limited L₂ exposure illuminates paths for low-resource language transfer and the need for contamination control in “zero-shot” multilingual model evaluation (Arnett et al., 2023).
7. Open Questions and Directions
Current research highlights several open avenues:
- Mechanistic localization of cross-linguistic priming within model layers and neurons, especially for abstract syntactic frames (Michaelov et al., 2023, Issam et al., 29 Jan 2026).
- Direct alignment of model surprisal-change metrics with human reading-time data for comprehension-oriented priming (Zhang et al., 2024).
- Systematic testing and mitigation of lexical-boost confounds in cross-linguistic priming setups (Zhang et al., 2024).
- Evaluation of priming under varied construction frequencies to relate model adaptation to inverse-frequency priming in human data (Zhang et al., 2024).
- Extension of experimental paradigms to low-resource and typologically divergent language pairs, with emphasis on understanding the boundaries of cross-linguistic transfer (Michaelov et al., 2023).
In sum, cross-linguistic priming serves as both a sensitive probe and a unifying lens for investigating abstract grammatical representation, transfer mechanisms, and structural adaptation in both humans and large-scale LLMs (Arnett et al., 5 Mar 2025, Issam et al., 29 Jan 2026, Michaelov et al., 2023, Arnett et al., 2023, Zhang et al., 2024).