Aligned Span Projection is a family of methodologies for mapping contiguous spans—of text, features, or geometric structures—between domains under constraints of alignment, annotation, or interpretability. Originally prominent in sequence annotation projection, word alignment, and dimensionality reduction, aligned span projection techniques seek to preserve semantic, geometric, or structural correspondence between source and target representations. Modern approaches leverage deep models, statistical aligners, and geometric loss functions for cross-lingual transfer, high-dimensional data visualization, and 3D object localization. Below is a comprehensive exposition across principal domains.
1. Formal Definitions of Aligned Span Projection
Aligned span projection encapsulates variants depending on application domain:
- Sequence Label Projection: Given a labeled source span $s \subseteq x$ in a source sentence $x$, aligned span projection seeks a corresponding span $t \subseteq y$ in the parallel sentence $y$ such that the label of $s$ is preserved on $t$ (García-Ferrero et al., 2022).
- Cross-lingual Transfer: Label-span projection transports span annotations from a source text $x$ to its translation $y$, aiming to recover spans in $y$ that encapsulate identical semantic content (Chen et al., 2022).
- Weakly Supervised Span Alignment: Word alignment is reframed as bidirectional span prediction, aligning a span $(i, j)$ in the source with a span $(k, l)$ in the target under a probabilistic correspondence, yielding quadruple alignments $(i, j, k, l)$ (Wu et al., 2023).
- Axis-Aligned Decomposition for Visualization: In linear embedding, aligned span projection refers to decomposing dense linear projections into sparse, axis-aligned 2D subspaces, each visualizing two original coordinates and jointly reconstructing the pairwise neighborhood structure of the original linear projections (Thiagarajan et al., 2017).
- Geometric Alignment in 3D Detection: Projected 3D bounding boxes are optimized so that their corner spans align spatially with ground-truth cuboid corners and image-plane projections are tightly bounded by detected 2D boxes (Wang et al., 10 Nov 2025).
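Across the sequence-level variants, the shared core can be stated compactly. The formalization below is a generic sketch in our own notation (parallel pair $(x, y)$, span indices, soft alignment matrix $A$), not an equation taken from any single cited paper:

```latex
% Parallel pair (x, y); annotated source span x_{i:j} with label c;
% soft alignment matrix A with A_{st} = P(x_s aligns to y_t).
(k^{*}, l^{*}) \;=\; \arg\max_{1 \le k \le l \le |y|}
  \frac{1}{l - k + 1} \sum_{t=k}^{l} \max_{i \le s \le j} A_{st},
\qquad \operatorname{label}\bigl(y_{k^{*}:l^{*}}\bigr) := c .
```

Concrete systems instantiate the inner score differently, e.g. with NMT translation probabilities (T-Projection) or learned start/end distributions (WSPAlign).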
2. Algorithmic Methods and Mathematical Formulation
Principal methodologies include:
- Statistical and Neural Alignment: Soft alignment matrices quantify the token-level correspondence probability $A_{st}$ between source token $s$ and target token $t$; spans are projected by collecting maximal alignments and applying heuristics for contiguous span estimation (Chen et al., 2022).
- Marker-Based Projection ("Mark-then-Translate"): Annotation spans in source are demarcated by unique markers (e.g., square brackets) that survive translation. Target spans are identified by marker positions post-translation, with fuzzy string-matching for label correspondence (Chen et al., 2022).
- Text-to-Text Model Candidate Generation: T-Projection decomposes projection into two stages: candidate spans in the target are generated with an mT5 prompt (target sentence plus label placeholders), then filtered and scored. Final span selection uses translation-equivalence scores between the source span and each candidate, via NMT probability normalization and symmetrization (García-Ferrero et al., 2022).
- Span Prediction Architecture: WSPAlign encodes marked spans using a multilingual transformer, predicts start/end indices in target via linear heads, symmetrizes by averaging directional probabilities, and thresholds alignments (Wu et al., 2023).
- 2D Axis-Aligned Decomposition: For a dense linear projection $P$, the task is to select a small set of sparse 2D projections, each restricted to a pair of original axes, minimizing the distortion of $P$'s pairwise neighborhood structure. Relevancy of each axis pair is assessed via Dempster–Shafer evidence accumulation across multiple linear projections (Thiagarajan et al., 2017).
- Geometric Alignment Losses (SPAN): Spatial Point Alignment computes marginal 3D IoU on corners and penalizes deviation; Projection Alignment computes 2D GIoU between projected 3D box and ground-truth 2D rectangle (Wang et al., 10 Nov 2025).
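The alignment-matrix route above can be made concrete with a short sketch: given a soft alignment matrix, collect the target tokens maximally aligned to the source span and take their contiguous hull, a common heuristic for contiguous span estimation. All names and the threshold value are illustrative, not taken from the cited implementations.

```python
import numpy as np

def project_span(align: np.ndarray, i: int, j: int, tau: float = 0.3):
    """Project the inclusive source span [i, j] to a contiguous target span.

    align[s, t] is the soft alignment probability between source token s
    and target token t. Returns (k, l) or None if no target token clears tau.
    """
    # For each target token, take its best alignment into the source span.
    scores = align[i:j + 1].max(axis=0)      # shape: (num_target_tokens,)
    hits = np.flatnonzero(scores >= tau)     # target tokens aligned to the span
    if hits.size == 0:
        return None
    # Contiguous-hull heuristic: span from first to last aligned token.
    return int(hits[0]), int(hits[-1])

# Toy 3-source x 4-target alignment matrix.
A = np.array([[0.9, 0.1, 0.0, 0.0],
              [0.1, 0.8, 0.1, 0.0],
              [0.0, 0.1, 0.1, 0.7]])
print(project_span(A, 0, 1))  # source tokens 0-1 -> target span (0, 1)
```

Real systems add non-overlap constraints and symmetrized scores on top of this kernel, but the core projection step is this reduction from a matrix to a contiguous target interval.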
3. Practical Applications and Key Results
Aligned span projection techniques underpin transferable annotation, interpretable visualization, and spatially consistent detection:
- Sequence Labeling and NER: T-Projection yields micro-F1 $93.9$ on span-level alignment, outperforming aligners and text encoders by $8.6$ points; marker-based EasyProject gains $7.4$ F1 for NER in African languages over Awesome-align (García-Ferrero et al., 2022, Chen et al., 2022).
- Word Alignment: WSPAlign fine-tuned on noisy, weakly supervised data improves F1 by $3.3$-$6.1$ and reduces AER by up to $6.1$ over baseline SpanAlign; zero-shot transfer surpasses unsupervised aligners (Wu et al., 2023).
- Cross-lingual Event/QA Extraction: Mark-then-translate with square-bracket markers preserves over $97\%$ of spans, outperforming standard alignment methods, and shows greater benefits for low-resource languages (Chen et al., 2022).
- Visualization of High-Dimensional Data: Decomposing a linear projection typically requires only a small number of axis-aligned spans to capture all pairwise relationships, providing interpretable views; the features driving cluster separation are exposed directly (Thiagarajan et al., 2017).
- Monocular 3D Object Detection: The SPAN module improves AP by $0.2$-$1.2$ points, with KITTI car validation AP rising from $22.34$ to $23.26$ (+$0.92$) (Wang et al., 10 Nov 2025).
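The 2D GIoU quantity used by SPAN's projection-alignment loss is the standard generalized IoU; a minimal sketch for axis-aligned boxes $(x_1, y_1, x_2, y_2)$ follows (the function name is ours):

```python
def giou_2d(a, b):
    """Generalized IoU between axis-aligned boxes given as (x1, y1, x2, y2)."""
    ax1, ay1, ax2, ay2 = a
    bx1, by1, bx2, by2 = b
    # Intersection rectangle (clamped to zero area when boxes are disjoint).
    iw = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    ih = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = iw * ih
    area_a = (ax2 - ax1) * (ay2 - ay1)
    area_b = (bx2 - bx1) * (by2 - by1)
    union = area_a + area_b - inter
    iou = inter / union
    # Smallest enclosing box penalizes empty space between the two boxes,
    # so GIoU stays informative even when the boxes do not overlap.
    cw = max(ax2, bx2) - min(ax1, bx1)
    ch = max(ay2, by2) - min(ay1, by1)
    c_area = cw * ch
    return iou - (c_area - union) / c_area

# Identical boxes -> GIoU = 1, so the loss 1 - GIoU is 0.
print(giou_2d((0, 0, 2, 2), (0, 0, 2, 2)))  # 1.0
```

In the detection setting, `a` would be the axis-aligned hull of the projected 3D box corners and `b` the ground-truth 2D rectangle, with `1 - giou_2d(a, b)` used as the training penalty.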
4. Experimental Protocols and Algorithmic Outlines
Benchmark workflows illustrate technical details:
| Method / Paper | Data Handling | Key Workflow Steps | Metrics / Criteria |
|---|---|---|---|
| T-Projection (García-Ferrero et al., 2022) | CoNLL, SemEval, Europarl | Candidate generation (mT5 prompt), NMT scoring, non-overlap selection | micro-F1, overlap, label fidelity |
| EasyProject (Chen et al., 2022) | WikiANN, MasakhaNER | Insert markers, translation with MT, bracket extraction, fuzzy match | F1, projection-rate, manual span accuracy |
| WSPAlign (Wu et al., 2023) | Wiki data + co-mention | Transformer encoding, start/end prediction, symmetrization | F1, AER, token-wise alignment |
| Axis-Aligned Decomp. (Thiagarajan et al., 2017) | Wine, NIF, Climate | Masking optimization for axes, Dempster–Shafer scoring, greedy span selection | Neighborhood preservation, evidence score |
| SPAN (3D detection) (Wang et al., 10 Nov 2025) | KITTI, MonoDETR | 3D corner/2D projection alignment, HTL scheduling, gradient-modulated loss | AP, loss ablations |
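The symmetrization step shared by several of these workflows (WSPAlign in particular) reduces to averaging the two directional probability matrices and thresholding the result; a minimal sketch with illustrative names and an arbitrary threshold:

```python
import numpy as np

def symmetrize(p_s2t: np.ndarray, p_t2s: np.ndarray, tau: float = 0.4):
    """Average source->target and target->source alignment probabilities,
    then keep (source, target) pairs whose symmetric score clears tau."""
    sym = 0.5 * (p_s2t + p_t2s.T)            # both oriented source x target
    return sorted(zip(*np.nonzero(sym >= tau)))

p_s2t = np.array([[0.9, 0.1], [0.2, 0.6]])   # source x target direction
p_t2s = np.array([[0.7, 0.0], [0.3, 0.8]])   # target x source direction
print(symmetrize(p_s2t, p_t2s))              # [(0, 0), (1, 1)]
```

Averaging the two directions filters out alignments that only one model direction believes in, which is the main reason symmetrization lowers AER relative to either direction alone.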
5. Strengths, Limitations, and Implementation Insights
Strengths:
- Semantic Boundary Preservation: Marker-based projection methods preserve span boundaries more faithfully than neural aligners (Chen et al., 2022).
- Noisy/Low Resource Suitability: Weakly supervised, marker-based, and generative methods (WSPAlign, EasyProject, T-Projection) scale to zero-shot and few-shot regimes, enabling alignment for hundreds of languages (Wu et al., 2023, García-Ferrero et al., 2022).
- Interpretability: Axis-aligned decomposition renders high-dimensional structure accessible, isolating influential features for cluster or embedding visualization (Thiagarajan et al., 2017).
- Plug-and-Play Integration: SPAN's geometric losses can augment any modern monocular 3D detector, enhancing localization accuracy without inference overhead (Wang et al., 10 Nov 2025).
Limitations:
- Dependency on Data Quality: Some methods (e.g., SPAN) require accurate 2D detections; bounding-box errors on the order of $15$ pixels substantially degrade performance (Wang et al., 10 Nov 2025).
- Translation Artifacts: Marker-based projection may introduce misalignment in agglutinative languages, and performance may depend on MT system fine-tuning (Chen et al., 2022).
- Computational Overhead: Axis-aligned decomposition and geometric loss computation incur training-time complexity proportional to the numbers of neighbors, axes, and candidate spans (Thiagarajan et al., 2017, Wang et al., 10 Nov 2025).
Implementation recommendations:
- Use language-agnostic square brackets as markers, and fine-tune MT systems for marker retention (Chen et al., 2022).
- Precompute offsets and batch GIoU computations for efficiency in geometric span alignment (Wang et al., 10 Nov 2025).
- Enforce one-to-one assignments in candidate-based annotation projection and filter noisy projections (García-Ferrero et al., 2022).
- Employ evidence-based filtering of axis-aligned spans to retain only high-preservation projections (Thiagarajan et al., 2017).
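The marker-based recommendation above can be sketched end-to-end: wrap the source span in square brackets, run the marked sentence through an MT system, then recover the target span from the marker positions. The MT call is elided here (the marked string is passed through unchanged), and all names are ours:

```python
import re

MARKED = re.compile(r"\[(.+?)\]")

def mark(tokens, i, j):
    """Wrap the inclusive token span [i, j] in square-bracket markers."""
    return " ".join(tokens[:i] + ["["] + tokens[i:j + 1] + ["]"] + tokens[j + 1:])

def extract_spans(translated: str):
    """Recover marker-delimited spans from the (machine-)translated text."""
    return [m.group(1).strip() for m in MARKED.finditer(translated)]

src = ["Barack", "Obama", "visited", "Paris"]
marked = mark(src, 0, 1)        # "[ Barack Obama ] visited Paris"
# A real pipeline would translate `marked` here; we pass it through as-is.
print(extract_spans(marked))    # ['Barack Obama']
```

In practice the fragile step is marker retention through translation, which is why the recommendation above pairs language-agnostic brackets with MT fine-tuning, and why fuzzy matching is used when a marker is dropped or moved.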
6. Extensions and Future Directions
Recent literature suggests further possibilities:
- Multi-View and Video Alignment: Geometric span projection could extend to multi-view or temporal alignment where projections are constrained across frames or viewpoints (Wang et al., 10 Nov 2025).
- Learnable Weight Schedules: Hierarchical Task Learning dynamically modulates loss weights according to convergence status of prerequisite heads, promoting stable optimization (Wang et al., 10 Nov 2025).
- Generalization Beyond Axis-Aligned Rectangles: Pursuit of convex-hull or higher-order geometric constraints may render projection alignment more robust in both detection and visualization settings (Wang et al., 10 Nov 2025).
- Adversarial and Adaptive Alignment: Exploration of adversarial generation for difficult span assignments or outlier cases may strengthen model robustness (Wang et al., 10 Nov 2025).
- Scaling Cross-Lingual Annotation: Fully automatic span-alignment methods using pre-trained generative models and translation probability filtering pose promising directions for multilingual corpus construction (García-Ferrero et al., 2022, Wu et al., 2023).
7. Relation to Broader Research and Impact
Aligned span projection bridges annotation transfer, unsupervised alignment, geometric consistency in detection, and interpretable scientific visualization. Its techniques inform dataset construction, model transferability, and diagnostic embeddings across computational linguistics, computer vision, and scientific analytics. Comparative empirical analyses establish that marker-based, generative, and geometric alignment methods consistently outperform traditional aligners, delivering high-quality transfer in low-resource scenarios and interpretable feature correlations in high-dimensional data (García-Ferrero et al., 2022, Chen et al., 2022, Thiagarajan et al., 2017, Wang et al., 10 Nov 2025, Wu et al., 2023).