
End-to-End Aspect-based Sentiment Analysis

Updated 18 December 2025
  • E2E-ABSA is a comprehensive task that jointly extracts aspect spans and classifies their sentiment, enabling detailed analysis of specific opinion targets.
  • It leverages methodologies such as unified sequence tagging, joint multi-task learning, MRC framing, dependency-aware networks, and generative models to optimize performance.
  • Advanced techniques like LLM-guided transfer and contrastive post-training enhance cross-domain robustness and address challenges like implicit and overlapping aspects.

End-to-End Aspect-based Sentiment Analysis (E2E-ABSA) refers to the joint extraction of aspect terms and the assignment of sentiment polarity to each aspect within unstructured text, typically at the sentence or review level. Unlike traditional sentiment analysis, which produces a single overall sentiment per document, E2E-ABSA delivers fine-grained aspect-level judgments, enabling more precise understanding of opinions on individual features (e.g., price, screen, battery in laptop reviews). This compound task has become central in natural language processing for applications such as product analytics, social media monitoring, and customer feedback mining.

1. Formal Task Definition and Taxonomy

E2E-ABSA requires the simultaneous solution of two subproblems: (1) aspect term extraction—identifying all spans corresponding to product/service features, and (2) aspect-level sentiment classification—predicting one of the sentiment polarities (typically “positive”, “negative”, or “neutral”) for each extracted span. Let s = (x_1, \dots, x_n) be an input sentence; the output is a set of pairs

Y = \{(a_1, p_1), \dots, (a_m, p_m)\}

with each a_j a token span in s and p_j \in \{\text{POS}, \text{NEG}, \text{NEU}\} (Zhang et al., 2022). This task is the simplest compound formulation in the spectrum of ABSA challenges, positioned between single-element tasks (aspect term extraction or sentiment classification) and richer relation extraction settings (e.g., triplets or quadruples involving aspects, opinions, categories, and sentiment) (Zhang et al., 2022, Cai et al., 2023).
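The task formulation above can be made concrete with a minimal data representation; the sentence and spans below are illustrative examples, not drawn from any benchmark.

```python
# Minimal E2E-ABSA data representation: a tokenized sentence paired with
# the set of (aspect span, polarity) tuples it should yield.
sentence = ["The", "battery", "life", "is", "great", "but", "the",
            "screen", "is", "dim"]

# Each pair: (start, end) token indices (inclusive) plus a polarity label.
gold_pairs = {
    ((1, 2), "POS"),   # "battery life" -> positive
    ((7, 7), "NEG"),   # "screen"       -> negative
}

def aspect_text(tokens, span):
    """Recover the surface form of an aspect span from its indices."""
    start, end = span
    return " ".join(tokens[start:end + 1])
```

A system is evaluated on whether it recovers exactly this set of pairs from the raw sentence.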

2. Principal Modeling Paradigms

E2E-ABSA has been addressed through several distinctive neural and pre-neural paradigms. The dominant strategies are:

  • Unified Sequence Tagging: The input sequence ss is mapped to a tagged sequence using a single or joint tag set incorporating both aspect span boundaries and polarity information. A canonical tag set is

\mathcal{Y} = \{\text{B-POS}, \text{I-POS}, \dots, \text{S-NEU}, \text{O}\}

so that each token label encodes both boundary position and sentiment (Li et al., 2019, Zhang et al., 2022). Training minimizes token-level cross-entropy, or the negative log-likelihood of a CRF layer (Yadav, 2020).

  • Joint Multi-task Learning: A shared encoder feeds two output heads, one for boundary spans and one for per-token or per-span polarity prediction, typically optimized with a weighted joint loss (He et al., 2019, Liang et al., 2020). Inter-task message passing (e.g., via latent variable sharing or re-encoding) improves information flow (He et al., 2019, Liang et al., 2020).
  • Machine Reading Comprehension (MRC) Framing: Dual-MRC approaches formulate aspect extraction and sentiment classification as question-answering problems—predict all aspect spans with a fixed query, then, for each span, predict sentiment and (optionally) associated opinion terms, sharing BERT parameters across sub-problems (Mao et al., 2021).
  • Dependency-Aware and Graph Neural Networks: Syntactic structures are used to propagate cues, with graph convolution over dependency relations and explicit modeling of relation types to improve extraction and sentiment consistency (Liang et al., 2020).
  • Generative / Seq2Seq Models: Recent work has unified ABSA subtasks as sequence generation problems using pre-trained sequence-to-sequence architectures (BART, T5). All aspect–sentiment pairs are linearized as pointer indices or verbalized strings and generated directly, simplifying output structures and supporting multi-element extraction (Yan et al., 2021, Chebolu et al., 2021, Cai et al., 2023).
  • Pipeline Baselines and LLM Augmentation: Early systems decoupled aspect and polarity in a two-stage pipeline; newer pipelines enrich candidate selection via LLMs (e.g., LLaMA front end) to bridge cross-domain gaps, with a frozen backbone for efficient transfer (Ghosh et al., 15 Jan 2025).
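The unified tagging paradigm can be sketched by mapping gold (span, polarity) pairs onto a BIOES-with-polarity tag sequence; the encoder below is a minimal illustration of the scheme, not any particular paper's implementation.

```python
def encode_bioes(n_tokens, pairs):
    """Map (span, polarity) pairs to a unified BIOES-polarity tag sequence.

    Single-token aspects get S-<pol>; multi-token aspects get
    B-<pol>, zero or more I-<pol>, then E-<pol>; all remaining
    tokens get O.  Spans are assumed non-overlapping.
    """
    tags = ["O"] * n_tokens
    for (start, end), pol in pairs:
        if start == end:
            tags[start] = f"S-{pol}"
        else:
            tags[start] = f"B-{pol}"
            for i in range(start + 1, end):
                tags[i] = f"I-{pol}"
            tags[end] = f"E-{pol}"
    return tags
```

With this encoding, E2E-ABSA reduces to ordinary per-token sequence labeling over the joint tag set.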

3. Training, Inference, and Architectural Details

Pre-trained language models (PLMs) and their fine-tuning have enabled high-accuracy E2E-ABSA without extensive architectural innovation. For unified tagging, BERT fine-tuned with a linear tagging head achieves state-of-the-art results, outperforming BiLSTM-CRF and earlier hybrids (Li et al., 2019). Training typically minimizes global token-wise cross-entropy, with domain adaptation and multi-task objectives integrated through joint loss functions

L_{\text{total}} = \alpha L_{\text{aspect}} + \beta L_{\text{sentiment}}

for label-weighted tasks (Ghosh et al., 15 Jan 2025). Dual-MRC models train on paired question–context input representations; generative models treat tuples as target sequences and optimize decoder negative log-likelihood without auxiliary losses (Mao et al., 2021, Yan et al., 2021).
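The token-wise cross-entropy and weighted joint loss described above can be written out in a few lines; this is a plain-Python sketch of the objectives, independent of any framework.

```python
import math

def token_cross_entropy(probs, gold_indices):
    """Mean negative log-likelihood of the gold tag at each token.

    probs: per-token probability distributions over the tag set.
    gold_indices: index of the gold tag for each token.
    """
    nll = -sum(math.log(p[g]) for p, g in zip(probs, gold_indices))
    return nll / len(gold_indices)

def joint_loss(l_aspect, l_sentiment, alpha=1.0, beta=1.0):
    """Weighted joint objective: L_total = alpha*L_aspect + beta*L_sentiment."""
    return alpha * l_aspect + beta * l_sentiment
```

In practice the same computation runs over framework tensors, but the weighting logic is exactly this linear combination.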

Inference (for unified tagging or dual-task models) involves extracting span boundaries aligned with consistent polarity labels; in generation-based approaches, post-processing translates outputs into tuple sets, applying pointer de-tokenization or template-based parsing as appropriate (Chebolu et al., 2021, Yan et al., 2021).
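For unified tagging, the inference step described above is a greedy decode from tags back to (span, polarity) pairs; the routine below is one common heuristic, not a prescribed algorithm.

```python
def decode_unified_tags(tags):
    """Extract (span, polarity) pairs from a unified BIOES-polarity sequence.

    Greedy left-to-right decoding: B-x opens a span closed by E-x (or by
    the next O/B/S tag for BIO-style output); S-x emits a single-token
    span; polarity is taken from the opening tag.
    """
    pairs, start, pol = [], None, None
    for i, tag in enumerate(tags):
        if tag == "O":
            if start is not None:
                pairs.append(((start, i - 1), pol))
                start, pol = None, None
            continue
        prefix, label = tag.split("-", 1)
        if prefix == "S":
            if start is not None:
                pairs.append(((start, i - 1), pol))
                start, pol = None, None
            pairs.append(((i, i), label))
        elif prefix == "B":
            if start is not None:
                pairs.append(((start, i - 1), pol))
            start, pol = i, label
        elif prefix == "E" and start is not None:
            pairs.append(((start, i), pol))
            start, pol = None, None
        # I-x simply extends the currently open span
    if start is not None:
        pairs.append(((start, len(tags) - 1), pol))
    return pairs
```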

4. Datasets, Metrics, and Baseline Results

SemEval ABSA tasks (2014–2016) for laptops and restaurants are the principal benchmarks, with thousands of annotated review sentences and aspect spans (Ghosh et al., 15 Jan 2025, Zhang et al., 2022). Recent large-scale datasets (MEMD-ABSA) annotate aspect–category–opinion–sentiment quadruples across five domains, with explicit and implicit span marking (Cai et al., 2023). Evaluation relies on exact-match F1 over aspect–sentiment pairs: a predicted tuple counts as correct only if both span and polarity match the gold annotation, and precision P and recall R combine as

F_1 = \frac{2 \cdot P \cdot R}{P + R}

Macro-averaged F1 across categories is common for domain-heterogeneous datasets (Shu et al., 2022).
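The exact-match metric is a straightforward set computation over predicted and gold tuples, as the short sketch below shows.

```python
def pair_f1(pred, gold):
    """Exact-match precision, recall, and F1 over (span, polarity) tuples.

    A predicted tuple is a true positive only if both its span indices
    and its polarity label match a gold tuple exactly.
    """
    pred, gold = set(pred), set(gold)
    tp = len(pred & gold)
    p = tp / len(pred) if pred else 0.0
    r = tp / len(gold) if gold else 0.0
    f1 = 2 * p * r / (p + r) if p + r > 0 else 0.0
    return p, r, f1
```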

Baseline performance spans:

| Model/Setting | Dataset | Aspect+Polarity F1 (%) / Acc. (%) |
|---|---|---|
| BERT+LLM pipeline | SemEval-15 | 92.1 (Laptop), 91.4 (Restaurant) |
| BERT+linear tagging | SemEval-14 | 60.4 (Laptop), 73.2 (Restaurant) |
| T5-Seq2Seq | MEMD | 83.1 (AE F1, Laptop), 76.0 (ASPE F1) |
| MRC-dual BERT | WangPDX17 | 76.6 (AESC F1), 74.9 (Pair F1) |

The BERT+LLM pipeline achieves up to 92% accuracy on SemEval-2015 cross-domain benchmarks without requiring BERT re-training per target domain, markedly outperforming prior memory-network baselines and vanilla BERT fine-tuning by over 10% absolute (Ghosh et al., 15 Jan 2025). Generative models (BART, T5) excel in both in-domain extraction and cross-domain resilience, especially for complex and implicit-structure benchmarks (MEMD-ABSA, ACOS) (Cai et al., 2023).

5. Cross-Domain, Zero-Shot, and Transfer Techniques

Robust transfer across domains is a persistent challenge. Methods include:

  • LLM-guided transfer: An LLM (e.g., LLaMA) is prompted with external schemata to surface aspect candidates in the target domain, plugging terminology gaps before BERT processing. The system, trained once on source domains, can generalize directly, requiring only inference-time LLM queries—demonstrated by 90–92% accuracy when transferring between SemEval-2015 domains (Ghosh et al., 15 Jan 2025).
  • Contrastive post-training for zero-shot: Unified NLI formulations augmented by contrastive learning on review-based hypotheses (CORN) enable fully zero-shot E2E-ABSA, where aspect extraction and sentiment classification are both cast as NLI problems and handled by a BART encoder (Shu et al., 2022).
  • Multi-domain/Implicit Element Coverage: Datasets such as MEMD-ABSA enable evaluation of cross-domain and implicit aspect/opinion extraction. Transfer performance typically degrades by 10–40 F1 points (in AOS tasks) but partial recovery via multi-source mixing is possible (Cai et al., 2023).
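The NLI recasting above hinges on hypothesis templates: for each candidate aspect, one hypothesis per polarity is scored for entailment against the review, and the best-scoring polarity wins. The templates and polarity words below are illustrative, not the exact hypotheses used by CORN.

```python
# Illustrative polarity verbalizations for NLI-style zero-shot ABSA
# (assumed wording, not CORN's published templates).
POLARITY_WORDS = {"POS": "good", "NEG": "bad", "NEU": "acceptable"}

def build_hypotheses(aspect):
    """Build one NLI hypothesis string per candidate polarity.

    With a real NLI model, each hypothesis would be scored for
    entailment against the review sentence as the premise.
    """
    return {pol: f"The {aspect} is {word}."
            for pol, word in POLARITY_WORDS.items()}
```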

6. Limitations, Open Problems, and Future Directions

Several active challenges structure current research on E2E-ABSA:

  • Implicit and Overlapping Structures: Most systems handle only explicit, non-overlapping spans, but up to 40% of annotated quadruples in recent datasets contain implicit aspects or opinions, reducing F1 by 5–10 points and demanding richer semantic modeling (Cai et al., 2023).
  • Cross-domain Robustness: Despite LLM augmentation, models are commonly validated on laptops/restaurants; generalization to open domain (e.g., clothing, books, hotels) remains an open gap. Explicit adversarial/contrastive objectives and meta-learning architectures are under-explored (Ghosh et al., 15 Jan 2025, Cai et al., 2023).
  • Pipeline versus Joint Optimization: Many workable systems still retain pipelined phase order (aspect first, then sentiment), leading to error propagation and suboptimal consistency. Fully joint or sequence-to-sequence structures (pointer generation, text-to-text) alleviate some coupling issues but depend on data scale and careful decoding (Yan et al., 2021).
  • Annotation Cost and Efficiency: Unified E2E-ABSA evaluation requires annotation of full span+polarity pairings. Data collection, especially with implicit element marking, remains labor-intensive, motivating semi-supervised, weakly supervised, or cross-lingual transfer techniques (Zhang et al., 2022, Yadav, 2020).
  • Model Size and Interpretability: PLMs and seq2seq architectures dominate in accuracy, but their efficiency for deployment and interpretability for practical use are still insufficiently addressed. Adapter modules, model distillation, and graph-based explainability are promising but not widely adopted (Zhang et al., 2022, Liang et al., 2020).

7. Synthesis and State of the Art

E2E-ABSA has matured from feed-forward or BiLSTM-based pipelines to joint neural, dependency-augmented, and generative architectures, nearly always leveraging pre-trained language models. Empirical results demonstrate that BERT-based models, with or without LLM augmentation or dual-MRC prompting, deliver high performance in both in-domain and cross-domain settings (e.g., 92% on SemEval-2015, >76% F1 on more complex multi-domain data) (Ghosh et al., 15 Jan 2025, Cai et al., 2023). Generative text-to-text reformulations align all ABSA subtasks, including E2E-ABSA, under a common framework, supporting unified modeling and transfer (Chebolu et al., 2021, Yan et al., 2021). Despite these advances, robust cross-domain generalization, implicit/complex structure coverage, and efficient, explainable deployment remain open and active areas of research (Cai et al., 2023, Zhang et al., 2022).
