- The paper introduces the Multi-Scale and Multi-Objective optimization (MSMO) framework for cross-lingual Aspect-Based Sentiment Analysis, achieving state-of-the-art results through multi-level feature alignment.
- Key techniques include adversarial training for sentence alignment, consistency training for aspect alignment, and utilizing code-switched data to improve feature robustness.
- The framework employs multi-objective optimization combining supervised and consistency training losses and can be further enhanced using knowledge distillation.
The paper "Multi-Scale and Multi-Objective Optimization for Cross-Lingual Aspect-Based Sentiment Analysis" (2502.13718) introduces a novel Multi-Scale and Multi-Objective optimization (MSMO) framework to enhance cross-lingual Aspect-Based Sentiment Analysis (ABSA). The framework addresses the limitations of existing methods by focusing on robust feature alignment and finer aspect-level alignment across languages.
MSMO Framework Architecture
The MSMO framework's architecture is composed of a feature extractor, a language discriminator, a consistency training module, and a sentiment classifier. The framework proceeds in two primary stages: sentence-level alignment using adversarial training and aspect-level alignment using multi-objective optimization.
Sentence-Level Alignment
Sentence-level alignment is achieved through adversarial training. A language discriminator is trained to distinguish source-language from target-language sentences. To improve robustness, code-switched bilingual sentences are introduced: aspect terms in source-language sentences are substituted with their target-language counterparts, and vice versa. The discriminator must identify a sentence's origin even when aspect terms have been swapped, which forces the feature extractor to learn language-invariant features. A gradient reversal layer connects the language discriminator to the encoder, so the encoder is updated in the direction that fools the discriminator. The objective of this stage is to minimize the Wasserstein distance between the feature distributions of the source and target languages. This can be expressed mathematically as:
$$\min_G \max_D W(p_r, p_g) = \mathbb{E}_{x \sim p_r}[D(x)] - \mathbb{E}_{z \sim p_z}[D(G(z))],$$
where $G$ is the feature extractor (generator), $D$ is the language discriminator, $p_r$ is the real data distribution, $p_g$ is the generated data distribution, and $W(p_r, p_g)$ is the Wasserstein distance.
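The gradient reversal layer connecting the discriminator to the encoder can be sketched as follows (a minimal PyTorch sketch; the paper's exact implementation may differ). The layer is the identity in the forward pass and negates, and optionally scales, gradients in the backward pass, so minimizing the discriminator's loss simultaneously pushes the encoder toward language-invariant features:

```python
import torch

class GradReverse(torch.autograd.Function):
    """Identity in the forward pass; negates and scales gradients going back."""
    @staticmethod
    def forward(ctx, x, lam):
        ctx.lam = lam
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_output):
        # Reverse the gradient flowing into the encoder; no gradient for lam.
        return -ctx.lam * grad_output, None

def grad_reverse(x, lam=1.0):
    return GradReverse.apply(x, lam)

# Demo: the gradient through the layer is negated.
x = torch.ones(3, requires_grad=True)
y = grad_reverse(x, lam=1.0).sum()
y.backward()
print(x.grad)  # tensor([-1., -1., -1.])
```

In a full model, the discriminator would sit behind `grad_reverse(encoder_output)`, so a single backward pass trains both adversaries.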
Aspect-Level Alignment
Aspect-level alignment focuses on finer-grained alignment. The pre-trained multilingual encoder, which has been updated during the sentence-level alignment stage, is used to extract features. These features are then input into both a sentiment classifier (for supervised training) and a consistency training module.
Supervised Training
The sentiment classifier is trained using the standard cross-entropy loss to predict the sentiment polarity of aspect terms. Given an aspect term a and its context c, the sentiment classifier predicts a sentiment label y. The cross-entropy loss is defined as:
$$\mathcal{L}_{CE} = -\sum_{i=1}^{N} y_i \log(\hat{y}_i),$$
where $y_i$ is the true sentiment label, $\hat{y}_i$ is the predicted sentiment probability, and $N$ is the number of training examples.
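As a concrete illustration, the cross-entropy loss can be computed directly from one-hot labels and predicted probabilities (a minimal NumPy sketch; the three classes negative/neutral/positive are illustrative):

```python
import numpy as np

def cross_entropy(y_true, y_pred, eps=1e-12):
    """L_CE = -sum_i y_i * log(y_hat_i), averaged over the batch."""
    return float(-np.mean(np.sum(y_true * np.log(y_pred + eps), axis=-1)))

# One example whose true sentiment is "positive" (class 2 of neg/neu/pos).
y_true = np.array([[0.0, 0.0, 1.0]])
y_pred = np.array([[0.1, 0.2, 0.7]])
print(cross_entropy(y_true, y_pred))  # -log(0.7) ≈ 0.3567
```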
Consistency Training
This module enforces consistent predictions for aspect terms that express the same sentiment across different languages. Transformations are applied to the input, such as translating the sentence or swapping aspect terms using the code-switched data, and the model is encouraged to produce the same prediction for the original and transformed inputs. KL divergence measures the discrepancy between the two predicted distributions, and this consistency loss is minimized. The consistency loss can be formulated as:
$$\mathcal{L}_{cons} = \mathrm{KL}\big(p(y \mid x) \,\|\, p(y \mid x')\big),$$
where $x$ is the original input, $x'$ is the transformed input, $p(y \mid x)$ is the predicted probability distribution for the original input, and $p(y \mid x')$ is the predicted probability distribution for the transformed input.
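The KL-based consistency term can be sketched as follows, using hypothetical sentiment distributions for an original sentence and its code-switched variant:

```python
import numpy as np

def kl_div(p, q, eps=1e-12):
    """KL(p || q) between two discrete probability distributions."""
    return float(np.sum(p * (np.log(p + eps) - np.log(q + eps))))

# Predicted sentiment distributions (neg/neu/pos) for the original input x
# and its transformed (e.g. code-switched) counterpart x'.
p_orig = np.array([0.7, 0.2, 0.1])
p_swap = np.array([0.6, 0.3, 0.1])
loss_cons = kl_div(p_orig, p_swap)
print(loss_cons)  # small positive value; 0 when the two distributions match
```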
Multi-Objective Optimization
The overall training objective combines the supervised training and consistency training losses. A weighted sum of these losses is used to optimize the model:
$$\mathcal{L}_{total} = \alpha \mathcal{L}_{CE} + (1-\alpha)\,\mathcal{L}_{cons},$$
where $\alpha$ is a hyperparameter that balances the contribution of the two losses.
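A minimal sketch of the weighted combination (the value of α and the loss values below are illustrative; the paper tunes α as a hyperparameter):

```python
def total_loss(l_ce, l_cons, alpha=0.7):
    """L_total = alpha * L_CE + (1 - alpha) * L_cons."""
    return alpha * l_ce + (1 - alpha) * l_cons

# With alpha = 0.7, supervised loss 0.36 and consistency loss 0.03:
print(total_loss(0.36, 0.03))  # 0.7*0.36 + 0.3*0.03 ≈ 0.261
```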
Key Techniques
Code-Switched Bilingual Sentences
The introduction of code-switched data is a critical technique. By swapping aspect terms between the source and target languages, the model is exposed to perturbations that force it to learn more robust and language-invariant features. This helps align the embedding spaces of the source and target languages, particularly around the anchor aspects.
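A toy sketch of the aspect-term substitution (the bilingual aspect dictionary and whitespace tokenization are hypothetical; real multi-word aspect terms would need span-level handling):

```python
def code_switch(tokens, aspect_map):
    """Swap single-token aspect terms using a bilingual aspect dictionary."""
    return [aspect_map.get(tok, tok) for tok in tokens]

# English sentence with the aspect term "battery" swapped into Spanish.
sentence = ["the", "battery", "lasts", "long"]
aspect_map = {"battery": "batería"}  # hypothetical English -> Spanish mapping
print(code_switch(sentence, aspect_map))  # ['the', 'batería', 'lasts', 'long']
```

The swapped sentence keeps its original label, so the same supervision signal now anchors the aspect term in both embedding spaces.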
Multi-Scale Alignment
The framework performs alignment at two scales: sentence-level (through adversarial training) and aspect-level (through consistency training). This multi-scale approach allows for a more comprehensive alignment of features across languages.
Distilled Target Language Knowledge
The paper explores knowledge distillation as a means to further improve performance. Unlabeled data in the target language is used to train a "student" model, guided by the predictions of a "teacher" model trained on labeled data. The paper examines single-teacher, multi-teacher, and multilingual distillation strategies. The teacher model is trained with the MSMO framework. The knowledge distillation loss can be defined as:
$$\mathcal{L}_{KD} = \sum_{x \in D_{unlabeled}} \mathrm{KL}\big(p_T(y \mid x) \,\|\, p_S(y \mid x)\big),$$
where $D_{unlabeled}$ is the unlabeled target language data, $p_T(y \mid x)$ is the prediction of the teacher model, and $p_S(y \mid x)$ is the prediction of the student model.
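The distillation term can be sketched as follows (the teacher and student distributions below are hypothetical placeholders for model outputs on unlabeled target-language sentences):

```python
import numpy as np

def kd_loss(teacher_probs, student_probs, eps=1e-12):
    """Sum of KL(p_T || p_S) over an unlabeled target-language batch."""
    log_ratio = np.log(teacher_probs + eps) - np.log(student_probs + eps)
    return float(np.sum(teacher_probs * log_ratio))

# Sentiment distributions (neg/neu/pos) for two unlabeled target sentences.
teacher = np.array([[0.8, 0.1, 0.1],
                    [0.2, 0.5, 0.3]])
student = np.array([[0.6, 0.2, 0.2],
                    [0.3, 0.4, 0.3]])
print(kd_loss(teacher, student))  # positive; shrinks as the student matches
```

Multi-teacher and multilingual variants would average or sum such terms over several teachers, but the per-teacher term has this same shape.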
In conclusion, the MSMO framework presents an effective method for cross-lingual ABSA. It combines adversarial training for sentence-level alignment, consistency training for aspect-level alignment, and multi-objective optimization. The use of code-switched data and knowledge distillation further improves the model's performance, achieving state-of-the-art results.