BiGEL: Behavior-Informed Graph Embedding

Updated 19 January 2026

The paper introduces BiGEL, a novel graph framework that integrates target and auxiliary behaviors through feedback-driven optimization.
It leverages Cascading Gated Feedback (CGF) along with Global Context Enhancement and Contrastive Preference Alignment to balance and refine multi-task learning.
Empirical results show that BiGEL achieves superior performance on both target and auxiliary recommendation tasks compared to conventional models.

Behavior-Informed Graph Embedding Learning (BiGEL) is a graph-based framework for multi-behavior multi-task recommendation, introducing an architecture that integrates feedback from target behaviors to auxiliary ones, augmenting recommendation quality across all behaviors. BiGEL's distinguishing feature is its feedback-driven optimization, built atop a cascading graph paradigm and enhanced via three core modules: Cascading Gated Feedback (CGF), Global Context Enhancement (GCE), and Contrastive Preference Alignment (CPA). This design addresses the limitations of prior multi-behavior recommendation models, particularly their neglect of auxiliary behavior performance in multi-task settings (Lai et al., 12 Jan 2026).

1. Background and Motivation

The multi-behavior recommendation (MBR) problem leverages multiple interaction types—such as click, favorite, and purchase—on user-item bipartite graphs to improve downstream recommendation accuracy. Traditional approaches typically adopt a cascading graph structure, modeling auxiliary behaviors to inform the prediction of target (e.g., purchase) behaviors. However, these models optimize for the target behavior, often at the expense of under-training auxiliary behaviors, reducing overall system robustness in multi-task contexts. The need for balanced, task-aware optimization led to the proposal of BiGEL as a unified architecture for multi-behavior multi-task recommendation (MMR) (Lai et al., 12 Jan 2026).

2. Architectural Overview of BiGEL

The BiGEL model consists of three sequential modules applied after graph structure-based feature extraction:

Cascading Graph Learning (CGL): Generates initial, behavior-specific node embeddings $\mathbf{e}_u^{(b_k)}$ , $\mathbf{e}_i^{(b_k)}$ for each of $K$ behaviors through a cascaded graph convolutional update.
Cascading Gated Feedback (CGF): Refines auxiliary behavior embeddings by integrating feedback from the target behavior using learned, behavior-specific gates.
Global Context Enhancement (GCE): Injects global user preferences into behavior-aware embeddings, alleviating overfitting to isolated behaviors.
Contrastive Preference Alignment (CPA): Aligns target behavior preferences to the global preference distribution via contrastive learning, mitigating preference drift from the cascading process.

Information flows from raw graph embeddings through CGL, into CGF where behavioral optimization occurs, then into GCE and CPA which reinforce and align embedding semantics. The process supports the generation of high-quality personalized lists for each behavioral task (Lai et al., 12 Jan 2026).

3. Cascading Gated Feedback Module

The CGF module is the central mechanism for feedback-driven embedding refinement in BiGEL. For each auxiliary behavior $b_k$ $(1 \leq k < K)$ , CGF receives:

Behavior-specific user/item embeddings $\mathbf{e}_u^{(b_k)}$ and $\mathbf{e}_i^{(b_k)}$ (from CGL)
Target-behavior embeddings $\mathbf{e}_u^{(b_K)}$ and $\mathbf{e}_i^{(b_K)}$

Transformation proceeds as follows for a $d$ -dimensional embedding:

Projection and Activation:

$h_u^{(b_k)} = \mathrm{LeakyReLU}\bigl(W_1^{(b_k)} \mathbf{e}_u^{(b_k)} + b_1^{(b_k)}\bigr) \in \mathbb{R}^d$

Gate Computation:

$g_u^{(b_k)} = \sigma\bigl(W_2^{(b_k)} h_u^{(b_k)} + b_2^{(b_k)}\bigr) \in \mathbb{R}^d$

where $\sigma$ is the element-wise sigmoid.

Feedback Integration:

$\widetilde{\mathbf{e}_u^{(b_k)}} = \mathbf{e}_u^{(b_k)} + g_u^{(b_k)} \otimes \mathbf{e}_u^{(b_K)}$

with $\otimes$ as the Hadamard (element-wise) product. The same applies to items.

All $W_1^{(b_k)}$ , $W_2^{(b_k)}$ (matrices of $\mathbb{R}^{d\times d}$ ) and $b_1^{(b_k)}$ , $b_2^{(b_k)}$ (vectors of $\mathbb{R}^d$ ) are jointly learned. The output $\widetilde{\mathbf{e}_u^{(b_k)}}$ (and item counterpart) supplant the original embeddings for all downstream modules.

CGF ensures that target behavior signals (such as purchases) are propagated back, correcting and denoising auxiliary behaviors (e.g., clicks, favorites), counteracting the unidirectional bias of prior cascading schemes. Empirical ablations demonstrate superior performance on both target and auxiliary tasks (Lai et al., 12 Jan 2026).

4. Global Context Enhancement and Contrastive Alignment

Following CGF, BiGEL employs:

GCE: This module aggregates embeddings across all behaviors into a unified representation, integrating this global context with each behavior-specific embedding. This mitigates the loss of key user preferences due to over-specialization in individual behavior graphs.
CPA: The CPA module leverages a contrastive loss to align the distribution of the target behavior embeddings $\mathbf{e}_u^{(b_K)}$ with the global embedding distribution, encouraging consistency and discouraging preference drift through the cascade.

Both modules promote stability and generalization in multi-task settings, crucial for robust recommendation when user preference signals are distributed across disjoint interaction behaviors (Lai et al., 12 Jan 2026).

5. Training and Optimization in BiGEL

BiGEL is trained end-to-end via backpropagation. The loss function is a combination of:

Task-specific Bayesian Personalized Ranking (BPR) losses for each behavioral task
The contrastive alignment loss (CPA) to enforce alignment between target and global preferences

CGF does not introduce additional explicit loss terms; its parameters are optimized via gradient flow from the downstream multi-task and contrastive losses. The LeakyReLU activation in CGF typically uses a fixed negative slope (commonly 0.01). Sigmoid gates enable element-wise feedback control, producing weights in $(0,1)$ for selective target signal integration (Lai et al., 12 Jan 2026).

6. Empirical Performance and Comparative Analysis

BiGEL demonstrates superior performance relative to ten competitive methods across two real-world datasets in multi-behavior multi-task recommendation scenarios. Ablation studies confirm that bidirectional information flow—enabled by CGF—substantially enhances auxiliary task performance without sacrificing target behavior accuracy. This suggests that feedback-driven refinement is critical for balanced multi-task recommendation (Lai et al., 12 Jan 2026).

Module	Role in BiGEL	Methodological Basis
CGL	Cascading graph embedding generation	Multi-behavior GCN
CGF	Feedback-driven behavior refinement	Gated integration of target signals
GCE	Global context injection	Aggregation and fusion
CPA	Preference alignment (contrastive)	Metric learning/contrastive loss

7. Relationship to Cascading Gated Feedback Across Domains

The CGF module in BiGEL extends a family of "gated feedback" mechanisms originally proposed for dense semantic label refinement (Islam et al., 2018), deep recurrent architectures (Chung et al., 2015), super-resolution networks (Li et al., 2019), and biochemical switching/oscillation motifs (Ehrmann et al., 2019). In all such systems, cascading gated feedback provides input-dependent, learnable routing of information between layers, across time, or across tasks, enabling selective integration and robust multi-stage optimization. The adoption of CGF in BiGEL exemplifies the transfer of this meta-architectural motif from vision and recurrent domains to graph-based multi-task recommendation (Lai et al., 12 Jan 2026).