Learning a Structural Causal Model for Intuition Reasoning in Conversation
Published 28 May 2023 in cs.CL (arXiv:2305.17727v2)
Abstract: Reasoning, a crucial aspect of NLP research, has not been adequately addressed by prevailing models, including LLMs. Conversation reasoning, as a critical component of it, remains largely unexplored due to the absence of a well-designed cognitive model. In this paper, inspired by intuition theory on conversation cognition, we develop a conversation cognitive model (CCM) that explains how each utterance receives and activates channels of information recursively. Besides, we algebraically transformed CCM into a structural causal model (SCM) under some mild assumptions, rendering it compatible with various causal discovery methods. We further propose a probabilistic implementation of the SCM for utterance-level relation reasoning. By leveraging variational inference, it explores substitutes for implicit causes, addresses the issue of their unobservability, and reconstructs the causal representations of utterances through the evidence lower bounds. Moreover, we constructed synthetic and simulated datasets incorporating implicit causes and complete cause labels, alleviating the current situation where all available datasets are implicit-causes-agnostic. Extensive experiments demonstrate that our proposed method significantly outperforms existing methods on synthetic, simulated, and real-world datasets. Finally, we analyze the performance of CCM under latent confounders and propose theoretical ideas for addressing this currently unresolved issue.
The paper introduces a novel structural causal model that integrates a cognitive conversation model with deep learning to capture explicit utterances and latent mental states.
It transforms a psychological framework into an actionable SCM using causal inference assumptions and a VAE with GNNs to learn dialogue dynamics.
Experimental results show superior performance in cause extraction tasks with high interpretability and robust causal discriminability compared to existing baselines.
This paper tackles the challenge of intuitive reasoning in conversations, an area where even LLMs struggle due to the lack of robust cognitive models that capture the complex interplay of factors influencing dialogue. The authors propose a novel approach combining a cognitive model, a structural causal model (SCM), and a probabilistic deep learning implementation.
1. Conversation Cognitive Model (CCM)
Concept: Inspired by theories in psychology, linguistics, and sociology (particularly "common ground"), the CCM provides a conceptual framework for understanding dialogue flow.
Components: It posits that a speaker's Plan (intent) arises from their Perception of previous utterances and their internal Mental State (e.g., beliefs, emotions, memory). This Plan manifests as observable Utterance and Action. The generated Utterance then influences the subsequent Perception and Mental State of participants, driving the conversation forward recursively.
Purpose: The CCM serves as a theoretical foundation, unifying various factors known to influence conversation (context, emotion, speaker state) and explaining the inference processes in tasks like Emotion Recognition in Conversation (ERC) and Conversation Generation (CG).
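The recursive receive-and-activate loop described above can be sketched as a toy simulation. The scalar mental state, the fixed 0.5 mixing weights, and all names here are illustrative simplifications for intuition, not the paper's model:

```python
from dataclasses import dataclass


@dataclass
class Speaker:
    """Toy participant: the hidden mental state is a single scalar."""
    name: str
    mental_state: float = 0.0  # unobservable; only utterances are observed

    def perceive(self, utterance: float) -> None:
        # Perception of the previous utterance updates the mental state
        self.mental_state = 0.5 * self.mental_state + 0.5 * utterance

    def plan_and_utter(self) -> float:
        # The plan arises from perception + mental state; here, the state itself
        return self.mental_state


def simulate(turns: int) -> list:
    a, b = Speaker("A", mental_state=1.0), Speaker("B")
    utterances = [a.plan_and_utter()]          # A opens the conversation
    current, nxt = b, a
    for _ in range(turns - 1):
        current.perceive(utterances[-1])       # utterance activates perception
        utterances.append(current.plan_and_utter())
        current, nxt = nxt, current            # turn-taking
    return utterances


print(simulate(4))  # each utterance recursively shapes the next: [1.0, 0.5, 0.75, 0.625]
```

Each utterance is a deterministic function of the evolving hidden state, which is exactly the recursion the SCM below makes explicit.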
2. SCM Transformation
Goal: To make the CCM computationally tractable, it is algebraically transformed into a Structural Causal Model (SCM).
Process: This involves simplification using standard causal inference assumptions:
Treating observable context (Perception and observable aspects of Mental State) as direct influences on Utterance.
Treating unobservable Mental State components (unique experiences, desires) as exogenous variables (noise terms).
Resulting SCM: The simplified model represents utterances (U) as observable endogenous variables and unobservable mental states (E) as exogenous variables. Each utterance u_i is caused by a combination of previous utterances (explicit causes, Pa(u_i)) and its corresponding mental state (implicit cause, e_i).
Mathematically: u_i = f_i(Pa(u_i)) + e_i.
In linear matrix form for the utterance embeddings H: H = AH + E, where A is the causal strength matrix (the adjacency matrix of the causal graph) and E is the matrix of implicit causes. This implies H = (I − A)^(-1) E.
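The closed-form solution H = (I − A)^(-1) E can be verified numerically. A minimal NumPy sketch, where the dimensions and the strictly lower-triangular constraint on A (utterances caused only by earlier ones) are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)
N, D = 4, 3                       # 4 utterances, 3-dim embeddings

# Strictly lower-triangular A: u_i can only be caused by earlier u_j (j < i)
A = np.tril(rng.uniform(0.1, 0.5, size=(N, N)), k=-1)
E = rng.normal(size=(N, D))       # implicit causes, one row per utterance

# Solve the structural equations H = A H + E in closed form
H = np.linalg.inv(np.eye(N) - A) @ E

# Sanity check: H satisfies the structural equations row by row
assert np.allclose(H, A @ H + E)
```

The lower-triangular structure guarantees that (I − A) is invertible, so the system always has a unique solution.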
3. Probabilistic Implementation (VAE)
Challenge: The implicit causes E are unobservable.
Solution: The SCM is implemented within a Variational Autoencoder (VAE) framework to handle these latent variables.
Encoder (q_φ(z|H)): Takes the utterance embeddings H as input. It uses a Graph Attention Network (GAT) to learn the causal adjacency matrix A (representing causal strengths between utterances) and to infer the parameters of the posterior distribution q over the latent variables z (representing the implicit causes E).
```
# Conceptual Encoder Steps
# H_in: Input utterance embeddings (N x D)
A_learned, H_encoded = GAT_Encoder(H_in)  # GAT learns A and encodes H
z_params = MLP(H_encoded)                 # Predict parameters (mean, variance) for q(z|H)
z = sample_latent(z_params)               # Sample latent implicit causes
# A_learned: Learned causal adjacency matrix (N x N)
# z: Sampled latent variables (N x D_latent or similar)
```
Decoder (p_θ(H|z)): Takes the sampled latent variables z and the learned causal structure (via (I − A)^(-1)) as input. It uses a Graph Neural Network (GNN) to reconstruct the original utterance embeddings H.
```
# Conceptual Decoder Steps
# z: Sampled latent variables
# A_learned: Learned causal adjacency matrix from encoder
H_reconstructed = GNN_Decoder(z, A_learned)  # Reconstruct H using z and causal structure
```
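Putting the two halves together, the encoder-decoder round trip can be sketched in plain NumPy, with linear maps standing in for the GAT and GNN. All weights, functions, and scaling constants here are hypothetical stand-ins, not the paper's architecture:

```python
import numpy as np

rng = np.random.default_rng(1)
N, D = 4, 3
H = rng.normal(size=(N, D))                    # utterance embeddings
W_mu = rng.normal(size=(D, D)) * 0.1           # toy posterior-mean projection
W_lv = rng.normal(size=(D, D)) * 0.1           # toy posterior-log-variance projection


def encoder(H):
    # Stand-in for the GAT: causal strengths restricted to earlier utterances,
    # squashed to (0, 0.3) so that (I - A) stays well-conditioned
    A = np.tril(1.0 / (1.0 + np.exp(-(H @ H.T))), k=-1) * 0.3
    mu, log_var = H @ W_mu, H @ W_lv           # parameters of q(z | H)
    z = mu + np.exp(0.5 * log_var) * rng.normal(size=mu.shape)  # reparameterize
    return A, z, mu, log_var


def decoder(z, A):
    # Stand-in for the GNN decoder: H_rec = (I - A)^(-1) z
    return np.linalg.inv(np.eye(len(A)) - A) @ z


A, z, mu, log_var = encoder(H)
H_rec = decoder(z, A)
assert H_rec.shape == H.shape
```

The reparameterization trick (sampling z as mu + sigma * noise) is what lets gradients flow through the sampling step during training.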
Training Objective (ELBO): The model is trained by maximizing the Evidence Lower Bound (ELBO), which consists of:
Reconstruction Loss: Measures the difference between the original embeddings H and the reconstructed embeddings Ĥ (e.g., mean squared error, MSE).
KL Divergence: Regularizes the inferred posterior q_φ(z|H) toward a prior distribution p(z) (e.g., a standard Gaussian), encouraging a smooth latent space.
L_ELBO = E_{q_φ(z|H)}[ log p_θ(H|z) ] − KL( q_φ(z|H) || p(z) )
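For a Gaussian posterior q_φ(z|H) and a standard-normal prior, the KL term has a closed form, so the ELBO can be computed directly. A minimal sketch, with MSE standing in for the Gaussian log-likelihood up to constants:

```python
import numpy as np


def elbo(H, H_rec, mu, log_var):
    # Reconstruction term: Gaussian log-likelihood up to constants ~ -MSE
    recon = -np.mean((H - H_rec) ** 2)
    # Closed-form KL( N(mu, diag(exp(log_var))) || N(0, I) ), summed over elements
    kl = 0.5 * np.sum(np.exp(log_var) + mu ** 2 - 1.0 - log_var)
    return recon - kl


# Toy check: perfect reconstruction with a standard-normal posterior gives ELBO 0
H = np.zeros((4, 3))
assert elbo(H, H, np.zeros((4, 3)), np.zeros((4, 3))) == 0.0
```

Maximizing this quantity trades reconstruction fidelity against keeping the posterior close to the prior, exactly the two terms listed above.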
4. Datasets for Evaluation
Recognizing the lack of suitable data, the paper introduces two new datasets and adopts one real-world benchmark:
Synthetic Dataset: Generated using SCM principles (Causal Additive Models) with known ground truth for both explicit utterance relationships and implicit causes (mapped from emotion labels in a reference dataset). Statistics match the RECCON dataset.
Simulation Dataset: Generated using GPT-4, prompted to create 4-utterance dialogues following specific predefined causal chain structures (Chain_I to Chain_IV). Provides ground truth for explicit utterance-level causal links.
Real-World Dataset: Uses RECCON (Poria et al., 2021) for benchmarking, which contains partial labels for emotion-cause utterance pairs.
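The synthetic construction above could be sketched as follows: implicit causes drawn from emotion-dependent Gaussians, then utterance embeddings generated by a causal additive model over earlier utterances. The emotion set, noise scales, and tanh link are illustrative guesses, not the paper's exact recipe:

```python
import numpy as np

rng = np.random.default_rng(42)


def make_dialogue(n_utts=4, dim=3):
    # Implicit cause e_i drawn from an emotion-dependent Gaussian (toy mapping)
    emotion_means = {"joy": 1.0, "anger": -1.0, "neutral": 0.0}
    emotions = rng.choice(list(emotion_means), size=n_utts)
    E = np.stack([rng.normal(emotion_means[e], 0.5, size=dim) for e in emotions])

    # Causal additive model: h_i = sum_{j<i} a_ij * tanh(h_j) + e_i
    A = np.tril(rng.uniform(0.2, 0.8, size=(n_utts, n_utts)), k=-1)
    H = np.zeros((n_utts, dim))
    for i in range(n_utts):
        H[i] = A[i] @ np.tanh(H) + E[i]
    return H, A, E, emotions


H, A, E, emotions = make_dialogue()
assert H.shape == (4, 3)
```

Because the generator records A, E, and the emotion labels, such data carries exactly the ground truth (explicit links plus implicit causes) that real conversation datasets lack.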
5. Experiments and Results
Tasks:
Explicit Cause Extraction (ECE): Identifying U_j → U_i links. Measured by F1 score.
Implicit Cause Extraction (ICE): Evaluating sentiment consistency between the inferred implicit causes E_i and the utterances H_i (a proxy for interpretability). Measured by F1 score (target > 80).
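Scoring predicted cause links against gold (cause, effect) pairs reduces to set-based precision and recall. A generic sketch of the metric, not the paper's evaluation script:

```python
def cause_f1(pred_links, gold_links):
    """F1 over predicted (cause_idx, effect_idx) utterance pairs."""
    pred, gold = set(pred_links), set(gold_links)
    tp = len(pred & gold)          # correctly recovered links
    if tp == 0:
        return 0.0
    precision = tp / len(pred)
    recall = tp / len(gold)
    return 2 * precision * recall / (precision + recall)


# Example: gold says u1->u3 and u2->u3; prediction recovers one plus a spurious link
assert round(cause_f1([(1, 3), (0, 3)], [(1, 3), (2, 3)]), 2) == 0.5
```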
Key Findings:
The proposed model significantly outperforms baselines (RoBERTa, GNNs like EGAT/DECN/DAG-ERC/CAE, LLMs like GPT-3.5/GPT-4) on the ECE task across all datasets.
Achieves high F1 scores (>96) on the ICE task, indicating the inferred implicit causes are interpretable and align semantically with utterances.
Ablation studies confirm the importance of the VAE structure, the GNN components, and using the learned causal matrix A consistently.
Visualizations show the model learns meaningful latent representations for implicit causes and avoids learning shortcut causal links compared to baselines.
The model demonstrates better causal discriminability than baselines, being more robust in distinguishing true causes from confounders or reversed pairs, though still imperfect.
The SCM structure inherently helps mitigate confounding effects better than non-SCM approaches.
6. Discussion and Limitations
The paper discusses the problem of latent confounders (unobserved variables affecting multiple utterances), showing their SCM approach offers some mitigation but doesn't fully solve it.
Future work might involve causal intervention techniques (do-calculus) to address confounding more directly, requiring datasets designed for such analysis and potentially new methods for estimating interventional distributions.
The simulation dataset has limitations (only chain structures, potential GPT-4 biases).
In Summary:
This paper presents a principled approach to conversational reasoning by grounding a deep learning model in cognitive theory and causal modeling. It introduces the CCM for conceptual understanding, translates it to a computable SCM separating explicit (utterance) and implicit (mental state) causes, and implements this using a VAE with GNNs. The practical outcome is a model that demonstrably improves utterance-level causal reasoning compared to existing methods, offers interpretable insights into implicit factors, and provides a framework for more robust dialogue understanding. The introduction of new synthetic and simulation datasets also provides valuable resources for future research in this area.