Sequential Matching Network: A New Architecture for Multi-turn Response Selection in Retrieval-based Chatbots

Published 6 Dec 2016 in cs.CL | (1612.01627v2)

Abstract: We study response selection for multi-turn conversation in retrieval-based chatbots. Existing work either concatenates utterances in context or matches a response with a highly abstract context vector finally, which may lose relationships among utterances or important contextual information. We propose a sequential matching network (SMN) to address both problems. SMN first matches a response with each utterance in the context on multiple levels of granularity, and distills important matching information from each pair as a vector with convolution and pooling operations. The vectors are then accumulated in a chronological order through a recurrent neural network (RNN) which models relationships among utterances. The final matching score is calculated with the hidden states of the RNN. An empirical study on two public data sets shows that SMN can significantly outperform state-of-the-art methods for response selection in multi-turn conversation.

Citations (488)

Summary

The paper introduces SMN, which decomposes context-response matching into utterance-level pairings to effectively preserve conversational context.
The model employs GRU and convolution operations to extract and accumulate matching vectors from individual utterance-response pairs.
Empirical tests on Ubuntu and Douban datasets show SMN outperforms baselines with significant improvements in key response selection metrics.

The paper presents the Sequential Matching Network (SMN), an architecture for multi-turn response selection in retrieval-based chatbots. Existing methods either concatenate the context utterances or compress the context into a single highly abstract vector. SMN is instead designed to preserve the relationships among utterances and the important contextual information.

Problem Addressed

Existing retrieval-based chatbot systems either concatenate the utterances in a context or match the response against a single highly abstract context vector, and both approaches can lose the relationships among utterances. SMN addresses this by decomposing context-response matching into separate utterance-response matchings and then integrating them with a Recurrent Neural Network (RNN) that models the chronological dependencies among utterances.
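A toy illustration of the difference, assuming a hypothetical word-overlap (Jaccard) score as a stand-in for SMN's learned matcher: matching against the concatenated context yields one number, while matching each utterance separately keeps a chronologically ordered list of scores that a sequential model can then relate.

```python
def tokens(text):
    """Lower-cased bag of words (a set); a deliberately crude representation."""
    return set(text.lower().split())


def overlap(a, b):
    """Jaccard overlap between two token sets; 0.0 if both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def concat_match(context, response):
    # Baseline style: one score for the whole concatenated context,
    # so it is no longer visible which utterance the response relates to.
    return overlap(tokens(" ".join(context)), tokens(response))


def pairwise_match(context, response):
    # SMN style: one score per utterance-response pair, kept in
    # chronological order for a later sequential model (an RNN in SMN).
    return [overlap(tokens(u), tokens(response)) for u in context]


context = [
    "how do i mount a usb drive",
    "did you try the file manager",
    "it says permission denied",
]
response = "run it with sudo to fix the permission denied error"

print(round(concat_match(context, response), 3))
print([round(s, 3) for s in pairwise_match(context, response)])
```

Here the per-utterance scores show that the response mainly relates to the last utterance, which the single concatenated score cannot reveal.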

Architectural Overview

The SMN architecture consists of three primary layers:

Utterance-Response Pair Matching: The model matches each response candidate against each utterance in the context at two levels of granularity: the word level, using word embeddings, and the segment level, using the hidden states of a Gated Recurrent Unit (GRU). Convolution and pooling operations then distill the important matching information from the resulting similarity matrices into one matching vector per utterance-response pair.
Sequential Accumulation: The matching vectors are fed to a second GRU in the chronological order of the utterances, so that the accumulated state can capture the dependencies and relationships among context utterances.
Final Matching Score Computation: The hidden states of this GRU are passed to a logistic (softmax) output layer, which produces the final context-response matching score.
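A minimal pure-Python sketch of the first layer for a single utterance-response pair. It assumes the word embeddings and the GRU hidden states are already computed and passed in as plain lists, and it models the segment-level similarity as a bilinear form h_u^T A h_r, as we understand the paper's description. The matrix A, the 2x2 filters, the 2x2 pooling window, and the toy inputs are hypothetical, and the learned linear projection SMN applies to the pooled features is omitted.

```python
def dot(x, y):
    return sum(a * b for a, b in zip(x, y))


def matvec(m, v):
    return [dot(row, v) for row in m]


def word_matrix(emb_u, emb_r):
    # M1[i][j] = e_u[i] . e_r[j]: word-level similarity
    return [[dot(eu, er) for er in emb_r] for eu in emb_u]


def segment_matrix(hid_u, hid_r, A):
    # M2[i][j] = h_u[i]^T A h_r[j]: segment-level similarity on GRU states
    return [[dot(hu, matvec(A, hr)) for hr in hid_r] for hu in hid_u]


def conv2d_relu(channels, kernels, bias=0.0):
    # "Valid" 2-D cross-correlation summed over input channels, then ReLU.
    rows, cols = len(channels[0]), len(channels[0][0])
    kh, kw = len(kernels[0]), len(kernels[0][0])
    out = []
    for i in range(rows - kh + 1):
        row = []
        for j in range(cols - kw + 1):
            s = bias
            for ch, k in zip(channels, kernels):
                for a in range(kh):
                    for b in range(kw):
                        s += ch[i + a][j + b] * k[a][b]
            row.append(max(0.0, s))
        out.append(row)
    return out


def max_pool(fmap, size=2):
    # Non-overlapping max pooling; leftover rows/columns are dropped.
    return [[max(fmap[i + a][j + b] for a in range(size) for b in range(size))
             for j in range(0, len(fmap[0]) - size + 1, size)]
            for i in range(0, len(fmap) - size + 1, size)]


def matching_vector(emb_u, emb_r, hid_u, hid_r, A, kernels):
    # Stack M1 and M2 as two input channels, convolve, pool, and flatten.
    m1 = word_matrix(emb_u, emb_r)
    m2 = segment_matrix(hid_u, hid_r, A)
    pooled = max_pool(conv2d_relu([m1, m2], kernels))
    return [x for row in pooled for x in row]


identity = [[1.0, 0.0], [0.0, 1.0]]
emb = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # toy 3-word sequence, 2-dim vectors
# Toy call: the same vectors stand in for both embeddings and hidden states.
v = matching_vector(emb, emb, emb, emb, A=identity, kernels=[identity, identity])
print(v)
```

With realistic sentence lengths the pooled map has many cells, so the flattened vector is long; that is why a learned projection to a lower dimension would normally follow.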
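The accumulation and scoring layers can be sketched the same way. This assumes the simplest variant described in the paper (the one that keeps only the final GRU hidden state); the random weights, the dimensions, the omitted biases, and the convention that softmax index 1 means "match" are illustrative assumptions, not trained parameters.

```python
import math
import random


def matvec(m, v):
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def vadd(*vs):
    # Element-wise sum of equally long vectors.
    return [sum(xs) for xs in zip(*vs)]


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def rand_matrix(rows, cols, rng):
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


class GRUCell:
    """Minimal GRU cell, biases omitted: h = z * h_tilde + (1 - z) * h_prev."""

    def __init__(self, in_dim, hid_dim, rng):
        self.hid_dim = hid_dim
        self.Wz, self.Uz = rand_matrix(hid_dim, in_dim, rng), rand_matrix(hid_dim, hid_dim, rng)
        self.Wr, self.Ur = rand_matrix(hid_dim, in_dim, rng), rand_matrix(hid_dim, hid_dim, rng)
        self.Wh, self.Uh = rand_matrix(hid_dim, in_dim, rng), rand_matrix(hid_dim, hid_dim, rng)

    def step(self, x, h):
        z = [sigmoid(a) for a in vadd(matvec(self.Wz, x), matvec(self.Uz, h))]  # update gate
        r = [sigmoid(a) for a in vadd(matvec(self.Wr, x), matvec(self.Ur, h))]  # reset gate
        rh = [ri * hi for ri, hi in zip(r, h)]
        h_tilde = [math.tanh(a) for a in vadd(matvec(self.Wh, x), matvec(self.Uh, rh))]
        return [zi * ht + (1.0 - zi) * hp for zi, ht, hp in zip(z, h_tilde, h)]


def smn_last_score(matching_vectors, cell, w_out, b_out):
    """Accumulate per-utterance matching vectors in chronological order and
    score the pair from the last hidden state with a two-way softmax."""
    h = [0.0] * cell.hid_dim
    for v in matching_vectors:
        h = cell.step(v, h)
    logits = vadd(matvec(w_out, h), b_out)
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    return exps[1] / sum(exps)  # index 1 read as "response matches context"


rng = random.Random(0)
cell = GRUCell(in_dim=3, hid_dim=4, rng=rng)
w_out, b_out = rand_matrix(2, 4, rng), [0.0, 0.0]
vectors = [[0.1, 0.0, 0.3], [0.0, 0.2, 0.1], [0.9, 0.7, 0.8]]  # one per utterance
print(smn_last_score(vectors, cell, w_out, b_out))
```

Because the GRU reads the vectors in order, presenting the same matching vectors in a different order generally yields a different score; this is how the layer can take chronology into account.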

Empirical Evaluation

SMN was evaluated on two datasets: the Ubuntu Dialogue Corpus and the Douban Conversation Corpus, a new dataset released with the paper. The key results include:

On the Ubuntu dataset, SMN outperformed the best existing models by more than 6% on the R10@1 metric.
On the Douban dataset, whose multi-turn test conversations come with human-labeled response candidates, SMN improved R10@1 by 3% and P@1 by 4%, suggesting that its gains carry over to a different conversational setting.
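The ranking metrics above can be computed per context and then averaged. This sketch assumes each context comes with n scored candidates and binary labels; R_n@k is taken as the fraction of positive candidates ranked in the top k (a simple hit or miss when exactly one candidate is positive, as in Ubuntu), and P@1 as whether the top-ranked candidate is positive. The toy scores and labels are made up.

```python
def rank(scores):
    """Candidate indices sorted by descending score (ties keep input order)."""
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)


def recall_at_k(scores, labels, k):
    # R_n@k with n = len(scores): share of positives found in the top k.
    positives = sum(labels)
    if positives == 0:
        raise ValueError("R_n@k is undefined for a context with no positive candidate")
    return sum(labels[i] for i in rank(scores)[:k]) / positives


def precision_at_1(scores, labels):
    # 1.0 if the highest-scored candidate is labeled positive, else 0.0.
    return float(labels[rank(scores)[0]])


def mean(values):
    return sum(values) / len(values)


# Two toy contexts with 10 scored candidates each (label 1 = appropriate response).
ubuntu_like = ([0.9, 0.1, 0.3, 0.2, 0.05, 0.4, 0.15, 0.25, 0.35, 0.12],
               [1, 0, 0, 0, 0, 0, 0, 0, 0, 0])  # exactly one positive
douban_like = ([0.2, 0.8, 0.6, 0.1, 0.05, 0.3, 0.15, 0.25, 0.35, 0.12],
               [0, 0, 1, 0, 0, 0, 0, 0, 1, 0])  # two positives

contexts = [ubuntu_like, douban_like]
for k in (1, 2, 5):
    print(f"R10@{k}:", mean([recall_at_k(s, y, k) for s, y in contexts]))
print("P@1:", mean([precision_at_1(s, y) for s, y in contexts]))
```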

Implications and Future Directions

The architecture and the empirical results suggest that SMN makes effective use of complex conversational context in multi-turn response selection. Matching the response directly against each utterance may also make the model's behavior easier to interpret.

For future research, better candidate retrieval and improved logical consistency in multi-turn dialogue could further increase the effectiveness of retrieval-based chatbots. The new human-labeled dataset also enables more nuanced evaluation and model training for conversational AI systems.

Overall, this work is a notable step forward for response selection in multi-turn conversation, with both practical and research relevance for retrieval-based chatbots.
