
DS-TOD: Efficient Domain Specialization for Task Oriented Dialog

Published 15 Oct 2021 in cs.CL | (2110.08395v2)

Abstract: Recent work has shown that self-supervised dialog-specific pretraining on large conversational datasets yields substantial gains over traditional language modeling (LM) pretraining in downstream task-oriented dialog (TOD). These approaches, however, exploit general dialogic corpora (e.g., Reddit) and thus presumably fail to reliably embed domain-specific knowledge useful for concrete downstream TOD domains. In this work, we investigate the effects of domain specialization of pretrained language models (PLMs) for TOD. Within our DS-TOD framework, we first automatically extract salient domain-specific terms, and then use them to construct DomainCC and DomainReddit -- resources that we leverage for domain-specific pretraining, based on (i) masked language modeling (MLM) and (ii) response selection (RS) objectives, respectively. We further propose a resource-efficient and modular domain specialization by means of domain adapters -- additional parameter-light layers in which we encode the domain knowledge. Our experiments with prominent TOD tasks -- dialog state tracking (DST) and response retrieval (RR) -- encompassing five domains from the MultiWOZ benchmark demonstrate the effectiveness of DS-TOD. Moreover, we show that the light-weight adapter-based specialization (1) performs comparably to full fine-tuning in single domain setups and (2) is particularly suitable for multi-domain specialization, where besides advantageous computational footprint, it can offer better TOD performance.

Citations (30)

Summary

  • The paper introduces DS-TOD, a framework that injects domain-specific knowledge into PLMs via targeted term extraction, domain pretraining, and efficient adapter-based specialization.
  • It leverages domain-specific corpora such as DomainCC and DomainReddit to improve performance on task-oriented dialog tasks like Dialog State Tracking and Response Retrieval.
  • Experimental evaluations demonstrate significant performance gains with reduced computational costs, highlighting DS-TOD’s effectiveness in cross-domain and multi-domain settings.

DS-TOD: Efficient Domain Specialization for Task Oriented Dialog

The paper "DS-TOD: Efficient Domain Specialization for Task-Oriented Dialog" (2110.08395) presents a framework for domain specialization of pretrained language models (PLMs) tailored for Task-Oriented Dialog (TOD) systems. The framework, DS-TOD, aims to inject domain-specific knowledge into PLMs to enhance their performance on downstream TOD tasks.

Introduction to DS-TOD

Task-Oriented Dialog systems are prevalent in applications where conversational agents assist in accomplishing specific tasks like booking a taxi or ordering food. Most recent TOD systems leverage fine-tuning of PLMs such as BERT and GPT-2 to achieve state-of-the-art results. However, these models, when pretrained on general dialog corpora like Reddit, may not capture domain-specific nuances essential for specific TOD domains.

DS-TOD addresses this by creating domain-specialized PLMs through three primary steps:

  1. Domain-Specific Term Extraction: Extract salient domain-specific terms from a TOD corpus and use them to filter large general corpora into the DomainCC and DomainReddit resources.
  2. Pretraining on Domain-Specific Data: Use Masked Language Modeling and Response Selection objectives on these resources to perform domain-specific pretraining.
  3. Resource-Efficient Specialization via Domain Adapters: Introduce additional parameter-light layers (domain adapters) to encode domain-specific knowledge, providing an efficient specialization mechanism.

    Figure 1: Overview of DS-TOD. Three different specialization objectives for injecting domain-specific knowledge into PLMs.

Domain-Specific Data Collection

The approach leverages the multi-domain MultiWOZ dataset to focus on five domains: Taxi, Restaurant, Hotel, Train, and Attraction. Domain-specific terms are identified using TF-IDF across domain-specific dialogs. These terms are then employed to filter relevant content from large corpora like CCNet and Reddit, resulting in the DomainCC and DomainReddit resources.

Domain-specific corpora such as DomainCC provide flat text data, while DomainReddit offers dialogic data, both facilitating effective pretraining for each domain. Additionally, salient n-grams identified for each domain enable targeted filtering of relevant data, ensuring the models are exposed to domain-relevant linguistic patterns.
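
The term extraction and filtering steps described above can be sketched as follows. This is a minimal illustration, not the authors' pipeline: it assumes unigram tokens, a plain TF-IDF in which all dialogs of one domain form a single document, and a keep-if-any-term-matches filter. The function names (`salient_terms`, `filter_corpus`), the toy dialogs, and `top_k` are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Lowercase and keep alphabetic tokens only; a stand-in for real preprocessing.
    return re.findall(r"[a-z]+", text.lower())


def salient_terms(domain_dialogs, top_k=3):
    """Rank terms per domain by TF-IDF, treating each domain as one document."""
    counts = {domain: Counter(t for utt in utts for t in tokenize(utt))
              for domain, utts in domain_dialogs.items()}
    n_domains = len(counts)
    # Document frequency: in how many domains does each term occur?
    df = Counter(t for c in counts.values() for t in c)
    ranked = {}
    for domain, c in counts.items():
        total = sum(c.values())
        scores = {t: (n / total) * math.log(n_domains / df[t]) for t, n in c.items()}
        # Sort by score, then alphabetically, so the ranking is deterministic.
        ranked[domain] = sorted(scores, key=lambda t: (-scores[t], t))[:top_k]
    return ranked


def filter_corpus(texts, terms):
    """Keep texts that mention at least one salient term."""
    term_set = set(terms)
    return [t for t in texts if term_set & set(tokenize(t))]


# Toy data (hypothetical), standing in for MultiWOZ dialogs and a large web corpus.
dialogs = {
    "taxi": ["I need a taxi to the station", "Book a taxi for 5 pm"],
    "hotel": ["I need a hotel with parking", "Book a hotel for two nights"],
}
terms = salient_terms(dialogs)
corpus = ["The taxi arrived late", "Stocks fell on Monday", "Our hotel had a pool"]
taxi_texts = filter_corpus(corpus, terms["taxi"])
```

Words shared by every domain ("need", "book") get an IDF of zero and drop out of the ranking, which is the property that makes the surviving terms usable as filters for domain-relevant text.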

Training Objectives

DS-TOD explores multiple training objectives:

  • Masked Language Modeling (MLM): Performed on DomainCC, MLM adapts PLMs to domain-specific content by dynamically masking segments of the input text.
  • Response Selection (RS): Implemented on DomainReddit, RS-Class entails binary classification to identify correct responses in dialogues, while RS-Contrast employs a contrastive learning framework, enhancing conversational structure embedding.
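
The two response selection variants can be illustrated with a small sketch. The summary does not give the paper's exact formulation, so the code below makes assumptions: RS-Class negatives are borrowed from another pair in the same list, and RS-Contrast is shown as the common in-batch softmax loss over dot-product scores. The names `rs_class_examples` and `in_batch_contrastive_loss` and the toy vectors are hypothetical.

```python
import math


def rs_class_examples(pairs):
    """Build binary examples from (context, response) pairs.

    Each context gets its true response (label 1) and the response of the
    next pair in the list as a negative (label 0). Needs at least two pairs.
    """
    examples = []
    for i, (ctx, resp) in enumerate(pairs):
        negative = pairs[(i + 1) % len(pairs)][1]
        examples.append((ctx, resp, 1))
        examples.append((ctx, negative, 0))
    return examples


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def in_batch_contrastive_loss(contexts, responses):
    """Mean softmax cross-entropy where responses[i] is the positive for
    contexts[i] and all other responses in the batch act as negatives."""
    losses = []
    for i, ctx in enumerate(contexts):
        scores = [dot(ctx, resp) for resp in responses]
        # Log-sum-exp with max subtraction for numerical stability.
        m = max(scores)
        log_z = m + math.log(sum(math.exp(s - m) for s in scores))
        losses.append(log_z - scores[i])
    return sum(losses) / len(losses)


# Toy encodings (hypothetical): aligned pairs score high on the diagonal.
ctx_vecs = [[1.0, 0.0], [0.0, 1.0]]
aligned = in_batch_contrastive_loss(ctx_vecs, [[4.0, 0.0], [0.0, 4.0]])
shuffled = in_batch_contrastive_loss(ctx_vecs, [[0.0, 4.0], [4.0, 0.0]])
examples = rs_class_examples([("need a cab", "where to?"), ("any hotels?", "yes, three")])
```

The loss is low when each context scores its own response highest and high when the responses are shuffled, which is the signal that pushes matching context and response encodings together.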

Adapter-Based Specialization

To mitigate computational costs and catastrophic forgetting, DS-TOD employs adapters: parameter-light modules inserted into the PLM. Domain knowledge is encoded in the adapter parameters, so a model can be specialized for a domain without fine-tuning all of its weights. This is especially useful in multi-domain settings, where combining domain-specific adapters improves performance without extensive retraining.
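
To make "parameter-light" concrete, here is a minimal sketch of a bottleneck adapter, assuming the common design of a down-projection, a ReLU, an up-projection, and a residual connection. The summary does not specify the adapter architecture, so this design, the bottleneck size of 48, and the BERT-base-like dimensions (hidden size 768, 12 layers, one adapter per layer) are assumptions for illustration only.

```python
def adapter_forward(h, w_down, b_down, w_up, b_up):
    """Apply a bottleneck adapter to one hidden vector h.

    w_down has m rows of length d, w_up has d rows of length m.
    """
    # Down-project to the bottleneck and apply ReLU.
    z = [max(0.0, sum(w * x for w, x in zip(row, h)) + b)
         for row, b in zip(w_down, b_down)]
    # Up-project back to the hidden size.
    out = [sum(w * x for w, x in zip(row, z)) + b for row, b in zip(w_up, b_up)]
    # Residual connection: the adapter only adds a correction to h.
    return [x + o for x, o in zip(h, out)]


def adapter_param_count(hidden_size, bottleneck_size):
    """Weights and biases of the down- and up-projections."""
    down = hidden_size * bottleneck_size + bottleneck_size
    up = bottleneck_size * hidden_size + hidden_size
    return down + up


# Tiny worked example with d = 3 and m = 2 (values are arbitrary).
h = [1.0, -2.0, 0.5]
w_down = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
b_down = [0.0, 0.0]
w_up = [[0.5, 0.0], [0.0, 0.5], [1.0, 1.0]]
b_up = [0.0, 0.0, 0.0]
adapted = adapter_forward(h, w_down, b_down, w_up, b_up)

# Assumed sizes: hidden size 768, bottleneck 48, one adapter in each of 12 layers.
per_adapter = adapter_param_count(768, 48)
per_domain = 12 * per_adapter
```

Under these assumed sizes, one domain costs under a million trainable parameters, a small fraction of an encoder with roughly 110 million. Because each domain lives in its own small module, adapters for several domains can be kept side by side and combined.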

Experimental Evaluation

Experiments conducted on MultiWOZ domains validate the effectiveness of DS-TOD. Domain-specialized models significantly outperform baseline PLMs on Dialog State Tracking (DST) and Response Retrieval (RR) tasks.

Figure 2: Sample efficiency of DS-TOD for DST: joint goal accuracy for different portions of downstream training data.

Results demonstrate that domain specialization, particularly via RS objectives, provides consistent performance improvements. Furthermore, adapter-based specialization achieves comparable results to full fine-tuning, underscoring its efficiency.
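
Joint goal accuracy, the DST metric named in the Figure 2 caption, is commonly defined as the fraction of dialog turns whose predicted state matches the gold state on every slot. The sketch below follows that standard definition. The function name and the toy states are hypothetical, and real evaluations usually normalize slot values first, which is omitted here.

```python
def joint_goal_accuracy(predicted_states, gold_states):
    """Fraction of turns where the full predicted state equals the gold state.

    Each state is a dict mapping slot names to values; a turn counts as
    correct only if all slots match exactly.
    """
    if len(predicted_states) != len(gold_states):
        raise ValueError("predictions and gold states must align turn by turn")
    if not gold_states:
        return 0.0
    correct = sum(1 for pred, gold in zip(predicted_states, gold_states) if pred == gold)
    return correct / len(gold_states)


# Toy example (hypothetical): the second turn gets one slot wrong.
gold = [
    {"taxi-destination": "station"},
    {"taxi-destination": "station", "taxi-leaveat": "17:00"},
    {"taxi-destination": "station", "taxi-leaveat": "17:00", "taxi-departure": "hotel"},
]
pred = [
    {"taxi-destination": "station"},
    {"taxi-destination": "station", "taxi-leaveat": "17:30"},
    {"taxi-destination": "station", "taxi-leaveat": "17:00", "taxi-departure": "hotel"},
]
jga = joint_goal_accuracy(pred, gold)
```

A single wrong slot makes the whole turn count as incorrect, so the metric is strict.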

Cross-Domain Transfer and Multi-Domain Specialization

DS-TOD also explores cross-domain transfer capabilities, revealing that domain-specialized models exhibit performance gains in related domains, suggesting a promising avenue for leveraging domain interdependencies in TOD systems. The approach also supports efficient multi-domain specialization through domain adapter stacking and fusion, facilitating versatile multi-domain dialog systems.

Figure 3: Relative improvements in cross-domain DST transfer using DS-TOD.

Conclusion

The DS-TOD framework advances TOD systems by effectively integrating domain-specific knowledge into PLMs. This approach not only enhances TOD performance across various domains but also reduces computational demands through efficient domain specialization techniques. DS-TOD paves the way for more adaptive, domain-aware TOD models capable of handling diverse, real-world dialog scenarios. Future research will focus on expanding this specialization to encompass additional languages and tasks, ensuring broader applicability in multilingual and multi-functional dialog systems.
