
Striking a Balance between Classical and Deep Learning Approaches in Natural Language Processing Pedagogy

Published 16 May 2024 in cs.CL (arXiv:2405.09854v2)

Abstract: While deep learning approaches represent the state-of-the-art of NLP today, classical algorithms and approaches still find a place in NLP textbooks and courses of recent years. This paper discusses the perspectives of conveners of two introductory NLP courses taught in Australia and India, and examines how classical and deep learning approaches can be balanced within the lecture plan and assessments of the courses. We also draw parallels with the objects-first and objects-later debate in CS1 education. We observe that teaching classical approaches adds value to student learning by building an intuitive understanding of NLP problems, potential solutions, and even deep learning models themselves. Despite classical approaches not being state-of-the-art, the paper makes a case for their inclusion in NLP courses today.

References (13)
  1. Pushpak Bhattacharyya and Aditya Joshi. 2023. Natural Language Processing. Wiley.
  2. Teaching objects-first in introductory computer science. ACM SIGCSE Bulletin, pages 191–195.
  3. Albrecht Ehlert and Carsten Schulte. 2009. Empirical comparison of objects-first and objects-later. ICER’09 - Proceedings of the 2009 ACM Workshop on International Computing Education Research, pages 15–26.
  4. Jennifer Foster and Joachim Wagner. 2021. Naive Bayes versus BERT: Jupyter notebook assignments for an introductory NLP course. In Proceedings of the Fifth Workshop on Teaching NLP, pages 112–114, Online. Association for Computational Linguistics.
  5. Interactive assignments for teaching structured neural NLP. In Proceedings of the Fifth Workshop on Teaching NLP, pages 104–107, Online. Association for Computational Linguistics.
  6. Towards a unified view of parameter-efficient transfer learning. In International Conference on Learning Representations.
  7. Retrieval-augmented generation for knowledge-intensive NLP tasks. Advances in Neural Information Processing Systems, 33:9459–9474.
  8. Introductory programming: A systematic literature review. ACM.
  9. Measuring cognitive load in introductory CS: adaptation of an instrument. Proceedings of the Tenth Annual Conference on International Computing Education Research, pages 131–138.
  10. Barbara Plank. 2021. From back to the roots into the gated woods: Deep learning for NLP. In Proceedings of the Fifth Workshop on Teaching NLP, pages 59–61, Online. Association for Computational Linguistics.
  11. Objects from the beginning – with GUIs. Proceedings of the 7th Annual Conference on Innovation and Technology in Computer Science Education, pages 65–69.
  12. John Sweller. 2011. Cognitive load theory. Psychology of Learning and Motivation - Advances in Research and Theory, 55:37–76.
  13. A. E. Tew, W. M. McCracken, and M. Guzdial. 2005. Impact of alternative introductory courses on programming concept understanding. In Proceedings of the First International Workshop on Computing Education Research.

Summary

  • The paper demonstrates that combining classical pre-neural methods with deep learning techniques builds strong foundational intuition in NLP education.
  • It compares course strategies in Australia and India, emphasizing gradual concept layering to reduce cognitive overload while enhancing comprehension.
  • Empirical findings suggest that classical statistical models improve student motivation and understanding before transitioning to advanced neural approaches.

The Value of Pre-Neural Approaches in NLP Education

Introduction

NLP continues to be dramatically influenced by Transformer-based models and other neural approaches. With these innovations, a critical question arises: Are pre-neural methods irrelevant in today's introductory NLP education? A recent study explores this debate by examining NLP courses in Australia and India. Let's break down its findings and what they mean for current and future NLP educators and students.

Context of the Study

The study examines two NLP courses: one in Australia (NLP-UNSW) and one in India (NLP-IITB). NLP-UNSW represents a new course developed in 2024 with 60 students, while NLP-IITB has been running for 19 years with 150 students. Both courses target undergraduate and postgraduate students. However, a significant difference is that NLP-IITB has a follow-up course focusing on deep learning for NLP, unlike NLP-UNSW.

Parallels to Computer Science Education

The study draws a parallel with the "objects-first vs. objects-later" debate in introductory programming education. Just as starting CS1 with object-oriented programming can overwhelm novices, diving straight into neural approaches in NLP might overwhelm students. Incremental introduction through simpler, pre-neural methodologies can instead foster a more robust foundation:

  • Objects-first: Introduces complex concepts early (akin to neural-first in NLP).
  • Objects-later: Starts with foundational concepts before layering complexity (akin to pre-neural-first in NLP).

Empirical studies in programming have shown mixed results, advocating that the overall quality of instruction and the specific goals of a curriculum are more crucial than whether one starts with objects or procedural paradigms first. A similar perspective could benefit NLP education.

Textbooks and Course Content

A review of recent NLP textbooks shows a trend towards interleaving pre-neural and neural approaches, with nearly all texts discussing statistical models alongside neural methods. For example:

  • Jurafsky-Martin's "Speech and Language Processing" introduces fundamental algorithms, including statistical models, before diving into neural models like Transformers.
  • "Natural Language Processing" by Bhattacharyya and Joshi alternates between pre-neural and neural approaches, clearly demonstrating the evolution and significance of each.

Several university NLP courses also reflect this balanced approach. For instance, courses at UMass Amherst and NYU start with basic statistical models before introducing Transformers around the midway point.

Lecture Plans

Instructors at NLP-UNSW and NLP-IITB adopt varied strategies to balance both approaches:

  • NLP-UNSW: Uses a hybrid method where neural approaches are interwoven with pre-neural techniques. Early weeks focus on black-box models and probabilistic language modeling. Subsequent weeks introduce Transformers, followed by task-specific applications (e.g., sentiment analysis, named entity recognition) using both pre-neural and neural methods.
  • NLP-IITB: Begins with sequence labeling and probabilistic parsing using pre-neural methods before transitioning into neural approaches like Transformers and LLMs.

The interleaved teaching strategy at both institutions ensures that students grasp the complexity and nuances of NLP tasks, providing a solid foundation before diving into more advanced neural techniques.
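Probabilistic language modeling, covered in the early weeks of NLP-UNSW, is a good example of a pre-neural topic that builds intuition for later neural material. As a minimal sketch (not code from either course; the corpus and function names are invented for illustration), a bigram language model with add-one smoothing fits in a few lines:

```python
from collections import Counter

def train_bigram_lm(sentences):
    """Count unigrams and bigrams over tokenized sentences with boundary markers."""
    unigrams, bigrams = Counter(), Counter()
    for tokens in sentences:
        padded = ["<s>"] + tokens + ["</s>"]
        unigrams.update(padded)
        bigrams.update(zip(padded, padded[1:]))
    return unigrams, bigrams

def bigram_prob(unigrams, bigrams, prev, word):
    """P(word | prev) with add-one (Laplace) smoothing over the vocabulary."""
    vocab_size = len(unigrams)
    return (bigrams[(prev, word)] + 1) / (unigrams[prev] + vocab_size)

corpus = [["the", "cat", "sat"], ["the", "dog", "sat"]]
uni, bi = train_bigram_lm(corpus)
print(bigram_prob(uni, bi, "the", "cat"))  # seen bigram: 0.25
print(bigram_prob(uni, bi, "the", "sat"))  # unseen bigram: smoothed to 0.125
```

Working through smoothing by hand like this makes it easier for students to later appreciate why neural language models, which share parameters across contexts, handle sparsity so much better.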

Coding Assessments

Both courses incorporate individual and group coding assessments, but their approaches differ slightly:

  • NLP-UNSW: Emphasizes using pre-neural libraries (e.g., spaCy's Matcher) in individual assignments and then combines these with neural methods (e.g., embeddings). Group projects are designed to ensure a mix of techniques, with marks assigned for problem definition, dataset selection, modeling, and evaluation.
  • NLP-IITB: Includes multiple individual assignments focusing on tasks like POS tagging and statistical parsing. Group projects are more flexible in topic choice, emphasizing the 'right' algorithm for the task.
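To illustrate the kind of pre-neural baseline such assignments build on, here is a minimal Naive Bayes sentiment classifier in pure Python (the training data and function names are invented for this sketch and are not taken from either course's materials):

```python
import math
from collections import Counter, defaultdict

def train_nb(docs):
    """docs: list of (tokens, label) pairs. Returns counts needed for prediction."""
    label_counts = Counter(label for _, label in docs)
    word_counts = defaultdict(Counter)
    vocab = set()
    for tokens, label in docs:
        word_counts[label].update(tokens)
        vocab.update(tokens)
    return label_counts, word_counts, vocab

def predict_nb(tokens, label_counts, word_counts, vocab):
    """Pick the label maximizing log P(label) + sum of log P(word|label),
    with add-one smoothing; unseen words get a small smoothed probability."""
    total_docs = sum(label_counts.values())
    best_label, best_score = None, float("-inf")
    for label in label_counts:
        score = math.log(label_counts[label] / total_docs)
        denom = sum(word_counts[label].values()) + len(vocab)
        for w in tokens:
            score += math.log((word_counts[label][w] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label

# Tiny made-up training set for illustration only.
train = [(["great", "movie"], "pos"), (["great", "fun"], "pos"),
         (["boring", "movie"], "neg")]
labels, words, vocab = train_nb(train)
print(predict_nb(["great", "film"], labels, words, vocab))  # pos
```

This is exactly the style of baseline that reference 4 above (Foster and Wagner, 2021) has students contrast with BERT, making the gains of the neural model concrete rather than taken on faith.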

Making the Case for Pre-Neural Approaches

The study highlights several reasons for maintaining pre-neural approaches in NLP curricula:

  1. Intuition-Building: Understanding rule-based and statistical methods helps students appreciate the complexities and challenges in NLP.
  2. Student Motivation: Pre-neural methods serve as a bridge to understand the necessity and efficiency of neural approaches.
  3. Popular Classical Approaches: Methods like hidden Markov models (HMMs) and conditional random fields (CRFs) remain effective for certain tasks, emphasizing the need for a foundational understanding.
  4. Annotation: Linguistically sound pre-neural techniques (e.g., POS tagging) provide robust benchmarks for evaluating neural methods.
  5. Cognitive Load Theory: Gradual introduction of concepts reduces cognitive overload, allowing for better assimilation of neural approaches later.
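To make point 3 concrete, the Viterbi algorithm that underlies HMM-based POS tagging (a task assigned in NLP-IITB) can be sketched in a few lines. The tags, probabilities, and example sentence below are toy values chosen for illustration, not figures from the paper:

```python
def viterbi(words, tags, start_p, trans_p, emit_p):
    """Most likely tag sequence for `words` under an HMM with the given
    start, transition, and emission probabilities."""
    # Each column maps tag -> (best probability so far, backpointer).
    V = [{t: (start_p[t] * emit_p[t].get(words[0], 1e-6), None) for t in tags}]
    for w in words[1:]:
        V.append({})
        for t in tags:
            prob, prev = max(
                (V[-2][p][0] * trans_p[p][t] * emit_p[t].get(w, 1e-6), p)
                for p in tags
            )
            V[-1][t] = (prob, prev)
    # Backtrack from the highest-probability final tag.
    best = max(tags, key=lambda t: V[-1][t][0])
    path = [best]
    for column in reversed(V[1:]):
        path.append(column[path[-1]][1])
    return list(reversed(path))

# Toy model: three tags with hand-set probabilities (purely illustrative).
tags = ["DET", "NOUN", "VERB"]
start_p = {"DET": 0.8, "NOUN": 0.15, "VERB": 0.05}
trans_p = {
    "DET":  {"DET": 0.05, "NOUN": 0.9, "VERB": 0.05},
    "NOUN": {"DET": 0.1,  "NOUN": 0.2, "VERB": 0.7},
    "VERB": {"DET": 0.4,  "NOUN": 0.4, "VERB": 0.2},
}
emit_p = {
    "DET":  {"the": 0.9},
    "NOUN": {"dog": 0.4, "barks": 0.1},
    "VERB": {"barks": 0.5},
}
print(viterbi(["the", "dog", "barks"], tags, start_p, trans_p, emit_p))
# ['DET', 'NOUN', 'VERB']
```

Tracing dynamic programming over an explicit probability table like this is precisely the intuition-building exercise the paper argues for: the same idea reappears, with learned parameters, in neural sequence labelers.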

Conclusion

The blend of pre-neural and neural approaches in NLP education addresses diverse learning needs and prepares students comprehensively for both current practices and future innovations in the field. By balancing foundational and advanced methods, educators can cultivate a deeper understanding and allow students to navigate the complexities of NLP efficiently.
