
Character-Level Bangla Text-to-IPA Transcription Using Transformer Architecture with Sequence Alignment

Published 7 Nov 2023 in cs.CL | (2311.03792v1)

Abstract: The International Phonetic Alphabet (IPA) is indispensable in language learning and understanding, aiding users in accurate pronunciation and comprehension. It also plays a pivotal role in speech therapy, linguistic research, accurate transliteration, and the development of text-to-speech systems, making it an essential tool across diverse fields. Bangla, the seventh most widely spoken language in the world, gives rise to the need for IPA in its domain. Its IPA mapping is too diverse to be captured manually, creating a need for Artificial Intelligence and Machine Learning in this field. In this study, we utilized a transformer-based sequence-to-sequence model at the letter and symbol level to obtain the IPA of each Bangla word, since the variation of a word's IPA across different contexts is almost null. Our transformer model consisted of only 8.5 million parameters, with a single encoder layer and a single decoder layer. Additionally, to handle punctuation marks and the occurrence of foreign-language text, we used manual mapping, since the model would not be able to learn to separate them from Bangla words; this also reduced our required computational resources. Finally, maintaining the relative position of the sentence-component IPAs and generating the combined IPA led us to achieve the top position, with a word error rate of 0.10582, in the public ranking of the DataVerse Challenge - ITVerse 2023 (https://www.kaggle.com/competitions/dataverse_2023/).

Summary

  • The paper introduces a character-level transformer model for Bangla text-to-IPA transcription, achieving a WER of 0.10582.
  • The model leverages sequence alignment and extensive preprocessing to handle punctuation, foreign words, and numerals effectively.
  • The study demonstrates significant improvements in transcription accuracy, offering practical benefits for NLP and speech technology applications.

Character-Level Bangla Text-to-IPA Transcription Using Transformer Architecture with Sequence Alignment

Introduction

This paper presents a study on Bangla text-to-IPA transcription utilizing a transformer-based sequence-to-sequence model. Recognizing the phonetic intricacies of Bangla, one of the most widely spoken languages globally, the authors focus on enhancing the transcription accuracy by leveraging advanced ML and AI methodologies. The traditional IPA mapping, crucial in various linguistic and technological contexts, is augmented in this study through an innovative use of transformer architecture. The paper targets both theoretical enhancements and practical applications, aiming to contribute to diverse fields such as language learning, speech therapy, and the development of text-to-speech systems.

Methodology

Dataset and Preprocessing

The dataset for this study derives from the DataVerse Challenge - ITVerse 2023 and consists of Bangla text with corresponding IPA transcriptions. The training dataset includes 21,999 samples, and the test dataset contains 27,228 samples. A thorough analysis identified the unique characters and handled variations between text and IPA alignments, focusing on optimizing the training process through data-driven insights. Character-level details of the dataset, such as the histogram of word counts, were evaluated to refine the model's input (Figure 1).

Figure 1: Word count histogram of training dataset
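The character-level preprocessing described above can be sketched roughly as follows: build symbol vocabularies over the Bangla characters and IPA symbols seen in the corpus, and encode each (word, IPA) pair as integer sequences for seq2seq training. The special tokens, function names, and toy pairs below are illustrative assumptions, not taken from the paper.

```python
# Hypothetical sketch of character-level vocabulary building and encoding
# for (Bangla word, IPA) pairs. Special tokens are an assumption.

def build_vocab(sequences, specials=("<pad>", "<sos>", "<eos>", "<unk>")):
    """Map every character seen in the corpus to an integer id."""
    vocab = {tok: i for i, tok in enumerate(specials)}
    for seq in sequences:
        for ch in seq:
            if ch not in vocab:
                vocab[ch] = len(vocab)
    return vocab

def encode(seq, vocab):
    """Character ids framed by start/end markers for seq2seq training."""
    unk = vocab["<unk>"]
    return [vocab["<sos>"]] + [vocab.get(ch, unk) for ch in seq] + [vocab["<eos>"]]

pairs = [("বই", "boi"), ("মা", "ma")]  # toy (word, IPA) pairs for illustration
src_vocab = build_vocab(w for w, _ in pairs)
tgt_vocab = build_vocab(ipa for _, ipa in pairs)
```

Working at the character level keeps both vocabularies small (dozens of symbols rather than a full word lexicon), which is what makes the compact model size reported below feasible.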

Model Architecture

The study employs a simplified transformer model with a single encoder and decoder layer, consisting of 8.5 million parameters. The model's design is tailored to the task's requirements, focusing on character-level transcription to accommodate the high variance in Bangla's phonetic representation. Extensive preprocessing augments the model's efficacy by handling punctuation marks, foreign words, and numerals, optimizing it for practical applications while minimizing computational overhead.
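As a sanity check on the reported size, a back-of-the-envelope parameter count for a one-encoder-layer, one-decoder-layer transformer can be computed as below. All dimensions here (d_model=512, d_ff=2048, roughly 100-symbol vocabularies) are assumptions for illustration; the summary only states the ~8.5 million total, and the exact figure depends on dimensions not given here.

```python
# Rough parameter count for a 1-encoder/1-decoder-layer transformer.
# All dimensions are assumed for illustration, not taken from the paper.

def attention_params(d):
    # Q, K, V and output projections: 4 weight matrices plus biases.
    return 4 * (d * d + d)

def ffn_params(d, d_ff):
    # Two linear layers, d -> d_ff -> d, with biases.
    return d * d_ff + d_ff + d_ff * d + d

def layernorm_params(d):
    return 2 * d  # scale and shift

d, d_ff, v_src, v_tgt = 512, 2048, 100, 100
encoder = attention_params(d) + ffn_params(d, d_ff) + 2 * layernorm_params(d)
decoder = 2 * attention_params(d) + ffn_params(d, d_ff) + 3 * layernorm_params(d)
embeddings = (v_src + v_tgt) * d
output_proj = d * v_tgt + v_tgt
total = encoder + decoder + embeddings + output_proj
print(f"{total / 1e6:.2f}M parameters")  # roughly 7.5M under these assumptions
```

Under these assumed dimensions the count lands in the same ballpark as the paper's 8.5 million, illustrating how a single-layer encoder-decoder stays small when the vocabularies are character-level.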

Training and Inference

The model is trained with a focus on enhancing accuracy and reducing the word error rate (WER), achieving a top position in the public leaderboard of the DataVerse Challenge with a WER of 0.10582. Training involves a detailed tuning of hyperparameters, with a focus on stability and performance across varied data subsets. The inference process incorporates a dictionary for efficient IPA mapping, leveraging previously computed results to improve speed and resource utilization.
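The dictionary-backed inference described above can be sketched as a cache in front of the model, with punctuation routed through the manual mapping instead of the network. The `model_predict` stand-in, the mapping contents, and whitespace tokenization are all assumptions for illustration.

```python
# Minimal sketch of dictionary-cached inference with manual punctuation
# mapping. `model_predict` is a placeholder, not the paper's model.

PUNCT_MAP = {"।": ".", ",": ",", "?": "?"}  # e.g. Bangla danda -> full stop

def make_transcriber(model_predict):
    cache = {}  # word -> previously computed IPA

    def transcribe(sentence):
        out = []
        for token in sentence.split():
            if token in PUNCT_MAP:
                out.append(PUNCT_MAP[token])   # manual mapping, skips the model
            else:
                if token not in cache:         # run the model only on a cache miss
                    cache[token] = model_predict(token)
                out.append(cache[token])
        return " ".join(out)                   # order of components is preserved

    return transcribe
```

Because each unique word is transcribed at most once, repeated words across the test set cost a single model call, which is the speed and resource benefit the authors describe.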

Results and Analysis

The model's performance is evaluated through iterative enhancements, addressing challenges specific to Bangla text, such as handling punctuation and foreign language integration. The study reports significant reductions in WER, showcasing the effectiveness of various preprocessing and handling strategies. Comparative results against baseline and enhanced models demonstrate the system's robustness and potential for real-world application (Figure 2).

Figure 2: Architecture of our system
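For reference, the leaderboard metric above, word error rate (WER), is the word-level edit distance between reference and hypothesis divided by the reference length. A standard dynamic-programming implementation:

```python
# Word error rate via word-level Levenshtein distance.

def wer(reference, hypothesis):
    ref, hyp = reference.split(), hypothesis.split()
    # dp[i][j] = edit distance between ref[:i] and hyp[:j]
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i
    for j in range(len(hyp) + 1):
        dp[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,        # deletion
                           dp[i][j - 1] + 1,        # insertion
                           dp[i - 1][j - 1] + cost)  # substitution
    return dp[len(ref)][len(hyp)] / len(ref)
```

A reported WER of 0.10582 thus means roughly one word-level substitution, insertion, or deletion per ten reference words.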

Conclusion

The research illuminates the potential of transformer architectures in handling complex phonetic transcription tasks. By focusing on the Bangla language, the study not only advances linguistic research but also highlights the broader applicability of such models in NLP tasks with similar challenges. Future work could expand the dataset to include more phonetic variations and explore more sophisticated model architectures or hybrid approaches, contributing further to the field's development. The study's findings have significant implications for AI-driven linguistic tools, offering a pathway toward more accurate and efficient systems.
