Contrastive Feedback Mechanism for Simultaneous Speech Translation
Abstract: Recent advances in simultaneous speech translation (SST) focus on decision policies that enable offline-trained speech translation (ST) models to be used for simultaneous inference. These decision policies not only control the quality-latency trade-off in SST but also mitigate the impact of unstable predictions on translation quality, either by delaying translation until more context is available or by discarding such predictions through stable hypothesis detection. However, these policies overlook the potential benefits of the unstable predictions themselves. We introduce the contrastive feedback mechanism (CFM) for SST, a novel method that leverages unstable predictions as feedback to improve translation quality. CFM guides the system to eliminate undesired model behaviors exhibited in these predictions through a contrastive objective. Experiments with three state-of-the-art decision policies across eight languages of the MuST-C v1.0 dataset show that CFM effectively improves the performance of SST.
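The abstract describes a contrastive objective that treats unstable intermediate hypotheses as negative feedback. The paper's exact formulation is not given here, so the following is only a minimal illustrative sketch, assuming a margin-based contrast between the log-probability of a stable (retained) hypothesis and that of an unstable (discarded) one; the function name, margin form, and inputs are all hypothetical:

```python
def contrastive_feedback_loss(stable_logprobs, unstable_logprobs, margin=1.0):
    """Toy contrastive objective (hypothetical, not the paper's formulation).

    Pushes the model's likelihood of the stable hypothesis above that of
    the unstable hypothesis by at least `margin`, so behaviors seen only
    in unstable predictions are discouraged.
    """
    stable = sum(stable_logprobs)      # sequence log-prob of the stable hypothesis
    unstable = sum(unstable_logprobs)  # sequence log-prob of the unstable hypothesis
    # Hinge-style contrast: zero loss once `stable` beats `unstable` by `margin`.
    return max(0.0, margin - (stable - unstable))


# Example with made-up per-token log-probabilities of two hypotheses:
stable = [-0.2, -0.3, -0.1]
unstable = [-0.5, -0.9, -0.4]
loss = contrastive_feedback_loss(stable, unstable)
```

In this toy case the stable hypothesis already exceeds the unstable one by more than the margin, so the loss is zero; during training, a nonzero loss would supply a gradient that moves probability mass away from the unstable prediction.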