
DeMPT: Decoding-enhanced Multi-phase Prompt Tuning for Making LLMs Be Better Context-aware Translators

Published 23 Feb 2024 in cs.CL (arXiv:2402.15200v2)

Abstract: Decoder-only LLMs are typically adapted to context-aware neural machine translation (NMT) in a concatenating way (i.e., concatenation mode): the LLM takes the concatenation of the source sentence (the intra-sentence context) and the inter-sentence context as input, and then generates the target tokens sequentially. This adaptation strategy treats intra-sentence and inter-sentence contexts with the same priority, despite the apparent differences between the two kinds of context. In this paper, we propose an alternative adaptation approach, named Decoding-enhanced Multi-phase Prompt Tuning (DeMPT), which makes LLMs model and utilize inter- and intra-sentence contexts discriminately and adapts them more effectively to context-aware NMT. First, DeMPT divides the context-aware NMT process into three separate phases, introducing different continuous prompts in each phase so that the LLM models the different kinds of information discriminately. Second, DeMPT employs a heuristic strategy to further enhance the discriminate use of source-side inter- and intra-sentence information at the final decoding phase. Experiments show that our approach significantly outperforms the concatenation method and further improves the performance of LLMs in discourse modeling.
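The contrast between the two adaptation strategies can be made concrete with a small sketch. Below is a minimal, hypothetical PyTorch illustration, not the authors' code: concat_mode_input builds the single flattened input of the concatenation baseline, while MultiPhasePrompts keeps a separate trainable continuous prompt per phase, in the spirit of DeMPT's three-phase design. All names, the phase labels, the prompt length, and the hidden size are assumptions made for illustration.

```python
# Illustrative sketch only: class/function names and hyperparameters
# are assumptions, not the authors' released code or settings.
import torch
import torch.nn as nn

def concat_mode_input(tokenizer, context_sents, src_sent):
    """Concatenation mode (the baseline): inter-sentence context and
    the current source sentence are flattened into one input string,
    so the LLM sees both kinds of context with the same priority."""
    text = " ".join(context_sents) + " " + src_sent
    return tokenizer(text, return_tensors="pt")

class MultiPhasePrompts(nn.Module):
    """Phase-specific continuous prompts in the spirit of DeMPT:
    a separate trainable prompt per phase lets a frozen LLM model
    inter-sentence context, intra-sentence context, and decoding
    discriminately. Sizes below are illustrative choices."""

    PHASES = ("inter_sentence", "intra_sentence", "decoding")

    def __init__(self, hidden_dim=4096, prompt_len=64):
        super().__init__()
        self.phase_prompts = nn.ParameterDict({
            p: nn.Parameter(torch.randn(prompt_len, hidden_dim) * 0.02)
            for p in self.PHASES
        })

    def prepend(self, phase, token_embeds):
        """Prepend the phase's continuous prompt to the token
        embeddings: (batch, seq, dim) -> (batch, prompt_len+seq, dim)."""
        prompt = self.phase_prompts[phase].unsqueeze(0)
        prompt = prompt.expand(token_embeds.size(0), -1, -1)
        return torch.cat([prompt, token_embeds], dim=1)
```

In such a setup, each phase would run the frozen LLM on prepend(phase, embeds), so only the small prompt matrices are updated during tuning, which is what makes the multi-phase design a parameter-efficient alternative to plain concatenation.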
