Unlocking the Power of GANs in Non-Autoregressive Text Generation

Published 6 May 2023 in cs.CL | arXiv:2305.03977v3

Abstract: Generative Adversarial Networks (GANs) have been studied in text generation to tackle the exposure bias problem. Despite their remarkable development, existing language GANs adopt autoregressive structures and therefore suffer from high latency in both training and inference. Although GANs have the potential to support efficient generation by adopting non-autoregressive (NAR) structures, their exploration in NAR models remains extremely limited. In this work, we conduct a pioneering study on building language GANs with NAR structures. We identify two issues that constrain the performance of GAN-based NAR models. First, existing methods of incorporating latent variables produce highly similar representations, which cannot capture the diversity of different words in a sentence. We tackle this problem by proposing Position-Aware Self-Modulation, which provides more diverse and effective representations. Second, the attention mechanism in the Transformer cannot accurately build word dependencies under the unstable training of GANs, so we adopt a Dependency Feed Forward Network to strengthen the model's capacity for dependency modeling. Equipped with these two components, we propose a GAN-based NAR model, the Adversarial Non-autoregressive Transformer (ANT). Experimental results demonstrate that ANT achieves performance comparable to mainstream models in a single forward pass and shows great potential in applications such as latent interpolation and semi-supervised learning.
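To make the Position-Aware Self-Modulation idea concrete: standard self-modulation (Chen et al., ICLR 2019) predicts the scale and shift of a normalization layer from a single latent vector z, so every position in the sentence receives the same conditioning signal, which is exactly the "highly similar representations" problem the abstract describes. The PyTorch sketch below is a hypothetical rendering of the position-aware variant, assuming the modulation network combines z with a learned per-position embedding; the class name, the additive combination, and the two-layer MLPs are illustrative choices for this sketch, not the paper's exact formulation.

```python
# Minimal sketch of position-aware self-modulation (assumptions noted below).
import torch
import torch.nn as nn


class PositionAwareSelfModulation(nn.Module):
    """LayerNorm whose scale/shift depend on both the latent code z
    and the token position, so each position is modulated differently."""

    def __init__(self, d_model: int, max_len: int, d_latent: int):
        super().__init__()
        # Affine params are predicted, so disable LayerNorm's own.
        self.norm = nn.LayerNorm(d_model, elementwise_affine=False)
        # Learned position embeddings inject position-specific information
        # into the modulation network (an assumption of this sketch).
        self.pos_emb = nn.Embedding(max_len, d_latent)
        self.to_gamma = nn.Sequential(
            nn.Linear(d_latent, d_model), nn.GELU(), nn.Linear(d_model, d_model)
        )
        self.to_beta = nn.Sequential(
            nn.Linear(d_latent, d_model), nn.GELU(), nn.Linear(d_model, d_model)
        )

    def forward(self, h: torch.Tensor, z: torch.Tensor) -> torch.Tensor:
        # h: (batch, seq_len, d_model) hidden states
        # z: (batch, d_latent) sentence-level latent code
        seq_len = h.size(1)
        positions = torch.arange(seq_len, device=h.device)
        # Combine the shared latent with a per-position embedding so the
        # predicted (gamma, beta) differ across positions.
        cond = z.unsqueeze(1) + self.pos_emb(positions).unsqueeze(0)
        gamma = self.to_gamma(cond)  # (batch, seq_len, d_model)
        beta = self.to_beta(cond)
        return (1.0 + gamma) * self.norm(h) + beta


# Usage sketch: one latent code per sentence, distinct modulation per position.
mod = PositionAwareSelfModulation(d_model=256, max_len=64, d_latent=128)
h = torch.randn(2, 64, 256)   # NAR decoder hidden states
z = torch.randn(2, 128)       # latent code sampled once per sentence
out = mod(h, z)               # (2, 64, 256)
```

The design point this sketch illustrates is the one the abstract argues: in a non-autoregressive decoder all positions are computed in parallel from the same latent code, so if the modulation depends only on z, every position receives nearly identical conditioning; making the modulation a function of position as well lets the same z induce distinct representations for different words in the sentence.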
