
Autoregressive Sign Language Production: A Gloss-Free Approach with Discrete Representations

Published 21 Sep 2023 in cs.CV | (2309.12179v2)

Abstract: Gloss-free Sign Language Production (SLP) offers a direct translation of spoken language sentences into sign language, bypassing the need for gloss intermediaries. This paper presents the Sign language Vector Quantization Network, a novel approach to SLP that leverages Vector Quantization to derive discrete representations from sign pose sequences. Our method, rooted in both manual and non-manual elements of signing, supports advanced decoding methods and integrates latent-level alignment for enhanced linguistic coherence. Through comprehensive evaluations, we demonstrate superior performance of our method over prior SLP methods and highlight the reliability of Back-Translation and Fréchet Gesture Distance as evaluation metrics.
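
The quantization step mentioned in the abstract can be illustrated with a generic nearest-codebook lookup. This is only a minimal sketch of vector quantization in general, not the paper's network: the function names, the toy 2-D "pose" vectors, and the fixed codebook are all assumptions made for illustration, whereas a real model would learn both an encoder and the codebook.

```python
from typing import List, Sequence, Tuple


def squared_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def quantize(
    frames: Sequence[Sequence[float]],
    codebook: Sequence[Sequence[float]],
) -> Tuple[List[int], List[List[float]]]:
    """Map each frame to the index of its nearest codebook entry.

    Returns the discrete token sequence and the corresponding
    quantized vectors (hypothetical helper, not from the paper).
    """
    indices = []
    for frame in frames:
        nearest = min(
            range(len(codebook)),
            key=lambda k: squared_distance(frame, codebook[k]),
        )
        indices.append(nearest)
    quantized = [list(codebook[i]) for i in indices]
    return indices, quantized


# Toy data: three codebook entries and four 2-D "pose" vectors (assumed values).
codebook = [[0.0, 0.0], [1.0, 1.0], [0.0, 2.0]]
frames = [[0.1, -0.1], [0.9, 1.2], [0.2, 1.8], [0.6, 0.6]]

indices, quantized = quantize(frames, codebook)
print(indices)  # discrete tokens, one per frame
```

Once a pose sequence is expressed as a sequence of integer indices like this, it can in principle be modeled with the same autoregressive decoding tools used for text tokens, which is the general motivation for discrete representations.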
