DEMOS: Dynamic Environment Motion Synthesis in 3D Scenes via Local Spherical-BEV Perception
Abstract: Motion synthesis in real-world 3D scenes has recently attracted much attention. However, the static-environment assumption made by most current methods often does not hold, especially for real-time motion synthesis in scanned point cloud scenes that contain dynamic objects such as moving persons or vehicles. To handle this problem, we propose DEMOS, the first Dynamic Environment MOtion Synthesis framework, which instantly predicts future motion from the current scene and uses the prediction to dynamically update the latent motion for final motion synthesis. Concretely, we propose a Spherical-BEV perception method to extract local scene features tailored to instant scene-aware motion prediction. We then design a time-variant motion blending that fuses the newly predicted motions into the latent motion, from which the final motion is derived, combining the strengths of motion-prior and iterative methods. We unify the data formats of two prevailing datasets, PROX and GTA-IM, and use them to evaluate motion synthesis in 3D scenes. We further assess responsiveness in dynamic environments built from GTA-IM and Semantic3D. The results show that our method significantly outperforms previous works and handles dynamic environments well.
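The abstract describes a time-variant motion blending that gradually folds newly predicted motion into the existing latent motion. The exact formulation is not given here, so the following is only a minimal sketch under an assumed linear weight schedule: frames near the present keep the current latent motion for temporal smoothness, while frames farther in the future lean more on the fresh scene-aware prediction. The function name and the linear schedule are illustrative assumptions, not the paper's actual method.

```python
import numpy as np

def time_variant_blend(latent_motion: np.ndarray, new_motion: np.ndarray) -> np.ndarray:
    """Hypothetical time-variant blend of two future motion segments.

    latent_motion, new_motion: (horizon, dof) arrays of future pose parameters.
    The blend weight grows linearly with the time step (an assumption), so the
    first frame equals the latent motion and the last frame equals the new
    prediction; intermediate frames interpolate between the two.
    """
    horizon = latent_motion.shape[0]
    w = np.linspace(0.0, 1.0, horizon)[:, None]  # per-frame weight, shape (horizon, 1)
    return (1.0 - w) * latent_motion + w * new_motion
```

In a real-time loop, such a blend would be re-applied each time a new scene-aware prediction arrives, so the latent motion is updated iteratively rather than replaced wholesale.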