
LocalStyleFool: Regional Video Style Transfer Attack Using Segment Anything Model

Published 18 Mar 2024 in cs.CV (arXiv:2403.11656v2)

Abstract: Previous work has shown that well-crafted adversarial perturbations can threaten the security of video recognition systems. Attackers can invade such models with a low query budget when the perturbations are semantic-invariant, as in StyleFool. Despite its query efficiency, StyleFool applies style transfer to all pixels in each frame, so the naturalness of fine-detail regions still needs improvement. To close this gap, we propose LocalStyleFool, an improved black-box video adversarial attack that superimposes regional style-transfer-based perturbations on videos. Benefiting from the popularity and scalable usability of the Segment Anything Model (SAM), we first extract different regions according to semantic information and then track them through the video stream to maintain temporal consistency. We then add style-transfer-based perturbations to several regions selected by a criterion that combines transfer-based gradient information with regional area. A perturbation fine-adjustment stage follows to make the stylized videos adversarial. We demonstrate through a human-assessed survey that LocalStyleFool improves both intra-frame and inter-frame naturalness while maintaining a competitive fooling rate and query efficiency. Successful experiments on a high-resolution dataset also show that the scrupulous segmentation of SAM helps improve the scalability of adversarial attacks on high-resolution data.
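The abstract describes selecting regions to stylize by combining transfer-based gradient information with regional area. The sketch below illustrates one plausible form of such a combined score, using dummy data in place of SAM masks and real surrogate-model gradients; the function name, weighting rule, and parameters are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def select_regions(masks, grad_map, top_k=2, area_weight=0.5):
    """Rank candidate regions (boolean masks, e.g. from SAM) by a
    weighted sum of mean gradient magnitude and relative area, then
    return the indices of the top-k regions to stylize. The exact
    scoring rule here is an assumption for illustration."""
    scores = []
    total = grad_map.size
    for m in masks:
        area = m.sum() / total                           # relative region area
        saliency = np.abs(grad_map[m]).mean() if m.any() else 0.0
        scores.append((1 - area_weight) * saliency + area_weight * area)
    order = np.argsort(scores)[::-1]                     # highest score first
    return [int(i) for i in order[:top_k]]

# Dummy example: a 4x4 "frame" with two candidate regions.
grad = np.zeros((4, 4))
grad[:2, :2] = 1.0                                       # high-gradient corner
m1 = np.zeros((4, 4), bool); m1[:2, :2] = True           # small, high-gradient
m2 = np.zeros((4, 4), bool); m2[2:, :] = True            # larger, low-gradient
print(select_regions([m1, m2], grad, top_k=1))           # → [0]
```

The small high-gradient region outranks the larger flat one because the saliency term dominates at this weighting; in the paper's pipeline such selected regions would then receive the style-transfer perturbation and fine adjustment.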
