Towards Gradient-based Time-Series Explanations through a SpatioTemporal Attention Network
Abstract: In this paper, we explore the feasibility of using a transformer-based, spatiotemporal attention network (STAN) for gradient-based time-series explanations. First, we trained the STAN model for video classification using global and local views of the data and weakly supervised labels on time-series data (i.e., the type of an activity). We then leveraged a gradient-based XAI technique (i.e., a saliency map) to identify salient frames of the time-series data. In experiments on datasets of four medically relevant activities, the STAN model demonstrated its potential to identify the important frames of a video.
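The core idea of the gradient-based explanation above can be illustrated with a minimal sketch: take the gradient of a class score with respect to the input sequence and aggregate its magnitude per frame. The toy linear model, the dimensions, and the feature layout below are all illustrative assumptions; in the paper's setting the gradient would come from backpropagation through the trained STAN model rather than from a closed form.

```python
import numpy as np

# Minimal sketch of a gradient-based saliency map over a time series.
# Toy model (assumption, not the STAN architecture): a linear classifier
# score(x) = sum_t w[t] . x[t], so d(score)/dx is simply w.

rng = np.random.default_rng(0)
T, D = 8, 4                        # frames, features per frame (illustrative)
x = rng.normal(size=(T, D))        # input sequence, e.g. pose features per frame
w = np.zeros((T, D))
w[3] = 1.0                         # toy weights: only frame 3 affects the score

score = float(np.sum(w * x))       # class score for the predicted activity
grad = w                           # gradient of score w.r.t. the input
saliency = np.linalg.norm(grad, axis=1)  # per-frame saliency magnitude

most_salient_frame = int(np.argmax(saliency))
print(most_salient_frame)          # the frame the explanation highlights
```

For a real transformer, `grad` would be obtained via automatic differentiation of the class logit with respect to the input frames; the per-frame norm then ranks frames by importance, which is how salient frames of the video are identified.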