LangXAI: Integrating Large Vision Models for Generating Textual Explanations to Enhance Explainability in Visual Perception Tasks

Published 19 Feb 2024 in cs.CV and cs.AI (arXiv:2402.12525v1)

Abstract: LangXAI is a framework that integrates Explainable Artificial Intelligence (XAI) with advanced vision models to generate textual explanations for visual recognition tasks. Despite advances in XAI, end-users with limited domain knowledge in artificial intelligence and computer vision still have difficulty understanding why a model produces a given output. LangXAI addresses this gap by giving end-users text-based explanations of the outputs of classification, object detection, and semantic segmentation models. Preliminary results show that LangXAI's explanations are more plausible, achieving high BERTScore across tasks, and support a more transparent and reliable AI framework for end-users of vision models.
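The abstract measures explanation plausibility with BERTScore. As a rough illustration of what that metric computes, and not LangXAI's own evaluation code, here is a minimal sketch of BERTScore's greedy-matching precision, recall, and F1: each token embedding is paired with its most cosine-similar token in the other text, and the matches are averaged. It assumes the token embeddings are supplied directly; the actual metric obtains them from a pretrained contextual encoder such as BERT and can optionally apply idf weighting and baseline rescaling, both omitted here. The function name `bertscore_f1` is hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def _unit(v: Vector) -> List[float]:
    """Scale a vector to unit length so that dot products equal cosine similarity."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in v]


def _dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def bertscore_f1(reference: Sequence[Vector],
                 candidate: Sequence[Vector]) -> Tuple[float, float, float]:
    """Return (precision, recall, F1) via greedy cosine matching of token embeddings.

    Hypothetical helper for illustration: embeddings are passed in directly,
    whereas real BERTScore computes them with a pretrained contextual encoder.
    No idf weighting or baseline rescaling is applied.
    """
    if not reference or not candidate:
        raise ValueError("both sequences need at least one token embedding")
    ref = [_unit(v) for v in reference]
    cand = [_unit(v) for v in candidate]
    # Recall: match every reference token to its most similar candidate token.
    recall = sum(max(_dot(r, c) for c in cand) for r in ref) / len(ref)
    # Precision: match every candidate token to its most similar reference token.
    precision = sum(max(_dot(c, r) for r in ref) for c in cand) / len(cand)
    # F1 is the harmonic mean of the two; guard against a zero denominator.
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom != 0 else 0.0
    return precision, recall, f1


if __name__ == "__main__":
    # Toy 2-D "embeddings": the candidate covers only one of two reference tokens.
    reference = [[1.0, 0.0], [0.0, 1.0]]
    candidate = [[1.0, 0.0]]
    p, r, f = bertscore_f1(reference, candidate)
    print(f"P={p:.3f} R={r:.3f} F1={f:.3f}")  # P=1.000 R=0.500 F1=0.667
```

Because the candidate omits the second reference token, recall drops to 0.5 while precision stays at 1.0; F1 balances the two, so a generated explanation scores well only if it is both accurate and complete relative to the reference.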
