LangXAI: Integrating Large Vision Models for Generating Textual Explanations to Enhance Explainability in Visual Perception Tasks
Abstract: LangXAI is a framework that integrates Explainable Artificial Intelligence (XAI) with advanced vision models to generate textual explanations for visual recognition tasks. Despite XAI advancements, an understanding gap persists for end-users with limited domain knowledge in artificial intelligence and computer vision. LangXAI addresses this by furnishing text-based explanations for classification, object detection, and semantic segmentation model outputs to end-users. Preliminary results demonstrate LangXAI's enhanced plausibility, with high BERTScore across tasks, fostering a more transparent and reliable AI framework on vision tasks for end-users.
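The abstract reports plausibility in terms of BERTScore. As a rough illustration of what that metric measures, the sketch below implements the greedy-matching precision, recall, and F1 that BERTScore is built on, applied to hand-made toy vectors. It is a minimal sketch under stated assumptions and not the paper's evaluation pipeline: the function name `greedy_match_score` and the toy embeddings are hypothetical, whereas real BERTScore uses contextual token embeddings from a pretrained BERT-style model and optionally applies idf weighting and baseline rescaling.

```python
import math


def _normalize(vec):
    # Scale a non-zero vector to unit length so dot products are cosine similarities.
    norm = math.sqrt(sum(c * c for c in vec))
    return [c / norm for c in vec]


def _dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def greedy_match_score(candidate_vecs, reference_vecs):
    """Greedy-matching precision, recall, and F1 over token embeddings.

    Hypothetical helper for illustration only. Both arguments are non-empty
    lists of non-zero vectors, one vector per token.
    """
    cand = [_normalize(v) for v in candidate_vecs]
    ref = [_normalize(v) for v in reference_vecs]

    # Precision: each candidate token is matched to its most similar reference token.
    precision = sum(max(_dot(c, r) for r in ref) for c in cand) / len(cand)
    # Recall: each reference token is matched to its most similar candidate token.
    recall = sum(max(_dot(r, c) for c in cand) for r in ref) / len(ref)

    total = precision + recall
    f1 = 2 * precision * recall / total if total > 0 else 0.0
    return precision, recall, f1


# Toy 2-D "embeddings" (assumed values, not model outputs).
toy_reference = [[1.0, 0.0], [0.0, 1.0]]
toy_candidate = [[1.0, 0.0], [1.0, 1.0]]

precision, recall, f1 = greedy_match_score(toy_candidate, toy_reference)
print(f"P={precision:.3f} R={recall:.3f} F1={f1:.3f}")
```

A generated explanation whose tokens each have a close counterpart in the reference explanation scores near 1, while unrelated text scores lower; this is the sense in which a high BERTScore indicates agreement with a reference text.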