Toward the Explainability of Protein Language Models for Sequence Design

Published 24 Jun 2025 in q-bio.BM (arXiv:2506.19532v1)

Abstract: Transformer-based language models excel in a variety of protein-science tasks that range from structure prediction to the design of functional enzymes. However, these models operate as black boxes, and their underlying working principles remain unclear. Here, we survey emerging applications of explainable artificial intelligence (XAI) to protein language models (pLMs) and describe their potential in protein research. We break down the workflow of a generative decoder-only Transformer into four information contexts: (i) training sequences, (ii) input prompt, (iii) model architecture, and (iv) output sequence. For each, we describe existing methods and applications of XAI. Additionally, from published studies we distil five (potential) roles that XAI can play in protein design: Evaluator, Multitasker, Engineer, Coach, and Teacher, with the Evaluator role being the only one widely adopted so far. These roles aim to help both protein science practitioners and model developers understand the possibilities and limitations of implementing XAI for the design of sequences. Finally, we highlight the critical areas of application for the future, including risks related to security, trustworthiness, and bias, and we call for community benchmarks, open-source tooling, and domain-specific visualizations to advance explainable protein design. Overall, our analysis aims to move the discussion toward the use of XAI in protein design.