Efficacy of general-purpose language models for materials science

Determine whether training large, general-purpose language models on broad, heterogeneous text corpora such as The Pile or Common Crawl yields generalization that benefits materials science tasks, compared with specialized, domain-focused models.

Background

The paper discusses integrating LLMs into the processing–structure–property–performance (PSPP) reasoning chain in materials science. While LLMs encode extensive world knowledge, their training data often comprise non-scientific sources, raising questions about the relevance and fidelity of their internal representations for materials-specific inference.

For sustainable and scientifically rigorous applications, the authors advocate specialized small LLMs and systematic benchmarking, noting that the real value of general-purpose LLM pretraining for materials science has yet to be firmly established.

References

In the context of sustainably applying ML approaches to materials discovery, it is questionable whether large, general-purpose models, which draw on a variety of scientific and non-scientific sources, are the right tools: it has yet to be proven that data points from the diverse fields covered by huge datasets such as The Pile or Common Crawl lead to a generalization that is beneficial for materials science.

Perspective: Towards sustainable exploration of chemical spaces with machine learning  (2604.00069 - Sandonas et al., 31 Mar 2026) in Subsubsection 'Language models as predictive tools', Open challenges