Standardized provenance and licensing via data cards for instrumented data

Develop a standardized data card specification, extending Datasheets for Datasets, that encodes provenance, solver inheritance, quality gates, and reviewer information for instrumented data, including licensing and governance metadata.

Background

Instrumented data inherits information from sensor observations, solvers, verification gates, and human reviewers. Without a standard for communicating this provenance and licensing information, downstream consumers cannot audit or govern data usage effectively.

The paper calls for a standardized data card that captures and transmits this inheritance in a machine-readable, comparable form.
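As a rough illustration of what "machine-readable, comparable" could mean in practice, the sketch below models a data card as a small set of Python dataclasses that serialize to JSON. It is not the specification the paper calls for, which remains an open question. Every class, field name, and check here (`DataCard`, `SolverRecord`, `GateRecord`, `audit_problems`, the `"UNSPECIFIED"` license sentinel) is a hypothetical choice made for this example.

```python
from __future__ import annotations

import json
from dataclasses import asdict, dataclass, field


@dataclass
class SolverRecord:
    """Hypothetical record of a solver whose output the data inherits."""
    name: str
    version: str


@dataclass
class GateRecord:
    """Hypothetical record of a verification (quality) gate and its outcome."""
    name: str
    passed: bool


@dataclass
class DataCard:
    """Hypothetical data card: provenance, quality gates, review, licensing."""
    dataset_id: str
    source: str  # originating sensor observation or image
    solvers: list[SolverRecord] = field(default_factory=list)
    gates: list[GateRecord] = field(default_factory=list)
    reviewers: list[str] = field(default_factory=list)
    license: str = "UNSPECIFIED"  # assumed sentinel for "no license given"
    governance_contact: str = ""

    def to_json(self) -> str:
        # Sorted keys give a stable layout, so two cards can be diffed.
        return json.dumps(asdict(self), indent=2, sort_keys=True)

    def audit_problems(self) -> list[str]:
        """List gaps a downstream consumer might flag before using the data."""
        problems: list[str] = []
        if self.license == "UNSPECIFIED":
            problems.append("missing license")
        if not self.solvers:
            problems.append("no solver recorded")
        for gate in self.gates:
            if not gate.passed:
                problems.append(f"failed gate: {gate.name}")
        if not self.reviewers:
            problems.append("no reviewer recorded")
        return problems


if __name__ == "__main__":
    card = DataCard(
        dataset_id="example-001",
        source="camera-A",
        solvers=[SolverRecord(name="example-solver", version="1.0")],
        gates=[GateRecord(name="residual-check", passed=True)],
        reviewers=["reviewer-1"],
        license="CC-BY-4.0",
    )
    print(card.to_json())
    print(card.audit_problems())
```

Keeping the card a flat, serializable record means a consumer can run checks such as `audit_problems` without access to the underlying data. A real specification would also need to settle vocabulary, versioning, and how inherited licenses combine, none of which this sketch addresses.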

References

Nine open questions will determine whether instrumented data matures into a recognised substrate for scientific machine learning. Provenance and licensing. A standardised data card extending Datasheets for Datasets is needed to carry the inheritance from image, solver, gates, and reviewer.

Instrumented data for causal scientific machine learning (2606.07865 - Wilke, 5 Jun 2026) in Section 7, Methodological questions for the community, Item 4