Guideline for Trustworthy Artificial Intelligence -- AI Assessment Catalog
Abstract: AI has made impressive progress in recent years and represents a key technology with a crucial impact on the economy and society. However, AI and the business models based on it can only reach their full potential if AI applications are developed according to high quality standards and are effectively protected against new AI-specific risks. For instance, AI bears the risk of unfair treatment of individuals when processing personal data, e.g., to support credit-lending or staff-recruitment decisions. The emergence of these new risks is closely linked to the fact that the behavior of AI applications, particularly those based on Machine Learning (ML), is essentially learned from large volumes of data and is not predetermined by fixed programmed rules. Thus, the trustworthiness of AI applications is crucial and is the subject of numerous major publications by stakeholders in politics, business, and society. In addition, there is broad agreement that the requirements for trustworthy AI, which are often described in abstract terms, must now be made concrete and tangible. One challenge here is that the specific quality criteria for an AI application depend heavily on the application context, and the possible measures to fulfill them in turn depend heavily on the AI technology used. Lastly, practical assessment procedures are needed to evaluate whether specific AI applications have been developed according to adequate quality standards. This AI assessment catalog addresses exactly this point and is intended for two target groups: firstly, it provides developers with a guideline for systematically making their AI applications trustworthy; secondly, it guides assessors and auditors in examining AI applications for trustworthiness in a structured way.
Guideline for Trustworthy Artificial Intelligence — Explained Simply
What is this paper about?
This paper is a practical guide for making AI systems safe, fair, and reliable. It was written by researchers at Fraunhofer IAIS in Germany to help two groups:
- People who build AI, so they do it responsibly from the start.
- People who check or audit AI, so they can judge if an AI system can be trusted.
Think of it like a safety and quality checklist for AI, similar to how cars need inspections before they go on the road.
What questions does it try to answer?
The paper focuses on easy-to-understand but important questions:
- How do we figure out if an AI system is trustworthy?
- What risks could an AI system create (for example, unfair decisions or unsafe behavior)?
- How do we measure and prove that an AI system meets good standards?
- How do we balance trade-offs, like being transparent without making the system easier to hack?
How did the authors approach it?
The authors created an “AI Assessment Catalog” — a step-by-step method and a shared language for talking about AI quality. Here’s the approach, in everyday terms:
- A risk-based process: Not all AI systems are equally risky. An AI that recommends movies is different from one that helps drive a car. The guide says: first understand the real-world risks, then set the right level of strictness.
- Six dimensions of trustworthiness: The catalog organizes what “trustworthy” means into six areas:
- Fairness: Don’t treat people unfairly (e.g., no bias in hiring or credit decisions).
- Autonomy and Control: Keep humans in charge; the AI should support, not replace, responsible decisions.
- Transparency: Make it clear that AI is involved, and explain its results at the right level (for users and experts).
- Reliability: Work well in everyday cases and handle surprises without breaking.
- Safety and Security: Don’t harm people or systems; resist attacks and glitches.
- Data Protection: Protect personal and business-sensitive data.
- Four steps to assess an AI system: The catalog suggests a clear, repeatable process (a minimal code sketch follows this list):
- Step 1: Risk analysis — identify what could go wrong in each dimension (e.g., unfairness, errors, attacks).
- Step 2: Objectives — decide what “good enough” looks like, preferably with measurable targets (called KPIs, like a score or threshold).
- Step 3: Measures — take concrete actions across the AI’s life cycle to reach those objectives:
- Data: Collect, clean, and check your data for quality and bias.
- AI component (the model and its pre/post-processing): Build and test the model carefully.
- Embedding (how the AI fits into the bigger system): Add safety nets and sensible interfaces.
- Operation (after launch): Monitor performance, update safely, and handle problems.
- Step 4: Safeguarding argumentation — write a clear, evidence-based explanation that shows how risks were handled and why the system is trustworthy, including any trade-offs (for example, more transparency might reduce security, so explain your choices).
- Clear definitions of what’s being assessed: The paper explains the parts of an AI system using simple building blocks:
- Model: The learned “brain” (e.g., a neural network).
- AI component: The model plus the extra steps that prepare inputs and interpret outputs.
- Embedding: The surrounding software and hardware that helps the AI work in the real world (interfaces, monitoring, safety checks).
- AI application: The whole input-to-output behavior in its real use (for example, identifying pedestrians and triggering alerts in a car).
This makes sure everyone (developers, auditors, regulators) is talking about the same thing.
- Fits with laws and standards: The approach lines up with the European Union’s AI Act (which requires extra checks for “high-risk” AI) and complements existing standards. It’s designed to plug into current testing and certification processes.
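To make the four steps above concrete, the sketch below shows one way an assessment record could be organized in Python. It is an illustration only: the catalog prescribes a process, not an API, so every name here (Risk, Objective, safeguarding_argument) is hypothetical, and the example assumes lower-is-better KPIs such as error rates.

```python
# Minimal sketch of the catalog's four-step loop. All names are
# hypothetical -- the catalog defines a process, not an API.
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class Objective:
    kpi: str                          # step 2: a measurable target
    target: float                     # "good enough" threshold (lower is better here)
    measured: Optional[float] = None  # filled in from step-3 test evidence

@dataclass
class Risk:
    dimension: str                    # one of the six dimensions, e.g., "Reliability"
    description: str                  # step 1: what could go wrong
    objectives: list[Objective] = field(default_factory=list)
    measures: list[str] = field(default_factory=list)  # step 3: actions taken

def safeguarding_argument(risks: list[Risk]) -> list[str]:
    """Step 4: turn risks, targets, and evidence into an auditable record."""
    report = []
    for risk in risks:
        for obj in risk.objectives:
            met = obj.measured is not None and obj.measured <= obj.target
            report.append(
                f"[{risk.dimension}] {risk.description}: {obj.kpi} = "
                f"{obj.measured} (target <= {obj.target}) -> "
                f"{'met' if met else 'NOT met'}; measures: "
                f"{', '.join(risk.measures) or 'none documented'}"
            )
    return report

# Example: a reliability risk for a pedestrian-detection component.
risk = Risk(
    dimension="Reliability",
    description="missed pedestrians under image noise",
    objectives=[Objective(kpi="miss rate on noisy test set",
                          target=0.02, measured=0.015)],
    measures=["noise-augmented training data",
              "plausibility check in the embedding"],
)
for line in safeguarding_argument([risk]):
    print(line)
```

In a real assessment, the measured values would come from documented test evidence, and the printed lines would feed the written safeguarding argumentation that an independent auditor reviews.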
What did they create, and why is it important?
Main results:
- A structured, risk-based catalog that turns broad AI ethics ideas into concrete steps.
- Checklists, criteria, and examples for measuring quality (like using suitable scores for translation quality or fairness metrics in hiring).
- A way to document evidence so an independent auditor can verify that an AI system is trustworthy.
- Guidance across the whole AI life cycle: design, build, test, deploy, and maintain.
Why it matters:
- It helps prevent real harms, like biased loan decisions or self-driving cars missing pedestrians due to “noisy” images.
- It builds public trust by making AI behavior explainable and accountable.
- It prepares companies for future legal requirements and certifications, which can boost confidence and competitiveness.
Simple examples of how this helps
- Fairness: If an AI screens job applications, the catalog helps set up checks so it doesn’t unfairly reject people based on gender, age, or background (a minimal check is sketched right after this list).
- Reliability and Safety: If an AI helps a car recognize pedestrians, the catalog guides stress tests for unusual situations (bad weather, image noise) and adds backup checks.
- Transparency: If a hospital uses AI to support diagnoses, patients and doctors should know AI is involved and get explanations they can understand.
- Data Protection: Personal and business data must be handled securely at all stages, from training the model to operating it live.
- Control of Dynamics: If data changes over time (for example, new slang in social media), the system should be monitored and updated safely without learning harmful behaviors.
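To picture the fairness example above, here is a minimal sketch of a disparate-impact check for a hiring screen. The 0.8 threshold is the common “four-fifths rule” heuristic from employment-selection practice, not a value the catalog mandates, and the group data are invented.

```python
# Minimal disparate-impact check; groups and threshold are illustrative.

def selection_rate(decisions: list[int]) -> float:
    """Fraction of applicants accepted (1 = accepted, 0 = rejected)."""
    return sum(decisions) / len(decisions)

def disparate_impact(group_a: list[int], group_b: list[int]) -> float:
    """Ratio of the lower selection rate to the higher one (1.0 = parity)."""
    rate_a, rate_b = selection_rate(group_a), selection_rate(group_b)
    return min(rate_a, rate_b) / max(rate_a, rate_b)

# Hypothetical screening outcomes for two applicant groups.
group_a = [1, 0, 1, 1, 0, 1, 1, 0]   # selection rate 0.625
group_b = [1, 0, 0, 0, 1, 0, 0, 0]   # selection rate 0.25

ratio = disparate_impact(group_a, group_b)
print(f"disparate impact ratio: {ratio:.2f}")
if ratio < 0.8:
    print("Below the four-fifths heuristic -- investigate for bias.")
```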
What could this change in the real world?
- Better AI design: Developers get a practical roadmap to build safer, fairer systems from day one.
- Stronger audits: Independent assessors can evaluate AI more consistently and thoroughly.
- Legal readiness: Organizations can meet upcoming rules (like the EU AI Act) more easily.
- Public trust: Users are more likely to accept AI when it’s clear, fair, and secured.
- Ongoing improvement: Because the catalog covers the full life cycle, it encourages regular monitoring, updates, and responsible upgrades.
In short: This paper turns the big idea of “trustworthy AI” into a hands-on playbook. It shows how to spot risks, set measurable goals, apply the right fixes, and prove that an AI system is safe, fair, reliable, secure, explainable, and respectful of privacy—so people and organizations can use AI with confidence.
Knowledge Gaps, Limitations, and Open Questions
Below is a focused list of what remains missing, uncertain, or unexplored, framed to enable concrete follow-on research and development.
- Lack of domain-specific KPIs and thresholds: The catalog calls for measurable objectives but does not provide standardized, validated metrics and acceptance thresholds per domain and risk level across all six dimensions (e.g., beyond BLEU for translation). Action: develop domain- and risk-tiered metric sets with reference thresholds and validation protocols.
- No formal risk quantification scheme: Severity/likelihood scoring, risk matrices, and mappings from risk levels to assurance requirements are not specified. Action: define a reproducible risk scoring framework and its linkage to assurance activities and evidence depth.
- Absent aggregation/weighting of dimensions: The cross-dimensional assessment lacks a formal method to aggregate criteria into an overall trustworthiness judgment. Action: design multi-criteria decision analysis methods (weights, uncertainty, sensitivity analysis) aligned to application criticality.
- Single-component assumption: The catalog generally assumes one AI component and does not address systems composed of multiple interacting models, multi-modal pipelines, or complex ensembles. Action: extend methods to model interactions, emergent risks, and end-to-end assurance for multi-component architectures.
- Limited guidance for foundation models and generative AI: Risks such as hallucinations, prompt injection, output toxicity, copyright, provenance/watermarking, and content moderation are not operationalized. Action: add threat models, test suites, and mitigation criteria specific to LLMs and other foundation models.
- Trade-off resolution is qualitative: Transparency–security, performance–fairness, and other trade-offs are acknowledged but lack quantitative optimization or decision frameworks. Action: create trade-off analysis methods with measurable constraints, stakeholder preference elicitation, and defensible decision records.
- Dynamic/online learning controls are high-level: Concrete drift detection techniques, monitoring KPIs, change-impact analysis, rollback plans, retraining triggers, and re-certification policies are unspecified. Action: define operational thresholds, monitoring playbooks, and governance for model updates (a minimal drift check is sketched after this list).
- Testing under distribution shift is underspecified: There is no protocol for OOD stress testing, scenario coverage metrics, or synthetic scenario generation. Action: develop standardized OOD evaluation suites, coverage metrics, and acceptance criteria per domain.
- Adversarial security testing lacks detail: Threat models, attack surfaces (data, model, pipeline, supply chain), penetration/red-team procedures, and robustness metrics are not concretized. Action: publish adversarial test catalogs, success criteria, and hardening baselines.
- Privacy metrics and audits are incomplete: Concrete parameters for differential privacy, membership/memorization testing, PII re-identification risk, and privacy auditing of data lineage are not provided. Action: set privacy metric targets (e.g., epsilon bounds), auditing procedures, and acceptance thresholds by use case (see the Laplace-mechanism sketch after this list).
- Fairness selection and validation gaps: Criteria to choose fairness definitions per context, intersectional fairness testing, subgroup discovery, and fairness under distribution shift are not operationalized. Action: provide decision trees for fairness metric selection and standardized subgroup/shift evaluation protocols.
- Explanation utility is unmeasured: There is no method to test whether transparency artifacts are comprehensible and actionable for various user roles. Action: define user-centered explanation usability studies, comprehension KPIs, and minimum thresholds.
- Auditability/evidence templates missing: Concrete templates for technical documentation, evidence artifacts, logs, provenance, and chain-of-custody are not included. Action: release standardized evidence schemas and traceability requirements with tool support.
- Responsibility allocation across the AI supply chain: Practical guidance to assign obligations between developers, data providers, model providers, and cloud/infra vendors is limited. Action: propose RACI matrices, contractual clauses, and SLAs aligned to risk.
- Mapping to existing standards is high-level: Detailed, testable mappings to ISO/IEC 42001, ISO/IEC 23894, ISO/IEC 27001, ISO 26262, IEC 62304, DO-178C, EU MDR/IVDR, etc., are absent. Action: create normative crosswalks and conformance test cases.
- Certification process design unclear: Assessor competence requirements, sampling strategies, black-box vs. white-box access policies, test depth, and surveillance audit cadence are not defined. Action: specify assessor qualification criteria and audit methodology playbooks.
- Inter-rater reliability and reproducibility: There is no plan to measure and improve consistency across assessors or tools. Action: run inter-rater studies, define calibration datasets, and publish target reliability metrics.
- Tooling and automation gaps: No open-source/reference tools, checklists, dashboards, or test harnesses are provided to operationalize the catalog at scale (especially for SMEs). Action: develop toolchains and reference implementations.
- Empirical validation limited: Aside from citing pilots, there is no rigorous evidence that applying the catalog improves safety, fairness, or reliability outcomes. Action: conduct longitudinal, cross-sector studies with pre/post metrics and incident rate analysis.
- Lifecycle re-assessment triggers undefined: Specific triggers (e.g., drift thresholds, incident types, data shifts) and workflows for re-assessment/re-certification are not set. Action: codify trigger conditions and change-control procedures.
- Model IP protection not detailed: Protections against model extraction, inversion, watermarking schemes, and license enforcement are not operationalized. Action: define defense measures, detection tests, and acceptance criteria.
- Uncertainty estimation use is vague: Concrete methods, calibration metrics (e.g., expected calibration error, ECE), and decision policies for abstention/triage based on uncertainty are not standardized. Action: mandate calibration checks and integrate uncertainty into human-in-the-loop policies (an ECE sketch follows this list).
- Human factors and oversight design: Detailed guidance to mitigate automation bias, calibrate trust, design escalation paths, and train users is limited. Action: provide HMI design patterns, oversight KPIs, and training curricula.
- Data governance specifics lacking: Standards for dataset documentation (e.g., datasheets), versioning, synthetic data validation, labeling quality control, and augmentation bias are not specified. Action: issue data governance checklists and validation tests.
- Environmental sustainability omitted: Energy, carbon, and hardware footprint metrics and targets are not addressed. Action: add measurement protocols and thresholds for environmental impact.
- Incident reporting and post-market surveillance: Taxonomies, minimal report fields, timelines, and public reporting mechanisms are not defined. Action: create an incident reporting standard and feedback loops into risk controls.
- Legal defensibility of evidence: How the safeguarding argumentation aligns with evidentiary standards for regulators/courts is not clarified. Action: map evidence to legal standards and define retention/immutability requirements.
- Edge/embedded deployment guidance: Impacts of quantization, compression, on-device monitoring, and OTA update assurance are not covered. Action: provide resource-constrained testing and update security requirements.
- Multilingual and cross-cultural performance: Ensuring fairness, reliability, and transparency across languages and cultural contexts is not addressed. Action: develop multi-locale evaluation protocols and acceptance thresholds.
- Data rights and copyright compliance: Procedures to verify lawful data sourcing, licensing, and copyrighted content usage in training are not concretized. Action: define due-diligence checks, attestations, and audit trails.
- Global regulatory alignment: The catalog centers on EU/German context and predates the final EU AI Act; alignment with updated EU provisions and non-EU regimes (US, UK, OECD) is pending. Action: update mappings and gap analyses against current laws and guidance.
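To make the drift-detection gap tangible, here is a minimal sketch that compares live feature values against a training-time reference using a two-sample Kolmogorov-Smirnov test (SciPy’s ks_2samp). The significance level of 0.05 and the synthetic data are illustrative assumptions, not catalog requirements.

```python
# Minimal drift check: has the live feature distribution shifted away
# from the training-time reference? Data and alpha are illustrative.
import numpy as np
from scipy.stats import ks_2samp

rng = np.random.default_rng(0)
reference = rng.normal(loc=0.0, scale=1.0, size=5_000)  # training-time sample
live = rng.normal(loc=0.3, scale=1.0, size=1_000)       # shifted production sample

result = ks_2samp(reference, live)
print(f"KS statistic = {result.statistic:.3f}, p-value = {result.pvalue:.4f}")
if result.pvalue < 0.05:
    print("Distribution shift detected -- trigger a retraining review.")
```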
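For the privacy-metrics gap, the sketch below shows the Laplace mechanism, the textbook way to release a numeric query result under an epsilon budget. The epsilon values and the count are illustrative, not recommended bounds.

```python
# Minimal Laplace mechanism: smaller epsilon = stronger privacy = more noise.
import numpy as np

def laplace_mechanism(true_value: float, sensitivity: float,
                      epsilon: float, rng: np.random.Generator) -> float:
    """Release true_value with (epsilon, 0)-differential privacy."""
    return true_value + rng.laplace(loc=0.0, scale=sensitivity / epsilon)

rng = np.random.default_rng(42)
# Counting query: sensitivity 1 (one person changes the count by at most 1).
true_count = 137
for eps in (0.1, 1.0, 10.0):
    noisy = laplace_mechanism(true_count, sensitivity=1.0, epsilon=eps, rng=rng)
    print(f"epsilon={eps:>4}: released count = {noisy:.1f}")
```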
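Finally, for the uncertainty gap, here is a minimal expected calibration error (ECE) computation: bin predictions by confidence and compare each bin’s average confidence with its empirical accuracy. Ten bins is a conventional choice, and the model outputs are invented.

```python
# Minimal ECE: weighted average gap between confidence and accuracy per bin.
import numpy as np

def expected_calibration_error(confidences, correct, n_bins=10):
    confidences = np.asarray(confidences)
    correct = np.asarray(correct, dtype=float)
    bins = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(bins[:-1], bins[1:]):
        mask = (confidences > lo) & (confidences <= hi)
        if mask.any():
            gap = abs(correct[mask].mean() - confidences[mask].mean())
            ece += mask.mean() * gap   # weight by the bin's share of samples
    return ece

# Hypothetical model outputs: predicted confidence and whether it was right.
conf = [0.95, 0.9, 0.85, 0.7, 0.6, 0.55, 0.98, 0.45]
hit  = [1,    1,   0,    1,   0,   1,    1,    0]
print(f"ECE = {expected_calibration_error(conf, hit):.3f}")
```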