Breaking Up with Normatively Monolithic Agency with GRACE: A Reason-Based Neuro-Symbolic Architecture for Safe and Ethical AI Alignment

Published 15 Jan 2026 in cs.AI and cs.CY | (2601.10520v2)

Abstract: As AI agents become increasingly autonomous, widely deployed in consequential contexts, and efficacious in bringing about real-world impacts, ensuring that their decisions are not only instrumentally effective but also normatively aligned has become critical. We introduce a neuro-symbolic reason-based containment architecture, Governor for Reason-Aligned ContainmEnt (GRACE), that decouples normative reasoning from instrumental decision-making and can contain AI agents of virtually any design. GRACE restructures decision-making into three modules: a Moral Module (MM) that determines permissible macro actions via deontic logic-based reasoning; a Decision-Making Module (DMM) that encapsulates the target agent while selecting instrumentally optimal primitive actions in accordance with derived macro actions; and a Guard that monitors and enforces moral compliance. The MM uses a reason-based formalism providing a semantic foundation for deontic logic, enabling interpretability, contestability, and justifiability. Its symbolic representation enriches the DMM's informational context and supports formal verification and statistical guarantees of alignment enforced by the Guard. We demonstrate GRACE on the example of an LLM therapy assistant, showing how it enables stakeholders to understand, contest, and refine agent behavior.

Summary

  • The paper presents GRACE, a neuro-symbolic architecture that decouples normative reasoning from instrumental decision-making for ethical AI alignment.
  • It introduces three distinct modules: the Moral Module derives permissible macro actions through reason-based inference, the Decision-Making Module selects effective actions within those limits, and the Guard monitors and enforces compliance.
  • A worked example of an LLM therapy assistant illustrates how GRACE lets stakeholders understand, contest, and refine the agent's behavior when ethical considerations conflict.

A Neuro-Symbolic Architecture for AI Alignment: GRACE

Introduction

The deployment of AI agents in high-stakes contexts requires that their decisions be normatively aligned as well as instrumentally effective. The paper "Breaking Up with Normatively Monolithic Agency with GRACE: A Reason-Based Neuro-Symbolic Architecture for Safe and Ethical AI Alignment" introduces GRACE (Governor for Reason-Aligned ContainmEnt), a neuro-symbolic, reason-based containment architecture. GRACE separates normative reasoning from instrumental decision-making: one component determines which kinds of action are morally permissible, and another chooses the most effective action among them. Because the architecture wraps around an existing agent rather than replacing it, it is designed to contain agents of virtually any design.

Problem Statement and Motivation

Current AI architectures often merge instrumental decision-making and normative constraints into a single opaque policy function. This monolithic approach undermines transparency and contestability: there is no separate, inspectable account of which moral considerations shaped a given action, which hinders oversight and accountability in morally charged settings. The paper calls this the "flattening problem" and proposes to separate the two dimensions, so that plural and possibly conflicting normative considerations are represented explicitly instead of being absorbed into one policy, and ethical liabilities can be traced and addressed. Decomposing the agent into distinct parts allows each part to be verified more precisely and lets the normative component be adapted as moral standards evolve, without modifying the encapsulated core agent.

Architecture Overview

GRACE is structured into three core modules:

  1. Moral Module (MM): Determines which macro action types are permissible, using reason-based inference grounded in deontic logic. Its symbolic representation makes these moral requirements explicit.
  2. Decision-Making Module (DMM): Encapsulates the target agent and selects instrumentally optimal primitive actions that conform to the macro actions derived by the MM.
  3. Guard: Monitors the actions the DMM proposes and enforces compliance with the permissible macro action types.

The architecture is organized as a multi-agent system: each module is itself an agent, connected to the others through its own observation and action space (Figure 1).

Figure 1: The GRACE containment architecture. The Moral Module determines permissible macro action types via reason-based inference; the Decision-Making Module selects instrumentally optimal actions within these constraints; the Guard enforces compliance; and the Moral Advisor provides external corrective feedback without modifying the encapsulated core agent.
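The division of labor described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the paper's implementation: the class names (`MoralModule`, `DecisionMakingModule`, `Guard`, `ContainedAgent`), the representation of macro action types as strings, the assumption that each primitive action carries its macro type, and the Guard's fallback action are all hypothetical. Each module exposes an `act` method, echoing the multi-agent view in which the overall system is itself an agent.

```python
from dataclasses import dataclass
from typing import Callable, FrozenSet

MacroType = str  # hypothetical: macro action types as plain labels


@dataclass(frozen=True)
class Action:
    macro_type: MacroType  # macro type this primitive action realizes (assumed known)
    content: str           # the primitive action itself, e.g. generated text


class MoralModule:
    """Maps an observation to the set of permissible macro action types."""

    def __init__(self, infer: Callable[[dict], FrozenSet[MacroType]]):
        self.infer = infer

    def act(self, observation: dict) -> FrozenSet[MacroType]:
        return self.infer(observation)


class DecisionMakingModule:
    """Encapsulates an arbitrary target agent without modifying it."""

    def __init__(self, target: Callable[[dict, FrozenSet[MacroType]], Action]):
        self.target = target

    def act(self, observation: dict, permitted: FrozenSet[MacroType]) -> Action:
        # The permitted types are passed as extra context; the target may ignore them.
        return self.target(observation, permitted)


class Guard:
    """Replaces any proposed action whose macro type is not permitted."""

    def __init__(self, fallback: Action):
        # Hypothetical safe default, assumed permissible in every situation.
        self.fallback = fallback

    def act(self, proposed: Action, permitted: FrozenSet[MacroType]) -> Action:
        return proposed if proposed.macro_type in permitted else self.fallback


class ContainedAgent:
    """The whole system is again an agent: observation in, action out."""

    def __init__(self, mm: MoralModule, dmm: DecisionMakingModule, guard: Guard):
        self.mm, self.dmm, self.guard = mm, dmm, guard

    def act(self, observation: dict) -> Action:
        permitted = self.mm.act(observation)
        proposed = self.dmm.act(observation, permitted)
        return self.guard.act(proposed, permitted)


# Toy components for illustration only.
def toy_moral_rules(obs: dict) -> FrozenSet[MacroType]:
    return frozenset({"refer"}) if obs.get("crisis") else frozenset({"support", "refer"})


def careless_target(obs: dict, permitted: FrozenSet[MacroType]) -> Action:
    # Ignores the constraints entirely, so only the Guard keeps it in line.
    return Action("support", "Let's keep talking about your week.")


agent = ContainedAgent(
    MoralModule(toy_moral_rules),
    DecisionMakingModule(careless_target),
    Guard(fallback=Action("refer", "I'd like to connect you with a crisis service.")),
)
print(agent.act({"crisis": False}).macro_type)  # support
print(agent.act({"crisis": True}).macro_type)   # refer
```

The point of this arrangement is that the target agent is never changed: it only receives the permitted types as additional context, and the Guard, not the target, has the final say. A real Guard would need a reliable way to map primitive actions to macro types; here that mapping is simply assumed to be attached to each action.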

Normative Reasoning and Formalization

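Central to GRACE is the use of normative reasons as the basis for moral decision-making. The MM employs a reason-based formalism that provides a semantic foundation for deontic logic: verdicts about what is obligatory or permissible are grounded in the explicit normative reasons that bear on a situation. Because these reasons are stated explicitly, the considerations behind each verdict can be inspected (interpretability), challenged (contestability), and cited in support of the outcome (justifiability). Moral conflicts are resolved by a principled procedure over these reasons rather than inside an opaque policy, which supports robust oversight. Figure 2 shows the reason theory as a graph, together with the moral reasoning it yields in several cases.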

Figure 2: Graph representation of the reason theory and resulting moral reasoning in various cases within the MM.
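To make the idea of deriving deontic verdicts from reasons concrete, here is a deliberately simplified stand-in, not the paper's formalism. It assumes each reason has a triggering fact, a favored conclusion (`"not:X"` means refraining from X), and a hypothetical numeric priority. A triggered reason is defeated by an opposing triggered reason of strictly higher priority, and clashes between equally strong reasons are left unresolved. The reasons and facts are invented to echo the therapy example.

```python
from dataclasses import dataclass
from typing import Iterable, Set


@dataclass(frozen=True)
class Reason:
    premise: str     # fact that triggers the reason
    conclusion: str  # macro action it favors; "not:X" favors refraining from X
    priority: int    # hypothetical strength; higher beats lower in a conflict


def negate(literal: str) -> str:
    return literal[len("not:"):] if literal.startswith("not:") else "not:" + literal


def obligations(reasons: Iterable[Reason], facts: Set[str]) -> Set[str]:
    triggered = [r for r in reasons if r.premise in facts]
    # A triggered reason is defeated by an opposing triggered reason of higher priority.
    undefeated = [
        r for r in triggered
        if not any(s.conclusion == negate(r.conclusion) and s.priority > r.priority
                   for s in triggered)
    ]
    favored = {r.conclusion for r in undefeated}
    # Equal-priority clashes stay unresolved: neither side becomes obligatory.
    return {c for c in favored if negate(c) not in favored}


def permitted(actions: Iterable[str], reasons: Iterable[Reason], facts: Set[str]) -> Set[str]:
    ought = obligations(reasons, facts)
    # An action is permitted unless refraining from it is obligatory.
    return {a for a in actions if negate(a) not in ought}


# Hypothetical reason theory loosely modeled on the therapy example.
THEORY = [
    Reason("patient_shared_confidence", "not:disclose", 1),  # confidentiality
    Reason("imminent_risk_of_harm", "disclose", 2),         # harm prevention
    Reason("patient_distressed", "offer_support", 1),
]
ACTIONS = ["disclose", "offer_support"]

calm = {"patient_shared_confidence", "patient_distressed"}
risky = calm | {"imminent_risk_of_harm"}
print(sorted(obligations(THEORY, calm)))          # ['not:disclose', 'offer_support']
print(sorted(permitted(ACTIONS, THEORY, calm)))   # ['offer_support']
print(sorted(obligations(THEORY, risky)))         # ['disclose', 'offer_support']
print(sorted(permitted(ACTIONS, THEORY, risky)))  # ['disclose', 'offer_support']
```

Raising the priority of the confidentiality reason above that of harm prevention would flip the verdict in the risky case. Making that ranking an explicit, editable parameter is the kind of contestable choice a reason-based approach exposes to stakeholders.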

Evaluation and Practical Example

The paper illustrates GRACE through an LLM therapy assistant, a setting with genuine ethical tensions, such as that between patient confidentiality and harm prevention. The MM evaluates the reasons that apply in a given situation and communicates the resulting constraints, while the DMM selects the most instrumentally effective next step within those constraints. Because the reasons are explicit, stakeholders can see why a constraint was imposed, contest it, and refine the reason theory when the agent's behavior should change. This exchange between modules relies on the multi-agent structure shown in Figure 3.

Figure 3: Our containment architecture as a multi-agent system: each module (as well as the overall system) implements an agent instance connected with the other modules via their individual observation and action space.
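The DMM's role in this example, picking the most useful next step among permissible ones, amounts to a constrained argmax. The sketch below assumes the MM has already returned a set of permitted macro action types for each situation; the `Candidate` type, the `select` function, the candidate replies, their macro types, and the utility scores are all hypothetical placeholders for what an LLM and a scoring model might produce.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Optional


@dataclass(frozen=True)
class Candidate:
    macro_type: str  # macro action type this reply would realize
    text: str        # the primitive action, e.g. a drafted reply
    utility: float   # hypothetical instrumental score, e.g. estimated helpfulness


def select(candidates: List[Candidate], permitted: FrozenSet[str]) -> Optional[Candidate]:
    """Return the highest-utility candidate whose macro type is permitted."""
    allowed = [c for c in candidates if c.macro_type in permitted]
    if not allowed:
        return None  # assumption: an empty choice is handed back for the Guard to handle
    return max(allowed, key=lambda c: c.utility)


CANDIDATES = [
    Candidate("offer_support", "That sounds exhausting. What helped last time?", 0.8),
    Candidate("ask_question", "When did you first notice this?", 0.6),
    Candidate("escalate_to_clinician", "I'm going to bring in a human clinician now.", 0.3),
]

# Permitted sets assumed to come from the MM for two situations.
routine = frozenset({"offer_support", "ask_question"})
at_risk = frozenset({"escalate_to_clinician"})

print(select(CANDIDATES, routine).macro_type)  # offer_support
print(select(CANDIDATES, at_risk).macro_type)  # escalate_to_clinician
print(select(CANDIDATES, frozenset()))         # None
```

In the at-risk case the escalation is chosen despite its lower utility, because the MM's constraint removes the higher-scoring options. Returning `None` when nothing is permissible is an assumption about how such a case would be passed on, not something the source specifies.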

Implications and Future Directions

GRACE offers a framework for making agent behavior safer and more ethically accountable across varied domains. Because decision-making is decomposed, the symbolic normative component supports formal verification, and the Guard supports statistical guarantees that alignment is enforced, without requiring the underlying agent to be redesigned. Future research will focus on formalizing macro action types using temporal logic and on improving the interaction between the modules so that the approach generalizes better.

Conclusion

GRACE addresses the opacity of monolithic agents through a principled structural separation of normative and instrumental reasoning. Its neuro-symbolic design pairs explicit, contestable moral reasoning with the flexibility of the agents it contains. Further development and empirical evaluation will show how well the approach carries over to real deployments.