
Factored Controller with Typed Interfaces

Updated 7 February 2026
  • The paper introduces a factored controller that decomposes high-level human-robot dialogue into statically-typed, auditable modules for robust interaction.
  • It employs a POMDP framework with explicit type-checking and functional mappings, enabling clear input/output contracts and context persistence.
  • The approach enforces evidence-based outputs through faithfulness constraints and controlled memory design, ensuring reliable, verifiable system claims.

A factored controller with typed interfaces is a systems architecture for sequential decision processes, applied in the JANUS cognitive assistant to decompose high-level human-robot interaction (HRI) into statically-typed, auditable modules with clearly defined input/output contracts. This approach enables persistent context, robust clarification of underspecified requests, evidence-grounded responses, and guarantees of verifiability and modularity over extended interactions. The architecture models dialogue as a partially observable Markov decision process (POMDP), with controller design centered on explicit, type-checked reasoning steps, agentic memory persistence, and faithfulness constraints that enforce evidence-based claims in system outputs (Belcamino et al., 31 Jan 2026).

1. POMDP Formulation and State Factorization

The interaction loop is formalized as a POMDP $M = \langle Z, A, O, T, Z, R, \gamma \rangle$, where $Z$ is the latent interaction-state space, $A$ the action space, $O$ the observation space, $T: Z \times A \rightarrow \mathrm{Dist}(Z)$ the state transition kernel, $Z: Z \times A \rightarrow \mathrm{Dist}(O)$ the observation model, $R$ the reward function, and $\gamma$ the discount factor. At each dialogue turn $t$, the state $z_t$ is factored as:

$$z_t = (d_t,\ g_t,\ \tau_t,\ \theta_t,\ S_t)$$

  • $d_t \in \mathcal{D}$: active domain
  • $g_t \in \mathcal{G}(d_t)$: underlying human goal
  • $\tau_t \in \mathcal{T}(d_t)$: intent schema
  • $\theta_t$: parameter assignment for the intent
  • $S_t = \langle H_t,\ C_t,\ A_t \rangle$: memory, decomposed into recent history $H_t$, compact core $C_t$, and an archival store $A_t$

This structured factorization underpins the decomposition of the policy $\pi$ into well-specified modules, each transforming and type-checking contextual variables, rather than learning a monolithic $\pi(y_t \mid x_{1:t})$.
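The factored state above can be sketched as typed containers — a minimal illustration, with hypothetical field names and values (the paper does not specify an implementation):

```python
from dataclasses import dataclass, field
from typing import Any

@dataclass
class Memory:
    """S_t = <H_t, C_t, A_t>: three-part agentic memory."""
    history: list[str] = field(default_factory=list)   # H_t: recent turns
    core: list[str] = field(default_factory=list)      # C_t: compact core facts
    archive: list[str] = field(default_factory=list)   # A_t: archival store

@dataclass
class State:
    """Factored interaction state z_t = (d_t, g_t, tau_t, theta_t, S_t)."""
    domain: str              # d_t: active domain
    goal: str                # g_t: underlying human goal
    intent_schema: str       # tau_t: intent schema
    params: dict[str, Any]   # theta_t: parameter assignment
    memory: Memory           # S_t

# Hypothetical example turn in a dietary-assistant domain.
z_t = State(domain="diet", goal="log_meal", intent_schema="LogMeal",
            params={"food": "apple"}, memory=Memory())
```

Typed containers like these make each state component individually inspectable and checkable, which is what the module-level type contracts in the next section rely on.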

2. Functional Decomposition: Factored Modules and Typed Interfaces

JANUS operationalizes factored control through a pipeline of statically-typed intermediate variables and functional mappings, each described by type signatures:

| Module | Input(s) | Output(s) / Signature |
|---|---|---|
| Scope Detection | $(x_t, d_{t-1})$ | $\hat{d}_t = f_{SD}(x_t, d_{t-1})$ |
| Intent Recognition | $(x_t, \hat{d}_t)$ | $(\tau_t, \tilde\theta_t) = f_{IR}(x_t, \hat{d}_t)$ |
| Intent Postprocess | $(\tau_t, \tilde\theta_t, \hat{d}_t)$ | $\theta_t = f_{PP}(\tau_t, \tilde\theta_t, \hat{d}_t)$ |
| Memory Retrieval | $(x_t, S_t)$ | $(s_t, W_t) = f_{Mem}(x_t, S_t)$ |
| Inner Speech | $(x_t, \hat{d}_t, \tau_t, \theta_t, W_t)$ | $(c_t, \rho_t) = f_{IS}(x_t, \hat{d}_t, \tau_t, \theta_t, W_t)$ |
| Query Generation | $(x_t, \hat{d}_t, \tau_t, \theta_t, W_t, c_t)$ | $q_t = f_{QG}(x_t, \hat{d}_t, \tau_t, \theta_t, W_t, c_t)$ |
| Tool Execution | $(\hat{d}_t, q_t)$ | $E_t = \mathrm{Exec}(\hat{d}_t, q_t)$ |
| Outer Speech | $(x_t, \hat{d}_t, \tau_t, \theta_t, W_t, c_t, \rho_t, E_t)$ | $y_t = f_{OS}(\ldots)$ |
| Memory Update | $(S_t, x_t, y_t, \ldots)$ | $S_{t+1} = \mathcal{U}(\ldots)$ |

Each module’s typed interface enforces correct structuring of information flow between controller steps. Typed outputs are statically checked; Intent Recognition must output a single intent schema and a dictionary of named slots matching that schema; Inner Speech explicitly gates downstream processing.
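The Intent Recognition contract — one intent schema plus a slot dictionary whose keys match that schema — can be illustrated with a small validator. The schema definitions here are hypothetical, not taken from the paper:

```python
from typing import Any

# Hypothetical slot schemas for a dietary-assistant domain.
SCHEMAS: dict[str, set[str]] = {
    "LogMeal": {"food", "quantity"},
    "QueryCalories": {"food"},
}

def check_intent_output(tau: str, theta: dict[str, Any]) -> dict[str, Any]:
    """Enforce the typed contract: known schema, no stray slots."""
    if tau not in SCHEMAS:
        raise TypeError(f"unknown intent schema: {tau}")
    extra = set(theta) - SCHEMAS[tau]
    if extra:
        raise TypeError(f"slots not in schema {tau}: {extra}")
    return theta

check_intent_output("LogMeal", {"food": "apple"})  # ok: slots are a subset of the schema
```

Rejecting outputs at the interface boundary, rather than letting malformed slot dictionaries flow downstream, is what makes each controller step auditable in isolation.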

3. Control-Flow, Gating, and Constraints

Critical gating decisions are codified via predicates:

  • Information-sufficiency ($s_t$):

$s_t = \mathrm{SUFF}(x_t, C_t, H_t) \in \{0,1\}$ gates whether the working context $W_t$ is assembled locally or augmented with archival retrieval. In the latter case, a top-$k$ similarity search $\mathrm{RETRIEVE}_k(x_t, A_t)$ produces $W_t$, subject to the constraints $\mathrm{LEN}(W_t) \leq B_W$ and $\mathrm{COHERENT}(W_t) = 1$.

  • Execution-readiness ($p_t$):

$p_t = \mathrm{COMP}(\tau_t, \theta_t) \in \{0,1\}$ ensures that fully-typed parameters are present: $c_t = \text{Proceed} \implies \mathrm{COMP} = 1$; otherwise $c_t$ must be set to *Clarify* or *Reject*. No missing parameter is silently defaulted.

  • Tool-grounding ($\rho_t$):

$\rho_t = \mathrm{TOOL}(d_t, \tau_t, \theta_t, W_t) \in \{0, 1\}$ checks whether a tool call is required: $\mathrm{TOOL}$ evaluates to 1 when the evidence requirements $\mathcal{R}(\tau_t, \theta_t)$ are not already satisfied by $W_t$.

Modules such as Inner Speech implement these gating predicates, controlling whether tool execution and outer speech proceed or a clarification is issued.
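The gating logic can be reduced to set predicates — a minimal sketch, assuming the real $\mathrm{COMP}$ and $\mathrm{TOOL}$ predicates (which the paper leaves to each module's implementation) can be modeled as slot- and evidence-set checks:

```python
from typing import Any

def comp(tau: str, theta: dict[str, Any], required: dict[str, set]) -> bool:
    """Execution-readiness p_t: every required slot is filled, never defaulted."""
    filled = {k for k, v in theta.items() if v is not None}
    return required[tau] <= filled

def tool(required_evidence: set, working_context: set) -> bool:
    """Tool-grounding rho_t: 1 when the evidence requirements are NOT
    already satisfied by W_t, i.e. a tool call is needed."""
    return not required_evidence <= working_context

def inner_speech(tau, theta, required, evidence_req, w_t):
    """Returns (c_t, rho_t): Proceed only when parameters are complete."""
    if not comp(tau, theta, required):
        return "Clarify", False
    return "Proceed", tool(evidence_req, w_t)
```

With this shape, a missing slot forces *Clarify* rather than a silent default, and a tool call fires only when the working context genuinely lacks the required evidence.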

4. Data-Flow and Module Synchronization

The single-turn data flow follows an explicit, repeatable sequence:

  1. $\hat{d}_t \leftarrow f_{SD}(x_t, d_{t-1})$
  2. $(\tau_t, \tilde\theta_t) \leftarrow f_{IR}(x_t, \hat{d}_t)$
  3. $\theta_t \leftarrow f_{PP}(\tau_t, \tilde\theta_t, \hat{d}_t)$
  4. $(s_t, W_t) \leftarrow f_{Mem}(x_t, S_t)$
  5. $(c_t, \rho_t) \leftarrow f_{IS}(x_t, \hat{d}_t, \tau_t, \theta_t, W_t)$
  6. If $c_t = \text{Proceed} \wedge \rho_t = 1$, then $q_t \leftarrow f_{QG}(x_t, \hat{d}_t, \tau_t, \theta_t, W_t, c_t)$ and $E_t \leftarrow \mathrm{Exec}(\hat{d}_t, q_t)$; else $q_t, E_t \leftarrow \emptyset$.
  7. $y_t \leftarrow f_{OS}(x_t, \hat{d}_t, \tau_t, \theta_t, W_t, c_t, \rho_t, E_t)$
  8. $S_{t+1} \leftarrow \mathcal{U}(S_t, x_t, y_t, \hat{d}_t, \tau_t, \theta_t, c_t, s_t, \rho_t, E_t)$
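The eight steps above compose into a single-turn loop skeleton. This is an illustrative sketch in which each module is an injected stub function; the paper does not prescribe a concrete implementation:

```python
def run_turn(x_t, d_prev, S_t, modules):
    """One dialogue turn through the factored pipeline (steps 1-8)."""
    d_hat = modules["f_SD"](x_t, d_prev)                       # 1. scope detection
    tau, theta_raw = modules["f_IR"](x_t, d_hat)               # 2. intent recognition
    theta = modules["f_PP"](tau, theta_raw, d_hat)             # 3. intent postprocess
    s_t, W_t = modules["f_Mem"](x_t, S_t)                      # 4. memory retrieval
    c_t, rho_t = modules["f_IS"](x_t, d_hat, tau, theta, W_t)  # 5. inner speech
    q_t = E_t = None                                           # 6. gated tool call
    if c_t == "Proceed" and rho_t:
        q_t = modules["f_QG"](x_t, d_hat, tau, theta, W_t, c_t)
        E_t = modules["exec"](d_hat, q_t)
    y_t = modules["f_OS"](x_t, d_hat, tau, theta, W_t,
                          c_t, rho_t, E_t)                     # 7. outer speech
    S_next = modules["U"](S_t, x_t, y_t, d_hat, tau, theta,
                          c_t, s_t, rho_t, E_t)                # 8. memory update
    return y_t, S_next
```

Because every module is passed in by name, each can be swapped (LLM prompt, classifier, rule) without touching the turn logic, which is the modularity claim of Section 7.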

Median latencies for JANUS modules in dietary-assistant experiments were: Scope Detection 0.37 s, Intent Recognition 0.26 s, Inner Speech 0.81 s, Query Generation 0.74 s, Outer Speech 0.27 s; at the turn level, domain routing took 0.37 s, clarification 1.7 s, and answering 2.5 s (Belcamino et al., 31 Jan 2026).

5. Memory Design and Controlled Consolidation

JANUS introduces a memory agent, factored into three roles:

  • $H_t$: bounded recent history buffer (prompt-sized, rapid lookup)
  • $C_t$: compact core memory (semantically deduplicated, capacity-limited)
  • $A_t$: archival store (indexed for semantic retrieval, not in the immediate working set)

Controlled consolidation and revision operators $\mathcal{U}$ manage transfer, deduplication, and contradiction resolution (e.g., new user facts supersede older entries). Capacity constraints $|H_t| \leq H_{max}$, $\mathrm{LEN}(C_t) \leq B_{core}$, and fixed-$k$ retrieval from $A_t$ enforce scalability and predictable computational cost.
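A toy consolidation operator can make the capacity and supersession rules concrete. This is a simplified sketch: capacities are illustrative, and the real $\mathcal{U}$ involves semantic deduplication rather than exact key matching:

```python
from collections import deque

H_MAX, B_CORE = 4, 3   # illustrative capacities H_max, B_core

def update_memory(history: deque, core: dict, archive: list,
                  turn: str, facts: dict) -> None:
    """Toy U: bounded history, superseding core facts, overflow to archive."""
    history.append(turn)                      # |H_t| <= H_max via deque(maxlen=...)
    for key, value in facts.items():
        if key in core and core[key] != value:
            archive.append((key, core[key]))  # contradiction: old fact archived
        core[key] = value
    while len(core) > B_CORE:                 # LEN(C_t) <= B_core
        oldest = next(iter(core))             # dicts preserve insertion order
        archive.append((oldest, core.pop(oldest)))
```

Superseded facts are demoted to the archive rather than deleted, so they remain reachable by the top-$k$ retrieval path while the core stays compact and contradiction-free.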

6. Evidence Grounding, Faithfulness, and Auditable Reasoning

A central design constraint is that all system claims made in outer speech during Proceed turns must be verifiable from an evidence bundle $B_t = (W_t, E_t)$. For each natural-language response $y_t$, denoting the set of atomic claims as $\mathrm{Claims}(y_t)$ and the set of supported claims as $\mathrm{Supp}(B_t)$, the following faithfulness constraint is enforced:

$$\mathrm{Claims}(y_t) \subseteq \mathrm{Supp}(B_t) \cup \mathrm{SafeDefaults} \quad \text{(Eq. 8 in [2602.00675])}$$

This conjunction of typed interfaces, explicit slot checks, and evidence-based claim restriction guarantees verifiable interaction, eliminating silent parameter defaulting and restricting claims to those justified by the working context or known safe defaults.
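The constraint itself is a subset check. A minimal sketch, assuming claims are pre-extracted into strings (real systems would need an atomic-claim extractor, which the paper's constraint presupposes but this snippet does not implement):

```python
# Hypothetical safe-default claims that need no per-turn evidence.
SAFE_DEFAULTS = {"I can help with dietary questions."}

def faithful(claims: set[str], evidence_bundle: set[str]) -> bool:
    """Eq. 8: every atomic claim in y_t is supported by B_t = (W_t, E_t)
    or belongs to the known safe defaults."""
    return claims <= evidence_bundle | SAFE_DEFAULTS
```

A Proceed-turn response failing this check would be rejected or regenerated, which is what makes the faithfulness constraint enforceable rather than aspirational.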

7. Scalability, Modularity, and Domain Extensibility

By isolating core modules with typed input–output signatures, the architecture allows each module to be independently implemented (e.g., via LLM-based classifier, prompt-template, or domain-specific LM), so long as type contracts are honored. A domain-customization layer separates schemas, tools, and prompt templates, enabling new domains or capabilities without disrupting existing logic. No global optimizer is assumed; module-level independence enhances robustness and enables targeted improvements.

The architecture has demonstrated high agreement with curated references in domain-specific dietary tasks, alongside practical latency profiles, supporting factored reasoning as a tractable path to scalable, auditable, and evidence-grounded robot assistance over multi-turn horizons (Belcamino et al., 31 Jan 2026).
