
Factored Controller with Typed Interfaces

Updated 7 February 2026
  • The paper introduces a factored controller that decomposes high-level human-robot dialogue into statically-typed, auditable modules for robust interaction.
  • It employs a POMDP framework with explicit type-checking and functional mappings, enabling clear input/output contracts and context persistence.
  • The approach enforces evidence-based outputs through faithfulness constraints and controlled memory design, ensuring reliable, verifiable system claims.

A factored controller with typed interfaces is a systems architecture for sequential decision processes, applied in the JANUS cognitive assistant to decompose high-level human-robot interaction (HRI) into statically-typed, auditable modules with clearly defined input/output contracts. This approach enables persistent context, robust clarification of underspecified requests, evidence-grounded responses, and guarantees of verifiability and modularity over extended interactions. The architecture models dialogue as a partially observable Markov decision process (POMDP), with controller design centered on explicit, type-checked reasoning steps, agentic memory persistence, and faithfulness constraints that enforce evidence-based claims in system outputs (Belcamino et al., 31 Jan 2026).

1. POMDP Formulation and State Factorization

The interaction loop is formalized as a POMDP $M = \langle Z, A, O, T, Z, R, \gamma \rangle$, where $Z$ is the latent interaction-state space, $A$ the action space, $O$ the observation space, $T: Z \times A \rightarrow \mathrm{Dist}(Z)$ the state transition kernel, $Z: Z \times A \rightarrow \mathrm{Dist}(O)$ the observation model, $R$ the reward function, and $\gamma$ the discount factor. At each dialogue turn $t$, the state $z_t$ is factored as:

$$z_t = (d_t,\ g_t,\ \tau_t,\ \theta_t,\ S_t)$$

  • $d_t \in \mathcal{D}$: active domain
  • $g_t \in \mathcal{G}(d_t)$: underlying human goal
  • $\tau_t \in \mathcal{T}(d_t)$: intent schema
  • $\theta_t$: parameter assignment for the intent
  • $S_t = \langle H_t,\ C_t,\ A_t \rangle$: memory, decomposed into recent history $H_t$, compact core $C_t$, and an archival store $A_t$

This structured factorization underpins the decomposition of the policy $\pi$ into well-specified modules, each transforming and type-checking contextual variables, rather than learning a monolithic $\pi(y_t \mid x_{1:t})$.
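The factored state above can be sketched as typed containers — a minimal illustration, with hypothetical field names and values (the paper does not specify an implementation):

```python
from dataclasses import dataclass, field
from typing import Any

@dataclass
class Memory:
    """S_t = <H_t, C_t, A_t>: three-part agentic memory."""
    history: list[str] = field(default_factory=list)   # H_t: recent turns
    core: list[str] = field(default_factory=list)      # C_t: compact core facts
    archive: list[str] = field(default_factory=list)   # A_t: archival store

@dataclass
class State:
    """Factored interaction state z_t = (d_t, g_t, tau_t, theta_t, S_t)."""
    domain: str              # d_t: active domain
    goal: str                # g_t: underlying human goal
    intent_schema: str       # tau_t: intent schema
    params: dict[str, Any]   # theta_t: parameter assignment
    memory: Memory           # S_t

# Hypothetical example turn in a dietary-assistant domain.
z_t = State(domain="diet", goal="log_meal", intent_schema="LogMeal",
            params={"food": "apple"}, memory=Memory())
```

Typed containers like these make each state component individually inspectable and checkable, which is what the module-level type contracts in the next section rely on.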

2. Functional Decomposition: Factored Modules and Typed Interfaces

JANUS operationalizes factored control through a pipeline of statically-typed intermediate variables and functional mappings, each described by type signatures:

| Module | Input(s) | Output(s) / Signature |
|---|---|---|
| Scope Detection | $(x_t, d_{t-1})$ | $\hat{d}_t = f_{SD}(x_t, d_{t-1})$ |
| Intent Recognition | $(x_t, \hat{d}_t)$ | $(\tau_t, \tilde\theta_t) = f_{IR}(x_t, \hat{d}_t)$ |
| Intent Postprocess | $(\tau_t, \tilde\theta_t, \hat{d}_t)$ | $\theta_t = f_{PP}(\tau_t, \tilde\theta_t, \hat{d}_t)$ |
| Memory Retrieval | $(x_t, S_t)$ | $(s_t, W_t) = f_{Mem}(x_t, S_t)$ |
| Inner Speech | $(x_t, \hat{d}_t, \tau_t, \theta_t, W_t)$ | $(c_t, \rho_t) = f_{IS}(x_t, \hat{d}_t, \tau_t, \theta_t, W_t)$ |
| Query Generation | $(x_t, \hat{d}_t, \tau_t, \theta_t, W_t, c_t)$ | $q_t = f_{QG}(x_t, \hat{d}_t, \tau_t, \theta_t, W_t, c_t)$ |
| Tool Execution | $(\hat{d}_t, q_t)$ | $E_t = \mathrm{Exec}(\hat{d}_t, q_t)$ |
| Outer Speech | $(x_t, \hat{d}_t, \tau_t, \theta_t, W_t, c_t, \rho_t, E_t)$ | $y_t = f_{OS}(\ldots)$ |
| Memory Update | $(S_t, x_t, y_t, \ldots)$ | $S_{t+1} = \mathcal{U}(\ldots)$ |

Each module’s typed interface enforces correct structuring of information flow between controller steps. Typed outputs are statically checked; Intent Recognition must output a single intent schema and a dictionary of named slots matching that schema; Inner Speech explicitly gates downstream processing.
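The Intent Recognition contract — one intent schema plus a slot dictionary whose keys match that schema — can be illustrated with a small validator. The schema definitions here are hypothetical, not taken from the paper:

```python
from typing import Any

# Hypothetical slot schemas for a dietary-assistant domain.
SCHEMAS: dict[str, set[str]] = {
    "LogMeal": {"food", "quantity"},
    "QueryCalories": {"food"},
}

def check_intent_output(tau: str, theta: dict[str, Any]) -> dict[str, Any]:
    """Enforce the typed contract: known schema, no stray slots."""
    if tau not in SCHEMAS:
        raise TypeError(f"unknown intent schema: {tau}")
    extra = set(theta) - SCHEMAS[tau]
    if extra:
        raise TypeError(f"slots not in schema {tau}: {extra}")
    return theta

check_intent_output("LogMeal", {"food": "apple"})  # ok: slots are a subset of the schema
```

Rejecting outputs at the interface boundary, rather than letting malformed slot dictionaries flow downstream, is what makes each controller step auditable in isolation.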

3. Control-Flow, Gating, and Constraints

Critical gating decisions are codified via predicates:

  • Information-sufficiency ($s_t$):

$s_t = \mathrm{SUFF}(x_t, C_t, H_t) \in \{0,1\}$ gates whether the working context $W_t$ is assembled locally or augmented with archival retrieval. In the latter case, a top-$k$ similarity search $\mathrm{RETRIEVE}_k(x_t, A_t)$ produces $W_t$, subject to the constraints $\mathrm{LEN}(W_t) \leq B_W$ and $\mathrm{COHERENT}(W_t) = 1$.

  • Execution-readiness ($p_t$):

$p_t = \mathrm{COMP}(\tau_t, \theta_t) \in \{0,1\}$ ensures that fully-typed parameters are present: $c_t = \text{Proceed} \implies \mathrm{COMP} = 1$; otherwise $c_t$ must be set to *Clarify* or *Reject*. No missing parameter is silently defaulted.

  • Tool-grounding ($\rho_t$):

$\rho_t = \mathrm{TOOL}(d_t, \tau_t, \theta_t, W_t) \in \{0, 1\}$ checks whether a tool call is required: $\mathrm{TOOL}$ evaluates to 1 when the evidence requirements $\mathcal{R}(\tau_t, \theta_t)$ are not already satisfied by $W_t$.

Modules such as Inner Speech implement these gating predicates, controlling whether tool execution and outer speech proceed or a clarification is issued.
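The gating logic can be reduced to set predicates — a minimal sketch, assuming the real $\mathrm{COMP}$ and $\mathrm{TOOL}$ predicates (which the paper leaves to each module's implementation) can be modeled as slot- and evidence-set checks:

```python
from typing import Any

def comp(tau: str, theta: dict[str, Any], required: dict[str, set]) -> bool:
    """Execution-readiness p_t: every required slot is filled, never defaulted."""
    filled = {k for k, v in theta.items() if v is not None}
    return required[tau] <= filled

def tool(required_evidence: set, working_context: set) -> bool:
    """Tool-grounding rho_t: 1 when the evidence requirements are NOT
    already satisfied by W_t, i.e. a tool call is needed."""
    return not required_evidence <= working_context

def inner_speech(tau, theta, required, evidence_req, w_t):
    """Returns (c_t, rho_t): Proceed only when parameters are complete."""
    if not comp(tau, theta, required):
        return "Clarify", False
    return "Proceed", tool(evidence_req, w_t)
```

With this shape, a missing slot forces *Clarify* rather than a silent default, and a tool call fires only when the working context genuinely lacks the required evidence.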

4. Data-Flow and Module Synchronization

The single-turn data flow follows an explicit, repeatable sequence:

  1. $\hat{d}_t \leftarrow f_{SD}(x_t, d_{t-1})$
  2. $(\tau_t, \tilde\theta_t) \leftarrow f_{IR}(x_t, \hat{d}_t)$
  3. $\theta_t \leftarrow f_{PP}(\tau_t, \tilde\theta_t, \hat{d}_t)$
  4. $(s_t, W_t) \leftarrow f_{Mem}(x_t, S_t)$
  5. $(c_t, \rho_t) \leftarrow f_{IS}(x_t, \hat{d}_t, \tau_t, \theta_t, W_t)$
  6. If $c_t = \text{Proceed} \wedge \rho_t = 1$, then $q_t \leftarrow f_{QG}(x_t, \hat{d}_t, \tau_t, \theta_t, W_t, c_t)$ and $E_t \leftarrow \mathrm{Exec}(\hat{d}_t, q_t)$; else $q_t, E_t \leftarrow \emptyset$.
  7. $y_t \leftarrow f_{OS}(x_t, \hat{d}_t, \tau_t, \theta_t, W_t, c_t, \rho_t, E_t)$
  8. $S_{t+1} \leftarrow \mathcal{U}(S_t, x_t, y_t, \hat{d}_t, \tau_t, \theta_t, c_t, s_t, \rho_t, E_t)$
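The eight steps above compose into a single-turn loop skeleton. This is an illustrative sketch in which each module is an injected stub function; the paper does not prescribe a concrete implementation:

```python
def run_turn(x_t, d_prev, S_t, modules):
    """One dialogue turn through the factored pipeline (steps 1-8)."""
    d_hat = modules["f_SD"](x_t, d_prev)                       # 1. scope detection
    tau, theta_raw = modules["f_IR"](x_t, d_hat)               # 2. intent recognition
    theta = modules["f_PP"](tau, theta_raw, d_hat)             # 3. intent postprocess
    s_t, W_t = modules["f_Mem"](x_t, S_t)                      # 4. memory retrieval
    c_t, rho_t = modules["f_IS"](x_t, d_hat, tau, theta, W_t)  # 5. inner speech
    q_t = E_t = None                                           # 6. gated tool call
    if c_t == "Proceed" and rho_t:
        q_t = modules["f_QG"](x_t, d_hat, tau, theta, W_t, c_t)
        E_t = modules["exec"](d_hat, q_t)
    y_t = modules["f_OS"](x_t, d_hat, tau, theta, W_t,
                          c_t, rho_t, E_t)                     # 7. outer speech
    S_next = modules["U"](S_t, x_t, y_t, d_hat, tau, theta,
                          c_t, s_t, rho_t, E_t)                # 8. memory update
    return y_t, S_next
```

Because every module is passed in by name, each can be swapped (LLM prompt, classifier, rule) without touching the turn logic, which is the modularity claim of Section 7.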

Median latencies for JANUS modules in dietary-assistant experiments were: Scope Detection 0.37 s, Intent Recognition 0.26 s, Inner Speech 0.81 s, Query Generation 0.74 s, Outer Speech 0.27 s; at the turn level, domain routing took 0.37 s, clarification 1.7 s, and answering 2.5 s (Belcamino et al., 31 Jan 2026).

5. Memory Design and Controlled Consolidation

JANUS introduces a memory agent, factored into three roles:

  • $H_t$: bounded recent history buffer (prompt-sized, rapid lookup)
  • $C_t$: compact core memory (semantically deduplicated, capacity-limited)
  • $A_t$: archival store (indexed for semantic retrieval, not in the immediate working set)

Controlled consolidation and revision operators $\mathcal{U}$ manage transfer, deduplication, and contradiction resolution (e.g., new user facts supersede older entries). Capacity constraints $|H_t| \leq H_{max}$, $\mathrm{LEN}(C_t) \leq B_{core}$, and fixed-$k$ retrieval from $A_t$ enforce scalability and predictable computational cost.
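A toy consolidation operator can make the capacity and supersession rules concrete. This is a simplified sketch: capacities are illustrative, and the real $\mathcal{U}$ involves semantic deduplication rather than exact key matching:

```python
from collections import deque

H_MAX, B_CORE = 4, 3   # illustrative capacities H_max, B_core

def update_memory(history: deque, core: dict, archive: list,
                  turn: str, facts: dict) -> None:
    """Toy U: bounded history, superseding core facts, overflow to archive."""
    history.append(turn)                      # |H_t| <= H_max via deque(maxlen=...)
    for key, value in facts.items():
        if key in core and core[key] != value:
            archive.append((key, core[key]))  # contradiction: old fact archived
        core[key] = value
    while len(core) > B_CORE:                 # LEN(C_t) <= B_core
        oldest = next(iter(core))             # dicts preserve insertion order
        archive.append((oldest, core.pop(oldest)))
```

Superseded facts are demoted to the archive rather than deleted, so they remain reachable by the top-$k$ retrieval path while the core stays compact and contradiction-free.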

6. Evidence Grounding, Faithfulness, and Auditable Reasoning

A central design constraint is that all system claims made in outer speech during Proceed turns must be verifiable from an evidence bundle $B_t = (W_t, E_t)$. For each natural-language response $y_t$, denoting the set of atomic claims as $\mathrm{Claims}(y_t)$ and the set of supported claims as $\mathrm{Supp}(B_t)$, the following faithfulness constraint is enforced:

$$\mathrm{Claims}(y_t) \subseteq \mathrm{Supp}(B_t) \cup \mathrm{SafeDefaults} \quad \text{(Eq. 8 in [2602.00675])}$$

This conjunction of typed interfaces, explicit slot checks, and evidence-based claim restriction guarantees verifiable interaction, eliminating silent parameter defaulting and restricting claims to those justified by the working context or known safe defaults.
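The constraint itself is a subset check. A minimal sketch, assuming claims are pre-extracted into strings (real systems would need an atomic-claim extractor, which the paper's constraint presupposes but this snippet does not implement):

```python
# Hypothetical safe-default claims that need no per-turn evidence.
SAFE_DEFAULTS = {"I can help with dietary questions."}

def faithful(claims: set[str], evidence_bundle: set[str]) -> bool:
    """Eq. 8: every atomic claim in y_t is supported by B_t = (W_t, E_t)
    or belongs to the known safe defaults."""
    return claims <= evidence_bundle | SAFE_DEFAULTS
```

A Proceed-turn response failing this check would be rejected or regenerated, which is what makes the faithfulness constraint enforceable rather than aspirational.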

7. Scalability, Modularity, and Domain Extensibility

By isolating core modules with typed input–output signatures, the architecture allows each module to be independently implemented (e.g., via LLM-based classifier, prompt-template, or domain-specific LM), so long as type contracts are honored. A domain-customization layer separates schemas, tools, and prompt templates, enabling new domains or capabilities without disrupting existing logic. No global optimizer is assumed; module-level independence enhances robustness and enables targeted improvements.

The architecture has demonstrated high agreement with curated references in domain-specific dietary tasks, alongside practical latency profiles, supporting factored reasoning as a tractable path to scalable, auditable, and evidence-grounded robot assistance over multi-turn horizons (Belcamino et al., 31 Jan 2026).
