Open-Book Paradox: AI & Topology
- The Open-Book Paradox is a phenomenon in which systems fail to internalize externally provided reference material, leading to marked performance deficits.
- In NLP, pre-trained models achieve significant accuracy gains only when supplied with explicitly retrieved context at test time; closed-book "studying" of the same material yields near-chance performance.
- In topology, counterexamples where doubles lack open book decompositions challenge established equivalences in manifold structure.
The term "Open-Book Paradox" identifies surprising failures, manifesting in both artificial intelligence and differential topology, in which systems explicitly granted access to external reference material or structures nevertheless fail to internalize or correctly leverage that information, with significant consequences for question answering and manifold decomposition alike. This article presents the definitions and manifestations of the open-book paradox in pre-trained language models (PTLMs), open-book question answering (QA), and high-dimensional topology, specifically regarding open book decompositions of manifolds.
1. The Open-Book Paradox in LLM QA
The open-book paradox in natural language processing, as established by Ciosici et al. (Ciosici et al., 2021), is observed in the context of pre-trained LLMs being evaluated on domain-specific question answering benchmarks such as LEFT. Despite continued pre-training ("studying the book"), PTLMs like T5-3B and GPT-Neo 2.7B fail to internalize and utilize new factual information from small, domain-specific texts in closed-book settings, resulting in test performance that remains near that of random guessing (~50% accuracy). In contrast, when allowed to retrieve external paragraphs ("open-book"), these same models display substantial performance gains (up to ~74% with gold retrieval), indicating that their ability to answer is almost entirely contingent on explicit access to relevant context rather than genuine knowledge acquisition.
A related but distinct paradox is identified in open-book QA with LLMs (Vankov et al., 30 Apr 2025). Even when a model is presented with external documents at test time, it may generate answers that are not grounded in the provided context, defaulting instead to its parametric (internal, pre-trained) knowledge. This undermines the purpose of open-book QA, which is to ensure answers are grounded in the latest, most relevant sources.
2. Formal Characterization and Empirical Findings
The formalization of the open-book paradox proceeds via controlled experimental setups. In the LEFT benchmark (Ciosici et al., 2021), models are evaluated in three configurations:
- Prior-Knowledge: Out-of-the-box PTLMs, with no exposure to the domain material.
- Closed-Book after Reading: PTLMs are lightly continued-pretrained on domain text but take the exam without access to the material.
- Open-Book Retrieval: At test time, the model is supplied with a single retrieved paragraph (automatic or gold-standard).
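The three-configuration protocol can be sketched as a single scoring loop. The toy model below is a hypothetical stand-in, not the actual LEFT evaluation code; it illustrates the paradox by answering correctly only when handed the relevant context:

```python
def accuracy(model, exam, context_for=None):
    """Score a true/false exam; context_for optionally maps a question
    to a retrieved paragraph (None = closed-book setting)."""
    correct = 0
    for question, gold in exam:
        ctx = context_for(question) if context_for else None
        if model(question, ctx) == gold:
            correct += 1
    return correct / len(exam)

# Hypothetical stand-in for a PTLM: it reads the answer off the context
# when one is supplied, and falls back to an uninformed guess otherwise.
def toy_model(question, context):
    if context is not None:
        return "true" if question in context else "false"
    return "true"  # closed-book: fixed guess, ~chance accuracy
```

Running the same model through the closed-book and open-book settings then reproduces the qualitative gap: near-chance accuracy without context, and a large jump once retrieval is added.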
Empirical results are summarized as follows:
| Model & Setting | AG Accuracy (%) | USH Accuracy (%) |
|---|---|---|
| Prior-Knowledge (T5-3B) | 49.5 | 50.0 |
| Closed-Book (T5-3B + read) | 52.3 | 50.0 |
| Open-Book (T5-3B + auto IR) | 61.2 | 59.9 |
| Open-Book (T5-3B + gold IR) | 74.3 | 68.7 |
Performance improves markedly in the open-book setup, even with imperfect retrieval (sBERT-based), but shows negligible gains from "reading" the textbook in closed-book fashion. This demonstrates that prevailing PTLM architectures and objectives are inefficient at internalizing new domain information outside their initial pre-training, and instead rely on surface-level retrieval at inference (Ciosici et al., 2021).
In open-book QA research (Vankov et al., 30 Apr 2025), the ConSens (Contrastive Perplexity Sensitivity) metric quantifies the degree to which a model's answer is grounded in the supplied context. Experiments show that ungrounded answers often result when the model ignores the context in favor of parametric memory. ConSens distinguishes between context-grounded and ungrounded answers with high accuracy (AUC up to 0.93), empirically confirming the pervasiveness of the paradox.
3. The Open-Book Paradox in High-Dimensional Topology
An independent incarnation of the open-book paradox appears in the study of manifold decompositions (Kotschick, 28 Oct 2025). Classical topology conjectured that being a "double" (a manifold obtained by gluing two copies of a compact manifold along their boundary) was equivalent to admitting an open book decomposition, especially in high dimensions (Ranicki, 1998). However, Kotschick shows that in every even dimension there exist closed, oriented manifolds which are doubles but admit no open book decomposition; explicitly, certain products of closed surfaces of genus at least 2, over which there exist surface bundles of nonzero signature, furnish such counterexamples.
This paradox is rooted in the non-multiplicativity of the signature in surface bundles: there exist surface bundles whose total space has nonzero signature, a non-triviality induced by the monodromy. The existence of such a bundle over a manifold M obstructs any open book decomposition of M, despite M's construction as a double. Notably, all surface products with nonzero simplicial volume fail to admit open books (Kotschick, 28 Oct 2025).
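As background, the classical Chern–Hirzebruch–Serre theorem (a standard fact, not specific to the paper) pinpoints why nontrivial monodromy is essential for such examples:

```latex
% Chern–Hirzebruch–Serre: for a fiber bundle F \to E \to B of closed oriented
% manifolds with \pi_1(B) acting trivially on H^*(F;\mathbb{R}), the signature
% is multiplicative:
\sigma(E) = \sigma(F)\,\sigma(B).
% For a surface bundle \Sigma_h \to E \to B one has \sigma(\Sigma_h) = 0, so
% \sigma(E) \neq 0 forces a nontrivial monodromy action on H^1(\Sigma_h;\mathbb{R}).
```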
4. Underlying Causes and Diagnostics
In PTLM QA, the open-book paradox arises because current pre-training objectives (e.g., causal language modeling, masked language modeling) do not encourage genuine long-term retention or internalization of new domain-specific knowledge from limited texts. Instead, models favor shallow text memorization and context-free pattern recognition. The gap between retrieval-augmented (open-book) and closed-book performance highlights the lack of persistent, structured knowledge integration mechanisms.
For open-book QA with context, the prevalence of parametric fallback—reliance on knowledge “memorized” during pre-training—demonstrates the inadequacy of context-following and grounding objectives. ConSens precisely quantifies to what extent the supplied context actually controls the model’s output, revealing that many answers are not causally linked to the provided contexts (Vankov et al., 30 Apr 2025).
In topology, the obstruction is algebraic: the capacity to construct surface bundles with non-multiplicative signature (as in the Atiyah–Kodaira examples) directly precludes open book decompositions by violating signature constraints fundamental to the decomposition structure.
5. Methodologies and Metrics
NLP Domain
- Retrieval Mechanisms: sBERT fine-tuned encoders are used to score question-paragraph pairs by cosine similarity, ranking and retrieving the most relevant paragraph for context augmentation (Ciosici et al., 2021).
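A minimal sketch of this retrieval step, assuming embeddings have already been computed (a real system would use a fine-tuned sentence-transformers encoder; the function name and toy vectors here are illustrative):

```python
import numpy as np

def top_paragraph(question_vec, paragraph_vecs):
    """Return the index of the paragraph whose embedding has the highest
    cosine similarity with the question embedding."""
    q = question_vec / np.linalg.norm(question_vec)
    P = paragraph_vecs / np.linalg.norm(paragraph_vecs, axis=1, keepdims=True)
    return int(np.argmax(P @ q))  # cosine similarity = dot of unit vectors
```

The selected paragraph is then concatenated with the question as context for the reader model; gold-standard retrieval simply bypasses this step with the annotated paragraph.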
- Contrastive Perplexity Sensitivity (ConSens): For a sequence of answer tokens, compute the model's average perplexity with and without the supplied context (P_with, P_without), then form the log-ratio r = log(P_without / P_with) and apply a shifted sigmoid to bound ConSens between –1 and 1. High ConSens indicates strong context grounding, while ConSens near 0 or negative suggests the context is underutilized or ignored (Vankov et al., 30 Apr 2025).
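A minimal sketch of a ConSens-style score from per-token log-probabilities, assuming the log-ratio of the two perplexities is passed through a shifted sigmoid; the exact normalization in Vankov et al. may differ:

```python
import math

def avg_perplexity(token_logprobs):
    """Average per-token perplexity from natural-log token probabilities."""
    return math.exp(-sum(token_logprobs) / len(token_logprobs))

def consens(logprobs_with_ctx, logprobs_without_ctx):
    """ConSens-style score in (-1, 1): positive when the context makes the
    answer more likely (lower perplexity), near zero when it has no effect."""
    p_with = avg_perplexity(logprobs_with_ctx)
    p_without = avg_perplexity(logprobs_without_ctx)
    r = math.log(p_without / p_with)           # log-ratio of perplexities
    return 2.0 / (1.0 + math.exp(-r)) - 1.0    # shifted sigmoid -> (-1, 1)
```

An answer whose likelihood is unchanged by the context scores 0, flagging a likely parametric-memory answer, while a strongly context-dependent answer scores close to 1.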
Topology Domain
- Signature Obstructions: The existence of a surface bundle over M with nonzero signature acts as an obstruction to an open book decomposition of M.
- Simplicial Volume: In dimension 4, nonzero Gromov simplicial volume serves as a diagnostic for the absence of an open book decomposition (Kotschick, 28 Oct 2025).
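For context, the relevant standard facts about simplicial volume (due to Gromov, and not specific to the paper) can be stated as:

```latex
% Simplicial volume of a closed oriented surface of genus g \ge 2:
\|\Sigma_g\| = 2\,|\chi(\Sigma_g)| = 4g - 4 > 0.
% Products satisfy \|M \times N\| \ge c_{m,n}\,\|M\|\,\|N\| for a constant
% c_{m,n} > 0 depending only on the dimensions, so
% \|\Sigma_g \times \Sigma_h\| > 0 whenever g, h \ge 2.
```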
6. Implications and Future Directions
Breaking the open-book paradox in NLP demands:
- Development of reading-comprehension pre-training objectives that induce true knowledge acquisition from new texts (Ciosici et al., 2021).
- Exploration of hybrid neuro-symbolic, memory-augmented, or knowledge-graph-based models to enable persistent, structural domain fact retention and querying.
- Enhanced retrieval models and reranking to bridge the gap between automatic and gold-standard paragraph selection, aiming to close the performance gap in open-book QA and foster true context following.
- Considerations of statement categorization (factual, causal, definitional) to isolate specific learning failures.
The ConSens metric offers a differentiable, computationally efficient measure of grounding that can be used both for automatic QA evaluation suites and as a reward function in reinforcement learning to optimize context usage (Vankov et al., 30 Apr 2025).
In topology, the identification of doubles without open book decompositions overturns prior equivalence assertions and prompts further investigation into algebraic surgery invariants (e.g., Ranicki's asymmetric signature) and their precise correspondence to the existence of open books. The question of whether nonzero simplicial volume is fully equivalent to the absence of open book decompositions in dimension 4 remains open (Kotschick, 28 Oct 2025).
7. Summary Table: Open-Book Paradox Manifestations
| Domain | Paradox Manifestation | Diagnostic/Metric |
|---|---|---|
| PTLM QA (NLP) | Models fail to learn from new texts; succeed with open-book look-up | LEFT benchmark; accuracy gaps |
| Open-Book QA | Models ignore context, answer from parametric memory | ConSens metric (perplexity-based) |
| Topology | Existence of doubles lacking open book decompositions | Signature non-multiplicativity; nonzero simplicial volume |
The open-book paradox therefore signals a fundamental limitation in both artificial and mathematical systems: explicit provision of external references or decompositions does not guarantee their proper utilization or internalization, revealing the need for new theoretical and architectural advances to bridge the gap between access and genuine comprehension or structure.