AI Must Embrace Specialization via Superhuman Adaptable Intelligence

Published 27 Feb 2026 in cs.AI | (2602.23643v1)

Abstract: Everyone from AI executives and researchers to doomsayers, politicians, and activists is talking about AGI. Yet, they often don't seem to agree on its exact definition. One common definition of AGI is an AI that can do everything a human can do, but are humans truly general? In this paper, we address what's wrong with our conception of AGI, and why, even in its most coherent formulation, it is a flawed concept to describe the future of AI. We explore whether the most widely accepted definitions are plausible, useful, and truly general. We argue that AI must embrace specialization, rather than strive for generality, and in its specialization strive for superhuman performance, and introduce Superhuman Adaptable Intelligence (SAI). SAI is defined as intelligence that can learn to exceed humans at anything important that we can do, and that can fill in the skill gaps where humans are incapable. We then lay out how SAI can help hone a discussion around AI that was blurred by an overloaded definition of AGI, and extrapolate the implications of using it as a guide for the future.

Summary

The paper argues that human-centric AGI benchmarks are flawed and advocates specialization with self-supervised learning as the path to Superhuman Adaptable Intelligence.
It introduces a semantic map categorizing AI definitions along adaptability and task scope, emphasizing the superior performance of specialized systems.
It cites modular architectures and negative-transfer studies as evidence that rapid adaptation and domain-specific priors drive practical AI advances.

Reframing the AI North Star: Specialization and Superhuman Adaptable Intelligence

Critique of Human-Centric Generality

The paper interrogates prevailing conceptions of AGI, pinpointing the anthropocentric bias underlying most definitions. It contends that the purported generality of human intelligence is illusory; humans adapt efficiently only within a narrow evolutionary spectrum of survival-critical tasks. Referencing Moravec's Paradox, the authors expose the cognitive distortion inherent in benchmarking “general intelligence” by human performance. They argue this results in two common errors: circularly defining generality in human terms, and treating our adaptive skills as if they generalize to all conceivable tasks. The paper systematically debunks both by highlighting the finite range and specialized nature of human abilities.

Competing arguments (e.g., from Musk and Hassabis) that equate generality with Turing-completeness and universal computability are rejected on operational grounds. Even if brains are Turing-complete, real-world constraints (finite time, memory, data) render human performance a minuscule projection in the space of possible tasks. The paper ties this terminological confusion to broader fragmentation in AI discourse.

Definitional Survey and Analytical Framework

Analyzing major AGI definitions, the paper introduces a two-dimensional semantic map for organizing “North Star” measures of artificial intelligence: the vertical axis distinguishes performance-based intelligence from learning/adaptation, while the horizontal axis separates task scope (universal vs. human-centric/economic domains). Definitions cluster as Adaptive Generalists, Cognitive Mirrors, and Economic Engines, with Superhuman Adaptable Intelligence (SAI) defined by its capacity to learn, specialize, and exceed human performance on important tasks (Figure 1).

Figure 1: Semantic map of AGI definitions, showing axes of source (performance vs. adaptability) and scope (universal, human-centric, economic), and clustering of Adaptive Generalists, Cognitive Mirrors, Economic Engines, and SAI.

The survey exposes contradictions, lack of feasibility, and poor assessability in extant AGI definitions. Human-centric and economic-value definitions are not truly general; universal intelligence is computationally intractable; “match human versatility” is inconsistent given the specialized adaptation of humans. Operational metrics grounded in speed of adaptation and breadth of task acquisition are shown to be superior to performance-only or checklist-based evaluation.

Theoretical and Empirical Case for Specialization

Specialization is proposed as both an evolutionary inevitability and a practical requirement in AI. Biological and economic analogs (Forister, Futuyma, Hannan, Loasby) are invoked to illustrate genetic/environmental tradeoffs and selective pressure toward narrowly optimized strategies. “No Free Lunch” theorems contextualize this: maximizing performance invariably requires strong, domain-specific priors—generalist algorithms, given finite compute, will be outperformed by specialists.

Negative transfer in multi-task learning, modular specialization in contemporary architectures (e.g., Mixture of Experts), and domain-driven breakthroughs (AlphaFold) furnish empirical evidence for the superiority of specialization. The theoretical impossibility of tractable planning across arbitrary environments further limits the prospect of genuine universal generality.
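To make the modular specialization mentioned above concrete, here is a minimal sketch of top-1 gating in the style of a Mixture of Experts layer. It is an illustration under stated assumptions, not the paper's method: the gate weights, the two toy experts, and every function name are hypothetical.

```python
import math
from typing import Callable, List, Sequence, Tuple

Vector = Sequence[float]
Expert = Callable[[Vector], List[float]]


def softmax(scores: Sequence[float]) -> List[float]:
    """Numerically stable softmax over raw gate scores."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def gate_scores(x: Vector, gate_weights: Sequence[Vector]) -> List[float]:
    """One linear score per expert: the dot product of its gate row with x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in gate_weights]


def route_top1(
    x: Vector, gate_weights: Sequence[Vector], experts: Sequence[Expert]
) -> Tuple[int, List[float]]:
    """Send x to the single most probable expert; return (expert index, output)."""
    probs = softmax(gate_scores(x, gate_weights))
    best = max(range(len(experts)), key=lambda i: probs[i])
    return best, experts[best](x)


# Hypothetical toy setup: expert 0 doubles its input, expert 1 negates it.
gate = [[1.0, 0.0], [0.0, 1.0]]
experts = [lambda v: [2 * a for a in v], lambda v: [-a for a in v]]

print(route_top1([3.0, 1.0], gate, experts))
print(route_top1([1.0, 5.0], gate, experts))
```

Only one expert runs per input, which is the property that lets each expert specialize on the inputs routed to it. Real systems learn the gate weights jointly with the experts and often route to more than one expert per input.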

The overlap and divergence between human and AI task domains is visualized, demonstrating that AI can both supersede human skill and expand into regions of utility inaccessible to humans (Figure 2).

Figure 2: Task space overlap between human domain and AI domain, within the universal task space.

Proposal: Superhuman Adaptable Intelligence (SAI)

SAI is formally defined as an agent capable of adapting to exceed human performance on any task within or outside the human domain, provided the task has utility. The metric for SAI is the speed of skill acquisition and task adaptation. This shift reframes the target: away from anthropocentrism or static performance benchmarks, toward operational adaptability as measured by learning velocity over relevant task spectra.
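The summary does not give a formula for adaptation speed, so the following is only one possible operationalization: "time-to-competence", the smallest budget (samples, steps, or seconds) at which a learning curve first reaches a target score. The function names, the curve format, and the example numbers are assumptions made for illustration.

```python
import math
from typing import Optional, Sequence, Tuple

# A learning curve is a list of (budget_spent, score) pairs sorted by budget.
Curve = Sequence[Tuple[float, float]]


def time_to_competence(curve: Curve, target: float) -> Optional[float]:
    """Smallest budget at which the score reaches the target, or None if never."""
    for budget, score in curve:
        if score >= target:
            return budget
    return None


def adaptation_speed(curve: Curve, target: float) -> float:
    """Inverse time-to-competence; 0.0 when the target is never reached."""
    budget = time_to_competence(curve, target)
    if budget is None:
        return 0.0
    if budget == 0:
        return math.inf  # competent before spending any budget
    return 1.0 / budget


# Hypothetical curves for two systems adapting to the same new task.
system_a = [(100, 0.40), (200, 0.70), (400, 0.92)]
system_b = [(100, 0.60), (200, 0.91), (400, 0.95)]

print(time_to_competence(system_a, target=0.9))
print(time_to_competence(system_b, target=0.9))
```

Under this sketch, system B adapts faster because it reaches the target at half the budget. Comparing across different tasks would additionally require normalizing budgets and targets, which this sketch does not attempt.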

The SAI paradigm embraces self-supervised learning (SSL) as a principal route, arguing that SSL is more generalizable and efficient than supervised learning even when labeled data is abundant. SSL underpins recent advances in LLMs and SOTA vision models. Further, world models and latent-space prediction architectures (Dreamer 4, Genie 2, JEPA) are highlighted as substrates for fast adaptation—enabling compact representations, simulation, and planning in complex environments.

SAI eschews architectural monoculture (autoregressive LLMs/LMMs), warning that convergence on a single next-token prediction paradigm impedes progress and diversity. Compounding errors in autoregressive modeling are depicted (Figure 3) to reinforce the case for architectural pluralism.

Figure 3: Exponential divergence of autoregressive model prediction errors.
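A common back-of-the-envelope version of the compounding-error argument assumes that each generated token independently has some fixed probability of being an unrecoverable error. Under that assumption (which is a simplification, and not necessarily the exact model behind Figure 3), the chance that a whole sequence stays acceptable decays exponentially with its length. The function names below are hypothetical.

```python
import math


def prob_sequence_correct(per_token_error: float, length: int) -> float:
    """P(all tokens acceptable), assuming each token fails independently."""
    if not 0.0 <= per_token_error <= 1.0:
        raise ValueError("per_token_error must be in [0, 1]")
    if length < 0:
        raise ValueError("length must be non-negative")
    return (1.0 - per_token_error) ** length


def horizon_below(per_token_error: float, threshold: float) -> int:
    """Smallest length at which the success probability is at or below threshold."""
    if not 0.0 < per_token_error < 1.0:
        raise ValueError("per_token_error must be in (0, 1)")
    if not 0.0 < threshold < 1.0:
        raise ValueError("threshold must be in (0, 1)")
    # (1 - e)^n <= threshold  <=>  n >= log(threshold) / log(1 - e)
    return math.ceil(math.log(threshold) / math.log(1.0 - per_token_error))


for n in (10, 100, 1000):
    print(n, prob_sequence_correct(0.01, n))
print(horizon_below(0.01, 0.5))
```

With a 1% per-token error rate, the probability of a fully acceptable 100-token sequence is already only about 0.37 under this model. The independence assumption is strong, since real models can sometimes recover from earlier mistakes.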

Implications and Future Directions

SAI offers a practical and theoretically tenable North Star for AI research. It decouples advancement from imitation of human skills, advocating a broader, utility-driven domain that encompasses both economic/cognitive/human-centric and non-human tasks. Diverse specialization architectures and modular world models are envisioned as the principal route toward rapid adaptation and skill transfer. The authors assert that measuring adaptation speed and efficiency over relevant task spectra will drive progress, avoiding the pitfalls of anthropocentric or checklist-based evaluation metrics.

On a practical level, SAI encourages research into SSL, modular composition, dynamic task routing, world modeling, and latent space learning. It predicts a proliferation of specialized AI agents, individually optimized for distinct domains, and a gradual expansion of AI into areas where humans are systematically handicapped by evolutionary or cognitive biases.

The paper speculates that SAI will catalyze new benchmarks and research agendas—emphasizing adaptation, speed, and open-ended skill acquisition. This may support the development of AI systems that integrate specialist modules through engineered coordination, rather than assuming universal competence within a single monolithic model.

Conclusion

The paper systematically dismantles the myth of human generality, exposes the inadequacies of conventional AGI definitions, and advocates a rigorous, operational alternative: Superhuman Adaptable Intelligence. The theoretical arguments, empirical evidence, and architectural prescriptions converge to establish SAI as both a feasible and actionable guide for the future of AI research. The implications extend beyond theory: they prescribe a pragmatic research agenda centered on adaptability, specialization, and architectural diversity, with adaptation speed as the central metric. This paradigm is positioned as the most coherent and productive North Star for advancing artificial intelligence beyond the confines of human-centric generality and into domains of superhuman utility.

Explain it Like I'm 14

Overview

This paper argues that we should stop chasing “AGI” (Artificial General Intelligence) as “a machine that can do everything a human can.” The authors say humans aren’t actually “general,” and aiming for that idea confuses people and slows real progress. Instead, they propose a clearer goal called Superhuman Adaptable Intelligence (SAI): AI that can quickly learn to beat humans at any important task we can do, and can also learn useful tasks that humans can’t do at all.

Key Questions the Paper Asks

Are humans really a good example of “general” intelligence?
Do current definitions of AGI make sense, help us plan research, and let us measure progress?
If “being general at everything” isn’t realistic or even necessary, what should our new goal for AI be?
How could we build and evaluate that kind of AI in practice?

How the Authors Approach the Problem

The paper is a position piece (an argument), not an experiment. Here’s what they do:

They examine popular definitions of AGI and organize them on two simple axes:
- Learn vs. Do: Does the system learn new tasks quickly, or is it judged only on what it can do right away?
- Scope: Is the system meant for literally anything, anything important, anything humans can do, or just economically valuable tasks?
They set three commonsense rules for a good definition of progress:
- Feasible: Is it possible with limited time, data, and computing?
- Consistent: Does “general” actually mean general, not just “human-like”?
- Measurable: Can we track improvement with clear numbers?
They bring in ideas from biology and computer science to argue that specialization works best when resources are limited:
- No Free Lunch: There’s no single best method for every problem—like expecting one pair of shoes to be perfect for running, hiking, and ballet.
- Specialization beats spread-thin generality: Like pro athletes who train for one sport, specialized systems perform better in their domain.
- Negative transfer: Learning many unrelated things together can hurt performance on each.
They propose a new “North Star” called SAI and a simple way to measure it: how fast an AI can learn new, important tasks.
They suggest practical paths toward SAI, such as:
- Self-supervised learning (SSL): Learning patterns from raw data without human labels, like a student learning grammar by reading tons of books and spotting what “sounds right.”
- World models: Internal “mini-simulators” of how the world works, like a mental physics engine that helps plan ahead and adapt quickly.
- Modular, specialized systems: Composing many specialists rather than forcing one model to do everything.

Main Findings and Why They Matter

Humans aren’t truly “general”: We’re amazingly adaptable within a narrow slice of tasks that mattered for our survival (walking, seeing, planning), but we’re weak or incapable in many others. For instance, computers easily beat the best chess players; bats use echolocation, and humans can’t.
The term “AGI” is overloaded and inconsistent: Different groups mean different things, from “human-like at most tasks” to “better than humans at economically valuable jobs.” This muddles the debate and makes it hard to measure progress.
Most AGI definitions fail one of the authors’ three rules:
- Not feasible (tries to cover literally everything with limited resources),
- Not consistent (calls human-like performance “general” even though it isn’t),
- Not measurable (depends on ever-growing lists of tasks).
Specialization wins in practice: In nature, markets, and AI, focusing energy on specific goals works better than trying to be okay at everything. Even “big” AI models often work best by routing different inputs to different expert parts.
A better goal: Superhuman Adaptable Intelligence (SAI)
- Definition: AI that can adapt to surpass humans on any task we can do that matters, and also adapt to useful tasks outside human ability.
- Metric: Speed of adaptation—how fast it learns new, important tasks.
Likely ingredients for SAI:
- Self-supervised learning to build broad, reusable knowledge without needing lots of labeled examples.
- World models to support planning, zero- or few-shot learning, and quick transfer to new tasks.
- Diversity and modularity, not one “model to rule them all.”

Why it matters: Shifting from “be like humans at everything” to “learn fast and specialize where it counts” gives AI research a clearer, testable target. It helps avoid hype and fear, encourages practical progress, and leads to systems that are both more capable and more reliable where it matters.

Implications and Potential Impact

Clearer goals and evaluations: Focusing on adaptation speed makes progress measurable. We can ask, “How quickly did the system learn this new important task?” rather than chasing endless checklists.
Better real-world performance: Specialized, modular AIs can reach superhuman performance in high-impact areas like science, medicine, engineering, and logistics—sometimes far beyond human limits.
Smarter research strategy: Encourages a variety of architectures and methods (not just one dominant approach), speeding innovation and reducing “everyone doing the same thing.”
Safer and more useful AI: Building specialists for specific tasks can make systems more dependable and easier to control than one generalist meant to do everything.

In short: Don’t build one AI to fold both proteins and laundry. Build fast-learning specialists that can adapt to important tasks, outperform humans where needed, and take on useful jobs we can’t do at all. That’s Superhuman Adaptable Intelligence.

Knowledge Gaps

Knowledge Gaps, Limitations, and Open Questions

Below is a single, focused list of what remains missing, uncertain, or unexplored in the paper, formulated to guide actionable future research.

Precise formalization of SAI: a mathematical definition of “adaptation speed” including units, normalization across tasks, and explicit resource budgets (compute, time, data, memory) and priors.
Operational definition of “utility” and “importance”: criteria, measurement frameworks (e.g., economic value, societal impact, safety-criticality), and processes for task selection that are transparent, robust, and non-anthropocentric.
Evaluation of “exceeding humans”: concrete, consistent metrics for cognitive and physical tasks; protocols for tasks without human baselines; standards for tasks outside the human domain.
SAI benchmark suite: standardized, continually refreshed task distributions; reproducible zero-/few-shot protocols; fixed and reported budgets for data, compute, and wall-clock time; methods to generate genuinely novel tasks.
Empirical validation of SAI pathways: comparative studies demonstrating that SSL, latent prediction, and world models reduce adaptation time versus SL, RL, and autoregressive baselines across diverse domains.
Architecture for modular specialization: concrete designs for routing, meta-control, inter-module communication, shared latent spaces, and composition strategies that preserve specialist gains while enabling coordination.
Managing negative transfer and interference: training strategies, curricula, and theoretical bounds to mitigate cross-task gradient conflicts, catastrophic forgetting, and representational competition.
World model requirements: target properties (causality, counterfactual consistency, uncertainty calibration), learning objectives, planning interfaces, fidelity metrics, and rigorous sim-to-real transfer evaluations.
Latent- vs token-level prediction: task-conditional criteria for when latent prediction yields faster adaptation; ablations and diagnostics quantifying benefits and failure modes relative to token-level models.
Long-horizon error compounding in autoregression: formal quantification of divergence, conditions under which it arises, and tested mitigations (e.g., plans-in-latent, hierarchical controllers).
Scaling laws for adaptation: empirical and theoretical relationships between model size, data diversity, compute, and adaptation speed; identification of diminishing returns and optimal scaling regimes.
Resource-aware SAI: frameworks to trade off adaptation speed, accuracy, energy, and monetary cost; dynamic scheduling of specialists under real constraints.
Safety, alignment, and governance for specialists: risks from composing specialized modules; meta-controller oversight; misuse prevention; evaluation protocols and fail-safe mechanisms tailored to specialization.
Robustness and reliability: adaptation under distribution shift, partial observability, adversarial conditions; certification methods and guarantees for rapidly adapted capabilities.
Discovering “outside human domain” tasks: systematic approaches to identify high-utility non-human tasks; ground-truth construction and outcome-based evaluation where human baselines do not exist.
Embodiment and physical tasks: requirements for data, simulation fidelity, tactile sensing, morphology, and control; sim-to-real adaptation speed metrics for robotics and dexterous manipulation.
Cross-modal adaptation: mechanisms for transferring competence across language, vision, audio, action, and symbol manipulation; shared representations and alignment objectives.
Interpretability of specialists and world models: tools to inspect latent states, plans, and policies; methods to detect and explain failure modes during rapid adaptation.
Human–AI co-adaptation: interfaces and protocols for teaching specialists; measuring how human guidance affects adaptation speed and reliability.
Consistency of SAI scope with infeasibility arguments: reconciliation of “exceed humans at any task humans can do” with finite resources; principled scope-limiting strategies (task importance filters, budget-aware selection).
Task-space ontology: formalization and measurement of the human vs AI vs universal task overlaps depicted; methods to quantify coverage, gaps, and progress toward utility-relevant regions.
Data acquisition for fast adaptation: active learning policies, synthetic data generation, task simulators, and data quality metrics tuned to adaptation speed rather than static performance.
Normalizing adaptation metrics across heterogeneous tasks: standards to compare speeds across tasks with different priors, difficulty, observability, and data availability; meta-metrics for fairness and comparability.
Concrete roadmap and milestones: staged targets (e.g., adapt to N novel tasks per fixed budget and time), public leaderboards, and open benchmarks to catalyze progress.
Comparative analysis with AGI definitions: empirical tests demonstrating practical advantages of SAI over AGI-style goals in feasibility, consistency, and assessability; decision-making implications for research and policy.
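The "cross-task gradient conflicts" named in the list above can be made concrete with a simple diagnostic: two tasks are often said to conflict when their gradients on shared parameters point in opposing directions, that is, when their cosine similarity is negative. The sketch below assumes that definition; the function names and toy gradients are hypothetical and not drawn from the paper.

```python
import math
from typing import Sequence


def cosine_similarity(g1: Sequence[float], g2: Sequence[float]) -> float:
    """Cosine of the angle between two flattened gradient vectors."""
    if len(g1) != len(g2):
        raise ValueError("gradients must have the same length")
    dot = sum(a * b for a, b in zip(g1, g2))
    norm1 = math.sqrt(sum(a * a for a in g1))
    norm2 = math.sqrt(sum(b * b for b in g2))
    if norm1 == 0.0 or norm2 == 0.0:
        raise ValueError("cosine similarity is undefined for a zero gradient")
    return dot / (norm1 * norm2)


def gradients_conflict(g1: Sequence[float], g2: Sequence[float]) -> bool:
    """True when a step that helps one task would, to first order, hurt the other."""
    return cosine_similarity(g1, g2) < 0.0


# Hypothetical gradients of two task losses with respect to shared parameters.
task_a = [1.0, 0.5]
task_b = [-1.0, 0.2]

print(cosine_similarity(task_a, task_b))
print(gradients_conflict(task_a, task_b))
```

A negative value is only a local, first-order signal of interference; it does not by itself establish negative transfer over a full training run.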

Practical Applications

Immediate Applications

The following applications can be implemented with current methods and infrastructure by emphasizing specialization, self-supervised learning, world models in narrow domains, and measurable adaptation speed.

Specialized AI portfolios in industry
- Sectors: software, healthcare, finance, manufacturing, logistics
- Application: Replace “one general model” strategies with portfolios of domain-specific models routed by task; deploy expert models for coding assistants, radiology reads, fraud detection, demand forecasting, and quality control.
- Tools/workflows: Mixture-of-Experts routing, model registries, service-level specialization, task taxonomies, capability scoping; fine-tuning and distillation pipelines for narrow tasks; retrieval-augmented inference.
- Assumptions/dependencies: Reliable task segmentation, robust routing accuracy, observability, drift monitoring, and governance to prevent negative transfer.
Adaptation-speed benchmarking as a KPI
- Sectors: academia, software platforms, model evaluation labs
- Application: Evaluate models by time/data/compute required to reach target performance on new tasks; add “time-to-competence” benchmarks to leaderboards and procurement criteria.
- Tools/workflows: Evaluation harnesses, standardized datasets with staged task novelties, A/B testing of training schedules, compute-normalized reporting; runbooks for reproducibility.
- Assumptions/dependencies: Agreement on utility-relevant task sets, standardized metric definitions, reproducible training environments.
SSL-first data pipelines in enterprise AI
- Sectors: healthcare imaging, industrial IoT, retail, telecom, security
- Application: Pretrain models on large unlabeled corpora (text, images, sensor streams) to reduce label dependence and accelerate fine-tuning for specialized tasks.
- Tools/workflows: SimCLR, BYOL, masked autoencoders, JEPA-style implementations; data lake connectors; privacy-preserving pretraining (DP, federated learning).
- Assumptions/dependencies: Access to compliant unlabeled data, data quality controls, privacy/regulatory adherence.
World-model-based planning in constrained environments
- Sectors: robotics (warehouses, micro-fulfillment), gaming/simulation, autonomous inspection
- Application: Use latent world models for short-horizon planning in well-specified domains (navigation, pick-and-place, path optimization) to improve sample efficiency and stability.
- Tools/workflows: Dreamer 4, Genie 2, model predictive control, safety cages; synthetic scenario generation.
- Assumptions/dependencies: Accurate simulators for target environments, robust sim-to-real transfer, human oversight for safety.
Modular orchestration and routing in MLOps
- Sectors: software, cloud providers, platform teams
- Application: Introduce an inference router that dispatches tasks to specialized models; track cost/performance per specialist; enforce capability boundaries.
- Tools/workflows: API gateways, function calling and skill routing, service meshes, monitoring dashboards, cost-aware planners.
- Assumptions/dependencies: Latency/throughput budgets, reliable fallback paths, audit trails.
Domain-specific assistants with scoped capabilities
- Sectors: healthcare (radiology triage), law (research), finance (quant analytics), security (threat intel)
- Application: Deploy assistants specialized to the modalities, data structures, and objectives of a given profession; evaluate by adaptation speed to new sub-tasks (new guidelines, instruments, asset classes).
- Tools/workflows: Fine-tuned LLM/LMM stacks, retrieval with domain ontologies, constrained tool-use, checklists; continual learning pipelines.
- Assumptions/dependencies: Domain data access, regulatory compliance (HIPAA, SOX, GDPR), strong human-in-the-loop practices.
Specialist-first procurement and reporting (policy)
- Sectors: government, public institutions, regulated industries
- Application: Require adaptation-speed metrics, scope declarations, and negative-transfer risk assessments in RFPs and vendor evaluations.
- Tools/workflows: Standardized audit templates, capability-scoping documentation, incident reporting for misrouting.
- Assumptions/dependencies: Consensus on reporting standards; integration with existing procurement frameworks.
Education modules on specialization and meta-learning
- Sectors: higher ed, professional training, bootcamps
- Application: Teach specialization strategy, task decomposition, and adaptation-speed measurement; incorporate rapid skill acquisition labs.
- Tools/workflows: Adaptive tutoring systems, benchmarked mini-projects, curriculum on SSL/world models.
- Assumptions/dependencies: Instructor upskilling, assessment alignment, access to sandbox data.
Capability scoping for AI risk management
- Sectors: all high-stakes deployments
- Application: Limit overreach by generalists; use gated deployment workflows to ensure specialists operate within validated domains.
- Tools/workflows: Capability maps, red-teaming focused on misrouting/negative transfer, policy gates.
- Assumptions/dependencies: Governance buy-in, clear utility definitions, incident response processes.
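The routing and capability-scoping ideas in the list above can be sketched as a small dispatcher that sends each task to a registered specialist and hands anything out of scope to a fallback path. This is a minimal illustration under assumptions: the class names, task types, and handlers are hypothetical, and a production router would add authentication, latency budgets, and audit logging.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass
class Specialist:
    """A narrowly scoped model wrapper; `handle` stands in for real inference."""
    name: str
    handle: Callable[[str], str]
    calls: int = 0  # simple per-specialist usage counter for monitoring


class Router:
    """Dispatch tasks by type and refuse to guess outside validated scope."""

    def __init__(self, fallback: Callable[[str, str], str]) -> None:
        self._registry: Dict[str, Specialist] = {}
        self._fallback = fallback

    def register(self, task_type: str, specialist: Specialist) -> None:
        self._registry[task_type] = specialist

    def dispatch(self, task_type: str, payload: str) -> str:
        specialist = self._registry.get(task_type)
        if specialist is None:
            # Out-of-scope request: escalate instead of misrouting.
            return self._fallback(task_type, payload)
        specialist.calls += 1
        return specialist.handle(payload)


# Hypothetical setup with one fraud-detection specialist.
router = Router(fallback=lambda task, payload: f"escalated:{task}")
fraud = Specialist("fraud-v1", lambda payload: "fraud:" + payload)
router.register("fraud", fraud)

print(router.dispatch("fraud", "tx42"))
print(router.dispatch("radiology", "scan7"))
```

Keeping the registry explicit makes the declared scope of the system inspectable: any task type without a registered specialist goes to the fallback rather than to a model that was never validated for it.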

Long-Term Applications

These applications require further research, scaling, or integration—particularly mature world models, meta-learning, robust safety/verification, and standards for adaptation-speed evaluation.

Superhuman Adaptable Intelligence (SAI) agents across utility domains
- Sectors: scientific R&D, engineering design, legal analysis, complex operations
- Application: Agents that rapidly acquire and exceed human capabilities on important tasks and fill human skill gaps (e.g., high-dimensional optimization, complex mechanistic modeling).
- Tools/workflows: Scalable SSL, powerful latent world models, meta-learning for rapid adaptation, simulation/self-play ecosystems.
- Assumptions/dependencies: Massive compute/data, benchmark suites for adaptation speed, strong safety and governance frameworks.
Automated scientific discovery pipelines
- Sectors: drug discovery, materials science, synthetic biology
- Application: World-model-driven hypothesis generation and experiment planning; closed-loop labs with robotic execution and rapid model adaptation to new assays.
- Tools/workflows: Lab automation, multi-scale simulators, active learning, experiment schedulers.
- Assumptions/dependencies: High-fidelity simulators, standardized lab interfaces, trustworthy causal inference.
Few-shot adaptive household/service robotics
- Sectors: consumer robotics, eldercare, hospitality
- Application: Robots that learn new tasks in minutes (new appliances, tools, surfaces) via latent world models and compositional skill libraries.
- Tools/workflows: Skill primitives, safe teleoperation for teaching, on-device adaptation, formal safety envelopes.
- Assumptions/dependencies: Robust hardware, reliable perception, safety certification, user training protocols.
Patient-specific adaptive medicine
- Sectors: healthcare
- Application: Multi-modal models that adapt rapidly to patient-specific trajectories for diagnosis, treatment planning, and monitoring.
- Tools/workflows: EHR integration, imaging/genomics fusion, continual learning under privacy constraints.
- Assumptions/dependencies: Secure data access, FDA/EMA pathways, bias and fairness controls, real-world evidence.
Adaptive grid and infrastructure optimization
- Sectors: energy, transportation, urban planning
- Application: World models of grids and networks that adapt to shocks (weather, demand spikes) and reconfigure control strategies in few-shot settings.
- Tools/workflows: Digital twins, online optimization, safety-constrained planners.
- Assumptions/dependencies: Comprehensive sensor coverage, regulatory approval, cyber-resilience.
Real-time systemic risk and market adaptation
- Sectors: finance
- Application: Specialized agents that adapt to regime shifts, liquidity squeezes, and novel instruments; stress-testing via multi-agent simulations.
- Tools/workflows: Market world models, scenario generators, adaptive hedging.
- Assumptions/dependencies: Timely, high-quality data; compliance (MiFID, SEC), robust model governance.
SAI-powered education: personal adaptive tutors
- Sectors: education
- Application: Tutors optimizing a learner’s “time-to-competence,” adapting curricula and modalities to individual cognitive profiles.
- Tools/workflows: Meta-learning on pedagogical strategies, learning analytics, content generation aligned to utility.
- Assumptions/dependencies: Privacy-preserving data pipelines, pedagogy validation, equity safeguards.
Standards for adaptation-speed metrics and reporting (policy)
- Sectors: standards bodies, regulators, industry consortia
- Application: ISO-like standards for adaptation-speed measurement, scope declarations, negative-transfer auditing; certification for specialist systems.
- Tools/workflows: Benchmark catalogs, audit protocols, common data/compute normalization.
- Assumptions/dependencies: Multi-stakeholder consensus, international coordination, enforcement mechanisms.
Platforms for specialization orchestration
- Sectors: cloud/software
- Application: End-to-end platforms to compose, route, monitor, and govern specialist models; marketplaces for domain experts and task packs.
- Tools/workflows: Orchestration runtimes, policy engines, cost/performance optimizers, model provenance tracking.
- Assumptions/dependencies: Interoperability standards, vendor-neutral APIs, security and billing infrastructure.
Formal safety and verification for adaptable systems
- Sectors: safety-critical AI
- Application: Methods to verify behavior as models adapt, including guarantees on capability boundaries and mitigation of compounding errors in long-horizon plans.
- Tools/workflows: Formal methods integrated with learning, interpretable latent dynamics, runtime monitors.
- Assumptions/dependencies: New theory and tooling, accepted certification processes, testbeds.
Workforce co-adaptation and job design
- Sectors: HR, operations, professional services
- Application: Redesign roles to leverage specialist AI; train workers to compose specialists; measure team-level adaptation speed.
- Tools/workflows: Skill-routing frameworks, competency maps, continuous training ecosystems.
- Assumptions/dependencies: Change management, labor agreements, reskilling programs.

Glossary

Adaptation speed: The rate at which an AI acquires new skills or learns new tasks. Example: "Finally, adaptation speed---the speed with which an agent can acquire new skills and learn new tasks, can be measured"
AGI (Artificial General Intelligence): A proposed form of AI aiming for broad, human-like generality across tasks. Example: "Central to all of these views is the concept of Artificial General Intelligence or AGI."
anthropocentric: Human-centered; focusing evaluation or goals around human tasks and constraints. Example: "limiting evaluation to anthropocentric tasks and constraints."
autoregressive models: Sequence models that predict the next token given previous tokens. Example: "GPTs and similar autoregressive models are no exception, they have many flaws"
bitter lesson: The idea that general methods leveraging computation tend to outperform approaches relying on hand-crafted domain knowledge. Example: "this claim does not dispute the bitter lesson"
Coffee Test: A proposed embodied benchmark for AI: making a cup of coffee in an unfamiliar kitchen. Example: "Steve Wozniak's Coffee Test---whether a machine could make a cup of coffee if sent to a random kitchen---"
Dreamer 4: A latent world-model-based reinforcement learning architecture for prediction and control. Example: "such as Dreamer 4, Genie 2, or Joint Embedding Prediction Architecture (JEPA)"
embedding space: A continuous vector space representing data in compact form, used for learning and prediction. Example: "learning in the embedding space as opposed to in the token space may drive performance gains"
evolutionary mismatch hypothesis: The view that traits adapted for past environments can be maladaptive in modern contexts. Example: "the evolutionary mismatch hypothesis argues that many psychological mechanisms were tuned for past selection regimes and can therefore produce maladaptive outputs in contemporary environments"
few-shot adaptation: Adapting to a new task using only a small number of examples. Example: "it is the hallmark of zero shot and few shot adaptation"
Genie 2: A latent world-model/prediction architecture for interactive environments. Example: "such as Dreamer 4, Genie 2, or Joint Embedding Prediction Architecture (JEPA)"
JEPA (Joint Embedding Prediction Architecture): A framework that learns to predict future or missing information by aligning embeddings rather than reconstructing tokens. Example: "such as Dreamer 4, Genie 2, or Joint Embedding Prediction Architecture (JEPA)"
latent prediction architectures: Models that predict in compact latent spaces instead of raw pixels/tokens. Example: "moving from token level prediction to latent prediction architectures such as Dreamer 4, Genie 2, or Joint Embedding Prediction Architecture (JEPA)"
LLMs (Large Language Models): Large neural models trained on text to perform language tasks. Example: "Autoregressive LLMs and LMMs have become the dominant architecture"
LMMs (Large Multimodal Models): Models that handle multiple modalities (e.g., text, images) jointly. Example: "Autoregressive LLMs and LMMs have become the dominant architecture"
meta learning: Methods that learn to learn, enabling fast adaptation to new tasks. Example: "Designing maximally adaptable algorithms remains a central pursuit of meta learning"
Moravec's Paradox: The observation that tasks easy for humans (e.g., perception, locomotion) are hard for machines, and vice versa. Example: "This observation has given rise to Moravec's Paradox,"
negative transfer: When training on additional tasks or data harms performance on a target task. Example: "it can lead to 'negative transfer' when tasks compete for representational capacity"
No Free Lunch theorem: The result that, averaged over all possible problems/distributions, no algorithm outperforms any other, so no single algorithm can be best everywhere. Example: "fall prey to the 'No Free Lunch' theorem"
probabilistic planning: Planning under uncertainty in stochastic environments. Example: "and probabilistic planning inherits similarly severe complexity barriers"
propositional STRIPS: A classical planning formalism defining actions via preconditions and effects in propositional logic. Example: "(e.g., propositional STRIPS variants)"
qualia: Subjective, first-person experiential qualities associated with consciousness. Example: "if it lacks subjective experience (qualia)"
routing (to specialized parameters): Mechanisms that direct inputs to specific subsets of model parameters (modules/experts). Example: "Models that route queries to specialized subsets of model parameters depending on the task"
SAI (Superhuman Adaptable Intelligence): The paper’s proposed goal: AI that adapts to exceed humans on important tasks, including beyond the human domain. Example: "We refer to this as Superhuman Adaptable Intelligence (SAI)."
self-play: Training by having agents play against themselves to improve performance. Example: "improve through self-play, evolutionary search, or large-scale exploration in simulation"
self-supervised learning (SSL): Learning from unlabeled data by predicting parts of the data from other parts. Example: "self-supervised learning (SSL) as a promising way to acquire generic knowledge"
SOTA (state of the art): The best known performance achieved to date on a task/domain. Example: "has reached SOTA performance in most domains."
state (in dynamical systems/MDPs): Information about the environment that is sufficient to predict future dynamics or make optimal decisions. Example: "Pixels are not state."
token space: The discrete symbol space (e.g., text tokens) in which autoregressive models predict. Example: "in the token space"
Turing Machine: An abstract model of computation used to define computability and complexity; Turing-completeness implies the ability to compute any computable function given enough time and memory. Example: "general in the Turing Machine sense"
Turing Test: A conversational test of machine intelligence via indistinguishability from human responses. Example: "Language-based tests, such as the Turing Test, where a machine has to pretend to be a human"
Universal Intelligence: A formal definition measuring an agent’s ability to achieve goals across all computable environments. Example: "Universal Intelligence refers to the ability to act intelligently over all computable environments"
Winograd Schema Challenge: A benchmark for commonsense reasoning using pronoun disambiguation in paired sentences. Example: "the Winograd schema challenge that tests common-sense reasoning and natural language understanding"
world model: An internal predictive model of environment dynamics enabling simulation and planning. Example: "A world model allows for simulation, and therefore planning"
zero-shot task transfer: Applying learned knowledge to a new task without any task-specific training/examples. Example: "zero-shot task transfer"
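
The glossary entry on routing (to specialized parameters) can be made concrete with a toy sketch. The code below is an illustration only, not the paper's method or any library's API: the names `route_top1` and `apply_routed`, the gate vectors, and the two toy experts are all hypothetical. It assumes the simplest possible scheme, top-1 routing, where each expert has a gate vector and an input goes to the expert whose gate scores highest.

```python
from typing import Callable, List, Sequence, Tuple

Vector = Sequence[float]
Expert = Callable[[Vector], List[float]]


def dot(a: Vector, b: Vector) -> float:
    """Plain dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def route_top1(x: Vector, gates: List[Vector]) -> int:
    """Return the index of the expert whose gate vector scores highest on x.

    Assumption: one gate vector per expert, scored by dot product.
    """
    scores = [dot(x, g) for g in gates]
    return max(range(len(scores)), key=scores.__getitem__)


def apply_routed(
    x: Vector, gates: List[Vector], experts: List[Expert]
) -> Tuple[int, List[float]]:
    """Send x to exactly one expert and return (expert index, expert output)."""
    idx = route_top1(x, gates)
    return idx, experts[idx](x)


# Hypothetical toy setup: two experts, each "specialized" on one input axis.
gates: List[Vector] = [[1.0, 0.0], [0.0, 1.0]]
experts: List[Expert] = [
    lambda v: [2.0 * t for t in v],  # expert 0: doubles its input
    lambda v: [-t for t in v],       # expert 1: negates its input
]

idx, out = apply_routed([0.9, 0.1], gates, experts)
print(idx, out)
```

Only the selected expert's parameters are used for a given input, which is the property the glossary definition describes. Real systems typically learn the gates jointly with the experts and may route to more than one expert per input.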


Open Problems

Identify the most performant architecture for rapid adaptation and specialization
