HAIExplore: Human–AI Exploration Framework
- HAIExplore is a human–AI interactive framework that combines automated data summarization with user-driven exploration to facilitate mixed-initiative analysis.
- It employs algorithms like HA-graph, constrained randomization, and neural surrogates to provide real-time, efficient navigation of large-scale datasets and dynamic simulations.
- The framework is applied across diverse fields—from graph analytics and climate simulation to creative co-creation and meta-science—enhancing actionable insights and hypothesis testing.
HAIExplore refers to a class of human–AI interactive exploration frameworks that facilitate collaborative, interpretable, and efficient navigation of complex datasets, creative domains, or empirical research knowledge. These systems combine artificial intelligence techniques (for summarization, emulation, knowledge extraction, or visualization) with human-guided controls, supporting mixed-initiative discovery, hypothesis testing, creative ideation, and evidence synthesis. The HAIExplore paradigm manifests across diverse implementations, from visual analytics in climate science and graph analysis to structured co-creation with generative models and interactive meta-science platforms, each optimized for specific domains but sharing core attributes: AI-accelerated data/model summarization, human-controllable workflows, and real-time feedback for decision support.
1. Formal Core and Conceptual Foundations
The defining feature of HAIExplore systems is the integration of algorithmic automation with human-in-the-loop control, supporting exploratory workflows that combine data-driven insight with hypothesis- or intention-driven human steering. Formally, HAIExplore instantiates this paradigm via abstractions such as the Hub-based Aggregation Graph (HA-graph) for relational data (Wang et al., 2017), surrogate models for simulation-driven hypothesis testing in climate analytics (Hazarika et al., 2023), maximum-entropy constrained permutations for data exploration (Henelius et al., 2018), staged creative-support pipelines (Wen et al., 20 Dec 2025), and LLM-extracted directed knowledge graphs for research synthesis (Archiwaranguprok et al., 29 Sep 2025).
General principles include:
- Mixed-initiative selection of analytic "hubs" or queries, alternating between human and AI-driven suggestions (Wang et al., 2017).
- Seamless toggling between divergent (broad, associative) and convergent (focused, parameterized) exploration modes (Wen et al., 20 Dec 2025).
- Real-time interaction: AI surrogates or indexing schemes accelerate computation to support exploration with sub-second response times even on large datasets (Wang et al., 2017, Hazarika et al., 2023, Henelius et al., 2018).
- Summarization: AI models consolidate high-dimensional information into digestible metrics, projections, or visual aggregates, with explicit representation of uncertainty or out-of-distribution domains (Hazarika et al., 2023, Wang et al., 2017).
- Hypothesis or scenario management: users can persist, compare, and revisit custom analytic or design scenarios (Hazarika et al., 2023, Henelius et al., 2018, Wen et al., 20 Dec 2025).
2. Systems and Architectures Across Domains
Graph Exploration (VCExplorer)
The VCExplorer system operationalizes HAIExplore as an interactive graph exploration stack rooted in the HA-graph abstraction. The HA-graph is constructed by selecting a set of hub vertices (via a hub-selection function applied to an extracted subgraph) and aggregating information (via aggregation functions) on the induced subgraphs between hub pairs (Wang et al., 2017). Efficient Aggregation Sharing (AS) algorithms allow the system to share computations across overlapping subgraphs, supporting interactive exploration at scale (up to 40K nodes, 1.6M edges) with O(10)–O(100) hubs visualized at a time. The UI supports drill-down, roll-up, and edge summary navigation, blending AI-driven suggestions for hub selection with human guidance.
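The hub-and-aggregate idea described above can be illustrated with a minimal sketch. It is not the VCExplorer implementation: the hub-selection rule (highest degree), the subgraph rule (two hubs plus their common neighbours), and the aggregation function (edge count) are all assumptions chosen for brevity, and `build_ha_graph` is a hypothetical name.

```python
from itertools import combinations


def build_ha_graph(edges, num_hubs, aggregate=len):
    """Toy hub-based aggregation over an undirected edge list."""
    # Build adjacency sets for the undirected graph.
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)

    # Assumed hub-selection rule: highest degree first, ties broken by name.
    hubs = sorted(adj, key=lambda n: (-len(adj[n]), n))[:num_hubs]

    ha_edges = {}
    for a, b in combinations(hubs, 2):
        # Assumed subgraph rule: the two hubs plus their common neighbours.
        members = {a, b} | (adj[a] & adj[b])
        induced = [(u, v) for u, v in edges if u in members and v in members]
        if induced:
            # One summary value per hub pair (here: number of induced edges).
            ha_edges[(a, b)] = aggregate(induced)
    return hubs, ha_edges


edges = [("a", "b"), ("a", "c"), ("b", "c"), ("b", "d"), ("c", "d"), ("d", "e")]
hubs, summary = build_ha_graph(edges, num_hubs=2)
print(hubs, summary)
```

Passing a different `aggregate` callable (for example, one that sums edge weights) changes the summary shown on each hub-pair edge without touching the hub selection.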
Climate Pattern Analysis (HAiVA)
HAiVA (deployed as HAIExplore) exemplifies hybrid AI-assisted exploration for physical-science simulation. The system encodes the Fluctuation–Dissipation Theorem (FDT) in a family of physics-aware, time-lagged neural surrogates approximating the linear impulse–response operator. Interactive panels allow users to define marine cloud brightening scenarios as input perturbations, instantly propagate them through the surrogate to obtain spatial–temporal responses, and inspect out-of-distribution warnings, tipping-point risks, and teleconnection patterns. The frontend integrates multi-panel controls, principal-component projections, parallel-coordinate plots, and scenario management tables, all supporting rapid, physically constrained "what-if" scenario exploration (Hazarika et al., 2023). Validation against large-scale Earth System Model runs established ≥0.9 spatial correlation for key climate responses.
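To make the "propagate a perturbation through a lagged linear operator" step concrete, the following sketch applies one fixed matrix per time lag to an input perturbation vector. The matrices are hand-written placeholders standing in for the learned neural surrogates, and `propagate`, the two-region setup, and all numeric values are assumptions for illustration only.

```python
def propagate(lag_operators, perturbation):
    """Apply each lagged linear operator to a one-off input perturbation.

    lag_operators : list of matrices (lists of rows), one per time lag
    perturbation  : list of floats, one value per input region
    Returns one response vector per lag.
    """
    responses = []
    for operator in lag_operators:
        # Plain matrix-vector product for this lag.
        responses.append(
            [sum(w * x for w, x in zip(row, perturbation)) for row in operator]
        )
    return responses


# Hypothetical operators for two regions at lags 0 and 1.
lag_operators = [
    [[1.0, 0.0], [0.0, 1.0]],
    [[0.5, 0.25], [0.0, 0.5]],
]
perturbation = [2.0, 4.0]
responses = propagate(lag_operators, perturbation)
print(responses)
```

Because the operator is linear, doubling the perturbation doubles every response, which is what makes instant "what-if" rescaling cheap once the operators are available.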
Data Exploration via Permutation (HGDE)
The Human-Guided Data Exploration (HGDE) framework encodes user knowledge as combinatorial "tiles" (submatrix block constraints) on the dataset. It samples maximally entropic dataset surrogates via constrained randomization given user-asserted and hypothesis tiles, then identifies projection views (e.g., maximizing difference in squared correlation) that most discriminate between competing hypotheses. The system is implemented with efficient sampling/tiling algorithms and demonstrates sub-second interactivity on the dataset sizes evaluated (Henelius et al., 2018). Focusing and hypothesis-comparison steps are formally defined via user-selected tile sets, and empirical utility is demonstrated on socio-economic and image segmentation datasets.
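A simplified sketch of tile-constrained randomization follows. It assumes non-overlapping tiles and keeps each tile's rows inside the tile, so it draws from only a restricted subset of the admissible permutations; it is not the sampler from the cited work, and `sample_surrogate` is a hypothetical name. The invariant it does illustrate is that every column keeps its marginal distribution while the cells of a tile stay jointly aligned across the tile's columns.

```python
import random


def sample_surrogate(data, tiles, rng):
    """Permute each column while keeping every tile's rows jointly aligned.

    data  : list of rows (lists of equal length)
    tiles : list of (row_indices, col_indices); assumed non-overlapping
    rng   : random.Random instance, for reproducibility
    """
    n_rows, n_cols = len(data), len(data[0])
    out = [row[:] for row in data]
    covered = set()  # (row, col) cells constrained by some tile

    for rows, cols in tiles:
        rows = list(rows)
        shuffled = rows[:]
        rng.shuffle(shuffled)  # one shared permutation for the whole tile
        for col in cols:
            for dst, src in zip(rows, shuffled):
                out[dst][col] = data[src][col]
                covered.add((dst, col))

    for col in range(n_cols):
        free = [r for r in range(n_rows) if (r, col) not in covered]
        shuffled = free[:]
        rng.shuffle(shuffled)  # independent permutation of unconstrained cells
        for dst, src in zip(free, shuffled):
            out[dst][col] = data[src][col]
    return out


# Column 1 is 10x column 0 and column 2 is 100x column 0 in the original data.
data = [[i, 10 * i, 100 * i] for i in range(6)]
tiles = [([0, 1, 2], [0, 1])]  # user asserts: rows 0-2 relate columns 0 and 1
surrogate = sample_surrogate(data, tiles, random.Random(0))
```

Inside the tile the relation between columns 0 and 1 survives the shuffle; outside it, and in column 2, the relation is broken, which is what lets a view-scoring step compare surrogates drawn under different tile sets.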
Meta-Science and Knowledge Mapping (Atlas of Human-AI Interaction)
In the context of research synthesis, the Atlas of Human-AI Interaction (a deployed instance of HAIExplore) introduces a pipeline for LLM-powered extraction of causal triplets—[cause, relationship, effect, net_outcome]—from >1000 HAI papers, constructing a formal, multi-level knowledge graph with type- and cluster-annotated nodes and rich interactivity. Key algorithmic components include synonym merging using Qwen3-Embedding-8B + DBSCAN, k-means clustering for semantic classes, community detection with Louvain modularity, and computation of structure metrics (degree, betweenness, structural hole score). The web application (Svelte.js, Three.js, D3) enables coordinated exploration via 3D knowledge graphs, cause-effect Sankey diagrams, and direct paper lookup; empirical evaluation with 20 expert users validated its efficacy for research-gap discovery and evidence-based design (Archiwaranguprok et al., 29 Sep 2025).
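As a rough illustration of how extracted records become a directed graph with simple structure metrics, the sketch below computes in- and out-degree from `[cause, relationship, effect, net_outcome]` tuples. The records are made-up placeholders rather than findings from the Atlas, `degree_table` is a hypothetical helper, and the embedding, DBSCAN, k-means, and Louvain steps are deliberately omitted.

```python
from collections import defaultdict


def degree_table(records):
    """Count incoming and outgoing causal edges per concept node."""
    degree = defaultdict(lambda: {"in": 0, "out": 0})
    for cause, _relationship, effect, _net_outcome in records:
        degree[cause]["out"] += 1  # edge leaves the cause node
        degree[effect]["in"] += 1  # edge enters the effect node
    return dict(degree)


# Placeholder records in the [cause, relationship, effect, net_outcome] shape.
records = [
    ("concept_a", "increases", "concept_b", "positive"),
    ("concept_b", "increases", "concept_c", "mixed"),
    ("concept_a", "decreases", "concept_c", "negative"),
]
degrees = degree_table(records)
print(degrees)
```

Nodes with high out-degree and zero in-degree (here `concept_a`) act as pure causes in the extracted graph, which is the kind of signal a cause-effect view can surface.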
Generative Co-Creation (HAIExplore for Creativity Support)
In creative domains, HAIExplore structures the human–AI co-creation process into separate, scaffolded stages that reference Wallas’s paradigm: brainstorming (divergent, conceptual ideation) and structured refinement (convergent, parametric manipulation). The system operationalizes idea cards, associative-thinking LLM prompts for diversity, and Python "Sketch" functions that translate refinement intentions into interpretable prompt parameters for image generation. A non-linear, tabbed UI supports branching and iterative work (Wen et al., 20 Dec 2025). Empirical studies showed that HAIExplore reduced fixation and increased perceived novelty, usability, learning, and exploration compared to linear chat interfaces.
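The idea of a "Sketch" function that maps named refinement parameters to a prompt string can be sketched as follows. The parameter names (`style`, `palette`, `detail`), the defaults, and the prompt wording are all assumptions made for illustration; the source does not specify the actual function signature.

```python
from dataclasses import dataclass


@dataclass
class SketchParams:
    """Hypothetical refinement parameters exposed to the user."""
    subject: str
    style: str = "watercolor"
    palette: str = "warm"
    detail: int = 3  # 1 (very loose) to 5 (intricate)


DETAIL_WORDS = {
    1: "very loose",
    2: "loose",
    3: "moderately detailed",
    4: "detailed",
    5: "intricate",
}


def sketch(params: SketchParams) -> str:
    """Render interpretable parameters into an image-generation prompt."""
    return (
        f"{params.subject}, {params.style} style, "
        f"{params.palette} palette, {DETAIL_WORDS[params.detail]}"
    )


prompt = sketch(SketchParams(subject="a lighthouse at dusk", detail=5))
print(prompt)
```

Keeping the parameters in a small typed structure means each UI control maps to exactly one field, so a user can see which choice produced which part of the prompt.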
3. Algorithms and Formal Representations
| System Type | Core Abstraction/Algorithm | Key Formalism/Definition |
|---|---|---|
| Graph Exploration | HA-graph, Aggregation Sharing (AS) | Hub set, induced subgraphs between hub pairs, tag-based aggregation |
| Data Exploration | Constrained randomization via tiles, information-theoretic view scoring | Tiles, argmax Δcor² |
| Simulation/Climate Surrogate | FDT-guided neural surrogates, multi-panel VA front end | Linear impulse–response operator, input perturbation |
| Knowledge Synthesis | LLM extraction, semantic clustering, DBSCAN, k-means, Louvain | Causal triplets, silhouette score |
| Creativity Co-Creation | Structured ideation/refinement, Sketch-based parameters | Function prompt = Sketch(params) |
These representations allow for efficient implementation of mixed-initiative, real-time, exploratory workflows, leveraging AI strengths in summarization and pattern extraction while retaining user-driven sensemaking and design.
4. User Interaction Workflows
Each HAIExplore system supports explicit user manipulation of analytic or creative paths:
- In VCExplorer, the user selects graph substructures, hub criteria, and navigates summaries via drill-down or roll-up (Wang et al., 2017).
- HAiVA exposes scenario controls, input field selection, perturbation magnitude, spatial zone, and lag/duration settings, with immediate multi-panel result updates and scenario export (Hazarika et al., 2023).
- HGDE implements tile-based hypotheses and allows users to focus, compare, and visualize distinguished projections, adapting interactivity to feedback on brushed clusters or refined tiles (Henelius et al., 2018).
- Atlas of Human-AI Interaction enables filtering, search, and navigation between 3D graph, Sankey, and paper-centric views, each informed by LLM-extracted empirical findings and semantic aggregation (Archiwaranguprok et al., 29 Sep 2025).
- In co-creation, HAIExplore separates brainstorming/card management from refinement/parameter selection, providing menu-driven options, visual previews, and multi-tab workflows for exploration (Wen et al., 20 Dec 2025).
Common features include out-of-distribution warnings (e.g. principal-component projections with flagged extrapolation (Hazarika et al., 2023)), tipping-point risk flags (climate), rapid comparison tools, scenario or hypothesis saving/export, and interactive feedback for next-step guidance.
5. Empirical Validation and Performance Metrics
Multiple HAIExplore instantiations report quantitative and qualitative assessments:
- VCExplorer's AS algorithm achieves 2–5× speedup (dense graphs), 60–74% reduction in aggregation work, and maintains interactivity on graphs with up to 40K nodes for hub counts of at most 20 (S_V ≤ 20) (Wang et al., 2017).
- HAiVA's surrogate matches ESM responses with region-wise spatial correlation >0.9; sub-second user feedback is reported for "what-if" experiments (Hazarika et al., 2023).
- HGDE reports <1s update latency on the dataset sizes evaluated; its exploratory and focusing capabilities are demonstrated on real multivariate datasets (Henelius et al., 2018).
- Atlas evaluation (N=20 experts) reported mean ratings of 4.95/7 for gap discovery and 5.45/7 for causality tracing. Thematic analysis indicated accelerated sensemaking workflows, and modularity analysis identified meaningful cluster structure (Archiwaranguprok et al., 29 Sep 2025).
- The creative HAIExplore system improved enjoyment (p=0.0034), exploration (p=0.0029), and novelty (p=0.004) in within-subject comparisons to major chat-based interfaces; users explored more idea clusters and reported higher self-assessed learning (Wen et al., 20 Dec 2025).
6. Limitations and Future Directions
Identified constraints include:
- Domain-specificity in data handling (e.g., HGDE is currently limited to 2D real-valued or categorical data (Henelius et al., 2018); climate surrogates rely on physics-process-aware preprocessing and may be ill-calibrated far outside ensemble regimes (Hazarika et al., 2023)).
- Small evaluation samples in creative support studies and coverage limitations in knowledge graph extraction (Wen et al., 20 Dec 2025, Archiwaranguprok et al., 29 Sep 2025).
- Lack of active-learning loops for focus specification or recommendation; current systems rely on manual or pre-configured exploration (Henelius et al., 2018, Wang et al., 2017).
- Synthesis platforms may miss granular or temporally resolved findings; summative artifacts may require deeper multi-document summarization and temporal/clustered filtering (Archiwaranguprok et al., 29 Sep 2025).
Potential advances include extension to more complex data types (e.g., time series, text, graphs in HGDE), automated focus recommendation (active learning), deeper integration of AI/ML model predictions with summary presentation, and dynamic, user-adaptive scaffolding in creative co-creation.
7. Synthesis and Research Significance
HAIExplore constitutes a schema for augmenting human analytic, creative, and scientific workflows with AI-driven summarization, synthesis, and scenario management, preserving human agency in steering, constraint specification, and interpretive sensemaking. Whether applied to scientific simulation, large-scale graph/network sensemaking, creative artifact generation, or literature meta-analysis, these systems exemplify a design space where hybrid intelligence—rooted in domain knowledge and formal user-driven hypothesis articulation—substantially accelerates discovery while maintaining contextual nuance and interpretability. The diverse implementations underscore the flexibility of the approach and indicate widening adoption for both domain-specific and meta-scientific applications (Hazarika et al., 2023, Wang et al., 2017, Henelius et al., 2018, Archiwaranguprok et al., 29 Sep 2025, Wen et al., 20 Dec 2025).