Cross-Ontology Queries: Techniques & Challenges

Updated 30 January 2026

Cross-ontology queries are formal queries that traverse multiple autonomous ontologies to enable integrated data access, inference, and data mining.
Key methodologies include query rewriting under alignment correspondences, hybrid reasoning approaches, and leveraging large language models for natural query mapping.
Practical systems demonstrate scalable performance using schema closure, NTGA-based execution, and compatibility metrics in large, heterogeneous data environments.

Cross-ontology queries are formal queries that traverse or integrate multiple ontologies, typically to enable data access, inference, or data mining across autonomous, semantically heterogeneous sources. Mechanisms for cross-ontology querying are essential for semantic data integration, knowledge discovery in domains with orthogonal ontologies (e.g., biomedical sciences), and linked data federation. Such queries must address syntactic, semantic, and inferential heterogeneity, as well as computational and schema-compatibility constraints. Cross-ontology query frameworks encompass formal query rewriting, import-by-query reasoning, data mining with semantic metrics, and schema compatibility assessment.

1. Formal Problem Space and Core Definitions

Cross-ontology querying presupposes the existence of multiple ontologies $O_1,\dots,O_n$, each with its own signature and possibly distinct description logic (DL) expressivity, along with data repositories annotated per ontology, and cross-ontology mappings $M$ (e.g., $\mathtt{owl:sameAs}$, $\mathtt{skos:exactMatch}$, or alignment correspondences). The unified schema $O = \bigcup_i O_i \cup M$ supports graph-structured queries $Q$, typically in SPARQL or conjunctive query (CQ) form, issued against an integrated knowledge base $G$.

A cross-ontology query aims to retrieve the certain answers $\operatorname{ans}(q, T)$, where $q$ is the user CQ and $T$ is the set of raw triples from all sources. The answers are evaluated not over $T$ alone but over its closure $\mathrm{Cl}(T, \mathcal{R})$ under an entailment regime $\mathcal{R}$ (RDFS/OWL rules) (Kim et al., 2016).
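As a minimal sketch of what evaluating a query over $\mathrm{Cl}(T, \mathcal{R})$ means, the following Python fragment forward-chains two standard RDFS rules (rdfs9 and rdfs11) to a fixpoint and then answers a single class-membership query. The prefixes `a:`, `b:`, and `ex:`, the sample triples, and the function names are illustrative assumptions, not taken from Kim et al.; the bridging axiom between `a:Enzyme` and `b:Protein` stands in for a cross-ontology mapping in $M$.

```python
SUBCLASS = "rdfs:subClassOf"
TYPE = "rdf:type"


def rdfs_closure(triples):
    """Return the closure of `triples` under rdfs9 and rdfs11 only."""
    closed = set(triples)
    while True:
        new = set()
        for (s, p, o) in closed:
            if p != SUBCLASS:
                continue
            for (s2, p2, o2) in closed:
                # rdfs11: (s subClassOf o), (o subClassOf o2) => (s subClassOf o2)
                if p2 == SUBCLASS and s2 == o:
                    new.add((s, SUBCLASS, o2))
                # rdfs9: (s subClassOf o), (s2 type s) => (s2 type o)
                if p2 == TYPE and o2 == s:
                    new.add((s2, TYPE, o))
        if new <= closed:  # fixpoint reached, nothing new was derived
            return closed
        closed |= new


def instances_of(cls, triples):
    """Certain answers to the query `?x rdf:type cls` over the closure."""
    return {s for (s, p, o) in rdfs_closure(triples) if p == TYPE and o == cls}


# Hypothetical data: ontology "a", ontology "b", and one bridging axiom.
T = {
    ("a:Kinase", SUBCLASS, "a:Enzyme"),
    ("a:Enzyme", SUBCLASS, "b:Protein"),  # assumed cross-ontology mapping
    ("ex:p1", TYPE, "a:Kinase"),
}
answers = instances_of("b:Protein", T)
```

Here `ex:p1` is returned for `b:Protein` even though no raw triple states that membership; it follows only from the closure. A full entailment regime would include many more rules than the two shown.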

Compatibility of a query $Q$ with an existing ontology $O$ is quantified using semantic similarity metrics and schema-centric compatibility scores, notably coverage and flexibility (Zhao et al., 2021).

2. Query Rewriting and Alignment Approaches

The principal technique for interoperable cross-ontology querying is query rewriting under ontology alignments. Given source and target ontologies $O_s$ and $O_t$, and an alignment $\mathcal{M}$ composed of equivalence ($=$) and subsumption ($\sqsubseteq$) correspondences, a rewriting function $R(Q_s, \mathcal{M}) \to Q_t$ transforms a source query $Q_s$ into a target query $Q_t$ over $O_t$ (Ondo et al., 2 May 2025). Correspondences may be simple or complex: $(s:s)$, $(s:c)$, and $(c:c)$, where $s$ denotes a simple (atomic) entity and $c$ a complex DL expression.

Complex alignments are handled via recursive decomposition: if a $(c:c)$ correspondence specifies that a conjunctive DL formula $F_s$ in $O_s$ is equivalent to $F_t$ in $O_t$, the rewriting function matches and rewrites graph patterns at the formula level, supporting nested and disjunctive expansions.

Algorithmically, the process scans the triple patterns in $Q_s$, matches components against alignment correspondences, and recursively rewrites or expands subpatterns based on the correspondence type. The worst-case output size is $O(|T|\cdot d^k)$, where $T$ is the set of input triple patterns, $d$ is the maximum alignment branching factor, and $k$ is the nesting depth.
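The scan-match-expand loop can be illustrated with a deliberately flattened Python sketch. It covers only two correspondence shapes: a source class mapped to alternative target classes, and a source property mapped to alternative target property chains. It does not recurse into nested $(c:c)$ formulas. The alignment tables, the `src:`/`tgt:` terms, and all function names are hypothetical and are not the implementation of Ondo et al.

```python
from itertools import count

RDF_TYPE = "rdf:type"

# Hypothetical alignment: each source term maps to alternative target forms.
CLASS_ALIGNMENT = {"src:Paper": ["tgt:Article", "tgt:ConferencePaper"]}
PROPERTY_ALIGNMENT = {
    "src:writtenBy": [["tgt:hasAuthor"],
                      ["tgt:hasAuthorship", "tgt:authorshipOf"]],
}


def rewrite_pattern(pattern, fresh):
    """Return the alternatives for one triple pattern.

    Each alternative is a list of target triple patterns (a conjunction).
    """
    s, p, o = pattern
    if p == RDF_TYPE and o in CLASS_ALIGNMENT:
        return [[(s, p, c)] for c in CLASS_ALIGNMENT[o]]
    if p in PROPERTY_ALIGNMENT:
        alternatives = []
        for chain in PROPERTY_ALIGNMENT[p]:
            # A chain of n properties needs n - 1 fresh intermediate variables.
            nodes = [s] + [f"?_v{next(fresh)}" for _ in chain[:-1]] + [o]
            alternatives.append(
                [(nodes[i], chain[i], nodes[i + 1]) for i in range(len(chain))]
            )
        return alternatives
    return [[pattern]]  # no correspondence: keep the pattern unchanged


def rewrite_query(patterns):
    """Rewrite a basic graph pattern into a join of unions, one per pattern."""
    fresh = count()
    return [rewrite_pattern(t, fresh) for t in patterns]


def to_sparql(groups):
    """Render the join of unions as a SPARQL string (prefixes not declared)."""
    parts = []
    for alternatives in groups:
        blocks = ["{ " + " . ".join(" ".join(t) for t in alt) + " }"
                  for alt in alternatives]
        parts.append(blocks[0] if len(blocks) == 1
                     else "{ " + " UNION ".join(blocks) + " }")
    return "SELECT * WHERE { " + " ".join(parts) + " }"


source_query = [("?p", RDF_TYPE, "src:Paper"), ("?p", "src:writtenBy", "?a")]
groups = rewrite_query(source_query)
target_sparql = to_sparql(groups)
```

Keeping one union per source pattern, rather than multiplying the alternatives out into a flat union of conjunctive queries, keeps the output size proportional to the number of patterns times the branching factor in this non-nested case.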

Practical systems integrate large language models (LLMs), such as GPT-4, to map natural-language queries to formal subgraph queries, which are then rewritten across ontologies via the alignment machinery (Ondo et al., 2 May 2025).

3. Reasoning and Inference over Integrated Ontologies

Reasoning across ontologies with partially hidden or restricted content necessitates specialized frameworks. The import-by-query approach formalizes reasoning with hidden ontologies $K_h$ accessed only via an oracle interface (concept-satisfiability, ABox-satisfiability, or ABox-entailment) and a querying ontology $K_v$ (Grau et al., 2014).

Key components:

Shared signature $\Sigma = \operatorname{sig}(K_v)\cap\operatorname{sig}(K_h)$ specifies the interface vocabulary.
Necessary restrictions include prohibiting nominals in $K_h$, ensuring deductive modularity and HT-safety in $K_v$, enforcing $\Sigma$-role acyclicity in $K_v$, and matching oracle interface capabilities to the logical expressivity required.
The hypertableau-based import-by-query algorithm preprocesses $K_v$, checks restrictions, simulates a tableau with cuts for $\Sigma$, and invokes oracles only when $\Sigma$-components block further expansion.
Complexity for general DLs is at least N2EXPTIME, but for EL fragments reasoning can be tractable (PTIME).

The approach provides sound, complete cross-ontology reasoning without full exposure of $K_h$, situating itself strictly between monolithic import and oblivious reuse. Comparison with module extraction and uniform interpolation highlights trade-offs in confidentiality, completeness, and computational cost.

4. Scalable Execution and Big Data Integration

In large-scale linked data environments (e.g., life sciences), efficient cross-ontology query processing hinges on (a) ontology integration via schema closure, (b) query rewriting (mainly RDFS and limited OWL-RL rules), and (c) execution on distributed frameworks (Kim et al., 2016).

Key processes:

Schema closure is materialized offline using inheritance, subproperty, and equivalence axioms.
Rule-based query rewriting transforms user SPARQL patterns into unions of conjunctive queries (UCQ) equivalent to deductive closure semantics.
Nested TripleGroup Algebra (NTGA) leverages star-shaped pattern grouping and shared join logic to collapse large UCQs into minimal MapReduce jobs.
Performance evaluations on multi-billion triple datasets (UniProt, Chem2Bio2RDF) show that the NTGA/RAPID+ system achieves 3–9$\times$ better efficiency on complex union queries versus naïve plans, while scaling to hundreds of gigabytes (Kim et al., 2016).
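The first two processes above can be sketched in a few lines of Python: an offline subclass closure over the schema, followed by rewriting a single `rdf:type` pattern into a union of conjunctive queries that is evaluated against the raw, unmaterialized triples. This is an illustration under simplifying assumptions (subclass axioms only, single-pattern queries, invented class and instance names); it does not model NTGA grouping or MapReduce execution.

```python
def subclass_closure(subclass_of):
    """Map every class to the set of all its subclasses, itself included.

    `subclass_of` maps a class to the set of its direct superclasses.
    """
    classes = set(subclass_of) | {c for sups in subclass_of.values() for c in sups}
    subs = {c: {c} for c in classes}
    changed = True
    while changed:  # propagate subclass sets upward until nothing changes
        changed = False
        for child, parents in subclass_of.items():
            for parent in parents:
                before = len(subs[parent])
                subs[parent] |= subs[child]
                changed |= len(subs[parent]) != before
    return subs


def rewrite_type_pattern(var, cls, subs):
    """Rewrite `var rdf:type cls` into a UCQ with one branch per subclass."""
    return [[(var, "rdf:type", c)] for c in sorted(subs.get(cls, {cls}))]


def evaluate_type_ucq(ucq, triples):
    """Evaluate a UCQ of single type patterns directly on the raw triples."""
    return {s
            for branch in ucq
            for (_, p, o) in branch
            for (s, p2, o2) in triples
            if p2 == p and o2 == o}


# Hypothetical schema and data.
schema = {"a:Kinase": {"a:Enzyme"}, "a:Enzyme": {"b:Protein"}}
raw = [("ex:p1", "rdf:type", "a:Kinase"), ("ex:p2", "rdf:type", "b:Lipid")]

subs = subclass_closure(schema)  # computed once, offline
ucq = rewrite_type_pattern("?x", "b:Protein", subs)
answers = evaluate_type_ucq(ucq, raw)
```

The design point is that the data is never expanded: only the small schema is closed, and the inference cost moves into the query, which is why collapsing the resulting large unions into few jobs matters at scale.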

Limitations include the current restriction to RDFS and basic OWL-RL rules, as well as open challenges in incremental ontology evolution and federated execution.

5. Statistical Data Mining and Interestingness in Cross-Ontology Context

Data mining over cross-ontology-annotated corpora seeks to discover and rank relationships (e.g., rules $x \to y$ with $x \in O_1$ and $y \in O_2$) that are statistically supported by joint annotations (Manda et al., 2015). The canonical workflow encompasses:

Ontology-guided generalization: Propagation of annotation terms to their ancestors via transitive "is-a" closure, filtering out uninformative terms by normalized information content ($N_\mathrm{IC}$).
Frequent itemset mining (e.g., Apriori) targeting cross-ontology rules, where left and right sides are from different ontologies.
Ranking rules by IRIC (Integrated Rule Information Content), a composite metric:

$\mathrm{IRIC}(x\to y) = [\alpha N_\mathrm{IC}(x) + \beta N_\mathrm{IC}(y)] \cdot N_\mathrm{COMI}(x, y)$

where $N_\mathrm{COMI}$ is normalized cross-ontology mutual information, and $\alpha,\beta$ are weights (Manda et al., 2015).

Integration into a query engine: Pre-calculate and index mined rules, delivering query-time recommendations or associations via a RESTful endpoint.
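A small Python sketch can make the generalization and ranking steps concrete. The text above does not give the exact definitions of $N_\mathrm{IC}$ and $N_\mathrm{COMI}$, so the normalizations below are assumptions chosen for illustration: $N_\mathrm{IC}$ is taken as $-\log_2 p(t)$ divided by $\log_2 N$ for $N$ annotated items, and $N_\mathrm{COMI}$ as pointwise mutual information divided by $-\log_2 p(x, y)$ and clipped at zero. The term identifiers, the equal weights, and the function names are also hypothetical.

```python
import math


def generalize(terms, parents):
    """Add all "is-a" ancestors of the given terms (transitive closure)."""
    out, stack = set(), list(terms)
    while stack:
        t = stack.pop()
        if t not in out:
            out.add(t)
            stack.extend(parents.get(t, ()))
    return out


def n_ic(term, annotations):
    """Assumed normalized information content in [0, 1]."""
    n = len(annotations)
    freq = sum(1 for terms in annotations if term in terms)
    if freq == 0 or n < 2:
        return 0.0
    return -math.log2(freq / n) / math.log2(n)


def n_comi(x, y, annotations):
    """Assumed normalized co-occurrence measure in [0, 1]."""
    n = len(annotations)
    px = sum(1 for t in annotations if x in t) / n
    py = sum(1 for t in annotations if y in t) / n
    pxy = sum(1 for t in annotations if x in t and y in t) / n
    if pxy == 0.0 or pxy == 1.0:
        return 0.0  # never co-occur, or always co-occur (no information)
    pmi = math.log2(pxy / (px * py))
    return max(0.0, pmi / -math.log2(pxy))


def iric(x, y, annotations, alpha=0.5, beta=0.5):
    """IRIC(x -> y) = [alpha * N_IC(x) + beta * N_IC(y)] * N_COMI(x, y)."""
    return (alpha * n_ic(x, annotations) + beta * n_ic(y, annotations)) \
        * n_comi(x, y, annotations)


# Hypothetical annotated items, each a set of terms from two ontologies.
items = [{"GO:a", "PO:x"}, {"GO:a", "PO:x"}, {"GO:b", "PO:y"}, {"GO:b"}]
score = iric("GO:a", "PO:x", items)
expanded = generalize({"GO:a1"}, {"GO:a1": ["GO:a"], "GO:a": ["GO:root"]})
```

Under these assumed normalizations, a rule scores highly only when both terms are specific and they co-occur more often than chance, which is the behaviour the composite metric is meant to capture.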

Qualitative evaluation establishes that IRIC combines semantic specificity and statistical association, outperforming traditional support, confidence, lift, and information gain metrics in terms of sensitivity to ontology structure and informative co-occurrence.

6. Schema Compatibility and Ontology Selection

Schema-level assessment of cross-ontology compatibility quantifies how well an ontology $O$ can support a query $Q$ or legacy schemas $S$. The method computes:

Etype similarity at label, property, and instance levels, aggregated with tunable thresholds.
Coverage $\operatorname{Cov}(O, Q)$: the fraction of query classes mapped to $O$, weighted by structural centrality (degree).
Flexibility $\operatorname{Flx}(O, Q)$: the proportion of $O$'s classes not required to answer $Q$.
Compatibility score $\operatorname{Comp}(Q, O) = \lambda \operatorname{Cov}(O, Q) + (1-\lambda)(1 - \operatorname{Flx}(O, Q))$, providing a numeric criterion for ontology selection (Zhao et al., 2021).
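The three quantities above can be computed as in the following Python sketch. It assumes that the etype-similarity step has already produced a mapping from query classes to ontology classes, and that the degree weight of a query class is its degree in the query graph; both are interpretive assumptions, as are the class names and the default $\lambda = 0.5$.

```python
def coverage(query_classes, mapping, degree):
    """Degree-weighted fraction of query classes that map into the ontology."""
    total = sum(degree.get(c, 1) for c in query_classes)
    covered = sum(degree.get(c, 1) for c in query_classes if c in mapping)
    return covered / total if total else 0.0


def flexibility(onto_classes, query_classes, mapping):
    """Proportion of ontology classes not needed to answer the query."""
    used = {mapping[c] for c in query_classes if c in mapping}
    if not onto_classes:
        return 0.0
    return len(onto_classes - used) / len(onto_classes)


def compatibility(cov, flx, lam=0.5):
    """Comp = lam * Cov + (1 - lam) * (1 - Flx)."""
    return lam * cov + (1 - lam) * (1 - flx)


# Hypothetical query and candidate ontology.
query_classes = ["Person", "Paper", "Venue"]
degree = {"Person": 1, "Paper": 2, "Venue": 1}  # degree in the query graph
mapping = {"Person": "o:Agent", "Paper": "o:Publication"}  # Venue is unmapped
onto_classes = {"o:Agent", "o:Publication", "o:Event", "o:Place"}

cov = coverage(query_classes, mapping, degree)
flx = flexibility(onto_classes, query_classes, mapping)
comp = compatibility(cov, flx)
```

In this example three of the four degree units are covered and half of the ontology is unused, so with equal weighting the score lands at 0.625. Ranking several candidate ontologies then amounts to computing `comp` for each and sorting.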

The algorithm is lightweight (operating at the knowledge graph schema level), but is sensitive to parameter tuning and does not capture deep logical or data property constraints.

7. Limitations, Variants, and Extensions

Cross-ontology querying remains limited by the expressivity of rewriting and execution frameworks (RDFS and partial OWL-RL are practical limits for scalable integration), the difficulty of handling complex alignments (especially arbitrary DL expressions), and the tractability of reasoning with hidden content oracles.

Frameworks such as import-by-query are inapplicable when ontologies use nominals or cyclic dependencies in the shared signature, or when concept-only oracles are insufficient due to quantifier interactions. Methods such as module extraction and uniform interpolation trade query completeness for confidentiality but face computational blow-up and limitations on existence.

Future extensions encompass richer structural similarity metrics (e.g., ontology embeddings), cost-based optimization for federated or distributed execution, incremental maintenance, and learning parameter settings for compatibility metrics from empirical use (Grau et al., 2014; Zhao et al., 2021; Ondo et al., 2 May 2025; Kim et al., 2016).