
Cross-Ontology Queries: Techniques & Challenges

Updated 30 January 2026
  • Cross-ontology queries are formal queries that traverse multiple autonomous ontologies to enable integrated data access, inference, and data mining.
  • Key methodologies include query rewriting under alignment correspondences, hybrid reasoning approaches, and the use of large language models to map natural-language questions to formal queries.
  • Practical systems demonstrate scalable performance using schema closure, NTGA-based execution, and compatibility metrics in large, heterogeneous data environments.

Cross-ontology queries are formal queries that traverse or integrate multiple ontologies, typically to enable data access, inference, or data mining across autonomous, semantically heterogeneous sources. Mechanisms for cross-ontology querying are essential for semantic data integration, knowledge discovery in domains with orthogonal ontologies (e.g., biomedical sciences), and linked data federation. Such queries must address syntactic, semantic, and inferential heterogeneity, as well as computational and schema-compatibility constraints. Cross-ontology query frameworks encompass formal query rewriting, import-by-query reasoning, data mining with semantic metrics, and schema compatibility assessment.

1. Formal Problem Space and Core Definitions

Cross-ontology querying presupposes the existence of multiple ontologies $O_1,\dots,O_n$, each with its own signature and possibly distinct description logic (DL) expressivity, along with data repositories annotated per ontology, and cross-ontology mappings $M$ (e.g., $\mathtt{owl:sameAs}$, $\mathtt{skos:exactMatch}$, or alignment correspondences). The unified schema $O = \bigcup_i O_i \cup M$ supports graph-structured queries $Q$, typically in SPARQL or conjunctive query (CQ) form, issued against an integrated knowledge base $G$.

A cross-ontology query aims to retrieve the certain answers $\operatorname{ans}(q, T)$, where $q$ is the user CQ and $T$ the raw triples from all sources, but subject to the closure $\mathrm{Cl}(T, \mathcal{R})$ under an entailment regime $\mathcal{R}$ (RDFS/OWL rules) (Kim et al., 2016).
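As a rough illustration of answering over $\mathrm{Cl}(T, \mathcal{R})$, the sketch below forward-chains a very small rule set (RDFS rules rdfs9 and rdfs11, plus owl:sameAs symmetry and replacement) to a fixpoint and then evaluates a conjunctive query. The prefixes `o1:`, `o2:`, `d1:`, `d2:` and all term names are hypothetical, and the rule set is only a fragment of a real entailment regime.

```python
RDF_TYPE = "rdf:type"
SUBCLASS = "rdfs:subClassOf"
SAME_AS = "owl:sameAs"

def closure(triples):
    """Forward-chain an illustrative rule subset until no new triple appears."""
    closed = set(triples)
    while True:
        new = set()
        for (s, p, o) in closed:
            if p == SUBCLASS:
                # rdfs11: subClassOf is transitive
                new |= {(s, SUBCLASS, o2) for (s2, p2, o2) in closed
                        if p2 == SUBCLASS and s2 == o}
                # rdfs9: instances of s are also instances of o
                new |= {(x, RDF_TYPE, o) for (x, p2, c) in closed
                        if p2 == RDF_TYPE and c == s}
            elif p == SAME_AS:
                new.add((o, SAME_AS, s))  # symmetry
                # replacement of s by o in subject and object position
                new |= {(o, p2, o2) for (s2, p2, o2) in closed if s2 == s}
                new |= {(s2, p2, o) for (s2, p2, o2) in closed if o2 == s}
        if new <= closed:
            return closed
        closed |= new

def _unify(term, value, binding):
    """Match one query term against one triple component."""
    if term.startswith("?"):
        if term in binding:
            return binding[term] == value
        binding[term] = value
        return True
    return term == value

def answers(query, triples, var):
    """Evaluate a CQ (list of triple patterns) and project onto `var`."""
    bindings = [{}]
    for pattern in query:
        extended = []
        for b in bindings:
            for triple in triples:
                b2 = dict(b)
                if all(_unify(t, v, b2) for t, v in zip(pattern, triple)):
                    extended.append(b2)
        bindings = extended
    return {b[var] for b in bindings}

# Hypothetical sources: o1 schema, an o2-to-o1 bridge (part of M), and data.
T = {
    ("o1:Kinase", SUBCLASS, "o1:Enzyme"),
    ("o2:ProteinKinase", SUBCLASS, "o1:Kinase"),
    ("d2:p53k", RDF_TYPE, "o2:ProteinKinase"),
    ("d1:P1", SAME_AS, "d2:p53k"),
}
q = [("?x", RDF_TYPE, "o1:Enzyme")]

raw_answers = answers(q, T, "?x")
certain_answers = answers(q, closure(T), "?x")
```

On the raw triples the query has no answers; over the closure both `d2:p53k` and its `owl:sameAs` alias `d1:P1` are returned, which is the difference the entailment regime makes.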

Compatibility of a query $Q$ with an existing ontology $O$ is quantified using semantic similarity metrics and schema-centric compatibility scores, notably coverage and flexibility (Zhao et al., 2021).

2. Query Rewriting and Alignment Approaches

The principal technique for interoperable cross-ontology querying is query rewriting under ontology alignments. Given source and target ontologies $O_s$ and $O_t$, and an alignment $\mathcal{M}$ composed of equivalence ($=$) and subsumption ($\sqsubseteq$) correspondences (both simple and complex, i.e., $(s:s)$, $(s:c)$, $(c:c)$ for atomic and DL expressions), a rewriting function $R(Q_s, \mathcal{M}) \to Q_t$ transforms a source query $Q_s$ to a target query $Q_t$ over $O_t$ (Ondo et al., 2 May 2025).

Complex alignments are handled via recursive decomposition: if a $(c:c)$ correspondence specifies that a conjunctive DL formula $F_s$ in $O_s$ is equivalent to $F_t$ in $O_t$, the rewriting function matches and rewrites graph patterns at the formula level, supporting nested and disjunctive expansions.

Algorithmically, the process scans the triple patterns in $Q_s$, matches components against alignment correspondences, and recursively rewrites or expands subpatterns based on the correspondence type. The worst-case output size is $O(|T|\cdot d^k)$, where $T$ is the set of input triple patterns, $d$ the maximum alignment branching, and $k$ the nesting depth.
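A minimal sketch of the scan-and-rewrite step, assuming a single level of nesting ($k = 1$) and an alignment stored as a lookup table from source terms to lists of target pattern templates. The table layout, the `?s`/`?o`/`?_` placeholder convention, and the `src:`/`tgt:` vocabulary are hypothetical and are not taken from the cited system.

```python
def instantiate(template, s, o, index):
    """Fill a correspondence template with the terms of one source pattern."""
    def term(t):
        if t == "?s":
            return s
        if t == "?o":
            return o
        if t.startswith("?_"):
            return f"{t}{index}"  # fresh variable per rewritten pattern
        return t
    return [tuple(term(t) for t in triple) for triple in template]

def rewrite(query, alignment):
    """For each source pattern, return the list of alternative target patterns."""
    rewritten = []
    for index, (s, p, o) in enumerate(query):
        # Try a (predicate, object) correspondence first, then predicate only.
        templates = alignment.get((p, o)) or alignment.get((p, None))
        if templates is None:
            raise KeyError(f"no correspondence covers pattern {(s, p, o)}")
        rewritten.append([instantiate(t, s, o, index) for t in templates])
    return rewritten

def to_sparql(rewritten):
    """Render alternatives of one pattern as UNION; patterns are joined."""
    blocks = []
    for alternatives in rewritten:
        groups = ["{ " + " ".join(f"{s} {p} {o} ." for s, p, o in alt) + " }"
                  for alt in alternatives]
        blocks.append(" UNION ".join(groups))
    return "\n".join(blocks)

# Hypothetical alignment: one (s:c) correspondence and one branching (s:s).
alignment = {
    ("rdf:type", "src:Author"): [
        [("?s", "rdf:type", "tgt:Person"), ("?s", "tgt:wrote", "?_w")],
    ],
    ("src:name", None): [
        [("?s", "tgt:fullName", "?o")],
        [("?s", "tgt:label", "?o")],
    ],
}
source_query = [("?a", "rdf:type", "src:Author"), ("?a", "src:name", "?n")]

target = rewrite(source_query, alignment)
where_body = to_sparql(target)
```

Because each source pattern is expanded locally into a union, the number of alternatives in `target` here is 3, within the $|T| \cdot d = 2 \cdot 2$ bound for depth 1. A recursive version would apply `rewrite` again to the emitted patterns, which is where the $d^k$ factor comes from.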

Practical systems integrate large language models (LLMs), such as GPT-4, to map natural-language queries to formal subgraph queries, which are then rewritten cross-ontology via the alignment machinery (Ondo et al., 2 May 2025).

3. Reasoning and Inference over Integrated Ontologies

Reasoning across ontologies with partially hidden or restricted content necessitates specialized frameworks. The import-by-query approach formalizes reasoning with hidden ontologies $K_h$ accessed only via an oracle interface (concept-satisfiability, ABox-satisfiability, or ABox-entailment) and a querying ontology $K_v$ (Grau et al., 2014).

Key components:

  • Shared signature $\Sigma = \operatorname{sig}(K_v)\cap\operatorname{sig}(K_h)$ specifies the interface vocabulary.
  • Necessary restrictions include prohibiting nominals in $K_h$, ensuring deductive modularity and HT-safety in $K_v$, enforcing $\Sigma$-role acyclicity in $K_v$, and matching oracle interface capabilities to the logical expressivity required.
  • The hypertableau-based import-by-query algorithm preprocesses $K_v$, checks restrictions, simulates a tableau with cuts for $\Sigma$, and invokes oracles only when $\Sigma$-components block further expansion.
  • Complexity for general DLs is at least N2EXPTIME, but for EL fragments reasoning can be tractable (PTIME).

The approach provides sound, complete cross-ontology reasoning without full exposure of $K_h$, situating itself strictly between monolithic import and oblivious reuse. Comparison with module extraction and uniform interpolation highlights trade-offs in confidentiality, completeness, and computational cost.
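The idea of reasoning through an oracle can be illustrated with a deliberately tiny fragment. The sketch below is not the hypertableau algorithm: it assumes both ontologies contain only atomic subclass axioms, that the oracle answers atomic subsumption questions, and that queries use names from $\operatorname{sig}(K_v)$. It also queries the oracle eagerly for every pair of $\Sigma$ names, whereas the approach described above invokes oracles only on demand. The class `SubsumptionOracle` and all term names are hypothetical.

```python
def _reachable(edges, start, goal):
    """Depth-first search over (sub, sup) edges."""
    seen, stack = {start}, [start]
    while stack:
        node = stack.pop()
        if node == goal:
            return True
        for sub, sup in edges:
            if sub == node and sup not in seen:
                seen.add(sup)
                stack.append(sup)
    return False

class SubsumptionOracle:
    """Hides its axioms; exposes only its signature and yes/no entailment."""
    def __init__(self, hidden_axioms):
        self._axioms = set(hidden_axioms)  # pairs (sub, sup), never exposed
        self.calls = 0

    def signature(self):
        return {name for axiom in self._axioms for name in axiom}

    def entails(self, sub, sup):
        self.calls += 1
        return _reachable(self._axioms, sub, sup)

def entails_with_import(visible_axioms, oracle, sub, sup):
    """Decide sub ⊑ sup over K_v ∪ K_h using only oracle calls on Σ names."""
    visible = set(visible_axioms)
    sigma = {n for axiom in visible for n in axiom} & oracle.signature()
    # Import only Σ-level consequences of the hidden ontology.
    imported = {(a, b) for a in sigma for b in sigma
                if a != b and oracle.entails(a, b)}
    return _reachable(visible | imported, sub, sup)

# Hypothetical K_v (prefix v:), K_h (prefix h:), and shared names (prefix s:).
K_v = [("v:MyDisease", "s:Myocarditis"), ("s:HeartDisease", "v:Reportable")]
oracle = SubsumptionOracle([
    ("s:Myocarditis", "h:InflammatoryHeartDisease"),
    ("h:InflammatoryHeartDisease", "s:HeartDisease"),
])

visible_only = _reachable(set(K_v), "v:MyDisease", "v:Reportable")
with_import = entails_with_import(K_v, oracle, "v:MyDisease", "v:Reportable")
```

In this example the subsumption is not derivable from $K_v$ alone but is derivable with the oracle, and the hidden name `h:InflammatoryHeartDisease` never leaves the oracle.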

4. Scalable Execution and Big Data Integration

In large-scale linked data environments (e.g., life sciences), efficient cross-ontology query processing hinges on (a) ontology integration via schema closure, (b) query rewriting (mainly RDFS and limited OWL-RL rules), and (c) execution on distributed frameworks (Kim et al., 2016).

Key processes:

  • Schema closure is materialized offline using inheritance, subproperty, and equivalence axioms.
  • Rule-based query rewriting transforms user SPARQL patterns into unions of conjunctive queries (UCQ) equivalent to deductive closure semantics.
  • Nested TripleGroup Algebra (NTGA) leverages star-shaped pattern grouping and shared join logic to collapse large UCQs into minimal MapReduce jobs.
  • Performance evaluations on multi-billion triple datasets (UniProt, Chem2Bio2RDF) show that the NTGA/RAPID+ system achieves 3–9× better efficiency on complex union queries versus naïve plans, while scaling to hundreds of gigabytes (Kim et al., 2016).
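The first two steps can be sketched on a single machine as follows. This is a simplified illustration that assumes only subclass, subproperty, and class-equivalence axioms, with hypothetical `a:` and `b:` vocabularies; it does not model NTGA or MapReduce execution.

```python
from itertools import product

SUBCLASS = "rdfs:subClassOf"
SUBPROP = "rdfs:subPropertyOf"
EQUIV = "owl:equivalentClass"

def normalize(schema):
    """Treat each class equivalence as two subclass axioms."""
    out = set(schema)
    for a, rel, b in schema:
        if rel == EQUIV:
            out |= {(a, SUBCLASS, b), (b, SUBCLASS, a)}
    return out

def subsumees(schema, term, relation):
    """All terms below `term` under `relation` (reflexive-transitive)."""
    result, frontier = {term}, [term]
    while frontier:
        current = frontier.pop()
        for sub, rel, sup in schema:
            if rel == relation and sup == current and sub not in result:
                result.add(sub)
                frontier.append(sub)
    return result

def rewrite_to_ucq(query, schema):
    """Expand each pattern by the schema closure; return a list of CQs."""
    schema = normalize(schema)
    options = []
    for s, p, o in query:
        if p == "rdf:type":
            classes = sorted(subsumees(schema, o, SUBCLASS))
            options.append([(s, p, c) for c in classes])
        else:
            props = sorted(subsumees(schema, p, SUBPROP))
            options.append([(s, q, o) for q in props])
    # One CQ per combination of choices: this is the source of UCQ growth.
    return [list(cq) for cq in product(*options)]

# Hypothetical integrated schema spanning two vocabularies.
schema = {
    ("a:Kinase", SUBCLASS, "a:Enzyme"),
    ("b:ProteinKinase", EQUIV, "a:Kinase"),
    ("b:inhibits", SUBPROP, "b:interactsWith"),
}
query = [("?p", "rdf:type", "a:Enzyme"), ("?d", "b:interactsWith", "?p")]

ucq = rewrite_to_ucq(query, schema)
```

Here a two-pattern query with three candidate classes and two candidate properties already yields six conjunctive queries. In this construction the size of the union is the product of the per-pattern choices, which suggests why sharing work across the branches matters at scale.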

Limitations include current restriction to RDFS and basic OWL-RL, and challenges with incremental ontology evolution and federated execution.

5. Statistical Data Mining and Interestingness in Cross-Ontology Context

Data mining over cross-ontology-annotated corpora seeks to discover and rank relationships (e.g., $x \to y$, $x \in O_1$, $y \in O_2$) statistically supported by joint annotations (Manda et al., 2015). The canonical workflow encompasses:

  • Ontology-guided generalization: Propagation of annotation terms to their ancestors via transitive "is-a" closure, filtering out uninformative terms by normalized information content ($N_\mathrm{IC}$).
  • Frequent itemset mining (e.g., Apriori) targeting cross-ontology rules, where left and right sides are from different ontologies.
  • Ranking rules by IRIC (Integrated Rule Information Content), a composite metric:

$$\mathrm{IRIC}(x\to y) = [\alpha N_\mathrm{IC}(x) + \beta N_\mathrm{IC}(y)] \cdot N_\mathrm{COMI}(x, y)$$

where $N_\mathrm{COMI}$ is normalized cross-ontology mutual information, and $\alpha,\beta$ are weights (Manda et al., 2015).

  • Integration into a query engine: Pre-calculate and index mined rules, delivering query-time recommendations or associations via a RESTful endpoint.
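The generalization and ranking steps can be sketched as below. The sketch takes the normalized scores $N_\mathrm{IC}$ and $N_\mathrm{COMI}$ as precomputed inputs, because their exact definitions are not given here. The default weights $\alpha = \beta = 0.5$, the term names, and the score values are made up for illustration.

```python
def generalize(annotations, parents):
    """Add every 'is-a' ancestor of each annotated term (transitive closure)."""
    def ancestors(term):
        seen, stack = set(), [term]
        while stack:
            for parent in parents.get(stack.pop(), ()):
                if parent not in seen:
                    seen.add(parent)
                    stack.append(parent)
        return seen
    return {item: set(terms) | {a for t in terms for a in ancestors(t)}
            for item, terms in annotations.items()}

def iric(n_ic_x, n_ic_y, n_comi_xy, alpha=0.5, beta=0.5):
    """IRIC(x -> y) = [alpha*N_IC(x) + beta*N_IC(y)] * N_COMI(x, y)."""
    return (alpha * n_ic_x + beta * n_ic_y) * n_comi_xy

def rank_rules(rules, n_ic, n_comi, alpha=0.5, beta=0.5):
    """Score each cross-ontology rule and sort from highest to lowest IRIC."""
    scored = [(iric(n_ic[x], n_ic[y], n_comi[(x, y)], alpha, beta), x, y)
              for x, y in rules]
    return sorted(scored, reverse=True)

# Hypothetical is-a hierarchy and annotations.
parents = {"o1:kinase_activity": ["o1:catalytic_activity"],
           "o1:catalytic_activity": ["o1:molecular_function"]}
generalized = generalize({"gene1": {"o1:kinase_activity"}}, parents)

# Illustrative, made-up normalized scores (not from any real corpus).
n_ic = {"o1:kinase_activity": 0.8, "o2:leaf_growth": 0.6,
        "o1:molecular_function": 0.2, "o2:plant_structure": 0.1}
n_comi = {("o1:kinase_activity", "o2:leaf_growth"): 0.5,
          ("o1:molecular_function", "o2:plant_structure"): 0.9}

ranked = rank_rules(list(n_comi), n_ic, n_comi)
```

With these numbers the specific rule scores $(0.4 + 0.3) \cdot 0.5 = 0.35$ and the generic one $(0.1 + 0.05) \cdot 0.9 = 0.135$, so the more specific rule ranks first even though its association score is lower.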

Qualitative evaluation establishes that IRIC combines semantic specificity and statistical association, outperforming traditional support, confidence, lift, and information gain metrics in terms of sensitivity to ontology structure and informative co-occurrence.

6. Schema Compatibility and Ontology Selection

Schema-level assessment of cross-ontology compatibility quantifies how well an ontology $O$ can support a query $Q$ or legacy schemas $S$. The method computes:

  • Etype similarity at label, property, and instance levels, aggregated with tunable thresholds.
  • Coverage $\operatorname{Cov}(O, Q)$: the fraction of query classes mapped to $O$, weighted by structural centrality (degree).
  • Flexibility $\operatorname{Flx}(O, Q)$: the proportion of $O$'s classes not required to answer $Q$.
  • Compatibility score $\operatorname{Comp}(Q, O) = \lambda \operatorname{Cov}(O, Q) + (1-\lambda)(1 - \operatorname{Flx}(O, Q))$, providing a numeric criterion for ontology selection (Zhao et al., 2021).
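A small worked sketch of the three scores, under stated assumptions: the class mapping from $Q$ to $O$ is taken as given (the etype similarity step is skipped), the degree weights are assumed to be degrees in the query graph, and $\lambda = 0.5$ is an arbitrary default. All class names are hypothetical.

```python
def coverage(query_classes, mapped, degree):
    """Degree-weighted fraction of query classes that map into the ontology."""
    total = sum(degree[c] for c in query_classes)
    hit = sum(degree[c] for c in query_classes if c in mapped)
    return hit / total if total else 0.0

def flexibility(ontology_classes, mapped):
    """Share of the ontology's classes that the query does not need."""
    if not ontology_classes:
        return 0.0
    used = set(mapped.values())
    return len(ontology_classes - used) / len(ontology_classes)

def compatibility(query_classes, ontology_classes, mapped, degree, lam=0.5):
    """Comp = lam * Cov + (1 - lam) * (1 - Flx)."""
    cov = coverage(query_classes, mapped, degree)
    flx = flexibility(ontology_classes, mapped)
    return lam * cov + (1 - lam) * (1 - flx)

# Hypothetical query schema: three classes, 'Paper' is the most connected.
query_classes = {"Person", "Paper", "Venue"}
degree = {"Person": 1, "Paper": 2, "Venue": 1}

# Hypothetical candidate ontology and an assumed mapping into it.
ontology_classes = {"o:Agent", "o:Document", "o:Event", "o:Place"}
mapped = {"Person": "o:Agent", "Paper": "o:Document"}

cov = coverage(query_classes, mapped, degree)
flx = flexibility(ontology_classes, mapped)
comp = compatibility(query_classes, ontology_classes, mapped, degree)
```

In this example coverage is $3/4$ (the unmapped class `Venue` carries weight 1 of 4), flexibility is $2/4$, and the score is $0.5 \cdot 0.75 + 0.5 \cdot 0.5 = 0.625$. Running the same computation over several candidate ontologies and taking the maximum gives a simple selection rule.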

The algorithm is lightweight (operating at the knowledge graph schema level), but is sensitive to parameter tuning and does not capture deep logical or data property constraints.

7. Limitations, Variants, and Extensions

Cross-ontology querying remains limited by the expressivity of rewriting and execution frameworks (RDFS and partial OWL-RL are practical limits for scalable integration), the difficulty of handling complex alignments (especially arbitrary DL expressions), and the tractability of reasoning with hidden content oracles.

Frameworks such as import-by-query are inapplicable when ontologies use nominals or cyclic dependencies in the shared signature, or when concept-only oracles are insufficient due to quantifier interactions. Methods such as module extraction and uniform interpolation trade query completeness for confidentiality but face computational blow-up and limitations on existence.

Future extensions encompass richer structural similarity metrics (e.g., ontology embeddings), cost-based optimization for federated or distributed execution, incremental maintenance, and learning parameter settings for compatibility metrics from empirical use (Grau et al., 2014, Zhao et al., 2021, Ondo et al., 2 May 2025, Kim et al., 2016).
