The PCI Ontology

Published 26 Jul 2017 in cs.DL | (1708.09326v1)

Abstract: In this paper, an ontology for the description and indexing of the contents of audiovisual resources will be presented. The concerned domain of reference (or: knowledge domain) is the cultural heritage of minorities and indigenous people, hence the name or title of this ontology, PCI ontology (standing for the French "Patrimoine Culturel de Minorités et Peuples Indigènes").

Summary

  • The paper introduces an ontology for describing and indexing audiovisual materials on the cultural heritage of minorities and indigenous communities.
  • It employs an empirical methodology using iterative testing with the CoGui editor to integrate discourse, referential, and pragmatic facets.
  • The ontology facilitates detailed thematic configurations, enabling nuanced indexing and reuse in diverse publishing and research contexts.

The paper "The PCI Ontology" (1708.09326) presents an ontology specifically designed for the description and indexing of audiovisual resources focusing on the cultural heritage of minorities and indigenous people. Developed within the context of the LOGOS project (Knowledge-on-Demand for Ubiquitous Learning), the PCI ontology aims to make a corpus of interviews, seminars, conferences, and documentaries accessible and reusable in various publishing contexts and for different user profiles. The ontology serves as the "description language" needed for this indexing process, providing both a vocabulary and description schemas.

The core objective is communicational: to identify and describe the information and knowledge conveyed by speakers (researchers, specialists, etc.) within the audiovisual corpus, while also characterizing this information in relation to specific potential users and contexts of use. This contrasts with traditional cultural heritage thesauri or museum object cataloguing tools like Getty's AAT or CIDOC's CRM, which primarily focus on classifying objects or concepts.

The PCI Ontology is structured around three main facets or upper categories:

  1. World_PCI (Referential Facet): Describes the domain of reference, which is the cultural heritage of minorities and indigenous people. This facet captures the knowledge about the "world" being discussed in the audiovisual resources.
  2. Discourse Description (Narrative Facet): Describes the discourses themselves – how the knowledge about the World_PCI is produced, organized, and uttered by speakers.
  3. Pragmatic Description (Pragmatic Facet): Describes the potential uses and contexts for which the uttered information is relevant, linking described content to specific publishing genres or user profiles. In the context of the LOGOS project, this facet was intended to be integrated into an authoring studio tool rather than solely residing within the ontology structure for indexing.

The ontology's vocabulary is composed of three basic types:

  • Concepts (Themes): The terms used for description, organized hierarchically within the three main facets (Discourse Description, Pragmatic Description, World_PCI). Examples include Activity, Actor, Natural Environment within World_PCI, or Discourse Type, Discourse Participant within Discourse Description.
  • Conceptual/Thematic Relations: Relations linking concepts to form descriptive schemas or patterns. The main categories are Situating relations (describing relationships within the referential domain, e.g., causality, spatial), Narrative relations (describing relationships within or between discourses, e.g., argumentative, discursive), and Linguistic relations (describing relationships between linguistic entities).
  • Nesting Contexts: Mechanisms for grouping configurations of themes, such as the Discourse Topic which represents the referential scope of a discourse or discourse unit.
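The three vocabulary types can be pictured as a small data model. The following is a minimal sketch in Python, not the paper's or CoGui's own representation: the class names `Facet`, `RelationFamily`, `Concept`, `Relation`, and `NestingContext` are illustrative assumptions, while the facet, theme, and relation labels are taken from the description above.

```python
from dataclasses import dataclass, field
from enum import Enum


class Facet(Enum):
    """The three upper categories of the ontology."""
    WORLD_PCI = "World_PCI"
    DISCOURSE = "Discourse Description"
    PRAGMATIC = "Pragmatic Description"


class RelationFamily(Enum):
    """The three main categories of conceptual/thematic relations."""
    SITUATING = "Situating Relation"
    NARRATIVE = "Narrative Relation"
    LINGUISTIC = "Linguistic Relation"


@dataclass(frozen=True)
class Concept:
    """A theme, attached to one facet (hypothetical class)."""
    name: str
    facet: Facet


@dataclass(frozen=True)
class Relation:
    """A relation type, attached to one relation family (hypothetical class)."""
    name: str
    family: RelationFamily


@dataclass
class NestingContext:
    """Groups a configuration of themes, e.g. a Discourse Topic."""
    name: str
    concepts: list = field(default_factory=list)
    links: list = field(default_factory=list)  # (source, relation, target)

    def link(self, source, relation, target):
        # Register both endpoints once, then record the connection.
        for concept in (source, target):
            if concept not in self.concepts:
                self.concepts.append(concept)
        self.links.append((source, relation, target))


actor = Concept("Actor", Facet.WORLD_PCI)
activity = Concept("Activity", Facet.WORLD_PCI)
actantial = Relation("Actantial Relation", RelationFamily.SITUATING)

topic = NestingContext("Discourse Topic")
topic.link(actor, actantial, activity)
```

Keeping facets and relation families as enumerations makes the top-level structure explicit, while individual themes and relations stay open-ended data.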

The ontology is built using the CoGui ontology editor, which is based on conceptual graph theory. This formal environment supports the definition of concepts, relations, and nesting contexts, and facilitates the representation of complex knowledge patterns.
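In conceptual graph theory, a basic graph is bipartite: concept nodes are connected only through relation nodes, and each relation node has an ordered list of arguments. The sketch below illustrates that structure in plain Python. It does not use CoGui or its file format; `ConceptNode`, `RelationNode`, `ConceptualGraph`, and the bracketed text rendering are assumptions made for illustration.

```python
from dataclasses import dataclass

GENERIC = "*"  # marker for an unspecified individual of a given type


@dataclass(eq=False)  # nodes are compared by identity, not by label
class ConceptNode:
    type: str
    marker: str = GENERIC

    def __str__(self):
        return f"[{self.type}: {self.marker}]"


@dataclass(eq=False)
class RelationNode:
    type: str
    args: tuple  # ordered ConceptNode arguments

    def __str__(self):
        return f"({self.type})"


class ConceptualGraph:
    """Bipartite graph: relation nodes link concept nodes, never each other."""

    def __init__(self):
        self.concepts = []
        self.relations = []

    def add_concept(self, type_, marker=GENERIC):
        node = ConceptNode(type_, marker)
        self.concepts.append(node)
        return node

    def add_relation(self, type_, *args):
        # Every argument must be a concept node that belongs to this graph.
        for arg in args:
            if arg not in self.concepts:
                raise ValueError(f"unknown concept node: {arg}")
        relation = RelationNode(type_, tuple(args))
        self.relations.append(relation)
        return relation

    def linear_form(self):
        # Render binary relations only; higher arities are skipped here.
        return [
            f"{r.args[0]} -> {r} -> {r.args[1]}"
            for r in self.relations
            if len(r.args) == 2
        ]


g = ConceptualGraph()
actor = g.add_concept("Actor")
environment = g.add_concept("Natural Environment")
g.add_relation("Spatial Relation", actor, environment)
print(g.linear_form())
```

Comparing nodes by identity rather than by label lets a graph hold two distinct generic nodes of the same type, which a single pattern may need.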

The methodology for building and maintaining the PCI ontology follows an empirical approach, heavily reliant on the specific content of the audiovisual corpus itself. The process involves iterative steps:

  1. Defining objectives and constituting a pilot committee.
  2. Empirical work on the corpus, including viewing, rough description using semiotic/text-linguistic tools (like the Interview tool), and identifying potential reuse scenarios.
  3. Familiarization and use of the CoGui ontology editor.
  4. Investigating pre-existing literature and initiatives (thesauri, ontologies, terminologies) related to cultural heritage, social sciences, and discourse analysis (e.g., TEI, GOLD, UNESCO Thesaurus, Getty's AAT, CIDOC CRM, IconClass). This step informs the basic categorization and allows for potential reuse of vocabularies and consideration of interoperability.
  5. Building and testing small, partial versions of the ontology based on empirical findings and theoretical assumptions (e.g., hypotheses on narrative structure, 'lifeworld' organization).
  6. Building a first stable general version (Version 4, the basis of the paper) by merging tested components, aiming for broad but not necessarily deep coverage across the facets.
  7. Working with the stable version to describe and index the actual corpus segments, involving fine-grained segmentation, description, indexing with themes, keywords (instances of themes), and indexing templates (conceptual graphs).
  8. Collecting feedback on the limits, usefulness, and comprehensibility of the ontology during the indexing process.
  9. Updating the ontology based on evaluation and parallel theoretical investigations.
  10. Generalizing the ontology by identifying parts reusable in other domains and pilots, exploring interoperability.

Key aspects of the thematic hierarchies detailed in the paper include:

  • Discourse Description: Includes themes for describing the production context (Contextual Setting), metadata (Discourse Generalities), participants (Discourse Participant), the type of discourse (Discourse Type covering Discourse Act and Discourse Genre), and structural units (Discourse Unit).
  • World_PCI: Comprises broad categories drawing inspiration from "life world" concepts and various thesauri. Themes cover aspects like actions (Activity), entities involved (Actor, Animate Matter, Fauna, Flora, Inanimate Matter, Object, Product), characteristics (Attribute, Feature, Symbolic Model and Process, System of Expression and Communication), context (Natural Environment, Social Environment), and temporality (Temporality and History).
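One practical benefit of a hierarchical theme vocabulary is that a query on a broad theme can also retrieve segments indexed with its specializations. The sketch below shows this with a parent map covering a small subset of the themes listed above. The segment identifiers and the `search` helper are hypothetical, and placing `Activity`, `Actor`, and `Natural Environment` directly under `World_PCI` is an assumption, since the summary does not give their exact depth.

```python
# Parent map for a subset of themes; None marks a top-level facet.
PARENT = {
    "Discourse Description": None,
    "World_PCI": None,
    "Discourse Participant": "Discourse Description",
    "Discourse Type": "Discourse Description",
    "Discourse Act": "Discourse Type",
    "Discourse Genre": "Discourse Type",
    "Activity": "World_PCI",             # depth assumed
    "Actor": "World_PCI",                # depth assumed
    "Natural Environment": "World_PCI",  # depth assumed
}


def ancestors(theme):
    """Return the chain of broader themes, nearest first."""
    chain = []
    parent = PARENT[theme]
    while parent is not None:
        chain.append(parent)
        parent = PARENT[parent]
    return chain


def is_a(theme, candidate):
    """True if `theme` equals `candidate` or specializes it."""
    return theme == candidate or candidate in ancestors(theme)


def search(index, query_theme):
    """Segments indexed with `query_theme` or any of its specializations."""
    return sorted(
        segment
        for segment, themes in index.items()
        if any(is_a(theme, query_theme) for theme in themes)
    )


# Hypothetical index: segment identifier -> themes assigned by an indexer.
index = {
    "segment-01": ["Discourse Genre", "Actor"],
    "segment-02": ["Activity"],
}

print(search(index, "Discourse Type"))
print(search(index, "World_PCI"))
```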

The relational part provides the means to connect concepts. The Situating Relation hierarchy includes relationships based on roles (Actantial Relation), opposition (Ant/Agonist Relation), cause/effect (Causality Relation), hypothetical situations (Counterfactual Relation), goals/reasons (Intentional Relation), representation (Manifestation Relation), part-whole (Partonymic Relation), location (Spatial Relation), time (Temporal Relation), and classification modification (Taxinomic Relation). The Narrative Relation connects referential concepts to discourse concepts, including Discursive Relation (relating discourse units) and Rhetorical Relation (relating arguments, e.g., exemplify, explain, justify). The Linguistic Relation primarily focuses on lexical relationships (Lexical Relation) relevant for describing terminology within the corpus.

Conceptual graphs, or thematic configurations, are the result of connecting themes using relations. For the PCI pilot, the focus is on Discourse Topics, which are conceptual graphs representing knowledge about the World_PCI domain. These topic graphs are often embedded within graphs representing the discourse itself, reflecting the fact that the ontology describes uttered knowledge from a specific speaker's point of view. Four principal groups of discourse topics were identified for indexing: Essentials of culture/lifeworld, Intangible heritage, Practical knowledge/know-how, and Investigations in cultural identity. These thematic configurations serve as templates for the manual indexing process, aiming to capture the content and structure of the audiovisual resources for future repurposing and fine-tuned exploration.
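The idea of a topic template that is filled in during indexing and then embedded in a description of the discourse can be sketched as follows. This is an illustrative model only: `TopicTemplate`, `DiscourseUnit`, the choice of triples, and the keyword `example-activity` are hypothetical placeholders, while the four topic groups and the theme and relation names come from the text above.

```python
from dataclasses import dataclass

TOPIC_GROUPS = (
    "Essentials of culture/lifeworld",
    "Intangible heritage",
    "Practical knowledge/know-how",
    "Investigations in cultural identity",
)


@dataclass
class TopicTemplate:
    """A reusable thematic configuration made of generic themes."""
    group: str
    triples: list  # (theme, relation, theme)

    def __post_init__(self):
        if self.group not in TOPIC_GROUPS:
            raise ValueError(f"unknown topic group: {self.group}")

    def instantiate(self, keywords):
        """Fill themes with keywords; unfilled themes stay generic ('*')."""
        return [
            (f"{s}: {keywords.get(s, '*')}", r, f"{t}: {keywords.get(t, '*')}")
            for s, r, t in self.triples
        ]


@dataclass
class DiscourseUnit:
    """A discourse description that embeds the topic it is about."""
    participant: str
    topic: list  # instantiated triples, nested inside the discourse graph


template = TopicTemplate(
    "Practical knowledge/know-how",
    [
        ("Actor", "Actantial Relation", "Activity"),
        ("Activity", "Spatial Relation", "Natural Environment"),
    ],
)

# "example-activity" is a placeholder keyword, not data from the corpus.
filled = template.instantiate({"Activity": "example-activity"})
unit = DiscourseUnit(participant="Discourse Participant: *", topic=filled)
print(unit.topic)
```

Nesting the filled topic inside the `DiscourseUnit` mirrors the point that the indexed knowledge is always knowledge as uttered by a particular speaker.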
