MOSIG Master 2ND YEAR Research
YEAR 2016/2017

MASTER TOPIC PROPOSAL

ADVISOR: Jérôme David and Jérôme Euzenat

EMAIL: Jerome:David#inria:fr, Jerome:Euzenat#inria:fr

TEAM: Exmo team, INRIA & Univ. Grenoble Alpes

LABORATORY: LIG

MASTER PROFILE: Artificial intelligence and the web

Reference number: Proposal n°2155

TITLE:

Extracting RDF link keys with Formal Concept Analysis

The goal of the semantic web is to take advantage of formalised knowledge at the scale of the World Wide Web. This has led to the publication of a vast quantity of data expressed in semantic web formalisms such as RDF [Heath 2011a]. Part of the added value of linked data lies in the links identifying the same entity across different data sets, because such links make it possible to draw inferences across data sets. For instance, they may identify the same books and articles in different bibliographical data sources. Hence, finding the manifestations of the same entity across several data sets is an important task for linked data.

One way of identifying such entities is to use link keys, which generalise the keys usually found in databases to several data sets. A link key [Atencia 2014b] is a statement of the form:

{⟨p₁, q₁⟩, …, ⟨pₙ, qₙ⟩} link key ⟨c, d⟩

stating that whenever an instance of class c has the same values for properties p₁, …, pₙ as an instance of class d has for properties q₁, …, qₙ, these two instances denote the same entity. For example, an instance of the class Livre may be considered equivalent to an instance of the class Novel as soon as their properties auteur and titre, on the one side, and creator and title, on the other side, have the same values.
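As an illustration only, the following Python sketch checks whether two instances satisfy a link key under the reading given above. It assumes, for the purpose of the example, that an instance is represented as a dictionary mapping each property name to the set of its values; the function name satisfies_link_key and the sample data are hypothetical.

```python
from typing import Dict, List, Set, Tuple

# Assumed encoding of an RDF instance: property name -> set of its values.
Instance = Dict[str, Set[str]]


def satisfies_link_key(key: List[Tuple[str, str]], a: Instance, b: Instance) -> bool:
    """True if instance a (of class c) and instance b (of class d) have the
    same values for every property pair <p, q> of the link key."""
    for p, q in key:
        values_a = a.get(p, set())
        values_b = b.get(q, set())
        # Both sides must have at least one value, and the value sets must coincide.
        if not values_a or values_a != values_b:
            return False
    return True


key = [("auteur", "creator"), ("titre", "title")]
livre = {"auteur": {"J. D. Salinger"}, "titre": {"The Catcher in the Rye"}}
novel = {"creator": {"J. D. Salinger"}, "title": {"The Catcher in the Rye"}, "year": {"1951"}}
print(satisfies_link_key(key, livre, novel))  # True
```

Requiring non-empty value sets is a design choice made here so that two instances that both lack the properties are not linked; properties outside the key, such as year, are ignored.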

Formal concept analysis (FCA) is a technique for extracting concepts from a binary relation between a set of objects and a set of attributes: each concept pairs a maximal set of objects with the maximal set of attributes they share, and the concepts are ordered in a lattice [Ganter 1999a]. It has been used for inferring database keys by providing the dependencies between maximal sets of attributes and the partitions of the data that they generate. We have provided the generalisation needed for database link keys [Atencia 2014d]. For RDF link keys, several issues arise:
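The FCA approach can be sketched on a toy example. The construction below is a simplified assumption, not necessarily the exact one of [Atencia 2014d]: the objects of the formal context are pairs of instances (one of class c, one of class d), its attributes are property pairs ⟨p, q⟩, and a pair of instances has attribute ⟨p, q⟩ when both instances have the same non-empty set of values for p and q. The intents of the concepts covering at least one pair, obtained by intersecting object intents, are then read as candidate link keys. All function names and data are hypothetical.

```python
from itertools import product
from typing import Dict, FrozenSet, Set, Tuple

Instance = Dict[str, Set[str]]  # property name -> set of values (assumed encoding)
PropPair = Tuple[str, str]      # an attribute <p, q> of the formal context
Context = Dict[Tuple[str, str], FrozenSet[PropPair]]


def formal_context(c_instances: Dict[str, Instance], d_instances: Dict[str, Instance]) -> Context:
    """Map each pair (a, b) to the property pairs <p, q> on which a and b agree."""
    context: Context = {}
    for (name_a, a), (name_b, b) in product(c_instances.items(), d_instances.items()):
        agree = frozenset((p, q) for p in a for q in b if a[p] and a[p] == b[q])
        if agree:  # pairs agreeing on nothing cannot support any candidate
            context[(name_a, name_b)] = agree
    return context


def candidate_link_keys(context: Context) -> Set[FrozenSet[PropPair]]:
    """Intents supported by at least one pair: all intersections of object intents."""
    intents: Set[FrozenSet[PropPair]] = set()
    for row in context.values():
        # A new row contributes itself and its intersection with every known intent.
        intents |= {row} | {row & known for known in intents}
    return {i for i in intents if i}


books = {
    "l1": {"auteur": {"Salinger"}, "titre": {"Franny and Zooey"}},
    "l2": {"auteur": {"Salinger"}, "titre": {"Nine Stories"}},
}
novels = {
    "n1": {"creator": {"Salinger"}, "title": {"Franny and Zooey"}},
    "n2": {"creator": {"Salinger"}, "title": {"Nine Stories"}},
}
ctx = formal_context(books, novels)
print(sorted(sorted(k) for k in candidate_link_keys(ctx)))
# [[('auteur', 'creator')], [('auteur', 'creator'), ('titre', 'title')]]
```

On this data, the candidate {⟨auteur, creator⟩} links every book to every novel by the same author, whereas {⟨auteur, creator⟩, ⟨titre, title⟩} only links matching titles, which is why candidates still have to be evaluated. The naive intersection closure can produce exponentially many intents in the worst case; dedicated algorithms such as Ganter's NextClosure enumerate concepts more systematically.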

Values do not have to be syntactically equal but may be found equal with respect to some theory: this may be a simple set of equality statements ("J. D. Salinger" = "Salinger, Jerome David"), a similarity measure (σ("J. D. Salinger", "Salinger, Jerome David") ≥ δ), or it may depend on RDF Schemas or OWL ontologies;
Classes may depend on each other (for instance, the class Book depends on the class Person, whose instances are the values of the attribute author) or even on themselves (the class Person depends on itself through the attributes father and mother); this may require using Relational Concept Analysis [Hacene 2013a];
RDF attributes are not functional, i.e. an instance may have several values for the same attribute, and hence yield more general types of keys (for instance, if two Persons share at least one of their email addresses, they may be considered the same person) [Atencia 2014b, c].
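The first and third issues can be illustrated with a small Python sketch. It assumes, for illustration only, that the theory is a hypothetical dictionary of equality statements mapping variants to a canonical value, and that the similarity measure σ is the ratio computed by the standard library's difflib.SequenceMatcher with threshold δ = 0.8; neither choice comes from the proposal. The hypothetical functions agree_eq and agree_in contrast requiring the two value sets to match entirely with requiring at least one shared value, as in the email example.

```python
from difflib import SequenceMatcher
from typing import Callable, Dict, Set

Match = Callable[[str, str], bool]

# Hypothetical theory given as equality statements: variant -> canonical value.
EQUALITIES: Dict[str, str] = {"Salinger, Jerome David": "J. D. Salinger"}


def equal_under_theory(x: str, y: str) -> bool:
    """Values are equal if they map to the same canonical value."""
    return EQUALITIES.get(x, x) == EQUALITIES.get(y, y)


def similar(x: str, y: str, delta: float = 0.8) -> bool:
    """sigma(x, y) >= delta, with sigma assumed to be difflib's ratio in [0, 1]."""
    return SequenceMatcher(None, x.lower(), y.lower()).ratio() >= delta


def agree_eq(xs: Set[str], ys: Set[str], match: Match) -> bool:
    """Every value on each side is matched by some value on the other side."""
    if not xs or not ys:
        return False
    return all(any(match(x, y) for y in ys) for x in xs) and all(
        any(match(x, y) for x in xs) for y in ys
    )


def agree_in(xs: Set[str], ys: Set[str], match: Match) -> bool:
    """At least one matching value is shared (suited to non-functional attributes)."""
    return any(match(x, y) for x in xs for y in ys)


def same(x: str, y: str) -> bool:
    """Plain syntactic equality."""
    return x == y


emails_a = {"jd@example.org", "salinger@example.com"}
emails_b = {"salinger@example.com"}
print(agree_in(emails_a, emails_b, same))  # True: one shared email
print(agree_eq(emails_a, emails_b, same))  # False: jd@example.org is unmatched
print(agree_eq({"Salinger, Jerome David"}, {"J. D. Salinger"}, equal_under_theory))  # True
print(similar("J. D. Salinger", "J.D. Salinger"))  # True
```

Passing the comparison as a parameter keeps the choice of theory (syntactic equality, equality statements, similarity) independent of the choice between the two readings of non-functional attributes.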

The goal of the project is to study possible extensions of the proposed FCA-based link key extraction techniques to RDF link keys, by dealing with some of the issues above.

Expected results

Defining extensions of FCA techniques for RDF link key extraction;
Implementing them in order to experiment with link key extraction.

References

[Atencia 2014b] Manuel Atencia, Jérôme David, Jérôme Euzenat, Data interlinking through robust link key extraction, Proc. 21st ECAI, Prague (CZ), pp. 15-20, 2014
[Atencia 2014c] Manuel Atencia, Michel Chein, Madalina Croitoru, Jérôme David, Michel Leclère, Nathalie Pernelle, Fatiha Saïs, François Scharffe, Danai Symeonidou, Defining key semantics for the RDF datasets: experiments and evaluations, Proc. 21st ICCS, Iasi (RO), pp. 65-78, 2014
[Atencia 2014d] Manuel Atencia, Jérôme David, Jérôme Euzenat, What can FCA do for database link key extraction?, Proc. ECAI workshop on "what can FCA do for AI?", Prague (CZ), 2014
[Ganter 1999a] Bernhard Ganter, Rudolf Wille, Formal concept analysis: mathematical foundations, Springer, Berlin (DE), 1999
[Heath 2011a] Tom Heath and Christian Bizer, Linked Data: Evolving the Web into a Global Data Space, Morgan & Claypool, 2011
[Hacene 2013a] Mohamed Rouane Hacene, Marianne Huchard, Amedeo Napoli, Petko Valtchev, Relational concept analysis: mining concept lattices from multi-relational data, Annals of Mathematics and Artificial Intelligence, 2013

http://exmo.inria.fr/training/M2R-2016-fcakey.html

$Id: M2R-2016-fcakey.html,v 1.6 2021/12/17 16:02:27 euzenat Exp $