Bibliography on ANR-Datalift (2017-06-06)
Maria Roşoiu, Jérôme David, Jérôme Euzenat, A linked data framework for Android, in: Elena Simperl, Barry Norton, Dunja Mladenic, Emanuele Della Valle, Irini Fundulaki, Alexandre Passant, Raphaël Troncy (eds), The Semantic Web: ESWC 2012 Satellite Events, Springer Verlag, Heidelberg (DE), 2015, pp204-218
Mobile devices are becoming major repositories of personal information. Still, they do not provide a uniform manner to deal with data from both inside and outside the device. Linked data provides a uniform interface to access structured interconnected data over the web. Hence, exposing mobile phone information as linked data would improve the usability of such information. We present an API that provides data access in RDF, both within mobile devices and from the outside world. This API is based on the Android content provider API which is designed to share data across Android applications. Moreover, it introduces a transparent URI dereferencing scheme, exposing content outside of the device. As a consequence, any application may access data as linked data without any a priori knowledge of the data source.
Manuel Atencia, Jérôme David, Jérôme Euzenat, Data interlinking through robust linkkey extraction, in: Torsten Schaub, Gerhard Friedrich, Barry O'Sullivan (eds), Proc. 21st european conference on artificial intelligence (ECAI), Praha (CZ), pp15-20, 2014
Links are important for the publication of RDF data on the web. Yet, establishing links between data sets is not an easy task. We develop an approach for that purpose which extracts weak linkkeys. Linkkeys extend the notion of a key to the case of different data sets. They are made of a set of pairs of properties belonging to two different classes. A weak linkkey holds between two classes if any resources having common values for all of these properties are the same resources. An algorithm is proposed to generate a small set of candidate linkkeys. Depending on whether some links, valid or invalid, are known, we define supervised and unsupervised measures for selecting the appropriate linkkeys. The supervised measures approximate precision and recall, while the unsupervised measures are the ratio of pairs of entities a linkkey covers (coverage), and the ratio of entities from the same data set it identifies (discrimination). We have experimented with these techniques on two data sets, showing the accuracy and robustness of both approaches.
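The following Python sketch illustrates how a candidate linkkey could generate links and how the two unsupervised measures could be computed. It is a minimal illustration, not the authors' algorithm: "common values" is read as "at least one shared value per property pair", and the formulas used for coverage and discrimination are one plausible reading of the abstract. All names (links_from_linkkey, coverage, discrimination, ds1, ds2) and the sample data are hypothetical.

```python
from itertools import product


def links_from_linkkey(ds1, ds2, linkkey):
    """Link (a, b) when a and b share at least one value on every property pair.

    ds1, ds2: dict mapping an instance id to {property: list of values}.
    linkkey:  list of (property_in_ds1, property_in_ds2) pairs.
    """
    links = set()
    for (a, props_a), (b, props_b) in product(ds1.items(), ds2.items()):
        if all(set(props_a.get(p, ())) & set(props_b.get(q, ()))
               for p, q in linkkey):
            links.add((a, b))
    return links


def coverage(links, ds1, ds2):
    """Assumed reading: share of all instances that occur in some link."""
    left = {a for a, _ in links}
    right = {b for _, b in links}
    total = len(ds1) + len(ds2)
    return (len(left) + len(right)) / total if total else 0.0


def discrimination(links):
    """Assumed reading: 1.0 when the links are one-to-one, lower otherwise."""
    if not links:
        return 0.0
    left = {a for a, _ in links}
    right = {b for _, b in links}
    return min(len(left), len(right)) / len(links)


# Hypothetical toy data: two data sets describing people.
ds1 = {
    "a1": {"name": ["Ann Lee"], "born": ["1980"]},
    "a2": {"name": ["Bob Ray"], "born": ["1975"]},
    "a3": {"name": ["Cy Dow"], "born": ["1990"]},
}
ds2 = {
    "b1": {"label": ["Ann Lee"], "year": ["1980"]},
    "b2": {"label": ["Bob Ray"], "year": ["1976"]},
}
linkkey = [("name", "label"), ("born", "year")]

links = links_from_linkkey(ds1, ds2, linkkey)
print(links, coverage(links, ds1, ds2), discrimination(links))
```

In this toy case only a1 and b1 agree on both property pairs, so the candidate linkkey is fully discriminating but covers few instances. A selection procedure would weigh the two measures against each other.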
Zhengjie Fan, Concise pattern learning for RDF data sets interlinking, Thèse d'informatique, Université de Grenoble, Grenoble (FR), April 2014
Many data sets are being published on the web with Semantic Web technology. These data sets contain analogous data which represent the same resources in the world. If they are linked together by correctly built links, users can conveniently query data through a uniform interface, as if they were querying a single data set. However, finding correct links is very challenging because there are many instances to compare. Many solutions have been proposed for this problem. (1) One straightforward idea is to compare the attribute values of instances for identifying links, yet it is impossible to compare all possible pairs of attribute values. (2) Another common strategy is to compare instances according to attribute correspondences found by instance-based ontology matching, which generates attribute correspondences based on instances. However, it is hard to identify the same instances across data sets, because for some identical instances the values of corresponding attributes are not equal. (3) Many existing solutions leverage Genetic Programming to construct interlinking patterns for comparing instances, but they suffer from long running times. In this thesis, an interlinking method is proposed to interlink the same instances across different data sets, based on both statistical learning and symbolic learning. The input is two data sets, class correspondences across the two data sets and a set of sample links that are assessed by users as either "positive" or "negative". The method builds a classifier that distinguishes correct links from incorrect links across two RDF data sets using the set of assessed sample links. The classifier is composed of attribute correspondences across corresponding classes of the two data sets, which help compare instances and build links. The classifier is called an interlinking pattern in this thesis.
On the one hand, our method discovers potential attribute correspondences of each class correspondence via a statistical learning method, the K-medoids clustering algorithm, with instance value statistics. On the other hand, our solution builds the interlinking pattern by a symbolic learning method, Version Space, with all discovered potential attribute correspondences and the set of assessed sample links. Our method can carry out interlinking tasks for which no conjunctive interlinking pattern covers all assessed correct links, and it expresses the resulting pattern in a concise format. Experiments confirm that our interlinking method with only 1% of sample links already reaches a high F-measure (around 0.94-0.99). The F-measure quickly converges, improving by nearly 10% over other approaches.
Interlinking, Ontology Matching, Machine Learning
Zhengjie Fan, Jérôme Euzenat, François Scharffe, Learning concise pattern for interlinking with extended version space, in: Dominik Ślęzak, Hung Son Nguyen, Marek Reformat, Eugene Santos (eds), Proc. 13th IEEE/WIC/ACM international conference on web intelligence (WI), Warsaw (PL), pp70-77, 2014
Many data sets on the web contain analogous data which represent the same resources in the world, so it is helpful to interlink different data sets for sharing information. However, finding correct links is very challenging because there are many instances to compare. In this paper, an interlinking method is proposed to interlink instances across different data sets. The input is class correspondences, property correspondences and a set of sample links that are assessed by users as either "positive" or "negative". We apply a machine learning method, Version Space, in order to construct a classifier, called an interlinking pattern, that can distinguish correct links from incorrect links for both data sets. We improve the learning method so that it resolves the no-conjunctive-pattern problem. We call it Extended Version Space. Experiments confirm that our method with only 1% of sample links already reaches a high F-measure (around 0.96-0.99). The F-measure quickly converges, improving by nearly 10% over other comparable approaches.
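To make the no-conjunctive-pattern problem concrete, the Python sketch below computes the most specific conjunction of property correspondences shared by all positive sample links and checks it against the negative ones. This is a plain version-space-style illustration under simplifying assumptions, not the Extended Version Space method of the paper: each sample link is reduced to the set of property correspondences on which its two instances agree, and all names and data are hypothetical.

```python
def most_specific_conjunction(positives):
    """Correspondences on which every positive sample link agrees."""
    if not positives:
        raise ValueError("at least one positive sample link is required")
    return frozenset.intersection(*map(frozenset, positives))


def covers(pattern, example):
    """A conjunctive pattern covers a link when all its conjuncts hold for it."""
    return pattern <= frozenset(example)


def conjunctive_pattern(positives, negatives):
    """Return a conjunctive pattern consistent with the samples, or None.

    None signals the no-conjunctive-pattern case: the shared conjunction is
    empty (it would accept every pair) or it also covers a negative link.
    """
    pattern = most_specific_conjunction(positives)
    if not pattern or any(covers(pattern, n) for n in negatives):
        return None
    return pattern


# Hypothetical assessed sample links, each described by the property
# correspondences on which the two linked instances have matching values.
positives = [
    {"name=label", "born=year", "city=town"},
    {"name=label", "born=year"},
]
negatives = [{"city=town"}, {"name=label"}]

print(conjunctive_pattern(positives, negatives))

# Two positive links that agree on disjoint correspondences: no single
# conjunction covers both, so a purely conjunctive learner fails here.
print(conjunctive_pattern([{"name=label"}, {"born=year"}], negatives))
```

The second call returns None, which is the situation a disjunctive extension of the learner has to handle.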
Manuel Atencia, Jérôme David, François Scharffe, Keys and pseudo-keys detection for web datasets cleansing and interlinking, in: Proc. 18th international conference on knowledge engineering and knowledge management (EKAW), Galway (IE), (Annette ten Teije, Johanna Voelker, Siegfried Handschuh, Heiner Stuckenschmidt, Mathieu d'Aquin, Andriy Nikolov, Nathalie Aussenac-Gilles, Nathalie Hernandez (eds), Knowledge engineering and knowledge management, Lecture notes in computer science 7603, 2012), pp144-153, 2012
This paper introduces a method for analyzing web datasets based on key dependencies. The classical notion of a key in relational databases is adapted to RDF datasets. In order to better deal with web data of variable quality, the definition of a pseudo-key is presented. An RDF vocabulary for representing keys is also provided. An algorithm to discover keys and pseudo-keys is described. Experimental results show that even for a big dataset such as DBpedia, the runtime of the algorithm is still reasonable. Two applications are further discussed: (i) detection of errors in RDF datasets, and (ii) datasets interlinking.
Data Interlinking, Semantic Web, RDF Data Cleaning
Jérôme David, Jérôme Euzenat, Maria Roşoiu, Mobile API for linked data, Deliverable 6.3, Datalift, 19p., 2012
This report presents a mobile API for manipulating linked data under the Android platform.
mobile, API, linked data, content provider
Jérôme Euzenat, A modest proposal for data interlinking evaluation, in: Pavel Shvaiko, Jérôme Euzenat, Anastasios Kementsietsidis, Ming Mao, Natalya Noy, Heiner Stuckenschmidt (eds), Proc. 7th ISWC workshop on ontology matching (OM), Boston (MA US), pp234-235, 2012
Data interlinking is a very important topic nowadays. It is sufficiently similar to ontology matching that comparable evaluation can be undertaken. However, it differs enough that specific evaluations may be designed. We discuss such variations and design.
Data interlinking, Evaluation, Benchmark, Blocking, Instance matching
Zhengjie Fan, Data linking with ontology alignment, in: Proc. 9th European semantic web conference (ESWC), Heraklion (GR), (Elena Simperl, Philipp Cimiano, Axel Polleres, Óscar Corcho, Valentina Presutti (eds), The semantic web: research and applications (Proc. 9th European semantic web conference poster session), Lecture notes in computer science 7295, 2012), pp854-858, 2012
Publishing RDF data on the web is a growing trend, so that users can share information semantically. Linking isolated data sets together is therefore highly needed. I would like to reduce the comparison scale by isolating the types of resources to be compared, which enhances the accuracy of the linking process. I propose a data linking method for linked data on the web. Such a method can interlink linked data automatically by referring to an ontology alignment between linked data sets. Alignments can provide the entities to compare.
François Scharffe, Ghislain Atemezing, Raphaël Troncy, Fabien Gandon, Serena Villata, Bénédicte Bucher, Fayçal Hamdi, Laurent Bihanic, Gabriel Képéklian, Franck Cotton, Jérôme Euzenat, Zhengjie Fan, Pierre-Yves Vandenbussche, Bernard Vatant, Enabling linked data publication with the Datalift platform, in: Proc. AAAI workshop on semantic cities, Toronto (ONT CA), 2012
As many cities around the world provide access to raw public data along the Open Data movement, many questions arise concerning the accessibility of these data. Various data formats, duplicate identifiers, heterogeneous metadata schema descriptions, and diverse means to access or query the data exist. These factors make it difficult for consumers to reuse and integrate data sources to develop innovative applications. The Semantic Web provides a global solution to these problems by providing languages and protocols for describing and accessing datasets. This paper presents Datalift, a framework and a platform helping to lift raw data sources to semantic interlinked data sources.
François Scharffe, Jérôme David, Manuel Atencia, Keys and pseudo-keys detection for web datasets cleansing and interlinking, Deliverable 4.1.2, Datalift, 18p., 2012
This report introduces a novel method for analysing web datasets based on key dependencies. This particular kind of functional dependency, widely studied in the field of database theory, makes it possible to evaluate whether a set of properties constitutes a key for the set of data considered. When this is the case, there won't be any two instances having identical values for these properties. After giving the necessary definitions, we propose an algorithm for detecting minimal keys and pseudo-keys in an RDF dataset. We then use this algorithm to detect keys in datasets published as web data and we apply this approach in two applications: (i) reducing the number of properties to compare in order to discover equivalent instances between two datasets, (ii) detecting errors inside a dataset.
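The key condition stated above can be checked directly. The Python sketch below tests whether a set of properties is a key for a small dataset and, as an assumed reading of "pseudo-key", whether it is a key up to a tolerated share of clashing instances. It is an illustration only, not the detection algorithm of the report: the tolerance-based definition, the treatment of a missing property as an empty value set, and all names and data are assumptions.

```python
from collections import Counter


def clash_ratio(instances, properties):
    """Share of instances whose values on `properties` equal another instance's.

    instances: dict mapping an instance id to {property: list of values}.
    A missing property is treated as an empty value set (an assumption).
    """
    if not instances:
        return 0.0
    signatures = Counter(
        tuple(frozenset(props.get(p, ())) for p in properties)
        for props in instances.values()
    )
    clashing = sum(n for n in signatures.values() if n > 1)
    return clashing / len(instances)


def is_key(instances, properties):
    """Key: no two instances have identical values for these properties."""
    return clash_ratio(instances, properties) == 0.0


def is_pseudo_key(instances, properties, tolerance=0.05):
    """Assumed definition: a key once a small share of exceptions is tolerated."""
    return clash_ratio(instances, properties) <= tolerance


# Hypothetical toy dataset; p3 and p4 are duplicates of each other.
people = {
    "p1": {"name": ["Ann"], "city": ["Lyon"]},
    "p2": {"name": ["Ann"], "city": ["Paris"]},
    "p3": {"name": ["Bob"], "city": ["Lyon"]},
    "p4": {"name": ["Bob"], "city": ["Lyon"]},
}

print(is_key(people, ["name", "city"]))
print(clash_ratio(people, ["name", "city"]))
```

The clashing instances reported by such a check are the candidates for the error detection application mentioned above, since they may be duplicates or carry wrong values.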
data linking, instance matching, record linkage, co-reference resolution, ontology alignment, ontology matching
Jérôme Euzenat, Nathalie Abadie, Bénédicte Bucher, Zhengjie Fan, Houda Khrouf, Michael Luger, François Scharffe, Raphaël Troncy, Dataset interlinking module, Deliverable 4.2, Datalift, 32p., 2011
This report presents the first version of the interlinking module for the Datalift platform as well as strategies for future developments.
data interlinking, linked data, instance matching
François Scharffe, Jérôme Euzenat, MeLinDa: an interlinking framework for the web of data, Research report 7641, INRIA, Grenoble (FR), 21p., July 2011
The web of data consists of data published on the web in such a way that they can be interpreted and connected together. It is thus critical to establish links between these data, both for the web of data and for the semantic web that it contributes to feed. We consider here the various techniques developed for that purpose and analyze their commonalities and differences. We propose a general framework and show how the diverse techniques fit in the framework. From this framework we consider the relation between data interlinking and ontology matching. Although they can be considered similar at a certain level (they both relate formal entities), they serve different purposes, but would find a mutual benefit in collaborating. We thus present a scheme under which it is possible for data linking tools to take advantage of ontology alignments.
Semantic web, Data interlinking, Instance matching, Ontology alignment, Web of data
François Scharffe, Jérôme Euzenat, Linked data meets ontology matching: enhancing data linking through ontology alignments, in: Proc. 3rd international conference on Knowledge engineering and ontology development (KEOD), Paris (FR), pp279-284, 2011
The Web of data consists of publishing data on the Web in such a way that they can be connected together and interpreted. It is thus critical to establish links between these data, both for the Web of data and for the Semantic Web that it contributes to feed. We consider here the various techniques which have been developed for that purpose and analyze their commonalities and differences. This provides a general framework that the diverse data linking systems instantiate. From this framework we consider the relation between data linking and ontology matching activities. Although they can be considered similar at a certain level (they both relate formal entities), they serve different purposes: one acts at the schema level and the other at the instance level. However, they would find a mutual benefit in collaborating. We thus present a scheme under which it is possible for data linking tools to take advantage of ontology alignments. We present the features of expressive alignment languages that allow linking specifications to reuse ontology alignments in a natural way.
Semantic web, Linked data, Data linking, Ontology alignment, Ontology matching, Entity reconciliation, Object consolidation
François Scharffe, Zhengjie Fan, Alfio Ferrara, Houda Khrouf, Andriy Nikolov, Methods for automated dataset interlinking, Deliverable 4.1, Datalift, 34p., 2011
Interlinking data is a crucial step in the Datalift platform framework. It ensures that the published datasets are connected with others on the Web. Many techniques have been developed on this topic in order to automate the task of finding similar entities in two datasets. In this deliverable, we first clarify terminology in the field of linking data. Then we classify and survey many techniques used to automate data linking on the web. We finally review 11 state-of-the-art tools and classify them according to which technique they use.