unicaen greyc

Research Datasets & Demos

PURE: Pattern Utilization for Representative Entity type classification

Thirty years of the Web have produced a tremendous amount of content. While the content of the early years consisted predominantly of “simple” HTML documents, more recent content has become increasingly “machine-interpretable”. Named entities, ideally annotated explicitly and intentionally, pave the way toward a semantic exploration and exploitation of the data. While this appears to be the golden path toward a more human-centric Web, it is not necessarily so. The key point is simple: “the more the merrier” does not hold along all dimensions. For instance, each named entity provides, via the Web of Data, a plenitude of information that can overwhelm the end user. In particular, named entities are predominantly annotated with multiple types, with no order of importance among them. In order to convey the most concise type information, we introduce an approach called PURE: Pattern Utilization for Representative Entity type classification. PURE aims to exploit solely structural patterns derived from knowledge graphs in order to “purify” the most representative type(s) associated with a named entity. Our experiments with named entities in Wikipedia demonstrate the viability of our approach and its improvement over competing strategies.

Downloads and Datasets