unicaen greyc


Semantic Fingerprinting: A Novel Method for Entity-level Content Classification

With the constantly growing Web, there is a need for automatically analyzing, interpreting and organizing contents. A particular need is given by the management of Web contents with respect to classification systems, e.g. based on ontologies in the LOD (Linked Open Data) cloud. Research in deep learning recently has shown great progress in classifying data based on large volumes of training data. However, "targeted" and fine-grained information systems require classification methods based on a relatively small number of "representative" samples. For that purpose, we present an approach that allows a semantic exploitation of Web contents and - at the same time - computationally efficient processing based on "semantic fingerprinting". To this end, we raise Web contents to the entity-level and exploit entity-related information that allows "distillation" and fine-grained classification of the Web content by its "semantic fingerprint". In experimental results on Web contents classified in Wikipedia, we show the superiority of our approach against state-of-the-art methods.

Downloads and Datasets