unicaen greyc

Research



Semantic Fingerprinting: A Novel Method for Entity-level Content Classification


As the Web keeps growing, there is a need to automatically analyze, interpret and organize its contents. One particular need is managing Web contents with respect to classification systems, e.g. those based on ontologies in the LOD (Linked Open Data) cloud. Research in deep learning has recently shown great progress in classifying data based on large volumes of training data. However, "targeted" and fine-grained information systems require classification methods that work with a relatively small number of "representative" samples. For that purpose, we present an approach based on "semantic fingerprinting" that allows a semantic exploitation of Web contents and, at the same time, computationally efficient processing. To this end, we raise Web contents to the entity-level and exploit entity-related information that allows "distillation" and fine-grained classification of the Web content by its "semantic fingerprint". In experiments on Web contents classified in Wikipedia, our approach outperforms state-of-the-art methods.
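To make the idea of classifying from a few "representative" samples more concrete, here is a minimal sketch. It is not the method from the paper: it assumes that entity linking has already mapped each document to a list of entity types, it uses a normalized type-frequency vector as a stand-in for the "semantic fingerprint", and it classifies by cosine similarity to per-class centroids. All function names, type labels and sample data are hypothetical.

```python
from collections import Counter
from math import sqrt


def fingerprint(entity_types):
    """Normalized frequency of entity types (an assumed stand-in for a 'semantic fingerprint')."""
    counts = Counter(entity_types)
    total = sum(counts.values())
    return {t: c / total for t, c in counts.items()} if total else {}


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(v * b.get(k, 0.0) for k, v in a.items())
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def centroid(fingerprints):
    """Average of several fingerprints, one per representative sample."""
    keys = set().union(*fingerprints)
    n = len(fingerprints)
    return {k: sum(fp.get(k, 0.0) for fp in fingerprints) / n for k in keys}


def classify(doc_types, class_samples):
    """Return the class whose centroid fingerprint is closest to the document's."""
    doc_fp = fingerprint(doc_types)
    centroids = {
        label: centroid([fingerprint(sample) for sample in samples])
        for label, samples in class_samples.items()
    }
    return max(centroids, key=lambda label: cosine(doc_fp, centroids[label]))


# Hypothetical training data: two representative samples per class.
SAMPLES = {
    "sports": [
        ["Athlete", "SportsTeam", "Stadium"],
        ["Athlete", "Athlete", "SportsEvent"],
    ],
    "politics": [
        ["Politician", "PoliticalParty", "Country"],
        ["Politician", "Election", "Country"],
    ],
}

label = classify(["Athlete", "Stadium", "Country"], SAMPLES)
```

Because each fingerprint is a small sparse vector over entity types rather than over raw words, both building and comparing fingerprints stay cheap, which is the kind of efficiency the abstract alludes to.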

Downloads and Datasets

Publications




ELEVATE: A Framework for Entity-level Event Diffusion Prediction into Foreign Language Communities


Access to news via the Web or other “traditional” media allows a rapid diffusion of information into almost every part of the world. News reports cover the full spectrum of events, ranging from locally relevant ones to those that gain global attention. The societal impact of an event can be “measured” relatively easily by the attention it attracts in the news or social media (e.g. the number of responses it receives and provokes). However, this does not necessarily reflect its inter-cultural impact and its diffusion into other communities. To address the problem of predicting the spread of information into foreign language communities, we introduce the ELEVATE framework. ELEVATE exploits entity information from Web contents and harnesses location-related data for language-related event diffusion prediction. Our experiments on event spreading across Wikipedia communities of different languages demonstrate the viability of our approach and its improvement over state-of-the-art approaches.
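As a rough illustration of how location-related entity data could feed a language-level prediction, consider the toy heuristic below. It is not the ELEVATE framework itself: it assumes an event is already represented by its linked entities, it uses a hypothetical hand-written lookup from location entities to associated languages, and it predicts diffusion into every language whose share of the location "votes" reaches an arbitrary threshold.

```python
from collections import Counter

# Hypothetical lookup: location entity -> language codes associated with it.
LOCATION_LANGUAGES = {
    "Paris": ["fr"],
    "Berlin": ["de"],
    "Geneva": ["fr", "de", "it"],
    "Madrid": ["es"],
}


def language_scores(event_entities, location_languages=LOCATION_LANGUAGES):
    """Share of location-derived votes per language for one event."""
    votes = Counter()
    for entity in event_entities:
        # Entities without a known location contribute no votes.
        for lang in location_languages.get(entity, []):
            votes[lang] += 1
    total = sum(votes.values())
    return {lang: n / total for lang, n in votes.items()} if total else {}


def predict_diffusion(event_entities, threshold=0.3):
    """Languages whose vote share meets the (assumed) threshold, sorted by code."""
    scores = language_scores(event_entities)
    return sorted(lang for lang, share in scores.items() if share >= threshold)


predicted = predict_diffusion(["Paris", "Geneva", "UN"])
```

A real system would presumably learn such associations from data rather than from a fixed table; the sketch only shows the shape of the input (entities) and the output (target language communities).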

Downloads and Datasets

Publications