unicaen greyc

Research Datasets & Demos



SUIT: Semantic User Interest Tracing


User interest tracing is a common practice in many Web use cases including, but not limited to, search, recommendation, and intelligent assistants. The overall aim is to provide the user with a personalized “Web experience” by aggregating and exploiting a plenitude of user data derived from collected logs, accessed contents, and/or mined community context. Even fairly basic features such as terms and graph structures can be used to model a user’s interest. While the aforementioned application scenarios clearly have positive aspects, they also put the user’s privacy at high risk. To examine these inherent privacy risks, this paper studies Semantic User Interest Tracing (SUIT for short) by investigating a user’s publishing/editing behavior of Web contents. In contrast to existing approaches, SUIT solely exploits the (semantic) concepts [categories] inherent in documents, derived via entity-level analytics. By doing so, we raise Web contents to the entity level and are thus able to abstract the user interest from plain text strings to “things”. In particular, we utilize the inherited structural relationships among the concepts, derived from a knowledge graph, in order to identify the user associated with a specific Web content. Our extensive experiments on Wikipedia show that our approach outperforms state-of-the-art approaches in tracing and predicting user behavior in a single language. In addition, we demonstrate the viability of our semantic (language-agnostic) approach in multilingual experiments. As such, SUIT is capable of revealing the user’s identity, which demonstrates the fine line between personalization and surveillance and, at the same time, raises questions regarding ethical considerations.
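The general idea of tracing a user through concept categories rather than through text can be illustrated with a small Python sketch. It is not the method from the paper: the category hierarchy `PARENTS`, the `decay` weighting, the cosine-similarity matching and all function names are assumptions chosen for illustration only. Each document is represented by its categories plus their ancestors in a toy hierarchy, and a new document is attributed to the user whose aggregated category profile is most similar.

```python
from collections import Counter
from math import sqrt

# Hypothetical toy category hierarchy (child -> parents).
# Illustrative only; not taken from the SUIT dataset or from Wikipedia.
PARENTS = {
    "Jazz pianists": ["Jazz musicians"],
    "Jazz musicians": ["Musicians"],
    "Rock guitarists": ["Rock musicians"],
    "Rock musicians": ["Musicians"],
    "Alpine peaks": ["Mountains"],
    "Mountains": ["Landforms"],
}


def expand(categories, decay=0.5):
    """Weight each category 1.0 and each ancestor by decay ** distance."""
    weights = Counter()
    for category in categories:
        frontier, weight, seen = [category], 1.0, set()
        while frontier:
            next_frontier = []
            for node in frontier:
                if node in seen:
                    continue
                seen.add(node)
                weights[node] += weight
                next_frontier.extend(PARENTS.get(node, []))
            frontier = next_frontier
            weight *= decay
    return weights


def cosine(a, b):
    """Cosine similarity between two sparse weight vectors."""
    dot = sum(a[key] * b[key] for key in a.keys() & b.keys())
    norm_a = sqrt(sum(value * value for value in a.values()))
    norm_b = sqrt(sum(value * value for value in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def build_profile(documents):
    """Aggregate the expanded category vectors of a user's documents."""
    profile = Counter()
    for doc_categories in documents:
        profile.update(expand(doc_categories))
    return profile


def trace_user(doc_categories, profiles):
    """Return the user whose profile is most similar to the document."""
    doc_vector = expand(doc_categories)
    return max(profiles, key=lambda user: cosine(doc_vector, profiles[user]))


# Assumed example data: each user is a list of documents,
# each document a list of categories.
profiles = {
    "alice": build_profile([["Jazz pianists"], ["Jazz musicians"]]),
    "bob": build_profile([["Alpine peaks"]]),
}
print(trace_user(["Rock guitarists"], profiles))
```

In this toy setup the document about rock guitarists shares no category with either profile directly; it is matched only through the common ancestor category, which is the kind of structural signal that a purely term-based comparison would miss. Because the comparison runs on categories rather than words, the same sketch would apply unchanged to documents in different languages, provided their categories map to the same hierarchy.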


Downloads and Datasets


SUIT


Publication


A. Kumar and M. Spaniol
There is a fine Line between Personalization and Surveillance: Semantic User Interest Tracing via Entity-level Analytics
Proceedings of the 14th International ACM Web Science Conference (WebSci'22), Barcelona, Spain, June 26-29, 2022, 12 pages (to appear).