Course: Moteurs de Recherche (WS 18)


Code: M.INF2MD
Program: Second year of the Master's program Internet, Données et Connaissance (IDC)
Teaching Staff: Marc Spaniol and Amit Kumar
Short Description: This course presents advanced topics in search engine technology and Web information retrieval.




The project was presented in the first lecture. The following clarifications apply:
1) The Wikipedia version to be used should be the dump of September 1, 2018. The correct(ed) link can be found here.
2) For the indexing task, only (!) the most recent version of a Wikipedia page should be used. Multiple copies (e.g. from previous revisions) should therefore be discarded.
3) The analytics task only (!) concerns "plain" entity pages. To this end, "meta-pages" such as redirects, categories or similar should be filtered out, too.
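
Points 2 and 3 can be sketched as a single filtering pass over the dump. The following is only a minimal illustration, not the required implementation. It assumes the standard MediaWiki XML export layout (`<page>` elements with `<title>`, `<ns>`, an optional `<redirect>` marker and one or more `<revision>` elements) and that "plain" entity pages are those in namespace 0 that are not redirects. The names `iter_entity_pages`, `local` and `SAMPLE` are hypothetical, and the sample data is invented for demonstration. Other meta-pages that live in namespace 0, such as disambiguation pages, would need an additional check that is not shown here.

```python
import io
import xml.etree.ElementTree as ET


def local(tag):
    """Strip the XML namespace prefix, e.g. '{uri}page' -> 'page'."""
    return tag.rsplit("}", 1)[-1]


def iter_entity_pages(source):
    """Yield (title, text) of the most recent revision of each plain article.

    Assumption: namespace "0" holds articles; pages carrying a <redirect>
    element are redirects. Categories, templates etc. have other namespaces.
    """
    for _, elem in ET.iterparse(source, events=("end",)):
        if local(elem.tag) != "page":
            continue
        title, ns, is_redirect = None, None, False
        latest_ts, latest_text = "", ""
        for child in elem:
            name = local(child.tag)
            if name == "title":
                title = child.text
            elif name == "ns":
                ns = child.text
            elif name == "redirect":
                is_redirect = True
            elif name == "revision":
                ts, text = "", ""
                for item in child:
                    if local(item.tag) == "timestamp":
                        ts = item.text or ""
                    elif local(item.tag) == "text":
                        text = item.text or ""
                # ISO 8601 timestamps in the same format sort lexicographically,
                # so a string comparison keeps the newest revision.
                if ts >= latest_ts:
                    latest_ts, latest_text = ts, text
        elem.clear()  # drop the page's children to limit memory use
        if ns == "0" and not is_redirect:
            yield title, latest_text


# Invented miniature dump: one article with two revisions, one redirect,
# and one category page.
SAMPLE = """<mediawiki xmlns="http://www.mediawiki.org/xml/export-0.10/">
  <page><title>Troyes</title><ns>0</ns>
    <revision><timestamp>2018-01-01T00:00:00Z</timestamp><text>old</text></revision>
    <revision><timestamp>2018-08-30T12:00:00Z</timestamp><text>new</text></revision>
  </page>
  <page><title>Troyes (ville)</title><ns>0</ns><redirect title="Troyes"/>
    <revision><timestamp>2018-08-01T00:00:00Z</timestamp><text>#REDIRECT [[Troyes]]</text></revision>
  </page>
  <page><title>Category:Cities</title><ns>14</ns>
    <revision><timestamp>2018-08-01T00:00:00Z</timestamp><text>Cities</text></revision>
  </page>
</mediawiki>"""

pages = list(iter_entity_pages(io.BytesIO(SAMPLE.encode("utf-8"))))
print(pages)
```

For a real dump, the `io.BytesIO` object would be replaced by an opened (and, if needed, decompressed) dump file. Streaming with `iterparse` avoids loading the whole file at once, although a production version would also need to release already processed pages from the root element.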

