Priberam

Learning Temporal-Dependent Ranking Models

Web archives together already hold more than 534 billion files, and this number continues to grow as new initiatives arise. Searching across all versions of these files acquired over time is challenging, since users expect answers from web archives to be as fast and precise as those provided by current web search engines.

This talk discusses how to improve the search effectiveness of web archives through the creation of novel ranking features and ranking models that exploit the temporal dimension of archived data. It proposes a temporal-dependent ranking framework that exploits the variance of web characteristics over time. Based on the assumption that closer periods are more likely to hold similar web characteristics, this framework learns multiple models simultaneously, each tuned for a specific period. Experimental results show significant improvements in search effectiveness over single models trained on all data regardless of its time, using state-of-the-art learning-to-rank technology. The talk will also address ensemble approaches for ranking models.
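As a rough illustration of the idea of learning one model per period while letting nearby periods share data, the following minimal sketch trains each period's model on all samples, weighted by temporal distance. It is not the framework presented in the talk: the exponential decay in `temporal_weight`, the `decay` parameter, the helper names, and the toy data are all assumptions made for this example, and a pointwise weighted least-squares scorer stands in for a real learning-to-rank algorithm.

```python
import math

def temporal_weight(doc_period, model_period, decay=1.0):
    """Assumed weighting: in (0, 1], shrinking as the two periods move apart."""
    return math.exp(-decay * abs(doc_period - model_period))

def fit_weighted_linear(samples, weights, epochs=500, lr=0.05):
    """Pointwise stand-in for a ranking model: weighted least squares
    fitted by plain gradient descent. samples: list of (features, relevance)."""
    dim = len(samples[0][0])
    w = [0.0] * dim
    total = sum(weights)
    for _ in range(epochs):
        grad = [0.0] * dim
        for (x, y), sw in zip(samples, weights):
            err = sum(wi * xi for wi, xi in zip(w, x)) - y
            for j in range(dim):
                grad[j] += sw * err * x[j]
        w = [wi - lr * g / total for wi, g in zip(w, grad)]
    return w

def train_temporal_models(data, periods, decay=1.0):
    """data: list of (period, features, relevance).
    Every model sees all samples, but samples from closer periods count more."""
    samples = [(x, y) for _, x, y in data]
    models = {}
    for p in periods:
        weights = [temporal_weight(dp, p, decay) for dp, _, _ in data]
        models[p] = fit_weighted_linear(samples, weights)
    return models

def score(models, period, features):
    """Score a document with the model tuned for the given period."""
    return sum(wi * xi for wi, xi in zip(models[period], features))

# Hypothetical toy data: feature 0 signals relevance in period 0,
# feature 1 signals relevance in period 2.
data = [
    (0, [1.0, 0.0], 1.0),
    (0, [0.0, 1.0], 0.0),
    (2, [1.0, 0.0], 0.0),
    (2, [0.0, 1.0], 1.0),
]
models = train_temporal_models(data, periods=[0, 2], decay=1.0)
```

With this toy data, the model for period 0 ranks a document with feature 0 above one with feature 1, and the model for period 2 does the opposite. A single model trained on all data with equal weights would see the two features as equally informative and could not tell them apart.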

Miguel Costa

Miguel Costa is a computer science researcher at LASIGE (Large-Scale Informatics Systems Laboratory). He is also a PhD student (awaiting his PhD defense) at the Faculty of Sciences of the University of Lisbon, the same Faculty from which he received his BSc and MSc degrees in 2001 and 2004, respectively. His findings have been applied in the development of the Portuguese Web Archive search system (http://arquivo.pt), for which he was responsible at the Foundation for National Scientific Computing (FCCN). His research interests are information retrieval, machine learning and web archiving.

LASIGE, FCUL