Workshop: The Future of Web Search
Barcelona - May 19-20, 2006

Keynote Speakers


  • Gerhard Weikum
    Max-Planck Institute for Informatics, Germany
    "Efficient Top-k Queries for XML Information Retrieval"

    Abstract

    Non-schematic XML data that comes from many different sources and inevitably exhibits heterogeneous structures and annotations (i.e., XML tags) cannot be adequately searched using database query languages like XPath or XQuery. Often, queries return either too many or too few results. Rather, the ranked-retrieval paradigm is called for, with relaxable search conditions, various forms of similarity predicates on tags and contents, and quantitative relevance scoring.

    The talk discusses recent advances and open research issues for ranked retrieval of XML data, and illustrates them with the TopX search engine, a prototype system developed at the Max-Planck Institute for Informatics. TopX supports a probabilistic-IR scoring model for full-text content conditions and tag-term combinations, path conditions for all XPath axes as exact or relaxable constraints, and ontology-based relaxation of terms and tag names as similarity conditions for ranked retrieval. To speed up top-k queries, TopX employs several techniques: probabilistic models as efficient score predictors for a variant of the threshold algorithm, judicious scheduling of sequential accesses for scanning index lists and random accesses to compute full scores, incremental merging of index lists for on-demand, self-tuning query expansion, and a suite of specifically designed, precomputed indexes to evaluate structural path conditions.
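
The abstract above refers to a variant of the threshold algorithm. As a rough illustration only, the following is a minimal sketch of the classic textbook threshold algorithm for top-k queries, not TopX's variant: it omits the probabilistic score predictors, access scheduling, and query expansion the abstract mentions. The function name `threshold_topk`, the in-memory list layout, and the use of summation as the aggregation function are assumptions made for this example.

```python
import heapq
from typing import List, Tuple


def threshold_topk(
    index_lists: List[List[Tuple[str, float]]], k: int
) -> List[Tuple[str, float]]:
    """Return the k documents with the highest summed score.

    Assumptions for this sketch: each index list holds (doc_id, score)
    pairs sorted by descending score, scores are non-negative, and a
    document missing from a list scores 0 there.
    """
    if k <= 0:
        return []

    # Dictionaries stand in for random accesses to the index lists.
    lookups = [dict(lst) for lst in index_lists]
    seen = set()
    top: List[Tuple[float, str]] = []  # min-heap of (full_score, doc_id)
    max_depth = max((len(lst) for lst in index_lists), default=0)

    for depth in range(max_depth):
        # Upper bound on the full score of any document not yet seen.
        threshold = 0.0
        for lst in index_lists:
            if depth >= len(lst):
                continue  # exhausted list: unseen documents score 0 here
            doc, score = lst[depth]  # sequential (sorted) access
            threshold += score
            if doc in seen:
                continue
            seen.add(doc)
            # Random accesses to compute the document's full score.
            full = sum(lookup.get(doc, 0.0) for lookup in lookups)
            if len(top) < k:
                heapq.heappush(top, (full, doc))
            elif full > top[0][0]:
                heapq.heapreplace(top, (full, doc))
        # Stop once the k-th best score can no longer be beaten.
        if len(top) == k and top[0][0] >= threshold:
            break

    return sorted(((doc, s) for s, doc in top), key=lambda p: (-p[1], p[0]))
```

Calling `threshold_topk(lists, 2)` on two or more score-sorted lists returns the two best documents with their summed scores, typically without scanning every list to the end. Early termination is the point of the technique: the scan stops as soon as no unseen document can displace the current top k.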

  • Andrei Broder
    Yahoo! Research, USA
    "From query based Information Retrieval to context driven Information Supply"

    Abstract

    In the past decade, Web search engines have evolved from a first generation based on classic Information Retrieval scaled up to web size and supporting only informational queries, to a second generation supporting navigational queries using web-specific information (primarily link analysis), and then to a third generation enabling transactional and other "semantic" queries based on a variety of technologies aimed at directly satisfying the unexpressed "user intent." What is coming next? In this talk, we argue for the trend towards context-driven Information Supply; that is, the goal of Web IR will widen to include the supply of relevant information without requiring the user to make an explicit query. The information supply concept long predates information retrieval (consider newspapers, or even the "Acta Diurna" of ancient Rome). What is new in the web framework is the ability to supply relevant information specific to a given activity and a given user while the activity is being performed. A prime example is the matching of ads to content being read; however, the information supply paradigm is starting to appear in other contexts, such as social networks, e-commerce, and browsers.