- Ziv Bar-Yossef
Google Haifa, Israel
External Mining of Search Query Logs.
Abstract: Search query logs are valuable data sources that are kept confidential
by search engines, in order to protect their users' privacy. Some search
engines disclose aggregate statistics about queries via services such as
Google Trends and Google Trends for Websites. The information provided
by these services, however, is obfuscated, non-repeatable, and partial.
For example, statistics for medium to low volume queries and sites are
not readily available.
In this talk I will describe algorithms for "external" mining of search
query logs. Our algorithms can be used to estimate the popularity of queries
in the log and the number of impressions web sites receive from search
results. The algorithms use only public search engine services, such as the
web search service and the query suggestion service, and thus do not require
privileged access to confidential search engine data sources. In addition,
the algorithms use modest resources, and hence can be used by anyone to
gather statistics about any query and/or any web site.
Our algorithms rely on tools from information retrieval (keyword extraction),
statistics (importance sampling), and databases (tree volume estimation).
The talk will be self-contained. Based on joint work with Maxim Gurevich.
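The abstract does not spell out the algorithms, so the following is only a
generic illustration of how importance sampling can estimate the volume of a
tree, not the speaker's method. It uses Knuth's classic random-path estimator
on a toy prefix tree. The names TOY_QUERIES and toy_children are hypothetical
stand-ins for a query suggestion service, which is assumed here to return the
one-character extensions of a prefix.

```python
import random

# Hypothetical stand-in for a query log: the nodes of a small prefix tree.
TOY_QUERIES = ["a", "ab", "abc", "abd", "b", "ba"]


def toy_children(prefix):
    """Assumed suggestion call: one-character extensions of `prefix`."""
    return [q for q in TOY_QUERIES
            if len(q) == len(prefix) + 1 and q.startswith(prefix)]


def estimate_tree_size(root, children, rng=random):
    """Draw one sample of Knuth's random-path estimate of the node count."""
    estimate = 0
    weight = 1  # inverse probability of having reached the current node
    node = root
    while True:
        estimate += weight          # count this node, reweighted
        kids = children(node)
        if not kids:
            return estimate         # reached a leaf: the path is complete
        weight *= len(kids)         # each child is chosen with prob. 1/len(kids)
        node = rng.choice(kids)


def mean_estimate(root, children, samples, seed=0):
    """Average several independent samples to reduce variance."""
    rng = random.Random(seed)
    total = sum(estimate_tree_size(root, children, rng) for _ in range(samples))
    return total / samples


if __name__ == "__main__":
    # The toy tree has 7 nodes, counting the empty-prefix root.
    print(mean_estimate("", toy_children, samples=2000))
```

Each sample follows a single root-to-leaf path, so the number of calls to the
children function grows with the depth of the tree rather than its size. The
estimate is unbiased, but its variance can be large on unbalanced trees.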
- Giovanni Tummarello
Scalable, Tolerant, Fair... ultimately useful: Web of Data processing
for the benefit of Humans.
Abstract: At the beginning of 2009, hundreds of millions of web locations are
willing to provide structured data for integration and reuse. Despite
this, killer applications showcasing the benefits and fulfilling the
promise of the "Web of Data" have yet to be seen. A closer look at
the data reveals that, in fairness, there are many reasons why
information reuse is a deceptively complex task.
Based on the research in Sindice.com, in this talk I'll present a
series of "recipes" for Scalable, Tolerant and Fair Web Data.
I will touch on aspects such as Web Data collection, lightweight
indexing, and ranking, and finally demonstrate how these technologies can be
leveraged in user-oriented applications.
- Hugo Zaragoza
Yahoo! Research, Spain
Interacting with Semantically Annotated Collections.
Abstract: Semantic annotations of text can be used today in a number of ways: to create richer interfaces
to the information locked in document collections, to help users express their information needs, and to
improve the relevance of the results obtained by the search engine. I will give an overview of our recent
work in these three areas, using example applications on online collections such as Wikipedia, financial
news and Q&As.