ACM Sixteenth Conference on Information and Knowledge Management (CIKM)
CIKM 2007
Keynote Speakers
Prabhakar Raghavan, Head of Yahoo! Research
Web Search: From Information Retrieval to Microeconomic Modeling
- In scarcely a decade, web search has gone from simply scaling traditional information retrieval to a groundswell of new opportunities that are changing marketing as we know it. In this lecture we begin by reviewing this progress, pointing out that web search is no longer a purely computer science problem. We then hint at the role of other disciplines in this ongoing revolution and at a number of directions for research.
Prabhakar Raghavan has been Head of Yahoo! Research since July 2005. His research interests include text and web mining, and algorithm design. He is a Consulting Professor of Computer Science at Stanford University and Editor-in-Chief of the Journal of the ACM. Raghavan received his PhD from Berkeley and is a Fellow of the ACM and of the IEEE. Prior to joining Yahoo, he was Senior Vice-President and Chief Technology Officer at Verity; before that he held a number of technical and managerial positions at IBM Research.
Dan Suciu, University of Washington, USA
Management of Data with Uncertainties
- Many applications today need to manage large volumes of uncertain data, such as fuzzy object matchings, uncertain schema mappings, data extracted by IE systems, exploratory queries in databases, and RFID and sensor data. In this talk I will discuss probabilistic databases as a unified framework for managing large volumes of uncertain data. The central problem in probabilistic databases is the query evaluation problem, which is a particular instance of probabilistic inference. I will describe a general approach to evaluating SQL queries over large probabilistic databases that reuses much of the existing query processing and query optimization infrastructure. Then I will discuss future research directions in management of probabilistic data.
Dan Suciu is a professor at the University of Washington. He received his Ph.D. from the University of Pennsylvania in 1995, then was a principal member of the technical staff at AT&T Labs until he joined the University of Washington in 2000. Suciu is conducting research in data management, with an emphasis on topics that arise from sharing data on the Internet, such as management of semistructured and heterogeneous data, data security, and managing data with uncertainties. He is a co-author of the book Data on the Web: from Relations to Semistructured Data and XML, holds six US patents, received the 2000 ACM SIGMOD Best Paper Award, and is a recipient of the NSF Career Award and of an Alfred P. Sloan Fellowship.
Fernando Pereira, University of Pennsylvania, USA
Learning to Join Everything
- Text, speech, images, video, DNA sequences provide
information about entities that people can recognize when looking at a
particular instance. But those entities and their attributes and
relationships are not directly accessible to queries that join across types
of sources. Information extraction methods based on supervised machine
learning recognize mentions of entities and relationships of predefined
types in different kinds of sources, which can then be used to answer some
useful types of queries. However, supervised learning relies on
hand-annotated training sets that are difficult to create and limit what
types of entities and relationships can be joined for new applications.
These limitations have prompted research into unsupervised extraction
methods that rely on correlations among sources rather than hand-annotated
training sets. While these methods are not yet as accurate as those based on
supervised learning, they have the potential for a new query-by-example
approach to information integration in which seed sets of query answers are
expanded into ranked lists of potential answers by learning occurrence
patterns from the seed answers. I will give examples of both types of
methods from our research on biomedical information extraction, leading to
some ideas on a possible convergence of search and databases through machine
learning.
Fernando Pereira is the Andrew and Debra Rachleff Professor and chair of the department of Computer and Information Science, University of Pennsylvania. He received a Ph.D. in Artificial Intelligence from the University of Edinburgh in 1982. Before joining Penn, he held industrial research and management positions at SRI International, at AT&T Labs, where he led the machine learning and information retrieval research department from September 1995 to April 2000, and at WhizBang Labs, a Web information extraction company. His main research interests are in machine-learnable models of language and other natural sequential data such as biological sequences. He made major contributions to advances in finite-state models for speech and text processing now in everyday industrial use. He has over 100 research publications on computational linguistics, speech recognition, machine learning and logic programming, and several issued and pending patents on speech recognition, language processing, and human-computer interfaces. He was elected Fellow of the American Association for Artificial Intelligence in 1991 for his contributions to computational linguistics and logic programming, and he is a past president of the Association for Computational Linguistics.
Last update: 12/02/2007