ACM Sixteenth Conference on Information and Knowledge Management (CIKM)
CIKM 2007
There will be no tutorials organized by ACM CIKM at the 2007 conference. However, the XLDB Group at the University of Lisbon is organizing a training program that will take place at the University on Monday, November 5, 2007. Venue: Departamento de Informática. Registration cost: 150 euros. The following half-day tutorials are being offered:
XML Retrieval: IR and DB Challenges
Duration: 3 PM to 6:30 PM
Venue: Room 6.3.35
Topics: XML query processing and data management, and integration of text into XML and relational databases.
Keywords: XML Retrieval, Database and Information Retrieval Techniques, Indexing & Query Processing, Semi-structured Data and Data Models, Retrieval Evaluation
Learning Objectives:
Our main goal is to provide a clear view of the major challenges in XML retrieval: its problems, its solutions, and the pitfalls that should be avoided.
Relevance to CIKM 2007 Attendees: In addition to attendees whose main interest is in the topics listed above, other attendees will be interested in the challenges of extending retrieval techniques to search the increasing amounts of XML content accessible on the Web, and in the use of XML as the encoding format for the Semantic Web.
Target audience: The tutorial is aimed at most CIKM attendees. The level of the tutorial can be considered introductory. Attendees should have basic introductory knowledge of standard IR and DB models and methods.
Tutorial Abstract: The world of data has developed from two main points of view: the structured relational data model and the unstructured text model. The two distinct cultures of databases and information retrieval now have a natural meeting place in the Web, with its semi-structured XML model. As web-style searching becomes a ubiquitous tool, the need to integrate these two viewpoints becomes even more important. This tutorial will provide an overview of the different issues and approaches put forward by the IR and DB communities and survey DB-IR integration efforts as they focus on the problem of retrieval from XML content. Both earlier and recent proposals will be discussed. A variety of application scenarios for XML retrieval will be covered, including examples of current tools and techniques. The tutorial will consist of two parts: the first part will cover the problem space (basic concepts, requirements, models) and the second part the solution space (approaches and techniques).
Tutorial History: This is a newly developed tutorial proposal that includes a substantial amount of original content and a new table of contents.
The authors have separately given tutorials on related topics in DB-IR integration and XML retrieval at a number of conferences, including VLDB 2004, COMAD 2005, SIGIR 2005, ASIAN 2005, CIKM 2005, and SIGIR 2006.
Tutorial Materials: Handouts of slides and bibliography.
Presenter Bios:
Sihem Amer-Yahia joined Yahoo! Research in May 2006. Prior to that, she worked for seven years at AT&T Labs in NJ. She received her Ph.D. degree in computer science from the University Paris XI-Orsay and INRIA, France. She has worked on various aspects of XML query processing and has recently been focusing on XML full-text search. She is a co-editor of the XQuery Full-Text Language Specification and Use Cases published by the W3C Full-Text Task Force. Sihem is also the leader of the GalaTex project, a conformance implementation of XQuery Full-Text. Sihem has served on the program committees of multiple conferences, such as SIGMOD, VLDB, ICDE, and WWW. She was program co-chair of WebDB 2004 and co-chaired XSym 2006.
Ricardo Baeza-Yates is director of Yahoo! Research Barcelona and Yahoo! Research Latin America in Santiago, Chile. Until 2005 he was an ICREA Professor at Universitat Pompeu Fabra in Barcelona, and also a professor and director of the Center for Web Research, which he founded in 2002, at the CS department of the University of Chile. His research interests include information retrieval, algorithms, and information visualization. He is co-author of the book Modern Information Retrieval, published in 1999 by Addison-Wesley. He received his Ph.D. in CS from the University of Waterloo, Canada, in 1989.
Mariano Consens's research interests are in the areas of data management systems and the Web, with a current focus on XML searching, autonomic systems, and pervasive computing. He has over 25 publications and two patents, including journal publications selected from best conference papers. Mariano received his PhD and MSc degrees in Computer Science from the University of Toronto.
He also holds a Computer Systems Engineer degree from the Universidad de la Republica, Uruguay. Consens has been a faculty member in Information Engineering at the MIE Department, University of Toronto, since 2003. Before that, he was research faculty at the School of Computer Science, University of Waterloo, from 1994 to 1999. In addition, he has been active in the software industry as a founder and CTO of several startups.
Mounia Lalmas received her PhD in Computer Science from the University of Glasgow in 1996. She is currently a Professor of Information Retrieval in the Department of Computer Science at Queen Mary, University of London, which she joined as a lecturer in 1999. Her research focuses on the development and evaluation of intelligent access to interactive, heterogeneous, and complex information repositories. She is the co-leader of the INEX initiative, which has over 50 participating organizations worldwide.
Tutorial Full Description:
1. Introduction: Motivations. XML and the Web. Historical perspective on the DB and IR communities and semi-structured data.
2. Conceptual Framework: Goals. Data and Query Requirements. Sample Use Cases. DB-IR Integration issues.
3. XML Basics and Standards: XML Model, Schemas and Summaries. XPath and XQuery. Structured text models and query algebras.
4. Querying Content: Content-only queries. Query specification and users’ expectations. Nature and form of results. Ranking Measures. Comparison.
5. Querying Content and Structure: User need specification in a precise query language. XQueryFT. XML IR models. How to obtain document and term statistics. How to model relationships. How to deal with overlaps. How to interpret structural constraints.
6. Preprocessing and Indexing Content and Structure: Data Preparation. Indexing. XML Processing Algorithms: streaming, summaries, and indexes. Query Processing and Optimization. Top-k query processing.
7. Evaluation: Why we need to evaluate. Introduction to IR evaluation. Evaluation in XML IR, in particular INEX. Document collections, Topics, Tasks, Relevance, and Metrics. Lessons learned.
8. Open Problems
Tutorial Selected Bibliography:
The bibliography includes over 100 references describing specific approaches. Overall, the tutorial will be heavily based on the following publications:
Online advertising: the underlying technologies and business models
Abstract:
Internet advertising revenues in the United States totaled $16.9 billion for
2006, up 35 percent from 2005 revenues of $12.5 billion (according to the
Interactive Advertising Bureau). Fueled by these growth rates and the desire
to provide added incentives and opportunities for both advertisers and
publishers, alternative business models for online advertising are being
developed. This tutorial will review the main business models of online
advertising: the pay-per-impression model (CPM, cost per thousand
impressions); the pay-per-click model (CPC); and a relative newcomer, the
pay-per-action model (CPA), where an action could be a product purchase, a
site visit, a customer lead, or an email signup. The tutorial will also
discuss in detail the technology being leveraged to automatically target ads
within these business models, which largely derives from the fields of
machine learning, statistics, information retrieval, and economics.
Challenges and open issues will also be discussed.
Keywords: Online advertising, machine learning, information retrieval
Topics that will receive significant treatment include the following:
Presenter Bio:
Jimi has spent the
last 20 years developing and researching cutting-edge information management
systems that harness information retrieval, linguistics, and machine learning.
Prior to being an independent consultant, Jimi was Chief Scientist (and
member of the executive team) at Turn Inc., where he focused on the development
and deployment of an online ad targeting system (CPA/CPC/CPM-based) in a
principled and measured way that leveraged advanced statistical and machine
learning techniques. These responsibilities included leveraging the entire
reservoir of data assets in order to develop methods for identifying key
optimizations, deploying relevant analytical tools and improving the user
experience. Prior to joining Turn, Jimi was Principal Research Scientist at
Clairvoyance Corporation where he led the Knowledge Discovery from Text
Group. Before that he was a Research Scientist at Xerox Research Center
Europe (XRCE), where, as a member of the Co-ordination Technologies Group,
he developed Document Souls, a patented document-centric approach to
information access. In the early 90s, he worked on the AI Team within the
Mitsubishi Group in Tokyo.
Last update: 11/03/2007