ACM Sixteenth Conference on Information and Knowledge Management (CIKM)

CIKM 2007
Lisboa, Portugal

 
 
 

 

There will be no tutorials organized by ACM CIKM at the 2007 conference. However, the XLDB Group at the University of Lisbon is organizing a training program that will take place at the University on Monday, November 5, 2007.

Venue: Departamento de Informática,
           Faculdade de Ciências da Universidade de Lisboa,
           Campo Grande,
           1749-016 Lisboa - Portugal

Registration cost is 150 euros.

The following half-day tutorials are being offered:


 

XML Retrieval: IR and DB Challenges

Sihem Amer-Yahia, Yahoo! Research New York, USA
Ricardo Baeza-Yates, Yahoo! Research Barcelona and Latinamerica 
Mariano Consens, MIE and CS, University of Toronto, Canada
Mounia Lalmas, Queen Mary, University of London, UK 

Registration Form: pdf or doc

Duration: 3:00 PM to 6:30 PM

Venue: Room 6.3.35

Topics: XML query processing and data management, and integration of text into XML and relational databases.

Keywords: XML Retrieval, Database and Information Retrieval Techniques, Indexing & Query Processing, Semi-structured Data and Data Models, Retrieval Evaluation

Learning Objectives:

  • learn about requirements for combined DB and IR applications that access XML content,

  • learn about models and indexes tailored to XML and their algorithms,

  • understand the specific problems of XML IR and XML query processing, and

  • learn about XML retrieval evaluation, INEX in particular.

Our main goal is to provide a clear view of the major challenges in XML Retrieval: its problems, its solutions and the pitfalls that should be avoided.

Relevance to CIKM 2007 Attendees: In addition to attendees whose main interests are the topics mentioned above, others will be interested in the challenges of extending retrieval techniques to search the increasing amounts of XML content accessible on the Web, and in the use of XML as the encoding format for the semantic web.

Target audience: The tutorial is aimed at most CIKM attendees and is pitched at an introductory level. Attendees should have basic knowledge of standard IR and DB models and methods.

Tutorial Abstract: The world of data has been developed from two main points of view: the structured relational data model and the unstructured text model. The two distinct cultures of databases and information retrieval now have a natural meeting place in the Web with its semi-structured XML model. As web-style searching becomes a ubiquitous tool, the need for integrating these two viewpoints becomes even more important. This tutorial will provide an overview of the different issues and approaches put forward by the IR and DB communities and survey the DB-IR integration efforts as they focus on the problem of retrieval from XML content. Both earlier and recent proposals will be discussed. A variety of application scenarios for XML Retrieval will be covered, including examples of current tools and techniques.

The tutorial will consist of two parts: the first part will cover the problem space (basic concepts, requirements, models) and the second part the solution space (approaches and techniques).

Tutorial History: This is a newly developed tutorial proposal including a substantial amount of original content and a new table of contents. The authors have separately given tutorials on related topics in DB-IR integration and XML retrieval at a number of conferences, including VLDB 2004, COMAD 2005, SIGIR 2005, ASIAN 2005, CIKM 2005, and SIGIR 2006.

Tutorial Materials: Handouts of slides and bibliography.

Presenter Bios:

Sihem Amer-Yahia joined Yahoo! Research in May 2006. Prior to that, she worked for seven years at AT&T Labs in NJ. She received her Ph.D. degree in computer science from the University Paris XI-Orsay and INRIA, France. She has worked on various aspects of XML query processing and has recently been focusing on XML full-text search. She is a coeditor of the XQuery Full-Text Language Specification and Use Cases published by the W3C Full-Text Task Force. Sihem is also the leader of the GalaTex project, a conformance implementation of XQuery Full-Text. Sihem has been on the program committees of multiple conferences such as SIGMOD, VLDB, ICDE and WWW. She was program co-chair of WebDB 2004 and co-chaired XSym 2006.

Ricardo Baeza-Yates is director of Yahoo! Research Barcelona and Yahoo! Research Latinamerica in Santiago, Chile. Until 2005 he was an ICREA Professor at Universitat Pompeu Fabra in Barcelona and also a professor at the CS department of the University of Chile, where he directed the Center for Web Research, which he founded in 2002. His research interests include information retrieval, algorithms, and information visualization. He is co-author of the book Modern Information Retrieval, published in 1999 by Addison-Wesley. He received his Ph.D. in CS from the U. of Waterloo, Canada, in 1989.

Mariano Consens's research interests are in the areas of Data Management Systems and the Web, with a current focus on XML searching, autonomic systems and pervasive computing. He has over 25 publications and two patents, including journal publications selected from best conference papers. Mariano received his PhD and MSc degrees in Computer Science from the University of Toronto. He also holds a Computer Systems Engineer degree from the Universidad de la Republica, Uruguay. Consens has been a faculty member in Information Engineering at the MIE Department, University of Toronto, since 2003. Before that, he was research faculty at the School of Computer Science, University of Waterloo, from 1994 to 1999. In addition, he has been active in the software industry as a founder and CTO of several startups.

Mounia Lalmas received her PhD in Computer Science from the University of Glasgow in 1996. She is currently a Professor of Information Retrieval at the Department of Computer Science at Queen Mary, University of London, which she joined as a lecturer in 1999. Her research focuses on the development and evaluation of intelligent access to interactive heterogeneous and complex information repositories. She is the co-leader of the INEX initiative, with over 50 participating organizations worldwide.

Tutorial Full Description

1. Introduction

Motivations. XML and the Web. Historical perspective on DB and IR communities and semi-structured data.

2. Conceptual Framework

Goals. Data and Query Requirements. Sample Use Cases. DB-IR Integration issues.

3. XML Basics and Standards

XML Model, Schemas and Summaries. XPath and XQuery. Structured text models and query algebras.

4. Querying Content

Content-only queries. Query specification and users’ expectations. Nature and form of results. Ranking Measures. Comparison.

5. Querying Content and Structure

User need specification in a precise query language. XQueryFT. XML IR models. How to obtain document and term statistics. How to model relationships. How to deal with overlaps. How to interpret structural constraints.

6. Preprocessing and Indexing Content and Structure

Data Preparation. Indexing. XML Processing Algorithms: streaming, summaries, and indexes. Query Processing and Optimization. Top-k query processing.

7. Evaluation

Why we need to evaluate. Introduction to IR evaluation. Evaluation in XML IR, in particular INEX. Document collections, Topics, Tasks, Relevance, and Metrics. Lessons learned.

8. Open Problems

Tutorial Selected Bibliography

The bibliography includes over 100 references describing specific approaches. Overall, the tutorial will be heavily based on the following publications:

  • XML, XPath, XQuery, and XQueryFT standards: World Wide Web Consortium (www.w3c.org).

  • Proceedings of the ACM SIGIR Workshops on XML and Information Retrieval (edited by Yoelle Maarek et al.), 2000 & 2002.

  • Proceedings of the workshops of the Initiative for the Evaluation of XML Retrieval (INEX), edited by N. Fuhr, G. Kazai, M. Lalmas et al., 2002-2006.

  • Special JASIST issue on XML and IR, 53(6), 2002. Edited by Ricardo Baeza-Yates, David Carmel, Yoelle Maarek, and Aya Soffer.

  • Proceedings of the International Workshop on XQuery Implementation, Experience and Perspectives (XIME-P) 2004, 2005, 2006.

  • Proceedings of International XML Database Symposium (XSym), 2003-2006.

  • Proceedings of Joint Workshop on XML and DB-IR Integration (edited by Ricardo Baeza-Yates, Yoelle Maarek, Thomas Roelleke, and Arjen P. de Vries), SIGIR 2004, Sheffield, 2004.


Online advertising: the underlying technologies and business models

Dr. James G. Shanahan, Independent Consultant, San Francisco, CA, USA.

Registration: pdf or doc
If interested in attending, please contact the General Chair, Mário J. Silva, mjs AT di.fc.ul.pt (cancelled)

Abstract: Internet advertising revenues in the United States totaled $16.9 billion for 2006, up 35 percent versus 2005 revenues of $12.5 billion (according to the Interactive Advertising Bureau). Fueled by these growth rates and the desire to provide added incentives and opportunities for both advertisers and publishers, alternative business models for online advertising are being developed. This tutorial will review the main business models of online advertising, including the pay-per-impression model (CPM), the pay-per-click model (CPC), and a relative newcomer, the pay-per-action model (CPA), where an action could be a product purchase, a site visit, a customer lead, or an email signup. The tutorial will also discuss in detail the technology being leveraged to automatically target ads within these business models, which largely derives from the fields of machine learning, statistics, information retrieval and economics. Challenges and open issues will also be discussed.

Keywords: Online advertising, machine learning, information retrieval

Topics that will receive significant treatment include the following:

  • History of online advertising

  • Cost Per Impression Model (CPM)

  • Cost Per Click Model (CPC)

  • Cost Per Action Model (CPA)

  • Auction models

  • Taxonomy of ad targeting technologies

      • Statistical machine learning

      • Information retrieval

      • Online learning

      • Design of Experiments (DOE)

  • Click Fraud

  • Estimating click-through rates and action rates

  • Ethical, legal and privacy challenges

Presenter Bio

James (Jimi) Shanahan has spent the last 20 years developing and researching cutting-edge information management systems that harness information retrieval, linguistics and machine learning. Prior to becoming an independent consultant, Jimi was Chief Scientist (and a member of the executive team) at Turn Inc., where he focused on the development and deployment of an online ad targeting system (CPA/CPC/CPM-based) in a principled and measured way that leveraged advanced statistical and machine learning techniques. These responsibilities included leveraging the entire reservoir of data assets in order to develop methods for identifying key optimizations, deploying relevant analytical tools and improving the user experience. Prior to joining Turn, Jimi was Principal Research Scientist at Clairvoyance Corporation, where he led the Knowledge Discovery from Text Group. Before that he was a Research Scientist at Xerox Research Center Europe (XRCE), where, as a member of the Co-ordination Technologies Group, he developed Document Souls, a patented document-centric approach to information access. In the early 1990s, he worked on the AI Team within the Mitsubishi Group in Tokyo.

He has published six books and over 50 research publications in the area of machine learning and information processing. Jimi is General Chair for CIKM 2008. He received his Ph.D. in engineering mathematics from the University of Bristol, U.K., and holds a Bachelor of Science degree in computer science from the University of Limerick, Ireland. He is a Marie Curie fellow and a member of IEEE and ACM.

 

 

 

Last update: 11/03/2007