ICASSP 2007 - April 15-20, 2007 - Honolulu, Hawai'i, U.S.A.

TUT-5: Multimodality in Human-Computer Interfaces

Sunday Afternoon, April 15
14:00 - 17:00
Room 325B

Presented by

Michael Johnston, AT&T Labs Research and Srinivas Bangalore, AT&T Labs Research

Abstract

The ongoing convergence of the web with telephony, through technologies such as Voice over IP, high-speed mobile data networks, and handheld computers and smartphones, enables the creation of natural and highly effective multimodal interfaces for human-human communication and for human-machine interaction with automated services. These interfaces allow user input and system output to be optimally distributed over multiple modes such as speech, pen, and graphical displays. Research on the computational processing and generation of language has primarily focused on linear sequences of speech or text, where the primitive elements are phonemes, morphemes, or words. Multimodal language, by contrast, can be distributed over two or three spatial dimensions as well as the temporal dimension, and can involve additional primitive elements such as gestures, drawings, tables, and charts. This tutorial provides an overview of the problem of multimodal language processing, along with detailed examples showing how representations and techniques from speech, language, and dialog processing can be extended and applied to the parsing, integration, and understanding of multimodal inputs and to the planning, generation, and presentation of multimodal outputs.

  1. Introduction to Multimodal Interfaces
    • Definitions of multimodality
    • Motivation for multimodal user interfaces
    • Example applications
  2. Multimodal Integration and Understanding
    • Representations for multimodal language
    • Unification-based multimodal parsing
    • Finite-state methods
    • Robust multimodal input processing
  3. Multimodal Dialog Management
    • Representation of multimodal dialog context
    • Multimodal clarification strategies
  4. Multimodal Output Generation
    • Multimedia presentation planning
    • Media synchronization
    • Generation of non-verbal behaviors
  5. Evaluating Multimodal Interfaces
    • Multimodal data collection
    • Data elicitation
    • Multimodal data annotation
  6. Challenges and Future Directions
    • Incrementality, adaptivity, and authoring

Background Required

This tutorial is intended for students, researchers, and practitioners in speech, language, and dialog processing who want to see how many of the techniques developed within the community can be applied to the creation of real-world multimodal interactive systems. It is introductory in nature, and no special knowledge or background is required.

Speaker Biographies

Michael Johnston is a Senior Technical Specialist in the IP and Voice-enabled Services research lab of AT&T Labs - Research. His research interests span natural language processing, spoken and multimodal interactive systems, and human-computer interaction. For the last ten years, his work has focused on extending language and dialog processing technologies to support multimodal interaction. In 1999, Dr. Johnston received an NSF CAREER award for research on multimodal language processing for natural interfaces. He is also active in creating standards that support spoken and multimodal interface development, and serves as editor-in-chief of the World Wide Web Consortium (W3C) EMMA: Extensible Multimodal Annotation specification. Dr. Johnston is a member of the IEEE Speech and Language Technical Committee (2006-2008), was an area chair for ACL 2004, and has served as a program committee member and reviewer for numerous international conferences, journals, and workshops.

Srinivas Bangalore is a Senior Technical Specialist in the IP and Voice-enabled Services research lab of AT&T Labs - Research. His research areas include speech and language processing topics related to parsing, machine translation, multimodal integration, and finite-state methods. His dissertation was on a robust parsing approach called Supertagging, which combines the strengths of statistical and linguistic models of language processing. Over the past ten years, the topics he has worked on include tightly coupling speech recognition and language translation using finite-state speech translation approaches, a supertag-based surface realizer for natural language generation, and finite-state multimodal integration and understanding. Dr. Bangalore has served on the editorial board of the journal Computational Linguistics (2001-2003), was the workshop chair for ACL 2004, is a member of the IEEE Speech Technical Committee (2006-2008), and has served as a program committee member for a number of ACL and IEEE conferences and workshops.


©2010 Conference Management Services, Inc. -||- email: webmaster@icassp2007.com -||- Last updated Wednesday, April 04, 2007