Current Projects in Human Language Technology Group


Learning and Optimization for Multimodal Interpretation in Conversation Systems

(PI: J. Chai, NSF Career Award)

Multimodal systems allow users to interact with computers through multiple modalities such as speech, gesture, and gaze. These systems are designed to support transparent, efficient, and natural human-computer interaction. Understanding what the user intends to communicate is one of the most significant challenges for multimodal systems. Despite recent progress in multimodal interpretation, these systems tend to fail when they encounter unexpected inputs (e.g., inputs that are outside of system knowledge) or unreliable inputs (e.g., inputs that are not correctly recognized). Variations in vocabulary and multimodal synchronization patterns, disfluencies in speech utterances, and ambiguities in gestures can seriously impair interpretation performance. This project seeks to improve the robustness of multimodal interpretation by adapting the system's interpretation capability over time through automated knowledge acquisition and by optimizing interpretation through probabilistic reasoning. (Picture: Ph.D. student Zahar Prasov interacts with a system using speech and gesture)

Eye Gaze in Salience Modeling for Robust Spoken Language Processing

(PI: J. Chai, Co-PI: F. Ferreira, Funded by NSF)

Previous psycholinguistic work has shown that eye gaze is tightly linked to human language processing. Almost immediately after a listener hears a word, the eyes move to the corresponding real-world referent. Likewise, right before a speaker utters a word, the eyes move to the mentioned object. Not only is eye gaze highly reliable, it is also an implicit, subconscious reflex of speech. The user does not need to make a conscious decision; the eyes automatically move towards the relevant object without the user even being aware of it. Motivated by these psycholinguistic findings, our hypothesis is that during human-machine conversation, user eye gaze information coupled with conversation context can signal the part of the physical world (related to the domain and the graphic interface) that is most salient at each point of communication. This salience in the physical world will in turn prime what users communicate to the system, and thus can be used to tailor the interpretation of speech input. Based on this hypothesis, this project examines the role of eye gaze in human language production during human-machine conversation and develops algorithms and systems that incorporate gaze-based salience modeling for robust spoken language understanding. (Picture: Smoothed eye gaze fixations on the graphic display, recorded while a user talks to the system)

Discourse Processing in Conversational QA

(PI: J. Chai, Co-PI: R. Jin, Funded by DTO)

Question answering (QA) systems take users' natural language questions and automatically locate answers in large collections of documents. During interactive QA, user questions are not only guided by users' information goals, but are also influenced by system responses. User information needs gradually evolve as the QA session proceeds. It is therefore important to keep track of the interaction context and to use that context to interpret user information needs, retrieve relevant information, and control the interaction. This project aims to conduct a systematic investigation of how to represent interaction context (i.e., discourse), how to build such a representation automatically, and how to use the discourse representation effectively in answer retrieval and dialog management. These systematic studies will help identify the level of discourse representation that best balances the impact and limitations of discourse modeling for conversational QA. (Picture: Ph.D. student Matt Gerber demonstrates an interactive QA system, courtesy of GLITR)


