Current Projects in the Human Language Technology Group


   

Learning and Optimization for Multimodal Interpretation in Conversation Systems

(PI: J. Chai, NSF Career Award)

Multimodal systems allow users to interact with computers through multiple modalities such as speech, gesture, and gaze. These systems are designed to support transparent, efficient, and natural means of human-computer interaction. Understanding what the user intends to communicate is one of the most significant challenges for multimodal systems. Despite recent progress in multimodal interpretation, these systems tend to fail when they encounter unexpected inputs (e.g., inputs outside of the system's knowledge) or unreliable inputs (e.g., inputs that are not correctly recognized). Variations in vocabulary and multimodal synchronization patterns, disfluencies in speech utterances, and ambiguities in gestures can seriously impair interpretation performance. This project seeks to improve the robustness of multimodal interpretation by adapting the system's interpretation capability over time through automated knowledge acquisition and by optimizing interpretation through probabilistic reasoning. (Picture: Ph.D. student Zahar Prasov interacts with a system using speech and gesture)
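
To make the optimization component concrete, here is a minimal sketch of probabilistic fusion for reference resolution: each modality contributes a probability distribution over candidate referents, and the interpreter ranks candidates by a combined score. The object names, scores, smoothing constant, and independence assumption are all illustrative, not the project's actual model.

    # Illustrative sketch only: fuse per-modality hypothesis scores with a
    # naive product of probabilities; all names and numbers are hypothetical.
    def fuse_hypotheses(speech_probs, gesture_probs):
        """Rank candidate referents by combined speech/gesture probability."""
        candidates = set(speech_probs) | set(gesture_probs)
        smoothing = 1e-3  # keep candidates missing from one modality in play
        scored = {c: speech_probs.get(c, smoothing) * gesture_probs.get(c, smoothing)
                  for c in candidates}
        return sorted(scored.items(), key=lambda kv: kv[1], reverse=True)

    if __name__ == "__main__":
        speech = {"house_12": 0.5, "house_7": 0.3, "park": 0.2}   # from "this house"
        gesture = {"house_12": 0.6, "house_9": 0.4}               # from a pointing gesture
        print(fuse_hypotheses(speech, gesture))                   # house_12 ranks first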

Recent Papers: 

* Salience Modeling based on Non-verbal Modalities for Spoken Language Understanding. S. Qu and J. Chai. ACM 8th International Conference on Multimodal Interfaces (ICMI), pp. 193-200, Banff, Canada, November 2-4, 2006.

* Cognitive Principles in Robust Multimodal Interpretation. J. Chai, Z. Prasov, and S. Qu. Journal of Artificial Intelligence Research, Vol. 27, pp. 55-83, 2006.

* Linguistic Theories in Efficient Multimodal Reference Resolution: an Empirical Investigation. J. Chai, Z. Prasov, J. Blaim, and R. Jin. The 10th International Conference on Intelligent User Interfaces (IUI-05), pp. 43-50, ACM, San Diego, CA, January 9-12, 2005.

* Optimization in Multimodal Interpretation. J. Chai, P. Hong, M. Zhou, and Z. Prasov. The 42nd Annual Meeting of the Association for Computational Linguistics (ACL), pp. 1-8, Barcelona, Spain, July 22-24, 2004.

 

=============================================================================================

 

Eye Gaze in Salience Modeling for Robust Spoken Language Processing

(PI: J. Chai, Co-PI: F. Ferreira, Funded by NSF)

Previous psycholinguistic work has shown that eye gaze is tightly linked to human language processing. Almost immediately after hearing a word, the eyes move to the corresponding real-world referent, and just before speaking a word, the eyes move to the mentioned object. Not only is eye gaze highly reliable, it is also an implicit, subconscious reflex of speech: the user does not need to make a conscious decision; the eyes move automatically to the relevant object without the user even being aware. Motivated by these psycholinguistic findings, our hypothesis is that during human-machine conversation, user eye gaze coupled with conversation context can signal the part of the physical world (related to the domain and the graphical interface) that is most salient at each point of communication. This salience in the physical world in turn primes what users communicate to the system, and thus can be used to tailor the interpretation of speech input. Based on this hypothesis, this project examines the role of eye gaze in human language production during human-machine conversation and develops algorithms and systems that incorporate gaze-based salience modeling into robust spoken language understanding. (Picture: Smoothed eye gaze fixations on the graphical display, recorded while a user talks to the system)
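
As a rough illustration of gaze-based salience modeling, the sketch below converts fixation durations in a time window around an utterance into a salience distribution over on-screen objects; such a distribution could then be used to bias spoken language interpretation. The object ids, timestamps, and window size are hypothetical, and this is not the project's actual algorithm.

    # Illustrative sketch only: salience from fixation durations near an utterance.
    from collections import defaultdict

    def gaze_salience(fixations, utterance_start, utterance_end, window=1.5):
        """fixations: list of (object_id, start_time, end_time), times in seconds."""
        lo, hi = utterance_start - window, utterance_end + window
        durations = defaultdict(float)
        for obj, start, end in fixations:
            overlap = min(end, hi) - max(start, lo)
            if overlap > 0:
                durations[obj] += overlap
        total = sum(durations.values())
        return {obj: d / total for obj, d in durations.items()} if total else {}

    if __name__ == "__main__":
        fixations = [("bedroom_3", 10.2, 10.9), ("kitchen_1", 11.0, 11.8),
                     ("bedroom_3", 12.1, 12.4)]
        print(gaze_salience(fixations, utterance_start=10.5, utterance_end=12.0))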

Recent Papers: 

* Automated Vocabulary Acquisition and Interpretation in Multimodal Conversational Systems. Y. Liu, J. Chai, and R. Jin. The 45th Annual Meeting of the Association for Computational Linguistics (ACL), Prague, Czech Republic, June 23-30, 2007.

* An Exploration of Eye Gaze in Spoken Language Processing for Multimodal Conversational Interfaces. S. Qu and J. Chai. The 2007 Meeting of the North American Chapter of the Association for Computational Linguistics (NAACL-07), Rochester, NY, April 2007.

* Eye Gaze for Attention Prediction in Multimodal Human Machine Conversation. Z. Prasov, J. Chai, and H. Jeong. The AAAI 2007 Spring Symposium on Interaction Challenges for Artificial Assistants, Palo Alto, CA, March 2007.

 

========================================================================================================

 

Discourse Processing in Conversational QA

(PI: J. Chai, Co-PI: R. Jin, Funded by DTO)

Question answering (QA) systems take users' natural language questions and automatically locate answers from large collections of documents. During interactive QA, user questions are not only guided by users' information goals, but are also influenced by system responses. User information needs gradually evolve as the QA session proceeds. It is therefore important to keep track of the interaction context and use that context to interpret user information needs, retrieve relevant information, and control the interaction. This project conducts a systematic investigation of how to represent interaction context (i.e., discourse), how to build such representations automatically, and how to use discourse representations effectively in answer retrieval and dialog management. These studies will help identify the level of discourse representation that best balances the benefits and limitations of discourse modeling for conversational QA. (Picture: Ph.D. student Matt Gerber demonstrates an interactive QA system, courtesy of GLITR)
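
As a simplified illustration of using discourse context in answer retrieval, the sketch below keeps a running list of entities mentioned in earlier turns and appends the most recent ones to an elliptical follow-up question before retrieval. The entities, questions, and expansion heuristic are hypothetical examples, not the project's actual discourse representation.

    # Illustrative sketch only: expand an under-specified follow-up question
    # with entities from the discourse context; all examples are hypothetical.
    def expand_query(question, context_entities, max_terms=2):
        """Append recent context entities that the follow-up question omits."""
        extra = [e for e in reversed(context_entities)
                 if e.lower() not in question.lower()]
        return question if not extra else question + " " + " ".join(extra[:max_terms])

    if __name__ == "__main__":
        # Turn 1: "Where was Mozart born?" -> answer mentions Salzburg.
        context = ["Mozart", "Salzburg"]
        # Turn 2 is elliptical; "he" only makes sense given the context.
        print(expand_query("When did he move to Vienna?", context))
        # -> "When did he move to Vienna? Salzburg Mozart"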

 

Recent Papers: 

* Discourse Processing for Context Question Answering based on Linguistic Knowledge. M. Sun and J. Chai. Knowledge-Based Systems, Special Issue on Intelligent User Interfaces, Volume 20, Issue 6, pp. 511-526, August 2007.

* Towards Conversational QA: Automated Identification of Problematic Situations and User Intent. J. Chai, C. Zhang, and T. Baldwin. Proceedings of the International Conference on Computational Linguistics/Association for Computational Linguistics (COLING/ACL-06) Poster Session, pp. 57-64, Sydney, Australia, July 17-21, 2006.

* Automated Performance Assessment in Interactive Question Answering. J. Chai, T. Baldwin, and C. Zhang. Proceedings of the 29th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR 2006), pp. 631-632, Seattle, USA, August 6-11, 2006.