MSU CSE Colloquium Series 2017-2018: Dr. Yan Yan

A Multi-task Learning framework for Head Pose Estimation and Actor-Action Semantic Video Segmentation

Dr. Yan Yan
Research Fellow
University of Michigan

Friday, November 10, 2017
11 AM - 12 PM
EB 3105

Multi-task learning, an important branch of machine learning, has developed rapidly over the past decade. Multi-task learning methods aim to simultaneously learn classification or regression models for a set of related tasks, which typically yields better models than a learner that ignores task relationships. In this talk, we will investigate multi-task learning frameworks for head pose estimation and actor-action segmentation.

(1) Head pose estimation from low-resolution surveillance data has gained importance. However, monocular and multi-view head pose estimation approaches still perform poorly under target motion, as facial appearance distorts owing to camera perspective and scale changes when a person moves around. We propose FEGA-MTL, a novel multi-task learning framework for classifying the head pose of a person who moves freely in an environment monitored by multiple large-field-of-view surveillance cameras. After partitioning the monitored scene into a dense uniform spatial grid, FEGA-MTL simultaneously clusters grid partitions into regions with similar facial appearance while learning region-specific head pose classifiers.

(2) Fine-grained activity understanding in videos has attracted considerable recent attention, with a shift from action classification to detailed actor and action understanding that serves the perceptual needs of cutting-edge autonomous systems. However, current methods for detailed actor and action understanding have significant limitations: they require large amounts of finely labeled data, and they fail to capture the internal relationships among actors and actions. To address these issues, we propose a novel, robust multi-task ranking model for weakly supervised actor-action segmentation, where only video-level tags are given for training samples. Our model shares useful information among different actors and actions while learning a ranking matrix to select representative supervoxels for actors and actions, respectively.
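The abstract's central idea, jointly learning related tasks so they share statistical strength, can be sketched in a minimal form. The toy example below is purely illustrative (it is not FEGA-MTL or the speaker's ranking model): two regression tasks share a common weight vector, and each task learns a small task-specific deviation that is penalized toward zero, so data from one task helps the other.

```python
import numpy as np

# Illustrative multi-task learning sketch (an assumption for exposition,
# not the speaker's method): each task t uses weights w_shared + w_task[t].
# A penalty lam * ||w_task[t]||^2 couples the tasks through w_shared.

rng = np.random.default_rng(0)
n, d = 50, 5
w_true = rng.normal(size=d)                      # common ground-truth signal
tasks = []
for t in range(2):
    X = rng.normal(size=(n, d))
    y = X @ (w_true + 0.1 * rng.normal(size=d))  # tasks are similar, not identical
    tasks.append((X, y))

w_shared = np.zeros(d)
w_task = [np.zeros(d) for _ in tasks]
lr, lam = 0.01, 1.0                              # lam shrinks task-specific parts

for _ in range(500):
    grad_shared = np.zeros(d)
    for t, (X, y) in enumerate(tasks):
        resid = X @ (w_shared + w_task[t]) - y
        g = X.T @ resid / n                      # per-task least-squares gradient
        w_task[t] -= lr * (g + lam * w_task[t])  # penalized task-specific update
        grad_shared += g
    w_shared -= lr * grad_shared / len(tasks)    # shared update averages all tasks

for t, (X, y) in enumerate(tasks):
    mse = np.mean((X @ (w_shared + w_task[t]) - y) ** 2)
    print(f"task {t} train MSE: {mse:.4f}")
```

Because the shared component is fit on both tasks' gradients while the task-specific components stay small, the model behaves like a single learner when the tasks agree and only diverges where their data demand it, which is the intuition behind the region-specific classifiers and actor-action sharing described above.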

Yan Yan is currently a research fellow with EECS at the University of Michigan, Ann Arbor. He received his Ph.D. in computer science from the University of Trento, Italy, and his M.S. from the Georgia Institute of Technology. He was a visiting scholar at Carnegie Mellon University in 2013 and a visiting research fellow at the Advanced Digital Sciences Center (ADSC), UIUC, Singapore, in 2015. His research interests include computer vision, machine learning, and multimedia. He received the Best Student Paper Award at ICPR 2014 and the Best Paper Award at ACM Multimedia 2015. He has published papers in CVPR, ICCV, ECCV, TPAMI, AAAI, IJCAI, and ACM Multimedia. He has served as a PC member for several major conferences and as a reviewer for refereed journals in computer vision and multimedia. He has served as a guest editor for TPAMI, CVIU, and TOMM. He is a member of the IEEE and the ACM.

Host: Dr. Xiaoming Liu