Disentangled Representation Learning GAN for Pose-Invariant Face Recognition

The large pose discrepancy between two face images is one of the key challenges in face recognition. The conventional approach to pose-robust face recognition either performs face frontalization on, or learns a pose-invariant representation from, a non-frontal face image. We argue that, it is more desirable to perform both tasks jointly to allow them to leverage each other. To this end, this paper proposes Disentangled Representation Learning-Generative Adversarial Network (DR-GAN) with three distinct novelties. First, the encoder-decoder structure of the generator allows DR-GAN to learn the identity representation for each face image, in addition to image synthesis. Second, this representation is explicitly disentangled from other face variations such as pose, through the pose code provided to the decoder and pose estimation in the discriminator. Third, DR-GAN can take one or multiple images as the input, and generate one integrated representation along with an arbitrary number of synthetic images. Quantitative and qualitative evaluation on both constrained and unconstrained databases demonstrate the superiority of DR-GAN over the state of the art.

Learning to Fuse Information with Missing Modalities

One of the key GEOINT capabilities is to be able to automatically recognize a large array of objects from visual data. Depending on the resolution of imagery, objects may range from specific locations or scenes, road, building, forest to vehicle, human, etc. This is clearly a technically challenging problem for both computer vision and machine learning due to the large variations in the appearance of these objects exhibited in the imagery. To address this problem, researchers have developed various fusion methods that combine information collected from multiple sensing modalities, such as RGB imagery, LiDAR point cloud, multispectral imaging, hyperspectral imaging, and GPS, to improve the reliability and accuracy of object recognition. This research direction is motivated by the ever-decreasing sensoring cost, and more importantly, by the complementary characteristics among multiple sensing modalities. Therefore, with the well-funded promise of escalating object recognition performance, a great deal of data analysis research is in urgent need in order to fully take advantage of this massive amount of multi-modality data.

All the prior research oninformation fusion requires that the sensor data of all modalities are available for every training data instance. This requirement significantly limits the application of information fusion methods as missing modalities abound in practical applications.

In recognizing missing modalities as a roadblock toward fulfilling the key GEOINT capability, we propose to develop powerful and computationally efficient approaches that can learn to fuse information from different sensors when a significant portion of training data has missing modalities. The ultimate goal of our project is to develop a suite of computer vision and machine learning tools for geographical imagery analysis that can serve as an aid for geo-spatial analysts to facilitate the analysis and classification of geographical images.

Person tracking and motion analysis for medical applications

New RGBD sensors enable precise tracking of human motions. This can be used for medical appications such as home care.