Keynote Speakers

Prof. Yihong Gong is a distinguished professor, an IEEE Fellow, the dean of the School of Software Engineering of Xi’an Jiaotong University, a vice director of the National Engineering Laboratory for Visual Information Processing, and the acting vice director of the Shaanxi Province Joint Key AI Laboratories. His research interests include image/video content analysis, machine learning algorithms, and human brain-inspired neural network models. He is among the first researchers in the world to initiate studies on content-based image retrieval, sports video event detection, text/video content summarization, and image classification using sparse-coding image features. He has published more than 300 technical papers and two monographs. To date, his works have received more than 27,300 citations (Google h-index = 69), with over 3,800 citations for his most cited paper. In 2015, his ACM SIGIR 2003 paper titled “Document Clustering Based on Non-Negative Matrix Factorization” received a “Test of Time Award” Honorable Mention from the ACM SIGIR Technical Committee. Under his supervision, his teams have won numerous international and domestic competitions in image/video content recognition.

Title: Brain-Inspired Machine Learning Methods

Abstract: Existing Deep Neural Networks (DNNs) face the following three fundamental problems: (1) they rely on continuously increasing network complexity and training data scale to improve accuracy; (2) they suffer from the “texture bias” problem when applied to image classification; (3) they suffer from the “catastrophic forgetting” problem during continual (or incremental) learning. In this talk, I will present several recent research results inspired by the visual cognitive mechanisms of the human brain. First, we propose a CNN structure inspired by the dual-pathway object cognition mechanism of the human visual system, which has proven effective at solving the “texture bias” problem of existing DCNN models. To address the “catastrophic forgetting” problem that occurs during continual learning, we propose the Anchor Loss objective function, which requires the DCNN model to preserve the topological structure of the learned feature space. This work is inspired by the latest cognitive science research on human visual memory. Finally, I will present our few-shot continual learning method, which can learn new image categories from few training samples while maintaining classification accuracy on the old image categories. These methods are model-agnostic and can be applied to any DCNN. Comprehensive evaluations show remarkable performance improvements for representative DCNN models on the respective tasks without increasing their model complexity.
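The abstract does not give the exact formulation of the Anchor Loss. One plausible reading of “keep the topological structure of the learned feature space” is a regularizer that preserves the pairwise distances between features of a set of anchor samples as the model is updated; a minimal sketch of that idea (all function names are illustrative, not taken from the talk) might look like:

```python
import numpy as np

def pairwise_distances(feats):
    # feats: (n, d) matrix of feature vectors.
    # Returns the (n, n) matrix of Euclidean distances between rows.
    diff = feats[:, None, :] - feats[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))

def anchor_topology_loss(old_feats, new_feats):
    # Hypothetical anchor-style regularizer: penalize changes in the
    # pairwise-distance structure of anchor samples' features between
    # the old model and the updated model, encouraging the feature
    # space to keep its geometry during continual learning.
    d_old = pairwise_distances(old_feats)
    d_new = pairwise_distances(new_feats)
    return ((d_old - d_new) ** 2).mean()
```

Note that under this sketch a uniform translation of all features incurs zero penalty (pairwise distances are unchanged), while rescaling or warping the feature space is penalized; the actual Anchor Loss formulation may differ.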

Prof. Junwei Han is currently a Professor with the School of Automation, Northwestern Polytechnical University, China. His research interests include computer vision, pattern recognition, remote sensing image analysis, and brain imaging analysis. He has published more than 70 articles in top journals, such as the IEEE Transactions on Pattern Analysis and Machine Intelligence, the IEEE Transactions on Neural Networks and Learning Systems, and the International Journal of Computer Vision, and more than 30 papers in top conferences, such as CVPR, ICCV, MICCAI, and IJCAI. He is an Associate Editor for several journals, including the IEEE Transactions on Neural Networks and Learning Systems and the IEEE Transactions on Multimedia.

Title: Modern Learning Methodologies for Co-Saliency Detection

Abstract: Visual saliency computing aims to imitate the human visual attention mechanism to identify the most prominent or unique areas or objects in a visual scene. It is one of the basic low-level image processing techniques and can be applied to many downstream computer vision tasks. From the perspective of traditional research, visual saliency computing can be divided into eye fixation prediction and salient object detection. However, recent research progress shows that many new directions and branches have emerged in this field, including weakly supervised, semi-supervised, and unsupervised saliency learning, co-saliency detection, and multi-modal saliency detection. This talk will focus on the key issues in co-saliency detection, introduce co-saliency methods based on advanced learning techniques such as multi-instance learning, metric learning, and deep learning, and discuss potential future research directions in this area.

Dr. Chuang Gan is a principal research staff member at the MIT-IBM Watson AI Lab. He is also a visiting research scientist at MIT, working closely with Prof. Antonio Torralba and Prof. Josh Tenenbaum. Before that, he completed his Ph.D. with the highest honors at Tsinghua University, supervised by Prof. Andrew Chi-Chih Yao. His research interests include representation learning, neural-symbolic visual reasoning, audio-visual scene analysis, and robot learning. His research has been recognized by a Microsoft Fellowship, a Baidu Fellowship, and media coverage from CNN, BBC, The New York Times, WIRED, Forbes, and MIT Tech Review. He has also served as an area chair of ICCV, ACL, ICLR, and ACM Multimedia, and as an associate editor of the IEEE Transactions on Image Processing and the IEEE Transactions on Circuits and Systems for Video Technology.

Title: Human-Centric Audio Visual Learning

Abstract: I will introduce audio-visual learning, a cognitively inspired framework that capitalizes on the natural synchronization of the visual and audio modalities to arrive at a rich understanding of physical events and human activity. I will demonstrate that by seeing and hearing unlabeled videos, our system can learn in an unsupervised way to locate the image regions that produce sounds and to separate the input audio into a set of components representing the sound from each pixel. These multimodal learning techniques mimic how humans learn, so we can build more versatile applications and train AI models that learn more from less data.