CVPR 2020 Workshop On Towards Human-Centric Image/Video Synthesis, and the 4th Look Into Person (LIP) Challenge

Introduction

Human-centric image/video synthesis has been intensely researched in computer vision, with the wide application domains such as human motion transfer, virtual try-on, virtual reality, and human-computer interaction. Developing solutions to understand the human-centric image/video synthesis in the practicable scenarios, regarded as one of the most fundamental problems in computer vision, could have a crucial impact in many industrial application domains. Those bring great convenience for the public. However, there exists a large gap between the human-centric synthesis technique and its carrying out applied in the practical scenarios. What is needed by the real-life applications? What is achievable based on modern computer vision techniques? Those all raise the researchers’ attentions and discussions. More human image synthesis, virtual try-on, and 3D graphic analysis research advances are urgently expected for advanced human-centric synthesis. For example, the 2D/3D clothes virtual try-on simulation system that seamlessly fits various clothes into 3D human body shape has attracted numerous commercial interests. The human motion synthesis and prediction can bridge the virtual and real worlds, such as, simulating virtual characters to mimic the human behaviors, empowering robotics more intelligent interactions with human by enabling causal inferences for human activities. The goal of this workshop is to allow researchers from the fields of human-centric image/video synthesis to present their progress, communication and co-develop novel ideas that potentially shape the future of this area and further advance the performance and applicability of correspondingly built systems in real-world conditions. This workshop is designed to build up consensus on the emerging topic of the human-centric image/video synthesis, by clarifying the motivation, the typical methodologies, the prospective trends, and the potential industrial applications.

Tentative SCHEDULE

Time	Schedule
Location:	Date: Friday, 19 June 2020 from 13:20 pm PDT to 18:30 pm PDT. (All times are Pacific Daylight Time, Seattle time).
13:20-13:40	Opening remarks and best paper talk [YouTube Video1] [Bilibili Video1] [YouTube Video2]
13:40-14:20	Invited talk 1: Ira Kemelmacher-Shlizerman, Associate Professor, University of Washington [YouTube] [Bilibili] Talk title: Human Modeling and Synthesis
14:20:15:00	Invited talk 2: William T. Freeman, Professor, MIT [YouTube] [Bilibili] Talk title: Learning from videos playing forwards, backwards, fast, and slow Show Abstract
	Abstract: How can we tell that a video is playing backwards? People's motions look wrong when the video is played backwards--can we develop an algorithm to distinguish forward from backward video? Similarly, can we tell if a video is sped-up? We have developed algorithms to distinguish forwards from backwards video, and fast from slow. Training algorithms for these tasks provides a self-supervised task that facilitates human activity recognition. We'll show these results, and applications of these unsupervised video learning tasks. Joint work with: Donglai Wei, Joseph Lim, Andrew Zisserman, Sagie Benaim, Ariel Ephrat, Oran Lang, Inbar Mosseri, Michael Rubinstein, Michal Irani, Tali Dekel
15:00-15:15	Winner talk 1: Winner of the Multi-Person Human Parsing Challenge [YouTube] [Bilibili] [Slide]
15:15-15:30	Winner talk 2: Winner of the Video Multi-Person Human Parsing Challenge [YouTube] [Bilibili] [Slide]
15:30-16:10	Invited talk 3: Ming-Hsuan Yang, Professor, University of California at Merced [YouTube] [Bilibili] Talk title: Synthesizing Human Images in 2D and 3D Scenes Show Abstract
	Abstract: In this talk, I will present our recent results on synthesizing human images in 2D and 3D scenes. In the first part, I will present a context-aware approach to synthesize and place object instances in an image with semantically coherent contents. In the second part, I will describe a method to synthesizing 3D humans with varying pose in indoors in an image by inferring 3D layout and context. When time allows, I will also present an algorithm to model music-to-dance generation process for synthesizing realistic, diverse, style-consistent, and beat-matching dances from music.
16:10-16:50	Invited talk 4: Jun-Yan Zhu, Assistant Professor, Carnegie Mellon University [YouTube] [Bilibili] Talk title: Visualizing and Understanding GANs Show Abstract
	Abstract: Generative Adversarial Networks (GANs) have recently achieved impressive results for a wide range of real-world applications, and many GAN variants have emerged with improvements in sample quality and training stability. However, they have not been well visualized or understood. How does a GAN represent our visual world internally? What causes the artifacts in GAN results? How do architectural choices affect GAN learning? Answering such questions could enable us to develop new insights and better models. In this talk, I will present several analytic tools to visualize and understand GANs at the unit-, object-, and scene-level. Collectively, these tools highlight what a GAN has learned and has not. We show several practical applications enabled by our method, from comparing internal representations across different layers, models, and datasets, to improving GANs by locating and removing artifact-causing units, to interactively manipulating objects in a real image.
16:50-17:05	Winner talk 3: Winner of the Image-based Multi-pose Virtual Try-on Challenge [YouTube] [Bilibili] [Slide]
17:05-17:20	Winner talk 4: Winner of the Video Virtual Try-on Challenge [YouTube] [Bilibili] [Slide]
17:20-17:35	Winner talk 5: Winner of the Dark Complexion Portrait Segmentation Challenge [YouTube] [Bilibili] [Slide]
17:35-18:30	Oral: Epipolar Transformer for Multi-view Human Pose Estimation.
17:35-18:30	Oral: Yoga-82: A New Dataset for Fine-grained Classification of Human Poses.
17:35-18:30	Oral: The MTA Dataset for Multi Target Multi Camera Pedestrian Tracking by Weighted Distance Aggregation.
17:35-18:30	Poster: LightTrack: A Generic Framework for Online Top-Down Human Pose Tracking.
17:35-18:30	Poster: Fine grained pointing recognition for natural drone guidance.
17:35-18:30	Poster: Reposing Humans by Warping 3D Features.

Accepted Papers

Epipolar Transformer for Multi-view Human Pose Estimation. Yihui He (Carnegie Mellon University)*; Rui Yan (Carnegie Mellon University); Katerina Fragkiadaki (Carnegie Mellon University); Shoou-I Yu (Oculus Research Pittsburgh) (Oral, Best Paper)
Yoga-82: A New Dataset for Fine-grained Classification of Human Poses. Manisha Verma (Osaka University)*; Sudhakar Kumawat (Indian Institute of Technology Gandhinagar); Yuta Nakashima (Osaka University); Shanmuganathan (Indian Institute of Technology (IIT) Gandhinagar) (Oral)
The MTA Dataset for Multi Target Multi Camera Pedestrian Tracking by Weighted Distance Aggregation. Philipp Köhl (Fraunhofer IOSB); Andreas Specker (Fraunhofer IOSB); Arne Schumann (Fraunhofer IOSB)* (Oral)
LightTrack: A Generic Framework for Online Top-Down Human Pose Tracking. Guanghan Ning (JD Finance America Corporation)*; Heng Huang (University of Pittsburgh & JD Digits) (Poster)
Fine grained pointing recognition for natural drone guidance. Oscar Leon Barbed Perez (Universidad de Zaragoza)*; Pablo Azagra Millan (University of Zaragoza); Lucas Teixeira (ETH Zurich); Margarita Chli (ETH Zurich); Javier Civera (Universidad de Zaragoza); Ana Murillo (Universidad de Zaragoza) (Poster)
Reposing Humans by Warping 3D Features. Markus Knoche (RWTH Aachen)*; István Sárándi (RWTH Aachen University); Bastian Leibe (RWTH Aachen University) (Poster)

Challenge Winners

Track 1: Multi-Person Human Parsing Challenge Winners:
1st:
Lu Yang1, Qing Song1, Zhihui Wang1, Songcen Xu2
1BUPT PRIV Lab, 2Noah's Ark Lab

2nd:
Tianfei Zhou1, Wenguan Wang2, Ying Fu3
1Inception Institute of Artificial Intelligence, 2ETH Zurich, 3Beijing Institute of Technology

3rd:
Runxin Mao1, Taiwu Sun1, Zhanwang Zhang1, Xiao Tian1
1Ping An Technology（Shenzhen）Co., Ltd
Track 2: Video Multi-Person Human Parsing Challenge Winners:
1st:
Lu Yang1, Qing Song1, Zhihui Wang1, Songcen Xu2
1BUPT PRIV Lab, 2Noah's Ark Lab

2nd:
Zhanwang Zhang1, Xiao Tian1, Runxin Mao1, Taiwu Sun1
1Ping An Technology（Shenzhen）Co., Ltd
Track 3: Image-based Multi-pose Virtual Try-on Challenge Winners:
1st:
Chieh-Yun Chen1, Hong-Han Shuai1, Wen-Huang Cheng1
1National Chiao Tung University

2nd:
Thai Thanh Tuan1, Matiur Rahman Minar1, Heejune Ahn1
1Seoul National University of Science and Technology

3rd:
Zhipeng Luo1, Junfeng Zheng1, Zhenyu Xu1, FengNi1
1DeepBlue Technology(Shanghai) Co., Ltd
Track 4: Video Virtual Try-on Challenge Winners:
1st:
Andrew Jong1, Gaurav Kuppa1, Xin Liu2, Teng-Sheng Moh1, Ziwei Liu2
1San José State University, 2The Chinese University of Hong Kong

2nd:
Haien Zeng1
1Sun Yat-Sen University
Track 5: Dark Complexion Portrait Segmentation Challenge Winners:
1st:
Chenhang Zhou1, Guoqiang Shang1, Ben Ying1, Leheng Zhang1, Jianliang Lan1, Longan Xiao1, Jiangtao Li1
1Shanghai Transsion Information Technology Limited

2nd:
Bingke Zhu, Peigeng Ding, Xiaomei Zhang, Yingying Chen, Ming Tang, Jinqiao Wang
1National Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences, 2School of Artificial Intelligence, University of Chinese Academy of Sciences

3rd:
Minh-Quan Le1,2, Hoang-Phuc Nguyen-Dinh1,2, Anh-Minh Nguyen1,2 , Tam V. Nguyen3, Minh-Triet Tran1,2
1University of Science, VNU-HCM, Vietnam, 2Vietnam National University, Ho Chi Minh City, Vietnam, 3University of Dayton, U.S.A

Introduction

Tentative SCHEDULE

Time

Schedule

Topics of interest

The submission are expected to deal with human-centric visual perception and processing tasks which include but are not limited to:

Accepted Papers

Challenge Winners

Challenges

Track1

Track2

Track3

Track4

Track5

Speakers

Ira Kemelmacher-Shlizerman
kemelmi@google.com

Ming-Hsuan Yang
myang37@ucmerced.edu

Jun-Yan Zhu
junyanz@cs.cmu.edu

William T. Freeman
billf@mit.edu

Main Organizers

Xiaodan Liang
xdliang328@gmail.com

Haoye Dong
donghy7@mail2.sysu.edu.cn

Zhenyu Xie
xiezhy6@mail2.sysu.edu.cn

Liang Lin
linliang@ieee.org

Jiashi Feng
elefjia@nus.edu.sg

Song-Chun Zhu
sczhu@stat.ucla.edu

Contact

Challenge Submission
Important Dates
Challenge Submission Due Date: ~~May 20, 2020, 00:00 AM UTC~~ Extend to June 6, 2020, 00:00 AM UTC

Time

Schedule

Ira Kemelmacher-Shlizerman kemelmi@google.com

Ming-Hsuan Yang myang37@ucmerced.edu

Jun-Yan Zhu junyanz@cs.cmu.edu

William T. Freeman billf@mit.edu

Xiaodan Liang xdliang328@gmail.com

Haoye Dong donghy7@mail2.sysu.edu.cn

Zhenyu Xie xiezhy6@mail2.sysu.edu.cn

Liang Lin linliang@ieee.org

Jiashi Feng elefjia@nus.edu.sg

Song-Chun Zhu sczhu@stat.ucla.edu