About The Workshop

The joint understanding of language and vision poses a fundamental challenge in artificial intelligence. The problem is particularly relevant because combining images and text is a very natural way for humans to learn. Progress on tasks such as visual question answering, image captioning, and object referral could therefore provide a stepping stone towards new products and services. For example, a natural language interface between factory operators and control systems could streamline production processes, resulting in safer and more efficient working environments. In a different domain, being able to give commands in natural language to an autonomous car could eliminate the unsettling feeling of giving up all control. The possible applications are numerous, and realizing them requires efficient computational models that can address these tasks in realistic environments. In this workshop, we aim to identify and address the challenges of deploying vision-language models in practical applications (see list of topics). To receive updates about this workshop or the challenge, subscribe here.

Call for Papers

Prospective authors are invited to submit a 10-14 page paper for presentation as a poster or contributed talk during the workshop (see call for papers).

Challenge

The workshop will host a challenge in which participants solve a visual grounding task in a realistic setting. More specifically, we consider a scenario where a passenger can give free-form natural language commands to a self-driving car. The challenge is based on the recent Talk2Car dataset (EMNLP 2019). A quick-start tutorial for participating in the competition can be found here.

Awards

Winners of the best paper award and the challenge will receive prizes.

Speakers

Qi Wu
University of Adelaide

Dhruv Batra
Georgia Tech and FAIR

Dengxin Dai
ETH Zurich

Jean Oh
Carnegie Mellon University

Workshop Schedule

23 August 2020, 8 AM - 10 AM and 8 PM - 10 PM (UTC+1)
Session 1 (8 AM - 10 AM UTC+1):

1st contributed talk w/ Q&A: Vision-and-Language Navigation by Qi Wu
2nd contributed talk w/ Q&A: Car Companion and Auditory Perception by Dengxin Dai
Q&A with top-performing teams on the C4AV challenge

Session 2 (8 PM - 10 PM UTC+1):

1st contributed talk w/ Q&A: Teaching Household Robots to Follow Commands by Dhruv Batra
2nd contributed talk w/ Q&A: Social Navigation by Jean Oh
Q&A with top-performing teams on the C4AV challenge

Workshop Challenge

The challenge focuses on a visual grounding task in a self-driving car scenario. Given a natural language command, the goal is to predict which object in the scene the command refers to. More information about this challenge can be found here.

Important Dates

Date (midnight, UTC-12)	Event
~~March 20 2020~~	~~Release of the challenge~~
~~March 27 2020~~	~~Opening of leaderboard and submissions~~
~~April 24 2020~~	~~Call for papers opened~~
~~July 10 2020~~	~~Paper submission deadline~~
~~July 18 2020~~	~~Freezing of challenge leaderboard~~
~~July 24 2020~~	~~Decision to authors~~
~~August 1 2020~~	~~End of challenge~~
August 23 2020	Workshop at ECCV 2020 in Glasgow
September 14 2020	Camera-ready version