2nd Edition — IEEE Intelligent Vehicles Symposium 2026

Vision, Language, and Multimodal Human Instructions
for Interactive Intelligent Vehicles

Date and Time TBA  •  VL-IIV 2026

About the Workshop

The Vision, Language, and Multimodal Human Instructions for Interactive Intelligent Vehicles (VL-IIV 2026) workshop explores the intersection of computer vision, language understanding, and multimodal reasoning for human-in-the-loop autonomous driving. The workshop focuses on systems and datasets that enable vehicles to perceive, interpret, and respond to visual and linguistic human instructions.

Interactive autonomous systems capable of interpreting multimodal human instructions are critical to the next generation of safe and trustworthy transportation. This workshop promotes human-centered autonomy, reducing risks from fully unsupervised systems while enhancing transparency and user control.

doScenes Instructed Driving Challenge

VL-IIV 2026 hosts the doScenes Instructed Driving Challenge

The challenge evaluates how well vision-language models predict trajectories conditioned on human driving instructions. The dataset contains scene-level captions, driver intent labels, and natural-language instructions for upcoming maneuvers, all human-generated and labeled by multiple annotators, yielding a diverse set of descriptions for each maneuver.
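To make these annotation types concrete, the record below is a hypothetical illustration only; the field names and values are assumptions for this sketch, not the actual doScenes schema.

# Hypothetical record illustrating the annotation types described above;
# field names and values are assumptions, not the actual doScenes schema.
annotation = {
    "scene_caption": "Two-lane road with a cyclist ahead on the right shoulder.",
    "intent_label": "lane_change_left",
    "instructions": [  # several annotators describe the same upcoming maneuver
        "Move into the left lane to pass the cyclist.",
        "Give the bike some room and go around it on the left.",
    ],
}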

Participants predict the vehicle's future trajectory conditioned on any combination of (1) visual scene input (multi-camera), (2) a natural-language instruction, and (3) scene context (trajectory history and map). Submissions are evaluated on displacement error, trajectory visualization, and explainability.
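Displacement error is the standard quantitative metric in trajectory prediction. The sketch below (a minimal illustration, not the official evaluation code) computes average and final displacement error (ADE/FDE) between a predicted and a ground-truth trajectory; the (T, 2) waypoint format and function name are assumptions for this example.

import numpy as np

def displacement_errors(pred, gt):
    """Average and final displacement error (ADE, FDE) in meters.

    pred, gt: arrays of shape (T, 2) holding (x, y) waypoints over T
    future timesteps. This format is an assumption for illustration,
    not the official challenge submission format.
    """
    # Per-timestep Euclidean distance between predicted and true waypoints.
    dists = np.linalg.norm(pred - gt, axis=-1)
    return float(dists.mean()), float(dists[-1])

# Ground truth: straight path at 1 m/s; prediction drifts laterally.
gt = np.stack([np.arange(6, dtype=float), np.zeros(6)], axis=-1)
pred = gt + np.array([0.0, 0.1]) * np.arange(6)[:, None]
ade, fde = displacement_errors(pred, gt)
print(f"ADE = {ade:.2f} m, FDE = {fde:.2f} m")  # ADE = 0.25 m, FDE = 0.50 m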

View Challenge Details →

Topics

We welcome contributions focused on, but not limited to, the following topics:

Human-in-the-loop and instructed autonomy
Representation learning and foundation models for embodied, instruction-conditioned behavior
Multimodal learning and grounding (gesture, speech, gaze)
Multi-agent interactions
Vision-language models for driving and robotics
Scene understanding for control transitions
Safety, trust, explainability, and transparency in human-interactive AV systems
Datasets, benchmarks, and evaluation metrics for interactive autonomy
Generative and contrastive modeling for multimodal control

Invited Speakers

Mustafa Bal
NomadicML

Additional speakers to be announced.

Schedule

Tentative half-day schedule. Exact start time and room TBA pending IEEE IV 2026 program release.

+0:00 Welcome
Opening Remarks
Prof. Ross Greer & Prof. Mohan Trivedi
15 min
+0:15 Invited Talk
TBA
Mustafa Bal — NomadicML
25 min
+0:40 Invited Talk
TBA
Speaker TBA
25 min
+1:05 Invited Talk
TBA
Speaker TBA
25 min
+1:30 Break
Coffee Break
15 min
+1:45 Invited Talk
TBA
Speaker TBA
25 min
+2:10 Challenge
doScenes Instructed Driving Challenge — Results & Top Team Presentations
Challenge leads: Kianna Ng, Angel Martinez, Parthib Roy
40 min
+2:50 Oral Papers
Oral Paper Presentations
Selected IV workshop papers — 15 min each
INSIGHT: Enhancing Autonomous Driving Safety through Vision-Language Models on Context-Aware Hazard Detection and Reasoning — Dianwei Chen, Zifan Zhang, Lei Cheng, Yuchen Liu, Xianfeng Yang
30–45 min
+3:35 Closing
Closing Remarks & Awards
Organizers
10 min

Organizers

Lead Organizers

Prof. Ross Greer
University of California, Merced
Prof. Mohan Trivedi
University of California, San Diego

Organizing Committee & Challenge Leads

Max Ronecker
TU Graz
Walter Zimmer
University of Sydney / UCLA
Rui Song
UCLA
Kianna Ng
UC Merced
Angel Martinez
UC Merced
Maitrayee Keskar
UC San Diego
Anas Saeed
Bonsai Robotics
Erika Maquiling
UC Merced
Edmund Chao
UCLA
Giovanni Tapia Lopez
UC Merced
Marcus Blennemann
UC San Diego
Parthib Roy
UC Merced
Afnan Alofi
Princess Nourah bint Abdulrahman University

Contact

For inquiries, please contact rossgreer@ucmerced.edu.