2nd Edition — IEEE Intelligent Vehicles Symposium 2026
Vision, Language, and Multimodal Human Instructions
for Interactive Intelligent Vehicles
Date and Time TBA • VL-IIV 2026
About the Workshop
The Vision, Language, and Multimodal Human Instructions for Interactive Intelligent Vehicles (VL-IIV 2026) workshop explores the intersection of computer vision, language understanding, and multimodal reasoning for human-in-the-loop autonomous driving. The workshop focuses on systems and datasets that allow vehicles to perceive, interpret, and respond to visual and linguistic instructions.
Interactive autonomous systems capable of interpreting multimodal human instructions are critical to the next generation of safe and trustworthy transportation. This workshop promotes human-centered autonomy, reducing risks from fully unsupervised systems while enhancing transparency and user control.
doScenes Instructed Driving Challenge
VL-IIV 2026 hosts the doScenes Instructed Driving Challenge.
The challenge evaluates how well vision-language models predict trajectories conditioned on human driving instructions. The dataset contains scene-level captions, driver intent labels, and natural-language instructions for upcoming maneuvers — all human-generated and labeled by multiple annotators, creating a diverse set of descriptors mapping to the same maneuver.
Participants predict the vehicle's future trajectory conditioned on any combination of (1) visual scene input (multi-camera), (2) language instruction, and (3) scene context (history + map), evaluated using displacement error, visualization, and explainability.
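The challenge page names "displacement error" as an evaluation criterion without specifying the exact variant. As a hedged illustration only, the sketch below computes the two displacement metrics most commonly used in trajectory prediction: Average Displacement Error (ADE, mean Euclidean distance over all future waypoints) and Final Displacement Error (FDE, distance at the last waypoint). The function name and array shapes are assumptions for this example, not the official challenge evaluation code.

```python
import numpy as np

def displacement_errors(pred, gt):
    """Illustrative ADE/FDE computation (not the official challenge metric).

    pred, gt: arrays of shape (T, 2) holding predicted and ground-truth
    (x, y) waypoints over a horizon of T timesteps.
    """
    # Euclidean distance between prediction and ground truth at each waypoint.
    dists = np.linalg.norm(pred - gt, axis=-1)
    ade = dists.mean()   # Average Displacement Error over the full horizon
    fde = dists[-1]      # Final Displacement Error at the last waypoint
    return ade, fde

# Toy example: prediction drives straight while ground truth turns away.
pred = np.array([[0.0, 0.0], [1.0, 0.0], [2.0, 0.0]])
gt = np.array([[0.0, 0.0], [1.0, 1.0], [2.0, 2.0]])
ade, fde = displacement_errors(pred, gt)  # ade = 1.0, fde = 2.0
```

Leaderboards in this space often report minADE/minFDE over multiple predicted trajectory modes; participants should follow whatever variant the official challenge page specifies.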
Topics
We welcome contributions focusing on, but not limited to, the following topics:
Invited Speakers
Additional speakers to be announced.
Schedule
Tentative half-day schedule. Exact start time and room TBA pending IEEE IV 2026 program release.
| Time | Session | Details | Duration |
|---|---|---|---|
| +0:00 | Welcome | Opening Remarks (Prof. Ross Greer & Prof. Mohan Trivedi) | 15 min |
| +0:15 | Invited Talk | TBA (Mustafa Bal, NomadicML) | 25 min |
| +0:40 | Invited Talk | TBA (Speaker TBA) | 25 min |
| +1:05 | Invited Talk | TBA (Speaker TBA) | 25 min |
| +1:30 | Break | Coffee Break | 15 min |
| +1:45 | Invited Talk | TBA (Speaker TBA) | 25 min |
| +2:10 | Challenge | doScenes Instructed Driving Challenge: Results & Top Team Presentations (Challenge leads: Kianna Ng, Angel Martinez, Parthib Roy) | 40 min |
| +2:50 | Oral Papers | Oral Paper Presentations, selected IV workshop papers, 15 min each. "INSIGHT: Enhancing Autonomous Driving Safety through Vision-Language Models on Context-Aware Hazard Detection and Reasoning" (Dianwei Chen, Zifan Zhang, Lei Cheng, Yuchen Liu, Xianfeng Yang) | 30–45 min |
| +3:35 | Closing | Closing Remarks & Awards (Organizers) | 10 min |