Zhi-Qi Cheng, Ph.D.

Pronunciation: Zhì-qí Chéng ("Jih-Chee Chung"). Chinese name: Chéng Zhì-qí.

Assistant Professor: Computer Science & Systems
School of Engineering & Technology
University of Washington Tacoma
Graduate Faculty: Doctoral Endorsement
UW Graduate School
Director: Multimodal Intelligence Lab (MILab)
Previously: Postdoctoral Research Associate and Project Scientist
Carnegie Mellon University

About

I am a tenure-track Assistant Professor of Computer Science & Systems in the School of Engineering & Technology at the University of Washington Tacoma. I direct the Multimodal Intelligence Lab (MILab), where we study multimodal AI, embodied intelligence, and intelligent systems for open-world decision-making. I am also a member of the Graduate Faculty with doctoral endorsement through the University of Washington Graduate School.

My research explores how AI systems can learn, reason, and act from multimodal experience in complex real-world environments. At MILab, we develop foundation models, embodied agents, and deployable AI systems that connect perception, reasoning, planning, and action across visual, linguistic, temporal, and physical contexts, with applications in robotics, mobility, public safety, and human-centered decision support.

News

May 1, 2026: Incoming Ph.D. student Fengyi Wu received the 2026 UW GSFEI Top Scholar Award.
Apr 4, 2026: Two papers were accepted to ACL 2026: Sign-Language Datasets at Scale (Main) and GoVIG (Findings).
Jan 25, 2026: Lossless Hierarchical Speculative Decoding was accepted as an oral presentation at ICLR 2026.
Sep 18, 2025: MaxSup was accepted as an oral presentation at NeurIPS 2025.
Sep 3, 2025: Ph.D. student Yifei Dong received the Carwein-Andrews Ph.D. Fellowship.
Jun 12, 2025: Our Anti-UAV survey received the Best Paper Award at the CVPR 2025 Anti-UAV Workshop.
Mar 31, 2025: Incoming Ph.D. student Yifei Dong received the 2025 UW GSFEI Top Scholar Award.
Feb 26, 2025: Four papers were accepted to CVPR 2025: two main conference papers and two workshop papers.

Research

Multimodal AI for reliable perception, reasoning, and deployment.

My research develops AI systems that integrate visual, linguistic, spatial, and temporal evidence to support reliable understanding and decision-making. At MILab, we study foundation models, embodied agents, and deployable AI for robotics, mobility, public safety, and responsible decision support.

Core Question

How can AI systems use multimodal evidence to understand, predict, and act reliably in complex environments?

01

Multimodal Foundation Models

Learning and evaluation for models that reason across language, vision, audio, maps, and structured knowledge.

02

Embodied AI & World Models

Agents that connect perception, memory, prediction, planning, and interaction in dynamic physical environments.

03

Deployable AI for Mobility & Public Safety

AI systems for traffic and mobility intelligence, public safety sensing, secure perception, and robust operation in constrained environments.

Applied Collaborations

Selected collaborations connect this agenda to applied work in visual evidence analysis, mobility intelligence, public safety, and responsible decision support.


Teaching & Mentorship

Courses and research supervision in AI, robotics, graphics, and multimodal systems.

I teach courses that connect core computer science foundations with current advances in AI, robotics, computer graphics, and multimodal systems. My teaching emphasizes technical depth, hands-on implementation, empirical evaluation, reproducible experimentation, and open-ended projects. Current UW students at the Seattle, Tacoma, and Bothell campuses can enroll in these courses: undergraduates follow UW cross-campus registration requirements, while graduate students are not subject to cross-campus registration restrictions.

I also mentor undergraduate and M.S. students through the Multimodal Intelligence Lab (MILab), independent study, supervised research, thesis projects, and capstone projects. Students across all three UW campuses can pursue research opportunities and research credit through TCSS 499, TCSS 600, TCSS 700, or TCSS 702 with instructor approval. Interested students should contact me to discuss their interests and potential projects.

Courses and Supervision

TCSS 437 — Mobile Robotics
TCSS 455 — Introduction to Machine Learning
TCSS 458 — Computer Graphics
TCSS 590 — Vision-Language Models
TCSS 499 / 600 / 700 / 702 — Independent Research, Thesis, and Design Project Supervision


Prospective Students & Researchers

Ph.D. advising and MILab opportunities in multimodal AI.

As a member of the UW Graduate Faculty with doctoral endorsement, I advise Ph.D. students through the UW Graduate School and serve on doctoral supervisory committees across eligible UW graduate programs. My primary Ph.D. recruiting pathway is the Computer Science & Systems (CSS) Ph.D. program. I welcome inquiries from prospective Ph.D. students, postdoctoral researchers, and research assistants interested in multimodal AI, embodied intelligence, robotics, mobility intelligence, and responsible AI.

Students and researchers interested in MILab opportunities should complete the MILab interest form and email me at zhiqics@uw.edu with a brief note describing their background, research interests, and potential fit. Opportunities depend on research fit, preparation, project needs, funding availability, and mentoring capacity in a given quarter.

What to Include

Academic background: CV and unofficial transcript, if applicable
Research interests: topics, questions, and research directions of interest
Relevant experience: projects, publications, software systems, open-source contributions, or prior research
MILab fit: why you are interested in MILab and how your interests align with current projects

Competitive Ph.D. applicants may be considered for nomination to UW fellowships or GSFEI awards, subject to program procedures, eligibility requirements, nomination criteria, and funding availability.


Awards & Service

Intel Ph.D. Fellowship, 2017–2019
CVPR Anti-UAV Workshop Best Paper Award, 2025
ICCV Outstanding Reviewer, 2023
CSC–IBM Outstanding Student Scholarship, 2017–2019
ACM SCF Best Student Paper Award, 2016
ICPC Asia Regional Silver Medal, 2013
Scales Figure Scholarship, 2016 & 2018
Nominee, "Star of Self-Improvement of Chinese University Students," 2014
Technical contributor to The Washington Post's 2022 Pulitzer Prize-winning Public Service coverage


Selected Publications

Representative papers and systems grouped by research theme.

Multimodal Models & Efficient AI

Foundation models, generative systems, calibrated learning, and efficient decoding.

Emotion-LLaMA: Multimodal Emotion Recognition and Reasoning with Instruction Tuning (NeurIPS 2024) Code Website
MaxSup: Overcoming Representation Collapse in Label Smoothing (NeurIPS 2025, Oral) Code
Overcoming Joint Intractability with Lossless Hierarchical Speculative Decoding (ICLR 2026, Oral) Code
Towards Calibrated Robust Fine-Tuning of Vision-Language Models (NeurIPS 2024) Code
MetaDesigner: AI-Driven, User-Centric, Multilingual WordArt Synthesis (ICLR 2025) Demo
SHIELD: LLM-Driven Schema Induction for Predictive Analytics in EV Battery Supply-Chain Disruptions (EMNLP 2024, Industry Oral) Code Website
ChartReader: A Unified Framework for Chart Derendering and Comprehension (ICCV 2023) Code
FaceChain-ImagineID: Freely Crafting High-Fidelity Diverse Talking Faces from Disentangled Audio (CVPR 2024) Code
StableAnimator: High-Quality Identity-Preserving Human Image Animation (CVPR 2025) Code Website

Embodied AI & World Models

Navigation, world modeling, activity understanding, and multimodal reasoning.

Human-Aware Vision-and-Language Navigation (NeurIPS 2024, Spotlight) V2 Code V1 Code Project
Towards Unified World Models for Visual Navigation via Memory-Augmented Planning and Foresight (arXiv 2025) Code
Language-Conditioned World Modeling for Visual Navigation (arXiv 2026) Code
GSRFormer: Grounded Situation Recognition Transformer with Alternate Semantic Attention Refinement (ACM Multimedia 2022, Oral) Code
VDAct: A Video-grounded Dialogue Dataset and Metric for Event-driven Activities (AAAI 2025, Oral) Code & Data
ProMQA: Question Answering Dataset for Multimodal Procedural Activity Understanding (NAACL 2025, Oral) Code & Data

Mobility, Safety & Deployment

Transportation intelligence, public safety, secure sensing, and robust perception.

Rethinking Spatial Invariance of Convolutional Networks for Object Counting (CVPR 2022) Code
BlockGCN: Redefine Topology Awareness for Skeleton-Based Action Recognition (CVPR 2024) Code
DAMO-StreamNet: Optimizing Streaming Perception in Autonomous Driving (IJCAI 2023) Code
SHIELD: LLM-Driven Schema Induction for Predictive Analytics in EV Battery Supply-Chain Disruptions (EMNLP 2024, Industry Oral) Code Website
POPoS: Improving Efficient and Robust Facial Landmark Detection with Parallel Optimal Position Search (AAAI 2025) Code
Securing the Skies: A Comprehensive Survey on Anti-UAV Methods, Benchmarking, and Future Directions (CVPR 2025 Anti-UAV Workshop, Best Paper)