Zhi-Qi Cheng, Ph.D.

Assistant Professor of Computer Science & Systems
UW Graduate School &
Tacoma School of Engineering & Technology
University of Washington


Multimodal Generative AI · Embodied Intelligence · Intelligent Transportation

Email: zhiqics@uw.edu

Office Phone: +1 253-692-5538

Office: Milgard Hall 221.6

Short Bio

I am a tenure‑track Assistant Professor of Computer Science & Systems, jointly appointed in the UW Graduate School and the Tacoma School of Engineering & Technology at the University of Washington, and hold an adjunct appointment with the Paul G. Allen School of Computer Science & Engineering and the Washington State Transportation Center (TRAC). I direct the Multi‑Foundation Model Lab (MF‑Lab), advise Ph.D. students in the Computer Science & Systems Ph.D. program, and since June 2025 have served as a Research Scientist with the Meta AI AGI team. My work in Multimodal Generative AI, Embodied / Robotic Intelligence, and Intelligent Transportation has appeared in CVPR, ICCV, NeurIPS, ICLR, AAAI, IJCAI and ACM MM, and powered The Washington Post investigations that earned the 2022 Pulitzer Prize for Public Service.

Prior to UW, I spent seven years at Carnegie Mellon University’s Language Technologies Institute, rising from Research Associate and Post‑doc to Project Scientist / Instructor under the mentorship of Prof. Alexander G. Hauptmann and Prof. Teruko Mitamura, and collaborating closely with Prof. David R. Mortensen (KAIROS TA‑1 & Eratosthenes backend) and Prof. Alan W. Black (KAIROS TA‑2). I served as technical lead for the DARPA KAIROS and KAIROS‑Plus projects and contributed to the IARPA DIVA and NIST PSIAP projects. Earlier, I completed research internships at Alibaba DAMO Academy, Google Brain, and Microsoft Research, and received the Intel Ph.D. Fellowship, IBM Outstanding Student Scholarship, and Huawei Scholarship.

My research has been featured by The Washington Post, The New York Times, CBS News and other outlets, influencing public‑service reporting, safety analytics and mobility policy. Interested in joining? If you are excited about multimodal generation & reasoning, embodied AI, or large‑scale streaming perception, please email me—motivated students, post‑docs and collaborators are always welcome.

News

Research Interests

Multimodal Foundation Models

  • Large‑scale vision–language pre‑training
  • Instruction tuning & alignment
  • Emotion reasoning & affective computing
  • Parameter‑efficient & low‑rank adaptation

Embodied / Robotic Intelligence

  • Language‑conditioned navigation & manipulation
  • Sim‑to‑real transfer & domain adaptation
  • Multimodal sensing & scene understanding
  • Human‑aware task & motion planning

Intelligent Transportation

  • City‑scale traffic‑video analytics
  • Driver‑state monitoring & risk prediction
  • Route‑level mobility simulation
  • Edge AI for real‑time perception

Professional Experience

Assistant Professor (UW, Dec 2024 – present); Project Scientist (CMU, 2022–24); Research Associate (CMU, 2017–19)

  • Assistant Professor, University of Washington (2024 – present)

    • Teach AI/ML and Computer Graphics courses; supervise research-driven student projects.
    • Lead MF-Lab research on large-scale multimodal AI, with a focus on robotics and transportation-scale mobility analytics.
    • Graduate Faculty; collaborate closely with UW Traffic Center & the Paul G. Allen School.
  • Carnegie Mellon University — School of Computer Science, LTI (2017 – 2024)

    • Project Scientist / Instructor (2022–24): Taught “11-775 Large-Scale Multimedia Analysis” course with Prof. A. G. Hauptmann; technical lead for DARPA KAIROS & KAIROS-Plus projects.
    • Post-doc (2019–22): Advanced video-language representation & event extraction with Prof. T. Mitamura & Prof. A. G. Hauptmann; orchestrated multi-lab KAIROS integration.
    • Research Associate (2017–19): Under Prof. A. G. Hauptmann's guidance, built large-scale multimodal event analytics for NIST PSIAP & IARPA DIVA.
  • Research Assistant, City University of Hong Kong (2016 – 2017)

  • Industry Experience

    • Meta AI — External Consultant (2025 – present): accelerating multimodal foundation models and generative agents.
    • Microsoft Research — Human-Action AI Intern (2019).
    • Google Brain — AutoML Intern (2018), mentored by Dr. Quoc V. Le.
    • Alibaba DAMO Academy — Research Scientist (2015–2016), mentored by Prof. Xian-Sheng Hua; contributed to development of multimodal recommendation and retrieval systems including Pailitao and click-and-buy platforms.

Selected Publications

2025 Emotion-LLaMA – NeurIPS 24
  • Cheng et al. Emotion-LLaMA – NeurIPS 24; first multimodal LLaMA for emotion reasoning.
  • Oh et al. Calibrated Robust Fine-Tuning – NeurIPS 24; boosts VL-model reliability.
  • Li et al. Human-Aware VL Navigation – NeurIPS 24 Spotlight; bridges sim‑to‑real robot navigation.
  • He et al. MetaDesigner – ICLR 25; LLM‑driven artistic text design; WordArt API > 1 M calls.
  • Wang et al. Dataset Distillation via DF – CVPR 25; efficient data‑distillation that accelerates model training.
  • Tu et al. StableAnimator – CVPR 25; high‑quality ID‑preserving animation.
  • Imrattanatrai et al. Video-Grounded Dialogue – AAAI 25 Oral; new dataset and metric for event‑driven QA.
  • Xiang et al. POPoS – AAAI 25; efficient facial-landmark detection.
  • Hasegawa et al. ProMQA – NAACL 25 Oral; multimodal procedural QA dataset.
  • Huang et al. DyRoNet – WACV 25; dynamic routing for streaming perception.
  • Jiang et al. UCDR-Adapter – WACV 25; cross-domain VL retrieval.
  • Oh et al. BlackVIP++ – TPAMI 25; black-box visual prompting.
2024 MotionEditor – CVPR 24
  • Tu et al. MotionEditor – CVPR 24; content-aware diffusion editing.
  • Fang et al. PROS – CVPR 24; prompt-to-simulate retrieval.
  • Zhu et al. DCPT – ICRA 24; night-time UAV tracking.
  • Xu et al. FaceChain‑ImagineID – CVPR 24; high‑fidelity talking faces for digital humans; 9 k★ GitHub.
  • Zhou et al. BlockGCN – CVPR 24; topology-aware action recognition.
  • Cheng et al. Shield – EMNLP 24 Oral; LLM supply-chain analytics.
  • He et al. WordArt Designer API – NeurIPS ML4Creativity 24 Spotlight; Best Demo Award.
2023 Implicit Temporal Alignment – ICCV 23 Oral
  • Chen et al. HDFormer – IJCAI 23; compact SOTA 3-D pose.
  • He et al. DAMO-StreamNet – IJCAI 23; streaming perception.
  • Tu et al. Implicit Temporal Alignment – ICCV 23 Oral; VDN winner.
  • Cheng et al. ChartReader – ICCV 23; rule-free chart de-rendering.
  • He et al. WordArt Designer – EMNLP 23; first LLM typography API.
  • Bao et al. KeyPosS – ACM-MM 23 Oral; plug-and-play landmark detection.
  • Liu et al. Posynda – ACM-MM 23; robust 3-D pose adaptation.
≤ 2022 Rethinking Spatial Invariance – CVPR 22

View my full publication list on Google Scholar ↗

Honors & Awards

Pulitzer Prize 2022; Intel Ph.D. Fellowship 2017; IBM Outstanding Student Scholarship 2017–19

Pulitzer Prize for Public Service 2022

2nd-ranked technical expert for The Washington Post report.

Intel Ph.D. Fellowship 2017

Recognized for excellence in AI & semiconductor research.

IBM Outstanding Student Scholarship 2017–2019

Awarded by IBM to outstanding Chinese students studying overseas.

CVPR Anti‑UAV Workshop Best Paper 2025

“Securing the Skies: A Comprehensive Survey on Anti‑UAV Methods, Benchmarking, and Future Directions.”

ICCV Outstanding Reviewer 2023

Recognized as an Outstanding Reviewer for ICCV 2023.

Huawei Scholarship 2018

Achievements in AI & communication technology.

Scales Figure Scholarship 2016 & 2018

Image-processing talent award.

Tang Li Xin Scholarship 2017

Top 0.1 % of Chinese university students.

ACM SCF Best Student Paper 2016

Best student paper award at the ACM China Sichuan Chapter.

ICPC Asian Regional Silver Medal 2013

Team Silver in the ACM International Collegiate Programming Contest, Asian region.

“Star of Self-Improvement” Nominee 2014

National top 0.01 % honor.

Teaching

TCSS 455; TCSS 458; TCSS 590; Independent research supervision

University of Washington

  • TCSS 455
    Introduction to Machine Learning
    Aut’25 · T/Th 1:30 – 3:30 PM · CP 106
  • TCSS 458
    Computer Graphics
    Win’25 · M/W 3:40 – 5:40 PM · CP 325
  • TCSS 590
    Vision‑Language Models
    Spr’25 · M/W 3:40 – 5:40 PM · JOY 215
  • Independent research supervision — TCSS 499 / 600 / 702 (ongoing)

Carnegie Mellon University

  • 11‑775
    Large‑Scale Multimedia Analysis
    Fall’23 · M/W 5:00 – 6:20 PM · GHC 4307
  • 11‑775
    Large‑Scale Multimedia Analysis
    Spr’24 · M/W 5:00 – 6:20 PM · GHC 4102

Students Supervised

8 Ph.D.; 20 Master’s; 6 Undergrad; 7 High‑School mentees

Doctoral Students

University of Washington

  • Yifei Dong (GSFEI Top Scholar ’25; Columbia M.S.)
  • Fengyi Wu (RA @ UW; incoming Ph.D. ’26)

Carnegie Mellon University

  • Tingyao Hu (Ph.D., CMU; now @ Apple Research)
  • Yuxuan Zhou (RA @ CMU; Ph.D., MPI)
  • Changdae Oh (RA @ CMU; Ph.D., UW–Madison)
  • Hyesu Lim (RA @ CMU; Ph.D., KAIST‑AI)
  • Mingze Sun (RA @ CMU; Ph.D., Tsinghua)
  • Xudong Yan (RA @ CMU; Ph.D., CityU Macau)

Master’s Students

University of Washington

  • Sanjian Zhang
  • Guanyu Chen
  • Charlie LeWarne

Carnegie Mellon University

  • Siyao Li (now @ Amazon)
  • Heng Li · Xiwen Chen · Sean Chang · Joong Ho Choi · Yuran Wang

Other Institutions

  • Hanbing Liu (Tsinghua)
  • Yuzhi Hu (Boston U)
  • Bukun Ren (UC Berkeley)
  • Yuxiang Lin · Fan Zhang (Georgia Tech)
  • Shuyuan Tu (Fudan)
  • Minghan Li (SWJTU)
  • Zebang Cheng · Xiang Li · Jue Wang · Xiaolong Wang (SZTU)

Undergraduate Students

University of Washington

  • Masumi Yano
  • Johnny Garnica

Other Institutions

  • Yusen Hu (Imperial → MSR@CMU)
  • Aike Shi (Georgia Tech)
  • Wei Liu (U-Michigan)
  • Zekai Li (NUS)

High-School Mentoring

  • Jasmine Liu · Aimee Zhang · Susan Wang (MIT)
  • Zejia Shao (UIUC)
  • Yulun Wu (Harvey Mudd)
  • Chiyu Zhou (Imperial)
  • Qiyuan Gu (UChicago)

Advisees won ISEF & Yau Science Awards; now at top universities.

Sponsored Projects (selected)

Key sponsored projects that funded my Ph.D. and post‑doctoral research (completed).

Mobility 21 UTC

Semantic perception module for AVs to detect & predict road‑user behaviour (2022 – 2023) (final report).

Digital Twins for Manufacturing Resiliency

ML‑driven twin platform forecasting material‑supply shocks via multimedia analytics (project page; arXiv preprint).

CMU–AIST Bridge Project

Human‑Inclusive Dynamic Control & Navigation (CMU–AIST Bridge, 2024 – present). Key publications: HA‑VLN (NeurIPS 24 Spotlight); ProMQA (NAACL 25 Oral); VG Dialogue (AAAI 25 Oral).

NIST PSIAP 2017

Geo-spatial fusion of live & crowd-sourced video for first-responder dashboards (2017 – 2019).

DARPA KAIROS

Schema‑guided event understanding from video, image & text streams (Tech‑lead, 2019 – 2024) (final system description).

DARPA AIDA

Multi-hypothesis reasoning over multimodal sources to score competing narratives (2018 – 2023).

IARPA DIVA

Real-time multi-camera analytics with graph reasoning for critical-activity detection (2017 – 2021).

Media Coverage (selected)

Selected press featuring my CMU-era research & technology deployments.

Washington Post Apr 15 2021

“17 requests for backup in 78 minutes”
Provided crowd-counting analytics for the Capitol riot.

Washington Post Nov 21 2021

“Astroworld Festival Analysis”
Assisted in crowd-density reconstruction.

Washington Post Aug 25 2021

“Anatomy of a crackdown”
Multimedia gunshot & shooter analysis.

Washington Post Jun 2022

“How Shireen Abu Akleh was killed”
Contributed gunshot & shooter trajectory analysis.

Professional Service (selected)

Leadership & Organization

  • Area Chair — NAACL 2025
  • Session Chair — IEEE ICASSP 2023
  • Workshop Organizer — CVPR Anti-UAV Series 2025

Conference Committees

  • ACM Multimedia 2019 – 24
  • IEEE CVPR 2018 – 24
  • NeurIPS 2020 – 24

Additional (selected):

  • AAAI · ICLR · EMNLP · NAACL · IJCAI
  • ICCV · ECCV · WACV · ICASSP · ICMR · AISTATS
  • MM Asia · ICME · ICML

Journal Reviewing

IEEE: T-PAMI · T-MM · TIP · TNNLS · T-CSVT · T-Cybernetics · IoT J · J-STSP · SPL · Sensors(+Letters) · RA-L · OJ-SP · TCDS

ACM: TOMM · TECS

Others: IJCV · Pattern Recognition · Information Fusion · CVIU · etc.

Community & Outreach

  • Judge & Advisor — Regeneron ISEF 2019 – 24
  • Advisor — S.-T. Yau High-School Science Award 2019 – 24
  • Founder — CMU Annual Ice-Cream Social 2021 –
  • Organizer — CMU Postdoc Association 2021 – 23
  • Volunteer — Greater Pittsburgh Community Food Bank 2018 – 19