Zhi-Qi Cheng, Ph.D.

Assistant Professor of Computer Science and Systems,
Tacoma School of Engineering & Technology
University of Washington


Multimodal Generative AI · Embodied Intelligence · Intelligent Transportation

Email: zhiqics@uw.edu

Office Phone: +1 253-692-5538

Office: Milgard Hall 221.6

Short Bio

I am a tenure-track Assistant Professor in the Tacoma School of Engineering & Technology at the University of Washington (UW), and an Endorsed Graduate Faculty member qualified to chair doctoral committees. I advise Ph.D. students in the Computer Science & Systems program and direct the Multi-Foundation Model Lab (MF-Lab). Our work on large-scale vision-language models, embodied agents, and mobility-centric robotics is conducted in close collaboration with UW TRAC and the Paul G. Allen School of CSE. Off-campus, I consult part-time for the Meta AI AGI team.

Earlier, during my seven-year tenure at Carnegie Mellon University’s Language Technologies Institute (part of the School of Computer Science), I progressed from Research Associate to Post-doctoral Associate and finally Project Scientist / Course Instructor, co-teaching 11-775 Large-Scale Multimedia Analysis. I led R&D for DARPA KAIROS, IARPA DIVA, and NIST PSIAP; our event-analysis system powered The Washington Post investigations that earned the 2022 Pulitzer Prize for Public Service. Previously, I was a joint Ph.D. student at Carnegie Mellon University and City University of Hong Kong, completed internships at Google Brain and Microsoft Research, and received the Intel Ph.D. Fellowship and the IBM Outstanding Student Scholarship.

News

  • 15 Jun 2025 | Dr. Cheng will join the Meta AI AGI team as a part-time research scientist.
  • 31 Mar 2025 | Incoming Ph.D. student Yifei Dong receives UW GSFEI Top Scholar Award 2025.
  • 28 Mar 2025 | Dr. Cheng serves as Area Chair for NAACL 2025.
  • 10 Feb 2025 | Four papers accepted at CVPR 2025 (2 main‑conference, 2 workshop papers).
  • 05 Feb 2025 | One paper accepted at ICLR 2025.
  • 20 Jan 2025 | Two papers accepted at AAAI 2025, including one Oral presentation.

Research Interests

Multimodal Foundation Models

Vision–language pre-training, instruction tuning, emotion reasoning, efficient adaptation.

Embodied / Robotic Intelligence

Language-conditioned navigation, sim-to-real transfer, multimodal sensing, human-aware planning.

Intelligent Transportation

City-scale traffic-video analytics, driver-state monitoring, route-level risk prediction, mobility simulation.

Professional Experience

Assistant Professor (UW, Dec 2024 – present); Project Scientist (CMU, 2022–24); Research Associate (CMU, 2017–19)

  • Assistant Professor, University of Washington (Dec 2024 – present)

    • Teach AI/ML and Computer Graphics courses; supervise research-driven student projects.
    • Spearhead MF-Lab research in large-scale multimodal AI for robotics and transportation mobility analytics.
    • Graduate Faculty; collaborate closely with UW Traffic Center & the Paul G. Allen School.
  • Carnegie Mellon University — School of Computer Science, LTI (2017 – 2024)

    • Project Scientist / Instructor (2022-24): Co-taught “11-775 Large-Scale Multimedia Analysis” with Prof. A. G. Hauptmann; served as technical lead for the DARPA KAIROS & KAIROS-Plus projects.
    • Post-doc (2019-22): Advanced video-language representation & event extraction with Prof. T. Mitamura & Prof. A. G. Hauptmann; orchestrated multi-lab KAIROS integration.
    • Research Associate (2017-19): Under Prof. A. G. Hauptmann's guidance, built large-scale multimodal event analytics for NIST PSIAP & IARPA DIVA.
  • Research Assistant, City University of Hong Kong (Dec 2016 – Nov 2017)

  • Industry Experience

    • Meta AI — External Consultant (2025 – present): accelerating multimodal foundation models and generative agents.
    • Microsoft Research — Human-Action AI Intern (2019).
    • Google Brain — AutoML Intern (2018), mentored by Dr. Quoc V. Le.
    • Alibaba DAMO Academy — Research Scientist (2015–2016), mentored by Prof. Xian-Sheng Hua; contributed to development of multimodal recommendation and retrieval systems including Pailitao and click-and-buy platforms.

Selected Publications

2025 Emotion-LLaMA – NeurIPS 24
  • Cheng et al. Emotion-LLaMA – NeurIPS 24; first multimodal LLaMA for emotion reasoning.
  • Oh et al. Calibrated Robust Fine-Tuning – NeurIPS 24; boosts VL-model reliability.
  • Li et al. Human-Aware VL Navigation – NeurIPS 24 Spotlight; bridges sim‑to‑real robot navigation.
  • He et al. MetaDesigner – ICLR 25; LLM‑driven artistic text design; WordArt API > 1 M calls.
  • Wang et al. Dataset Distillation via DF – CVPR 25; efficient data‑distillation that accelerates model training.
  • Tu et al. StableAnimator – CVPR 25; high‑quality ID‑preserving animation.
  • Imrattanatrai et al. Video-Grounded Dialogue – AAAI 25 Oral; new dataset and metric for event‑driven QA.
  • Xiang et al. POPoS – AAAI 25; efficient facial-landmark detection.
  • Hasegawa et al. ProMQA – NAACL 25 Oral; multimodal procedural QA dataset.
  • Huang et al. DyRoNet – WACV 25; dynamic routing for streaming perception.
  • Jiang et al. UCDR-Adapter – WACV 25; cross-domain VL retrieval.
  • Oh et al. BlackVIP++ – TPAMI 25; black-box visual prompting.
2024 MotionEditor – CVPR 24
  • Tu et al. MotionEditor – CVPR 24; content-aware diffusion editing.
  • Fang et al. PROS – CVPR 24; prompt-to-simulate retrieval.
  • Zhu et al. DCPT – ICRA 24; night-time UAV tracking.
  • Xu et al. FaceChain‑ImagineID – CVPR 24; high‑fidelity talking faces for digital humans; 9 k★ GitHub.
  • Zhou et al. BlockGCN – CVPR 24; topology-aware action recognition.
  • Cheng et al. Shield – EMNLP 24 Oral; LLM supply-chain analytics.
  • He et al. WordArt Designer API – NeurIPS ML4Creativity 24 Spotlight; Best Demo Award.
2023 Implicit Temporal Alignment – ICCV 23 Oral
  • Chen et al. HDFormer – IJCAI 23; compact SOTA 3-D pose.
  • He et al. DAMO-StreamNet – IJCAI 23; streaming perception.
  • Tu et al. Implicit Temporal Alignment – ICCV 23 Oral; VDN winner.
  • Cheng et al. ChartReader – ICCV 23; rule-free chart de-rendering.
  • He et al. WordArt Designer – EMNLP 23; first LLM typography API.
  • Bao et al. KeyPosS – ACM-MM 23 Oral; plug-and-play landmark detection.
  • Liu et al. Posynda – ACM-MM 23; robust 3-D pose adaptation.
≤ 2022 Rethinking Spatial Invariance – CVPR 22

View my full publication list on Google Scholar ↗

Honors & Awards

Pulitzer Prize 2022; Intel Ph.D. Fellowship 2017; IBM Outstanding Student Scholarship 2017–19

Pulitzer Prize for Public Service 2022

2nd-ranked technical expert for The Washington Post Capitol-Riot analysis.

Intel Ph.D. Fellowship 2017

Recognized for excellence in AI & semiconductor research.

IBM Outstanding Student Scholarship 2017–2019

Huawei Scholarship 2018

Achievements in AI & communication technology.

Scales Figure Scholarship 2016 & 2018

Image-processing talent award.

Tang Li Xin Scholarship 2017

Top 0.1% of Chinese university students.

ACM SCF Best Student Paper 2016

ICPC Asian Regional Silver Medal 2013

“Star of Self-Improvement” Nominee 2014

National top 0.01% honor.

Teaching

TCSS 455; TCSS 458; TCSS 590; Independent research supervision

University of Washington

  • TCSS 455
    Introduction to Machine Learning
    Aut’25 · T/Th 1:30 – 3:30 PM · CP 106
  • TCSS 458
    Computer Graphics
    Win’25 · M/W 3:40 – 5:40 PM · CP 325
  • TCSS 590
    Vision‑Language Models
    Spr’25 · M/W 3:40 – 5:40 PM · JOY 215
  • Independent research supervision — TCSS 499 / 600 / 702 (ongoing)

Carnegie Mellon University

  • 11‑775
    Large‑Scale Multimedia Analysis
    Fall’23 · M/W 5:00 – 6:20 PM · GHC 4307
  • 11‑775
    Large‑Scale Multimedia Analysis
    Spr’24 · M/W 5:00 – 6:20 PM · GHC 4102

Students Supervised

9 Ph.D.; 6 Master’s; 5 Undergrad; 6 High‑School mentees

Doctoral Students

University of Washington

  • Yifei Dong (GSFEI Top Scholar ’25; Columbia M.S.)
  • Fengyi Wu (RA @ UW; incoming Ph.D. ’26)
  • Tianyu Wang (RA @ UW; Ph.D., Georgia Tech)

Carnegie Mellon University

  • Tingyao Hu (co-advised; now @ Apple Research)
  • Yuxuan Zhou (Ph.D. candidate, MPI)
  • Changdae Oh (Ph.D. candidate, UW-Madison)
  • Hyesu Lim (Ph.D. candidate, KAIST-AI)
  • Mingze Sun (Ph.D. candidate, Tsinghua)
  • Xudong Yan (Ph.D. candidate, CityU Macau)

Master’s Students

University of Washington

  • Sanjian Zhang
  • Guanyu Chen
  • Charlie LeWarne

Carnegie Mellon University

  • Siyao Li (now @ Amazon)
  • Heng Li · Xiwen Chen · Sean Chang · Joong Ho Choi · Yuran Wang

Other Institutions

  • Hanbing Liu (Tsinghua)
  • Yuzhi Hu (Boston U)
  • Bukun Ren (UC Berkeley)
  • Yuxiang Lin · Fan Zhang (Georgia Tech)
  • Shuyuan Tu (Fudan)
  • Minghan Li (SWJTU)
  • Zebang Cheng · Xiang Li · Jue Wang · Xiaolong Wang (SZTU)

Undergraduate Students

University of Washington

  • Masumi Yao
  • Johnny Garnica

Other Institutions

  • Yusen Hu (Imperial → MSR@CMU)
  • Aike Shi (Georgia Tech)
  • Wei Liu (U-Michigan)
  • Zekai Li (NUS)

High-School Mentoring

  • Jasmine Liu · Aimee Zhang · Susan Wang (MIT)
  • Zejia Shao (UIUC)
  • Yulun Wu (Harvey Mudd)
  • Chiyu Zhou (Imperial)
  • Qiyuan Gu (UChicago)

Advisees have won ISEF & Yau Science Awards and are now at top universities.

Sponsored Projects (selected)

Key sponsored projects I led or contributed to during my CMU tenure (completed).

Mobility 21 UTC

Semantic perception module for AVs to detect & predict road-user behaviour (2022 – 2023).

Digital Twins for Manufacturing Resiliency

ML-driven twin platform forecasting material-supply shocks via multimedia analytics (MFI ’23 RFP).

HICAN

Human‑Inclusive Dynamic Control and Navigation for next‑generation robotics (CMU–AIST Bridge project, 2024 – present).

NIST PSIAP 2017

Geo-spatial fusion of live & crowd-sourced video for first-responder dashboards (2017 – 2019).

DARPA KAIROS

Schema-guided event understanding from video, image & text streams (Tech-lead, 2019 – 2024).

DARPA AIDA

Multi-hypothesis reasoning over multimodal sources to score competing narratives (2018 – 2023).

DARPA GAILA

Grounded AI that acquires language like children via visual context (2019 – 2022).

IARPA DIVA

Real-time multi-camera analytics with graph reasoning for critical-activity detection (2017 – 2021).

Media Coverage (selected)

Selected press featuring my CMU-era research & technology deployments.

Washington Post Apr 15 2021

“17 requests for backup in 78 minutes”
Provided crowd-counting analytics for the Capitol riot.

Washington Post Nov 21 2021

“Astroworld Festival Analysis”
Assisted in crowd-density reconstruction.

Washington Post Aug 25 2021

“Anatomy of a crackdown”
Multimedia gunshot & shooter analysis.

Washington Post Jun 2022

“How Shireen Abu Akleh was killed”
Contributed gunshot & shooter trajectory analysis.

Professional Service (selected)

Leadership & Organization

  • Area Chair — NAACL 2025
  • Session Chair — IEEE ICASSP 2023
  • Workshop Organizer — CVPR Anti-UAV Series 2025

Conference Committees

  • ACM Multimedia 2019 – 24
  • IEEE CVPR 2018 – 24
  • NeurIPS 2020 – 24

Additional (selected):

  • AAAI · ICLR · EMNLP · NAACL · IJCAI
  • ICCV · ECCV · WACV · ICASSP · ICMR · AISTATS
  • MM Asia · ICME · ICML

Journal Reviewing

IEEE: T-PAMI · T-MM · TIP · TNNLS · T-CSVT · T-Cybernetics · IoT J · J-STSP · SPL · Sensors(+Letters) · RA-L · OJ-SP · TCDS

ACM: TOMM · TECS

Others: IJCV · Pattern Recognition · Information Fusion · CVIU · etc.

Community & Outreach

  • Judge & Advisor — Regeneron ISEF 2019 – 24
  • Advisor — S.-T. Yau High-School Science Award 2019 – 24
  • Founder — CMU Annual Ice-Cream Social 2021 –
  • Organizer — CMU Postdoc Association 2021 – 23
  • Volunteer — Greater Pittsburgh Community Food Bank 2018 – 19