Zhi-Qi Cheng, Ph.D.

Assistant Professor of Computer Science & Systems

Multimodal Generative AI · Embodied Intelligence · Intelligent Transportation

Ex‑CMU, Google, Microsoft · Intel Ph.D. Fellowship · IBM Outstanding Student Scholarship

  • 📧 Email: zhiqics@uw.edu
  • ☎︎ Phone: 412‑623‑9121
  • 📍 Office: Milgard Hall 221.6

Short Bio

I am a tenure‑track Assistant Professor of Computer Science & Systems, jointly appointed in the UW Graduate School and the Tacoma School of Engineering & Technology at the University of Washington. I direct the Multi‑Foundation Model Lab (MF‑Lab), advise Ph.D. students in the Computer Science & Systems Ph.D. program, and since June 2025 have served as a Visiting Faculty Researcher with the Meta AI AGI team. My research on Multimodal Generative AI, Embodied and Robotic Intelligence, and Intelligent Transportation—in close collaboration with the Paul G. Allen School of Computer Science & Engineering and the Washington State Transportation Center (TRAC)—has appeared in CVPR, ICCV, NeurIPS, ICLR, AAAI, IJCAI and ACM MM, and contributed to The Washington Post investigations that earned the 2022 Pulitzer Prize for Public Service.

Prior to UW, I spent seven years at Carnegie Mellon University’s Language Technologies Institute, rising from Research Associate and Post‑doc to Project Scientist / Instructor under the mentorship of Prof. Alexander G. Hauptmann and Prof. Teruko Mitamura, and collaborating closely with Prof. David R. Mortensen (KAIROS TA‑1 & Eratosthenes backend) and Prof. Alan W. Black (KAIROS TA‑2). I served as technical lead for the DARPA KAIROS and KAIROS‑Plus projects and contributed to the IARPA DIVA and NIST PSIAP projects. Earlier, I completed research internships at Alibaba DAMO Academy, Google Brain, and Microsoft Research, and received the Intel Ph.D. Fellowship and the IBM Outstanding Student Scholarship.

My research has been featured by The Washington Post, The New York Times, CBS News and other outlets, influencing public‑service reporting, safety analytics and mobility policy.

Interested in joining? I am recruiting Ph.D. students and postdoctoral researchers whose interests align with my research directions: Multimodal Generative AI, Embodied & Robotic Intelligence, and Intelligent Transportation. If you are excited about these areas, please email me or fill in this form.

Research Interests

Multimodal Foundation Models

AI that reads, sees, and generates across modalities.

  • Vision–language pre-training
  • Instruction & alignment
  • Emotion reasoning & affective computing
  • Efficient adaptation (LoRA/PEFT)
Embodied / Robotic Intelligence

Agents that perceive, reason, and act safely.

  • Language-guided navigation & manipulation
  • Sim-to-real robustness
  • Multimodal sensing & scene understanding
  • Human-aware planning
Intelligent Transportation

Mobility as a living lab—measuring real-world impact.

  • Traffic-video analytics
  • Risk & driver-state modeling
  • Mobility simulation
  • Edge AI for real-time perception

Selected Publications

View my full publication list on Google Scholar ↗

2025

Emotion-LLaMA

Emotion-LLaMA: Multimodal Emotion Recognition and Reasoning with Instruction Tuning

[Code][Colab]

Zebang Cheng; Zhi-Qi Cheng*; Jun-Yan He; Kai Wang; Yuxiang Lin; Zheng Lian; Xiaojiang Peng; Alexander Hauptmann

(NeurIPS 2024) Advances in Neural Information Processing Systems, 2024

Calibrated Robust Fine-Tuning

Towards Calibrated Robust Fine-Tuning of Vision-Language Models

[Code]

Changdae Oh; Hyesu Lim; Mijoo Kim; Dongyoon Han; Sangdoo Yun; Jaegul Choo; Alexander Hauptmann; Zhi-Qi Cheng*; Kyungwoo Song

(NeurIPS 2024; CMU 11775 course project) Advances in Neural Information Processing Systems, 2024

Human-Aware VL Navigation

Human-Aware Vision-and-Language Navigation: Bridging Simulation to Reality with Dynamic Human Interactions

[Code][Project Page]

Heng Li; Minghan Li; Zhi-Qi Cheng*; Yifei Dong; Yuxuan Zhou; Jun-Yan He; Qi Dai; Teruko Mitamura; Alexander Hauptmann

(NeurIPS 2024 Spotlight) Advances in Neural Information Processing Systems, 2024

MetaDesigner

MetaDesigner: Advancing Artistic Typography Through AI-Driven, User-Centric, and Multilingual WordArt Synthesis

[Project Page]

Jun-Yan He; Zhi-Qi Cheng*; Chenyang Li; Jingdong Sun; Qi He; Wangmeng Xiang; Hanyuan Chen; Jin-Peng Lan; Xianhui Lin; Kang Zhu; et al.

(ICLR 2025; WordArt; ~1M visits) International Conference on Learning Representations, 2025

Dataset Distillation via DF

Emphasizing Discriminative Features for Dataset Distillation in Complex Scenarios

[Code]

Kai Wang; Zekai Li; Zhi-Qi Cheng*; Samir Khaki; Ahmad Sajedi; Ramakrishna Vedantam; Konstantinos N. Plataniotis; Alexander Hauptmann; Yang You

(CVPR 2025) IEEE/CVF Computer Vision and Pattern Recognition Conference, 2025

StableAnimator

StableAnimator: High-Quality Identity-Preserving Human Image Animation

[Code][Huggingface][Youtube]

Shuyuan Tu; Zhen Xing; Xintong Han; Zhi-Qi Cheng; Qi Dai; Chong Luo; Zuxuan Wu

(CVPR 2025) IEEE/CVF Computer Vision and Pattern Recognition Conference, 2025

Video-Grounded Dialogue

Video-Grounded Dialogue: New dataset and metric for event-driven QA.

[Code]

Wiradee Imrattanatrai; Masaki Asada; Kimihiro Hasegawa; Zhi-Qi Cheng; Ken Fukuda; Teruko Mitamura

(AAAI 2025 Oral) AAAI Conference on Artificial Intelligence, 2025

POPoS

POPoS: Improving Efficient and Robust Facial Landmark Detection with Parallel Optimal Position Search

[Code]

Chong-Yang Xiang; Jun-Yan He; Zhi-Qi Cheng; Xiao Wu; Xian-Sheng Hua

(AAAI 2025) AAAI Conference on Artificial Intelligence, 2025

ProMQA

ProMQA: Question Answering Dataset for Multimodal Procedural Activity Understanding

[Code]

Kimihiro Hasegawa; Wiradee Imrattanatrai; Zhi-Qi Cheng; Masaki Asada; Susan Holm; Yuran Wang; Ken Fukuda; Teruko Mitamura

(NAACL 2025 Oral) Annual Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics, 2025

DyRoNet

DyRoNet: Dynamic Routing and Low-Rank Adapters for Autonomous-Driving Streaming Perception

[Code][Project Page]

Xiang Huang; Zhi-Qi Cheng*; Jun-Yan He; Chenyang Li; Wangmeng Xiang; Baigui Sun

(WACV 2025) IEEE/CVF Winter Conference on Applications of Computer Vision, 2025

UCDR-Adapter

UCDR-Adapter: Exploring Adaptation of Pre-Trained Vision–Language Models for Universal Cross-Domain Retrieval

[Code]

Haoyu Jiang; Zhi-Qi Cheng*; Gabriel Moreira; Jiawen Zhu; Jingdong Sun; Bukun Ren; Jun-Yan He; Qi Dai; Xian-Sheng Hua

(WACV 2025) IEEE/CVF Winter Conference on Applications of Computer Vision, 2025

BlackVIP++

Black-Box Visual Prompting for Robust Adaptation of Foundation Models

[Code]

Changdae Oh; Gyeongdeok Seo; Geunyoung Jung; Zhi-Qi Cheng*; Hosik Choi; Jiyoung Jung; Kyungwoo Song

(BlackVIP (CVPR'23 ext.), submitted to TPAMI) IEEE Transactions on Pattern Analysis and Machine Intelligence, 2025

2024

BlockGCN

BlockGCN: Redefine Topology Awareness for Skeleton-Based Action Recognition

[Code]

Yuxuan Zhou; Xudong Yan; Zhi-Qi Cheng*; Yan Yan; Qi Dai; Xian-Sheng Hua

(CVPR 2024) IEEE/CVF Computer Vision and Pattern Recognition Conference, 2024

MotionEditor

MotionEditor: Editing Video Motion via Content-Aware Diffusion

[Code]

Shuyuan Tu; Qi Dai; Zhi-Qi Cheng; Han Hu; Xintong Han; Zuxuan Wu; Yu-Gang Jiang

(CVPR 2024) IEEE/CVF Computer Vision and Pattern Recognition Conference, 2024

PROS

PROS: Prompting-to-Simulate Generalized Knowledge for Universal Cross-Domain Retrieval

[Code]

Kaipeng Fang; Jingkuan Song; Lianli Gao; Pengpeng Zeng; Zhi-Qi Cheng; Xiyao Li; Heng-Tao Shen

(CVPR 2024) IEEE/CVF Computer Vision and Pattern Recognition Conference, 2024

Shield

Shield: LLM-Driven Schema Induction for Predictive Analytics in EV Battery Supply-Chain Disruptions

[Code]

Zhi-Qi Cheng; Yifei Dong; Aike Shi; Wei Liu; Yuzhi Hu; Jason O’Connor; Alexander Hauptmann; Kate Whitefoot

(EMNLP 2024 Oral, Industry Track) Conference on Empirical Methods in Natural Language Processing, 2024

DCPT

DCPT: Darkness Clue-Prompted Tracking in Night-Time UAVs

[Code]

Jiawen Zhu; Huayi Tang; Zhi-Qi Cheng; Jun-Yan He; Bin Luo; Shihao Qiu; Shengming Li; Huchuan Lu

(ICRA 2024) IEEE International Conference on Robotics and Automation, 2024

FaceChain-ImagineID

FaceChain-ImagineID: High-Fidelity Diverse Talking Faces from Disentangled Audio

[Code]

Chao Xu; Yang Liu; Jiazheng Xing; Weida Wang; Mingze Sun; Jun Dan; Tianxin Huang; Siyuan Li; Zhi-Qi Cheng; Ying Tai; et al.

(CVPR 2024; Part of FaceChain (9.3K GitHub stars)) IEEE/CVF Computer Vision and Pattern Recognition Conference, 2024

WordArt Designer API

WordArt Designer API: Artistic Typography with LLMs on ModelScope

[Code]

Jun-Yan He; Zhi-Qi Cheng*; Chenyang Li; Jingdong Sun; Xianhui Lin; Xiaoyang Kang; Zengke Jin; Yusen Hu; Bin Luo; et al.

Spotlight @ NeurIPS ML4Creativity workshop

2023

DAMO-StreamNet

DAMO-StreamNet: Optimizing Streaming Perception in Autonomous Driving

[Code]

Jun-Yan He; Zhi-Qi Cheng*; Chenyang Li; Wangmeng Xiang; Binghui Chen; Bin Luo; Yifeng Geng; Xuansong Xie

(IJCAI 2023) International Joint Conference on Artificial Intelligence, 2023

HDFormer

HDFormer: High-Order Directed Transformer for 3D Human Pose Estimation

[Code]

Hanyuan Chen; Jun-Yan He; Wangmeng Xiang; Zhi-Qi Cheng*; Wei Liu; Hanbing Liu; Bin Luo; Yifeng Geng; Xuansong Xie

(IJCAI 2023) International Joint Conference on Artificial Intelligence, 2023

Implicit Temporal Alignment

Implicit Temporal Modeling with Learnable Alignment for Video Recognition

[Code]

Shuyuan Tu; Qi Dai; Zuxuan Wu; Zhi-Qi Cheng; Han Hu; Yu-Gang Jiang

(ICCV 2023 Oral) International Conference on Computer Vision, 2023

ChartReader

ChartReader: A Unified Framework for Chart Derendering and Comprehension Without Heuristic Rules

[Code]

Zhi-Qi Cheng; Qi Dai; Alexander G. Hauptmann

(ICCV 2023) International Conference on Computer Vision, 2023

WordArt Designer

WordArt Designer: User-Driven Artistic Typography Synthesis Using Large Language Models

[Project Page]

Jun-Yan He; Zhi-Qi Cheng*; Chenyang Li; Jingdong Sun; Wangmeng Xiang; Xianhui Lin; Xiaoyang Kang; Zengke Jin; Yusen Hu; Bin Luo; et al.

(EMNLP 2023, Industry Track) Conference on Empirical Methods in Natural Language Processing, 2023

KeyPosS

KeyPosS: Plug-and-Play Facial Landmark Detection through GPS-Inspired True-Range Multilateration

[Code]

Xu Bao; Zhi-Qi Cheng*; Jun-Yan He; Wangmeng Xiang; Chenyang Li; Jingdong Sun; Hanbing Liu; Wei Liu; Bin Luo; Yifeng Geng; et al.

(ACM-MM 2023) ACM International Conference on Multimedia, 2023

Posynda

Posynda: Multi-Hypothesis Pose Synthesis Domain Adaptation for Robust 3D Human Pose Estimation

[Code]

Hanbing Liu; Jun-Yan He; Zhi-Qi Cheng*; Wangmeng Xiang; Qize Yang; Wenhao Chai; Gaoang Wang; Xu Bao; Bin Luo; Yifeng Geng; et al.

(ACM-MM 2023) ACM International Conference on Multimedia, 2023

≤ 2022

Rethinking Spatial Invariance

Rethinking Spatial Invariance of Convolutional Networks for Object Counting

[Code]

Zhi-Qi Cheng; Qi Dai; Hong Li; Jingkuan Song; Xiao Wu; Alexander G. Hauptmann

(CVPR 2022; used for WaPo's Pulitzer coverage) IEEE/CVF Computer Vision and Pattern Recognition Conference, 2022

GSRFormer

GSRFormer: Grounded Situation Recognition Transformer with Alternate Semantic Attention

[Code]

Zhi-Qi Cheng; Qi Dai; Siyao Li; Teruko Mitamura; Alexander Hauptmann

(ACM-MM 2022 Oral) ACM International Conference on Multimedia, 2022

Learning Spatial Awareness

Learning Spatial Awareness to Improve Crowd Counting

Zhi-Qi Cheng*; Jun-Xiu Li*; Qi Dai; Xiao Wu; Alexander G. Hauptmann

(ICCV 2019 (Oral); CMU's INF-Public-Safety-Tools) International Conference on Computer Vision, 2019

Video2Shop

Video2Shop: Exact Matching Clothes in Videos to Online Shopping Images

Zhi-Qi Cheng; Xiao Wu; Yang Liu; Xian-Sheng Hua

(CVPR 2017; Alibaba Pailitao (~500M users)) IEEE/CVF Computer Vision and Pattern Recognition Conference, 2017

Video E-commerce++

Video eCommerce++: Toward Large-Scale Online Video Advertising

Zhi-Qi Cheng; Xiao Wu; Yang Liu; Xian-Sheng Hua

(ACM-MM 2017 Oral; ACM-SCF Best Student Paper) ACM International Conference on Multimedia, 2016