Multimodal Foundation Models
Learning and evaluation for models that reason across language, vision, audio, maps, and structured knowledge.
Pronunciation: Zhì-qí Chéng ("Jih-Chee Chung")
Chinese name: Chéng Zhì-qí
I am a tenure-track Assistant Professor of Computer Science & Systems in the School of Engineering & Technology at the University of Washington Tacoma. I direct the Multimodal Intelligence Lab (MILab), where we study multimodal AI, embodied intelligence, and intelligent systems for open-world decision-making. I am also a member of the Graduate Faculty with doctoral endorsement through the University of Washington Graduate School.
My research explores how AI systems can learn, reason, and act from multimodal experience in complex real-world environments. At MILab, we develop foundation models, embodied agents, and deployable AI systems that connect perception, reasoning, planning, and action across visual, linguistic, temporal, and physical contexts, with applications in robotics, mobility, public safety, and human-centered decision support.
Before joining the University of Washington, I spent seven years at Carnegie Mellon University's School of Computer Science, primarily in the Language Technologies Institute, first as a research associate and later as a postdoctoral research associate and project scientist. My CMU work focused on multimodal understanding, event-centric reasoning, and large-scale AI systems that integrate video, language, audio, maps, and knowledge sources. During this period, Prof. Alexander G. Hauptmann was my long-term advisor at CMU, and Prof. Teruko Mitamura mentored my postdoctoral research.
From 2019 to 2024, I was a core technical lead for Carnegie Mellon's DARPA KAIROS system, a collaborative CMU effort in event understanding and schema-guided knowledge integration. I also contributed to related DARPA, IARPA, and NIST programs, including DARPA AIDA, DARPA GAILA, IARPA DIVA, and NIST PSIAP. Across these efforts, my work spanned multimodal perception, grounded language understanding, reasoning, and deployable intelligent systems.
My publications appear in venues such as NeurIPS, ICLR, CVPR, ICCV, ACL, AAAI, and ACM Multimedia. Related work has been featured in The Washington Post, The New York Times, and CBS News. I have also spent time at industry research labs, including Meta, Google, and Microsoft, on multimodal learning, visual understanding, and large-scale AI systems.
Multimodal AI for reliable perception, reasoning, and deployment.
My research develops AI systems that integrate visual, linguistic, spatial, and temporal evidence to support reliable understanding and decision-making. At MILab, we study foundation models, embodied agents, and deployable AI for robotics, mobility, public safety, and responsible decision support.
How can AI systems use multimodal evidence to understand, predict, and act reliably in complex environments?
Agents that connect perception, memory, prediction, planning, and interaction in dynamic physical environments.
AI systems for traffic and mobility intelligence, public safety sensing, secure perception, and robust operation in constrained environments.
Selected collaborations connect this agenda to applied work in visual evidence analysis, mobility intelligence, public safety, and responsible decision support.
Courses and research supervision in AI, robotics, graphics, and multimodal systems.
I teach courses that connect core computer science foundations with current advances in AI, robotics, computer graphics, and multimodal systems. My teaching emphasizes technical depth, hands-on implementation, empirical evaluation, reproducible experimentation, and open-ended projects. Current UW students at the Seattle, Tacoma, and Bothell campuses can enroll: undergraduates follow UW cross-campus registration requirements, while graduate students are not subject to cross-campus registration restrictions.
I also mentor undergraduate and M.S. students through the Multimodal Intelligence Lab (MILab), independent study, supervised research, thesis projects, and capstone projects. Students across all three UW campuses can pursue research opportunities and research credit through TCSS 499, TCSS 600, TCSS 700, or TCSS 702 with instructor approval. Interested students should contact me to discuss their research interests and potential projects.
Ph.D. advising and MILab opportunities in multimodal AI.
As a member of the UW Graduate Faculty with doctoral endorsement, I advise Ph.D. students through the UW Graduate School and serve on doctoral supervisory committees across eligible UW graduate programs. My primary Ph.D. recruiting pathway is the CSS Ph.D. program. I welcome inquiries from prospective Ph.D. students, postdoctoral researchers, and research assistants interested in multimodal AI, embodied intelligence, robotics, mobility intelligence, and responsible AI.
Students and researchers interested in MILab opportunities should complete the MILab interest form and email me at zhiqics@uw.edu with a brief note describing their background, research interests, and potential fit. Opportunities depend on research fit, preparation, project needs, funding availability, and mentoring capacity in a given quarter.
Competitive Ph.D. applicants may be considered for nomination to UW fellowships or GSFEI awards, subject to program procedures, eligibility requirements, nomination criteria, and funding availability.
Representative papers and systems grouped by research theme.