Multimodal Foundation Models
Learning, adaptation, and evaluation for models that reason across video, language, audio, sensors, maps, and structured knowledge.
Pronunciation: Zhì-qí Chéng ("Jih-Chee Chung"). Chinese name: Chéng Zhì-qí.
I am a tenure-track Assistant Professor of Computer Science & Systems in the School of Engineering & Technology at the University of Washington Tacoma. I direct the Multimodal Intelligence Lab (MILab), where we study multimodal AI, embodied intelligence, and intelligent systems for open-world decision-making. I am also a member of the Graduate Faculty with doctoral endorsement through the University of Washington Graduate School.
My research asks how AI systems can learn, reason, and act from multimodal experience in complex real-world environments. At the intersection of multimodal foundation models, embodied AI, and intelligent decision-making, I develop systems that connect perception, reasoning, planning, and action across visual, linguistic, temporal, and physical contexts. At MILab, we advance the foundations of multimodal embodied intelligence while developing deployable AI technologies for mobility, public safety, and human-centered decision support. We integrate research with project-based education and student mentoring to build responsible, reproducible, and real-world-ready AI systems.
Before joining the University of Washington, I spent seven years at Carnegie Mellon University’s School of Computer Science, primarily in the Language Technologies Institute, where I held research associate, postdoctoral researcher, and project scientist roles. My work focused on multimodal understanding, event-centric reasoning, and large-scale AI systems that integrate video, language, audio, maps, and knowledge sources for complex real-world environments. I was fortunate to work closely with Prof. Alexander G. Hauptmann and Prof. Teruko Mitamura, whose mentorship shaped my research trajectory, system-building experience, and contributions to large-scale AI programs.
From 2019 to 2024, I served as a core technical and system-delivery lead for Carnegie Mellon’s DARPA KAIROS system, contributing to multimodal reasoning systems for event understanding and schema-guided knowledge integration. I also contributed to U.S. government-funded AI programs including DARPA AIDA, KAIROS-Plus, IARPA DIVA, and NIST PSIAP.
My work has appeared in leading AI conferences including NeurIPS, ICLR, CVPR, ICCV, ACL, AAAI, and ACM Multimedia. My research has been featured by The Washington Post, The New York Times, and CBS News. I have also held research appointments or internships at Meta AI, Alibaba DAMO Academy, Google Research, and Microsoft Research, and have been recognized with the Intel Ph.D. Fellowship and the CSC–IBM Outstanding Student Scholarship.
Multimodal AI for understanding, reasoning, and acting in the real world.
My research asks how AI systems can learn from multimodal evidence, reason under uncertainty, and support action in complex environments. At MILab, we develop foundation models, embodied agents, and deployable AI systems for robotics, mobility, public safety, and responsible decision support.
How can AI systems turn noisy multimodal evidence into reliable understanding, prediction, and action?
Learning, adaptation, and evaluation for models that reason across video, language, audio, sensors, maps, and structured knowledge.
Agents that connect perception, memory, prediction, planning, and interaction in dynamic physical environments.
AI systems for traffic and mobility intelligence, public-safety sensing, secure perception, and robust deployment under real-world constraints.
Selected projects translate this agenda into public-interest settings where visual, audio, spatial, and temporal evidence is noisy, incomplete, and high-stakes.
Courses and research supervision in AI, robotics, graphics, and multimodal systems.
At UW Tacoma, I teach courses that connect core computer science foundations with current advances in AI, robotics, computer graphics, and multimodal systems. My teaching emphasizes technical depth, hands-on implementation, empirical evaluation, reproducible experimentation, and open-ended projects. Current UW students across Seattle, Tacoma, and Bothell can enroll in these courses through UW cross-campus registration, subject to course capacity, prerequisites, registration periods, and home-campus requirements.
I also mentor undergraduate and M.S. students through the Multimodal Intelligence Lab (MILab), independent study, supervised research, thesis projects, and capstone projects. Current UW students across Seattle, Tacoma, and Bothell can pursue research credit with instructor approval through TCSS 499, TCSS 600, TCSS 700, or TCSS 702. Undergraduates should meet home-campus credit and registration-period requirements; graduate students have no cross-campus registration restrictions. Students should contact me before registration to discuss research fit, project scope, deliverables, and quarter timeline; some credits may require a faculty number or departmental support.
As a member of the University of Washington Graduate Faculty with doctoral endorsement, I advise Ph.D. students and serve on doctoral supervisory committees across eligible UW graduate programs. My primary Ph.D. recruiting pathway is the Ph.D. program in Computer Science & Systems in the School of Engineering & Technology (Tacoma). I welcome inquiries from prospective Ph.D. students, postdoctoral researchers, and research assistants interested in multimodal AI, embodied intelligence, robotics, mobility intelligence, and responsible AI.
Prospective Ph.D. applicants should apply through the CSS Ph.D. program. Postdoctoral researchers and research assistants are encouraged to contact me when their background aligns with MILab’s research agenda and current projects. Interested applicants should complete the MILab Research Interest Form and email me at zhiqics@uw.edu with a brief note describing their background, research interests, and potential fit with the lab. Opportunities depend on research fit, preparation, project needs, funding availability, and mentoring capacity.
Competitive Ph.D. applicants may be considered for nomination to University of Washington Graduate School fellowships and the GSFEI Top Scholar Award, subject to program procedures, eligibility requirements, and funding availability.
Representative papers grouped by models, agents, systems, and sponsored research reports.