Home page of Ka Yee Yeung

I am an Associate Professor in the Institute of Technology at the University of Washington - Tacoma, and an Adjunct Associate Professor in the Department of Microbiology at the University of Washington - Seattle. My research focuses on the development of machine learning tools, their application to computational biology, and the reproducibility of data analyses. My passion is method development for the integration of multiple sources of big data. In particular, I am interested in developing methods that effectively integrate heterogeneous high-throughput data sources for the construction of regulatory networks and the identification of biologically meaningful biomarkers.

I am a computer scientist by training (Ph.D. in Computer Science from University of Washington - Seattle under the supervision of Larry Ruzzo). My research spans multiple fields, including computational biology, big data analytics, cloud computing, virtualization, reproducibility of research, statistics and machine learning.

Last Update: 1/7/2017

Media Coverage

UW Tacoma's Dr. Ka Yee Yeung-Rhee, Ph.D., is working with Dr. Pamela Becker, a UW Medicine hematologist who specializes in blood cancer research. Dr. Yeung-Rhee, a big-data expert, is looking for statistical patterns in cancer patients' genetic mutations and the drugs they respond to.

Madigan pairs with UW to improve medicine, by Suzanne Ovel, Northwest Guardian, Jan. 5, 2017.

Research Highlights

Reproducibility of Research: Our goal is to create easy-to-use and easy-to-deploy containerized tools for the creation, sharing and execution of reproducible bioinformatics workflows. Our latest work, BiocImageBuilder, allows biomedical scientists to create Docker containers through a point-and-click user interface.
Related manuscripts: PLOS One 2016, 11(4):e0152686, bioRxiv 099028, bioRxiv 099010, GigaScience 2017, 6(4):1-6, bioRxiv 144816
Related GitHub: BioDepot, Bioconductor notebooks


Summer Institute for Research Education in Bioinformatics and Biostatistics: This is a 3-week training program for undergraduate students interested in learning the basics of bioinformatics, biostatistics, statistical computing and data science. The program consists of one week of lectures and labs, followed by two weeks of hands-on project experience. At the end of the program, participants present the results of their group projects and network with professionals, faculty and graduate students in bioinformatics and biostatistics. For more information: BioSummer School 2017


NIH LINCS consortium: I serve as an external data science collaborator on an NIH-funded BD2K-LINCS Perturbation Data Coordination and Integration Center (DCIC). The NIH LINCS project is generating millions of experiments that measure the cell's response to drug and genetic perturbations. We work with other investigators in the LINCS consortium to develop computational methods and tools to build predictive models of complex diseases and drug responses. Related work: Mathematical Biosciences and Engineering 2016, 13(6):1241-1251, arXiv:1602.06316, bioRxiv 099028, bioRxiv 099010


Network inference from diverse genomics data: Interactions among genes and their gene products comprise a regulatory network. The goal of network inference is to generate testable hypotheses of gene-to-gene influences and subsequently design bench experiments to confirm network predictions. In the November 2011 issue of PNAS, Yeung and colleagues presented a methodology to construct gene regulatory networks from time-series expression data in yeast, integrating various types of external biological knowledge available from public repositories. We generated microarray data measuring time-dependent gene-expression levels in 95 genotyped yeast segregants subjected to a drug perturbation. Our algorithm is capable of generating feedback loops, and we showed that the inferred network recovered both existing and novel regulatory relationships. In addition, we generated independent microarray data on selected deletion mutants to prospectively test network predictions. Related work: BMC Systems Biology 2012, BMC Systems Biology 2014, bioRxiv 099036

Nov 2013: Our proposal "Inference of Gene Networks Studying Human Cancers On The Cloud" won Microsoft's Windows Azure for Research Award.


From computational discoveries to translational research: The development of genetic predictors of clinical outcomes contributes to risk assessment in personalized medicine. In collaboration with Dr. Jerry Radich and Dr. Vivian Oehler at the Fred Hutchinson Cancer Research Center, we aim to develop computational models that can predict patient responses to therapy at diagnosis, which would allow us to tailor therapy to individual patients with chronic myeloid leukemia (CML). We previously applied Bayesian Model Averaging to gene expression data on the progression of CML and identified 6 predictive genes (Blood 2009). Building on this work, we developed a network-driven approach that uses expert knowledge and predicted functional relationships to guide our search for signature genes, published in the March 2012 issue of Bioinformatics. We showed that our gene signatures of advanced phase CML are predictive of relapse even after adjustment for known risk factors associated with transplant outcomes.


Pattern discovery and feature selection: I have also contributed to the development and application of pattern discovery and feature selection in computational biology, including clustering algorithms and supervised learning methods.