William D. Lewis


Affiliate Assistant Professor

Department of Linguistics

University of Washington

For nearly 14 years, until December 2020, I worked on the Machine Translation Incubation team, most of the time embedded at Microsoft Research (and later in Microsoft Azure). Prior to that, I worked with Emily Bender and Fei Xia in founding and teaching within the Computational Linguistics Master of Science Program at UW. Before that, I was on the faculty at California State University, Fresno, where I helped found the Computational Linguistics and Cognitive Science programs (and served as initial chair for both).

My initial focus while at Microsoft Translator was to develop and ship new languages, zeroing in specifically on data: locating wherever it could be found and processing it so that it could be consumed by our MT training infrastructure. For many of our highest-resourced languages, we quickly found that we had far too much data to train models in any reasonable timeframe (if at all). Over a number of years, I worked on data filtration and reduction methods to combat this problem, resulting in numerous papers in that space.

Starting in 2010, my research interests took some interesting turns. In January of that year, a massive earthquake struck Haiti. Within days, the Microsoft Emergency Response Team on the ground in Haiti asked my team whether any MT software was available for Haitian Kreyòl. At the time, there was no commercially available engine for Kreyòl. Acting quickly, we managed to ship the world's first commercial Haitian Kreyòl MT in less than 5 days, a feat that is still considered a record (i.e., developing and shipping MT from scratch, starting with no data and no prior knowledge). The earthquake in Haiti and the subsequent MT project catalyzed my interest in the use of language technologies for crisis response of any kind (my joint paper on the Crisis MT Cookbook is a good example of pre-emptive response to crises). For the past couple of years I have been focusing on a research project, with collaborators at UW and a few other universities, called Language Technologies for Crisis Preparedness and Response (LT4CPR). LT4CPR evolved directly from a larger initiative, called TICO-19, on developing resources for under-resourced languages in response to the COVID-19 pandemic. These resources were used by Translators without Borders (now CLEAR Global) in affected communities. I will share more on LT4CPR as it evolves.

Developing MT for Haitian Kreyòl also prompted my growing interest in developing MT for low-resource languages more broadly. Over the decade following the development of Haitian Kreyòl MT, I led efforts to develop and ship MT for a number of under-resourced languages, including White Hmong, Querétaro Otomí, Yucatec Maya, Welsh, Icelandic, Canadian French, and Inuktitut. (I have been interested in low-resource languages, specifically endangered and threatened ones, since grad school. The NSF-funded Online Database of Interlinear text (ODIN) project is a good example.)

In line with my interests in the use of language technologies in crisis response, I will be teaching a graduate seminar in Spring 2023 called, not surprisingly, Language Technologies for Crisis Response (LT4CR). The seminar will focus on the development and use of language technologies (e.g., NLP, ASR, and MT) in crisis scenarios (e.g., earthquakes, floods, pandemics, and refugee crises). More details to follow.


