Gina-Anne Levow's Research Interests
I'm an assistant professor of Linguistics at the University of Washington. My research interests focus on natural language processing, particularly discourse and dialogue, and on spoken language systems. My research explores the role of prosody in spoken language, including features such as pitch, loudness, and duration, with the aim of improving automatic language understanding.
Multilingual Tone and Intonation Recognition
Pitch plays a crucial role in language understanding, whether at the level of word meaning in languages with lexical tone like Mandarin Chinese or isiZulu or at the level of discourse prominence as in English. However, the variability of these tonal forms and the scarcity of labeled training data have hampered the use of this prosodic information in language understanding systems. My work exploits improved modeling of context-based variation in a common multilingual framework to improve tone and pitch accent recognition across this wide range of languages. This research has demonstrated some of the first uses of unsupervised and semi-supervised approaches to tone recognition, reducing the reliance on large quantities of labeled training data. We are currently investigating the learnability of tone from hyperarticulated child-directed speech. We also employ this enhanced tone recognition to support language learning, by providing automatic feedback on tone and intonation to learners of English and Chinese.
Prosody in Multi-modal Dialogue
In face-to-face interaction, speakers draw on a wealth of multi-modal cues beyond the basic words in the speech signal to facilitate smooth, flexible communication. These cues include gesture, gaze, and posture as well as prosody. In collaboration with colleagues in Psychology, this research explores the integration of multi-modal communicative signals. In conjunction with the NSF-funded cyberinfrastructure project SIDGrid, we are developing tools to support the annotation, archiving, and analysis of multimodal data, employing techniques which exploit the distributed processing capabilities of the TeraGrid. Some of these tools and algorithms will be used in advanced coursework in Discourse analysis, as part of a University of Chicago Academic Technology Project, as well as in support of research in dyadic rapport and deception detection. This research also aims to characterize the multimodal cues to high dyadic rapport or social resonance, both to support the automatic recognition of such states in a range of cultural contexts and to improve the naturalness of embodied conversational agents (with Jon Gratch at USC/ICT, Susan Duncan in Psychology at U of C, and Dan Loehr at MITRE).
Prosody in Discourse and Dialogue Structure
Understanding the structure of extended discourse and dialogue in terms of topics and turns facilitates automatic topic segmentation and summarization as well as contextually appropriate speech synthesis. Intonation plays a crucial role in signalling this structure in conjunction with lexical evidence. My research has explored robust prosodic and text-based indicators of topic and turn changes in both English and Mandarin Chinese, a tone language, demonstrating the cross-lingual use of elevated pitch and intensity at topic and turn beginnings and of reduction in these features as a cue to finality. My current work focuses on the integration of this prosodic evidence with enhanced approaches to recognizing semantic coherence based on spectral embedding.
Spoken Corrections in Human-Computer Dialogue
Miscommunication in human-computer interaction is unavoidable, and thus recovery from such communicative failures is essential for the success of spoken dialogue systems. My research has explored the changes in speaking style, typically hyperarticulation, employed by users in the face of speech recognition errors. This work builds on Wizard-of-Oz studies of global and local hyperarticulation conducted in collaboration with Sharon Oviatt. Subsequent work employed data from prototype spoken language systems, including Sun's SpeechActs, to demonstrate that increases in duration and pausing, as well as changes in pitch contour, can be used to automatically detect spoken corrections with a variety of supervised machine learning techniques (Levow 1998, Levow 2004).