Learning Tone

Human languages make crucial use of pitch - both height and movement - to convey information. This information ranges from basic lexical distinctions in languages with lexical tone to syntactic, information-status, and pragmatic distinctions across a wide range of languages. Furthermore, child language research has demonstrated that young children are sensitive to pitch and other prosodic features that are exaggerated in hyper-articulated child-directed speech, and has suggested that these cues play a role in bootstrapping lexical and syntactic acquisition. However, despite the fundamental importance of this tonal information, computational approaches to speech recognition and processing have largely viewed such pitch variation as a source of noise to be normalized away to improve recognition. Even speech recognition systems for tone languages such as Chinese often do not explicitly recognize tone as part of the word recognition process.

We believe that many of the difficulties impeding the use of tone stem from a limited understanding of the sources of pitch variation. While speaker-based normalization is well accepted, only recently has phonetic research demonstrated the crucial role of context in the acoustic realization of underlying tone and provided a general explanatory mechanism, based on the maximum speed of pitch change and tonal coarticulation. Many of these findings on tone languages apply across a wide range of language types. Crucially, these coarticulatory effects operate across a broader context than the short, frame-level window captured by the classic Hidden Markov Model recognizer.
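To make these two ideas concrete, the sketch below shows one simple way to z-score normalize log F0 within a speaker and then stack neighboring frames into a broader-context feature vector. It is a minimal illustration, assuming frame-level F0 values have already been extracted; the function names, window size, and example data are our own, not the project's.

```python
# A minimal sketch of speaker-based pitch normalization and broader-context
# tone features, assuming frame-level F0 values are already extracted.
# All names and parameters here are illustrative, not from the project.
import numpy as np

def normalize_f0(f0_hz, eps=1e-8):
    """Z-score normalize log F0 within a single speaker's data.

    Unvoiced frames (F0 <= 0) are excluded from the statistics and
    left as NaN so later stages can treat them separately.
    """
    f0 = np.asarray(f0_hz, dtype=float)
    voiced = f0 > 0
    log_f0 = np.full_like(f0, np.nan)
    log_f0[voiced] = np.log(f0[voiced])
    mean = np.nanmean(log_f0)
    std = np.nanstd(log_f0) + eps
    return (log_f0 - mean) / std

def context_features(norm_f0, window=5):
    """Stack each frame with +/- `window` neighboring frames so a
    classifier can see pitch movement beyond a single frame, unlike
    a frame-level HMM observation."""
    padded = np.pad(norm_f0, window, mode="edge")
    return np.stack(
        [padded[i : i + len(norm_f0)] for i in range(2 * window + 1)],
        axis=1,
    )

# Example: a short rising contour from one hypothetical speaker,
# with one unvoiced frame (F0 = 0) in the middle.
f0_track = np.array([180.0, 185.0, 0.0, 200.0, 220.0, 240.0])
feats = context_features(normalize_f0(f0_track))
print(feats.shape)  # (6, 11): each frame now carries an 11-frame context
```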

To improve recognition of tone and intonation, this project aims to develop a broader-context, articulatorily motivated representation that employs a common framework across a range of language and tone typologies. The research also aims to develop techniques that strongly leverage sparse manually annotated resources in semi-supervised and unsupervised learning frameworks. Furthermore, the project exploits and assesses the distinctive characteristics of child-directed speech for learning and modeling. Research on the project has pursued each of these main investigative areas: modeling of broader context for tone recognition applied across a range of languages, experimentation with unsupervised and minimally supervised techniques leveraging sparse annotated data, and initial experimentation with analysis and exploitation of child-directed speech.
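As one illustration of leveraging sparse annotations, the sketch below applies self-training, a common semi-supervised strategy, to a synthetic tone-classification task. The project's specific techniques are not detailed here; the data, classifier, and parameter choices in the example are hypothetical.

```python
# A minimal self-training sketch for tone classification with sparse labels.
# This is one common semi-supervised approach, not necessarily the one the
# project uses; the data below is synthetic and purely illustrative.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.semi_supervised import SelfTrainingClassifier

rng = np.random.default_rng(0)

# Pretend features: broader-context pitch vectors for 200 syllables,
# four tone categories, but only 20 syllables carry manual labels.
X = rng.normal(size=(200, 11))
true_tones = rng.integers(0, 4, size=200)
X += true_tones[:, None] * 0.8           # make the classes separable
y = np.full(200, -1)                     # -1 marks unlabeled examples
labeled = rng.choice(200, size=20, replace=False)
y[labeled] = true_tones[labeled]

# Self-training: iteratively add high-confidence predictions on the
# unlabeled pool to the training set and refit the base classifier.
model = SelfTrainingClassifier(LogisticRegression(max_iter=1000),
                               threshold=0.9)
model.fit(X, y)
print("accuracy on all data:", (model.predict(X) == true_tones).mean())
```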