Resources For Studying Spoken English

IPA Phones and Phonemes of English

For the full IPA alphabet with latest revisions, visit the IPA's own home site, where more information on fonts can also be obtained.

You can hear Peter Ladefoged pronounce all of the vowel and consonant symbols on the basic IPA chart. These are .aiff sound clips. For a perhaps more robust alternative with mp3 clips, try this one at the University of Victoria. Alternatively, you can click the symbols on the chart and hear Paul Meier pronouncing them in a very nice Flash file. Meier is at the University of Kansas; his collaborator Eric Armstrong is at York University in Toronto.

IPA Help from the Summer Institute of Linguistics also provides clickable charts on line. These are .rm files but they do not play properly on my computers. The entire IPA Help package can be downloaded onto Windows machines, where it works like a champ (sounds in the package are .wav files).

John Wells and the Dept. of Phonetics and Linguistics, University College London have made up a cassette and CD of all of the sounds of the IPA, which they will happily send to you for a fairly nominal sum. Also located at UCL is information about the emerging European standard ipaascii type alphabet, namely SAMPA.

Unicode IPA--if you have a font such as Lucida Sans Unicode, you can transcribe and browse with it. Here is how to encode the Unicode IPA extensions (and others) as character entities in HTML, with and without an editor. Microsoft has put out a full version of Arial (Arial Unicode) that supports almost all of the Unicode set. At 23.5 Meg, it is overkill if all you want is support for the IPA extensions.
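As a quick illustration of the character-entity approach (a minimal sketch; the `to_entities` helper is my own invention, not from any of the pages linked above):

```python
# Sketch: turn Unicode IPA characters into HTML numeric character
# references, so a transcription displays even where the reader's
# browser lacks a Unicode IPA font. The helper name is hypothetical.
def to_entities(text):
    """Replace every non-ASCII character with a &#NNNN; reference."""
    return "".join(c if ord(c) < 128 else "&#%d;" % ord(c) for c in text)

# IPA for "phonetics": f, schwa, stress mark, n, epsilon, t, small-cap i, k, s
print(to_entities("f\u0259\u02c8n\u025bt\u026aks"))
# -> f&#601;&#712;n&#603;t&#618;ks
```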

The vowel sounds of American English are here linked to the symbols of the International Phonetic Alphabet and the ipaascii "ARPAbet". The sounds are taken from the Ibiblio archive of Sun "phonemes" along with a few that I have added. While not super and not complete, they can perhaps provide a set of reference points. These are phones, and include a couple of sounds that are not contrastive (phonemic) in anyone's speech (indicated with parentheses).

And here are the main consonant phones of English (with some non-phonemic allophones shaded in yellow). Consonants, especially stopped ones, are very difficult to clip out of words as recognizable single segments. So here the consonant phones are exemplified with monosyllabic words, recorded by yrs truly.

Variation in the sounds of English

Mark Liberman at the University of Pennsylvania introduces the use of John Wells' lexical sets of key words to compare Received Pronunciation and General American.

The 24 lexical sets are thoroughly employed in Wikipedia's overly informative IPA chart for English Dialects, where the dialects compared cover all of the main 'English-speaking' countries.

[bullet] Accents of English from Around the World: The University of Edinburgh has a large collection of parallel pronunciations of individual words.

[bullet] British and American Vowels The vowel sounds and IPA symbols of American English in the Vowel quadrilateral; then a contrasting set of vowels for British English in a second quadrilateral. (No-Java version). For another display of these differences, concentrating on vowel diphthongs and triphthongs, see Meier and Armstrong's page on same.

[bullet] Sounds in words and phrases: Ben Holmberg at Virginia Tech gives instructions for sounding American (does Homeland Security know?).

[bullet]The Varieties of English page by The Language Samples Project at the University of Arizona has descriptions and samples of distinguishing differences for seven varieties of North American English.

[bullet]Eric Armstrong at York University in Canada has recorded a loving description of the sounds of an Irish Dialect sample.

[bullet]Sounds Familiar? is the major site in the British Library with substantial samples of accents and notes of dialect features from all around the UK, including the Six Counties.

[bullet]John Newman at University of Alberta has put up sound clips of a New Zealand speaker pronouncing series of words that exhibit typical NZ vowel mergers. The files are .aiff format, but I have added RealAudio versions. The site also has much valuable information on New Zealand English.

[bullet]Donn Bayard of Otago University put up clips of readings of a paragraph by male and female speakers of Kiwi, Australian, British, and American English which were used to explore "Evaluating English Accents Worldwide." University students in various countries listened to the clips, evaluated the speakers for various personality and voice traits, and estimated the speakers' age, ethnicity, educational level, income, and social class. The American voices are rated surprisingly high on most of the traits worldwide. One would hope Weinberger's students would tackle the transcription of these samples! Here (soon) is a little page with the passage written out and a few of the samples converted from .mov format to mp3 format.


[bullet]Steven Weinberger at George Mason University has a collection of over 2000 readings of a paragraph (currently 2138) by English learners from around the world who have been resident in the USA for various amounts of time. Each of these readings is transcribed with the IPA, so it is a great way to hear some of these sounds and learn to transcribe them. For some of them, there is also a list of "different" phonological rules that seem to be in use. (The samples are .mov files; here is his page with a few examples converted to mp3s.)

[bullet]Paul Meier of the University of Kansas has built a comparable archive of recorded readings of a set passage ("Comma gets a cure"—which uses all of the key words in John Wells' lexical sets) by speakers of English all around the world. Only ten of the many, many readings are transcribed, but each page includes a recorded clip of unscripted speech as well.

[bullet] is collecting readings of a number of English texts by speakers of English from around the world. A welcome relief from Please call Stella.

Speech Waveform Analysis


[bullet]Here is a basic tutorial in speech analysis from (in?) Sweden.

[bullet] The Center for Spoken Language Understanding (CSLU) at the Oregon Graduate Institute also has an excellent tutorial on Spectrograph Reading.

[bullet] The most extensive and heavy-duty tutorials on line are Tony Robinson's notes from his Lenten Term 1998 Speech Analysis class. Not for the math-challenged, they also come as a well-illustrated 65-page Postscript file for closer study offline. This is the place to go after you have read the new, second edition of Peter Ladefoged's Introduction to Acoustic Phonetics and Keith Johnson's Acoustic and Auditory Phonetics. (Look, Ma--books!)


[bullet]The SIL Speech Analyzer is the largest of their Speech Tools; it allows simultaneous display of a sound clip's transcription, amplitude wave form, pitch contour, and spectrogram. It is a Windows program, quite large, and very serviceable on Win2K. The larger package includes a Speech Manager and Audio Converter.

[bullet]Alex Quarmby has put up a nifty realtime sound spectrograph, Winspec, for Windows95+. It takes input from a microphone and gives very fast, real-time display (which however requires screen capture if you want to study it--no Save or Print buttons). Very handy if you want to see the effects of changing pace or pitch as you speak, or of otherwise modifying your speech.

[bullet] University College London offers the SFS (Sound File System) package of tools for speech analysis. The package has its own sound file format, with its own headers and its own way of combining different perspectives on a sound chunk. A nice feature is that files you get into .sfs format can be used in the kpe vowel synthesizer, which allows you to emulate a sound clip by dragging formant frequencies up and down with a mouse. SFS has now been ported to Windows 95/98.

[bullet] Another solid sound analysis package for Unix/X/Linux with an emphasis on analysis and re-synthesis is mxv, which does spectrographic analysis and pitch contour extraction; in addition, it will do linear predictive coding and pulse vocoding, and so is a nice introduction to those. Underdocumented, as is normally the case, but take a look at cmix in the same directory. Development of MixViews seems to have slowed.

[bullet]Another new sound analyzer and editor for Linux is snd, a project of Stefan Schwandter, TU Wien, and included in The Stanford University Center for Computer Research in Music and Acoustics (CCRMA). Nice features.

[bullet] Not to be left out, MS-DOS and Windows users can download CECIL or WINCECIL--Computerized Extraction of Components of Intonation in Language--a package that also contains the current version of the IPA fonts from the Summer Institute of Linguistics.

[bullet]The CSLU Speech Toolkit (of which more below) includes an excellent Speech View tool (the successor to Lyre) that records wave files and displays wave forms, spectrographic analysis, and pitch contour.

[bullet]Perhaps the newest (2003) and most advanced package for speech analysis is Praat by Paul Boersma and David Weenink at the Institute for Phonetic Science at the University of Amsterdam. Praat is available for all platforms and is well-documented with many advanced features. Praised by Karen Chung in her class notes on Advanced Speech Analysis Tools.

[bullet] A spectro<->synthesizer! A sound spectrogram is a plot of the amount of energy at various frequencies over time. Speech sounds, especially vowels, have distinctive signatures (patterns of bands at certain frequencies). Vowels, for example, are identifiable by two or three bands of energy (called "formants") at certain intervals, or in the case of diphthongs, movement of the bands over time. The standard assumption is that a spectrogram is derived from a speech sound by a kind of filtering and analysis, and could not be used to regenerate the sound on which it was based.

Peter Meijer of Philips Research has made a Java applet that takes a grayscale .gif and assigns a tone to each pixel: the higher the pixel on the y-axis, the higher the pitch, and the darker (or lighter, depending on how you set it) the pixel, the louder the tone. Summing the tones resynthesizes the spectrogram. His sonifier will also take an audio input (in RIFF .wav format) and make a gif plot of it, so it is both a sound spectrograph and a synthesizer; but it will also take gifs made from any other sound spectrograph. Here (if you have Java enabled) it is, adapted to our purposes.
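The pixel-to-tone mapping is simple enough to sketch in a few lines of Python (a stdlib-only illustration of the idea, not Meijer's actual code; the toy image, sample rate, frequency range, and slice length are all invented for the example):

```python
import math

# Sketch of Meijer-style sonification: each row of a grayscale image is
# assigned a frequency (higher row -> higher pitch), each column a time
# slice, and darkness controls loudness.
RATE = 8000          # samples per second (hypothetical choice)
SLICE = 0.05         # seconds of sound per image column

def sonify(image, f_lo=200.0, f_hi=2000.0):
    """image: list of rows, top row first; values 0.0 (white) .. 1.0 (black)."""
    n_rows = len(image)
    # top row gets the highest frequency (assumes n_rows > 1)
    freqs = [f_hi - (f_hi - f_lo) * r / (n_rows - 1) for r in range(n_rows)]
    samples = []
    for col in range(len(image[0])):
        for i in range(int(RATE * SLICE)):
            t = i / RATE
            # sum one sine per row, weighted by that pixel's darkness
            s = sum(image[r][col] * math.sin(2 * math.pi * freqs[r] * t)
                    for r in range(n_rows))
            samples.append(s / n_rows)   # crude normalization
    return samples

# a 3x2 toy "spectrogram": one dark band moving upward over time
clip = sonify([[0.0, 1.0],
               [0.5, 0.0],
               [1.0, 0.0]])
```

Writing `clip` out as a .wav (e.g. with the `wave` module) would let you hear the rising band as a rising tone.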

Text to Speech (TTS) Synthesizers

[bullet] Rsynth-2.0 is a free, full-fledged speech synthesizer based on the vocal tract modeling of Dennis Klatt. Nick Ing-Simmons has put many options on the command line, introduced some lexical stress (as a pitch boost), and set up options for either an American English lexicon or a British one (each is about 9 meg when made up as .db files). Rsynth is the Unix counterpart to the moribund Monologue for Windows. It too is a text-to-talk speech synthesizer and, equipped with one of the lexicons (or both!), it does well from AA to zucchinis. A nice feature is that it can display its "phonemic" mapping ("transcription") of text. It "says" either text entered on the command line or text from a file redirected to it. Thanks to a classic piece of interactive programming by Axel Belinfante at the University of Twente, you can sample rsynth: enter text and it will convert it to speech and send it back to you (press Enter to send your text before moving the cursor out of the entry box).

[bullet]At the right is the generic face of Nuance's RealSpeak synthesizer. Their demo (limited to less than Tweet length) has voices for many languages and several accents of English, including Irish and Indian, Scots and Australian, and South African. Clicking on her image will get you a sound clip of the American English voice, Jill, reading a sentence. (Harmless registration required.)

Or you can paste much longer texts into Nuance's Demo Vocalizer 5 with pretty much the same array of voices and languages. Sound capture is not easy.

Or you can check out AT&T Research's Natural Voices demo or their retail product at Wizzard Media, with the best supply of voices and qualities.

[bullet]These sites give a good indication of the state of the art, which is dramatically advanced over that reflected in this clip of the RealSpeak ladybot from 2002. For a large set of samples from synths of previous decades, originally recorded by Dennis Klatt and released with his 1987 "Review of text-to-speech conversion for English," see this page at Indiana University or this page at the CSLU at the Oregon Graduate Institute.

[bullet]The largest list of active synth links is Gregor Möhler's at the University of Stuttgart. It covers synthesizers for 25 languages.

[bullet] MBROLA does not take text as an input, and hence is not a complete TTS. Rather it takes a ".pho" input file with the text already analysed into individual phonemes and with information added on length of segment and pitch. To be part of a TTS, MBROLA needs a front end that will take text and turn it into a pho-type representation. The Euler project uses different modules for different languages [French only, at present] to produce such a representation, which it feeds to MBROLA.

[bullet] Since MBROLA can impose an intonational contour, it can sing as well as speak. Not to be missed on Hamilton's site are the various German, Brazilian, French, Dutch, and Swedish voices singing the Doremi song from The Sound of Music and "I'm Popeye the Sailor Man." CSLU has more advanced singing.
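For the curious, a .pho file is plain text, one segment per line: a phoneme label (SAMPA), a duration in milliseconds, then optional pairs giving a position within the segment (as a percentage) and a target pitch (in Hz). A hypothetical input for a French voice saying "bonjour" might look like this (a sketch of the format, not a tested file):

```
; hypothetical MBROLA .pho input: "bonjour"
; label  duration-ms  [percent  pitch-Hz] ...
_    100
b     60    0 120
o~   120   50 125
Z     90
u     80
R     70  100 110
_    100
```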

[bullet] The Edinburgh Centre for Speech Technology Research has put up version 2 of Festival with Demos, where different methods of synthesis may be compared (75 characters at a time).

Festival supports SABLE, an XML-conforming markup language for marking up a text for prosodic features such as emphasis, pause, lowered pitch for backgrounding information, speed, volume, and other points. For a demo of SABLE markup and other Festival demos, the best site is Alan Black's at Carnegie Mellon. These demos are of Festival 1.4.0 (1999), which was still doing diphone synthesis. A somewhat newer Festival package, but still using diphone voices, is available at the Center for Spoken Language Understanding (CSLU). But the Edinburgh site gives Festival at its most current state of development.

[bullet] The Center for Spoken Language Understanding has put up version 2.0 of a huge Toolkit (minimum 31M) which gets Windows users into the speech synthesis and recognition game, along with faces. It includes a modified version of Festival 1.4.2 working as a TTS with CSLU's own diphone voices; other voices can be downloaded (at upwards of 8M per voice). In addition to the excellent waveform analysis tools, they include several "faces" and head+torsos which lip sync to what one of the "voices" is saying. Moreover, it provides software to engage these talking heads in dialogues you script. That is, it includes some speech recognition. All wrapped up in TclTk, it is definitely worth a look, though still a beta; all this 3D simulation needs graphics acceleration (of the GL variety), and the package will not install cleanly without it. Lip sync at 3 fps is definitely underwhelming. Development seems to have stopped in 2004.

[bullet] Excellent overviews of facial models of speaking can be found at the Perceptual Science Laboratory at UC Santa Cruz and at Haskins Lab in New Haven, Conn.

[bullet] FreeTTS is a new, very large package from Sun programmers that illustrates the use of the new Java Speech API to synthesize speech. It can use some MBROLA voices. There are things to explore here, as the speed, fundamental frequency and pitch range are all tunable on the fly.

[bullet]Janet Cahn of the MIT Media Laboratory has put up several demos of synth voices with prosody. One simulates emotional speech by varying one or more of 17 affect parameters (pitch, timing, articulation, volume, etc.). This work helps to explain why most synthesized speech is tinged with ennui.

[bullet]Bill Hollingsworth has written the Cynthia speech engine to test linguistic modelling of rhythm and intonation. Among other things, Cynthia syllabifies, assimilates across word boundaries, and can include certain allophones such as the flap/tap in (American) butter, little etc. He has given 'her' a Southern American accent.

Speech Recognition

How to wreck a nice beach.

[bullet] Continuous speech recognizers are either concatenated word recognizers or they analyze words into possible phonemic strings which they match probabilistically against words in a lexicon. The commercial products ViaVoice (from IBM) and Naturally Speaking (from Dragon Systems; now part of ScanSoft, where it pairs with RealSpeak TTS) have acquired a following now in the era of abundant RAM and fast processors, but they operate by brute force for the most part.
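The lexicon-matching idea can be caricatured in a few lines (a sketch only: real recognizers score with hidden Markov models and probabilities rather than plain edit distance, and this toy ARPAbet-style lexicon is invented for the example):

```python
# Sketch of lexicon matching: score a hypothesized phone string against
# each word's dictionary pronunciation and pick the closest match.
def edit_distance(a, b):
    """Levenshtein distance over lists of phones."""
    prev = list(range(len(b) + 1))
    for i, pa in enumerate(a, 1):
        cur = [i]
        for j, pb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,            # deletion
                           cur[j - 1] + 1,         # insertion
                           prev[j - 1] + (pa != pb)))  # substitution
        prev = cur
    return prev[-1]

# tiny hypothetical lexicon, ARPAbet-style phone labels
LEXICON = {
    "wreck":     ["r", "eh", "k"],
    "recognize": ["r", "eh", "k", "ah", "g", "n", "ay", "z"],
    "beach":     ["b", "iy", "ch"],
    "speech":    ["s", "p", "iy", "ch"],
}

def best_word(phones):
    """Return the lexicon word whose pronunciation is nearest the input."""
    return min(LEXICON, key=lambda w: edit_distance(phones, LEXICON[w]))

print(best_word(["s", "p", "iy", "sh"]))  # noisy input; nearest is "speech"
```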

[bullet] CSLU also develops and sells corpora of talk, especially telephone speech.

Archives and Comprehensive Lists

[bullet] Andrew Hunt maintained the comp.speech FAQ. Now somewhat long in the tooth.

[bullet] The Cambridge engineering archive maintained by Tony Robinson has much useful information as well as data, dictionaries, and sources for programs.

[bullet]Karen Chung's List of Phonetics URLs for Linguist List has many good things.

[bullet] The audio file formats FAQ, maintained now by Chris Bagwell at Sprynet, is valuable when you are tangled in unknown or unfamiliar sound formats and trying to convert them.


[bullet] Mambo is the site of the Perceptual Science Laboratory at UC Santa Cruz. They list many, many more sites than those given here, including sites on Natural Language Processing and the physiology of speech (and simulation of same). The figure on the right is their talking head Baldi.

[bullet] The IEEE listing of automatic speech synthesis and recognition (ASSR) sites is substantial and current.

[bullet] The Institute of Phonetics at Johann Wolfgang von Goethe-University has a good search front end riding atop a solid bibliography.

Online Tutorials and Courses

[bullet]Sure enough, the BBC has a few Pronunciation Tips on its Learning English site, with exercises.

[bullet]Kevin Russell covers the basic points with sound clips and fonts and has useful transcription advice and practice.

[bullet]David Brett teaches English at the University of Sassari (It.) using Flash pages with Lucida Sans Unicode transcriptions. Much drag-and-drop action in the phonetics exercises. (The "English" is RP, more or less.)

[bullet] Karen Chung is adding pages on various topics to her Introduction to Phonetics and Phonetics II course pages. These courses assume a Taiwanese learner and so touch at various points on Chinese. Many links and some pleasant software.

[bullet] The Department of Linguistics at the University of Lausanne introduces the 60 most common IPA symbols, vowels and consonants (not just of English), with diagrams of articulation and sound clips (.au format).

[bullet]Henry Rogers and Michael Stairs at University of Toronto have developed phthong which teaches, tests, and corrects the phonetic alphabet; actually, it has two forms, one which teaches the "American" modifications to the IPA and the other which teaches IPA by the book.

[bullet]IPA Help, from the Summer Institute of Linguistics, is a downloadable Windows package, but a limited demo is on line.

[bullet]Also at U Toronto, Daniel Currie Hall has created Sammy, an interactive sagittal section drawing that changes the position of articulators in the mouth as you select different features; it also gives the phonetic symbol for the sound.

[bullet]John Maidment gives us POW (Prosody on the Web) in three tutorials (tone, tone group, focus, etc.). He also provides Phonetic Flash for the RP minded.

[bullet]Ted Power gives all kinds of exercises (corrected online).

This is one of four sites of (on-line) Resources for English language study maintained by George Dillon, University of Washington. The others are:

Corpus Resources Syntax Resources Semantics Resources