Resources For Studying Spoken English

If you are interested in studying the material presented here, then you will probably want the zipped image of this archive, phonres.zip (6 MB).

IPA Phones and Phonemes of English

[bullet] For the full IPA alphabet with latest revisions, visit the IPA's own home site, where more information on fonts can also be obtained.

[bullet] You can hear Peter Ladefoged pronounce all of the vowel and consonant symbols on the basic IPA chart. These are .aiff sound clips. Alternatively, you can click the symbols on the chart and hear Paul Meier pronouncing them in a very nice Flash file. Meier is at the University of Kansas, Armstrong at York University in Toronto.

[bullet] IPA Help from the Summer Institute of Linguistics also provides clickable charts online. These are .rm files, but they do not play properly on my computers. The entire IPA Help package can be downloaded onto Windows machines, where it works like a champ (sounds in the package are .wav files).

[bullet] John Wells and the Dept. of Phonetics and Linguistics, University College London have made up a cassette and CD of all of the sounds of the IPA, which they will happily send to you for a fairly nominal sum. Also located at UCL is information about the emerging European standard ipaascii-type alphabet, namely SAMPA.

[bullet] Unicode IPA: if you have a font such as Lucida Sans Unicode, you can transcribe and browse with it. Here is how to encode the Unicode IPA extensions (and others) as character entities in HTML with and without an editor. Microsoft has put out a full version of Arial (Arial Unicode) that supports almost all of the Unicode set. At 23.5 MB, it is overkill if all you want is support for the IPA extensions.

[bullet] The vowel sounds of American English are here linked to the symbols of the International Phonetic Alphabet and the ipaascii "ARPAbet". The sounds are taken from the Ibiblio archive of Sun "phonemes" along with a few that I have added. While not super and not complete, they can perhaps provide a set of reference points. These are phones, and include a couple of sounds that are not contrastive (phonemic) in anyone's speech (indicated with parentheses).

[bullet] And here are the main consonant phones of English (with some non-phonemic allophones shaded in yellow). Consonants, especially stopped ones, are very difficult to clip out of words as recognizable single segments. So here the consonant phones are exemplified with monosyllabic words, recorded by yrs truly.

[bullet] British and American Vowels: The vowel sounds and IPA symbols of American English in the vowel quadrilateral; then a contrasting set of vowels for British English in a second quadrilateral. (No-Java version). For another display of these differences, concentrating on vowel diphthongs and triphthongs, see Meier and Armstrong's page on same.

[bullet] Sounds in words and phrases: Ben Holmberg at Virginia Tech gives instructions for sounding American (does Homeland Security know?).

[bullet]The Varieties of English page by The Language Samples Project at the University of Arizona has descriptions and samples of distinguishing differences for seven varieties of North American English.

[bullet]John Newman at the University of Alberta has put up sound clips of a New Zealand speaker pronouncing a series of words that exhibit typical NZ vowel mergers. The files are .aiff format, but I have added RealAudio versions. The site also has much valuable information on New Zealand English.

[bullet]The late Donn Bayard of Otago University put up clips of readings of a paragraph by male and female speakers of Kiwi, Australian, British, and American English which were used to explore "Evaluating English Accents Worldwide." University students in various countries listened to the clips, evaluated the speakers for various personality and voice traits, and estimated the speaker's age, ethnicity, educational level, income, and social class. The American voices are rated surprisingly high on most of the traits worldwide. One would hope Weinberger's students would tackle the transcription of these samples! Here (soon) is a little page with the passage written out and a few of the samples converted from .mov format to mp3 format.

[bullet]Steven Weinberger at George Mason University has a collection of 306 readings of a paragraph by English learners from around the world. Each of these readings is transcribed with the IPA, so it is a great way to hear some of these sounds and learn to transcribe them. For some of them, there is also a list of "different" phonological rules that seem to be in use. (The samples are .mov files; here is his page with a few examples converted to mp3s.)

[bullet]alt-usage-english.org is collecting readings of a number of English texts by speakers of English from around the world. A welcome relief from "Please call Stella."
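To make the Unicode IPA and ARPAbet entries above a little more concrete, here is a minimal Python sketch that maps a handful of ARPAbet vowel codes to IPA symbols and then writes them as HTML numeric character references. The table is only an illustrative subset, and the function names (`to_html_entities`, `arpabet_to_html`) are my own inventions, not part of any of the resources linked here.

```python
# Illustrative subset of ARPAbet vowel codes and their usual IPA symbols.
# This is NOT a complete table; it only covers a few monophthongs.
ARPABET_TO_IPA = {
    "IY": "i",
    "IH": "\u026A",  # small capital I
    "EH": "\u025B",  # open e (epsilon)
    "AE": "\u00E6",  # ash
    "AA": "\u0251",  # script a
    "AO": "\u0254",  # open o
    "UH": "\u028A",  # upsilon
    "UW": "u",
    "AH": "\u028C",  # turned v
}


def to_html_entities(text):
    """Replace each non-ASCII character with a decimal character reference."""
    return "".join(
        ch if ord(ch) < 128 else "&#{};".format(ord(ch)) for ch in text
    )


def arpabet_to_html(codes):
    """Turn a list of ARPAbet codes into an HTML-safe IPA string."""
    ipa = "".join(ARPABET_TO_IPA[code] for code in codes)
    return to_html_entities(ipa)


if __name__ == "__main__":
    # The vowel of "bit" written for a page that cannot hold raw Unicode.
    print("b" + arpabet_to_html(["IH"]) + "t")
```

The same effect can be had in one line with `text.encode("ascii", "xmlcharrefreplace")`; the longer version is shown only because it makes the code point arithmetic visible.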

Speech Waveform Analysis

tutorials

[bullet]Here is a basic tutorial in speech analysis from (in?) Sweden.

[bullet] The Center for Spoken Language Understanding (CSLU) at the University of Oregon also has an excellent tutorial on Spectrograph reading.

[bullet] The most extensive and heavy-duty tutorials online are Tony Robinson's notes from his Lenten Term 1998 Speech Analysis class. Not for the math-challenged, they come also as a well-illustrated 65-page Postscript file for closer study offline. This is the place to go after you have read the new, second edition of Peter Ladefoged's Elements of Acoustic Phonetics and Keith Johnson's Acoustic and Auditory Phonetics. (Look, Ma--books!)

tools

[bullet]The SIL Speech Analyzer is the largest of their Speech Tools; it allows simultaneous display of a sound clip's transcription, amplitude wave form, pitch contour, and spectrogram. It is a Windows program, quite large, and very serviceable on Win2K. The larger package includes a Speech Manager and Audio Converter.

[bullet]Alex Quarmby has put up a nifty real-time sound spectrograph, Winspec, for Windows95+. It takes input from a microphone and gives a very fast, real-time display (which, however, requires screen capture if you want to study it--no Save or Print buttons). Very handy if you want to see the effects of changing pace or pitch as you speak, or otherwise modifying your speech.

[bullet] University College London offers the SFS (Sound File System) package of tools for speech analysis. The package has its own sound file format, way of doing headers, and combining different perspectives on a sound chunk. A nice feature is that the files you get into .sfs format can be used in the kpe vowel synthesizer, which allows you to emulate a sound clip by dragging formant frequencies up and down with a mouse. SFS has now been ported to Windows 95/98.

[bullet] Another solid sound analysis package for Unix/X/Linux with an emphasis on analysis and re-synthesis is mxv, which does spectrographic analysis and pitch contour extraction; in addition, it will do linear predictive coding and pulse vocoding and so is a nice introduction to that. Underdocumented, as is normally the case, but take a look at cmix in the same directory. Development of MixViews seems to have slowed.

[bullet]Another new sound analyzer and editor for Linux is snd, a project of Stefan Schwandter, TU Wien, and included in The Stanford University Center for Computer Research in Music and Acoustics (CCRMA). Nice features.

[bullet] Not to be left out, MSDOS and Windows users can download CECIL or WINCECIL --Computerized Extraction of Components of Intonation in Language--which package also contains the current version of the IPA fonts from the Summer Institute of Linguistics.

[bullet]The CSLU Speech Toolkit (of which more below) includes an excellent Speech View tool (the successor to Lyre) that records wave files and displays wave forms, spectrographic analysis, and pitch contour.

[bullet]Perhaps the newest (2003) and most advanced package for speech analysis is Praat by Paul Boersma and David Weenink at the Institute for Phonetic Science at the University of Amsterdam. Praat is available for all platforms and is well-documented with many advanced features. Praised by Karen Chung in her class notes on Advanced Speech Analysis Tools.

[bullet] A spectro<->synthesizer! A sound spectrogram is a plot of the amount of energy at various frequencies over time. Speech sounds, especially vowels, have distinctive signatures (patterns of bands at certain frequencies). Vowels, for example, are identifiable by two or three bands of energy (called "formants") at certain intervals, or in the case of diphthongs, movement of the bands over time. The standard assumption is that a spectrogram is derived from a speech sound by a kind of filtering and analysis, and could not be used to regenerate the sound on which it was based.

Peter Meijer of Philips Research has made a Java applet that takes a grayscale .gif and assigns a tone to each pixel--the higher on the y-axis the higher the pitch, the darker (or lighter, depending on how you set it) the louder--and thus resynthesizes the spectrogram. His sonifier will also take an audio input (in RIFF .wav format) and make a gif plot of it, so it is both a sound spectrograph and a synthesizer, but it will also take gifs made from any other sound spectrograph. Here (if you have Java enabled) it is, adapted to our purposes.
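The pixel-to-tone idea just described can be sketched in a few lines of Python. This is not Meijer's applet or his algorithm, only a toy under stated assumptions: the "image" is a plain list of rows of brightness values (0 to 255, top row first), brighter means louder, rows are spaced linearly between two frequencies I picked arbitrarily, and each column lasts a fixed slice of time. The names `sonify` and `write_wav` are hypothetical.

```python
import math
import struct
import wave


def sonify(image, sample_rate=8000, column_seconds=0.05,
           f_low=200.0, f_high=3000.0):
    """Turn a grid of brightness values into audio samples in [-1, 1].

    Each row is one sine tone (top row = highest frequency); each column
    is one time slice; brightness 0..255 sets the loudness of the tone.
    """
    n_rows = len(image)
    n_cols = len(image[0])
    if n_rows == 1:
        freqs = [f_high]
    else:
        step = (f_high - f_low) / (n_rows - 1)
        freqs = [f_high - step * r for r in range(n_rows)]
    per_column = round(sample_rate * column_seconds)
    samples = []
    for c in range(n_cols):
        for n in range(per_column):
            t = (c * per_column + n) / sample_rate
            total = sum(
                (image[r][c] / 255.0) * math.sin(2 * math.pi * freqs[r] * t)
                for r in range(n_rows)
            )
            # Divide by the row count so the mix never exceeds full scale.
            samples.append(total / n_rows)
    return samples


def write_wav(path, samples, sample_rate=8000):
    """Save samples as a 16-bit mono RIFF .wav file."""
    frames = b"".join(
        struct.pack("<h", int(max(-1.0, min(1.0, s)) * 32767))
        for s in samples
    )
    with wave.open(path, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)
        wav.setframerate(sample_rate)
        wav.writeframes(frames)


if __name__ == "__main__":
    # Two "formant-like" bands: a high one that fades, a low one that grows.
    picture = [[255, 128, 0], [0, 0, 0], [0, 128, 255]]
    write_wav("sonified.wav", sonify(picture))
```

A real spectrogram image has hundreds of rows, so a serious version would want something faster than a sum of sines in pure Python, but the mapping from picture to sound is the same.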

Text to Speech (TTS) Synthesizers

[bullet] For MSDOS and Windows we have Winspeech, which comes in a somewhat limited freeware version, but is a workable descendant of Monologue for Windows. You can experiment with its "phonemic" alphabet and lexicon.

[bullet] Rsynth-2.0, though also free, is a full-fledged speech synthesizer based on the vocal tract modeling of Dennis Klatt. Nick Ing-Simmons has put many options on the command line, introduced some lexical stress (as a pitch boost), and set up options for either an American English lexicon or a British one (each is about 9 MB when made up as .db files). Rsynth is the Unix counterpart to the moribund Monologue for Windows. It too is a text-to-talk speech synthesizer and, equipped with one of the lexicons (or both!), it does well from AA to zucchinis. A nice feature is that it can display its "phonemic" mapping ("transcription") of text. It "says" either text entered on the command line or text from a file redirected to it. Thanks to a classic piece of interactive programming by Axel Belinfante at the University of Twente, you can sample rsynth and enter text which it will convert to speech and send back to you (press Enter to send your text before moving the cursor out of the entry box).

[bullet]At the right is the generic face of ScanSoft's RealSpeak synthesizer. Their demo (takes paste-in text) has voices for many languages, all female. Clicking on her image will get a sentence from the American English voice, Jennifer.

[bullet] And the ELIS Speech Lab at the University of Ghent demonstrates the work of a plug-in synthesizer Eurovocs with samples of it pronouncing German, Dutch, French, and (American) English texts.

[bullet] On a parallel track, British Telecommunications offers an interactive demo of their latest technology, complete with a synth offering a choice of male, female, and girl voices. Registration required, but apparently harmless.

[bullet] The Linguistic Data Consortium at University of Pennsylvania offers a TTS comparison site where you can run tests on various online synthesizers.

[bullet]The largest list of active synth links is Gregor Möhler's at the University of Stuttgart. It covers synthesizers for 25 languages.

[bullet] Diphones and PSOLA (Pitch Synchronous Overlap and Add) make a very natural sounding synth. Most of the phone company voices indicated here are of this kind (one supposes--the matter is cloaked in trade secrecy). Much less secretive is the French telecommunication firm Elan Informatique, which has put up full-scale impressive synths for several European languages complete with manuals.

[bullet] Thierry Dutoit has persuaded several people to open up these existing diphone-analyzed data bases for scholarly investigation (not TTS) via MBROLA. Good working software for Linux (among other platforms) can be found there along with data bases for French, Dutch, English, Estonian, German, Portuguese, Romanian, Spanish, and 20 other languages, many with more than one voice, along with an invitation to join the project. His book is now available from Kluwer: Introduction to Text-to-Speech Synthesis. For an excellent overview of TTS, see his Introduction to TTS.

[bullet] Mike Hamilton has used MBROLA voices for other languages to speak English with those accents. To do this, he has to substitute the phonemes of one language for those of the other (e.g. "R" [French uvular r] substitutes for "r" and so on). He thus creates .pho files (see next) which are synthesized with the diphone voices (hence the voice's assimilations apply). The result is remarkably realistic. Here is one French voice describing the MBROLA Project. Since MBROLA can impose an intonational contour, it can sing as well as speak. Not to be missed on Hamilton's site are the various German, Brazilian, French, Dutch, and Swedish voices singing the Doremi song from The Sound of Music and "I'm Popeye the Sailor Man."

[bullet] MBROLA does not take text as an input, and hence is not a complete TTS. Rather, it takes a ".pho" input file with the text already analysed into individual phonemes and with information added on the length of each segment and its pitch. To be part of a TTS, MBROLA needs a front end that will take text and turn it into a .pho-type representation. The Euler project uses different modules for different languages [French only, at present] and produces such a representation, which it feeds to MBROLA.


[bullet] The Edinburgh Centre for Speech Technology Research has put up version 1.4.3 of Festival, their diphone synthesizer done with Scheme. The distribution directory has voices both British and American. It is pretty well documented and invites a number of applications, not the least of which is to hook up with MBROLA as its front end. The best voice currently available for (Br.) English is the result of this emerging collaboration--it comes from the en1 database, which is the MBROLA-massaged version of the one provided them by the Festival researchers. It can be downloaded from MBROLA and installed as a voice in Festival, even the default voice. A wide selection of other voices and languages is now available, including the American Female voice speaking on "Please call Stella."

Festival supports SABLE, which is an XML-conforming markup language for marking up a text for prosodic features such as emphasis, pause, lowered pitch for backgrounding information, speed, volume, and other points. For a demo of SABLE markup and other Festival demos, the best site is Alan Black's at Carnegie Mellon.

[bullet] The Center for Spoken Language Understanding has put up version 2.0 of a huge Toolkit (minimum 31M) which gets Windows users into the speech synthesis and recognition game along with faces. It includes a modified version of Festival 1.4.2 working as a TTS with CSLU's own diphone voices; other voices can be downloaded (at upwards of 8M per voice). In addition to the excellent waveform analysis tools, they include several "faces" and head+torsos which lip sync to what one of the "voices" is saying. Moreover, it provides software to engage these talking heads in dialogues you script. That is, it includes some speech recognition. All wrapped up in Tcl/Tk, it is definitely worth a look, though a beta; all this 3D simulation needs graphics acceleration (of the GL variety) and the package will not install cleanly without it. Lip sync at 3 fps is definitely underwhelming.

[bullet] Excellent overviews of facial models of speaking can be found at the Perceptual Science Laboratory at UC Santa Cruz and at Haskins Lab in New Haven, Conn.

[bullet] FreeTTS is a new, very large package from Sun programmers that illustrates the use of the new Java Speech API to synthesize speech. It can use some MBROLA voices. There are things to explore here, as the speed, fundamental frequency and pitch range are all tunable on the fly.

[bullet]Janet Cahn of the MIT Media Laboratory has put up several demos of synth voices with prosody. One simulates emotional speech by varying one or more of 17 affect parameters (pitch, timing, articulation, volume, etc.). This work helps to explain why most synthesized speech is tinged with ennui.

[bullet]Bill Hollingsworth has written the Cynthia speech engine to test linguistic modelling of rhythm and intonation. Among other things, Cynthia syllabifies, assimilates across word boundaries, and can include certain allophones such as the flap/tap in (American) butter, little etc. He has given 'her' a Southern American accent.
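For readers curious about the ".pho" input that the MBROLA entry above mentions, here is a rough Python sketch that writes one. As I understand the format, each line gives a phoneme symbol, a duration in milliseconds, and then optional pairs of numbers: a position within the segment (as a percentage of its duration) and a pitch target in Hz; lines starting with a semicolon are comments and "_" stands for silence. Check the MBROLA documentation before relying on this. The phoneme symbols and timings below are made up for illustration (each voice database defines its own inventory), and `pho_line` and `build_pho` are hypothetical helper names.

```python
def pho_line(phoneme, duration_ms, pitch_points=()):
    """Format one segment: symbol, duration, then (position %, Hz) pairs."""
    parts = [phoneme, str(duration_ms)]
    for position_percent, pitch_hz in pitch_points:
        parts.append(str(position_percent))
        parts.append(str(pitch_hz))
    return " ".join(parts)


def build_pho(segments, comment="sketch of a .pho file"):
    """Join segments into the text of a .pho file, with a leading comment."""
    lines = ["; " + comment]
    for phoneme, duration_ms, pitch_points in segments:
        lines.append(pho_line(phoneme, duration_ms, pitch_points))
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Made-up segments for something like "hello", with a falling pitch.
    hello = [
        ("_", 100, []),
        ("h", 80, []),
        ("@", 60, [(0, 130)]),
        ("l", 70, []),
        ("@U", 200, [(0, 125), (100, 95)]),
        ("_", 100, []),
    ]
    print(build_pho(hello), end="")
```

A front end such as Euler or Festival is, in effect, a much smarter version of `build_pho`: it has to work out the phonemes, durations, and pitch targets from raw text before MBROLA ever sees them.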

Speech Recognition

How to wreck a nice beach.

[bullet] Continuous speech recognizers are either concatenated word recognizers or they analyze words into possible phonemic strings which they match probabilistically against words in a lexicon. The commercial products ViaVoice (from IBM) and Naturally Speaking (from Dragon Systems; now part of ScanSoft, where it pairs with RealSpeak TTS) have acquired a following now in the era of much RAM and fast processors, but operate by brute force for the most part.

[bullet] CSLU also develops and sells corpora of talk, especially telephone speech.
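The second strategy mentioned above, matching a guessed phoneme string against the pronunciations in a lexicon, can be illustrated with a toy in Python. Real recognizers use statistical models for this; the sketch below swaps in plain edit distance (fewest insertions, deletions, and substitutions) as a crude stand-in for probability. The four-word lexicon, its ARPAbet-style pronunciations, and the function names are all assumptions made for the example.

```python
# Tiny hypothetical lexicon with ARPAbet-style pronunciations.
LEXICON = {
    "beach": ["B", "IY", "CH"],
    "speech": ["S", "P", "IY", "CH"],
    "wreck": ["R", "EH", "K"],
    "nice": ["N", "AY", "S"],
}


def edit_distance(a, b):
    """Levenshtein distance between two phoneme sequences."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        cur = [i]
        for j, y in enumerate(b, 1):
            cur.append(min(
                prev[j] + 1,             # delete x
                cur[j - 1] + 1,          # insert y
                prev[j - 1] + (x != y),  # substitute (free if equal)
            ))
        prev = cur
    return prev[-1]


def rank_words(phones, lexicon=LEXICON):
    """Return lexicon words ordered from closest to farthest match."""
    scored = [(edit_distance(phones, pron), word)
              for word, pron in lexicon.items()]
    return [word for _, word in sorted(scored)]


if __name__ == "__main__":
    # A slightly misheard final consonant still lands on "beach".
    print(rank_words(["B", "IY", "SH"]))
```

Note how close "speech" and "beach" sit to each other (two edits), which is the whole joke behind wrecking a nice beach.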

Archives and Comprehensive Lists

[bullet] Andrew Hunt maintained the comp.speech FAQ. Now somewhat long in the tooth.

[bullet] The Cambridge engineering archive maintained by Tony Robinson has much useful information as well as data, dictionaries, and sources for programs.

[bullet]Karen Chung's List of Phonetics URLs for Linguist List has many good things.

[bullet] The audio file formats document now maintained by Chris Bagwell at Sprynet is valuable when you are tangled in unknown or unfamiliar sound formats and trying to convert them.

[bullet] Mambo is the Perceptual Science Laboratory at UC Santa Cruz. They list many, many more sites than those given here, including sites on Natural Language Processing and the physiology of speech (and simulation of same). The figure on the right is their talking head Baldi.

[bullet] The IEEE listing of automatic speech synthesis and recognition (ASSR) sites is substantial and current.

[bullet] The Institute of Phonetics at Johann Wolfgang von Goethe-University has a good search front end riding atop a solid bibliography.

Online Tutorials and Courses

[bullet]Kevin Russell covers the basic points with sound clips and fonts and has useful transcription advice and practice.

[bullet]Stephen Luscombe at Stirling University has an online English phonology and phonetics course (RP dialect) with sound samples and exercises.

[bullet]David Brett teaches English at the University of Sassari (It.) using Flash pages with Lucida Sans Unicode transcriptions. Much drag-and-drop action in the phonetics exercises. (The "English" is RP, more or less.)

[bullet] Karen Chung is adding pages on various topics to her Introduction to Phonetics and Phonetics II course pages. These courses assume a Taiwanese learner and so touch at various points on Chinese. Many links and some pleasant software.

[bullet] The Department of Linguistics at the University of Lausanne introduces the 60 most common IPA vowels and consonants (not just of English) with diagrams of articulation and sound clips (au format).

[bullet]Henry Rogers and Michael Stairs at University of Toronto have developed phthong which teaches, tests, and corrects the phonetic alphabet; actually, it has two forms, one which teaches the "American" modifications to the IPA and the other which teaches IPA by the book.

[bullet]IPA Help has nifty tests of your ability to match sounds (heard) to symbols.

[bullet]Also at U Toronto, Daniel Currie Hall has created Sammy, an interactive sagittal section drawing that changes the position of articulators in the mouth as you select different features; it also gives the phonetic symbol for the sound.

[bullet]John Maidment gives us POW (Prosody on the Web) in three tutorials (tone, tone group, focus, etc.)



See the companion site GramResources for links to online resources for the study of English syntax, and Lexical, Semantic, Textual Resources for resources on word meaning and usage and textual cohesion.