If you are interested in studying the material presented here, then you will probably want the zipped image of this archive, phonres.zip (6 Meg).
For the full IPA alphabet with latest revisions, visit the IPA's own home site, where more information on fonts can also be obtained.
You can hear Peter Ladefoged pronounce all
of the vowel and consonant symbols on the basic IPA chart. These are .aiff
sound clips. Alternatively, you can click the symbols on the chart and hear Paul Meier pronouncing them in a very nice Flash file. Meier is at the University of Kansas, Armstrong at York University in Toronto.
IPA Help from
Summer Institute of Linguistics also provides clickable charts on line. These
are .rm files but do not properly play on my computers. The entire IPA Help
package can be downloaded onto Windows machines where it works like a champ
(sounds in the package are .wav files).
John Wells and the Dept. of Phonetics and Linguistics, University College London
have made up a cassette and CD of
all of the sounds of the IPA which they will happily send to you for a fairly
nominal sum. Also located at UCL is information about the emerging European
standard ipaascii type alphabet, namely SAMPA.
Unicode IPA: if you have a font such as Lucida Sans Unicode, you
can transcribe and browse with it. Here is how to encode the
Unicode IPA extensions (and others) as character entities in HTML with and
without an editor. Microsoft has put out a full version of Arial (Arial Unicode)
that supports almost all of the Unicode set. At 23.5 Meg, it is overkill if all
you want is support for the IPA extensions.
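For those encoding without an editor, here is a minimal Python sketch of the idea: each non-ASCII character is replaced by a decimal numeric character reference. The function name to_html_entities is my own invention, and the sketch assumes the text contains no markup characters such as & or < that would need escaping as well.

```python
def to_html_entities(text: str) -> str:
    """Replace every non-ASCII character with a decimal numeric character reference."""
    return "".join(ch if ord(ch) < 128 else f"&#{ord(ch)};" for ch in text)


if __name__ == "__main__":
    # "ship" in IPA: esh (U+0283), small capital I (U+026A), then plain ASCII p.
    print(to_html_entities("ʃɪp"))  # prints &#643;&#618;p
```

Pasting the result into an HTML page should display the IPA symbols in any browser that has a font covering the IPA extensions.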
The vowel
sounds of American English are here linked to the symbols of the
International Phonetic Alphabet and the ipaascii
"ARPAbet". The sounds are taken from the Ibiblio archive of Sun "phonemes"
along with a few that I have added. While not super and not complete, they can
perhaps provide a set of reference points. These are phones, and include a
couple of sounds that are not contrastive (phonemic) in anyone's speech
(indicated with parentheses).
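The link between ARPAbet and IPA vowel symbols can be sketched as a simple lookup table. The table below follows the conventional ARPAbet vowel set as I understand it; it is not necessarily the exact inventory used on the linked page, and the function name arpabet_to_ipa is hypothetical.

```python
# Conventional ARPAbet vowel symbols and their usual IPA equivalents
# (an assumed table for illustration, not taken from the linked page).
ARPABET_TO_IPA = {
    "IY": "i", "IH": "ɪ", "EY": "eɪ", "EH": "ɛ", "AE": "æ",
    "AA": "ɑ", "AO": "ɔ", "OW": "oʊ", "UH": "ʊ", "UW": "u",
    "AH": "ʌ", "AX": "ə", "ER": "ɝ",
    "AY": "aɪ", "AW": "aʊ", "OY": "ɔɪ",
}


def arpabet_to_ipa(symbols):
    """Translate ARPAbet vowel symbols to IPA, ignoring trailing stress digits."""
    result = []
    for sym in symbols:
        key = sym.rstrip("012").upper()  # "IY1" -> "IY"
        result.append(ARPABET_TO_IPA.get(key, "?"))  # "?" marks an unknown symbol
    return result


if __name__ == "__main__":
    print(arpabet_to_ipa(["IY1", "AE0", "OW2"]))
```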
And here are the main consonant
phones of English (with some non-phonemic allophones shaded in yellow.)
Consonants, especially stopped ones, are very difficult to clip out of words as
recognizable single segments. So here the consonant phones are exemplified with
monosyllabic words, recorded by yrs truly.
British
and American Vowels The vowel sounds and IPA symbols of American English in
the Vowel quadrilateral; then a contrasting set of vowels for British English in
a second quadrilateral. (No-Java version). For another display of these differences, concentrating on vowel diphthongs and triphthongs, see Meier and Armstrong's page on same.
Sounds in words and phrases: Ben Holmberg at Virginia Tech gives instructions for sounding American (does Homeland Security know?).
The Varieties of English
page by The Language Samples Project at the University of Arizona has
descriptions and samples of distinguishing differences for seven
varieties of North American English.
John Newman at University of Alberta has put up sound
clips of a New Zealand speaker pronouncing series of words that exhibit typical
NZ vowel mergers. The files are .aiff format, but I have added RealAudio
versions. The site also has much valuable information on New Zealand English.
The late Donn Bayard of Otago University put up clips of readings of a paragraph by
male and female speakers of Kiwi, Australian, British, and American English
which were used to explore "Evaluating
English Accents Worldwide." University students in various countries
listened to the clips, evaluated the speakers for various personality and voice
traits, and estimated the speaker's age, ethnicity, educational level, income,
and social class. The American voices are rated surprisingly high on most of the
traits worldwide. One would hope Weinberger's students would tackle the
transcription of these samples! Here (soon) is a little page with the
passage written out and a few of the samples converted from .mov format to
mp3 format.
Steven Weinberger at George Mason University has a collection of 306 readings of a paragraph by English learners from around the world. Each of these readings is transcribed with the
IPA, so it is a great way to hear some of these sounds and learn to transcribe
them. For some of them, there is also a list of "different" phonological rules
that seem to be in use. (The samples are .mov files; here is his page with a few
examples converted to mp3s.)
alt-usage-english.org is collecting readings of a number of English texts by speakers of English from around the world. A welcome relief from "Please call Stella."
Here is a basic
tutorial in speech analysis from (in?) Sweden.
The Center for Spoken Language Understanding (CSLU) at the University of Oregon
also has an excellent tutorial on Spectrograph
reading.
The most
extensive and heavy-duty tutorials on line are Tony Robinson's notes from his Lenten Term 1998 Speech
Analysis class. Not for the math-challenged, they come also as a
well-illustrated 65-page PostScript file for closer study offline. This is the place to go after
you have read the new, second edition of Peter Ladefoged's Introduction to
Acoustic Phonetics and Keith Johnson's Acoustic and Auditory Phonetics.
(Look, Ma--books!)
The
SIL Speech Analyzer is the largest of their Speech Tools; it allows
simultaneous display of a sound clip's transcription, amplitude wave form, pitch
contour, and spectrogram. It is a Windows program, quite large, and very
serviceable on Win2K. The larger package includes a Speech Manager and Audio
Converter.
Alex Quarmby has put up a nifty realtime sound spectrograph Winspec
for Windows95+. It takes input from a microphone and gives very fast, real time
display (which however requires screen capture if you want to study it--no Save
or Print buttons). Very nifty if you want to see the effects of changing pace or
pitch as you speak, or otherwise modifying your speech.
University College London
offers the SFS (Sound File System) package of tools for speech analysis. The
package has its own sound file format, way of doing headers, and combining
different perspectives on a sound chunk. A nice feature is that the files you
get into .sfs format can be used in the kpe
vowel synthesizer, which allows you to emulate a sound clip by dragging
formant frequencies up and down with a mouse. SFS has now been ported to Windows
95/98.
Another solid sound analysis package for Unix/X/Linux with an emphasis on
analysis and re-synthesis is mxv, which does
spectrographic analysis and pitch contour extraction; in addition, it will do
linear predictive coding and pulse vocoding and so is a nice introduction to
those techniques. It is underdocumented, as is normally the case, but take a look at cmix
in the same directory. Development of MixViews seems to have slowed.
Another new sound analyzer and editor for Linux is snd,
a project of Stefan Schwandter, TU Wien, and included in The Stanford
University Center for Computer Research in Music and Acoustics (CCRMA).
Nice features.
Not to be left out, MSDOS and Windows users can download CECIL or WINCECIL (Computerized
Extraction of Components of Intonation in Language), a package that also contains
the current version of the IPA fonts from the Summer Institute of
Linguistics.
The CSLU Speech Toolkit (of which more below) includes
an excellent Speech View tool (the successor to Lyre) that records wave files
and displays wave forms, spectrographic analysis, and pitch contour.
Perhaps the newest (2003) and most advanced package for speech analysis is Praat by Paul Boersma and David Weenink at the Institute for Phonetic Science at the University of Amsterdam. Praat is available for all platforms and is well-documented with many advanced features. Praised by Karen Chung in her class notes on Advanced Speech Analysis Tools.
A spectro<->synthesizer! A sound spectrogram is a plot of the
amount of energy at various frequencies over time. Speech sounds, especially
vowels, have distinctive signatures (patterns of bands at certain frequencies).
Vowels, for example, are identifiable by two or three bands of energy (called
"formants") at certain intervals, or in the case of diphthongs, movement of the
bands over time. The standard assumption is that a spectrogram is derived from a
speech sound by a kind of filtering and analysis, and could not be used to
regenerate the sound on which it was based.
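To make the "energy at various frequencies over time" idea concrete, here is a minimal sketch of a spectrogram computed with a short-time discrete Fourier transform in plain Python. It is not how any of the packages listed here work internally; the function names, the frame length of 64 samples, and the hop of 32 are all assumptions chosen for illustration. It stores magnitudes (energy would be their squares).

```python
import cmath
import math


def spectrogram(samples, frame_len=64, hop=32):
    """Return a list of frames; each frame lists magnitudes for bins 0..frame_len//2."""
    # A Hann window tapers each frame to reduce leakage between frequency bins.
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / (frame_len - 1))
              for n in range(frame_len)]
    frames = []
    for start in range(0, len(samples) - frame_len + 1, hop):
        chunk = [samples[start + n] * window[n] for n in range(frame_len)]
        mags = []
        for k in range(frame_len // 2 + 1):
            # Naive DFT for bin k (slow, but fine for a short demonstration).
            acc = sum(chunk[n] * cmath.exp(-2j * math.pi * k * n / frame_len)
                      for n in range(frame_len))
            mags.append(abs(acc))
        frames.append(mags)
    return frames


def tone(freq, rate, n):
    """Generate n samples of a sine wave at freq Hz, sampled at rate Hz."""
    return [math.sin(2 * math.pi * freq * t / rate) for t in range(n)]


if __name__ == "__main__":
    rate = 8000
    spec = spectrogram(tone(1000, rate, 256))
    peak_bin = max(range(len(spec[0])), key=lambda k: spec[0][k])
    print("strongest band near", peak_bin * rate / 64, "Hz")
```

A vowel would show two or three such strong bands (the formants) instead of the single band of a pure tone.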
Peter Meijer of Philips Research has made a Java applet that takes a grayscale .gif and assigns a tone to each pixel (the higher on the y-axis, the higher the pitch; the darker, or lighter, depending on how you set it, the louder) and thus resynthesizes sound from the spectrogram. His sonifier will also take an audio input (in RIFF .wav format) and make a gif plot of it, so it is both a sound spectrograph and a synthesizer, and it will also take gifs made by any other sound spectrograph. Here (if you have Java enabled) it is, adapted to our purposes.
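The pixel-to-tone idea can be sketched in a few lines of Python. This is only my reading of the description above, not Meijer's code: the function name sonify, the frequency range, the column duration, and the linear spacing of frequencies are all assumptions.

```python
import math


def sonify(image, rate=8000, col_dur=0.05, f_lo=200.0, f_hi=3000.0, dark_is_loud=True):
    """Turn a grayscale image (rows of 0-255 values, top row first) into audio samples."""
    n_rows = len(image)
    n_cols = len(image[0])
    # Row 0 is the top of the picture, so it gets the highest frequency.
    freqs = [f_hi - (f_hi - f_lo) * r / max(n_rows - 1, 1) for r in range(n_rows)]
    per_col = int(round(rate * col_dur))  # samples played for each image column
    samples = []
    for c in range(n_cols):
        amps = []
        for r in range(n_rows):
            level = image[r][c] / 255.0  # 0.0 is black, 1.0 is white
            amps.append(1.0 - level if dark_is_loud else level)
        for i in range(per_col):
            t = (c * per_col + i) / rate
            mix = sum(a * math.sin(2 * math.pi * f * t) for a, f in zip(amps, freqs))
            samples.append(mix / n_rows)  # dividing by the row count keeps the mix within [-1, 1]
    return samples


if __name__ == "__main__":
    # Two columns: the first all white (silence), the second with a black top pixel.
    picture = [[255, 0],
               [255, 255]]
    audio = sonify(picture)
    print(len(audio), "samples")
```

Writing the samples out as a .wav file (for example with Python's wave module) would let you hear the result.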
For MSDOS Windows
we have Winspeech, which comes in a somewhat
limited freeware version, but is a workable descendant of Monologue for Windows.
You can experiment with its "phonemic" alphabet and lexicon.
Rsynth-2.0, though also free, is a full-fledged speech synthesizer based on the vocal
tract modeling of Dennis Klatt. Nick Ing-Simmons has put many options on the
command line, introduced some lexical stress (as a pitch boost), and set up
options for either an American English lexicon or a British one (each is about 9
meg when made up as .db files). Rsynth is the Unix counterpart to the moribund
Monologue for Windows. It too is a text-to-talk speech synthesizer and, equipped
with one of the lexicons (or both!) it does well from AA to zucchinis. A nice
feature is that it can display its "phonemic" mapping ("transcription") of text.
It "says" either text entered on the command line or from a file redirected to
it. Thanks to a classic piece of interactive programming by Axel Belinfante at
the University of Twente, you can sample rsynth and enter text which it will
convert to speech and send back to you (press Enter to send your text before
moving the cursor out of the entry box).

At the right is the generic face of Scansoft's RealSpeak synthesizer.
Their demo (takes paste-in text) has voices for many languages, all
female. Clicking on her image will get a sentence from the American
English voice, Jennifer.
And the ELIS Speech Lab at the University of Ghent demonstrates the work of a
plug-in synthesizer Eurovocs
with samples of it pronouncing German, Dutch, French, and (American) English
texts.
On a parallel track, British Telecommunications offers an interactive demo of their latest technology
complete with a synth offering a choice of male, female, and girl voices.
Registration required, but apparently harmless.
The Linguistic Data Consortium at University of Pennsylvania offers a TTS comparison site where you can
run tests on various online synthesizers.
The largest list of active synth links is Gregor
Möhler's at the University of Stuttgart. It covers synthesizers for 25
languages.
Diphones
and PSOLA (Pitch Synchronous Overlap and Add) make a very natural sounding
synth. Most of the phone company voices indicated here are of this kind (one
supposes--the matter is cloaked in trade secrecy). Much less secretive is the
French telecommunication firm Elan Informatique, which has put up full-scale impressive synths for several European
languages complete with manuals.
Thierry Dutoit has persuaded several people to open up these existing
diphone-analyzed data bases for scholarly investigation (not TTS) via MBROLA.
Good working software for Linux (among others) can be found there
along with data bases for French, Dutch, English, Estonian, German,
Portuguese, Romanian, Spanish, and 20 other languages, many with more
than one voice, along with an invitation to join the project. His book
is now available from Kluwer: Introduction
to Text-to-Speech Synthesis. For an excellent overview of TTS, see his
Introduction to
TTS.
Mike Hamilton has used MBROLA voices for other languages to speak
English with those accents. To do this, he has to substitute the
phonemes of one language for those of the other (e.g. "R" [French
uvular r] substitutes for "r" and so on). He thus creates .pho files
(see next) which are synthesized with the diphone voices (hence the
voice's assimilations apply). The result is remarkably realistic. Here
is one French voice describing the MBROLA Project.
Since MBROLA can impose an intonational contour, it can sing as well as speak. Not to be missed on Hamilton's site are the various German, Brazilian, French, Dutch, and Swedish voices singing the Doremi song from The Sound of Music and "I'm Popeye the Sailor Man."
MBROLA does not take text as an input, and hence is not a complete TTS.
Rather it takes a ".pho" input file with the text already analysed into
individual phonemes and with information added on length of segment and
pitch. To be part of a TTS, MBROLA needs a front end that will take
text and turn it into pho type representation. The Euler project uses
different modules for different languages [French only, at present] and
produces such a representation which it feeds to MBROLA.
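A .pho file is easy to picture with a small sketch. As I understand the format, each line gives a phoneme symbol, its duration in milliseconds, and then optional pairs of (position in percent of the segment, pitch in Hz). The helper names pho_line and to_pho are my own, and the phoneme symbols in the demo are only illustrative: the actual symbol set depends on the voice data base you load.

```python
def pho_line(phoneme, duration_ms, pitch_points=()):
    """Format one segment: symbol, duration in ms, then (percent, Hz) pitch targets."""
    fields = [phoneme, str(duration_ms)]
    for percent, hz in pitch_points:
        fields += [str(percent), str(hz)]
    return " ".join(fields)


def to_pho(segments):
    """Join segments into the text of a .pho file, one segment per line."""
    return "\n".join(pho_line(*seg) for seg in segments) + "\n"


if __name__ == "__main__":
    # A rough "hello" with a falling pitch contour; symbols are assumed, not
    # checked against any particular MBROLA voice.
    segments = [
        ("_", 100),                           # "_" stands for silence
        ("h", 80),
        ("@", 60, [(50, 120)]),               # 120 Hz halfway through the segment
        ("l", 70),
        ("@U", 200, [(0, 130), (100, 100)]),  # fall from 130 Hz to 100 Hz
        ("_", 100),
    ]
    print(to_pho(segments), end="")
```

A front end such as Euler or Festival has to produce exactly this kind of information (segments, durations, pitch targets) from raw text.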
The Edinburgh Centre for Speech Technology Research has put up version 1.4.3 of
Festival, their
diphone synthesizer done with Scheme. The distribution directory has
voices both British and American. It is pretty well documented and
invites a number of applications, not the least of which is to hook up
with MBROLA as its front end. The best voice currently available for
(Br.) English is the result of this emerging collaboration--it comes from
the en1 database which is the MBROLA-massaged version of the one
provided them by the Festival researchers. It can be downloaded from
MBROLA and installed as a voice in Festival, even the default voice. A
wide selection of other voices and languages is now available,
including the American Female voice speaking on "Please call Stella."
Festival supports SABLE, an XML-conforming markup language for marking up a text for prosodic features such as emphasis, pause, lowered pitch for backgrounding information, speed, volume, and other points. For a demo of SABLE markup and other Festival demos, the best site is Alan Black's at Carnegie Mellon.
The Center for Spoken Language Understanding has put up version 2.0 of a huge Toolkit
(minimum 31M) which gets Windows users into the speech synthesis and
recognition game along with faces. It includes a modified version of
Festival 1.4.2 working as a TTS with CSLU's own diphone voices; other
voices can be downloaded (at upwards of 8M per voice). In addition to
the excellent waveform analysis tools, they include several "faces" and
head+torsos which lip sync to what one of the "voices" is saying.
Moreover, it provides software to engage these talking heads in
dialogues you script. That is, it includes some speech recognition. All
wrapped up in Tcl/Tk, it is definitely worth a look, though a beta; all
this 3D simulation needs graphics acceleration (of the GL variety) and
the package will not install cleanly without it. Lip sync at 3 fps is
definitely underwhelming.

Excellent overviews of facial models of
speaking can be found at the Perceptual Science
Laboratory at UC Santa Cruz and at Haskins Lab in New Haven,
Conn.
FreeTTS
is a new, very large package from Sun programmers that illustrates the
use of the new Java Speech API to synthesize speech. It can use some
MBROLA voices. There are things to explore here, as the speed,
fundamental frequency and pitch range are all tunable on the fly.
Janet Cahn of the MIT Media Laboratory has put up several demos of synth voices with prosody.
One
simulates emotional speech by varying one or more of 17 affect
parameters (pitch, timing, articulation, volume, etc.). This work helps
to explain why most synthesized speech is tinged with ennui.
Bill Hollingsworth has written the Cynthia speech engine
to test linguistic modelling of rhythm and intonation. Among other
things, Cynthia syllabifies, assimilates across word boundaries, and
can include certain allophones such as the flap/tap in (American) butter, little etc. He has given 'her' a Southern American accent.
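The flap/tap allophone lends itself to a small rule-based sketch. This is not Cynthia's actual method, which I have not seen; it is a simplified textbook rule, assuming ARPAbet-style input where vowels carry a stress digit (0 for unstressed) and DX stands for the flap. The function names are hypothetical.

```python
def is_vowel(phone):
    """In this notation only vowels end in a stress digit (0, 1, or 2)."""
    return phone[-1] in "012"


def flap(phones):
    """Replace T or D with the flap DX between a vowel and an unstressed vowel."""
    out = list(phones)
    for i in range(1, len(phones) - 1):
        between_vowels = is_vowel(phones[i - 1]) and phones[i + 1].endswith("0")
        if phones[i] in ("T", "D") and between_vowels:
            out[i] = "DX"
    return out


if __name__ == "__main__":
    print(flap(["B", "AH1", "T", "ER0"]))        # butter
    print(flap(["L", "IH1", "T", "AH0", "L"]))   # little
    print(flap(["AH0", "T", "AE1", "K"]))        # attack: stressed vowel follows, no flap
```

Because the rule only looks at neighboring segments, running it over a whole phrase rather than a single word would also flap across word boundaries.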
How to wreck a nice beach.
Continuous speech recognizers are either concatenated word recognizers or
they analyze words into possible phonemic strings which they match
probabilistically against words in a lexicon. The commercial products ViaVoice
(from IBM) and NaturallySpeaking (from Dragon Systems; now part of ScanSoft, where it pairs with RealSpeak TTS)
have acquired a following now in the era of much RAM and fast processors, but
operate by brute force for the most part.
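Matching a phonemic string against a lexicon can be illustrated with a toy sketch. Real recognizers use trained acoustic and language models; here I simply assume a tiny hand-made lexicon, score each word by the edit distance between phoneme strings, and turn the scores into rough pseudo-probabilities. None of this reflects how ViaVoice or NaturallySpeaking work internally, and all names are hypothetical.

```python
import math

# A tiny assumed lexicon in ARPAbet-style symbols, for illustration only.
LEXICON = {
    "speech": ["S", "P", "IY", "CH"],
    "beach": ["B", "IY", "CH"],
    "peach": ["P", "IY", "CH"],
    "nice": ["N", "AY", "S"],
}


def edit_distance(a, b):
    """Levenshtein distance between two phoneme sequences."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        cur = [i]
        for j, y in enumerate(b, 1):
            cur.append(min(prev[j] + 1,              # delete x
                           cur[j - 1] + 1,           # insert y
                           prev[j - 1] + (x != y)))  # substitute (free if equal)
        prev = cur
    return prev[-1]


def rank_words(heard, lexicon=LEXICON):
    """Rank lexicon words by a pseudo-probability that decays with edit distance."""
    weights = {w: math.exp(-edit_distance(heard, p)) for w, p in lexicon.items()}
    total = sum(weights.values())
    ranked = [(w, wt / total) for w, wt in weights.items()]
    return sorted(ranked, key=lambda pair: -pair[1])


if __name__ == "__main__":
    for word, prob in rank_words(["P", "IY", "CH"]):
        print(f"{word}: {prob:.2f}")
```

Near-ties between candidates such as "speech" and "beach" are exactly where a recognizer can end up wrecking a nice beach.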
CSLU also develops and sells corpora of
talk, especially telephone speech.
Andrew Hunt maintained the
comp.speech FAQ. Now somewhat long in the tooth.
The Cambridge engineering archive maintained by Tony Robinson has much useful information as well
as data, dictionaries, and sources for programs.
Karen
Chung's List of Phonetics URLs for Linguist List has many good things.
The audio file
formats FAQ maintained now by Chris Bagwell at Sprynet is valuable when you are
tangled in unknown or unfamiliar sound formats and trying to convert them.
Mambo is the
Perceptual Science Laboratory at UC Santa Cruz. They list many, many more sites
than those given here, including sites on Natural Language Processing and the
physiology of speech (and simulation of same). The figure on the right is their
talking head Baldi.
The IEEE listing of automatic speech synthesis and recognition (ASSR) sites is substantial and current.
The Institute of Phonetics at Johann Wolfgang von Goethe-University has a good
search
front end riding atop a solid bibliography.
Kevin Russell covers
the basic points with sound clips and fonts and has useful transcription
advice and practice.
Stephen Luscombe at Stirling University has an on-line
English phonology and phonetics course (RP dialect) with sound samples and exercises.
David Brett teaches English at the University of Sassari (It.) using Flash pages with Lucida Sans Unicode transcriptions. Much drag-and-drop action in the phonetics exercises. (The "English" is RP, more or less.)
Karen Chung is adding pages on various topics to her Introduction to Phonetics and Phonetics II
course pages. These courses assume a Taiwanese learner and so touch at
various points on Chinese. Many links and some pleasant software.
The Department of
Linguistics at the University of Lausanne introduces the 60 most common IPA
vowels and consonants (not just of English) with diagrams of articulation and
sound clips (au format).
Henry Rogers and Michael Stairs at University of Toronto have developed
phthong which
teaches, tests, and corrects the phonetic alphabet; actually, it has two forms,
one which teaches the "American" modifications to the IPA and the other which
teaches IPA by the book.
IPA Help
has nifty tests of your ability to match sounds (heard) to symbols.
Also at U Toronto, Daniel Currie Hall has created Sammy, an
interactive sagittal section drawing that changes the position of articulators
in the mouth as you select different features; it also gives the phonetic symbol
for the sound.
John Maidment gives us POW (Prosody on the
Web) in three tutorials (tone, tone group, focus, etc.)
See the companion site GramResources for links to online resources for the study of English syntax, and Lexical, Semantic, Textual Resources for resources for word meaning and usage, and textual cohesion.