HCOMP 2013 Workshop

Scaling Speech, Language Understanding and Dialogue through Crowdsourcing


November 9, 2013 - Palm Springs, California, USA

Workshop at Conference on Human Computation and Crowdsourcing 2013


Venue: The workshop is held at HCOMP 2013 in the Madera meeting room of the Renaissance Palm Springs Hotel

Workshop program:

  • 1:00pm - Opening

  • 1:10pm - Invited Talks

    1:10pm - Gina-Anne Levow, "Crowdsourcing Spoken Dialog Systems Evaluation: From Judgments to Predictions"

    1:40pm - Jeanne Parson, "Ask the crowd: Which voice?"

    2:10pm - Nancy Chang, "Extracting meaningful wisdom from crowds: Problems and prospects for crowdsourcing semantic data"

  • 2:40pm - Oral Session

    2:40pm - Paper 1 - "Crowdsourcing Transcription Beyond Mechanical Turk", Haofeng Zhou, Denys Baskov, Matthew Lease

    3:00pm - Paper 2 - "Conversations in the Crowd: Collecting Data for Task-Oriented Dialog Learning", Walter S. Lasecki, Ece Kamar, Dan Bohus, Eric Horvitz

    3:20pm - Paper 3 - "Does History Help? An Experiment on How Context Affects Crowdsourcing Dialogue Annotation", Elnaz Nouri

  • 3:40pm - Coffee break

  • 4:00pm - Panel Discussion: Does crowdsourcing really work? Success stories, failures, challenges, and lessons learned.

    Panel: Dan Bikel, Michael Tjalve, Daniela Braga, Ece Kamar, Annika Hämäläinen

  • 5:00pm - Closing

The main goals of the workshop are:

  • To raise awareness in the speech community of crowdsourcing as a powerful approach for scaling speech and language technology across languages, scenarios, and domains

  • To present and discuss end-to-end (E2E) methodologies and best practices for data collection, testing, and usability studies using crowdsourcing

  • To discuss methodologies for data quality control and validation

  • To join efforts and develop synergies among the speech and language community in the area of crowdsourcing

  • To create an interest group in this field for speech, language and dialogue technologies

The workshop runs from 13:00 to 17:00 and includes invited talks, paper presentations, and a panel discussion.

Important Dates:

  • October 1: Paper submission deadline

  • October 8: Notification of acceptance/rejection

  • October 15: Camera-ready submission

  • November 9: Workshop

Paper submission formats:

  • Long papers and position papers: up to 8 pages

  • Work in progress, demo papers: up to 4 pages

  • Submissions must be in AAAI format and submitted as PDF. Accepted papers will be published in the AAAI Press proceedings.

  • Papers should be submitted to Michael.Tjalve@microsoft.com


The main topic is crowdsourcing for speech, language understanding and dialogue, covering (but not limited to) the following aspects:

  • Data collection experiments, techniques and applications using crowdsourcing applied to speech and language understanding

  • Human annotation, labeling, data processing workflows and impact on E2E products/applications

  • White papers on the state of the art, best practices, challenges, lessons learned, and opportunities

  • Innovative approaches, models and methodologies using crowdsourcing

  • Data quality control and anti-spam processes, algorithms, techniques

  • Games with a purpose and gamification techniques applied to speech and dialogue

  • Usability studies and experiences using crowdsourcing


Since 2006, when Jeff Howe first used the term (in "The Rise of Crowdsourcing", http://www.wired.com/wired/archive/14.06/crowds.html), crowdsourcing has grown rapidly across scientific and business areas as one of the most successful strategies for scaling businesses and processes quickly and cost-effectively. The industry landscape has expanded in many directions and business models, from crowd/cloud labor to platforms and tools that enable micro-tasks for human computing or distributed knowledge building. The expansion of crowdsourcing in industry can be tracked at crowdsourcing.org; a good resource with a more academic focus is http://ir.ischool.utexas.edu/crowd/.

Since 2005, when Amazon launched its micro-task crowdsourcing platform, Mechanical Turk, researchers in computer science, linguistics, speech technology, and related fields have been changing their paradigms for data collection, data labeling/annotation, data analysis, and user studies. Research can finally scale, enabling more robust models, better results, more accurate assessments and conclusions, and more sophisticated user studies, simply because researchers can access more data rapidly and cost-effectively. At the same time, industry has been leveraging crowdsourcing effectively and significantly reducing time to market.

Despite these well-known advantages, crowdsourcing also raises concerns: data quality control, the availability and quality of the crowds, dealing with spam, methodologies and best practices for launching a crowdsourcing task, and how to make optimal use of the data that comes back from the crowds.

In the speech area, and in particular in language understanding, language generation, and dialogue technologies, there is still a large gap between the investments that industry and academia make in crowdsourcing: industry is well organized and represented in this field, whereas academia is lagging behind.

The Special Session on Crowdsourcing for Speech Processing, held at Interspeech 2011 in Florence, was the first attempt to bring attention to this topic at a worldwide speech-related event; since then, no other event focusing on the speech area has had similar impact.

Nevertheless, crowdsourcing is becoming a hot topic in the speech and dialogue areas, as can be seen from the growing, if scattered, academic activity of researchers in the field.



The main reason to hold this workshop as part of HCOMP is precisely to fill this gap in regular academic events.

Invited speakers:

Gina-Anne Levow, "Crowdsourcing Spoken Dialog Systems Evaluation: From Judgments to Predictions".

As spoken dialog systems become more pervasive, effective methods of evaluation become increasingly important. However, building evaluation models depends not only on automatically extractable factors but also, crucially, on human assessments of dialog quality. Unfortunately, assessment in laboratory settings is costly, time-consuming, small-scale, and somewhat unrealistic, while real users of deployed systems rarely want to complete quality surveys. To overcome these constraints, we present a crowdsourcing methodology for collecting user judgments of dialog system quality. The approach addresses rapid large-scale rating, semi-automatic task validation, and assessment of worker reliability. Analysis of the crowdsourcing results demonstrates efficiency and cost-effectiveness, while comparison of ratings among workers and between workers and experts indicates good levels of agreement on key criteria. Lastly, we demonstrate that these crowdsourced judgments can support the development of effective predictors of spoken dialog system quality for new dialogs and systems.

Joint work with Helen Meng, Zhaojun Yang, Irwin King, Baichuan Li, and Yi Zhu.

Gina-Anne Levow is Assistant Professor at the University of Washington. She has applied crowdsourcing to spoken dialog system evaluation, co-organized a special session on crowdsourcing at Interspeech 2011, and is co-editor of and contributor to a forthcoming volume on crowdsourcing for speech processing.

Contact information: levow@u.washington.edu.

Jeanne Parson, "Ask the crowd: Which voice?".

Whether choosing a voice talent to create a TTS voice, harvesting perceptual information from listeners, or validating technology advancements in a TTS engine, crowdsourcing provides an opportunity to collect feedback affordably and quickly from "regular" people, the people likely to be users of TTS voices. In this presentation, Ms. Parson will share insights on methodologies developed over the past 18 months. Highlights include: using surveys rather than single HITs; the crowd as a contributor to voice talent evaluation; catching and holding the interest of judges; tips for spam detection in "boutique" surveys; perception of a persona and/or personality traits; and evolving an automated platform for frequently used surveys.

Jeanne Parson, Microsoft Corporation (USA), has worked in audio design of systems that include spoken dialogue since 1996. After 14 years of working with educational learning systems, she joined the Tellme group at Microsoft in 2010. Now part of the Bing Information Platform Design Studio, she currently leads a small team which serves as the design arm for Microsoft's TTS development group. In early 2012, she began partnering with Microsoft senior researchers and crowdsourcing advocates to leverage crowdsourcing for feedback on human and TTS voices. Ms. Parson's educational background is rooted in music: she holds a BA in Music Performance from CSU Fresno and an MFA in Electronic Music from Mills College (1990).

Nancy Chang, "Extracting meaningful wisdom from crowds: Problems and prospects for crowdsourcing semantic data".

What can the wisdom of the crowds tell us about meaning? A variety of natural language technologies rely on semantically annotated data, but such data is notoriously hard to come by. Crowdsourcing holds great promise for overcoming this data bottleneck by tapping into the expertise of native speakers. However, reliably capturing textual meaning poses representational and methodological challenges significantly more complex than those in other domains. In this talk I will present an overview of these challenges and highlight lessons learned from some empirical investigations.

Nancy Chang earned her doctorate in computer science at UC Berkeley, focusing on computational cognitive models of language learning and use. She then served as a research associate at Sony Computer Science Laboratory and the Université Sorbonne Nouvelle, both in Paris, and as a visiting lecturer at Gothenburg University, Sweden. Since joining Google in 2012, she has worked on semantic parsing for conversational search and commonsense reasoning for natural language understanding.


  • Daniela Braga, VoiceBox

  • Michael Tjalve, University of Washington / Microsoft

  • Ece Kamar, Microsoft

  • Gina-Anne Levow, University of Washington

  • Maxine Eskenazi, Carnegie Mellon University

  • Daniel Bikel, Google

  • Jon Barker, University of Sheffield

  • Nikko Ström, Amazon

  • Christoph Draxler, Ludwig Maximilian University of Munich