ACL 2010 workshop on "NLP and Linguistics: Finding the Common Ground"

Uppsala, Sweden
July 16, 2010
Workshop website: http://faculty.washington.edu/fxia/nlpling2010/
Submission website: https://www.softconf.com/acl2010/NLPLing/

1. Workshop overview

This workshop aims to examine the relationship between linguistics and NLP and to explore

  1. new methods for incorporating linguistic knowledge into statistical systems to advance the state of the art of NLP, and
  2. the feasibility of using NLP techniques to acquire linguistic knowledge for a large number of languages and to assist linguistic studies.

Since the early 1990s, with the advancement of machine learning methods and the availability of data resources such as treebanks and parallel corpora, data-driven approaches to NLP have made significant progress. The success of such data-driven approaches has cast doubt on the relevance of linguistics to NLP. Conversely, NLP techniques are rarely used to help linguistic studies. We believe that there is room to expand the involvement of linguistics in NLP, and likewise of NLP in linguistics, and that the cross-pollination of ideas between the disciplines can greatly benefit both fields.

One common way to take advantage of linguistic knowledge is to train a statistical system on linguistically annotated data such as treebanks. Another is to represent linguistic knowledge as rules in a rule-based system. This workshop is interested in research that goes BEYOND these common approaches and explores new methods for incorporating linguistic knowledge into statistical systems or for using statistical systems for linguistic knowledge discovery.

The workshop will consist of one invited talk, 2-3 panels, group discussion, and paper/poster sessions.

2. Topics of Interest

The workshop is interested in research that explores new methods for incorporating linguistic knowledge into statistical systems or for using statistical systems for linguistic knowledge discovery. Topics include but are not limited to the following themes:

[T1] Research that shows awareness of particular linguistic phenomena and their effects on statistical systems. For instance, being aware of syntactic phenomena such as scrambling, cross-serial dependencies and long-distance movement is very relevant to parsing (e.g., earlier work on using different grammar formalisms such as LTAG/CCG/HPSG/LFG to handle these phenomena or more recent work on non-projective dependency parsing). Similarly, being aware of word formation patterns (e.g., reduplication in Chinese) or allomorphic variation patterns (e.g., vowel harmony in Turkish) could help word segmentation and morphological analysis.

[T2] New methods for incorporating linguistic knowledge into statistical systems to improve the state of the art (e.g., as rules in a preprocessing step, as linguistic features in a statistical system, as filters for pruning a search space, or as priors in an objective function).

[T3] Research that demonstrates the feasibility of creating NLP systems to automatically acquire linguistic knowledge for a large number of languages. In order to make generalizations about language universals, linguists need to gather information about as many individual languages as possible. However, knowledge about most languages is not complete. Can we use NLP techniques to acquire knowledge (e.g., basic word order, case marking, tense, aspect, and word formation rules) for hundreds of languages, which could help in the construction of resources such as WALS (http://wals.info) (Haspelmath et al., 2005)?

[T4] Research that demonstrates the benefits of using NLP techniques to help particular linguistic studies. For instance, given some language data, can the categorization of languages into families be automated? Can historical interactions between languages be identified automatically (e.g., areal effects and borrowings, but beyond just lexical borrowings)? Can NLP tools, run over corpora of different dialects of the same language, systematically identify differences between the dialects (e.g., not just word choice differences, but also the choice and frequency of particular constructions)?

On the submission form, please specify the theme that your paper falls into.

The systems described in the paper should be properly evaluated and compared with the state of the art.

3. Important dates

  • Apr 7, 2010 (23:59 Pacific Daylight Time): Submission deadline
  • May 6, 2010: Notification of Acceptance
  • May 16, 2010: Camera-ready paper
  • Jul 16, 2010: Workshop in Uppsala, Sweden

4. Submission Information

    The papers should report original and unpublished research on topics of interest for the workshop. Accepted papers are expected to be presented at the workshop, and will be published in the workshop proceedings. They should emphasize obtained results rather than intended work, and should indicate clearly the state of completion of the reported results.

    A paper accepted for presentation at the workshop must not be presented or have been presented at any other meeting with publicly available proceedings.

    4.1 Submission Format: All submissions must be electronic in PDF and must be formatted using the ACL 2010 style files, which are available at http://www.acl2010.org/authors.html

    4.2 Maximum Length: The maximum length of a submitted paper is eight (8) pages of content, excluding references. One additional page is allowed for the References section. Thus, your PDF file is limited to eight (8) pages of content and nine (9) pages in total.

    4.3 Anonymous Review: Reviewing of papers will be double-blind. Therefore, the paper must not include the authors' names and affiliations. Furthermore, self-references that reveal the author's identity, e.g., "We previously showed (Smith, 1991) ...", must be avoided. Instead, use citations such as "Smith (1991) previously showed ...". Papers that do not conform to these requirements will be rejected without review.

    4.4 Double Submitting: Papers that have been or will be submitted to other meetings or publications must provide this information on the START online submission page. If NLPLing 2010 accepts a paper, authors must notify the program chairs *immediately* indicating which meeting they choose for presentation of their work. NLPLing 2010 cannot accept for publication or presentation work that will be (or has been) published elsewhere.

    4.5 Submission Site: Authors must submit papers online at https://www.softconf.com/acl2010/NLPLing/

    4.6 Submission Deadline: The submission deadline is April 7, 2010, 23:59 PDT. Papers submitted after the deadline will not be reviewed.

5. ACL mentoring service

    ACL is providing a mentoring (coaching) service for authors from regions of the world where English is less emphasized as a language of scientific exchange. Many authors from these regions, although able to read the scientific literature in English, have little or no experience in writing papers in English for conferences such as the ACL meetings. The service will be arranged as follows. A set of potential mentors will be identified by Mentoring Service Chairs Björn Gambäck (SICS, Sweden and NTNU, Norway) and Diana McCarthy (Lexical Computing Ltd., UK), who will organize this service for ACL 2010. If you would like to take advantage of the service for a submission to this workshop, please upload your paper in PDF format using the paper submission software for the mentoring service available at: https://www.softconf.com/acl2010/acl2010mentor/

    The deadline for the mentoring service is six weeks before the workshop submission deadline. An appropriate mentor will be assigned to your paper and the mentor will get back to you no later than two weeks before the submission deadline.

    Please note that this service is for the benefit of the authors as described above. It is not a general mentoring service for authors to improve the technical content of their papers.

    Questions about the mentoring service should be referred to mentoring@acl2010.org

6. Workshop Chairs

  • Lori Levin, CMU, USA
  • William Lewis, Microsoft Research, USA
  • Fei Xia, University of Washington, USA

7. Program Committee

  • Anthony Aristar, LinguistList, USA
  • Jason Baldridge, University of Texas at Austin, USA
  • Timothy Baldwin, University of Melbourne, Australia
  • Dorothee Beermann, NTNU, Norway
  • Emily M. Bender, University of Washington, USA
  • Steven Bird, University of Melbourne, Australia
  • Chris Brew, Ohio State University, USA
  • Michael Collins, MIT, USA
  • Michael Cysouw, Max Planck Inst. for Evolutionary Anthropology, Germany
  • Hal Daume III, Univ of Utah, USA
  • Markus Dickinson, Indiana University, USA
  • Alexis Dimitriadis, Utrecht Institute of Linguistics OTS, The Netherlands
  • Helen Aristar Dry, LinguistList, USA
  • Jason Eisner, JHU, USA
  • Erhard Hinrichs, Univ. of Tübingen, Germany
  • Chu-Ren Huang, The Hong Kong Polytechnic Univ., Hong Kong, China
  • Julia Hockenmaier, UIUC, USA
  • Mark Johnson, Macquarie University, Australia
  • Kevin Knight, ISI, USA
  • Mark Liberman, Univ of Pennsylvania, USA
  • Dekang Lin, Google, USA
  • Paola Merlo, University of Geneva, Switzerland
  • Kathy McKeown, Columbia Univ, USA
  • Martha Palmer, University of Colorado, USA
  • Dragomir Radev, University of Michigan, USA
  • Owen Rambow, Columbia Univ., USA
  • Dipti Misra Sharma, IIIT, India
  • Richard Sproat, Oregon Health & Science University, USA
  • Mark Steedman, Edinburgh, UK
  • Michael White, Ohio State University, USA
  • Richard Wicentowski, Swarthmore College, USA
  • Peter Wittenburg, Max Planck Inst. for Psycholinguistics, The Netherlands
  • Andreas Witt, Institut für Deutsche Sprache, Mannheim, Germany
  • Nianwen Xue, Brandeis University, USA

8. Contact info

    If you have any questions about the workshop, please contact us at nlpling2010@uw.edu.