Research
My research area is computational linguistics, and my research covers a wide range of NLP tasks, including morphological analysis, part-of-speech tagging, grammar extraction and generation, treebank development, machine translation, named-entity recognition, and information extraction. Selected publications are here, and my grants are here.
Below are some of my current projects, which are supported by grants from NSF, Microsoft, and UW:
- The RiPLes project: The goal is to create a framework that allows the rapid development of resources for resource-poor languages, and then to use the automatically created resources to perform cross-lingual studies on a large number of languages in order to discover linguistic knowledge.
- The Hindi/Urdu Treebank project: The goal is to create a large-scale, multi-layer, multi-representational Treebank for Hindi and Urdu. The Treebank will include dependency structure, phrase structure, and PropBank annotation for 400K words of Hindi and 150K words of Urdu. This is a collaborative project with the University of Colorado, UMASS, Columbia University, and IIIT in India.
- Bio-NLP projects (e.g., SCOAP, deCIPHER): We are working on several projects that focus on extracting medical information (e.g., critical results, recommendations, and phenotypes) from clinical data such as radiology reports and ICU reports. We are collaborating with the UW Medical School and Microsoft.
- The AGGREGATION project: The goal of the project is to develop software tools that assist in the documentation of endangered languages by merging two types of resources: collections of linguistic examples curated by linguists, and a cross-linguistic computational grammar resource called the Grammar Matrix. The result will be a system for creating machine-readable (or implemented) grammars from data collected and annotated by field linguists.
Last modified on 6/17/2014