Research
Overview
My research area is computational linguistics, and my work covers a wide range of NLP tasks, including morphological analysis, part-of-speech tagging, grammar extraction and grammar generation, treebank development, machine translation, named-entity recognition, and information extraction. Selected publications are [here].
Below are some of my current projects, which are supported by grants from NSF, Microsoft, and UW:
- The RiPLes project: The goal is to create a framework that allows the rapid development of resources for resource-poor languages, and then to use the automatically created resources to perform cross-lingual studies on a large number of languages to discover linguistic knowledge.
- The Hindi/Urdu Treebank project: The goal is to create a large-scale, multi-layer, multi-representational Treebank for Hindi and Urdu; the Treebank will include dependency structure, phrase structure, and PropBank annotation for 400K words of Hindi and 150K words of Urdu. This is a collaborative project with the University of Colorado, UMass, Columbia University, and IIIT in India.
- Bio-NLP projects (e.g., SCOAP, deCIPHER): We are working on several projects that focus on extracting medical information (e.g., critical results, recommendations, and phenotypes) from clinical data such as radiology reports and ICU reports. We are collaborating with the UW Medical School and Microsoft.
- The AGGREGATION project: The goal of the project is to develop software tools to assist in the documentation of endangered languages by merging two types of resources: collections of linguistic examples curated by linguists, and a cross-linguistic computational grammar resource called the Grammar Matrix. The result will be a system for creating machine-readable, or implemented, grammars from data collected and annotated by field linguists.
Grants
- NSF:
  - Grant BCS-1160274 (2012-2015, co-PI): Automatic Generation of Grammars for Endangered Languages from Glosses and Typological Information
  - Workshop grant IIS-1027289 (2010-2011, PI): Workshop on NLP and Linguistics: Finding the Common Ground
  - CAREER grant BCS-0748919 (2008-2013, PI): Information Engineering and Synthesis for Resource-poor Languages
  - REU Supplement IIS-0939733 (2009-2013, PI): Supplement to the NSF CAREER grant
  - Grant CNS-0751213 (2008-2012, PI): A Multi-Representational and Multi-Layered Treebank for Hindi/Urdu
  - Grant BCS-0720670 (2007-2009, co-PI, PI: Scott Farrar): Implementing the GOLD Community of Practice
  - Grant CNS-0708719 (2007-2008, PI): General Techniques for Creating Treebanks with Multiple Representation
- IARPA:
  - FUSE program D11PC20153 (2011-2013, subcontractor): Discovering and Explaining Technical Emergence through Analysis of the Language and Structure of Scientific Publications
- NIH:
  - NIH/NLM K99/R00 Pathway to Independence Award 1K99LM010227-0110 (2009-2014, Consultant, PI: Imre Solti): Increasing Clinical Trial Enrollment: A Semi-Automated Patient Centered Approach
- Microsoft:
  - Research Grant (2011-2013, collaborator, PI: Meliha Yetisgen-Yildiz): Extracting Critical Illness Phenotypes from Electronic Medical Records
- UW:
  - RRF grant (2012-2013, co-PI, PI: Meliha Yetisgen-Yildiz): Annotating and Detecting Medical Events in Clinical Data
  - RRF grant (2007-2009, PI): Towards automatic enrichment and analysis of linguistic data for threatened and endangered languages
  - Student Technology Fee (STF) grant (2006, faculty lead): Natural Language Computing Cluster
Last modified on 01/23/2013