Overview
My research area is computational linguistics, and my work covers a wide
range of NLP tasks, including morphological analysis, part-of-speech tagging,
grammar extraction and generation, treebank development, machine translation,
named-entity recognition, and information extraction. Selected publications
are [here].
Below are some of my current projects, which are supported by grants from
NSF, NIH, Microsoft, and IARPA:
- The RiPLes project: The goal is to create a framework that allows the rapid
development of resources for resource-poor languages, and then to use the
automatically created resources to perform cross-lingual studies on a large
number of languages to discover linguistic knowledge.
- The Hindi/Urdu Treebank project: The goal is to create a large-scale,
multi-layer, multi-representational Treebank for Hindi and Urdu; the Treebank
will include dependency structure, phrase structure, and PropBank annotation
for 400K words of Hindi and 150K words of Urdu. This is a collaborative
project with the University of Colorado, UMASS, Columbia University, and IIIT
in India.
- Bio-NLP projects (e.g., SCOAP, deCIPHER): We are working on several projects
that focus on extracting medical information (e.g., critical results,
recommendations, and phenotypes) from clinical data such as radiology reports
and ICU reports. We are collaborating with the UW Medical School and Microsoft.
- The FUSE project: The goal is to build a system that automatically discovers
technical emergence in scientific publications. This is a collaborative
project with Columbia University, University of Maryland, University of
Michigan, and Cambridge University.
Grants
- NSF:
- Workshop grant IIS-1027289 (2010-2011, PI): Workshop on NLP and Linguistics:
Finding the Common Ground
- CAREER grant BCS-0748919 (2008-2013, PI): Information Engineering and
Synthesis for Resource-poor Languages
- REU Supplement IIS-0939733 (2009-2013, PI): Supplement to the NSF CAREER grant
- Grant CNS-0751213 (2008-2012, PI): A Multi-Representational and
Multi-Layered Treebank for Hindi/Urdu
- Grant BCS-0720670 (2007-2009, co-PI, PI: Scott Farrar): Implementing the
GOLD Community of Practice
- Grant CNS-0708719 (2007-2008, PI): General Techniques for Creating Treebanks
with Multiple Representation
- IARPA:
- FUSE program D11PC20153 (2011-2016, subcontractor): Discovering and
Explaining Technical Emergence through Analysis of the Language and Structure
of Scientific Publications
- NIH:
- NIH/NLM K99/R00 Pathway to Independence Award 1K99LM010227-0110 (2009-2014,
Consultant, PI: Imre Solti): Increasing Clinical Trial Enrollment: A
Semi-Automated Patient Centered Approach
- Microsoft:
- Research Grant (2011-2013, collaborator, PI: Meliha Yetisgen-Yildiz):
Extracting Critical Illness Phenotypes from Electronic Medical Records
- UW:
- RRF grant (2007-2009, PI): Towards automatic enrichment and analysis of
linguistic data for threatened and endangered languages
- Student Technology Fee (STF) grant (2006, faculty lead): Natural Language
Computing Cluster
Last modified on 9/18/2011.