Research
Overview
My research area is computational linguistics. My previous work includes machine translation, grammar extraction, grammar generation, and methodologies for creating large annotated corpora. One of my current research activities is to create a framework for rapidly developing resources for resource-poor languages, and then to use these automatically created resources in cross-lingual studies of a large number of languages to discover linguistic knowledge. I am also interested in using NLP techniques to help people retrieve and automatically analyze various kinds of information, such as linguistic data, electronic medical records, and scientific abstracts. In addition, I am collaborating with colleagues to create a large-scale, multi-layer, multi-representational treebank for Hindi/Urdu. My research is supported by several NSF grants and a UW RRF grant.
Current projects
Grants
- NSF CAREER grant BCS-0748919 (2008-2013): Information Engineering and Synthesis for Resource-poor Languages
- NSF CNS-0751213 (2008-2011): A Multi-Representational and Multi-Layered Treebank for Hindi/Urdu
- NSF BCS-0720670 (2007-2009): Implementing the GOLD Community of Practice
- NSF CNS-0708719 (2007-2008): General Techniques for Creating Treebanks with Multiple Representation
- UW RRF grant (2007-2009): Towards automatic enrichment and analysis of linguistic data for threatened and endangered languages