Fei Xia - Selected Publications (by research area)

Selected Publications (by research area)

1. Bridging NLP and Linguistics (The RiPLes project)

Developing ODIN:

William Lewis and Fei Xia, 2010. Developing ODIN: A Multilingual Repository of Annotated Language Data for Hundreds of the World's Languages, Journal of Literary and Linguistic Computing (LLC), 25(3):303-319. [pdf]
Fei Xia, Carrie Lewis and William D. Lewis, 2010. The Problems of Language Identification within Hugely Multilingual Data Sets, Proceedings of the 7th International Conference on Language Resources and Evaluation (LREC 2010), pages 2790-2797, Valletta, Malta, May 19-21, 2010. [pdf]
Fei Xia, William Lewis and Hoifung Poon, 2009. Language ID in the Context of Harvesting Language Data off the Web, Proceedings of the 12th Conference of the European Chapter of the ACL (EACL-2009), pages 870-878, Athens, Greece, March 30 - April 3, 2009. [pdf]
Fei Xia and William Lewis, 2008. Repurposing Theoretical Linguistic Data for Tool Development and Search, Proceedings of the Third International Joint Conference on Natural Language Processing (IJCNLP-2008), pages 529-536, Hyderabad, India, Jan 7-12, 2008. [pdf]

Building language profiles and comparing languages:

Emily M. Bender, Michael Wayne Goodman, Joshua Crowgey, and Fei Xia, 2013. Towards Creating Precision Grammars from Interlinear Glossed Text: Inferring Large-scale Typological Properties, in Proceedings of the 7th Workshop on Language Technology for Cultural Heritage, Social Sciences, and Humanities (LaTeCH 2013), in conjunction with ACL 2013, Sofia, Bulgaria. [pdf]
Ryan Georgi, Fei Xia, and Will Lewis, 2010. Comparing Language Similarity across Genetic and Typologically-Based Groupings, Proceedings of the 23rd International Conference on Computational Linguistics (COLING 2010), pages 385-393, Beijing, China, August 23-27, 2010. [pdf]
Fei Xia and William Lewis, 2009. Applying NLP Technologies to the Collection and Enrichment of Language Data on the Web to Aid Linguistic Research, Proceedings of the EACL 2009 Workshop on Language Technology and Resources for Cultural Heritage, Social Sciences, Humanities, and Education (LaTeCH-SHELT&R 2009), pages 51-59, Athens, Greece, 30 March 2009. [pdf]
William Lewis and Fei Xia, 2008. Automatically Identifying Computationally Relevant Typological Features, Proceedings of the Third International Joint Conference on Natural Language Processing (IJCNLP-2008), pages 685-690, Hyderabad, India, Jan 7-12, 2008. [pdf]

Structural projection and improving parsing performance:

Ryan Georgi, Fei Xia, and William D. Lewis, 2015. "Enriching Interlinear Text using Automatically Constructed Annotators", in Proceedings of the 9th SIGHUM Workshop on Language Technology for Cultural Heritage, Social Sciences, and Humanities (LaTeCH-2015), in conjunction with ACL 2015, July 30, Beijing, China. [pdf]
Ryan Georgi, Fei Xia, and William D. Lewis, 2014. Capturing Divergence in Dependency Trees to Improve Syntactic Projection, Journal of Language Resources and Evaluation (LRE), 48(4), pp 709-739. [eprint]
Fei Xia, William Lewis, Michael Wayne Goodman, Joshua Crowgey and Emily M. Bender, 2014. Enriching ODIN, in Proceedings of LREC 2014, Reykjavik, Iceland. [pdf]
Xuezhe Ma and Fei Xia, 2014. Unsupervised Dependency Parsing with Transferring Distribution via Parallel Guidance and Entropy Regularization, in Proceedings of ACL-2014, Baltimore, MD. [pdf]
Ryan Georgi, Fei Xia, and William D. Lewis, 2013. Enhanced and Portable Dependency Projection Algorithms Using Interlinear Glossed Text, short paper, In Proceedings of ACL, Sofia, Bulgaria, Aug 2013. [pdf]
Ryan Georgi, Fei Xia, and William D. Lewis, 2012. Improving Dependency Parsing with Interlinear Glossed Text and Syntactic Projection, short paper, In Proceedings of COLING. Mumbai, India, Dec 2012. [pdf]
Ryan Georgi, Fei Xia, and William D. Lewis. 2012. Measuring the Divergence of Dependency Structures Cross-Linguistically to Improve Syntactic Projection Algorithms, In Proceedings of LREC, Istanbul, Turkey, May 22-25, 2012. [pdf]
Fei Xia and William Lewis, 2007. Multilingual Structural Projection across Interlinearized Text, Proceedings of NAACL HLT 2007, pages 452-459, Rochester, NY, April 22-27, 2007. [pdf]

Tools and packages:

Ryan Georgi, Michael Wayne Goodman, and Fei Xia, 2016. "A Web-framework for ODIN Annotation", in Proceedings of ACL-2016 System Demonstrations, pp 31-36, Aug 7-10, Berlin, Germany. [pdf]
Fei Xia, William D. Lewis, Michael W. Goodman, Glenn Slayden, Ryan Georgi, Joshua Crowgey, and Emily Bender, 2016. "Enriching a Massively Multilingual Database of Interlinear Glossed Text", Journal of Language Resources and Evaluation (LRE), 50(2): 321-349. [eprint]
Michael Wayne Goodman, Joshua Crowgey, Fei Xia, and Emily M. Bender, 2015. "Xigt: Extensible Interlinear Glossed Text for Natural Language Processing", Journal of Language Resources and Evaluation (LRE), 49(2), pp 455-485. [eprint]

2. Treebank development

Conversion from dependency structure to phrase structure:

Rajesh Bhatt, Owen Rambow, and Fei Xia, 2012. Creating a Tree Adjoining Grammar from a Multilayer Treebank, in Proceedings of the 11th International Workshop on Tree Adjoining Grammars and Related Formalisms (TAG+11), pages 162-170, Paris, France, September 2012. [pdf]
Rajesh Bhatt and Fei Xia, 2012. Challenges in Converting between Treebanks: a Case Study from the HUTB, in Proceedings of META-RESEARCH Workshop on Advanced Treebanking, in conjunction with LREC-2012, Istanbul, Turkey. [pdf]
Rajesh Bhatt, Owen Rambow, and Fei Xia, 2011. Linguistic Phenomena, Analyses, and Representations: Understanding Conversion between Treebanks, In the Proc. of the IJCNLP, Chiang Mai, Thailand, Nov 9-13, 2011. [pdf]
Fei Xia, Owen Rambow, Rajesh Bhatt, Martha Palmer, and Dipti Misra Sharma, 2009. Towards a Multi-Representational Treebank, the 7th International Workshop on Treebanks and Linguistic Theories (TLT 2009), pages 159-170, Groningen, Netherlands, Jan 23-24, 2009. [pdf]
Fei Xia and Martha Palmer, 2001. Converting Dependency Structures to Phrase Structures, Proceedings of the 1st Human Language Technology Conference (HLT-2001), San Diego, Mar 18-21, 2001. [pdf]

The Hindi/Urdu Treebank Project:

Riyaz Ahmad Bhat, Rajesh Bhatt, Annahita Farudi, Prescott Klassen, Bhuvana Narasimhan, Martha Palmer, Owen Rambow, Dipti Misra Sharma, Ashwini Vaidya, Sri Ramagurumurthy Vishnu, and Fei Xia, 2014. The Hindi/Urdu Treebank Project, to appear in the Handbook of Linguistic Annotation (edited by Nancy Ide and James Pustejovsky), Springer Press.
Archna Bhatia, Rajesh Bhatt, Bhuvana Narasimhan, Martha Palmer, Owen Rambow, Dipti Misra Sharma, Michael Tepper, Ashwini Vaidya, Fei Xia, 2010. Empty Categories in a Hindi Treebank, Proceedings of the 7th International Conference on Language Resources and Evaluation (LREC 2010), pages 1863-1870, Valletta, Malta, May 19-21, 2010. [pdf]
Martha Palmer, Rajesh Bhatt, Bhuvana Narasimhan, Owen Rambow, Dipti Misra Sharma, and Fei Xia, 2009. Hindi Syntax: Annotating Dependency, Lexical Predicate-Argument Structure, and Phrase Structure, Proceedings of the 7th International Conference on Natural Language Processing (ICON-2009), pages 259-268, Hyderabad, India, Dec 14-17, 2009. [pdf]
Rajesh Bhatt, Bhuvana Narasimhan, Martha Palmer, Owen Rambow, Dipti Misra Sharma, and Fei Xia, 2009. A Multi-Representational and Multi-Layered Treebank for Hindi/Urdu, Proceedings of the Third Linguistic Annotation Workshop (LAW 2009), ACL-IJCNLP 2009, pages 186-189, Singapore, 6-7 August 2009. [pdf]

The Chinese Penn Treebank Project:

Nianwen Xue, Fei Xia, Fu-dong Chiou, and Martha Palmer, 2005. The Penn Chinese Treebank: Phrase Structure Annotation of a Large Corpus, Journal of Natural Language Engineering, 11(2): 207-238, 2005. Cambridge University Press. [pdf]
Fei Xia, Martha Palmer, Nianwen Xue, Mary Ellen Okurowski, John Kovarik, Fu-Dong Chiou, Shizhe Huang, Tony Kroch, and Mitch Marcus, 2000. Developing Guidelines and Ensuring Consistency for Chinese Text Annotation, Proceedings of the 2nd International Conference on Language Resources and Evaluation (LREC-2000), Athens, Greece, May 31 - June 2, 2000. [pdf]
Fei Xia, 2000. The Segmentation Guidelines for the Penn Chinese Treebank (3.0), IRCS Report 00-06, University of Pennsylvania, Oct 2000. [pdf]
Fei Xia, 2000. The Part-of-Speech Guidelines for the Penn Chinese Treebank (3.0), IRCS Report 00-07, University of Pennsylvania, Oct 2000. [pdf]
Nianwen Xue, Fei Xia, Shizhe Huang, and Anthony Kroch, 2000. The Bracketing Guidelines for the Penn Chinese Treebank (3.0), IRCS Report 00-08, University of Pennsylvania, Oct 2000. [pdf]

3. Bio-NLP

Phenotype detection:

Cosmin Adrian Bejan, Lucy Vanderwende, Fei Xia, and Meliha Yetisgen-Yildiz, 2013. Assertion modeling and its role in clinical phenotype identification, Journal of Biomedical Informatics, 46(1):68-74. [pdf]
Michael Tepper, Heather L. Evans, Fei Xia, Meliha Yetisgen-Yildiz. 2013. Modeling Annotator Rationales with Application to Pneumonia Classification, in Proceedings of the 2013 AAAI workshop on Expanding the Boundaries of Health Informatics Using Artificial Intelligence (HIAI 2013), July 15, Bellevue, WA. [pdf]
Cosmin Adrian Bejan, Fei Xia, Lucy Vanderwende, Mark M. Wurfel, and Meliha Yetisgen-Yildiz, 2012. Pneumonia identification using statistical feature selection, Journal of the American Medical Informatics Association (JAMIA), 19(5): 817-823. [pdf]
Meliha Yetisgen-Yildiz, Bradford Glavan, Fei Xia, Lucy Vanderwende, and Mark Wurfel, 2011. Identifying Patients with Pneumonia from Free-Text Intensive Care Unit Reports. In Proc. of the ICML workshop on Learning from Unstructured Clinical Text, Bellevue, WA, July 2, 2011. [pdf]
Imre Solti, Colin Cooke, Fei Xia, and Mark Wurfel, 2009. Automated classification of radiology reports for acute lung injury: Comparison of keyword and machine learning based natural language processing approaches, Proceedings of the IEEE International Conference on Bioinformatics and Biomedicine Workshop (BIBM-2009), pages 314-319, Washington DC, November 1-4, 2009. [pdf]

Detecting critical recommendations:

Meliha Yetisgen-Yildiz, Martin Gunn, Fei Xia, and Tom Payne, 2013. Text Processing Pipeline to Extract Recommendations from Radiology Reports, Journal of Biomedical Informatics (JBI), 46(2):354-362. [pdf]
Meliha Yetisgen-Yildiz, Martin Gunn, Fei Xia, and Tom Payne, 2011. Automatic Identification of Critical Follow-Up Recommendation Sentences in Radiology Reports. In Proc. of the AMIA 2011 Annual Symposium, Washington DC, Oct 22-26, 2011. [pdf]

Clinical corpus annotation:

Prescott Klassen, Fei Xia, and Meliha Yetisgen, 2016. "Annotating and Detecting Medical Events in Clinical Notes", in Proceedings of the 10th Language Resources and Evaluation Conference (LREC 2016), May 23-28, Portoroz, Slovenia. [pdf]
Prescott Klassen, Fei Xia, Lucy Vanderwende and Meliha Yetisgen, 2014. Annotating Clinical Events in Text Snippets for Phenotype Detection, in Proceedings of LREC 2014, Reykjavik, Iceland. [pdf]
Lucy Vanderwende, Fei Xia, and Meliha Yetisgen-Yildiz, 2013. Annotating Change of State for Clinical Events, in Proceedings of the 1st Workshop on Events: Definition, Detection, Coreference, and Representation, in conjunction with NAACL-2013, Atlanta, GA. [pdf]
Fei Xia and Meliha Yetisgen-Yildiz, 2012. Clinical corpus annotation: challenges and strategies, in Proceedings of the third Workshop on Building and Evaluating Resources for Biomedical Text Mining, in conjunction with LREC-2012, Istanbul, Turkey. [pdf]
Ozlem Uzuner, Imre Solti, Fei Xia, and Eithon Cadag, 2010. Community Annotation Experiment for Ground Truth Generation for the i2b2 Medication Challenge, Journal of the American Medical Informatics Association (JAMIA), 17:519-523. [pdf]
Meliha Yetisgen-Yildiz, Imre Solti, Fei Xia, and Scott Halgrim, 2010. Preliminary Experiments with Amazon's Mechanical Turk for Annotating Medical Named Entities, Proceedings of the NAACL HLT 2010 Workshop on Creating Speech and Language Data with Amazon's Mechanical Turk, pages 180-183, Los Angeles, June 2010. [pdf]

Extracting medication information (the 2009 i2b2 challenge):

Scott Halgrim, Fei Xia, Imre Solti, Eithon Cadag, Ozlem Uzuner, 2011. A cascade of MaxEnt classifiers applied to extracting medication information from discharge summaries, Journal of Biomedical Semantics 2011, 2 (Suppl 3):S2. [pdf]
Fei Xia, Imre Solti, and Ozlem Uzuner, 2009. UW Internal Annotation Guidelines for the 2009 i2b2 Challenge and UW Medication IE System, Manuscript. [manuscript]
Ozlem Uzuner, Imre Solti, and Fei Xia, 2009. The i2b2 Medication Extraction Challenge Preliminary Annotation Guidelines, Manuscript. [manuscript]
Ozlem Uzuner, Imre Solti, and Fei Xia, 2009. The i2b2 Medication Extraction Challenge Evaluation Metrics, Manuscript. [manuscript]

Other Bio-NLP topics:

Louise Deleger, Katalin Molnar, Guergana Savova, Fei Xia, Todd Lingren, Qi Li, Keith Marsolo, Anil G. Jegga, Megan Kaiser, Laura Stoutenborough, and Imre Solti, 2013. Large Scale Evaluation of Automated Clinical Note De-identification and its Impact on Information Extraction. Journal of the American Medical Informatics Association (JAMIA), 20(1): 84-94. [pdf]
Michael Tepper, Daniel Capurro, Fei Xia, Lucy Vanderwende, and Meliha Yetisgen-Yildiz, 2012. Statistical Section Segmentation in Free-Text Clinical Records. In the Proceedings of the LREC, Istanbul, Turkey, May 22-25, 2012. [pdf]
Cuijun Wu, Fei Xia, Louise Deleger, and Imre Solti, 2011. Statistical Machine Translation for Biomedical Text: Are We There Yet? In the Proc. of the AMIA 2011 Annual Symposium, Washington DC, Oct 22-26, 2011. [pdf]

4. Domain Adaptation

Yan Song and Fei Xia, 2014. Enhancing Archaic Language Processing via the Properties shared in Modern and Archaic Corpora: A Case Study on Chinese, in Proceedings of LREC 2014, Reykjavik, Iceland. [pdf]
Yan Song and Fei Xia, 2013. A Common Case of Jekyll and Hyde: the Synergistic Effect of Using Divided Source Training Data for Feature Augmentation, in Proceedings of IJCNLP, Oct 14-18. Nagoya, Japan. [pdf]
Xuezhe Ma and Fei Xia, 2013. Dependency Parser Adaptation with Subtrees from Auto-Parsed Target Domain Data, short paper, In Proceedings of ACL, Sofia, Bulgaria, Aug 2013. [pdf]
Dong Wang and Fei Xia, 2012. Effort of Genre Variation and Prediction of System Performance, In Proceedings of LREC, Istanbul, Turkey, May 22-25, 2012. [pdf]
Yan Song and Fei Xia, 2012. Using a Goodness Measurement for Domain Adaptation: A Case Study on Chinese Word Segmentation, In Proceedings of LREC, Istanbul, Turkey, May 22-25, 2012. [pdf]

5. Chinese NLP

POS tagging:

Alex Cheng, Fei Xia, and Jianfeng Gao, 2010. A comparison of unsupervised methods for Part of Speech Tagging in Chinese, Proceedings of the 23rd International Conference on Computational Linguistics (COLING 2010), Poster Volume, pages 135-143, Beijing, China, August 23-27, 2010. [pdf]

The Chinese Penn Treebank Project (see the "Treebank development" section)

Others:

Kam Tang Lau, Yan Song, and Fei Xia, 2013. The Construction of a Segmented and Part-of-speech Tagged Archaic Chinese Corpus: A Case Study on Huainanzi (in Chinese), in Proceedings of the 12th China National Conference on Computational Linguistics (CNCCL 2013), Oct 10-12, Suzhou, China. [pdf]

6. Machine Translation

Statistical MT:

Fei Xia and Michael McCord, 2004. Improving a Statistical MT System with Automatically Learned Rewrite Patterns, the 20th International Conference on Computational Linguistics (COLING 2004), Geneva, Switzerland, Aug 22-29, 2004. [pdf]
Christoph Tillmann and Fei Xia, 2003. A Phrase-Based Unigram Model for Statistical Machine Translation, Proceedings of the 3rd Human Language Technology Conference (HLT/NAACL 2003), Edmonton, Canada, May 27 - June 2, 2003. [pdf]

Transfer-based MT:

Hiyan Alshawi, Adam Buchsbaum, and Fei Xia, 1997. A Comparison of Head Transducers and Transfer for a Limited Domain Translation, Proceedings of the 35th Annual Meeting of the Association for Computational Linguistics (ACL-1997), pages 360-365, Madrid, Spain, July 7-11, 1997. [pdf]

7. Tree Adjoining Grammar

Grammar Extraction (LexTract):

Fei Xia and Martha Palmer, 2010. From Treebank to Tree-Adjoining Grammar, In Supertagging: Using Complex Lexical Descriptions in Natural Language Processing, edited by Srinivas Bangalore and Aravind K. Joshi, pages 35-72, MIT Press, 2010. [pdf]
Fei Xia, Chung-hye Han, Martha Palmer and Aravind Joshi, 2001. Automatically Extracting and Comparing Lexicalized Grammars for Different Languages, Proceedings of the 17th International Joint Conference on Artificial Intelligence (IJCAI-2001), pages 1321-1326, Seattle, Aug 4-10, 2001. [pdf]
Fei Xia, Martha Palmer, and Aravind Joshi, 2000. A Uniform Method of Grammar Extraction and Its Applications, Proceedings of the Joint SIGDAT Conference on Empirical Methods in Natural Language Processing and Very Large Corpora (EMNLP/VLC-2000), pages 53-62, Hong Kong, Oct 7-8, 2000. [pdf]
Fei Xia and Martha Palmer, 2000. Evaluating the Coverage of LTAGs on Annotated Corpora, Proceedings of the Workshop on Using Evaluation within HLT Programs: Results and Trends, Athens, Greece, May 30, 2000. [pdf]
Fei Xia and Tonia Bleam, 2000. A Corpus-based Evaluation of Syntactic Locality in TAGs, Proceedings of the 5th International Workshop on Tree Adjoining Grammar and Related Formalisms (TAG+ 2000), pages 215-220, Paris, France, May 25-27, 2000. [pdf]
Fei Xia and Martha Palmer, 2000. Comparing and Integrating Tree Adjoining Grammars, Proceedings of the 5th International Workshop on Tree Adjoining Grammar and Related Formalisms (TAG+ 2000), pages 265-268, Paris, France, May 25-27, 2000. [pdf]
Fei Xia, 1999. Extracting Tree Adjoining Grammars from Bracketed Corpora, Proceedings of the 5th Natural Language Processing Pacific Rim Symposium (NLPRS-99), pages 398-403, Beijing, China, Nov. 1999. [pdf]

Grammar Generation (LexOrg):

Fei Xia, Martha Palmer, and K. Vijay-Shanker, 2010. Developing Tree-Adjoining Grammars with Lexical Descriptions, in Supertagging: Using Complex Lexical Descriptions in Natural Language Processing, edited by Srinivas Bangalore and Aravind K. Joshi, pages 73-110, MIT Press, 2010. [pdf]
Fei Xia, Martha Palmer and K. Vijay-Shanker, 2005. Automatically Generating Tree Adjoining Grammars from Abstract Specifications, Journal of Computational Intelligence, 21(3), 246-287, 2005. [pdf]
Fei Xia, Martha Palmer, and K. Vijay-Shanker, 1999. Towards Semi-automating Grammar Development, Proceedings of the 5th Natural Language Processing Pacific Rim Symposium (NLPRS-99), pages 96-101, Beijing, China, Nov. 1999. [pdf]
Fei Xia, Martha Palmer, K. Vijay-Shanker and Joseph Rosenzweig, 1998. Consistent Grammar Development Using Partial-Tree Descriptions for LTAGs, Proceedings of the 4th International Workshop on Tree Adjoining Grammar and Related Formalisms (TAG+ 1998), pages 180-183, Philadelphia, Aug 1-3, 1998. [pdf]

Other Topics on LTAG:

Anoop Sarkar, Fei Xia, and Aravind Joshi, 2000. Some Experiments on Indicators of Parsing Complexity for Lexicalized Grammars, In Proceedings of Efficiency in Large-Scale Parsing Systems Workshop, Luxembourg, Aug 5, 2000. [pdf]
Christy Doran, Beth Ann Hockey, Anoop Sarkar, B. Srinivas and Fei Xia, 2000. Evolution of the XTAG System, in Tree Adjoining Grammars: Formalisms, Linguistic Analysis and Processing, a CSLI volume edited by Anne Abeille and Owen Rambow, pages 371-404, 2000. [pdf]

8. Other topics

Morphological induction:

Michael Tepper and Fei Xia, 2010. Inducing Morphemes Using Light Knowledge, Journal of ACM Transactions on Asian Language Information Processing (TALIP), 9(3): 1-38, 2010. [pdf]

Social media:

Kelly Peterson, Matt Hohensee, and Fei Xia, 2011. Email Formality in the Workplace: A Case Study on the Enron Corpus, In Proceedings of the 2011 ACL Workshop on Language in Social Media (LSM 2011), Portland, Oregon, June 23, 2011. [pdf] [package]

Teaching CL:

Emily Bender, Fei Xia, and Erik Bansleben, 2008. Building a flexible, collaborative, intensive master's program in computational linguistics, Proceedings of the Third Workshop on Issues in Teaching Computational Linguistics (TeachCL-2008), pages 10-18, Columbus, Ohio, June 19-20, 2008. [pdf]

Last modified on 10/09/2016.