Treebank Workshop

April 26, 2007

Rochester, NY

(1) 8:30-8:40: Introduction

* Martha Palmer, Fei Xia, Owen Rambow

Hypothesis: given a sufficiently rich phrase structure (PS) or dependency structure (DS) representation, we can automatically convert it to the other type of representation. Which representation to choose is therefore an empirical question.

Here is a list of features that may be worth including in a treebank.

(2) 8:40-8:55: The Penn Treebank: Lessons Learned/Current Methodology

* Ann Bies: summary

(3) 8:55-9:45: Treebanks as a Foundation for Semantics/Pragmatics (Martha)

5 min each + 3 min Q&A + 18 min discussion; total 50 min

Q1: What semantics is annotated?

Q2: What information was missing in the underlying treebank?

Q3: What information was there but represented badly?

Q4: What methodology is appropriate for semantic annotation? How should quality control be done?

Q5: What are the advantages of a phrase-structure and/or a dependency treebank for semantic annotation?

* PropBank/NomBank/OntoNotes: Martha Palmer

* FrameNet: Katrin Erk

* TR (tectogrammatical representation): Jan Hajic

* Discourse Treebank: Rashmi Prasad

(4) 9:45-10:15: BREAK

(5) 10:15-11:30: Grammar Formalisms and Transformations Between Formalisms (Fei)

7 min each + 3 min Q&A + 25 min discussion; total 75 min

Q1: What approach did you take to grammar extraction?

Q2: What information was missing in the source treebank with respect to grammar extraction? What had to be done by hand? Refer to the list of features if possible.

Q3: What methodological lessons can be drawn? How is quality control done?

Q4: What are the advantages of a phrase-structure and/or a dependency treebank for grammar extraction?

Q5: What are the pros and cons of building a treebank for grammars in a particular formalism vs. building a general-purpose treebank and extracting grammars from it?

* CCG: Julia Hockenmaier

* LFG: Josef van Genabith

* HPSG: Dan Flickinger

* LTAG: Fei Xia

* Dependency: Fei Xia/Owen Rambow

(6) 11:30-12:30: Treebanks as Training data for Parsers (Owen)

7 min each + 3 min Q&A + 30 min discussion; total 60 min

Q1: What do you really care about when you're building a parser?

Q2: What works, what doesn't?

Q3: What information (e.g., function tags, empty categories, coindexation) is useful, and what is not? Do you prefer a more refined tagset for parsing? Would marking the argument/adjunct distinction be useful? What about subcategorization frames?

Q4: How does grammar writing interact with treebanking?

Q5: What methodological lessons can be drawn for treebanking? How is quality control done?

Q6: What are the advantages and disadvantages of pre-processing the data to be treebanked with an automatic parser?

Q7: What are the advantages of a phrase-structure and/or a dependency treebank for parsing?

* Chris Manning

* Jan Hajic

* Joakim Nivre

(7) 12:30-13:30: LUNCH

(8) 13:30-15:00: Language-Specific Issues (Owen)

5 min each + 5 min Q&A + 20 min discussion; total 90 min

Pick one or two interesting constructions, discuss the representational problems they pose, and show possible dependency vs. phrase-structure representations.

Q1: What are the advantages of a phrase-structure and/or a dependency treebank for this particular language?

* German: Sandra Kuebler

* Chinese: Bert Xue

* Hindi: Dipti Sharma

* Persian: Pollet Samvelian

* Czech: Jan Hajic

* Turkish: Kemal Oflazer

* Korean: Chung-Hye Han

(9) 15:00-15:30: BREAK

(10) 15:30-16:30: General Discussion

Q1: Is the hypothesis correct? Do you have counterexamples to it? What do you see as the major obstacles to PS↔DS conversion?

Q2: What information should be included in a treebank?

Q3: What are the advantages of PS and/or DS for a particular task (parsing, grammar extraction, semantics) or for a particular language?

Q4: What is the relation between grammars and treebanks?

Q5: Which language should the next treebank effort be aimed at?