Treebank Workshop

April 26, 2007

Rochester, NY




(1) 8:30-8:40: Introduction


* Martha Palmer, Fei Xia, Owen Rambow


Hypothesis: given a rich enough phrase structure/dependency structure, we can convert to the other type of representation automatically.  Which representation to choose is an empirical issue. 
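
As a concrete illustration of one direction of the hypothesis, here is a minimal sketch of phrase structure to dependency conversion. It assumes the head child of every phrase is already marked, which is exactly the kind of "rich enough" information the hypothesis depends on. The tuple layout, the `head_index` field, and the function name `to_dependencies` are hypothetical choices for this example, not part of any treebank format. Words serve as node identifiers, so the toy sentence must not repeat a word.

```python
# Toy phrase-structure tree: (label, head_child_index, children).
# A leaf is a plain word string. The explicit head index is an
# assumption of this sketch; real treebanks usually need head rules
# or richer annotation to recover it.
tree = ("S", 1, [
    ("NP", 0, ["John"]),
    ("VP", 0, ["saw", ("NP", 0, ["Mary"])]),
])

def to_dependencies(node, arcs):
    """Return the lexical head of node, appending (head, dependent) arcs."""
    if isinstance(node, str):
        return node  # a word is its own head
    _label, head_index, children = node
    # Find the lexical head of every child, collecting arcs below it.
    heads = [to_dependencies(child, arcs) for child in children]
    head = heads[head_index]
    # Every non-head child's lexical head depends on the phrase's head.
    for i, dependent in enumerate(heads):
        if i != head_index:
            arcs.append((head, dependent))
    return head

arcs = []
root = to_dependencies(tree, arcs)
print(root, arcs)
```

The reverse direction (dependency to phrase structure) needs additional information, such as phrase labels and projection levels, which this sketch does not attempt.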


Here is a list of features that might be good to include in a treebank.



(2) 8:40-8:55: The Penn Treebank: Lessons Learned/Current Methodology


* Ann Bies: summary



(3) 8:55-9:45: Treebanks as a Foundation for Semantics/Pragmatics (Martha)


5 mins each + 3 QA + 18 minutes discussion; total 50 minutes


Q1: What semantics is annotated?

Q2: What information was missing in the underlying treebank?

Q3: What information was there but represented badly?

Q4: What methodology is appropriate for semantic annotation?  How should quality control be done?

Q5: What are the advantages of a phrase-structure and/or a dependency treebank for semantic annotation?


* PropBank/NomBank/OntoNotes: Martha Palmer

* FrameNet: Katrin Erk

* TR: Jan Hajic

* Discourse Treebank: Rashmi Prasad



(4) 9:45-10:15: BREAK


(5) 10:15-11:30: Grammar Formalisms and Transformations Between Formalisms (Fei)


7 mins each + 3 QA + 25 minutes discussion; total 75 minutes

Q1: What did they do for grammar extraction?

Q2: What info was missing in the source treebank w.r.t. grammar extraction?  What had to be done by hand?  Refer to list of features if possible.

Q3: What methodological lessons can be drawn?  How is quality control done?

Q4: What are the advantages of a phrase-structure and/or a dependency treebank for grammatical extraction?

Q5: What are the pros and cons of building a treebank for grammars in a particular formalism vs. building a general-purpose treebank and extracting grammars from it?



* CCG: Julia Hockenmaier

* LFG: Josef van Genabith

* HPSG: Dan Flickinger

* LTAG: Fei Xia

* Dependency: Fei Xia/Owen Rambow





(6) 11:30-12:30: Treebanks as Training data for Parsers (Owen)


7 mins each + 3 QA + 30 minutes discussion; total 60 minutes


Q1: What do you really care about when you're building a parser?

Q2: What works, what doesn't?

Q3: What info (e.g., function tags, empty categories, coindexation) is useful, what is not?  Do you prefer a more refined tagset for parsing?  Will marking the adjunct/argument distinction be useful?  What about subcats?

Q4: How does grammar writing interact with treebanking?

Q5: What methodological lessons can be drawn for treebanking?  How is quality control done?

Q6: What are the advantages and disadvantages of pre-processing the data to be treebanked with an automatic parser?

Q7: What are the advantages of a phrase-structure and/or a dependency treebank for parsing?


* Chris Manning

* Jan Hajic

* Joakim Nivre



(7) 12:30-13:30: LUNCH


(8) 13:30-15:00: Language-Specific Issues (Owen)


5 mins each + 5 QA + 20 minutes discussion; total 90 minutes


Pick one or two interesting constructions, discuss the representational problem, and show possible dependency vs. phrase-structure representations.


Q1: What are the advantages of a phrase-structure and/or a dependency treebank for this particular language?


* German: Sandra Kuebler

* Chinese: Bert Xue

* Hindi: Dipti Sharma

* Persian: Pollet Samvelian

* Czech: Jan Hajic

* Turkish: Kemal Oflazer

* Korean: Chung-Hye Han


(9) 15:00-15:30: BREAK


(10) 15:30-16:30: General Discussion


Q1: Is the hypothesis correct? Do you have counter-examples to the hypothesis? What do you see as major obstacles for conversion between phrase structure (PS) and dependency structure (DS)?

Q2: What information should be included in a treebank?

Q3: What are the advantages of PS and/or DS for a particular task (parsing, grammar extraction, semantics) or for a particular language?

Q4: What is the relation between grammars and treebanks?

Q5: Which language should the next treebank effort be aimed at?