Treebank Workshop
April 26, 2007
(1) 8:30-8:40: Introduction
* Martha Palmer, Fei Xia, Owen Rambow
Hypothesis: given a rich enough phrase structure/dependency structure, we can convert to the other type of representation automatically. Which representation to choose is an empirical issue.
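As a rough illustration of one direction of the hypothesized conversion (phrase structure to dependency structure), the sketch below percolates head words up a bracketed tree and attaches each non-head child to the head. The head table, the tuple encoding of trees, and the function names are all assumptions made for this example, not a published head-rule set or the method of any particular treebank. It also assumes every word in the sentence is distinct; a real converter would track token positions.

```python
# Toy head table (an assumption for illustration only): for each phrase
# label, the child labels to look for, in priority order.
HEAD_RULES = {
    "S": ["VP", "NP"],
    "VP": ["VBD", "VB", "VP"],
    "NP": ["NN", "NNS", "NP"],
}

def is_leaf(node):
    # A leaf is (POS, word): exactly two items, the second a string.
    return len(node) == 2 and isinstance(node[1], str)

def head_child_index(label, children):
    # Pick the first child whose label matches the highest-priority rule.
    for wanted in HEAD_RULES.get(label, []):
        for i, child in enumerate(children):
            if child[0] == wanted:
                return i
    return len(children) - 1  # fallback: rightmost child

def to_dependencies(node, deps):
    """Return the head word of node; append (dependent, head) pairs to deps."""
    if is_leaf(node):
        return node[1]
    label, children = node[0], node[1:]
    heads = [to_dependencies(child, deps) for child in children]
    h = head_child_index(label, children)
    for i, word in enumerate(heads):
        if i != h:
            deps.append((word, heads[h]))
    return heads[h]

tree = ("S",
        ("NP", ("DT", "the"), ("NN", "dog")),
        ("VP", ("VBD", "chased"),
               ("NP", ("DT", "a"), ("NN", "cat"))))

deps = []
root = to_dependencies(tree, deps)
print(root)  # chased
print(deps)  # [('the', 'dog'), ('a', 'cat'), ('cat', 'chased'), ('dog', 'chased')]
```

The output is unlabeled: recovering relation labels such as subject or object would need information like function tags in the source tree, which is the kind of richness the hypothesis presupposes.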
Here is a list of features that might be good to include in a treebank.
(2) 8:40-8:55: The Penn Treebank: Lessons Learned/Current Methodology
(3) 8:55-9:45: Treebanks as a Foundation for Semantics/Pragmatics (Martha)
5 mins each + 3 QA + 18 minutes discussion; total 50 minutes
Q1: What semantics is annotated?
Q2: What information was missing in the underlying treebank?
Q3: What information was there but represented badly?
Q4: What methodology is appropriate for semantic annotation? How should quality control be done?
Q5: What are the advantages of a phrase-structure and/or a dependency treebank for semantic annotation?
* PropBank/NomBank/OntoNotes: Martha Palmer
* Discourse Treebank: Rashmi Prasad
(4) 9:45-10:15: BREAK
(5) 10:15-11:30: Grammar Formalisms and Transformations Between Formalisms (Fei)
7 mins each + 3 QA, + 25 minutes discussion; total 75 mins
Q1: What did they do for grammar extraction?
Q2: What info was missing in the source treebank w.r.t. grammar extraction? What had to be done by hand? Refer to the list of features if possible.
Q3: What methodological lessons can be drawn? How is quality control done?
Q4: What are the advantages of a phrase-structure and/or a dependency treebank for grammatical extraction?
Q5: What are the pros and cons of building a treebank for grammars in a particular formalism vs. building a general-purpose treebank and extracting grammars from it?
* Dependency: Fei Xia/Owen Rambow
(6) 11:30-12:30: Treebanks as Training Data for Parsers (Owen)
7 mins each + 3 QA, + 30 minutes discussion; total 60 mins
Q1: What do you really care about when you're building a parser?
Q2: What works, what doesn't?
Q3: What info (e.g., function tags, empty categories, coindexation) is useful, what is not? Do you prefer a more refined tagset for parsing? Will marking adjunct/argument distinction be useful? What about subcats?
Q4: How does grammar writing interact with treebanking?
Q5: What methodological lessons can be drawn for treebanking? How is quality control done?
Q6: What are the advantages and disadvantages of pre-processing the data to be treebanked with an automatic parser?
Q7: What are the advantages of a phrase-structure and/or a dependency treebank for parsing?
(7) 12:30-13:30: LUNCH
(8) 13:30-15:00: Language-Specific Issues (Owen)
5 mins each + 5 QA, + 20 minutes discussion; total 90 mins
Pick one or two interesting constructions, discuss the representational problem, and show possible dependency vs. phrase-structure representations.
Q1: What are the advantages of a phrase-structure and/or a dependency treebank for this particular language?
(9) 15:00-15:30: BREAK
(10) 15:30-16:30: General Discussion
Q1: Is the hypothesis correct? Do you have counter-examples to the hypothesis? What do you see as major obstacles for the PS ↔ DS conversion?
Q2: What information should be included in a treebank?
Q3: What are the advantages of PS and/or DS for a particular task (parsing, grammar extraction, semantics) or for a particular language?
Q4: What is the relation between grammars and treebanks?
Q5: Which language should the next treebank effort be aimed at?