LING571

Team Project 1 (Hw5)

Due 11/3/05

 

 

      1. Project overview

                                                                                                                                                                                 

The task:

·        Form a team. Each team should have 3-4 members. Create a team name. Make plans for team projects (e.g., meeting time, communication methods, team captain, programming language, and job division)

·        Understand the first lexicalized model we discussed on 10/20/05 (see Slides #29-#33 in 10_18.ppt).

·        Design (on paper) a parsing algorithm that implements the model.

·        Write a report to describe the model and your parsing algorithm.

 

   2. The report

·        The report should be a plain text file, a WORD file, or a pdf file.

·        Each team hands in only one report. 

·        When describing the training and parsing algorithms, please use pseudo code. The pseudo code should be clear enough that they can be understood and implemented by others without problems. (One example is the CYK algorithm in the J&S book).

·        The report should include the following sections:

-         Team info:  (5 pts)

o       The name of the team

o       Team members

 

-         Model:  (10 pts)

o       The types of model parameters: e.g., head-head probability.

o       The numbers of possible values for each type of parameter:  Let |N| be the number of non-terminals, |T| be the number of terminals (i.e., words), and |R| be the number of un-lexicalized syntactic rules. Represent the numbers with O(x) notation, e.g, O(|N|2).

                       

-         Training:  (40 pts)

o       Sketch out an algorithm that calculates the parameters using a treebank; that is, explain how you estimate the values of the parameters given a Treebank. (20 pts)

 

o       Suppose a Treebank includes only two parse trees below, list all the non-zero parameters and their values that you get from the Treebank. (20 pts)

 

((S  (NP (N John))

       (VP (V asked)

              (NP (N Mary))

              (NP (Det a) (N question)))))

 

((S (NP (N Mary))

      (VP (V asked)

             (NP (N John))

             (PP (P for)

                    (NP (Det a) (N favor))))))

 

-         Parsing: (45 pts)

o       (10 pts) Describe a method that converts a lexicalized PCFG into CNF (Def 3, in which unit rules are allowed). For instance, what rules should be used to replace a lexicalized rule:  

       A(w) è B1(w1) …. B5(w5),  and B3 is the head child.

 

                           Hint: the position of the head child is important.

 

o       (5 pts) List the parameters that are different in the two grammars, and calculate their values. (e.g., parameter p is replaced with parameters p1 and p2).

 

o       (15 pts) Augment the CYK algorithm to handle the lexicalized model.

 

o       (15 pts) For the sentence “Mary asked John a question”: 

§         Draw a chart with the modified CYK algorithm.

§         List the probability equation associated with each cell in the chart when using lexicalized PCFG.

§         Calculate the probability of P(T,S).

 

   3. Grade:

       The homework is worth 100 pts. ESubmit one report per team.