LING571

Hw5 key

 

  1. Model:  (10 pts)
    a. The types of model parameters:
       - Head-head prob:    P(w1 | A, w2)
       - Lexical-rule prob: P(A → β | A, w)

    b. The number of possible values for each type of parameter. Let |N| be the number of non-terminals, |T| the number of terminals (i.e., words), and |R| the number of unlexicalized syntactic rules.
       - Head-head prob:    O(|T|^2 * |N|)
       - Lexical-rule prob: O(|R| * |T|). Note that the number is not O(|R| * |N| * |T|), because once a rule is chosen, its lhs is fixed.
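These two parameter types determine the score of a lexicalized tree. As a sketch (my own formulation, consistent with the chart equations in the Parsing section; A(n), h(n), r(n) are notation introduced here, not from the assignment), the probability of a tree T for sentence S factors into one head-head probability and one rule probability per non-terminal node:

    P(T,S) = \prod_{n \in T} P( h(n) | A(n), h(parent(n)) ) \cdot P( r(n) | A(n), h(n) )

where A(n) is the label of node n, h(n) is its head word, r(n) is the rule that expands it, and h(parent(TOP)) is taken to be "-".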

                     

  2. Training:  (40 pts)
    a. Describe an algorithm (Algorithm #1) that calculates the parameters using a treebank; that is, explain how you estimate the values of the parameters given a treebank.

- Data structures:

  I.   HeadProb[w1][w2][A]: stores Cnt(w1 | A, w2), and later P(w1 | A, w2)
  II.  HeadCnt[w2][A]:      stores the count of occurrences of rules "X(w2) → ... A ..."
  III. RuleProb[r][w]:      stores Cnt(r | A, w), and later P(r | A, w), where r is a rule whose lhs is A
  IV.  RuleCnt[A][w]:       stores the count of occurrences of rules "A(w) → ..."

 

- Algorithm (a Python sketch of Method 1 follows step VI):

  I.   Read in the head percolation table.
  II.  Initialize all the data structures.
  III. For each parse tree in the treebank:

       %%%%% Find the heads at all levels
       i.    Add a new node called "TOP" as the parent of the original root node.
       ii.   Top-down: use the head percolation table to find the head child of each parent node.
       iii.  Bottom-up: pass the head word from each head child up to its parent.

 

       %%%%% Collect counts: Method 1
       iv.   Let w be the head of the whole sentence:
             (1) HeadProb[w][-][TOP]++
             (2) HeadCnt[-][TOP]++
       v.    For each syntactic rule "A[w] → B1[w1] ... Bn[wn]":
             (1) For each i, HeadProb[w_i][w][B_i]++
             (2) For each i, HeadCnt[w][B_i]++
             (3) RuleProb[A → B1 ... Bn][w]++
             (4) RuleCnt[A][w]++
       vi.   For each lexical rule "A → w":
             (1) RuleProb[A → w][w]++
             (2) RuleCnt[A][w]++

 

       %%%%% Collect counts: Method 2
       vii.  For each non-terminal node B_i, suppose its head is w_i and it is expanded into β; B_i's parent is A and A's head is w (if B_i is TOP, w is "-"):
             (1) HeadProb[w_i][w][B_i]++
             (2) HeadCnt[w][B_i]++
             (3) RuleProb[B_i → β][w_i]++
             (4) RuleCnt[B_i][w_i]++

  IV.  For each w1, w2, A:   HeadProb[w1][w2][A] /= HeadCnt[w2][A]
  V.   For each rule r and word w:   RuleProb[r][w] /= RuleCnt[lhs(r)][w]
  VI.  Output HeadProb[w1][w2][A] and RuleProb[r][w].
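A minimal Python sketch of Algorithm #1 using Method 1. The tree interface (nodes with .label, .children, .head, where a leaf's .label is the word), the head-table lookup head_table.pick(...), and the treebank reader are hypothetical stand-ins, and the top-down/bottom-up head finding is folded into a single bottom-up pass here:

    from collections import defaultdict

    # The four tables from the data-structure list above, keyed by tuples.
    HeadProb = defaultdict(float)   # (w1, w2, A) -> Cnt(w1 | A, w2), later P(w1 | A, w2)
    HeadCnt  = defaultdict(float)   # (w2, A)     -> count of A children under a w2-headed parent
    RuleProb = defaultdict(float)   # (r, w)      -> Cnt(r | A, w), later P(r | A, w); r = (A, rhs tuple)
    RuleCnt  = defaultdict(float)   # (A, w)      -> count of expansions of A(w)

    def annotate_heads(node, head_table):
        """Steps i-iii in one bottom-up pass: pick each node's head child via a
        (hypothetical) head-percolation-table lookup, then propagate head words upward."""
        if not node.children:                        # leaf: the head word is the word itself
            node.head = node.label
            return node.head
        for child in node.children:
            annotate_heads(child, head_table)
        k = head_table.pick(node.label, [c.label for c in node.children])  # hypothetical API
        node.head = node.children[k].head
        return node.head

    def collect_counts(root):
        """Method 1 on one head-annotated tree; root is assumed to be the added TOP node."""
        HeadProb[(root.head, '-', 'TOP')] += 1       # step iv
        HeadCnt[('-', 'TOP')] += 1
        _walk(root)

    def _walk(node):
        if not node.children:                        # a word; nothing to count
            return
        if len(node.children) == 1 and not node.children[0].children:
            w = node.children[0].label               # step vi: lexical rule A -> w
            RuleProb[((node.label, (w,)), w)] += 1
            RuleCnt[(node.label, w)] += 1
            return
        rhs = tuple(c.label for c in node.children)  # step v: syntactic rule A[w] -> B1[w1]...Bn[wn]
        RuleProb[((node.label, rhs), node.head)] += 1
        RuleCnt[(node.label, node.head)] += 1
        for child in node.children:                  # every child, including the head child
            HeadProb[(child.head, node.head, child.label)] += 1
            HeadCnt[(node.head, child.label)] += 1
            _walk(child)

    def normalize():
        """Steps IV-V: turn the counts into relative-frequency estimates."""
        for (w1, w2, A) in HeadProb:
            HeadProb[(w1, w2, A)] /= HeadCnt[(w2, A)]
        for (r, w) in RuleProb:
            RuleProb[(r, w)] /= RuleCnt[(r[0], w)]

Running annotate_heads on each tree (after adding the TOP node), then collect_counts, then normalize should reproduce the tables listed in part b; for example, HeadProb[('John', 'asked', 'NP')] comes out to 2/5, matching P(John | NP, asked) below.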

 

 

    b. Suppose a treebank includes only the two parse trees below; list the parameters and their values that you get from this treebank.

 

((S  (NP (N John))

       (VP (V asked)

              (NP (N Mary))

              (NP (Det a) (N question)))))

 

((S (NP (N Mary))

      (VP (V asked)

             (NP (N John))

             (PP (P for)

                    (NP (Det a) (N favor))))))

 

 

Syntactic rules from the two sentences (total count = 13; c denotes a rule's count, with c=1 unless specified otherwise):

- TOP(asked)     → S(asked)                              (c=2)
- S(asked)       → NP(John) VP(asked)
- NP(John)       → N(John)                               (c=2)
- VP(asked)      → V(asked) NP(Mary) NP(question)
- NP(Mary)       → N(Mary)                               (c=2)
- NP(question)   → Det(a) N(question)
- S(asked)       → NP(Mary) VP(asked)
- VP(asked)      → V(asked) NP(John) PP(for)
- PP(for)        → P(for) NP(favor)
- NP(favor)      → Det(a) N(favor)

 

Lexicon from the two sentences (total count = 11):

- N → John           (c=2)
- V → asked          (c=2)
- N → Mary           (c=2)
- Det → a            (c=2)
- P → for
- N → question
- N → favor

 

 

Head-head probabilities (total count = 24):

- P(John | NP, asked) = 2/5                  (c=2)
- P(Mary | NP, asked) = 2/5                  (c=2)
- P(question | NP, asked) = 1/5
- P(asked | TOP, -) = 1                      (c=2)
- P(asked | S, asked) = 1                    (c=2)
- P(asked | VP, asked) = 1                   (c=2)
- P(for | PP, asked) = 1
- P(favor | NP, for) = 1

- P(Mary | N, Mary) = 1                      (c=2)
- P(John | N, John) = 1                      (c=2)
- P(question | N, question) = 1
- P(favor | N, favor) = 1

- P(asked | V, asked) = 1                    (c=2)
- P(for | P, for) = 1

- P(a | Det, question) = 1
- P(a | Det, favor) = 1

 

Head-rule probabilities (total count = 24):

- P(TOP → S | TOP, asked) = 1                (c=2)
- P(S → NP VP | S, asked) = 1                (c=2)
- P(VP → V NP NP | VP, asked) = 1/2
- P(VP → V NP PP | VP, asked) = 1/2
- P(NP → N | NP, John) = 1                   (c=2)
- P(NP → N | NP, Mary) = 1                   (c=2)
- P(NP → Det N | NP, question) = 1
- P(NP → Det N | NP, favor) = 1
- P(PP → P NP | PP, for) = 1

- P(V → asked | V, asked) = 1                (c=2)
- P(N → John | N, John) = 1                  (c=2)
- P(N → Mary | N, Mary) = 1                  (c=2)
- P(N → question | N, question) = 1
- P(N → favor | N, favor) = 1
- P(P → for | P, for) = 1
- P(Det → a | Det, a) = 1                    (c=2)

 

 

  3. Parsing: (45 pts)
    a. (10 pts) Describe your algorithm for converting a lexicalized PCFG into CNF (Def 3, in which unit rules are allowed). For instance, what rules should be used to replace the lexicalized rule below, where B3 is the head child of A? (A sketch of a general procedure follows the replacement rules.)

         A(w) → B1(w1) … B5(w5)

       Replacement rules:

         A(w)   → B1(w1) X1(w3)
         X1(w3) → B2(w2) Y1(w3)
         Y1(w3) → Y2(w3) B5(w5)
         Y2(w3) → B3(w3) B4(w4)
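One general procedure consistent with the rules above (and with the VP rules in part b): peel off non-head children one at a time, left children first until the head child is leftmost, then right children from the right, giving every new non-terminal the head word of the original rule. A hedged Python sketch; the rule representation and the fresh-name counter are my own choices, not part of the assignment:

    import itertools

    _fresh = itertools.count(1)          # fresh names for the new non-terminals (X1, X2, ...)

    def binarize(lhs, children, head_index):
        """Binarize one lexicalized rule lhs -> children.
        lhs and each child are (label, head_word) pairs; children[head_index] is the
        head child, so lhs's head word equals that child's head word.  Returns a list
        of (parent, [child, ...]) rules; unit and binary rules pass through unchanged."""
        rules = []
        parent = lhs
        remaining = list(children)
        head = head_index
        while len(remaining) > 2:
            new_nt = ('X%d' % next(_fresh), lhs[1])   # new non-terminal carries the head word
            if head > 0:
                # The head child is not leftmost: split off the leftmost child.
                rules.append((parent, [remaining[0], new_nt]))
                remaining = remaining[1:]
                head -= 1
            else:
                # The head child is leftmost: split off the rightmost child.
                rules.append((parent, [new_nt, remaining[-1]]))
                remaining = remaining[:-1]
            parent = new_nt
        rules.append((parent, remaining))
        return rules

For A(w) → B1(w1) … B5(w5) with B3 as the head child, this yields the four rules above (with the intermediate symbols generated as X1, X2, X3 rather than X1, Y1, Y2), and for VP(asked) → V(asked) NP(Mary) NP(question) it yields the two rules used in part b (modulo the generated names).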

 

    b. (5 pts) List the parameters that are in this new grammar but not in the original grammar, and calculate their values.

- P(VP → V NP NP | VP, asked) = 1/2 changes to:
      P(VP → X1 NP | VP, asked) = 1/2
      P(X1 → V NP | X1, asked) = 1

- P(VP → V NP PP | VP, asked) = 1/2 changes to:
      P(VP → X2 PP | VP, asked) = 1/2
      P(X2 → V NP | X2, asked) = 1

- Add P(asked | X1, asked) = 1
- Add P(asked | X2, asked) = 1

 

    c. (15 pts) Describe how the CYK algorithm could be augmented to handle the lexicalized model.

 

Initialization: for every word position i,
        if A → w_i is in the grammar,
        then π[i][i][A] = P(A → w_i | A, w_i)

Recursive part:
    Given a binary rule A(w0) → B(w1) C(w2):
        prob = P(w1 | B, w0) * P(w2 | C, w0) * P(A → B C | A, w0)
        π[begin][end][A] = π[begin][m][B] * π[m+1][end][C] * prob
        (maximizing over split points m and over the applicable rules and child entries)

    Given a unit rule A(w) → B(w):
        prob = P(A → B | A, w) * P(w | B, w)
        if A is "TOP":
            prob *= P(w | TOP, -)
        π[begin][end][A] = π[begin][end][B] * prob

Each chart entry also needs to record the head word of its subtree (plus a backpointer for tree recovery), since the head word is needed when the constituent is combined with its parent.
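A hedged Python sketch of this augmented (Viterbi) CYK recursion, keeping the head word in the chart key. The grammar and probability containers are hypothetical stand-ins, keyed the same way as the tables in the training sketch above:

    from collections import defaultdict

    def lexicalized_cky(words, lex_rules, unary_rules, binary_rules, rule_prob, head_prob):
        """Viterbi CYK over a lexicalized PCFG in CNF (unit rules allowed).
        Hypothetical containers:
          lex_rules[w]            -> labels A with a lexical rule A -> w
          unary_rules[B]          -> labels A with a unit rule A -> B
          binary_rules[(B, C)]    -> (A, head_side) pairs for rules A -> B C
                                     (head_side 0: B is the head child; 1: C is)
          rule_prob[((A, rhs), w)]-> P(A -> rhs | A, w)     (RuleProb from training)
          head_prob[(w1, w0, A)]  -> P(w1 | A, w0)          (HeadProb from training)
        chart[(i, j)][(A, w)] holds the best probability of an A-rooted subtree over
        words i..j with head w, excluding P(w | A, parent head), which is multiplied
        in when the subtree is attached to its parent (or to TOP)."""
        n = len(words)
        chart = defaultdict(dict)

        def apply_unaries(begin, end):
            # Unit rules A(w) -> B(w): same span, same head word; repeat until nothing
            # improves so that chains of unit rules are covered.
            cell = chart[(begin, end)]
            changed = True
            while changed:
                changed = False
                for (B, w), p_B in list(cell.items()):
                    for A in unary_rules.get(B, ()):
                        p = p_B * rule_prob.get(((A, (B,)), w), 0.0) * head_prob.get((w, w, B), 0.0)
                        if A == 'TOP':
                            p *= head_prob.get((w, '-', 'TOP'), 0.0)
                        if p > cell.get((A, w), 0.0):
                            cell[(A, w)] = p
                            changed = True

        # Initialization: lexical rules A -> w_i.
        for i, w in enumerate(words):
            for A in lex_rules.get(w, ()):
                chart[(i, i)][(A, w)] = rule_prob.get(((A, (w,)), w), 0.0)
            apply_unaries(i, i)

        # Recursive part: binary rules A(w0) -> B(w1) C(w2).
        for length in range(2, n + 1):
            for begin in range(n - length + 1):
                end = begin + length - 1
                cell = chart[(begin, end)]
                for m in range(begin, end):
                    for (B, w1), p_B in chart[(begin, m)].items():
                        for (C, w2), p_C in chart[(m + 1, end)].items():
                            for A, head_side in binary_rules.get((B, C), ()):
                                w0 = w1 if head_side == 0 else w2
                                p = (p_B * p_C
                                     * rule_prob.get(((A, (B, C)), w0), 0.0)
                                     * head_prob.get((w1, w0, B), 0.0)
                                     * head_prob.get((w2, w0, C), 0.0))
                                if p > cell.get((A, w0), 0.0):
                                    cell[(A, w0)] = p
                apply_unaries(begin, end)

        # P(T, S) of the best parse: the best TOP entry over the whole sentence.
        top = [p for (A, w), p in chart[(0, n - 1)].items() if A == 'TOP']
        return max(top) if top else 0.0

Run on "Mary asked John a question" with the binarized grammar from part b, this should reproduce the chart values in part d, ending with 2/125 for the TOP entry over the whole sentence.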

                 

 

    d. (15 pts) Show how your parser creates a parse tree for the sentence "Mary asked John a question":

- Draw a chart.
- List the probability equation associated with each cell in the chart when using the lexicalized PCFG.

  I.    P[1][1][NP] = P(N → Mary | N, Mary) * P(NP → N | NP, Mary) * P(Mary | N, Mary) = 1.0

  II.   P[2][2][V] = P(V → asked | V, asked) = 1.0

  III.  P[3][3][NP] = P(N → John | N, John) * P(NP → N | NP, John) * P(John | N, John) = 1.0

  IV.   P[4][4][Det] = P(Det → a | Det, a) = 1.0

  V.    P[5][5][N] = P(N → question | N, question) = 1.0

  VI.   P[2][3][X1] = P[2][2][V] * P[3][3][NP] * P(X1 → V NP | X1, asked) * P(John | NP, asked) * P(asked | V, asked) = 2/5

  VII.  P[4][5][NP] = P[4][4][Det] * P[5][5][N] * P(NP → Det N | NP, question) * P(a | Det, question) * P(question | N, question) = 1.0

  VIII. P[2][5][VP] = P[2][3][X1] * P[4][5][NP] * P(VP → X1 NP | VP, asked) * P(asked | X1, asked) * P(question | NP, asked) = 2/5 * 1/2 * 1/5 = 1/25

  IX.   P[1][5][S] = P[1][1][NP] * P[2][5][VP] * P(S → NP VP | S, asked) * P(asked | VP, asked) * P(Mary | NP, asked) = 1/25 * 2/5 = 2/125

  X.    P[1][5][TOP] = P[1][5][S] * P(TOP → S | TOP, asked) * P(asked | S, asked) * P(asked | TOP, -) = 2/125

 

 

- Calculate the probability P(T,S).

P(T,S) = P[1][5][TOP] = 2/125
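A quick arithmetic check of the chart computation above, using exact fractions (the variable names simply mirror the cell labels):

    from fractions import Fraction as F

    # Re-multiply the non-trivial chart cells from the walkthrough above.
    p_X1_23  = 1 * 1 * 1 * F(2, 5) * 1                     # VI:   P[2][3][X1]
    p_NP_45  = 1 * 1 * 1 * 1 * 1                           # VII:  P[4][5][NP]
    p_VP_25  = p_X1_23 * p_NP_45 * F(1, 2) * 1 * F(1, 5)   # VIII: P[2][5][VP]
    p_S_15   = 1 * p_VP_25 * 1 * 1 * F(2, 5)               # IX:   P[1][5][S]
    p_TOP_15 = p_S_15 * 1 * 1 * 1                          # X:    P[1][5][TOP]

    print(p_VP_25, p_S_15, p_TOP_15)                       # 1/25 2/125 2/125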