LING571
Hw5 key
o Head-head prob: P(w1 | A, w2)
o Lexical-rule prob: P(Aèβ | A, w)
o Head-head prob: O(|T|2*|N|)
o Lexical-rule
prob: O(|R|*|T|): Note the number is not O(|R|*|N|*|T|), because
once a rule is chosen, its lhs is fixed.
o Data structure:
I. HeadProb[w1][w2][A]: store Cnt(w1 | A, w2), and later P(w1 | A, w2)
II. HeadCnt[w2][A]: store the count of the occurrence of rules “X(w2)è…. A ….”.
III. RuleProb[r][w]: store Cnt(r|A, w), and later P(r|A,w), r is a rule whose lhs is A.
IV. RuleCnt[A][w]: store the count of the occurrence of rules “A(w)è….”
o Algorithm:
I. Read in head percolation table
II. Initialize all the data structures
III. For each parse tree in the Treebank
%%%%% Find the heads at all the levels
i. Add a new node called “Top” as the parent of the original root node.
ii. Top-down: Use head percolation table to find the head child of each parent node.
iii. Bottom-up: pass the head info from a head-child to its parent.
%%%%% collect counts: Method 1
iv. Let w be the head of the whole sentence
(1) HeadProb[w][-][Top]++
(2) HeadCnt[-][Top]++
v. For each syntactic rule “A[w]èB1[w1] …. Bn[wn]”:
(1) For each i, HeadProb[w_i][w][B_i]++
(2) For each i, HeadCnt[w][B_i]++;
(3) RuleProb[AèB1…Bn][w]++
(4) RuleCnt[A][w]++
vi. For each lexicon rule “Aèw”
(1) RuleProb[Aèw][w]++
(2) RuleCnt[A][w]++
%%%%% collect counts: Method 2
vii. For each non-terminal node B_i, suppose its head is w_i, and it is expanded into β. B_i’s parent is A and A’s head is w (if B_i is TOP, w is “-“):
(1) HeadProb[w_i][w][B_i]++
(2) HeadCnt[w][B_i]++
(3) RuleProb[B_iè…][w_i]++
(4) RuleCnt[B_i][w_i]++
IV. Foreach w1, w2, A, HeadProb[w1][w2][A]/=Cnt[w2][A]
V. Foreach rule r and w, RuleProb[r][w]/=Cnt[lhs[r]][w]
VI. Output HeadProb[w1][w2][A] and RuleProb[r][w].
((S (NP (N John))
(VP (V asked)
(NP (N Mary))
(NP (Det a) (N question)))))
((S (NP (N Mary))
(VP (V asked)
(NP (N John))
(PP (P for)
(NP (Det a) (N favor))))))
Syntactic rules from the two sentences: (count=13, or c=13 for short. c=1 unless specified o.w.)
o TOP (asked) è S (asked) (c=2)
o S (asked) è NP (John) VP (asked)
o NP (John) è N (John) (c=2)
o VP (asked) èV (asked) NP (Mary) NP (question)
o NP (Mary) è N (Mary) (c=2)
o NP (question) è Det (a) N (question)
o S (asked) è NP (Mary) VP (asked)
o VP (asked) è V (asked) NP (John) PP (for)
o PP (for) è P (for) NP (favor)
o NP (favor) è Det (a) N (favor)
Lexicon from the two sentences: (c=11)
o N è John (c=2)
o V èasked (c=2)
o NèMary (c=2)
o Det èa (c=2)
o Pèfor
o Nèquestion
o Nèfavor
Head-head probabilities: (c=24)
o P(John | NP, asked) = 2/5 (c=2)
o P(Mary | NP, asked) = 2/5 (c=2)
o P(question | NP, asked) = 1/5
o P(asked | Top, -) =1 (c=2)
o P(asked | S, asked) =1 (c=2)
o P(asked | VP, asked) = 1 (c=2)
o P(for | PP, asked) = 1
o P(favor | NP, for) = 1
o P(Mary | N, Mary) = 1 (c=2)
o P(John | N, John) = 1 (c=2)
o P(question | N, question) = 1
o P(favor | N, favor) = 1
o P(asked | V, asked) = 1 (c=2)
o P(for | P, for) = 1
o P(a | Det, question) = 1
o P(a | Det, favor) = 1
Head-rule probabilities: (c=24)
o P(TopèS | Top, asked) = 1 (c=2)
o P(SèNP VP | S, asked) = 1 (c=2)
o P(VPèV NP NP | VP, asked) = ½
o P(VPèV NP PP | VP, asked) = ½
o P(NPèN | NP, John) = 1 (c=2)
o P(NPèN | NP, Mary) = 1 (c=2)
o P(NPèDet N | NP, question) = 1
o P(NPèDet N | NP, favor) = 1
o P(PPèP NP | PP, for) = 1
o P(Vèasked | V, asked)=1 (c=2)
o P(NèJohn | N, John)=1 (c=2)
o P(NèMary | N, Mary)=1 (c=2)
o P(Nèquestion | N, question)=1
o P(Nèfavor | N, favor) = 1
o P(Pèfor | P, for)=1
o P(Detèa | Det, a) =1 (c=2)
A (w) è B1(w1) …. B5(w5), and B3 is the head child of A.
A (w) è B1(w1) X1(w3)
X1(w3)èB2 (w2) Y1(w3)
Y1(w3)èY2(w3) B5(w5)
Y2(w3)èB3(w3) B4(w4)
o P(VPèV NP NP | VP, asked) = ½
Change to
P(VPèX1 NP | VP, asked)=1/2
P(X1èV NP | X1, asked)=1
o P(VPèV NP PP | VP, asked) = ½
Change to
P(VPèX2 PP | VP, asked)=1/2
P(X2èV NP | X2, asked)=1
o Add P(asked | X1, asked)=1
o Add P(asked | X2, asked)=1
Initialization: For every word position i,
If Aèw_i
Then π[i][i][A]=P(Aèw_i | A, w_i)
Recursive part:
Given a rule A (w0)èB (w1) C (w2),
prob=P(w1|B,w0)*P(w2|C,w0)*P(AèBC|A,w)
π[begin][end][A]=π[begin][m][B]*π[m+1][end][C]* prob
Given a rule A(w)èB(w)
Prob=P(AèB|A,w) * P(w | B,w)
If A is “Top”
Then Prob*=P(w |Top, -)
π[begin][end][A]=π[begin][m][A]*prob
o Draw a chart
o List the probability equation associated with each cell in the chart when using lexicalized PCFG.
I.
P[1][1][NP]=P(
II. P[2][2][V]=P(Vèasked|V, asked)=1.0
III.
P[3][3][NP]=P(
IV. P[4][4][Det]=P(Detèa|Det, a)=1.0
V. P[5][5][N]=P(Nèquestion|N, question)=1.0
VI. P[2][3][X1]=P[2][2][V] * P[3][3][NP] * P(X1èV NP|X1,asked] * P(John|NP, asked)*P(asked|V, asked)=2/5
VII. P[4][5][NP]=P[4][4][Det] * P[5][5][N] * P(NPèDet N|NP, question) * P(a|Det,question)*P(question|N, question)=1.0
VIII. P[2][5][VP]=P[2][3][X1] * P[4][5][NP] * P(VPèX1 NP|VP, asked) * P(asked|X1,asked)*P(question|NP, asked)=2/5*1/2*1/5=1/25
IX. P[1][5][S]=P[1][1][NP] * P[2][5][VP] * P(SèNP VP|S, asked) * P(asked|VP, asked)*P(Mary|NP, asked)=1/25*2/5=2/125
X. P[1][5][Top]=P[1][5][S] * P[TopèS | Top, asked) * P(asked | S, asked * P(asked|Top, -1)=2/125
o Calculate the probability of P(T,S).
P(T,S)=P[1][5][Top]=2/125