Back to labs/answer key page

This search pattern finds long-distance dependencies that cross three S nodes, along with some garbage.

tgrep '__ < /^WH/ < (S !<< 0 !<< /^WH/ <<
      (S !<< 0 !<< /^WH/ << (S !<< /^WH/ !<< 0 << T)))' | more    

The heart of the pattern is this:

tgrep '__ < /^WH/ < (S << (S << (S << T)))' | more    

That is, we're looking for some node (__) that immediately dominates both a WH phrase and an S. That S, in turn, must dominate an S that dominates an S that dominates a T (trace). To make sure that the lowest trace doesn't belong to some other WH phrase, we insist that the Ss mentioned do not dominate any WH phrases (!<< /^WH/).

Insisting that the Ss mentioned also don't dominate any 0s (!<< 0) rules out examples like this one:

(SBAR (WHNP (WP who))
      (S (NP (-NONE- T))
         (VP (VBZ believes)
             (PP (IN in)
                 (S (NP (-NONE- *))
                    (VP (VBG dividing)
                        (NP (NP (NN everything))
                            (SBAR (-NONE- 0)
                                  (S (NP (PRP he))
                                     (VP (VBZ does))
                                     (NEG (RB not))
                                     (VP (JJ own)
                                         (NP (-NONE- T))))))))))))

(That is, examples where the trace is related to a 'zero' relative pronoun.)
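
To see why the extra restrictions matter, here is a rough Python sketch (not tgrep itself) that re-implements the core pattern and the !<< 0 and !<< /^WH/ restrictions on the example tree above. Everything in it is an assumption made for illustration: the helper names (parse, descendants, is_wh, s_chain, matches) are hypothetical, plain labels such as S are taken to match exactly, and words like T and 0 are treated as ordinary childless nodes.

```python
def parse(text):
    """Parse one bracketed tree into (label, children) pairs; words are childless nodes."""
    tokens = text.replace("(", " ( ").replace(")", " ) ").split()
    pos = 0

    def node():
        nonlocal pos
        if tokens[pos] != "(":          # a bare word such as T, 0 or who
            pos += 1
            return (tokens[pos - 1], [])
        label = tokens[pos + 1]
        pos += 2                        # skip "(" and the label
        children = []
        while tokens[pos] != ")":
            children.append(node())
        pos += 1                        # skip ")"
        return (label, children)

    return node()


def descendants(node):
    """Yield every node properly dominated by `node` (the << relation)."""
    for child in node[1]:
        yield child
        yield from descendants(child)


def is_wh(node):
    """Stand-in for the /^WH/ regular expression."""
    return node[0].startswith("WH")


def s_chain(node, depth, strict):
    """S << (S << ... << T) with `depth` S nodes; `strict` adds !<< 0 !<< /^WH/ to each S."""
    if node[0] != "S":
        return False
    below = list(descendants(node))
    if strict and any(d[0] == "0" or is_wh(d) for d in below):
        return False
    if depth == 1:
        return any(d[0] == "T" for d in below)
    return any(s_chain(d, depth - 1, strict) for d in below)


def matches(tree, strict):
    """Labels of nodes that immediately dominate both a WH phrase and a qualifying S."""
    return [n[0] for n in [tree, *descendants(tree)]
            if any(is_wh(c) for c in n[1])
            and any(s_chain(c, 3, strict) for c in n[1])]


example = parse("""
(SBAR (WHNP (WP who))
      (S (NP (-NONE- T))
         (VP (VBZ believes)
             (PP (IN in)
                 (S (NP (-NONE- *))
                    (VP (VBG dividing)
                        (NP (NP (NN everything))
                            (SBAR (-NONE- 0)
                                  (S (NP (PRP he))
                                     (VP (VBZ does))
                                     (NEG (RB not))
                                     (VP (JJ own)
                                         (NP (-NONE- T))))))))))))
""")

print(matches(example, strict=False))   # core pattern only: ['SBAR']
print(matches(example, strict=True))    # with the restrictions: []
```

The core pattern alone accepts the top SBAR, because three nested Ss sit above the lowest T. Once each S is forbidden from dominating a 0, the topmost S fails and the example drops out.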

Finally, we might also want to rule out intervening relative thats, which the TREEBANK folks parse as follows:

(SBAR (WHNP (WP what))
      (S (NP (NNS outlanders))
         (VP (VBP call)
             (NP (-NONE- T))
             (NP (NP (DT the)
                     (NNP New)
                     (NNP York)
                     (NN mind))
                 (, ,)
                 (NP (NP (DT a)
                         (NN state))
                     (SBAR (IN that)
                           (S (NP (DT the)
                                  (NN subject))
                              (VP (VBZ is)
                                  (ADVP (RB necessarily))
                                  (ADJP (JJ unable)
                                        (S (NP (-NONE- *))
                                           (AUX (TO to))
                                           (VP (VB perceive)
                                               (NP (-NONE- T))
                                               (PP (IN in)
                                                   (NP (PRP himself))))))))))))))

Excluding these thats requires adding another !<< clause to the search pattern:

tgrep '__ < /^WH/ < (S !<< 0 !<< /^WH/ !<< (IN < that) <<
      (S !<< 0 !<< /^WH/ !<< (IN < that) <<
      (S !<< /^WH/ !<< 0 !<< (IN < that) << T)))' | more    
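
Since every S in the pattern carries the same restrictions, the pattern string can also be generated rather than typed by hand. The following is only a sketch: nested_s_pattern is a hypothetical helper, not part of tgrep, and it assumes that the order of the !<< clauses on a node makes no difference (it writes them in the same order on every S, whereas the innermost S above lists them in a different order).

```python
def nested_s_pattern(depth, exclusions):
    """Build a tgrep pattern with `depth` nested S nodes, each with the same !<< clauses."""
    parts = [f"!<< {e}" for e in exclusions]
    inner = "T"                                   # the trace at the bottom
    for _ in range(depth):
        # Wrap what we have so far in one more S that dominates it.
        inner = "(" + " ".join(["S", *parts, "<<", inner]) + ")"
    return f"__ < /^WH/ < {inner}"


# Core pattern, no restrictions:
print(nested_s_pattern(3, []))
# Full pattern with all three restrictions:
print(nested_s_pattern(3, ["0", "/^WH/", "(IN < that)"]))
```

The printed string goes between the single quotes of the tgrep command. Changing depth gives the corresponding pattern for dependencies that cross a different number of S nodes.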

NB: These queries will miss some examples of what they're supposed to find, because they rely on the encoding of traces, which is inconsistent in the TREEBANK. There are also still many false positives due to extra S nodes from coordination and adjunction.


-----

Emily M. Bender
Last modified: Fri Dec 8 12:00:00 2000