Dr. Syntax his Guide to
Online Resources for Studying
English Syntax, Words and Usage


Several interactive and downloadable front ends, part-of-speech (POS) taggers, and syntactic tree drawers and displayers are now available online.

A. Morphosyntactic taggers

These are programs that tag words in sentences with a grammatical category or Part of Speech. They differ according to the set of tags they use:

  1. CLAWS tag set
  2. ENGCG (Constraint Grammar) tag sets
  3. Penn Treebank tags

and the type of program ("AI") used:

  1. Rule-Based
  2. Probability-Based
    (Bi- and tri-gram, Markov)
  3. Memory-Based
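
To make the probability-based (bigram, Markov) type concrete, here is a minimal sketch of a bigram tagger that picks the most probable tag sequence with the Viterbi algorithm. The four-sentence training corpus, the add-one smoothing, and the function names are all assumptions made for illustration; none of the taggers listed on this page is claimed to work exactly this way.

```python
import math
from collections import Counter, defaultdict

# Toy training data (hypothetical): sentences as lists of (word, Penn-style tag).
TRAIN = [
    [("the", "DT"), ("dog", "NN"), ("barks", "VBZ")],
    [("the", "DT"), ("cat", "NN"), ("sleeps", "VBZ")],
    [("a", "DT"), ("dog", "NN"), ("sleeps", "VBZ")],
    [("dogs", "NNS"), ("bark", "VBP")],
]

def train(sentences):
    """Count tag bigrams (transitions) and word-given-tag pairs (emissions)."""
    trans = defaultdict(Counter)   # trans[previous_tag][tag]
    emit = defaultdict(Counter)    # emit[tag][word]
    vocab = set()
    for sent in sentences:
        prev = "<s>"               # sentence-start pseudo-tag
        for word, tag in sent:
            trans[prev][tag] += 1
            emit[tag][word] += 1
            vocab.add(word)
            prev = tag
    return trans, emit, vocab

def log_prob(counter, key, n_outcomes):
    """Add-one smoothed log probability of key among n_outcomes outcomes."""
    return math.log((counter[key] + 1) / (sum(counter.values()) + n_outcomes))

def viterbi(words, trans, emit, vocab):
    """Return the most probable (word, tag) sequence under the bigram model."""
    tags = list(emit)
    n_tags, n_words = len(tags), len(vocab) + 1   # +1 leaves room for unknown words
    # best[i][tag] = (log score of the best path ending in tag at i, previous tag)
    best = [{}]
    for t in tags:
        score = log_prob(trans["<s>"], t, n_tags) + log_prob(emit[t], words[0], n_words)
        best[0][t] = (score, None)
    for i in range(1, len(words)):
        best.append({})
        for t in tags:
            e = log_prob(emit[t], words[i], n_words)
            best[i][t] = max(
                (best[i - 1][p][0] + log_prob(trans[p], t, n_tags) + e, p) for p in tags
            )
    # Follow the back-pointers from the best final tag.
    tag = max(tags, key=lambda t: best[-1][t][0])
    path = [tag]
    for i in range(len(words) - 1, 0, -1):
        tag = best[i][tag][1]
        path.append(tag)
    return list(zip(words, reversed(path)))

trans, emit, vocab = train(TRAIN)
print(viterbi(["the", "cat", "barks"], trans, emit, vocab))
```

A rule-based tagger would replace the counts with hand-written constraints, and a memory-based one would look up the most similar stored example instead of multiplying probabilities.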

Here is a quick overview.

A. 1 CLAWS tag set(s)

The University Centre for Computer Corpus Research in Language at the University of Lancaster developed the CLAWS (Constituent Likelihood Automatic Word-tagging System) tagger program with several levels of delicacy. You can submit a paragraph of up to 300 words to the tagger and it will return a tagged version fairly quickly. You can choose a coarser or finer tag set, using CLAWS 5 (60 parts of speech, used for the bulk of the BNC) or CLAWS 7 (over 160 parts, used for the BNC sampler). The on-line Guidelines to Wordclass Tagging are very useful, especially for their criteria for hard cases.


A. 2 English Constraint Grammar tag set(s)

Lingsoft's ENGCG, with Documentation. You can paste in 100 words at most, and the number of uses per day is limited. It gives two options if decisive information is lacking.

The VISL project tags with Constraint Grammar tags, along with tense, case, and number information and grammatical function in the sentence. "Flat Structure" actually returns a dependency parse. Here is a Table of Tags. For VISL parsing, see below.

Connexor's Machinese Phrase Tagger also uses a Constraint Grammar tagger. Its tags are spelled out as words, but the full strings of symbols can be found in the Machinese Syntax parser-grapher.

A. 3 PennTree tags

The complete, detailed PennTree Guide to Part of Speech Tagging is here (31 pages).

MBT: Memory Based Tagging demo (sometimes down). This produces horizontal format POS tagging only, using the PennTree tag set. Its work can be gleaned from the Memory Based Shallow Parser (MBSP) on line, where you get much more information as well (chunking, lemmas, some grammatical relations).

TreeTagger produces vertical format POS tagging only, using the PennTree tag set. (The tagger is a trainable HMM type and works for other languages too. Available for Linux, with a demo version for Windows.) Version 3.1 is quite impressive and is an entry in the Great PennTree Tagger Contest (below). There is an excellent GUI demo on line at Nottingham and one with fewer bells and whistles at the University of Pisa. The downloaded version also has a chunker. Binaries are available for Linux and Windows, along with a GUI for Windows.
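
The difference between horizontal and vertical tagging formats is easy to bridge. The sketch below assumes that the horizontal format writes each token as word/TAG on one running line and that the vertical format puts one token per line with tab-separated columns; the exact separators used by MBT and TreeTagger may differ, so treat `sep` and the column layout as assumptions.

```python
def horizontal_to_vertical(line, sep="/"):
    """Convert a running line of word/TAG items to one tab-separated token per line."""
    rows = []
    for item in line.split():
        # rpartition splits on the LAST separator, so a token such as
        # "1/2/CD" keeps its own slash: word "1/2", tag "CD".
        word, _, tag = item.rpartition(sep)
        rows.append(f"{word}\t{tag}")
    return "\n".join(rows)

def vertical_to_horizontal(text, sep="/"):
    """Convert one-token-per-line output back to a single running line."""
    pairs = []
    for row in text.splitlines():
        if not row.strip():
            continue                       # skip blank lines between sentences
        word, tag = row.split("\t")[:2]    # ignore extra columns such as a lemma
        pairs.append(f"{word}{sep}{tag}")
    return " ".join(pairs)

print(horizontal_to_vertical("Four/CD score/NN and/CC seven/CD years/NNS"))
```

With both taggers in the same layout, their outputs can be compared line by line with any diff tool.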


FreeLing has been developed by the TALP Research Center at the Polytechnic University of Catalonia. It includes a tagger with an on-line (limited) demo and is downloadable for Linux/Unix.

LingPipe uses the large Brown tag set (82 lexical tags plus punctuation) and is trained on the Brown corpus. Two other demos are trained on bio-medical corpora. [Currently down.]

SVMTool is a recent tagger using Support Vector Machines that claims very good accuracy. It is trained on the WSJ corpus.


SS is fast: "This part-of-speech (POS) tagger offers fast tagging (2400 tokens/sec) with a state-of-the-art accuracy (97.10% on the WSJ corpus). The tagger uses an extension of Maximum Entropy Markov Models (MEMM), in which tags are determined in the easiest-first manner." No on-line demo; but it can be chosen as the tagger in Antelope↓.

The Stanford NLP Group has put up a Java-based maximum entropy POS tagger that can tag large amounts of text. It tags each word of continuous text with a PennTree POS. Special feature: it has a much slower bidirectional mode as well as a "left three words" mode of operation. The bidirectional mode scored very well in the Tagger Contest. There is no demo, but it is a cross-platform Java program. The package also has a chunker and a good parser ↓. It can also be selected in the Antelope ↓ suite for MS Windows.

OpenNLP Tools: this set of Java-based tools provides a tagger, chunker, and parser ↓. They test out very well. No demos.

Great PennTree Tagger Contest: Results of the second heat: Here are slightly edited taggings of Lincoln's Gettysburg Address by SVMTool, OpenNLP, SS, TreeTagger, Stanford Tagger, and FreeLing tagger. (See Scoring Protocol and Parallel Results.)
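
If you want to run a heat of your own, a simple way to compare taggers is per-token accuracy against a hand-checked reference tagging. The sketch below assumes exactly that measure and assumes both taggings are aligned token for token; it is not the contest's Scoring Protocol, and the function name is hypothetical.

```python
def score_tagging(reference, candidate):
    """Return (accuracy, mismatches) for two aligned lists of (word, tag) pairs."""
    if len(reference) != len(candidate):
        raise ValueError("taggings must have the same number of tokens")
    mismatches = []
    for i, ((ref_word, ref_tag), (cand_word, cand_tag)) in enumerate(zip(reference, candidate)):
        if ref_word != cand_word:
            # Different tokenisation would make the tag comparison meaningless.
            raise ValueError(f"token {i}: {ref_word!r} does not match {cand_word!r}")
        if ref_tag != cand_tag:
            mismatches.append((i, ref_word, ref_tag, cand_tag))
    accuracy = 1 - len(mismatches) / len(reference) if reference else 0.0
    return accuracy, mismatches

# Illustrative data only: a hand-made reference and an invented tagger output.
reference = [("Four", "CD"), ("score", "NN"), ("and", "CC"), ("seven", "CD"), ("years", "NNS")]
candidate = [("Four", "CD"), ("score", "VBP"), ("and", "CC"), ("seven", "CD"), ("years", "NNS")]
accuracy, mismatches = score_tagging(reference, candidate)
print(f"{accuracy:.1%}", mismatches)
```

The list of mismatches is usually more instructive than the percentage, since it shows which hard cases each tagger stumbles on.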

B. Syntactic Parsers and Tree Diagrammers

Here again there are major differences in the kinds of grammars:

  1. dependency grammar
  2. generative (Chomskyan) grammars
  3. form and function grammars

B. 1 dependency grammars

Connexor was founded by some of the Helsinki group and offers for sale parsing tools for several languages. For English, it uses an improved version of ENGCG (ENGCG2) POS tagging with a nifty little Java applet for a dependency tree display of grammatical relations. See the Overview of the dependency model. There is a Technical Monograph on line that explains how the dependency grammar uses ENGCG2 parts of speech. The key to the grammatical relations annotating the edges is in Dependency functions. At least one function has been added in Connexor's Machinese Syntax.

The Stanford Parser can compute and report a dependency equivalent of its constituent-structure-based parses. The Antelope interface to the Stanford parser does draw trees and indicate dependency relations very reliably (Windows only at present).
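
One common way to get dependencies out of a constituent tree is head percolation: pick a head child for each phrase, pass its head word upward, and make the other children's head words depend on it. The sketch below shows only that idea. The three head rules are toy assumptions, the edges are unlabelled, and the Stanford Parser's own conversion is far richer than this.

```python
# Toy head rules (hypothetical): for each phrase label, child labels to try in order.
HEAD_CHILD = {
    "S": ["VP"],
    "NP": ["NN", "NNS", "NP"],
    "VP": ["VBZ", "VBP", "VBD", "VP"],
}

def to_dependencies(tree):
    """Return (root_word, [(head_word, dependent_word), ...]) for a constituent tree.

    A tree is (label, children); a preterminal is (tag, [word]).
    Words are used as identifiers, so repeated words are not distinguished.
    """
    deps = []

    def head_of(node):
        label, children = node
        if len(children) == 1 and isinstance(children[0], str):
            return children[0]                 # a word is its own head
        heads = [head_of(child) for child in children]
        child_labels = [child[0] for child in children]
        head_index = 0                         # default: leftmost child
        for wanted in HEAD_CHILD.get(label, []):
            if wanted in child_labels:
                head_index = child_labels.index(wanted)
                break
        for i, word in enumerate(heads):
            if i != head_index:
                deps.append((heads[head_index], word))
        return heads[head_index]

    return head_of(tree), deps

tree = ("S", [("NP", [("DT", ["the"]), ("NN", ["dog"])]), ("VP", [("VBZ", ["barks"])])])
print(to_dependencies(tree))
```

Labelling the edges (subject, determiner, and so on) takes a further set of rules that look at the configuration each dependent sits in.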

The Cognitive Computing Group at UIUC has a dependency tree parser and grapher, but it does not label the edges (i.e. the relational links).

DGA, the Dependency Grammar Annotator, is a little Java-based tool for drawing dependency trees with labels. The online demo offers one set of labels and relations; to customize the list, you have to download the DGA and change the Configuration file.

RASP is the continuing work of Ted Briscoe and others at Cambridge (and Sussex and Sydney). It is a complete package Tagger-Parser and gives several choices for outputs including a list of grammatical dependency relations. It runs on Unix, esp. Linux and must be downloaded. Decently documented.

CandC Tools is also a downloadable package for Windows, Linux and Mac. It uses a categorial grammar. It too will produce analysis in terms of grammatical relations (in the RASP set of relations).

Proxem Antelope is a package of taggers, chunkers, parsers, and graphers that can draw trees that are both PennTree constituent style and marked for grammatical relations (using the Stanford parser). It integrates much lexical info (WordNet, VerbNet, etc.). It is written in C# for Windows and is free with a harmless registration. RAM-hungry, but a nice piece of work.

B. 2 generative grammars

The Penn Treebank is a large corpus of articles from the Wall Street Journal that have been tagged with Penn Treebank tags and then parsed into properly bracketed trees according to a simple set of phrase structure rules conforming to Chomsky's Government and Binding syntax. See Building a large annotated corpus..., especially the latter part. For a more extensive description, see Annotating Predicate Argument Structure. The full (318 page) manual for PennTreebank II markup is available as a LaTeX or PostScript file.


Aurélien Max presents a tree-drawing Java applet with a default mini English grammar but with the capacity to build your own.



A descendant of SSC is the shareware Trees 2/3 (Sean Crist and Tony Kroch), which also has a downloadable demo version. Even the demo is useful with the little grammars provided as parts of syntax exercises at Penn.


InterArbora: The Language Technology Group at Edinburgh also offers an online tree drawer (Thistle) which takes a labelled proper bracketing of a sentence and draws a classy tree, even giving you a PostScript version of the tree if you want it. [Not sure this is still working.]
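
If the online drawer is down, a labelled bracketing is simple enough to read with a few lines of code. The sketch below parses a bracketing into nested (label, children) pairs and prints it as an indented outline. It accepts either square brackets or parentheses but does not check that the two styles are not mixed, and the function names are hypothetical, not part of Thistle.

```python
import re

OPEN, CLOSE = {"[", "("}, {"]", ")"}

def parse_bracketing(text):
    """Parse '[S [NP [DT the] [NN dog]] ...]' into nested (label, children) tuples."""
    tokens = re.findall(r"[\[\]()]|[^\s\[\]()]+", text)
    pos = 0

    def node():
        nonlocal pos
        if tokens[pos] not in OPEN:
            raise ValueError(f"expected an opening bracket, got {tokens[pos]!r}")
        label = tokens[pos + 1]
        if label in OPEN | CLOSE:
            raise ValueError("a node needs a label right after its opening bracket")
        pos += 2
        children = []
        while tokens[pos] not in CLOSE:
            if tokens[pos] in OPEN:
                children.append(node())        # nested constituent
            else:
                children.append(tokens[pos])   # a word
                pos += 1
        pos += 1                               # step over the closing bracket
        return (label, children)

    try:
        tree = node()
    except IndexError:
        raise ValueError("unbalanced brackets") from None
    if pos != len(tokens):
        raise ValueError("extra material after the closing bracket")
    return tree

def draw(tree, indent=0):
    """Return the tree as a list of indented lines, one node or word per line."""
    label, children = tree
    lines = ["  " * indent + label]
    for child in children:
        if isinstance(child, tuple):
            lines.extend(draw(child, indent + 1))
        else:
            lines.append("  " * (indent + 1) + child)
    return lines

print("\n".join(draw(parse_bracketing("[S [NP [DT the] [NN dog]] [VP [VBZ barks]]]"))))
```

An indented outline is no substitute for a classy tree, but it is enough to check that a bracketing is proper before pasting it anywhere.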

The Linguist's Search Engine not only tags with PennTree tags, but draws a tree (one sentence at a time of 20 words or less). Then you can edit the tree to shear away leaves until you get a syntactic configuration you are interested in. And THEN, you can search a 3 million word archive drawn from the internet for instances of the use of that construction. Thanks to Aaron Elkiss and Philip Resnik, University of Maryland Institute for Advanced Computer Studies.

Along with their POS tagger, the Stanford Natural Language Processing Group (Dan Klein, Christopher Manning, et al.) have put up a Java-based parser which also produces PennTree structures and a dependency representation based on them. It runs on your local platform (if you have Java installed) and parses txt files of multiple sentences of up to forty words (twice the length of the LSE engine, though LSE too can do more if it is loading the text from its corpus) and draws a tree. You can copy the on-line bracketing and paste it into LSE to let LSE draw the tree on line. Or, if you change the parentheses to square brackets, you can paste it into phpSyntaxTree and you will get a nice colored SVG or PNG graph.
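
Changing the parentheses to square brackets by hand gets tedious, so here is a minimal sketch that does it in one step and also collapses the parser's multi-line output onto a single line. It assumes that parentheses occurring in the sentence itself have already been written as -LRB- and -RRB-, as is usual in Penn-style output, so every remaining parenthesis is structural. What phpSyntaxTree accepts beyond plain square brackets is not checked here.

```python
_PAREN_TO_SQUARE = str.maketrans("()", "[]")

def to_square_brackets(penn_bracketing):
    """Swap ( ) for [ ] and squeeze all whitespace runs down to single spaces."""
    return " ".join(penn_bracketing.translate(_PAREN_TO_SQUARE).split())

parse = """(ROOT
  (S
    (NP (DT The) (NN dog))
    (VP (VBZ barks))))"""
print(to_square_brackets(parse))
```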

The similar OpenNLP suite of programs also provides a parser. The same maneuvers for graphic display apply.

B. 3 Form and function grammars

VISL has extensive tools for tagging, parsing, and graphing, and not just for English. It produces graphs like those on the right which have double labelling of each node as POS or "g" (group, or phrase) and as core function (S, P, Od, etc.) or as H(ead) or Dep(endent). (A node with a torn edge is one half of a discontinuous one, as P is here.)

The venerable Survey of English Usage at University College London weighs in with its contribution to the International Corpus of English, namely the British component, ICE-GB. ICE-GB is a one-million-word corpus of contemporary English written and spoken in Britain, some of which can be downloaded for free and accessed with the free ICECUP tool. This corpus is not only marked up for part of speech; each part is also assigned a syntactic function following the Quirk et al. scheme of SVOA etc. So it displays its texts in trees (oriented side-, top-, or bottom-up as you please) with dual labelling of each node (see sample). This links up very well with the Oxford English Grammar, which is based on the ICE-GB corpus for British English and a Wall Street Journal corpus for American English. In fact, if you have the ICE-GB corpus installed, you can check the diagram for any sentence in the Oxford English Grammar.

In addition, the Survey of English Usage offers the Internet Grammar of English, an online tutorial in English syntax of the double-layered kind used in ICE. It has self-correcting check-off quizzes and animations of syntactic movements.


Old Time Religion: Gene Moutoux of Eastern High School in Louisville, KY has put up extensive tutorial examples of sentences diagrammed according to Reed-Kellogg principles (1877 et seq.). For more than a century, this was sentence diagramming in America.

C. Pronouns and antecedents (anaphora resolution)

MARS is an on-line implementation of a Mitkov knowledge-poor method for resolving pronoun coreference. Paste in a text and it will return an analysis of the possible antecedents of each pronoun in the text and its own reasons for picking the best one in each case. Ironically, it gets a sentence in the immediately preceding paragraph wrong ("A table is printed under each pronoun, listing all candidates considered as its potential antecedents."). Documentation is good. MARS has been developed and tested on technical prose (computer documentation).
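
The general shape of a knowledge-poor resolver can be sketched in a few lines: filter the candidate noun phrases by agreement with the pronoun, give each survivor points from a handful of shallow indicators, and pick the highest total. The three indicators and their weights below are invented for illustration and are not MARS's actual set; the class and function names are hypothetical too.

```python
from dataclasses import dataclass

@dataclass
class Candidate:
    text: str
    number: str          # "sg" or "pl"
    sentences_back: int  # 0 = same sentence as the pronoun
    definite: bool       # "the drive" vs "a drive"
    in_pp: bool          # inside a prepositional phrase

def score(c):
    """Sum the (made-up) indicator scores for one candidate."""
    s = {0: 2, 1: 1}.get(c.sentences_back, 0)   # nearer candidates are preferred
    s += 0 if c.definite else -1                # indefinite NPs are penalised
    s += -1 if c.in_pp else 0                   # NPs inside PPs are penalised
    return s

def resolve(pronoun_number, candidates):
    """Return (best antecedent or None, table of (score, text) for agreeing candidates)."""
    agreeing = [c for c in candidates if c.number == pronoun_number]
    table = sorted(((score(c), c.text) for c in agreeing), reverse=True)
    best = table[0][1] if table else None
    return best, table

# "Insert the cassette into the drive with the cables attached. It ..."
candidates = [
    Candidate("the cassette", "sg", 1, True, False),
    Candidate("the drive", "sg", 1, True, True),
    Candidate("the cables", "pl", 1, True, True),
]
print(resolve("sg", candidates))
```

The table returned alongside the winner plays the same role as the table MARS prints under each pronoun: it shows every candidate considered and why the best one won.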


This site is a companion to Phonetic Resources and Lexical, Semantic, Textual Resources.

George L. Dillon
University of Washington
4 November 1999
Revised 10 March 2000
Again, 16 April 2000
Again, 12 February 2001
Again, 18 November 2001, January 2003, 2004, March 2005, May 2007