The HNZ Corpus (version 1)
Corpus statistics:
- number of character tokens: 158,230
- number of word tokens: 137,448
- number of sentences: 7,594
- character vocabulary size: 3,957
- word vocabulary size: 10,847
- average word length (in characters): 1.15
- average sentence length (in characters): 20.84
- average sentence length (in words): 18.10
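
The three averages can be recomputed from the token and sentence counts listed above. The following is a minimal Python sketch, assuming the averages are plain ratios of those counts rounded to two decimal places; the variable and function names are illustrative and not part of the corpus distribution.

```python
# Counts copied from the statistics list above.
char_tokens = 158_230
word_tokens = 137_448
sentences = 7_594


def average(total, count):
    """Return total / count rounded to two decimal places (assumed rounding)."""
    return round(total / count, 2)


# Characters per word, characters per sentence, words per sentence.
avg_word_len = average(char_tokens, word_tokens)
avg_sent_len_chars = average(char_tokens, sentences)
avg_sent_len_words = average(word_tokens, sentences)

print(f"{avg_word_len:.2f} {avg_sent_len_chars:.2f} {avg_sent_len_words:.2f}")
```

Under these assumptions the ratios agree with the reported figures. The average word length close to one character means that most word tokens in the corpus are single characters.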
Last modified on 03/19/2014