The HNZ Corpus


The HNZ corpus is an Archaic Chinese corpus consisting of all the articles in the book Huainanzi, annotated with word segmentation and part-of-speech (POS) tags. Huainanzi, also known as Huainan Honglie, is a collective work written by Liu An (179 BC-122 BC), the Prince of Huainan, and a group of his retainers in the Western Han Dynasty (206 BC-9 AD). It was first circulated in the Western Han Dynasty, near the end of the Archaic Chinese era.

The book has 21 chapters covering a wide range of topics, including philosophy, astrology, geography, politics, customs, military affairs, mountains, and sociology. It has been described as the "Encyclopedia of the early Han Dynasty". Its rich language reveals characteristics of lexical usage in the Western Han Dynasty and shows how that usage changed from the Qin Dynasty to the Han Dynasty. In this regard, Huainanzi contains valuable data for an in-depth analysis of Archaic Chinese, which is why we selected it as the raw data for our Archaic Chinese corpus. All manual annotation and correction was done by a Chinese linguist who is an expert on Archaic Chinese.


If you use the corpus in your study, please cite both papers below:


The corpus is stored in a single gzipped text file of about 280 KB.
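
Because the corpus ships as a gzipped text file, it can be read without unpacking it first. The following is a minimal sketch in Python, not an official loader: the function name read_corpus_lines is hypothetical, the UTF-8 encoding is an assumption (this page does not state the file's encoding), and no assumption is made about how segmentation and POS tags are laid out within each line.

```python
import gzip


def read_corpus_lines(path, encoding="utf-8"):
    """Yield the non-empty lines of a gzipped text file.

    The default encoding is an assumption; pass a different one
    if the file turns out not to be UTF-8.
    """
    # mode="rt" makes gzip decompress and decode to text on the fly.
    with gzip.open(path, mode="rt", encoding=encoding) as handle:
        for line in handle:
            line = line.rstrip("\r\n")
            if line:
                yield line
```

Calling list(read_corpus_lines(path)) with the path of the downloaded file returns its lines as strings. Splitting each line into words and tags is left out here, since it depends on the annotation format of the file itself.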


Last modified on 5/29/2014