To ensure compliance with the licenses for the corpora we have installed,
we have instituted the following policies.
Title | LDC Catalogue number | Restricted access | Language(s) | License link |
Callhome Egyptian Arabic Transcripts Supplement | LDC2002T38 | | Arabic | general |
Arabic Gigaword | LDC2003T12 | | Arabic | general |
Arabic Treebank: Part 2 v 2.0 | LDC2004T02 | | Arabic | general |
Arabic Treebank: Part 3 v 1.0 | LDC2004T11 | | Arabic | general |
Arabic News Translation Text Part 1 | LDC2004T17 | | Arabic | general |
Arabic Treebank: Part 1 v 3.0 (POS with full vocalization + syntactic analysis) | LDC2005T02 | | Arabic | general |
CALLHOME Egyptian Arabic Transcripts | LDC97T19 | | Arabic | general |
TIDES Extraction (ACE) 2003 Multilingual Training Data | LDC2004T09 | | Arabic, Chinese, English | general |
ACE 2004 Multilingual Training Corpus | LDC2005T09 | | Arabic, Chinese, English | general |
Arabic Treebank Part 1 --- 10K-word English translation | LDC2003T07 | | Arabic, English | general |
Multiple-Translation Arabic (MTA) Part 1 | LDC2003T18 | | Arabic, English | general |
Arabic English Parallel News Part 1 | LDC2004T18 | | Arabic, English | general |
Multiple-Translation Arabic (MTA) Part 2 | LDC2005T05 | | Arabic, English | general |
TREC Mandarin | LDC2000T52 | | Chinese | specific |
Chinese Gigaword | LDC2003T09 | | Chinese | general |
Chinese Treebank 5.0 | LDC2005T01 | | Chinese | general |
Mandarin Chinese News Text | LDC95T13 | | Chinese | specific |
CALLHOME Mandarin Chinese Transcripts | LDC96T16 | | Chinese | general |
TDT2 Multilanguage Text Version 4.0 | LDC2001T57 | | Chinese, English | general |
TDT3 Multilanguage Text Version 2.0 | LDC2001T58 | | Chinese, English | general |
Chinese-English Translation Lexicon (v3.0) | LDC2002L27 | | Chinese, English | general |
Multiple-Translation Chinese Corpus | LDC2002T01 | | Chinese, English | general |
SummBank 1.0 | LDC2003T16 | | Chinese, English | general |
Multiple-Translation Chinese (MTC) Part 2 | LDC2003T17 | | Chinese, English | general |
Multiple-Translation Chinese (MTC) Part 3 | LDC2004T07 | | Chinese, English | general |
Hong Kong Parallel Text | LDC2004T08 | | Chinese, English | specific |
Chinese News Translation Text Part 1 | LDC2005T06 | | Chinese, English | general |
Chinese-English News Magazine Parallel Text | LDC2005T10 | | Chinese, English | general |
Czech Broadcast News Transcripts | LDC2004T01 | | Czech | general |
Prague Dependency Treebank 1.0 | LDC2001T10 | | Czech, English | general |
Prague Czech-English Dependency Treebank Version 1.0 | LDC2004T25 | | Czech, English | general |
Grassfields Bantu Fieldwork: Dschang Lexicon | LDC2003L01 | | Dschang | general |
Grassfields Bantu Fieldwork: Dschang Tone Paradigms | LDC2003S02 | | Dschang | general |
CELEX 2 | LDC96L14 | | Dutch, German, English | specific |
Santa Barbara Corpus of Spoken American English Part-I | LDC2000S85 | | English | general |
BLLIP 1987-89 WSJ Corpus Release 1 | LDC2000T43 | | English | specific |
MUC 7 | LDC2001T02 | | English | general |
Temporal Evaluation Examples | LDC2002E05 | | English | general |
RST Discourse Treebank | LDC2002T07 | | English | general |
The AQUAINT Corpus of English News Text | LDC2002T31 | | English | general |
Santa Barbara Corpus of Spoken American English Part-II | LDC2003S06 | | English | general |
ACE-2 Version 1.0 | LDC2003T11 | | English | general |
MUC 6 | LDC2003T13 | | English | general |
SLX Corpus of Classic Sociolinguistic Interviews | LDC2003T15 | | English | general |
ANC First Release | LDC2003T20 | Restricted access | English | specific |
Santa Barbara Corpus of Spoken American English Part-III | LDC2004S10 | | English | general |
Proposition Bank I | LDC2004T14 | | English | general |
ACE Time Normalization (TERN) 2004 English Training Data v1.0 | LDC2005T07 | | English | general |
English Gigaword Second Edition | LDC2005T12 | | English | general |
CCGbank | LDC2005T13 | | English | general |
HCRC Map Task Corpus | LDC93S12 | | English | general |
ACL/DCI | LDC93T1 | | English | specific |
North American News Text Corpus | LDC95T21 | Restricted access | English | specific |
English Treebank 2 | LDC95T7 | | English | general |
COMLEX Syntax Text Corpus Version 2.0 | LDC96T11 | Restricted access | English | specific |
DSO Corpus of Sense-Tagged English | LDC97T12 | | English | general |
CALLHOME American English Transcripts | LDC97T14 | | English | general |
COMLEX English Syntax Lexicon | LDC98L21 | Restricted access | English | specific |
North American News Text Supplement | LDC98T30 | Restricted access | English | specific |
Treebank-3 | LDC99T42 | | English | general |
Hansard French/English | LDC95T20 | | French, English | general |
European Language Newspaper Text | LDC95T11 | Restricted access | French, German, Portuguese | specific |
UN Parallel Text (Complete) | LDC94T4A | | French, Spanish, English | specific |
CALLHOME German Transcripts | LDC97T15 | | German | general |
Japanese Business News Text | LDC95T8 | Restricted access | Japanese | specific |
CALLHOME Japanese Transcripts | LDC96T18 | | Japanese | general |
Japanese Business News Text Supplement | LDC99T34 | Restricted access | Japanese | specific |
Korean Newswire | LDC2000T45 | | Korean | general |
Korean Telephone Conversations Transcripts | LDC2003T08 | | Korean | general |
Klex: Finite-State Lexical Transducer for Korean | LDC2004L01 | | Korean | general |
Morphologically Annotated Korean Text | LDC2004T03 | | Korean | general |
Korean English Treebank Annotations | LDC2002T26 | | Korean, English | specific |
ECI Multilingual Text | LDC94T5 | | Multi | specific |
Grassfields Bantu Fieldwork: Ngomba Tone Paradigms | LDC2001S16 | | Ngomba | general |
Cetempublico | LDC2001T62 | | Portuguese | specific |
Portuguese Newswire Text | LDC99T40 | | Portuguese | general |
CALLHOME Spanish Dialogue Act Annotation | LDC2001T61 | | Spanish | general |
Spanish Newswire Text, Volume 2 | LDC99T41 | | Spanish | general |