Project Topics
Morphology in factored phrase-based SMT - Krista Liin
- morphology tool: Filosoft or EKI
- SMT tool: Moses
- corpus: Combined parallel corpus
PoS tagging with the TnT tagger, compared with TreeTagger - Kadri Kajaste
Unsupervised morphology for Estonian - Ilja Livenson
- training corpus: mixed corpus of Estonian
- testing corpus: morphologically disambiguated corpus
- tool: Morfessor
- compare to: Filosoft or EKI morphological analysis software
Unsupervised dependency/constituency syntax
- corpus:
- testing corpus: "ratsep", 370 parsed sentences
- tool: U-DOP/CCM/DMV
Some data-driven method applied to some corpus
- Raw Corpora
- Morphologically Annotated Corpora
- Syntactically Annotated Corpora
- Semantically Annotated Corpora
- Parallel Corpora
  - UT Parallel Corpus
  - JRC-Acquis: multilingual parallel corpus of legislative texts
  - OPUS: mixed multilingual parallel corpus
  - Europarl: multilingual parallel corpus