· Corpus of Spoken Greek [in preparation]
A corpus of over 40 hours of recordings of urban Modern
Greek. The transcriptions are in HIAT format and are time-aligned
with the sound files.
· Newswire Corpus of Modern Greek
20 million words of written text from two high-circulation
newspapers in Greece (Kathimerini, Ta NEA).
· Author identification corpora
Various corpora designed for author attribution,
verification and gender profiling.
· Corpus for teaching Modern Greek as a foreign language
This corpus contains texts organized into three
thematic areas (Market, Health and Environment). The specific topics
are directly related to the teaching units of the curriculum
“Intermediate level for Modern Greek” of the Greek Language Centre.
· Learner Corpus of Modern Greek
The first learner corpus of Modern Greek. The corpus
contains 333 essays written by learners from 51 different countries
who study Modern Greek at the School of the University of
Athens. Each essay has been transcribed into electronic form, and each
error has been tagged using a custom error taxonomy developed
specifically for the needs of the project. Error tagging was carried out
using special software (Episimiotis), which used XML to encode
errors and metalanguage data for each text.
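
As a rough illustration of how XML-encoded error tags of the kind described for the Learner Corpus could be processed, the following minimal Python sketch counts errors by category and lists each erroneous form with its correction. The element and attribute names (essay, s, error, type, correction), the category labels and the sample sentences are all assumptions made up for this example; they are not the actual Episimiotis schema or the project's error taxonomy.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Hypothetical markup: element names, attribute names and error categories
# are illustrative only, NOT the real Episimiotis format.
SAMPLE = """<essay id="e001">
  <s>Εγώ <error type="agreement" correction="πηγαίνω">πηγαίνει</error> στο σχολείο.</s>
  <s>Αυτό είναι <error type="spelling" correction="ωραίο">ορέο</error>.</s>
</essay>"""


def count_errors_by_type(xml_text):
    """Return a Counter mapping each error category to its frequency."""
    root = ET.fromstring(xml_text)
    return Counter(err.get("type") for err in root.iter("error"))


def list_corrections(xml_text):
    """Return (erroneous form, suggested correction) pairs in document order."""
    root = ET.fromstring(xml_text)
    return [(err.text, err.get("correction")) for err in root.iter("error")]


if __name__ == "__main__":
    for category, n in count_errors_by_type(SAMPLE).items():
        print(category, n)
    for wrong, right in list_corrections(SAMPLE):
        print(wrong, "->", right)
```

Keeping the error category and the correction as attributes on an inline element leaves the learner's original wording intact in the text, so the same file can serve both for reading the essay and for computing error statistics.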