George K. Mikros

Department of Italian Language and Literature
National and Kapodistrian University of Athens

Corpora developed for various research activities

· Corpus of Spoken Greek [in preparation]

A corpus of over 40 hours of recordings of urban Modern Greek. The transcriptions are in HIAT format and are time-aligned with the sound files. 

· Newswire Corpus of Modern Greek

20 million words of written text from two high-circulation Greek newspapers (Kathimerini, Ta NEA).

· Author identification corpora

Various corpora designed for authorship attribution, author verification, and author gender profiling.

· Corpus for teaching Modern Greek as a foreign language

This corpus contains texts organized into three thematic areas (Market, Health and Environment). The topics correspond directly to the teaching units of the Greek Language Centre curriculum “Intermediate level for Modern Greek”.

· Learner Corpus of Modern Greek

The first learner corpus of Modern Greek. The corpus contains 333 essays written by learners from 51 different countries who study Modern Greek at the School of the University of Athens. Each essay has been transcribed into electronic form, and each error has been tagged using a custom error taxonomy developed specifically for the project. Error tagging was carried out with special software (Episimiotis), which uses XML to encode the errors and the metalanguage data for each text.