George K. Mikros

Professor
Department of Italian Language and Literature
National and Kapodistrian University of Athens

Software developed for corpus linguistics and stylometric research

The software below has been developed for research purposes only. If you need any of these tools, please contact me.

·         GrTokenizer

A tokenizer for Modern Greek, based on regular expressions and written in Perl. It converts text to vertical format, one token per line. Directions can be found in the Readme file inside Stylometrics.zip. [zip]
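As a rough illustration of regex-based tokenization into vertical format, here is a minimal Python sketch. It is not the GrTokenizer code: the token pattern and the names `TOKEN_RE`, `tokenize` and `to_vertical` are assumptions made for this example, and the actual tool may treat abbreviations, numbers and clitics differently.

```python
import re

# Assumed token pattern (not taken from GrTokenizer): a run of word
# characters, which is Unicode-aware in Python 3 and so covers Greek
# letters, or any single character that is neither a word character
# nor whitespace (i.e. punctuation).
TOKEN_RE = re.compile(r"\w+|[^\w\s]")


def tokenize(text: str) -> list[str]:
    """Return the tokens of `text` in order of appearance."""
    return TOKEN_RE.findall(text)


def to_vertical(text: str) -> str:
    """Return `text` in vertical format: one token per line."""
    return "\n".join(tokenize(text))
```

Calling `to_vertical` on a sentence yields a string with each word and each punctuation mark on its own line.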

·        FileSplitter

A utility for segmenting a text file into n files with an equal number of words, where n is a user-selected value. [zip]
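The segmentation idea can be sketched in Python as follows. This is an illustration only, not the FileSplitter code: the function name `split_words` is hypothetical, words are assumed to be whitespace-delimited, and the way leftover words are distributed (one extra word to each of the first segments) is an assumption, since the page does not say how the tool handles a word count that is not divisible by n.

```python
def split_words(text: str, n: int) -> list[list[str]]:
    """Split the whitespace-delimited words of `text` into n segments.

    Segment sizes differ by at most one word. Assumption: any remainder
    is spread over the first segments.
    """
    if n < 1:
        raise ValueError("n must be at least 1")
    words = text.split()
    base, extra = divmod(len(words), n)
    parts = []
    start = 0
    for i in range(n):
        # The first `extra` segments receive one additional word.
        size = base + (1 if i < extra else 0)
        parts.append(words[start:start + size])
        start += size
    return parts
```

Each returned segment could then be joined with spaces and written to its own output file.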

·         stTTR

A program that calculates the standardized type/token ratio using equal-sized text samples, thus avoiding the dependence of this index on text size. [zip]
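A common way to standardize the type/token ratio is to compute it on consecutive samples of a fixed size and average the results. The Python sketch below follows that approach under stated assumptions: the function name `standardized_ttr`, the default sample size of 1000 tokens and the decision to discard a trailing incomplete sample are choices made for this example and may not match what stTTR does.

```python
def standardized_ttr(tokens: list[str], chunk_size: int = 1000) -> float:
    """Mean type/token ratio over consecutive chunks of `chunk_size` tokens.

    Assumption: a trailing chunk shorter than `chunk_size` is discarded,
    so every ratio is computed on a sample of the same length.
    """
    if chunk_size < 1:
        raise ValueError("chunk_size must be at least 1")
    ratios = []
    for start in range(0, len(tokens) - chunk_size + 1, chunk_size):
        chunk = tokens[start:start + chunk_size]
        # Types are distinct tokens; tokens are all items in the chunk.
        ratios.append(len(set(chunk)) / chunk_size)
    if not ratios:
        raise ValueError("text is shorter than one chunk")
    return sum(ratios) / len(ratios)
```

Because every sample has the same length, the averaged ratio can be compared across texts of different sizes, which a plain type/token ratio does not allow.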

·         Stylometrics

A program that calculates over 100 stylometric indices. This version works for Modern Greek corpora and produces tab-delimited text results. The zip file contains both Tokenizer and Stylometrics, together with a Readme file explaining how to use the programs. [zip]

·         Roman Stylometrics

A program that calculates over 100 stylometric indices. This version works for Latin-script corpora (e.g. English, Italian) and produces tab-delimited text results. The zip file contains both Tokenizer and Roman Stylometrics, together with a Readme file explaining how to use the programs. [zip]

·         TermCount

A Perl script that counts the relative frequency of the words in a user-selected wordlist in a corpus. [zip]
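The counting step can be illustrated with a short Python sketch. It is not a port of the TermCount script: the function name `relative_frequencies` is hypothetical, matching is assumed to be exact and case-sensitive, and relative frequency is assumed to mean occurrences divided by the total number of tokens (the original may normalize differently, for example per 1,000 words).

```python
from collections import Counter


def relative_frequencies(tokens: list[str], wordlist: list[str]) -> dict[str, float]:
    """Relative frequency of each word in `wordlist` within `tokens`.

    Assumption: frequency is the count of exact matches divided by the
    total number of tokens in the corpus.
    """
    total = len(tokens)
    if total == 0:
        raise ValueError("corpus contains no tokens")
    counts = Counter(tokens)
    # Counter returns 0 for words that never occur in the corpus.
    return {word: counts[word] / total for word in wordlist}
```

Words from the list that are absent from the corpus are reported with a frequency of 0.0 rather than omitted, which keeps the output aligned with the input wordlist.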

·         VocabGrowth

A program that calculates the relative growth of type frequency in a text. [zip]
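One common reading of vocabulary growth is the number of distinct types observed as more tokens of a text are read. The Python sketch below computes that curve; it is an interpretation rather than the VocabGrowth algorithm, and the function name `vocabulary_growth` and the `step` parameter are assumptions introduced for this example.

```python
def vocabulary_growth(tokens: list[str], step: int = 1) -> list[tuple[int, int]]:
    """Return (tokens_read, types_seen) pairs, sampled every `step` tokens.

    Assumption: growth is measured as the cumulative number of distinct
    types; the original tool may report a relative or normalized value.
    """
    if step < 1:
        raise ValueError("step must be at least 1")
    seen = set()
    curve = []
    for position, token in enumerate(tokens, start=1):
        seen.add(token)
        if position % step == 0:
            curve.append((position, len(seen)))
    return curve
```

Dividing each type count by the corresponding token count would turn the curve into a relative measure, if that is closer to what is needed.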

·         Episimiotis

Software for error coding in learner texts, combining custom error taxonomies and metalanguage data in XML output files. The software is documented in the following paper:

Koutsis, I., Markopoulos, G., & Mikros, G. K. (2007). Episimiotis: A multilingual tool for hierarchical annotation of texts. In D. Matthew, P. Rayson, S. Hunston & P. Danielsson (Eds.), Proceedings of the Corpus Linguistics Conference (CL2007), 27-30 July 2007, Birmingham, UK. Retrieved from http://corpus.bham.ac.uk/corplingproceedings07/paper/243_Paper.pdf.

·         CorpusManager

A software suite for managing megacorpora and producing subcorpora using customized metalanguage criteria. The software is documented in the following paper:

Kouklakis, G., Mikros, G. K., Markopoulos, G., & Koutsis, I. (2007). Corpus Manager: A tool for multilingual corpus analysis. In D. Matthew, P. Rayson, S. Hunston & P. Danielsson (Eds.), Proceedings of the Corpus Linguistics Conference (CL2007), 27-30 July 2007, Birmingham, UK. Retrieved from http://ucrel.lancs.ac.uk/publications/CL2007/paper/244_Paper.pdf.