www.mari-language.com - Corpus infrastructure

Corpus infrastructure

The Mari corpus project was initiated by scholars from Ghent (Alexandra Simonenko), Helsinki (Jack Rueter), Moscow (Anna Volkova), Munich/Vienna (Jeremy Bradley), Tromsø (Trond Trosterud), Turku (Jorma Luutonen), and Yoshkar-Ola (Andrey Chemyshev).

Mari corpus team

It represents an effort to create a morphologically annotated corpus of literary Mari (both Meadow Mari and Hill Mari), searchable in many ways: by lexeme, by morphological pattern, and by syntactic pattern. It will comprise several tens of millions of words of Meadow Mari and several million words of Hill Mari (exact figures to follow), covering texts from the early 20th century to the present day.

Working demos of our efforts can be found here (Tromsø) and here (Vienna). The first proper release, including tutorials on using our corpus infrastructure, is planned for later in 2019.

Work meetings:

Participating and supporting institutions:

Last update: 7 October 2020