www.mari-language.com - Corpus infrastructure

Corpus infrastructure

This material is currently available only in English. Mari and Russian versions are being prepared and will be published soon.

The Mari corpus project was initiated by scholars from Ghent (Alexandra Simonenko), Helsinki (Jack Rueter), Moscow (Anna Volkova), Munich/Vienna (Jeremy Bradley), Tromsø (Trond Trosterud), Turku (Jorma Luutonen), and Yoshkar-Ola (Andrey Chemyshev).

Mari corpus team

It represents an effort to create a morphologically annotated corpus of literary Mari (both Meadow Mari and Hill Mari) that can be searched in many ways (by lexeme, by morphological pattern, by syntactic pattern). It will contain tens of millions of words of Meadow Mari and several million words of Hill Mari (exact figures to follow), with texts ranging from the early 20th century to the present day.

Working demos of our efforts can be found here (Tromsø) and here (Vienna); the first proper release (including tutorials on using our corpus infrastructure) is planned for later in 2019.

Work meetings:

Participating and supporting institutions:

Last updated: 4 August 2019