The Russian educational corpus was opened in 2007. It is primarily intended for school lessons in Russian language and literature. It can also be used in university teaching (in non-linguistic programs, e.g. media and communications), for teaching Russian as a foreign language, and for training language teachers.
The main part of the corpus is made up of works included in the literature curriculum for secondary and high schools, including those recommended for extracurricular reading. The non-fiction texts in the corpus belong to the functional styles studied in the Russian language course (journalistic, official, educational, academic and colloquial styles). The genres of these texts are diverse, reflecting the requirements of the school curriculum: news pieces, articles, interviews, reports, laws, protocols, statements, business letters, academic and popular articles, reviews, annotations, abstracts, everyday correspondence, everyday speech, etc.
All the texts are morphologically annotated: grammatical features are ascribed to every word, thus providing, in standard school terms, a “grammatical analysis” of each word. The morphological markup of the Educational corpus was produced automatically by a special program, which also disambiguated grammatical homonyms. For each token, the entire set of possible analyses is provided, and the single analysis that the program judged most likely is highlighted as the principal one. Thanks to this automatic disambiguation, grammatical homonyms can be told apart: identical word forms with different sets of grammatical features receive different analyses. A small part of the texts of the Educational corpus has been disambiguated manually.
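The annotation model described above (every possible analysis kept per token, one of them marked as principal) can be illustrated with a minimal Python sketch. The class names, the feature labels and the `has_features` helper are hypothetical and chosen only for illustration; they are not the corpus's actual tag set or data format.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Optional


@dataclass(frozen=True)
class Analysis:
    """One possible grammatical analysis of a word form (hypothetical model)."""
    lemma: str
    features: FrozenSet[str]
    principal: bool = False  # True for the analysis chosen by disambiguation


@dataclass
class Token:
    """A word form together with all of its possible analyses."""
    form: str
    analyses: List[Analysis]

    def principal(self) -> Optional[Analysis]:
        """Return the analysis marked as principal, if any."""
        for analysis in self.analyses:
            if analysis.principal:
                return analysis
        return None


def has_features(token: Token, required: FrozenSet[str]) -> bool:
    """Check whether the principal analysis carries all required features."""
    chosen = token.principal()
    return chosen is not None and required <= chosen.features


# "стали" is a grammatical homonym: a past-tense plural form of the verb
# "стать" or a genitive singular form of the noun "сталь" (among others).
# Here the verbal reading is assumed to have been selected as principal.
token = Token(
    form="стали",
    analyses=[
        Analysis("стать", frozenset({"verb", "past", "plural"}), principal=True),
        Analysis("сталь", frozenset({"noun", "genitive", "singular"})),
    ],
)

print(token.principal().lemma)                   # lemma of the chosen reading
print(has_features(token, frozenset({"verb"})))  # matches a search for verbs
print(has_features(token, frozenset({"noun"})))  # the noun reading is not principal
```

In this sketch a search for verbs matches the token while a search for nouns does not, which mirrors how disambiguation lets identical word forms be distinguished by their grammatical features.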
The morphological annotation in the Educational corpus is adapted to the standard Russian language manuals and provides the traditional, simplified grammatical analysis. In addition, to serve the purposes of school teaching, further morphological features were introduced into the annotation scheme: inflectional types of nouns and verbs (declension, conjugation) and lexical and grammatical categories of nouns, adjectives, pronouns and adverbs.
The Educational corpus features morphemic analysis of lexemes, accessible within the Word at a Glance tool. This analysis is informed by Alexander Tikhonov's Morphemic-orthographic dictionary. Function words and proper names are not analyzed.
The current version of the Educational corpus is defined by its latest functionality. First of all, this includes new types of search results (Graph by year, Statistics, Frequency, N-grams), a new type of search query (Collocation Search), and a new analytical tool, Word at a Glance, which includes the Word Sketches and Similar Words widgets. A Frequency Dictionary and Corpus and Subcorpus portraits are also available, allowing users to analyze and compare the features of the whole corpus and of a custom subcorpus.
The Educational corpus greatly facilitates the preparation of test assignments for various sections of the Russian language course, makes the teaching of Russian more diverse and modern, and provides material for small-scale studies carried out by school students, for essay writing, and so on. The wide representation of texts from the school curriculum and the new research tools open up prospects for using the Educational corpus to study a particular author's language and style and to conduct comparative studies of expressive and figurative devices in fiction, poetry, etc.