The Middle Russian corpus of the Russian National Corpus (RNC) contains texts written in northeastern Rus’ and, later, in the Muscovite state between the 15th and 17th centuries; some earlier and later texts are also included. The texts include chronicles and tales, official documents, personal correspondence, works of religious literature, dramatic and poetic texts, and other genres. The corpus draws on the relevant volumes of publications such as The Library of Old Russian Literature, The Complete Collection of Russian Chronicles, The Russian Historical Library, The Archive of Feudal Landownership, Acts of Land Surveying, Acts of the Muscovite State, Acts of Socio-Economic History, and The Russian Diplomatic Corpus, as well as on individual editions of collections of letters and acts.
The corpus reflects the orthography of the editions (including those prepared for literary, historical, and legal studies), which in many cases simplified the original spelling.
This period is characterized by a transitional state of the language, combining various grammatical and lexical layers. Many texts incorporate features of the earlier Old East Slavic period (11th–14th centuries) as well as lexical and grammatical elements of the Church Slavonic language. The period is marked by dialectal diversity in texts and the instability of orthographic norms.
Every word in the corpus carries grammatical annotation and a lemma, both aligned with the standards of the Dictionary of the Russian Language of the 11th–17th Centuries. The annotation was produced automatically by neural network models based on a manually annotated reference standard and was then corrected by hand.
The metadata for each text records the publication it was taken from, the language type, the genre, the time the text was composed, and the time the specific manuscript copy was produced.
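As a rough illustration of how these metadata fields might be used when selecting texts, the sketch below models one record per text and filters by time of composition. It is not the RNC's actual data format or interface: the class name `TextMetadata`, the field names, the representation of dates as year ranges, and the sample values are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

YearSpan = Tuple[int, int]  # (earliest year, latest year); an assumed representation


@dataclass(frozen=True)
class TextMetadata:
    """Hypothetical record mirroring the metadata fields named in the description."""
    publication: str                      # edition the text was taken from
    language_type: str                    # language type assigned to the text
    genre: str                            # text genre
    created: YearSpan                     # time the text was composed
    copy_made: Optional[YearSpan] = None  # time the manuscript copy was produced, if known


def overlaps(span: YearSpan, start: int, end: int) -> bool:
    """True if the year span intersects the inclusive interval [start, end]."""
    return span[0] <= end and span[1] >= start


def composed_between(records: List[TextMetadata], start: int, end: int) -> List[TextMetadata]:
    """Keep the records whose time of composition overlaps [start, end]."""
    return [r for r in records if overlaps(r.created, start, end)]


# Placeholder records, invented purely for illustration.
records = [
    TextMetadata("hypothetical edition A", "placeholder type", "chronicle",
                 created=(1450, 1500), copy_made=(1600, 1650)),
    TextMetadata("hypothetical edition B", "placeholder type", "letter",
                 created=(1680, 1690)),
]

selected = composed_between(records, 1400, 1599)
```

Keeping composition and copying as separate fields matters here because a text composed in one century may survive only in a copy made considerably later, so a query on one date can give a different result from a query on the other.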