Composition of the corpus

The Old East Slavic corpus includes original and translated texts in Old East Slavic, the common ancestor of Ukrainian, Belarusian, and Russian. The period covered is conventionally taken to end before 1400, although the Novgorod and Pskov acts included in the corpus may date from the 15th century (the Middle Russian period).

The corpus comprises the following original Old East Slavic texts:

Old East Slavic chronicles: the Tale of Bygone Years (Hypatian and Laurentian versions), the Novgorod First Chronicle, the Kiev/Kyiv, Halician/Halyč, and Volhynian chronicles, and the Laurentian chronicle as continued until 1305;

Hagiography, including the Boris and Gleb cycles in different versions, the Kyivan Cave Patericon, and other vitae;

Russkaya Pravda and other Rus' legal acts enacted by princes, such as Ustav Vladimira;

Canonical rules, such as Kirik's Questions, and other texts;

Didactic tracts, including anti-pagan and anti-Catholic polemics, often circulated as pseudepigrapha;

Old East Slavic pilgrimage accounts (such as the one written by Hegumen Daniel, who visited Palestine);

Other original literature, including such celebrated texts as Hilarion's Sermon on Law and Grace, Vladimir Monomakh's corpus, texts by Kirill of Turaŭ/Turov, The Tale of Igor's Campaign, and the Praying of Daniel the Immured in two variants;

Charters and acts, both official and private, issued in Novgorod, Pskov, Polatsk/Polotsk, other Belarusian and Lithuanian regions, Ukraine, the Moldavian Principality, and Moscow before 1400 (for Novgorod and Pskov, up to 1500).

The following translated texts were included:

Collections of quotations and wisdom literature: the Izbornik of 1076, translated in Bulgaria and copied in Rus', and Melissa, or the Bee;

Byzantine hagiography: the Life of Andrew the Fool and the Life of Basil the New; also the Wonders of Nicetas, partly of Southern Slavic origin;

Josephus' Jewish War; Alexandria, a novel about Alexander the Great; the Story of Ahikar the Wise, translated either from Greek or Syriac; the Studite Monastic Regulations; and the Dialogue with a Jew.

Annotation

The texts are provided with word-by-word morphological annotation with homonymy resolved. The annotation was done manually using the Morphy workspace (developed by T. A. Arkhangelsky) and is based on an automatically updated dictionary. Lemmas are given in Old East Slavic form, reflecting the state before the loss of yers (e.g., сълати, дьнь).

The word-by-word annotation also includes the location of each word form in a specific manuscript (page/column and line). For texts translated from Greek, the corresponding Greek lemmas and word forms (or their combinations) are provided alongside the Slavic forms.
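The annotation layers described above can be pictured as one record per word form. The following Python sketch is only an illustration under stated assumptions: the `Token` class, its field names, the `search` helper, and the sample values (including the folio and line numbers and the Greek equivalent) are hypothetical and do not reproduce the corpus's actual storage format or search interface.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Token:
    """One annotated word form (hypothetical schema, not the corpus's real format)."""
    form: str                          # word form as written in the manuscript
    lemma: str                         # lemma in pre-yer-loss form, e.g. "дьнь"
    pos: str                           # part of speech
    features: frozenset                # grammatical features, e.g. {"sg", "gen"}
    folio: str                         # page/column in the manuscript
    line: int                          # line on that page/column
    greek_lemma: Optional[str] = None  # only for texts translated from Greek
    greek_form: Optional[str] = None


def search(tokens, lemma=None, form=None, pos=None, features=(), greek_lemma=None):
    """Return the tokens that satisfy every criterion that was supplied."""
    wanted = frozenset(features)
    return [
        t for t in tokens
        if (lemma is None or t.lemma == lemma)
        and (form is None or t.form == form)
        and (pos is None or t.pos == pos)
        and wanted <= t.features
        and (greek_lemma is None or t.greek_lemma == greek_lemma)
    ]


# Invented sample tokens; folio, line and Greek values are placeholders.
SAMPLE = [
    Token("дьнь", "дьнь", "noun", frozenset({"sg", "nom"}), "12a", 3,
          greek_lemma="ἡμέρα", greek_form="ἡμέρα"),
    Token("дьне", "дьнь", "noun", frozenset({"sg", "gen"}), "12a", 7),
    Token("сълати", "сълати", "verb", frozenset({"inf"}), "15b", 1),
]

hits = search(SAMPLE, lemma="дьнь", features={"gen"})
print([(t.form, t.folio, t.line) for t in hits])  # [('дьне', '12a', 7)]
```

Keeping the manuscript location on every token means that each search hit can be traced back to a page/column and line without a separate lookup.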

Search is available by lemma, word form, grammatical features, and also by Greek lemmas and word forms. Greek lemmas in The Jewish War have not yet been annotated (only word forms are indicated). To facilitate searches, a drop-down list of Slavic and Greek lemmas is available, along with virtual Slavic and Greek polytonic keyboards.

For distinguishing homonyms, it is recommended to specify the part of speech when searching for a specific lemma and to indicate gender for nouns. If homonyms have identical grammatical features (e.g., лукъ ‘weapon,’ лукъ ‘saddle bow,’ and лукъ ‘plant’), the drop-down dictionary that appears during lemma entry will include three separate lemmas for лукъ, each with its meaning indicated. Thus, it is possible to search for each homonym separately. There is also an option to search using the undifferentiated lemma. In this case, the user must carefully review all the retrieved contexts to differentiate the homonyms manually. Assistance is provided by the drop-down dictionary: for texts where at least two homonyms with identical grammatical features are used, the meaning of each word form is indicated. If only one homonym appears in the text—usually the most common one—its meaning may not be displayed.
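The homonym logic above can be sketched as a small dictionary lookup. This is a minimal illustration, not the corpus's implementation: the `Entry` type, the `entry_id` values, and the function names are assumptions introduced here. Only the three лукъ entries and their meanings come from the description above.

```python
from typing import NamedTuple, Optional


class Entry(NamedTuple):
    """One dictionary entry (hypothetical layout)."""
    lemma: str
    pos: str
    gender: Optional[str]  # None where gender does not apply
    gloss: str
    entry_id: str          # hypothetical identifier for a differentiated lemma


DICTIONARY = [
    Entry("лукъ", "noun", "m", "weapon", "лукъ#1"),
    Entry("лукъ", "noun", "m", "saddle bow", "лукъ#2"),
    Entry("лукъ", "noun", "m", "plant", "лукъ#3"),
    Entry("дьнь", "noun", "m", "day", "дьнь#1"),
    Entry("сълати", "verb", None, "send", "сълати#1"),
]


def candidates(lemma, pos=None, gender=None):
    """Entries for a lemma, narrowed by part of speech and gender when given."""
    return [
        e for e in DICTIONARY
        if e.lemma == lemma
        and (pos is None or e.pos == pos)
        and (gender is None or e.gender == gender)
    ]


def needs_gloss_choice(lemma, pos=None, gender=None):
    """True when grammatical features alone cannot separate the homonyms."""
    return len(candidates(lemma, pos, gender)) > 1


# Part of speech and gender do not help for лукъ: three entries remain,
# so the user has to pick one by its meaning.
for entry in candidates("лукъ", pos="noun", gender="m"):
    print(entry.entry_id, entry.gloss)
```

Searching with the undifferentiated lemma corresponds to taking all entries returned by `candidates` at once, which is why the retrieved contexts then have to be sorted by hand.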

The texts are supplemented with metadata that specify the genre, whether the text is a translation or original, the creation date of the original and its copies, a brief annotation of the text, and the source from which the text is included in the corpus. Users can also select specific texts or groups of texts. Charters are grouped by origin, and these groups can be selected with a single click.

In the Old East Slavic corpus, many works are represented in multiple versions, such as The Tale of Bygone Years in the Laurentian and Hypatian Chronicles, different copies of the Smolensk-Riga Treaty of 1229, and The Life of Theodosius as part of both the Uspensky Collection and the Kiev-Pechersk Patericon. Although all versions of a text are important for the study of linguistic history and corpus searches due to their textual, orthographic, and linguistic differences, including multiple versions of the same work can significantly skew statistical results, especially in smaller corpora.

The parameter "Each text in one version" allows for a selection that excludes multiple versions of the same work. The choice of the "main" version is somewhat arbitrary (e.g., the Hypatian version was chosen for The Tale of Bygone Years, while a later version from the Patericon was selected for The Life of Theodosius). It should also be noted that Old East Slavic texts created at different times often contain significant textual overlaps and borrowings, such as between The Tale of Bygone Years and The Tale of Boris and Gleb, The Tale of Igor's Campaign and The Zadonshchina, or the treaties of Novgorod with various princes. Such cases are treated as separate texts.
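The effect of the "Each text in one version" parameter on statistics can be illustrated with a short Python sketch. The `TextVersion` class, the `is_main` flag, and all token counts below are invented for the example; only the choice of main versions follows the description above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TextVersion:
    work: str
    version: str
    is_main: bool  # hypothetical flag marking the version kept by the filter
    tokens: int    # invented size, for illustration only


def one_version_per_work(texts):
    """Keep exactly one (main) version of every work."""
    chosen = {}
    for t in texts:
        if t.is_main:
            if t.work in chosen:
                raise ValueError(f"two main versions for {t.work!r}")
            chosen[t.work] = t
    missing = {t.work for t in texts} - chosen.keys()
    if missing:
        raise ValueError(f"no main version for {sorted(missing)}")
    return list(chosen.values())


TEXTS = [
    TextVersion("The Tale of Bygone Years", "Hypatian", True, 1000),
    TextVersion("The Tale of Bygone Years", "Laurentian", False, 900),
    TextVersion("The Life of Theodosius", "Patericon", True, 400),
    TextVersion("The Life of Theodosius", "Uspensky Collection", False, 450),
    TextVersion("The Tale of Igor's Campaign", "single version", True, 300),
]

selected = one_version_per_work(TEXTS)
print(sum(t.tokens for t in TEXTS))     # 3050: every version counted
print(sum(t.tokens for t in selected))  # 1700: each work counted once
```

With these invented sizes, counting every version nearly doubles the weight of the two works that exist in parallel copies, which is the kind of skew the parameter is meant to avoid.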

The Old East Slavic corpus includes the "Word at a glance" feature, which displays automatically computed similar words, that is, words occurring in similar contexts (for content words), as well as the paradigms of attested noun word forms in their various morphological and orthographic variants.

Building the corpus

The project team at the Institute of the Russian Language was headed by Anna Pichkhadze. The team included Galina Barankova, Maria Ermolova, Anna Fitiskina, Anastasia Glagoleva, Natalia Iordani, Dmitri Krylov, Irina Makeeva, Ekaterina Mishina, Maria Mushinskaya, Pavel Petrukhin, Anna Ptentsova, Dmitri Sitchinava, Veronika Skripka, Irina Yuryeva, and others.

Publications

A list of scholarly publications on the Old East Slavic corpus is available at https://ruscorpora.ru/s/dw5lX. In the Publications section, use the filters to find other types of publications about the corpus.

Updated on 27.01.2025