Church Slavonic corpus

The texts included in the Church Slavonic corpus, most of which are used in the contemporary liturgy of the Russian and some other Slavic Orthodox Churches, can equally be regarded as belonging to history and as relevant to modern linguistic practice. The justification for classifying this corpus as historical lies in the fact that it represents the language of a specific body of texts, the majority of which were finalized no later than the mid-18th century. Around the same time, its grammatical and orthographic norms were largely established.

At the same time, the Church Slavonic corpus occupies a unique position among the historical corpora of the Russian National Corpus. Notably, it is not included in the Panchronic Corpus, although it will feature annotated and searchable early and modernized representations of lemmas, facilitating navigation between it and other historical corpora. This decision reflects the distinct linguistic and textological status of the works included in the corpus.

First, most texts in the Church Slavonic corpus have a heterogeneous structure and feature editorial changes from various eras (for more details, see below). As a result, it is nearly impossible to assign them precise dates, which are essential for diachronic studies. Unlike most Old East Slavic, Middle Russian, or Modern Russian texts, these texts do not fit neatly into a chronological framework due to their evolving and composite nature. Second, from the 18th century onward, linguistics treats the history of the Church Slavonic language in Russia and the history of the Russian language as separate. Distinct standards for these languages were codified, as was the function of Russian as a literary language. Including texts such as 19th–21st-century services and akathists or 18th-century revisions of biblical texts in the same chronological stream as Modern Russian would obscure the fundamental differences in tradition, linguistic norms, and purpose.

Third, Church Slavonic texts were created, translated, and edited by authors from various Slavic nations, including Ukrainians, Belarusians, Serbs, and Bulgarians, across different regions. These texts are tied to the history of various versions of the Church Slavonic language and the development of several modern Slavic languages, not just Russian. This diversity makes it difficult to include the corpus in the strictly Russian-centered Panchronic Corpus.

Composition of the corpus

The corpus includes only Church Slavonic texts from the era of book printing:

  • The core of the corpus consists of books used in public and private worship: the Euchologion (Sluzhebnik), Trebnik, Menaion, Triodion, Oktoechos, Irmologion, etc. This also includes the Typikon.

  • Holy Scripture. In modern liturgical practice, two editions of the Holy Scripture coexist (the liturgical and the reading versions), which exhibit a significant number of lexical differences.

  • Akathists. Akathists surpass all other types of Church Slavonic literature in their degree of Russification and linguistic norm variability. Among Church Slavonic texts written in the 19th–21st centuries, akathists play a particularly significant role.

  • Patristic literature and hagiography. In this category, the corpus includes the Philokalia (the Church Slavonic translation by Paisius Velichkovsky) and the lives of saints composed by Dimitri of Rostov (Tuptalo) in the early 18th century.

The primary source of texts for the corpus is the Library of Patristic Literature.

Annotation

Orthography. Most texts are presented in Church Slavonic script with a relatively complex orthography, closely aligned with that used in printed Church Slavonic books. This orthography includes a large number of homophonic letters. Superscript letters are placed inline. For user convenience, searches can be performed using simplified orthography, which employs only modern Russian Cyrillic plus the letter ѣ, without diacritics. A modernized orthography, which additionally excludes ѣ and the final ъ, is also available. Simplified orthography is the default for searches.
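
The relationship between the three orthographies can be illustrated with a minimal Python sketch. It is not the corpus's actual conversion procedure: the function names simplify and modernize, the letter table, and the treatment of the initial digraph оу are all assumptions made for illustration, and the table is deliberately incomplete (for instance, it always maps izhitsa to и and does not expand abbreviations written under a titlo).

```python
import re
import unicodedata

# Hypothetical, incomplete table of homophonic letters (an assumption,
# not the corpus's real mapping). Keys are written as escapes so that
# the code points are unambiguous.
LETTER_MAP = str.maketrans({
    "\u0456": "и",   # decimal i
    "\u0475": "и",   # izhitsa (simplification: it can also sound like "в")
    "\u0461": "о",   # omega
    "\u047b": "о",   # broad o
    "\u047f": "от",  # ot
    "\u0455": "з",   # dze (zelo)
    "\u0454": "е",   # wide e
    "\u0479": "у",   # uk (digraph character)
    "\ua64b": "у",   # monograph uk
    "\u0467": "я",   # little yus
    "\ua657": "я",   # iotified a
    "\u0473": "ф",   # fita
    "\u046f": "кс",  # ksi
    "\u0471": "пс",  # psi
})


def _strip_marks(text: str) -> str:
    """Drop combining marks (accents, breathings, titlo), keeping й intact."""
    out = []
    for ch in unicodedata.normalize("NFD", text):
        if unicodedata.category(ch) == "Mn":
            # Keep only the breve that distinguishes й from и.
            if ch == "\u0306" and out and out[-1] == "и":
                out.append(ch)
            continue
        out.append(ch)
    return unicodedata.normalize("NFC", "".join(out))


def simplify(text: str) -> str:
    """Modern Russian letters plus ѣ, no diacritics (lowercased)."""
    text = _strip_marks(text.lower())
    text = re.sub(r"\bоу", "у", text)  # word-initial digraph "оу"
    return text.translate(LETTER_MAP)


def modernize(text: str) -> str:
    """Simplified form with ѣ replaced by е and word-final ъ removed."""
    text = simplify(text).replace("\u0463", "е")
    return re.sub(r"ъ\b", "", text)


if __name__ == "__main__":
    word = "ми\u0301ръ"  # "миръ" with an acute accent on и
    print(simplify(word), modernize(word))
```

Under these assumptions a query can be typed in either reduced form and compared with the same reduction of the text, which is one plausible way to make a search insensitive to homophonic letters and diacritics.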

Some texts, specifically the so-called Green Menaia published in the 2000s, are presented in the so-called civil script (lay Cyrillic) and modernized orthography, reflecting modern Russian conventions (expanded abbreviations, capital letters in proper nouns, contemporary punctuation, stress marks on only some words, and no further diacritics). Users can select subsets of texts based on Church Slavonic or civil script.

Morphological annotation. The search functionality is organized similarly to the main corpus of the Russian National Corpus (RNC): by dictionary form (lemma) and grammatical characteristics. Throughout the corpus, lemmas follow traditional Church Slavonic orthographic conventions, but abbreviations are expanded, stress is marked with a unified acute accent, and aspiration marks are excluded. For user convenience, searches can also be conducted using simplified orthography (used in the drop-down lemma dictionary and as the default search format) or modernized spelling.

Some grammatical annotations were created using automated methods. Grammatical homonymy has not been resolved in the Church Slavonic corpus; however, traditional Church Slavonic orthography often distinguishes homonyms in writing (e.g., singular and plural word forms), and this information was utilized during automatic annotation.

Additional tools. The Church Slavonic corpus, which is extensive and lexically rich, features the "Similar Words" tool, offering a clear representation of the semantic fields of Slavic vocabulary.

The corpus also includes a "Frequency" mode for analyzing collocations of lexemes and grammatical markers, as well as other tools commonly found in large corpora, such as "Word at a glance," "Statistics," n-grams, frequency dictionaries, and comparison between the whole corpus and subcorpora. Note again that grammatical homonymy has not been resolved in this corpus.

Text metadata. The metadata in the corpus is organized by genre on the one hand, and by linguistic norm type according to the time period on the other.

The specific nature of the Church Slavonic corpus's metadata is defined by the layered character of the texts included in liturgical books. For most of these texts, it is impossible to specify the genre, creation date, or translation date. Liturgical sequences often include poetic texts (canons, stichera), instructions for worship leaders, and scriptural readings. Since the metadata describes the liturgical sequence as a whole, rather than each individual fragment within it, it does not classify texts as prose or poetry.

Instead, broad categories are used to classify text types:

  1. "Scripture" (the Bible, liturgical Gospel, and collections of Old Testament readings in separate rubrics);
  2. "Patristic";
  3. "Service" (all liturgical rites and services, as well as collections of liturgical texts, such as Theotokia, kontakia, etc., in various compilations);
  4. "Typikon";
  5. "Akathist";
  6. "Scientific" (a single text: Ethika Hieropolitika);
  7. "Hagiography";
  8. "Oration" (the last two genres are primarily represented in the Menaion Reader by Dimitri of Rostov).

Texts can also be filtered by linguistic norm type related to their period of creation. The following categories are available:

  1. "Archaic type" (e.g., Philokalia);
  2. "Hybrid type" (e.g., Spiritual Alphabet);
  3. "Standard type" (all core liturgical books except 20th-century texts);
  4. "20th century" (services and akathists composed in the 20th century).
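
The two classification axes above can be pictured as independent filters over text records. The following Python sketch is illustrative only: the TextRecord class, its field names, the select function, and the sample records are assumptions rather than the corpus's real data model, and the genre and norm-type values assigned to the sample titles are inferred from the lists above.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional

# Category labels copied from the two lists above.
GENRES = {"Scripture", "Patristic", "Service", "Typikon",
          "Akathist", "Scientific", "Hagiography", "Oration"}
NORM_TYPES = {"Archaic type", "Hybrid type", "Standard type", "20th century"}


@dataclass(frozen=True)
class TextRecord:
    """Hypothetical metadata record: one entry per text in the corpus."""
    title: str
    genre: str
    norm_type: str


def select(texts: Iterable[TextRecord],
           genre: Optional[str] = None,
           norm_type: Optional[str] = None) -> List[TextRecord]:
    """Return the texts matching every filter that was supplied."""
    if genre is not None and genre not in GENRES:
        raise ValueError(f"unknown genre: {genre}")
    if norm_type is not None and norm_type not in NORM_TYPES:
        raise ValueError(f"unknown norm type: {norm_type}")
    return [
        t for t in texts
        if (genre is None or t.genre == genre)
        and (norm_type is None or t.norm_type == norm_type)
    ]


# Illustrative sample; the last title is a placeholder, not a real entry.
SAMPLE = [
    TextRecord("Philokalia", "Patristic", "Archaic type"),
    TextRecord("Menaion", "Service", "Standard type"),
    TextRecord("Typikon", "Typikon", "Standard type"),
    TextRecord("Akathist (placeholder)", "Akathist", "20th century"),
]

if __name__ == "__main__":
    for record in select(SAMPLE, norm_type="Standard type"):
        print(record.title)
```

Keeping genre and norm type as separate optional filters mirrors the description above, in which a subcorpus can be narrowed along either axis or both at once.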

Some texts, mainly later ones, are annotated with authorship and/or date information. For most texts, the metadata includes the year of publication.

For later services and akathists, a special group of metadata features has been introduced, reflecting the chronology of their inclusion into the circle of liturgical texts and key participants in this process.

Building the corpus

The corpus was developed by Alexey Polyakov (its main author), as well as by Alexander Kravetsky (annotation of metadata for late texts), Dmitri Sitchinava, Ekaterina Dobrushina and others.

Publications

A list of scholarly publications on the Church Slavonic corpus is available at https://ruscorpora.ru/s/dGxo5. In the Publications section, use the filters to find other types of publications about the corpus.

Updated on 01.02.2025