Corpora
  • 27,289 texts
  • 18,556,005 words
historical, disambiguated, syntactically tagged

The corpus includes fiction and nonfiction texts from the collected works of Russian classical writers.

At the moment, the corpus contains Russian-language works by:

  • Alexander Radishchev
  • Ivan Krylov
  • Vasily Zhukovsky
  • Alexander Pushkin
  • Evgeny Baratynsky
  • Mikhail Lermontov
  • Nikolay Gogol
  • Ivan Turgenev
  • Mikhail Saltykov-Shchedrin
  • Lev Tolstoy
  • Nikolay Leskov
  • Anton Chekhov

Preference was given to the digitized complete collections of works available in the digital libraries rvb.ru and feb-web.ru. The Soviet-era collected works of Zhukovsky, Gogol, and Leskov omit some texts, for reasons including ideology. The texts of Lev Tolstoy and Anton Chekhov came from related digital projects. Editorial translations are not included in the corpus. Foreign-language texts that contain original Russian fragments or drafts are included.

By default, results are sorted by ascending date. Sorting by author is also available; within one author's texts, results are then ordered by genre and title. The corpus also lets you build a diachronic frequency graph and compare several queries on the same graph.
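The two sort orders can be pictured as sort keys over text metadata. The sketch below is only an illustration: the `Text` record, its field names, and the sample entries are assumptions made for this example and do not reflect the corpus's actual data model or search implementation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Text:
    """Hypothetical metadata record; field names are assumptions."""
    author: str
    genre: str
    title: str
    year: int


def sort_by_date(texts):
    """Default order: ascending date."""
    return sorted(texts, key=lambda t: t.year)


def sort_by_author(texts):
    """Alternative order: author, then genre, then title."""
    return sorted(texts, key=lambda t: (t.author, t.genre, t.title))


# Illustrative sample only, not taken from the corpus metadata.
SAMPLE = [
    Text("Gogol", "prose", "Dead Souls", 1842),
    Text("Pushkin", "verse", "Eugene Onegin", 1833),
    Text("Pushkin", "prose", "The Captain's Daughter", 1836),
]

if __name__ == "__main__":
    for text in sort_by_author(SAMPLE):
        print(text.author, text.genre, text.title, text.year)
```

A tuple key keeps the secondary ordering (genre, then title) explicit, which mirrors the description of sorting within the texts of a single author.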

The corpus is currently in beta, and new authors and works are planned to be added. The approximate size of the corpus is 18 million tokens.

Corpus goals

The works of Russian classical writers hold a special place in the history of the Russian literary language. Literary language is "processed by masters," and the texts of these masters form the core of a corpus of the Russian literary language. Such a corpus can be consulted as a normative source rather than as a record of ordinary usage, and authoritative examples can be taken from it for academic grammars, dictionaries, and textbooks.

Including all of these texts in the main corpus would be a questionable decision, as it would upset the balance of genres and authors there. Thus, the goal of the corpus is, without being bound by the limitations of the main corpus, to present the heritage of the Russian classics as widely as possible in the Russian National Corpus (RNC), transforming it step by step into a corpus of the Russian literary language of the 19th and early 20th centuries.

Since the goal of the corpus is to bring together in the fullest possible form the works (fiction and nonfiction) of Russian classical writers, the markup of the texts is not very rich and includes only a minimal set of parameters used in all RNC corpora.

Another useful feature of the corpus is the ability to search the texts of selected authors, who are represented more fully here than in the main corpus.

Creating the corpus

The corpus is being created by:

  • Boris Orekhov (general concept of the corpus; collection of texts, software processing)
  • Maria Satina (additional metadata markup)
  • Dmitri Sitchinava (manual proofreading, software processing, additional metadata markup)
  • Pavel Dyachenko (search implementation)
  • Alexey Polyakov (preparation of Gogol's texts)

Publications

A list of scholarly publications on the "Russian classics" corpus is available at https://ruscorpora.ru/s/boKPL. In the Publications section, use the filters to find other types of publications about the corpus.

Updated on 22.07.2024