Using the Corpus

All texts in the RNC are available online on this site for non-commercial scientific or educational use (Article 19 of the Russian Copyright Law).

If you quote contexts retrieved from the RNC, please cite the RNC as the source, together with the author and the title of the quoted text.

Offline versions of the Corpus are also available. To obtain one, sign the corresponding license agreement and send a request, with a scan of the signed agreement attached, to np-rnc@yandex.ru. Please state the purpose of use in the request.

The offline disambiguated version of the Corpus (approx. 1 million words) — license agreement

The diachronic datasets of the Corpus (the total size of datasets is approx. 250 million words) — license agreement

The diachronic datasets cover three timespans: 1700-1916, 1918-1991, and 1992-2016. These correspond to three historical periods of Russian society and the Modern Russian language: "pre-Soviet", "Soviet" (which also includes diaspora texts), and "post-Soviet". Each timespan is represented by one large UTF-8 encoded text file containing shuffled sentences from the source texts. The sentences are shuffled to comply with copyright protection requirements. The texts carry no morphological or metatextual annotation.
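Because each dataset is one large file, it is usually more practical to stream it than to load it whole. The following Python sketch shows one way to do that and to build a simple word-frequency count. It assumes one sentence per line, which the description above does not confirm, and it splits tokens on whitespace only. The file name in the comment is hypothetical, and the two sample sentences are made up so that the sketch runs without the licensed data.

```python
import io
from collections import Counter
from typing import Iterable, Iterator


def iter_sentences(lines: Iterable[str]) -> Iterator[str]:
    """Yield non-empty sentences, assuming one sentence per line."""
    for line in lines:
        sentence = line.strip()
        if sentence:
            yield sentence


def count_tokens(sentences: Iterable[str]) -> Counter:
    """Count lowercased, whitespace-separated tokens (a crude tokenizer)."""
    counts = Counter()
    for sentence in sentences:
        counts.update(sentence.lower().split())
    return counts


# Hypothetical usage with a real dataset file (the file name is an assumption):
# with open("soviet.txt", encoding="utf-8") as f:
#     counts = count_tokens(iter_sentences(f))

# Made-up in-memory sample standing in for a dataset file.
sample = io.StringIO("Он пришёл домой .\n\nОна пришла домой .\n")
sample_counts = count_tokens(iter_sentences(sample))
```

Since the sentences are shuffled and unannotated, counts like these reflect a whole timespan only; they cannot be traced back to an individual source text or broken down by grammatical category.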

The Corpus developers welcome feedback and bug reports.

RNC administration address: 119019 Moscow, Volkhonka Street, 18/2, Russian Language Institute of the RAS, Department for corpus linguistics and linguistic poetics.

