Russian National Corpus

News

30.06.2026

A new corpus has been added to the Russian National Corpus: the General Internet Corpus of Russian (VKontakte). It contains texts from the VKontakte social network covering the period from 2007 to early 2022. With the addition of the new corpus, the Russian National Corpus has grown by 11.3 billion word tokens. This has increased the total size of the RNC more than sixfold, from 2.2 to 13.5 billion word tokens.

A distinctive feature of the new corpus is the sociolinguistic annotation of texts: each text is assigned author-related parameters such as gender, age, and place of residence. This makes it possible to study regional, age-related, and gender-based features of Russian using very large volumes of data and to draw statistically significant conclusions about different varieties of the language.

08.06.2026

We continue to expand the corpus tools for teaching Russian at school. The Practice Example Generator now includes rules for spelling words with unpronounced consonants. The rules are organized into groups: consonants that can be checked with a related word (e.g., здравствуйте – здравие), dictionary words with uncheckable consonants (e.g., чувство), and cases where the target word and the checking word do not match (e.g., блестеть – блесна). Altogether, the new rules cover more than 30 groups of words with unpronounced consonants.

You can access the generator page from the RNC for school page by clicking on the corresponding banner.

08.06.2026

The National Media Corpus has been expanded with publications from the 1980s–1990s. The additions include a substantial collection of issues of “Kommersant” from 1992–1994 and 2000, as well as selected issues of “Nedelya”, “Pionerskaya Pravda”, “Sovetskaya Sibir”, “Uchitelskaya Gazeta”, and “Universitetskaya Zhizn”. The total volume of the update exceeds 2.2 million word tokens.

The Russian National Corpus is a representative collection of Russian-language texts totaling more than 13 billion tokens, equipped with linguistic annotation and search tools.
