The Russian National Corpus is a representative collection of texts in Russian, containing more than 2 billion tokens and equipped with linguistic annotation and search tools.
Search in corpora
Within the same corpus, graphs are now plotted using days, months or years as the unit of measurement. The default unit is the month, and you can switch between days, months and years directly on the charts. This option is available in the search results and in the Get overview, Compare queries and Word at a glance functions.
The speech collections within the Accentological and Spoken corpora have been expanded. Transcripts of academic and political talks, TV and radio broadcasts, personal oral histories, and everyday dialogic speech have been added. The Spoken corpus now contains 14 million tokens, and the Accentological corpus, including the naive poetry collection, contains 134.8 million tokens in total.
The parallel corpus has been expanded by 3 million tokens. New texts have been added to the Czech-Russian, English-Russian, French-Russian, German-Russian, Portuguese-Russian, and Spanish-Russian language pairs. In particular, the English-Russian tier was updated with a collection of transcripts of public TED Talks, while the Portuguese-Russian subcorpus has almost doubled in size and now also includes texts created in Portuguese-speaking Africa.
In the Social Networks corpus, genres are now automatically marked for all texts. Users can select one or more genres from the list. Several new genres, such as picture captions, have been identified.
Properties generated by NeuroRNC are marked with a special icon. If you notice an inaccuracy or error, feel free to report it using the “Report an Error” button in the same window.
Since December 2023, two registration and authorization options have been available: directly on the website and via Yandex ID.
A small fraction of users registered on the RNC via ORCID.org. Due to changes in the legislation of the Russian Federation that have come into effect, this authorization option is no longer available. We apologize for any inconvenience caused to users who previously registered via ORCID.org. Please register again to access the advanced functionality of the RNC.