Russian National Corpus

Search in corpora

Structure of the RNC

25.03.2025

The Russian Classics corpus has grown by 7.5 million words. It now includes the complete academic collected works of Fyodor Dostoevsky and Nikolai Nekrasov, the majority of letters written in Russian by Ivan Turgenev, as well as some previously omitted texts by other authors.

The "Similar Words" service is now available not only for the entire corpus, but also individually for works by nine authors whose corpora are the most extensive. This feature allows users to compare how a word is used across the distinctive individual styles of different writers.

While the associated words for a given writer may not always be informative — especially if the word appears rarely or in highly varied contexts — they often reveal vivid individual patterns. For instance:

The word страсть 'passion' in Pushkin's works tends to have a positive connotation (appearing alongside beauty and freedom), whereas in Tolstoy’s writing it takes on a strongly negative tone (linked with lust and malice).

The word лошадка 'little horse' in Leskov’s prose reflects everyday life and domesticity, while in Chekhov’s texts it is used as one of the many nicknames for his wife, Olga Knipper.

This functionality opens up fascinating possibilities for stylistic and semantic analysis across the literary voices of Russian classical literature.

The data export capabilities of the Multilingual parallel corpus have been enhanced. Now, when exporting to Excel, Word, or CSV, the system automatically includes parallel contexts along with additional information, such as the translation language and translator details.

25.02.2025

We continue to expand the “Word at a glance” feature. Recently, we introduced the “Word Sketch Difference” function. Now, within the Main Corpus, users can explore how the “Similar words” word cloud has evolved over time and view word definitions.

"Definition β" widget provides AI-generated definitions of the searched word. Currently, around 5,500 words (those most frequently searched in the Main Corpus) have available definitions. AI-generated definitions may contain errors or inaccuracies. Feel free to provide feedback using the "Rate" button next to the widget. Your input helps us to improve generated definitions.

With "Similar Words" widget users can now examine context-based associated words not just across the entire corpus but within specific historical periods. Unlike synonyms, associated words are those commonly used in similar contexts.

All texts within the Main Corpus (1700–2020s) have been divided into 11 time spans. Users can:

View similar words from a single time span
Compare word clouds across two different time spans
Download a screenshot of the results

For example, it's fascinating to see how the semantic associations of words like поезд ‘wedding ceremony’ → ‘train’ or машина ‘machine, mechanism’ → ‘automobile’ have evolved over time.

11.02.2025

In the Word at a glance of the Main, Educational, and Media corpora, as well as the "From 2 to 15" and "Russian Classics" collections, a new Word Sketch Difference feature has been introduced!

This new functionality allows users to see similarities and differences in the usage of two words. For example, you can explore what время ‘time’ and деньги ‘money’ have in common or analyze what can be колючий versus колкий (both ≈ ‘sharp, prickly’).

Word Sketch Difference is available for nouns, adjectives, verbs, and adverbs. You can compare two lemmas belonging to the same part of speech. However, sketches are not generated for words that appear fewer than in three different texts, as well as for proper names, abbreviations, and words with non-standard spellings.

For comparison, the top 6 collocates are selected for each keyword. The comparative table may contain fewer than 12 collocates if one or both keywords have fewer than six collocates or if there is an overlap in their top 6 lists.

11.02.2025

The Multimedia corpus has been expanded by 107,000 tokens. The following additions have been made: a collection of artistic reading recordings featuring short stories by Anton Chekhov performed by renowned actors such as Alexander Borisov, Leonid Bronevoy, Igor Ilyinsky, and Rostislav Plyatt; two theatrical productions, and recordings of television talk shows. The collection of regional speech recordings has been significantly enriched. It now includes conversations and interviews with residents of the Nizhny Novgorod, Murmansk, Ryazan, Sverdlovsk, and Tver regions, as well as the Krasnodar Krai, Yakutia, and other territories. These people are featured speakers in documentary films from the series "Letters from the Provinces" and in video blogs.

The corpus now offers the feature to filter subcorpora by region.

The Russian National Corpus is a representative collection of texts in Russian, counting more than 2 bln tokens and completed with linguistic annotation and search tools

Search in corpora

News