The Russian National Corpus is a representative collection of Russian texts containing more than 2 billion tokens and equipped with linguistic annotation and search tools.
News
We continue to expand the “Word at a glance” feature. Recently, we introduced the “Word Sketch Difference” function. Now, within the Main Corpus, users can explore how the “Similar words” word cloud has evolved over time and view word definitions.
The "Definition β" widget provides AI-generated definitions of the searched word. Currently, definitions are available for around 5,500 words (those most frequently searched in the Main Corpus). AI-generated definitions may contain errors or inaccuracies. Feel free to provide feedback using the "Rate" button next to the widget. Your input helps us improve the generated definitions.
With the "Similar Words" widget, users can now examine context-based associated words not only across the entire corpus but also within specific historical periods. Unlike synonyms, associated words are those commonly used in similar contexts.
All texts within the Main Corpus (1700–2020s) have been divided into 11 time spans. Users can:
- View similar words from a single time span
- Compare word clouds across two different time spans
- Download a screenshot of the results
For example, it's fascinating to see how the semantic associations of words like поезд ‘wedding ceremony’ → ‘train’ or машина ‘machine, mechanism’ → ‘automobile’ have evolved over time.
The “Word at a glance” feature of the Main, Educational, and Media corpora, as well as of the "From 2 to 15" and "Russian Classics" collections, now includes a new Word Sketch Difference function.
This new functionality allows users to see similarities and differences in the usage of two words. For example, you can explore what время ‘time’ and деньги ‘money’ have in common or analyze what can be колючий versus колкий (both ≈ ‘sharp, prickly’).
Word Sketch Difference is available for nouns, adjectives, verbs, and adverbs. You can compare two lemmas belonging to the same part of speech. However, sketches are not generated for words that appear in fewer than three different texts, or for proper names, abbreviations, and words with non-standard spellings.
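The eligibility rules above can be read as a simple check on a pair of lemmas. The following is a minimal sketch, not the corpus's actual implementation: the `Lemma` class, its field names, and the functions `has_sketch` and `can_compare` are all hypothetical, and the text counts in the example are invented for illustration.

```python
from dataclasses import dataclass

# Parts of speech for which the announcement says sketches are available.
COMPARABLE_POS = {"noun", "adjective", "verb", "adverb"}
# A lemma must occur in at least this many different texts.
MIN_TEXTS = 3


@dataclass(frozen=True)
class Lemma:
    """Hypothetical record describing a lemma; not a real corpus data type."""
    text: str
    pos: str
    text_count: int  # number of different texts containing the lemma
    is_proper_name: bool = False
    is_abbreviation: bool = False
    nonstandard_spelling: bool = False


def has_sketch(lemma: Lemma) -> bool:
    """Return True if a sketch would be generated under the stated rules."""
    excluded = (
        lemma.is_proper_name
        or lemma.is_abbreviation
        or lemma.nonstandard_spelling
    )
    return (
        lemma.pos in COMPARABLE_POS
        and lemma.text_count >= MIN_TEXTS
        and not excluded
    )


def can_compare(first: Lemma, second: Lemma) -> bool:
    """Two lemmas are comparable if both have sketches and share a part of speech."""
    return first.pos == second.pos and has_sketch(first) and has_sketch(second)


# Invented counts, for illustration only.
vremya = Lemma("время", "noun", text_count=1000)
dengi = Lemma("деньги", "noun", text_count=900)
kolkiy = Lemma("колкий", "adjective", text_count=50)

print(can_compare(vremya, dengi))   # same part of speech, both eligible
print(can_compare(vremya, kolkiy))  # different parts of speech
```

Keeping the per-lemma check separate from the pair check mirrors the wording of the announcement, which states the part-of-speech requirement for the pair and the exclusions for individual words.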
For comparison, the top 6 collocates are selected for each keyword. The comparative table may contain fewer than 12 collocates if one or both keywords have fewer than six collocates or if there is an overlap in their top 6 lists.
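The arithmetic behind the table size can be sketched as follows. This is an illustration under stated assumptions, not the corpus's own code: the function names `comparison_collocates` and `table_size` are hypothetical, the input lists are assumed to be already ranked (the ranking measure is not described here), and the English collocates in the example are invented placeholders rather than corpus results.

```python
def comparison_collocates(first, second, top_n=6):
    """Split the top-N collocates of two keywords into shared and unique groups.

    `first` and `second` are assumed to be lists of collocates already
    ordered from strongest to weakest.
    """
    top_first = first[:top_n]
    top_second = second[:top_n]
    first_set = set(top_first)
    second_set = set(top_second)
    return {
        # Collocates in both top lists appear only once in the table.
        "shared": [c for c in top_first if c in second_set],
        "only_first": [c for c in top_first if c not in second_set],
        "only_second": [c for c in top_second if c not in first_set],
    }


def table_size(groups):
    """Total number of distinct collocates in the comparison table."""
    return sum(len(items) for items in groups.values())


# Invented placeholder data: three collocates overlap between the two lists.
time_collocates = ["spend", "lose", "save", "free", "waste", "short"]
money_collocates = ["spend", "lose", "save", "earn", "big", "borrow"]

groups = comparison_collocates(time_collocates, money_collocates)
print(groups["shared"])    # ['spend', 'lose', 'save']
print(table_size(groups))  # 9
```

With three shared collocates the table holds 6 + 6 - 3 = 9 entries; it reaches 12 only when both keywords have at least six collocates and their top lists do not overlap.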
The Multimedia corpus has been expanded by 107,000 tokens. The additions include recordings of dramatic readings of short stories by Anton Chekhov, performed by renowned actors such as Alexander Borisov, Leonid Bronevoy, Igor Ilyinsky, and Rostislav Plyatt; two theatrical productions; and recordings of television talk shows. The collection of regional speech recordings has been significantly enriched: it now includes conversations and interviews with residents of the Nizhny Novgorod, Murmansk, Ryazan, Sverdlovsk, and Tver regions, as well as Krasnodar Krai, Yakutia, and other territories. The speakers appear in documentary films from the series "Letters from the Provinces" and in video blogs.
The corpus now allows users to filter subcorpora by region.
The Birchbark Letters and Inscriptions corpora now feature photographs and drawings of the original historical texts.
By default, preview images are displayed in the concordance: photographs are shown on the left, and drawings are on the right. Clicking on an image opens it in full-screen mode, where users can zoom in or out on the drawings and photographs and download them as needed.
In KWIC mode and when selecting a subcorpus, images can only be viewed in full-screen mode by clicking the icon to the right of the text header.
Images can be hidden with a dedicated setting. The choice is saved in the user’s browser, so it persists when the user returns to the corpus, and results continue to be displayed without images.
This new functionality was made possible through collaboration with the development teams of gramoty.ru and epigrafika.ru. These platforms provide more detailed information about the letters and inscriptions. We extend our gratitude to our colleagues and look forward to continued successful collaboration.