The Russian National Corpus is a representative collection of texts in Russian comprising more than 2 billion tokens, equipped with linguistic annotation and search tools.
News
The Birchbark Letters and Inscriptions corpora now feature photographs and drawings of the original historical texts.
By default, preview images are displayed in the concordance, with photographs on the left and drawings on the right. Clicking an image opens it in full-screen mode, where users can zoom in and out and download the image as needed.
In KWIC mode and when selecting a subcorpus, images can be viewed only in full-screen mode, by clicking the icon to the right of the text header.
There is also a setting to hide images. It is saved in the user’s browser, so when users return to the corpus, the setting persists and results are displayed without images.
This new functionality was made possible through collaboration with the development teams of gramoty.ru and epigrafika.ru. These platforms provide more detailed information about the letters and inscriptions. We extend our gratitude to our colleagues and look forward to continued successful collaboration.
The "Russian Classics" corpus has been updated with the academic editions of the complete works of Alexander Griboyedov and Fyodor Tyutchev. Their written legacy is relatively small (Tyutchev wrote even less in Russian than the "one-book author" Griboyedov), yet their language is of considerable interest from various perspectives. The corpus also includes variant readings from different versions of the same texts. All texts in the corpus have been re-annotated using the improved Rubic language model.
New texts totaling 100,000 tokens have been added to the Church Slavonic corpus. They form a small part of the collection of saints' lives compiled by the famous church figure Dimitri of Rostov (Tuptalo) in the early 18th century. This addition has significantly increased the share of narrative texts, which were previously represented almost exclusively by the Bible.
The corpus has been adapted to a complex orthography close to the one used in printed Neo-Church-Slavonic books. For user convenience, full search is also available in simplified orthography, as is the pop-up lemma dictionary.
Because the corpus is both large and lexically rich, it offers the "Similar Words" feature, which illustrates the different semantic fields of Church Slavonic vocabulary.
The corpus also features the "Frequency" mode, which allows users to analyze the collocability of lexemes and grammatical markers. Please note that grammatical ambiguity has so far been resolved only to a limited extent.
On the last working day of the year, the team of the Russian National Corpus traditionally reflects on the results and recalls what new developments have taken place over the year.
In 2024, the Corpus grew by more than 109 million words. Many corpora now feature search and statistical tools that were previously available only in the Main, Media, and other "advanced" corpora.
We hope that everyone will find tools in this image that make their work with the Corpus even more productive and enjoyable. May the New Year bring you many fascinating discoveries and inspiring insights!
We extend our heartfelt gratitude to the creators of the Corpus of the Chuvash Language, the Open corpus of Veps and Karelian languages, and the Digital Corpus of the Khakas Language for their fruitful collaboration.
With warmest wishes for the New Year,
The Team of the Russian National Corpus