The Russian National Corpus is a representative collection of Russian texts containing more than 2 billion tokens, equipped with linguistic annotation and search tools.
Search in corpora
News
The Poetry corpus has been expanded by 400,000 word tokens. In particular, new texts by twentieth-century poets have been added, as well as a large collection of Russian translations of ancient poetry, including hexameter versions of the "Iliad", the "Aeneid" and Horace's "Satires".
All the parallel bilingual corpora are now available in the new interface.
The interface of the Old East Slavic corpus has been substantially updated and is now connected to the Overview feature. Subcorpus selection within the Old East Slavic corpus is now available on a separate page, where you can choose one or more Slavic literary monuments to search from a list.
In the collocations search, the user can specify syntactic links. For example, if a user specifies решение 'solution' as the key, "verb" as the grammatical feature of the collocate, "object" as the syntactic role, and the second word as the dependent, they can find out what is most often done with solutions (принять 'accept', согласовать 'agree', etc.). The table of search results shows the 100 most frequent collocations with this syntactic relationship. For each of these collocations, you can access a list of examples by clicking on the link.
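The idea behind such a query can be illustrated with a minimal sketch. It assumes the corpus has already been parsed into dependency records; the `Dependency` type, the `top_collocates` function, the tag names ("VERB", "obj", "nsubj") and the toy data are all hypothetical and are not part of the RNC interface or its implementation.

```python
from collections import Counter
from typing import Iterable, NamedTuple


class Dependency(NamedTuple):
    """One head-dependent link from a parsed sentence (hypothetical format)."""
    head_lemma: str
    head_pos: str
    relation: str          # syntactic role of the dependent, e.g. "obj"
    dependent_lemma: str


def top_collocates(
    dependencies: Iterable[Dependency],
    key: str,
    collocate_pos: str,
    relation: str,
    limit: int = 100,
) -> list[tuple[str, int]]:
    """Count heads of the given part of speech that govern `key` via `relation`."""
    counts = Counter(
        dep.head_lemma
        for dep in dependencies
        if dep.dependent_lemma == key
        and dep.head_pos == collocate_pos
        and dep.relation == relation
    )
    # Most frequent collocates first, truncated to `limit` rows.
    return counts.most_common(limit)


# Toy data for illustration only.
deps = [
    Dependency("принять", "VERB", "obj", "решение"),
    Dependency("принять", "VERB", "obj", "решение"),
    Dependency("согласовать", "VERB", "obj", "решение"),
    Dependency("принять", "VERB", "obj", "закон"),
    Dependency("зависеть", "VERB", "nsubj", "решение"),
]

result = top_collocates(deps, key="решение", collocate_pos="VERB", relation="obj")
print(result)
```

Here the key word is treated as the dependent (the object) and the collocate as its governing verb, which is one way to read the example query above.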
Users of the Main corpus can now get frequency dictionaries for the major parts of speech: nouns, adjectives, verbs, and adverbs. The same selection is available in the subcorpus frequency dictionary. You can also specify the part of speech when comparing the most frequent lemmas of your selected subcorpus with the frequency dictionary of the whole corpus.
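As a rough sketch of what a part-of-speech frequency dictionary and a subcorpus comparison involve, the snippet below counts lemmas in a list of (lemma, part of speech) pairs. The function names, the tag set and the sample tokens are assumptions made for illustration; the RNC's own data format and counting rules may differ.

```python
from collections import Counter
from typing import Iterable, Optional

Token = tuple[str, str]  # (lemma, part-of-speech tag); hypothetical format


def frequency_dictionary(tokens: Iterable[Token], pos: Optional[str] = None) -> Counter:
    """Count lemmas, optionally restricted to one part of speech."""
    return Counter(lemma for lemma, tag in tokens if pos is None or tag == pos)


def compare_top_lemmas(
    subcorpus: Iterable[Token],
    corpus: Iterable[Token],
    pos: Optional[str] = None,
    limit: int = 10,
) -> list[tuple[str, int, int]]:
    """List the subcorpus's most frequent lemmas with their whole-corpus counts."""
    sub_freq = frequency_dictionary(subcorpus, pos)
    whole_freq = frequency_dictionary(corpus, pos)
    return [(lemma, n, whole_freq[lemma]) for lemma, n in sub_freq.most_common(limit)]


# Toy data for illustration only.
corpus = [
    ("дом", "NOUN"), ("идти", "VERB"), ("дом", "NOUN"), ("река", "NOUN"),
    ("быстро", "ADV"), ("река", "NOUN"), ("река", "NOUN"),
]
subcorpus = corpus[:4]

noun_comparison = compare_top_lemmas(subcorpus, corpus, pos="NOUN")
print(noun_comparison)
```

Raw counts are used here for simplicity; for corpora of different sizes, relative frequencies would make the comparison more meaningful.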
The parallel corpora have started migrating to the new interface. As of the end of April, the following corpora are available in it:
For each bilingual pair, within the search form you can select any of three options: exact forms search, lexico-grammatical search or bilingual search. An important innovation is that in the new interface, the bilingual search is available on the main search page rather than on a new one. Queries in Russian and other languages are entered in two different query forms. The search results are formatted in two columns. This layout is already familiar to the users of the Birchbark letters corpus. On the left you see the original, and on the right, all the available translations.
This year, the RNC actively collaborated with Total Dictation, an annual educational event that unites people who speak Russian and strive to write correctly.
On the day of the dictation, Vladimir Plungian shared his thoughts on why the RNC is necessary for both linguists and non-linguists, how it changes, and which years were the most productive in the history of the Corpus. Watch the recording of the conversation; it's informative and exciting.
The Old East Slavic corpus features fourteen new texts with a total size of 120,000 tokens, including such famous works of Old East Slavic literature as "Sermon on Law and Grace", "Daniel Zatochnik's Prayer", "Kyivan Cave Patericon", and the Slavic translation of "The Life of Basil the Younger". The corpus now includes multiple textual versions of several works, such as "Tale of Bygone Years", "Life of Theodosius", and the cycle on Boris and Gleb. More than 1,000 Old East Slavic lexemes have been added to the corpus, including the ancestors of such Russian words as выискивать, известие, избранник, пчелка, невежественный, стремглав, умышлять.
We continue to update the "Word at a Glance" feature. You can now see the "Similar Words" cloud and the word frequency in the Middle Russian corpus, as well as the "Similar Words" cloud in the Birchbark letters corpus.
Beta testing of the "Similar Words" cloud within the "Word at a Glance" feature continues. Thanks to your feedback, we were able to improve the vector model that finds similar words. We look forward to further feedback on the "Similar Words" clouds in the Main and Regional corpora and to your reaction to the "Similar Words" cloud in the Middle Russian corpus. You can leave feedback by clicking the "Rate" button next to the feature.
The five examples in the "Word at a Glance" feature are now selected at random, so each time you view it there is a chance to see something new.