The Russian National Corpus is a representative collection of texts in Russian, containing more than 2 billion tokens and supplied with linguistic annotation and search tools.
News
The Birchbark letters corpus now features 19 documents from Novgorod and Staraya Russa that were found in 2023. Together they contain more than 300 tokens. In addition, the texts and translations of previously found birchbark letters have been corrected. The corpus now includes such new words as ѣздець ('rider'), шида ('silk'), немочи ('to be ill'), and крута ('dowry'). Such common words as огородъ ('garden'), капуста ('cabbage'), and боꙗринъ ('boyar') were also encountered for the first time in the new documents.
The search options have been significantly improved, and new tags related to grammar and interpretation can be specified in the query form.
Leonid Leibovich Iomdin, an outstanding Russian linguist, a specialist in modern syntax and semantics, computational linguistics, and machine translation, and a leading researcher at the Institute for Information Transmission Problems of the Russian Academy of Sciences, died today at the age of 77.
Leonid Leibovich was an active participant in the Russian National Corpus project and one of the creators of the Syntactic Corpus within the RNC.
We offer our sincere condolences to the family and friends of Leonid Leibovich.
Personal accounts are now available on the Corpus website.
Their main purpose is to support users' individual workflows. You can now save queries (in any corpus) or comparisons between queries (in the corpora where this function is implemented) to your personal account and return to them when necessary.
To save a query or comparison, click the “Save query” button in the search results or “Save comparison” on the query comparison page. The corresponding tabs of your personal profile let you view the saved queries and comparisons, assign names to them, copy short links for sharing, and delete them. The number of queries and comparisons that can be saved is unlimited.
The profile settings have been expanded. Users can fill in their personal information (this data can only be seen by the user), change their password or delete their account. In the future, with the consent of the user, some of their data, such as name and affiliation, will become visible to others.
The personal account is available in both the desktop and the mobile versions.
The Old East Slavic corpus has been expanded with new texts and has grown by 43 thousand tokens. On the one hand, it now includes later texts of the 14th century (e.g., Ukrainian and Moscow business charters, the Pskovian Tale of Dovmont); on the other hand, the annotation of some early texts (the Tale of Bygone Years according to the Laurentian manuscript, hagiographies) has been expanded. The vocabulary of the corpus now includes the ancestors of such familiar words as naprasno ‘unexpectedly; (Modern Russian) in vain’, peremolvit'sa ‘exchange words’, šapka ‘headwear’, and raznoglasie ‘discord’.
Users can select a subcorpus and get statistics by standard criteria (including the date of the text and of the copy, text genre, and text size), and find out, for example, to what extent the protagonists of chronicles go somewhere more often than those of charters and tales. It is now possible to search for Greek lemmas and word forms in translated texts. Greek words can be typed on the virtual keyboard. For example, the word δόγμα (‘dogma’) is rendered by the Slavic translators not only as a direct borrowing but also as ‘command’, ‘doctrine’, or ‘statute’.
A new feature, “Word Forms”, is available in the Word at a glance function of the OES corpus. For Old East Slavic nouns in different orthographies, the paradigm of all the number and case forms encountered in the corpus is given, and it is possible to find out the frequency of these forms and follow the links to the word. For example, you can find out what forms the word drug ‘friend’ had. Some forms of the rarely used dual number are not yet attested in our corpus, so you should also consult grammars for the full paradigms.
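For readers who want to reproduce this kind of table from their own annotated data, the idea can be sketched roughly as follows. This is only an illustration, not the way the corpus itself computes the paradigm: the token list, the tag names, and the functions build_paradigm and unattested_cells are all hypothetical, and the counts are invented toy values rather than corpus figures.

```python
from collections import Counter

# Hypothetical annotated tokens: (lemma, word form, case, number).
# The tag set and the resulting counts are illustrative, not corpus data.
TOKENS = [
    ("другъ", "другъ", "nom", "sg"),
    ("другъ", "друга", "gen", "sg"),
    ("другъ", "другъ", "nom", "sg"),
    ("другъ", "друзи", "nom", "pl"),
    ("другъ", "другу", "dat", "sg"),
]

# Assumed inventory of paradigm cells (seven cases, three numbers).
CASES = ["nom", "gen", "dat", "acc", "ins", "loc", "voc"]
NUMBERS = ["sg", "du", "pl"]


def build_paradigm(tokens, lemma):
    """Map each attested (case, number) cell to its forms and their counts."""
    counts = Counter(
        (case, number, form)
        for lem, form, case, number in tokens
        if lem == lemma
    )
    paradigm = {}
    for (case, number, form), n in counts.items():
        paradigm.setdefault((case, number), {})[form] = n
    return paradigm


def unattested_cells(paradigm):
    """List the (case, number) cells with no attested form in the data."""
    return [(c, n) for n in NUMBERS for c in CASES if (c, n) not in paradigm]


paradigm = build_paradigm(TOKENS, "другъ")
for cell, forms in sorted(paradigm.items()):
    print(cell, forms)
print("unattested:", unattested_cells(paradigm))
```

In this toy data every dual cell comes out as unattested, which mirrors the caveat above: an empty cell says only that the form does not occur in the data, not that it did not exist.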