News

15.04.2024

We continue to roll out functionality already available in the advanced corpora, such as Main, Media, and Learning, to other corpora. An improved version of the “From 2 to 15” corpus is now available to users of the RNC. All texts in the corpus feature resolved grammatical homonymy and syntactic annotation. Syntactic relations search and collocation search are now available, as well as new output types such as frequency, n-grams, and statistics.

The Word at a Glance function has been updated, and new types of sorting by context have been added.
In Word at a Glance you can see that the words мама 'mom' and папа 'dad' are used much more often in texts for children aged 7-8, while the words бабушка 'grandma' and дедушка 'grandpa' have an equal frequency rating for children aged 7-8 and for teenagers aged 14-15.

The bar next to each fragment, which indicates the age of readers who should be able to understand it, is now clickable. Clicking it shows the calculated classical readability indices: the Flesch-Kincaid Index, the Coleman-Liau Index, the Automated Readability Index, the Simple Measure of Gobbledygook (SMOG), and the Dale-Chall readability formula.
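To give a sense of what such indices measure, here is a minimal sketch of two of them, the Automated Readability Index and the Coleman-Liau Index, which depend only on character, word, and sentence counts. It uses the standard coefficients published for English; how the RNC tokenizes text and whether it uses coefficients adapted for Russian is not stated here, so the function names, the regex-based tokenization, and the coefficients are all assumptions made for illustration.

```python
import re


def text_counts(text):
    """Return (characters, words, sentences) using a naive regex tokenizer.

    Assumption: sentences end with '.', '!' or '?', and a word is any run
    of word characters (letters, digits, underscore). A real corpus
    pipeline would use proper tokenization.
    """
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"\w+", text)
    characters = sum(len(w) for w in words)
    if not words or not sentences:
        raise ValueError("text must contain at least one word and one sentence")
    return characters, len(words), len(sentences)


def automated_readability_index(text):
    """ARI with the standard English coefficients (an assumption for Russian)."""
    characters, words, sentences = text_counts(text)
    return 4.71 * characters / words + 0.5 * words / sentences - 21.43


def coleman_liau_index(text):
    """Coleman-Liau Index with the standard English coefficients."""
    characters, words, sentences = text_counts(text)
    letters_per_100_words = characters / words * 100
    sentences_per_100_words = sentences / words * 100
    return 0.0588 * letters_per_100_words - 0.296 * sentences_per_100_words - 15.8


if __name__ == "__main__":
    sample = "The cat sat. The dog ran."
    print(round(automated_readability_index(sample), 2))
    print(round(coleman_liau_index(sample), 2))
```

Both scores rise with longer words and longer sentences, which is why short, simple fragments map to younger reader ages.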

15.04.2024

In anticipation of the 20th anniversary of the National Corpus, we have significantly updated the publications page on our website. The list of publications about the Corpus has grown about fivefold. The section now includes both academic articles and other types of publications, such as interviews, instructions, and social media posts.

The publications page now offers advanced functionality: you can find a publication about the Russian National Corpus using the search bar or the filters on the right.

By default, the most popular filters are shown to the user. To see all available filters on the publications page, click "Show all". Combining multiple filters narrows the search and allows publications to be selected using multiple criteria.

Some publications can be downloaded by clicking on the icon to the right of the title. Other publications open in a separate window. You can share the list of selected publications by clicking on the "Copy link" button.

02.04.2024

Two new parallel corpora are available. The Japanese-Russian language pair has more than 400 thousand tokens and includes fiction and news texts translated from Japanese. The Khakas-Russian texts, prepared for the RNC on the basis of the Electronic Corpus of the Khakas Language, contain more than 1 million tokens and cover folklore (including 19th-century records), written fiction, and journalism.

The existing parallel corpora have also been expanded. The Portuguese pair (now 1.6 million tokens) and the Czech pair (4.3 million tokens) have grown the most.

01.04.2024

New widgets are now available in Word at a Glance for the National Media, Educational and Russian classics corpora.

Sketches, Word frequency and Similar words have appeared in the National Media and the Russian classics corpora. Since Word at a Glance is built from the corpus texts, the sketches and similar words for the same word differ from corpus to corpus. For example, in the texts of the National Media corpus, шутка ('joke') is most often злой ('evil') and первоапрельский ('April Fools’'), while in the works of Russian classics it is колкий ('sharp') and забавный ('funny').

The Statistics widget in all three corpora has been updated. Follow the link to find out in which types of texts in the Russian classics corpus the word "anecdote" is used most often.