RNC News
The Russian National Corpus is a powerful tool for analyzing and researching language. It contains millions of texts that help users better understand the language in all the diversity of its forms. One of the most important aspects of working with the corpus is analyzing statistical data.
The summary statistics of the RNC are available from the main page. This section contains data on the size of the corpora included in the RNC (texts, sentences, and tokens), as well as tables showing the distribution of texts in the Main corpus by type and other metatextual parameters.
By clicking on a corpus name in the table, you can navigate to the statistics in the Corpus portrait of the selected corpus. You can also reach the corpus statistics from the query form by clicking on the (i) icon. Corpus statistics are currently available for the Main, Educational, and Media corpora, some historical corpora, as well as “Russian Classics” and “From 2 to 15”.
In corpora with advanced statistics, you can compare a custom subcorpus with the entire corpus. To view the comparison, click on the (i) icon in the subcorpus header.