More search results are now available in Frequency mode, and even more can be downloaded as a spreadsheet. This matters for researchers who are interested not only in the most common variants but also in the broader picture. A spreadsheet (Excel or CSV) now shows the 1000 most frequent query results. Up to 5000 results with frequency data can be downloaded. Read more about this and other types of output in the User Guide.

When downloaded in the Excel format, the Info tab now shows the exact number of both found and downloaded documents and examples. Thus the user can evaluate the output results more accurately and interpret them correctly.


Diachronic subcorpus statistics have been added to the Main and Regional corpora. You can now compare graphs showing how the size and composition of a subcorpus change over time with the corresponding graphs for the whole corpus. For example, one can see that women authors in the 19th century wrote relatively more fiction than texts in other genres, and that the situation levels off in the 20th century.

You can set the distribution, dates, and smoothing of frequencies. To see the diachronic statistics charts, click the (i) button in the subcorpus header, select the Statistics section, and go to the Diachronic statistics tab.
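As a rough illustration of what a smoothing setting can do to a yearly frequency series, here is a minimal sketch that assumes smoothing is a centered moving average over a window of years. The function name `moving_average`, the window semantics, and the sample values are all hypothetical; the smoothing method actually used by the charts is not specified here.

```python
from typing import List, Sequence


def moving_average(values: Sequence[float], window: int) -> List[float]:
    """Centered moving average (assumed smoothing method, for illustration).

    `window` is the total number of points averaged; an odd value is assumed.
    Near the edges the window is shrunk to the points that exist.
    """
    if window < 1:
        raise ValueError("window must be at least 1")
    half = window // 2
    smoothed = []
    for i in range(len(values)):
        lo = max(0, i - half)                 # first index inside the window
        hi = min(len(values), i + half + 1)   # one past the last index
        chunk = values[lo:hi]
        smoothed.append(sum(chunk) / len(chunk))
    return smoothed


# Hypothetical yearly frequencies, not real corpus data.
yearly = [1, 2, 3, 4, 5]
print(moving_average(yearly, 3))
```

A larger window gives a flatter curve and hides short-lived peaks, while a window of 1 leaves the series unchanged.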

From the tooltip, you can learn how to use the new charts and graphs and how to interpret the obtained results.

Earlier in February, diachronic statistics of the Main and Regional corpora became available to users.


In the Word at a glance widget, you can now examine the number of occurrences of a word in a category divided by the size of that category and multiplied by one million (instances per million, or ipm). With this widget you can determine, for example, whether Leo Tolstoy really used the word мир ‘peace’/‘world’ more often than other classic Russian authors, taking into account the overall size of their texts. Another question: which Romantic poet mentions всадник ‘horseman’ more often, Lermontov or Pushkin?
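The ipm calculation itself is simple arithmetic: occurrences divided by category size, times one million. The sketch below shows why it matters when categories differ in size. The function name `ipm` and the counts are made up for illustration and are not taken from the corpus.

```python
def ipm(occurrences: int, category_size: int) -> float:
    """Instances per million: occurrences * 1,000,000 / category size in words."""
    if category_size <= 0:
        raise ValueError("category_size must be positive")
    return occurrences * 1_000_000 / category_size


# Hypothetical (occurrences, category size) pairs, not real corpus data.
counts = {
    "author_a": (150, 3_000_000),
    "author_b": (90, 1_200_000),
}

for name, (hits, size) in counts.items():
    print(f"{name}: {hits} occurrences, {ipm(hits, size):.1f} ipm")
```

In this made-up example the first author has more raw occurrences, but the second has the higher ipm (75.0 against 50.0), because the second author's texts are much smaller overall.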

The new chart is available in the Statistics widget of Word at a glance. The user can select the meta-attribute for which the chart is plotted from a list of the most representative attributes of the corpus. To see a pie chart with the exact number of contexts of a word in a category, or the number of texts containing the searched word, switch from ipm to words or texts.

In addition, ipm values now appear in the table in “Statistics” mode. By default, the table is sorted by the number of occurrences. To change the sorting criterion, click on the column name.


Sentiment annotation has been added to the Social Networks corpus. Texts with positive or negative sentiment can now be selected for research. Texts whose sentiment could not be determined are categorized as undefined.

Sentiment labeling in the “Social Networks” corpus was made possible by the Friends of NeuroRNC. They helped us collect data for the training dataset, which we used to train a neural network model and then label the texts of the corpus. The corresponding field in the subcorpus form and in the text information is marked with a special icon indicating that the values of the attribute were generated by NeuroRNC.

Errors may occur in the automatic sentiment annotation. If you find them, please let us know using the “Report an error” button in the text information. This will help us to improve the quality of the annotation.