News

17.07.2024

The Middle Russian corpus has been enlarged by 500 thousand tokens. Texts of different genres and periods have been added, ranging from the Pskov gramoty of the fourteenth and fifteenth centuries, which survive in late copies, to the early documents of Peter the Great and treatises on rhetoric from the 1690s. In addition, the Commission copy of the Novgorod First Chronicle, collections of peasant petitions, and Muscovite diplomatic correspondence with Germany and the Crimean Tatars are now available for searching. The morphological analysis of the texts has been significantly upgraded: the corpus dictionary has grown by about 40,000 lexemes.

01.07.2024

More search results are now available in Frequency mode, and even more can be downloaded as a spreadsheet. This feature is of major importance for researchers who are interested not only in the most common variants but also in the broader picture. A spreadsheet (Excel or CSV) now shows the top 1000 most frequent query results. Up to 5000 output results with frequency data can be downloaded. Read more about this and other types of output in the User Guide.

When downloaded in the Excel format, the Info tab now shows the exact number of both found and downloaded documents and examples. Thus the user can evaluate the output results more accurately and interpret them correctly.

01.07.2024

Diachronic subcorpus statistics have been added to the Main and Regional corpora. You can now compare graphs showing how the size and composition of a subcorpus change over time with the same graphs for the whole corpus. For example, one can see that in the 19th century women authors wrote relatively more fiction than texts in other genres, and that the situation levels off in the 20th century.

You can set the distribution, dates, and smoothing of frequencies. To see the charts of diachronic statistics, you need to click on the (i) button in the subcorpus header, select the Statistics section, and navigate to the Diachronic statistics tab.
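The note above does not say which smoothing method the charts use. As a rough illustration of what smoothing does to a series of yearly frequencies, here is a minimal sketch assuming a simple centered moving average; the function name `moving_average`, the `window` parameter, and the sample values are all hypothetical and are not taken from the corpus.

```python
def moving_average(values, window=3):
    """Smooth a series with a centered moving average.

    Assumption: this is only an illustration of smoothing in general;
    the corpus charts may use a different method.
    """
    if window < 1 or window % 2 == 0:
        raise ValueError("window must be a positive odd number")
    half = window // 2
    smoothed = []
    for i in range(len(values)):
        # Near the edges the window is truncated to the available points.
        start = max(0, i - half)
        end = min(len(values), i + half + 1)
        chunk = values[start:end]
        smoothed.append(sum(chunk) / len(chunk))
    return smoothed


# Hypothetical yearly frequencies, for illustration only.
yearly = [10, 20, 60, 20, 10]
smoothed = moving_average(yearly, window=3)
```

A wider window flattens short-lived peaks more strongly, which makes long-term trends easier to see but hides year-to-year variation.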

From the tooltip, you can learn how to use the new charts and graphs and how to interpret the obtained results.

Earlier, in February, diachronic statistics for the Main and Regional corpora as a whole became available to users.

13.06.2024

In the Word at a glance widget, you can now examine the ratio of the number of occurrences of a word in a category to the size of that category, multiplied by a million (instances per million, or ipm). With this widget you can determine, for example, whether Leo Tolstoy really used the word мир ‘peace’/‘world’ more often than other classic Russian authors, taking into account the overall size of their texts. Another question: which Romantic poet mentions всадник ‘horseman’ more often, Lermontov or Pushkin?
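The ipm calculation described above can be sketched in a few lines. The function name `ipm`, the category labels, and all counts below are hypothetical and serve only to show why normalizing by category size matters; they are not real corpus data.

```python
def ipm(occurrences, category_size):
    """Instances per million: occurrences divided by category size, times 1,000,000."""
    if category_size <= 0:
        raise ValueError("category_size must be positive")
    return occurrences * 1_000_000 / category_size


# Hypothetical (occurrences, size in tokens) pairs, for illustration only.
counts = {
    "author_a": (120, 2_000_000),
    "author_b": (90, 1_000_000),
}

ipm_by_author = {
    name: ipm(occurrences, size) for name, (occurrences, size) in counts.items()
}
```

In this made-up example the first author has more raw occurrences, yet the second has the higher ipm because the second author's texts are half the size.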

The new chart is available in the Statistics widget of the Word at a glance. The user can select the meta-attribute for which the chart is to be plotted from the list of the most representative attributes of the corpus. To see a pie chart containing the exact number of contexts of a word in a category or the number of texts containing the searched word, one has to switch from ipm to words or texts.

In addition, the table in the “Statistics” mode now includes ipm values. By default, the table is sorted by the number of occurrences. To change the sorting criterion, click on the column name.