News

05.03.2024

In February, we significantly improved the National media corpus.

It was expanded with new texts totaling 49.6 million tokens. These are print media from the 1990s ("Nezavisimaya Gazeta", including weekly supplements, "Moskovsky Komsomolets", and "St. Petersburg Vedomosti").

In all texts of the corpus, grammatical homonymy has been resolved automatically, and annotation of syntactic relations (available in search starting from the second token, after clicking "add condition") has been completed. As a result, all the latest functions already available in the Main and Regional media corpora, such as search by syntactic relations and properties, collocation search, the frequency dictionary, and the frequency of query results, are now also available in the National media corpus, the largest of the three.

The RNC Media corpus is now the world's largest Russian online corpus with the ability to search by syntactic relations!

In the subcorpus form, it is now possible to select texts by topic and type. These fields are annotated with a RuRoBERTa model that was further trained on Regional corpus data. Fields in the subcorpus form and in the text information whose values were generated by NeuroRNC are marked with a special icon. Automatic annotation may contain errors. There is a "Report an error" button in the text information pop-up window. Please inform us of any inaccuracies or errors in the assigned topics and types.

05.03.2024

The Syntactic Corpus now makes it possible to select a subcorpus by basic parameters, such as author, text title, date of creation, and author's year of birth, as well as by genre, text type, and markup date.

Follow our news on the website and social networks. In March, we will continue to improve the Syntactic Corpus!

13.02.2024

The Russian Classics corpus was expanded by more than 1 million tokens. Complete works by Alexander Radishchev and Ivan Krylov were added, as well as some texts by authors already represented in the corpus that had been omitted from the previous release. Diachronic graphs are available, queries can be compared, and the subcorpus can be customized by both genre and date. The search results can now be sorted by date of creation, author, and genre.

13.02.2024

The Corpus portrait now features diachronic statistics for the Main and Regional corpora. The distribution of corpus size and metatextual parameters by creation date is shown on a graph. Within the Regional corpus, the distribution of texts by country and region is also available diachronically.

To see the diachronic graphs, click the (i) button in the corpus header, select Statistics, and navigate to Diachronic statistics.
The user may specify smoothing, date range, and distribution (axis step). The choice is applied at once to all the graphs on the page.