RNC News

To see all the information on a given word in one place, you can now use the Word at a glance feature (the Word Portrait). As of today, the Word Portrait includes:

  • grammatical and semantic properties of the word
  • similar words (β, main corpus only)
  • word usage examples in the corpus
  • distribution of examples by year and by type of text

For quick access to the Word Portrait and other corpus features, as well as to the User's Guide, you can now use the buttons on the main page of ruscorpora.ru.

The Frequency output view has been improved:

  • The "Contexts" column has been added
  • Grouping of results can be either disabled or applied to some words only. Users can retrieve combinations of words with any distance between them (within the distance specified in the original query). Some of the words can be grouped by lemma, word form or grammatical features, while the rest are retrieved without grouping. For example, for the query красивый (‘beautiful’) + any noun, one can get the frequency distribution of all nouns found in the search results as well as the overall frequency of the combination with any noun.
  • The size of the downloaded table with "raw" data can reach 5000 lines
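
The grouping described above can be pictured with a small sketch. It is only an illustration, not the corpus implementation: the list `hits` and its contents are invented sample data, and grouping is done by noun lemma only.

```python
from collections import Counter

# Hypothetical search hits for the query "красивый + any noun".
# Each hit is (adjective lemma, noun lemma); the data is invented.
hits = [
    ("красивый", "дом"),
    ("красивый", "девушка"),
    ("красивый", "дом"),
    ("красивый", "город"),
    ("красивый", "девушка"),
    ("красивый", "дом"),
]

# Grouping applied to the noun only: frequency distribution of noun lemmas.
per_noun = Counter(noun for _, noun in hits)

# Grouping disabled for the noun: one overall count for "красивый + any noun".
overall = len(hits)

for noun, count in per_noun.most_common():
    print(f"красивый + {noun}: {count}")
print(f"красивый + any noun: {overall}")
```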

The frequency dictionary of a subcorpus, as compared to the entire corpus, can be sorted by differences in lemma ranks. The lemmas that appear only in the subcorpus's list of the 500 most frequent items are given first, followed by those with the highest gain in frequency rank relative to the entire corpus. For example, the frequency dictionary of texts written by women can be sorted so that it starts with characteristic lemmas such as девочка (‘girl’), стараться (‘try’), проблема (‘problem’), искусство (‘art’), etc.
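
One possible reading of this sort order is sketched below. It is an assumption-laden illustration, not the corpus code: the function names `rank_map` and `sort_by_rank_gain` are hypothetical, "found only in the subcorpus top 500" is interpreted as "in the subcorpus top N but not in the whole-corpus top N", and rank gain is taken as whole-corpus rank minus subcorpus rank.

```python
def rank_map(freqs, top_n):
    """Map each of the top_n most frequent lemmas to its rank (1 = most frequent)."""
    ordered = sorted(freqs, key=lambda lemma: (-freqs[lemma], lemma))
    return {lemma: i + 1 for i, lemma in enumerate(ordered[:top_n])}


def sort_by_rank_gain(sub_freqs, corpus_freqs, top_n=500):
    """Order subcorpus lemmas by how much they gain in rank over the whole corpus."""
    sub_rank = rank_map(sub_freqs, top_n)
    corpus_rank = rank_map(corpus_freqs, top_n)

    # Assumption: lemmas absent from the whole-corpus top list come first,
    # ordered by their subcorpus rank.
    only_in_sub = [l for l in sub_rank if l not in corpus_rank]
    only_in_sub.sort(key=lambda l: sub_rank[l])

    # The rest are ordered by rank gain, largest gain first.
    shared = [l for l in sub_rank if l in corpus_rank]
    shared.sort(key=lambda l: (-(corpus_rank[l] - sub_rank[l]), sub_rank[l]))

    return only_in_sub + shared


# Invented frequencies, with a tiny top_n so the effect is visible.
sub = {"девочка": 50, "и": 40, "стараться": 30, "в": 5}
corpus = {"и": 1000, "в": 900, "стараться": 100, "девочка": 10}
print(sort_by_rank_gain(sub, corpus, top_n=3))
```

With this toy data, девочка is listed first because it is in the subcorpus top 3 but not in the whole-corpus top 3.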

A new corpus, "Russian classics", is available. It includes poetry, prose, journalism and letters by Russian classical writers of the 19th and early 20th centuries (Pushkin, Baratynsky, Gogol, Tolstoy, Turgenev, Chekhov and others), taken from representative academic editions. A significant part of these texts is also included in the Main or Poetic corpus. The corpus is currently in beta ("Russian classics β"); new authors and works will be added later. The size of the corpus is more than 17.5 million tokens.

The interface of the Birchbark letter corpus has been substantially updated, and the corpus is now connected to the Overview feature. Early Old East Slavic lemmas are available for search (not only слати, but also сълати ‘send’). An important innovation is that the original and translated texts are now shown in two columns, and the translation to display (Russian or either of the two English variants) can be chosen.
