RNC News

The search interface for the media corpora, both national and regional, has been updated. Media corpora are now suggested when the «Feature overview» function is activated, and their descriptions in Russian and English have been redesigned and updated.

The following changes have been made to the new interface of ruscorpora.ru:

On the home page, by clicking on the «All corpora» link, you can now open a full list of 38 corpora (including all the bilingual tiers of the parallel corpus, all the historical corpora, etc.). You can go to the search form for any corpus by clicking on its name.
The «Statistics» page also has a full list of corpora with data on the number of texts, sentences, and tokens.

The search and subcorpus selection forms for all corpora transferred to the new interface have been improved. The «Lemmas and tags» search form is expanded by default; if desired, the user may expand the «Exact search» query bar. The lemma entry field is displayed first in the query form. When selecting a subcorpus, an option is provided to select the date range of the corpus release.

Using the menu on the Search button, the user can now select their preferred type of output (concordance, KWIC, graphs, n-grams). This choice is remembered and used by default from then on.

Clicking a word opens a popup window in which «Similar words» are displayed. These are words that are semantically close to the word in question and are used in similar contexts. The closeness coefficient shown in brackets is calculated using distributional semantics models. These models are built on the main corpus of the RNC and are provided by the RusVectōrēs project. Read more about this experiment here.
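As a rough illustration of how a closeness coefficient of this kind can be obtained, the sketch below ranks words by cosine similarity between word vectors. This is an assumption for illustration only: the announcement does not say which measure is used, and the three-dimensional vectors, the English words, and the function names `cosine_similarity` and `most_similar` are all hypothetical, not taken from the RNC or RusVectōrēs.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def most_similar(word, vectors, top_n=3):
    """Return up to top_n (word, coefficient) pairs, closest first."""
    target = vectors[word]
    scored = [
        (other, cosine_similarity(target, vec))
        for other, vec in vectors.items()
        if other != word
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_n]

# Hypothetical toy vectors; real models use hundreds of dimensions
# learned from corpus co-occurrence statistics.
toy_vectors = {
    "river": [0.9, 0.1, 0.0],
    "stream": [0.8, 0.2, 0.1],
    "bank": [0.5, 0.5, 0.2],
    "loan": [0.0, 0.9, 0.4],
}

for similar_word, coefficient in most_similar("river", toy_vectors):
    print(f"{similar_word} ({coefficient:.2f})")
```

Words that occur in similar contexts end up with vectors pointing in similar directions, so their coefficient is close to 1, while unrelated words score near 0.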

The remaining corpora will be transferred gradually to the new interface and platform. Feel free to use the new version of the site and report any bugs you notice.