RNC News

The Accentological corpus has been switched to the new interface. The "Word at a glance" function is available for it, and the corpus has been added to the "Get overview" section.

The “Word at a glance” function for the Main corpus has been updated:
In the "Morphemic structure" widget, alternative morphemic parses for different parts of speech are now shown separately. For example, the word печь is parsed differently as a noun and as a verb. To see the alternative structures, switch between parts of speech in the "Word at a glance" function.

The Poetry corpus has been switched to the new interface, and its "Word at a glance" service is available. Search results can be viewed in a mode with metric formulas, in which each line of a poem is given poetic annotation. The extended context shows the entire poem. A subcorpus of texts by multiple authors can be built on the respective page, where conditions on other meta-attributes can also be specified.

The "Authors" section in "Corpus portrait" displays a list of all authors featured in the Corpus. The list can be sorted alphabetically, by birth and death dates, and by the author's gender. The alphabetical filter allows the user to view only authors whose last name begins with a given Cyrillic letter. By selecting one author in the list, one can create a subcorpus of only that author's texts.

The "Random Poem" widget has appeared in the "Get overview" section: a random poetic example is selected for any word or phrase.

The tables below the "Graph by year" within the Main Corpus now show the number of texts and the number of examples in the search results and in the corpus as a whole.

In the Social networks corpus, wrong dates have been corrected and non-unique texts removed. The corpus has now become an effective tool for studying the diachrony of linguistic phenomena. It can be used to trace when language items became popular or went out of fashion (see хайп, превед, уметь во что-л.).
The corpus features a collection of social network texts prepared by the staff and students of Voronezh State University. It includes materials from the Big Voronezh Forum and other local networks in Voronezh, posts by well-known Voronezh bloggers, and discussions in local groups on popular platforms such as VK, Telegram and LiveJournal. In total, this collection contains about 22.8 million word uses. The texts of the Voronezh collection have more detailed metatextual annotation and cover a large timespan, from 2001 to 2023. In the future, materials from social networks of other Russian regions are planned to be added to the corpus.

The Word at a glance service in the Main Corpus has been enriched with data on word families. The new widget shows families of cognate words. For now, this option is only available for words with a single root (e.g. стол, but not пароход) that are manually annotated in the morphemic analysis dictionary. Data on other words will appear in the future, but even now you can see interesting connections between words.

As usual, there is a "Rate" button next to the new widget. Feel free to let us know if you notice any bugs. Thanks to your feedback, we keep improving the neurolinguistic models underlying the Word at a glance service. We are very interested in what you think about the first version of the word family model.

It has become possible to specify more precisely the conditions of lexico-grammatical search in the Main, National media and Regional corpora. One may set conditions on the distance between words in the search form. Until now, if the specified range included 0 (for example, from -1 to 1), a single token in the results could match both words specified. Now, at the top of the search form, you can select the "word matches excluded" option to remove the zero distance from the range. For example, you can find plural animate nouns conjoined with крестьяне ‘peasants’. Here is the resulting frequency list. Previously, a similar query would also find the word крестьяне alone, without its "neighbors" (since at the zero distance it matches all the conditions for a conjoined noun).
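The effect of the "word matches excluded" option can be illustrated with a small sketch. This is not the corpus search engine: the function name `find_pairs`, the token format and the tag names are hypothetical and chosen only for illustration, and the toy phrase uses a range from -2 to 2 so that the conjunction fits between the two nouns.

```python
def find_pairs(tokens, match_first, match_second, min_dist, max_dist,
               exclude_zero=False):
    """Return (i, j) index pairs where tokens[i] satisfies match_first,
    tokens[j] satisfies match_second and j - i lies in [min_dist, max_dist].
    With exclude_zero=True the distance 0 is dropped from the range, so a
    single token can no longer match both conditions at once."""
    pairs = []
    for i, token in enumerate(tokens):
        if not match_first(token):
            continue
        for dist in range(min_dist, max_dist + 1):
            if exclude_zero and dist == 0:
                continue
            j = i + dist
            if 0 <= j < len(tokens) and match_second(tokens[j]):
                pairs.append((i, j))
    return pairs


# Toy annotated phrase "рабочие и крестьяне" (hypothetical tag names).
tokens = [
    {"form": "рабочие", "tags": {"noun", "pl", "anim"}},
    {"form": "и", "tags": {"conj"}},
    {"form": "крестьяне", "tags": {"noun", "pl", "anim"}},
]


def is_krestyane(token):
    return token["form"] == "крестьяне"


def is_plural_animate_noun(token):
    return {"noun", "pl", "anim"} <= token["tags"]


# Old behaviour: the range includes 0, so крестьяне also pairs with itself.
print(find_pairs(tokens, is_krestyane, is_plural_animate_noun, -2, 2))
# [(2, 0), (2, 2)]

# With zero distance excluded, only the real neighbour remains.
print(find_pairs(tokens, is_krestyane, is_plural_animate_noun, -2, 2,
                 exclude_zero=True))
# [(2, 0)]
```

In the sketch, the pair (2, 2) is the case described above: the word крестьяне on its own satisfies both conditions at distance zero.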

Subscribe to our Telegram channel to follow our updates and receive illustrated corpus instructions.

There are several upgrades in the Word at a Glance service:

  • New sketches have been added: coordinated nouns, adjectives, verbs, and adverbs. 
  • In the Main corpus, all sketches now link to concordances of the respective collocations.
  • Thanks to feedback from keen users, the morphemic analysis has been updated. Please keep providing feedback if you notice any bugs by using the "Rate" button.

Handling of speaker information in the Spoken corpus has been upgraded. Names of speakers and film characters are now highlighted in the search results. Detailed information on sociological parameters is shown in a pop-up window that opens when you click the name.

When downloading search results into an Excel file, an additional sheet called Info now displays data on the parameters of the respective query and also has a link to the query itself.

The Word at a glance service continues to evolve. A new widget has appeared for nouns in the Main corpus; it shows the forms of the word that occur in the corpus more than 5 times. For the same noun form (case + number), different variants and/or spellings can be displayed if they occur in the corpus. Since the Main corpus is annotated automatically, you may find some forms incorrectly assigned to the word you are looking for. If you notice such discrepancies, please report them to us using the "Rate" button.
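The selection rule behind such a widget can be sketched as follows. This is only an illustration under stated assumptions, not the corpus implementation: the function `frequent_forms`, the input format (one `(case, number, form)` tuple per occurrence) and the counts in the toy data are all invented.

```python
from collections import Counter


def frequent_forms(occurrences, min_count=5):
    """Group attested forms by (case, number) and keep only the forms
    that occur MORE than min_count times."""
    counts = Counter(occurrences)
    table = {}
    for (case, number, form), n in counts.items():
        if n > min_count:
            table.setdefault((case, number), []).append((form, n))
    # Within each cell, list the most frequent variant first.
    for variants in table.values():
        variants.sort(key=lambda item: -item[1])
    return table


# Invented counts: two genitive plural variants pass the threshold,
# a rare misspelling does not.
occurrences = (
    [("gen", "pl", "килограммов")] * 8
    + [("gen", "pl", "килограмм")] * 6
    + [("gen", "pl", "килограмов")] * 2
)

print(frequent_forms(occurrences))
# {('gen', 'pl'): [('килограммов', 8), ('килограмм', 6)]}
```

Because several variants can share one (case, number) cell, each cell holds a list rather than a single form.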

All of the corpora available in the new interface now support navigation between the pop-up windows with word-by-word analysis and the "Word at a glance" service.

The Spoken corpus has migrated to the new interface. The "Word at a glance" and "Get overview" services are now available for it.

In the Word at a glance service, the morphemic structure of each word is visualized: prefixes, roots, suffixes and endings are highlighted using the graphic symbols adopted in Russian school teaching. The word structure annotation is based on the morphemic dictionary specially developed for the corpus. For lemmas that are absent from the morphemic dictionary, automatic annotation is added by the NeuroRNC algorithm. Please note that the morphemic structuring of words may differ from what you are accustomed to (see "Principles of annotation").

Errors of automatic annotation are always possible. Please report errors using the "Rate" button.

The multilingual parallel corpus is available in the new interface, as well as within the Word at a glance and Get overview services. Now all the parallel corpora are available in the new interface.

For the Old East Slavic corpus, the Word at a glance service and Word frequency widget are available.

The Poetry corpus has been expanded by 400,000 word uses. In particular, new texts by twentieth-century poets have been added, as well as a large collection of Russian translations of ancient poetry, including hexametric versions of the "Iliad", the "Aeneid" and Horace's "Satires".

All the parallel bilingual corpora are now available in the new interface.

The interface of the Old East Slavic corpus has been substantially updated, and the corpus is now connected to the Overview feature. The selection of a subcorpus within the Old East Slavic corpus is now available on a separate page. You can select one or more Slavic literary monuments from a list to be searched.

In the collocations search, the user can specify syntactic links. For example, if a user specifies решение 'solution' as the key, "verb" as a grammatical feature of the collocate, "object" as a syntactic role, and the second word as a dependent, they can find out what is most often done with solutions (принять 'accept', согласовать 'agree', etc.). The table with the search results will show the 100 most frequent collocations with this syntactic relationship. For each of these collocations you can access a list of examples by clicking on the link.
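The kind of ranking described here can be sketched over a list of dependency edges. The sketch assumes that the key noun is the dependent of a verb through an object relation, which is one reading of the settings above; the function `top_collocates`, the edge format, the labels `VERB` and `obj`, and the toy counts are hypothetical and are not the corpus's actual data model.

```python
from collections import Counter


def top_collocates(edges, key_lemma, head_pos, relation, limit=100):
    """Count the heads that govern key_lemma through the given relation
    and return the `limit` most frequent ones with their counts.
    Each edge is a (head_lemma, head_pos, relation, dependent_lemma) tuple."""
    counts = Counter(
        head
        for head, pos, rel, dependent in edges
        if dependent == key_lemma and pos == head_pos and rel == relation
    )
    return counts.most_common(limit)


# Invented dependency edges for illustration.
edges = (
    [("принять", "VERB", "obj", "решение")] * 3
    + [("согласовать", "VERB", "obj", "решение")] * 2
    + [("принять", "VERB", "obj", "закон")]
    + [("важный", "ADJ", "amod", "решение")]
)

print(top_collocates(edges, "решение", "VERB", "obj"))
# [('принять', 3), ('согласовать', 2)]
```

The default `limit=100` mirrors the size of the results table mentioned above; edges with another dependent or another relation are simply ignored.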

Users of the Main corpus can now get frequency dictionaries by major parts of speech: nouns, adjectives, verbs, and adverbs. The same selection is available in the subcorpus frequency dictionary as well. You can now specify the part of speech when comparing the most frequent lemmas of your selected subcorpus with the frequency dictionary of the whole corpus.

The parallel corpora have started migrating to the new interface. As of the end of April, the following corpora are available in it:

For each bilingual pair, within the search form you can select any of three options: exact forms search, lexico-grammatical search or bilingual search. An important innovation is that in the new interface, the bilingual search is available on the main search page rather than on a new one. Queries in Russian and other languages are entered in two different query forms. The search results are formatted in two columns. This layout is already familiar to the users of the Birchbark letters corpus. On the left you see the original, and on the right, all the available translations.

This year, the RNC actively collaborated with Total Dictation, an annual educational event that unites people who speak Russian and strive to write correctly. 

On the day of the dictation, Vladimir Plungian shared his thoughts on why the RNC is necessary for both linguists and non-linguists, how it changes, and which years were the most productive in the history of the Corpus. Watch the recording of the conversation; it's informative and exciting.