RNC News

The Multimedia corpus has been switched to the new interface. The corpus search has been redesigned, and the "Word at a Glance" service is now available.

A special feature of the corpus is the multimedia search functionality. Three search queries may be specified simultaneously: for words, gestures and speech acts. The search returns clips in which both the video/audio and the text match these queries: for example, clips with the preposition за in speech and тост (toast) as the gesture.

Please note that by default the form hides some of the conditions for words, gestures and speech acts. You can add these conditions by clicking the “Add Conditions” button.

For example, to find clips in which somebody moves their head in a certain way, you need to add two conditions in the Gestures section: for the Main organ and for the Gesture direction. Then you may specify, respectively, голова (head) and из стороны в сторону (side to side) as features. To find clips where people whisper, please add a condition for the Manner of speech to the search form and select the value шепот (whisper).

Additionally, you can set conditions describing the vocalic and orthoepic structure of words.

The annual update of the Birchbark letters corpus has been released. Fifteen medieval letters found in Veliky Novgorod and Staraia Russa last year, along with two more that had been awaiting academic publication since 2021, are now available simultaneously in the RNC and in the gramoty.ru database. The work on the corpus of birchbark letters was supported by a grant from the Russian Science Foundation (Project 19-18-00352 "Vernacular writing of Old Rus' of the 11-15 centuries (birchbark letters and epigraphy): new sources and research methods"). An illustrated story about last year's finds can be read on the Arzamas website.

We have updated the Educational Corpus with over 1,000 new texts. It now contains all the major works from the school curriculum, including those recommended for extracurricular reading.

But that's not all. For morphological annotation of all texts we have applied neural network models. The automatic annotation has resolved grammatical homonymy, allowing us to add modern tools for analyzing words and texts to the Educational Corpus.

The Word at a Glance function shows collocations, similar words, frequency of use, paradigms and history of use, as well as examples from corpus texts. You can use the Query comparison to compare the frequency of use of words and phrases. 

You can also analyze texts. For this purpose, there is the Corpus portrait section, which provides information about the history of creation and composition of the corpus, as well as statistics and frequency vocabulary. With the Subcorpus portrait section, one can analyze the features of selected texts and compare them with the rest of the corpus.

With the new tools, it is possible to compose more diverse assignments for students. Students can also use them for independent research, for example, for writing an essay. For those teachers and students who are ready to conduct more complex research, we have added new types of search results output (Statistics, Frequency, N-grams) and a new type of search – Collocation Search.

In the new version of the Word at a Glance function within the Main Corpus, the "families" of cognate words have been supplemented by the NeuroRNC neural network model. For example, for the word актер, all the cognates except актриса and киноактриса were found by NeuroRNC. Also, if NeuroRNC finds at least 5 words with the same root, we add a new family of cognate words to the Word at a Glance function, even if the word you are looking for is not in the morphemic parsing dictionary. A very impressive word portrait was obtained for the word эстет using neural network algorithms alone.
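The "at least 5 words with the same root" rule can be illustrated with a minimal sketch. It assumes the model's output is available as a mapping from word to predicted root; the function name, the data format and the toy word list are hypothetical and do not describe the actual NeuroRNC pipeline.

```python
from collections import defaultdict

# Threshold taken from the announcement: a family is added
# when at least 5 words share the same root.
MIN_FAMILY_SIZE = 5


def build_families(predicted_roots, min_size=MIN_FAMILY_SIZE):
    """Group words by predicted root and keep only large enough groups.

    predicted_roots: dict mapping word -> root (hypothetical format).
    Returns a dict mapping root -> alphabetically sorted list of words.
    """
    by_root = defaultdict(set)
    for word, root in predicted_roots.items():
        by_root[root].add(word)
    return {
        root: sorted(words)
        for root, words in by_root.items()
        if len(words) >= min_size
    }


# Toy input, invented for illustration only.
toy_predictions = {
    "эстет": "эстет",
    "эстетика": "эстет",
    "эстетизм": "эстет",
    "эстетский": "эстет",
    "эстетство": "эстет",
    "актер": "акт",
    "актриса": "акт",
}

families = build_families(toy_predictions)
```

In this toy run the five words sharing the root эстет form a family, while the two-word group for акт stays below the threshold and is not added.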

To help our users interpret the results of search queries, we have matched each semantic markup tag with a name in Russian and English. Now, in the pop-up window in the search results and in the Word at a glance function, you can see supernatural beings, substances and materials or positive evaluation instead of conventional tags such as t:hum:supernat, t:stuff, or ev:posit.

Long-time users will remember that the Main Corpus of the RNC has had an option to compare exact search queries by plotting them on a single chart. Now the Corpus offers extended functionality for comparing query results:

  • It is possible to compare search queries of different types, for example, the results of two lexico-grammatical queries. This is how we managed to find out when we started saying не более чем instead of не более как.
  • Different subcorpora can be customized for different queries, e.g. comparing different authors or text types.
  • All comparisons are made within a single corpus, and the functionality is now available in almost all corpora that use the new interface. Here is a historical accent study we managed to conduct in the Poetry corpus.

To access the new comparison functionality, users need to log in with a login and password. This is necessary so that a large number of query parameters can be stored and a previously saved comparison can be reopened.

The Accentological corpus has been switched to the new interface. The "Word at a glance" function is available. The corpus now appears in the “Get overview” section.

The “Word at a glance” function for the Main corpus has been updated:
In the "Morphemic structure" widget, alternative morphemic parses for different parts of speech are now shown separately. For example, the word печь is parsed differently as a noun and as a verb. To see the alternative structures, switch between parts of speech in the “Word at a glance” function.

The Poetry corpus has been switched to the new interface, and the "Word at a glance" service is available for it. The user can view the search results in a mode with metric formulas, where each line of a poem is provided with poetic annotation. When the extended context is shown, the entire poem can be viewed. A subcorpus of texts by multiple authors can be customized on the respective page, and conditions on other meta-attributes can be specified there.

The "Authors" section in "Corpus portrait" displays a list of all authors featured in the Corpus. The list can be sorted alphabetically, by birth and death dates, and by gender of the author. The alphabetical filter allows the user to view only authors whose last name begins with a given Cyrillic letter. By selecting one author in the list, one can create a subcorpus of only that author's texts.

The "Random Poem" widget has appeared in the "Get overview" section: a random poetic example is selected for any word or phrase.

The tables below the "Graph by year" within the Main Corpus now show the number of texts and the number of examples in the search results and in the corpus as a whole.

In the Social networks corpus, wrong dates have been corrected and non-unique texts removed. The corpus has now become an effective tool for studying the diachrony of linguistic phenomena. It is possible to trace the chronology of language items becoming popular or going out of fashion (see хайп, превед, уметь во что-л.).
The corpus features a collection of social network texts prepared by the staff and students of Voronezh State University. It includes materials from the Big Voronezh Forum and other local networks in Voronezh, posts by well-known Voronezh bloggers, and discussions in local groups on popular platforms such as VK, Telegram, LiveJournal and others. In total, this collection contains about 22.8 million word uses. The texts of the Voronezh collection have more detailed metatextual annotation and cover a large timespan, 2001-2023. In the future, materials from social networks of other Russian regions are planned for inclusion in the corpus.

The Word at a glance service in the Main Corpus has been enriched with data on word families. The new widget shows families of cognate words. For now, this option is only available for words with a single root (e.g. стол, but not пароход) that are manually annotated within the morphemic analysis dictionary. Data on other words will appear in the future, but even now you can see interesting connections between words.

As usual, there is a "Rate" button next to the new widget. Feel free to let us know if you notice any bugs. Thanks to your feedback, we keep improving the neurolinguistic models underlying the Word at a glance service. We are very interested in what you think about the first version of the word family model.

It has become possible to specify more precisely the conditions of lexico-grammatical search in the Main, National media and Regional corpora. One may set conditions on the distance between words in the search form. Until now, if the specified range included 0 (for example, from -1 to 1), a single token in the results could match both words specified. Now, at the top of the search form, you can select the "word matches excluded" option to remove the zero distance from the range. For example, you can find plural animate nouns conjoined with крестьяне ‘peasants’. Here is the resulting frequency list. Previously, a similar query would also find the word крестьяне alone, without its "neighbors" (since at the zero distance it matches all the conditions for a conjoined noun).
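The effect of the "word matches excluded" option can be shown with a minimal sketch. It assumes a text is a plain list of tokens and that each of the two search conditions is a simple predicate on a token; the function and parameter names are hypothetical and are not part of the RNC search engine.

```python
def match_pairs(tokens, first_ok, second_ok, lo, hi, exclude_same_token=False):
    """Find index pairs (i, j) where tokens[i] meets the first condition,
    tokens[j] meets the second one, and lo <= j - i <= hi.

    With exclude_same_token=True, distance 0 is dropped from the range,
    so one token can no longer satisfy both conditions at once.
    """
    pairs = []
    for i, token in enumerate(tokens):
        if not first_ok(token):
            continue
        for distance in range(lo, hi + 1):
            if distance == 0 and exclude_same_token:
                continue
            j = i + distance
            if 0 <= j < len(tokens) and second_ok(tokens[j]):
                pairs.append((i, j))
    return pairs


# Toy data: a stand-in for "plural animate noun" is membership in a small set.
tokens = ["рабочие", "крестьяне", "солдаты"]
plural_animate = {"рабочие", "крестьяне", "солдаты"}


def is_target(token):
    return token == "крестьяне"


def is_plural_animate(token):
    return token in plural_animate


# Old behaviour: the range -1..1 includes 0, so крестьяне also matches itself.
with_zero = match_pairs(tokens, is_target, is_plural_animate, -1, 1)

# New option: the zero distance is removed from the range.
without_zero = match_pairs(
    tokens, is_target, is_plural_animate, -1, 1, exclude_same_token=True
)
```

Here `with_zero` contains the pair (1, 1), in which крестьяне is matched against itself, while `without_zero` keeps only the pairs with its actual neighbors.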

Subscribe to our Telegram channel to follow our updates and receive illustrated corpus instructions.

There are several upgrades in the Word at a Glance service:

  • New sketches have been added: coordinated nouns, adjectives, verbs, and adverbs. 
  • In the Main corpus, every sketch now links to the concordances of the respective collocations.
  • Thanks to feedback from keen users, the morphemic analysis has been updated. Please keep providing feedback if you notice any bugs by using the "Rate" button.

Handling of speaker information in the Spoken corpus has been upgraded. Names of speakers and film characters are now highlighted in the search results. Detailed information on sociological parameters is shown in a pop-up window that opens when you click on the name.

When downloading search results into an Excel file, an additional sheet called Info now displays data on the parameters of the respective query and also has a link to the query itself.

The Word at a glance service continues to evolve. A new widget has appeared for nouns in the Main corpus; it shows the forms of the word that occur in the corpus more than 5 times. For the same noun form (case + number), different variants and/or spellings can be displayed if they occur in the corpus. Since the Main corpus is annotated automatically, you may find some forms incorrectly assigned to the word you are looking for. If you notice such discrepancies, please report them to us using the "Rate" button.
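A minimal sketch of how such a table of forms could be assembled is given below. It assumes that each corpus occurrence is reduced to a (case, number, spelling) triple and that the "more than 5 times" threshold is applied to each spelling separately; both points, as well as the function name and the toy counts, are assumptions made for illustration and not a description of the actual widget.

```python
from collections import Counter, defaultdict


def forms_table(occurrences, min_count=5):
    """Group attested spellings by (case, number).

    occurrences: iterable of (case, number, spelling) triples (hypothetical format).
    Only spellings seen more than min_count times are kept.
    Variants within one cell are ordered by descending frequency.
    """
    counts = Counter(occurrences)
    table = defaultdict(list)
    for (case, number, spelling), n in counts.items():
        if n > min_count:
            table[(case, number)].append((spelling, n))
    for variants in table.values():
        variants.sort(key=lambda item: (-item[1], item[0]))
    return dict(table)


# Toy counts, invented for illustration only.
toy_occurrences = (
    [("ins", "sg", "рукой")] * 8
    + [("ins", "sg", "рукою")] * 6
    + [("nom", "sg", "рука")] * 7
    + [("gen", "pl", "рук")] * 5  # exactly 5: not "more than 5", so dropped
)

table = forms_table(toy_occurrences)
```

In this toy run the instrumental singular cell holds two variants, рукой and рукою, and the genitive plural is left out because it occurs exactly 5 times.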

All of the corpora available in the new interface now support navigation between the pop-up windows with word-by-word analysis and the "Word at a glance" service.

The Spoken corpus has migrated to the new interface. The "Word at a glance" and "Get overview" services are now available for it.