News

15.09.2023

The Accentological corpus has been switched to the new interface. The "Word at a glance" function is now available for it, and the corpus has been added to the "Get overview" section.

The "Word at a glance" function for the Main corpus has been updated:
In the "Morphemic structure" widget, alternative morphemic parses for different parts of speech are now shown separately. For example, the word печь is parsed differently as a noun and as a verb. To see the alternative structures, switch between parts of speech in the "Word at a glance" function.

25.08.2023

The Poetry corpus has been switched to the new interface, and the "Word at a glance" service is now available for it. Search results can be viewed in a mode with metric formulas, in which each line of a poem is shown with its poetic annotation. The extended context displays the entire poem. On the subcorpus page, the user can build a subcorpus from texts by multiple authors and set conditions on other meta-attributes.

The "Authors" section in "Corpus portrait" displays a list of all authors featured in the corpus. The list can be sorted alphabetically, by birth and death dates, and by the author's gender. The alphabetical filter shows only authors whose last name begins with a given Cyrillic letter. Selecting an author in the list creates a subcorpus containing only that author's texts.
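The sorting and filtering described above can be illustrated with a minimal Python sketch. It is not the corpus implementation: the `Author` record, the `sort_authors` and `filter_by_initial` helpers, and the sample records are all hypothetical and serve only to show the behaviour.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class Author:
    """Hypothetical author record; field names are assumptions."""
    last_name: str
    first_name: str
    born: int
    died: int
    gender: str  # "f" or "m"


# One sort key per sortable column mentioned in the news item.
SORT_KEYS: Dict[str, Callable[[Author], object]] = {
    "name": lambda a: (a.last_name, a.first_name),
    "born": lambda a: a.born,
    "died": lambda a: a.died,
    "gender": lambda a: a.gender,
}


def sort_authors(authors: List[Author], by: str = "name") -> List[Author]:
    """Return a new list sorted by the chosen column.

    Plain code-point order is used, which matches the Russian alphabet
    except for the letter Ё; a real interface would need proper collation.
    """
    return sorted(authors, key=SORT_KEYS[by])


def filter_by_initial(authors: List[Author], letter: str) -> List[Author]:
    """Keep only authors whose last name starts with the given letter."""
    return [a for a in authors if a.last_name.upper().startswith(letter.upper())]


# Sample records for illustration only, not data taken from the corpus.
AUTHORS = [
    Author("Пушкин", "Александр", 1799, 1837, "m"),
    Author("Ахматова", "Анна", 1889, 1966, "f"),
    Author("Пастернак", "Борис", 1890, 1960, "m"),
    Author("Цветаева", "Марина", 1892, 1941, "f"),
]

by_name = [a.last_name for a in sort_authors(AUTHORS, "name")]
starts_with_p = [a.last_name for a in filter_by_initial(AUTHORS, "П")]
```

Here `by_name` lists the authors alphabetically, and `starts_with_p` keeps only Пушкин and Пастернак.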

The "Random Poem" widget has been added to the "Get overview" section: it selects a random poetic example for any word or phrase.

The tables below the "Graph by year" in the Main corpus now show the number of texts and the number of examples both in the search results and in the corpus as a whole.

21.07.2023

In the Social networks corpus, incorrect dates have been fixed and duplicate texts removed. The corpus is now an effective tool for studying linguistic phenomena diachronically. You can trace when language items became popular or went out of fashion (see хайп, превед, уметь во что-л.).
The corpus features a collection of social network texts prepared by the staff and students of Voronezh State University. It includes materials from the Big Voronezh Forum and other local networks in Voronezh, posts by well-known Voronezh bloggers, and discussions in local groups on popular platforms such as VK, Telegram and LiveJournal. In total, this collection comprises about 22.8 million word uses. The texts of the Voronezh collection have more detailed metatextual annotation and cover a long time span, from 2001 to 2023. Materials from social networks of other Russian regions are planned for inclusion in the corpus.

21.07.2023

The "Word at a glance" service in the Main corpus has been enriched with data on word families. A new widget shows families of cognate words. For now, this option is available only for words with a single root (e.g. стол, but not пароход) that are manually annotated in the morphemic analysis dictionary. Data on other words will appear in the future, but even now you can see interesting connections between words.

As usual, there is a "Rate" button next to the new widget. Feel free to let us know if you notice any bugs. Thanks to your feedback, we keep improving the neurolinguistic models underlying the "Word at a glance" service. Your opinion of the first version of the word family model is very interesting and important to us.

Lexico-grammatical search conditions in the Main, National media and Regional corpora can now be specified more precisely. The search form lets you set conditions on the distance between words. Until now, if the specified range included 0 (for example, from -1 to 1), a single token in the results could match both of the specified words. Now you can select the "word matches excluded" option at the top of the search form to remove the zero distance from the range. For example, you can find plural animate nouns conjoined with крестьяне ‘peasants’. Here is the resulting frequency list. Previously, a similar query would also have returned the word крестьяне on its own, without any "neighbors", since at zero distance it satisfies all the conditions for the conjoined noun.
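The effect of removing the zero distance can be shown with a small Python sketch. It is only an illustration under stated assumptions, not the corpus search engine: the `find_pairs` function, its `exclude_zero` flag, the toy token list and the simplified word-set predicates are all hypothetical.

```python
from typing import Callable, List, Tuple

Predicate = Callable[[str], bool]


def find_pairs(
    tokens: List[str],
    first: Predicate,
    second: Predicate,
    lo: int,
    hi: int,
    exclude_zero: bool = False,
) -> List[Tuple[int, int]]:
    """Return index pairs (i, j) where tokens[i] satisfies `first`,
    tokens[j] satisfies `second`, and lo <= j - i <= hi.

    With exclude_zero=True the distance 0 is dropped from the range,
    so one token can no longer match both conditions at once.
    """
    pairs = []
    for i, token in enumerate(tokens):
        if not first(token):
            continue
        for distance in range(lo, hi + 1):
            if distance == 0 and exclude_zero:
                continue
            j = i + distance
            if 0 <= j < len(tokens) and second(tokens[j]):
                pairs.append((i, j))
    return pairs


# Toy sentence; the predicates stand in for real grammatical conditions.
tokens = ["рабочие", "крестьяне", "пришли"]
plural_animate_nouns = {"рабочие", "крестьяне"}  # assumed annotation


def is_target(token: str) -> bool:
    return token == "крестьяне"


def is_plural_animate_noun(token: str) -> bool:
    return token in plural_animate_nouns


with_zero = find_pairs(tokens, is_target, is_plural_animate_noun, -1, 1)
without_zero = find_pairs(
    tokens, is_target, is_plural_animate_noun, -1, 1, exclude_zero=True
)
print(with_zero)     # [(1, 0), (1, 1)]
print(without_zero)  # [(1, 0)]
```

In the first result the pair (1, 1) is the word крестьяне matching itself. With the zero distance excluded, only the genuine neighbor рабочие remains.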

Subscribe to our Telegram channel to follow our updates and receive illustrated corpus instructions.