Word at a glance

Word at a glance is a word portrait that shows the word's behaviour in a given corpus. Here you can see its grammatical and semantic features, similar words, its typical combinations with other words in sentences, usage examples from the corpus, and the distribution of examples by year and by text properties.

To see Word at a glance for the Main corpus, click the corresponding banner on the RNC main page.

You can switch to Word at a glance from the pages of the other corpora through the Corpus selection menu.

This feature is only available for the corpora in the new RNC interface.

How to search

To show a word's portrait in the Word at a glance mode, specify:

  • its lemma (its basic word form). You can use the suggest tool or the virtual keyboard to type in the lemma. If you specify a word form that does not match any lemma, information about grammar, semantics, and similar words will not be displayed.
  • its part of speech. If you do not select any part of speech, information about all parts of speech for this word that occur more than 5 times in the corpus will be shown. 

Click Show the portrait to see detailed information about the word.

If the word you searched for can belong to more than one part of speech, you will be able to switch between these parts of speech to see different portraits.

The portrait of the word is built on the full corpus, without taking into account the user's subcorpus. However, when you switch from the portrait to the search results, the subcorpus selection is applied, so the usage examples given in the portrait may not coincide with the first examples in the search results.

Word sketches

The Sketches widget shows how the query word interacts with other Russian words. It lists the word's collocations with words of different parts of speech, grouped by syntactic relation, and covers the bulk of its uses. The most representative set of syntactic relations differs for each part of speech:

For nouns:

  • adjectives defining the noun
  • verbs for which the noun is the subject
  • verbs for which the noun is the direct object
  • verbs for which the noun is an indirect object without a preposition
  • verbs for which the noun is an indirect object with a preposition
  • coordinated nouns

For verbs:

  • nouns acting as subjects
  • nouns acting as direct objects
  • nouns acting as indirect objects without a preposition
  • nouns acting as indirect objects with a preposition
  • coordinated verbs

For adjectives:

  • nouns defined by the adjective
  • adverbial modifiers
  • coordinated adjectives

For adverbs:

  • verbs modified by the adverb
  • adjectives modified by the adverb
  • coordinated adverbs

The widget shows up to 10 collocates for each sketch, ranked by the logDice metric. The list of collocates may be empty if the search for collocations of a noun, adjective, verb or adverb with a given syntactic relation yields no results. Sketches are not displayed for proper names, toponyms, abbreviations, words with non-standard spellings and words that are rare in the corpus, nor for other parts of speech. Click a word in the table to view sketch examples in the corpus.
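The ranking described above can be illustrated with a minimal sketch. It assumes the standard logDice definition, 14 + log2(2 * f_xy / (f_x + f_y)), where f_xy is the co-occurrence count and f_x, f_y are the frequencies of the two words. The function names and the input dictionaries are hypothetical, and how the corpus itself counts frequencies for each syntactic relation is not specified here.

```python
import math

def log_dice(f_xy, f_x, f_y):
    """Standard logDice score for a word pair (theoretical maximum is 14)."""
    return 14 + math.log2(2 * f_xy / (f_x + f_y))

def top_collocates(pair_counts, node_freq, collocate_freqs, limit=10):
    """Rank collocates of one word by logDice and keep at most `limit` of them.

    pair_counts:     {collocate: co-occurrence count with the query word}
    node_freq:       frequency of the query word
    collocate_freqs: {collocate: frequency of the collocate}
    All three inputs are hypothetical illustration data.
    """
    scored = [
        (word, log_dice(f_xy, node_freq, collocate_freqs[word]))
        for word, f_xy in pair_counts.items()
        if f_xy > 0  # pairs that never co-occur are skipped
    ]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:limit]
```

If no pair has a positive co-occurrence count, the function returns an empty list, which corresponds to an empty list of collocates in the widget.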

To see more sketches, use the scrollbar (or the slider on mobile devices).

To go to Collocation Search, click the Show all collocations button.

Currently the sketches are only available in the Main and Regional corpora. 

About the word

The About the word widget shows the grammatical and semantic features of the word.

For nouns, adjectives, verbs and adverbs, you can get the most complete grammatical information.

The semantic features of homonyms are shown in separate lines.

Word frequency

The widget shows a word frequency scale consisting of six ranges.

For the searched word, the frequency (IPM, instances per million) is the number of occurrences of all its word forms divided by the corpus size and multiplied by one million. Depending on the resulting IPM value, the word falls into one of the following ranges:

>10000 high frequency, the word is very common
1000...10000 quite high frequency, the word is common
100...1000 rather high frequency
10...100 rather low frequency
1...10 rather low frequency, the word is rare
<1 low frequency, the word is very rare
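The IPM calculation and the range lookup can be sketched as follows. This is only an illustration: the function names are hypothetical, and the treatment of values that fall exactly on an intermediate boundary (1000, 100, 10) is an assumption, since the table does not say which range they belong to.

```python
import math

def ipm(occurrences, corpus_size):
    """Occurrences of all word forms per one million corpus tokens."""
    if corpus_size <= 0:
        raise ValueError("corpus_size must be positive")
    return occurrences * 1_000_000 / corpus_size

def frequency_range(ipm_value):
    """Return the (lower, upper) bounds of the range that contains ipm_value.

    ">10000" and "<1" are strict, as in the table. Assumption: for the
    intermediate ranges the lower bound is inclusive.
    """
    if ipm_value > 10000:
        return (10000, math.inf)
    for lower, upper in [(1000, 10000), (100, 1000), (10, 100), (1, 10)]:
        if ipm_value >= lower:
            return (lower, upper)
    return (0, 1)
```

For example, a word with 250 occurrences in a corpus of 2,000,000 tokens has an IPM of 125.0 and falls into the range between 100 and 1000.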

By hovering the mouse over the widget, you can see the numerical value of the IPM.

Frequency information is available only in disambiguated corpora, and only if the corpus contains usage examples of the word.

Morphemic structure β

The widget shows the morphemic structure of a word: prefixes, roots, suffixes and endings are highlighted with the graphical symbols used in Russian school teaching.

The word structure annotation is based on the morphemic dictionary specially developed for the corpus, which provides analyses for 75,000 lemmas (310,000 non-unique morphemes) as of May 2023. 

Automatic annotation is added for lemmas that are absent from the morphemic dictionary, including some fairly frequent lexical items. For example, the word гарантировать is not included in the morphemic dictionary, so its structure (гарант-ирова-ть) is predicted by the algorithm. Such analyses are tagged with a special attribute, "generated by NeuroRNC".

Errors are always possible in automatic annotation. Please note that the morphemic structuring of words may differ from what you are accustomed to (see "Annotation principles").

Word family β

The widget shows words that share a root with the query word.

The root is displayed on the left; the query word and up to 10 most common words with the same root are displayed on the right.

The roots are identified from the word structure annotation, which is based on a morphemic dictionary specially developed for the corpus and supplemented with automatic annotation. This morphemic structuring of words may differ from what you are accustomed to.

In the current version of the widget, words with the same root are shown:

  • for words annotated manually with the help of the morphemic dictionary. The "nests" of same-root words are not limited to words from the dictionary: they are also extended with the help of NeuroRNC. For example, for the word актер, all the same-root words except актриса and киноактриса are selected by NeuroRNC.
  • for words that are not in the dictionary, if NeuroRNC has found at least 5 words with the same root (for example, эстет).
  • for words with multiple roots: same-root words are shown for one of the roots.
  • only in Word at a glance in the Main corpus.

When you hover the mouse over a word, you can see its IPM (the number of occurrences of the word divided by the size of the corpus and multiplied by one million). If a word in a word family belongs to several parts of speech, the total IPM is shown.

Click on any word in the widget to see the Word at a glance section.

Errors are always possible in automatic annotation. If you find any discrepancy with "Annotation principles", please report it using the "Rate" button.

Word forms in the corpus

The widget shows word forms that occur in the corpus more than 5 times. Different spellings can be displayed for the same word form if they occur in the corpus.

The cell color depends on the frequency of the word form: the more occurrences of it are found in the corpus, the richer the color. By hovering the mouse over a word form, you can see its IPM (the number of occurrences of the word form divided by the size of the corpus and multiplied by one million). If the word form does not occur in the corpus, a dash is displayed in the cell.

Click on the word form to view its usage examples in the corpus.

Information about the word forms is currently available only for nouns in the Main corpus. Errors are possible in the annotation of lemmas and forms.

Similar words

The Similar words widget displays the closest semantic associates of the word. The proximity coefficient, which you can see by hovering the mouse over a word in the Word cloud, is calculated using distributional semantics models trained on the actual materials of the Main corpus of the RNC. The closer the coefficient is to 1, the larger the word appears in the Word cloud and the more similar its contexts should be to the contexts of the query word.
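As a rough illustration of a proximity coefficient, the sketch below uses cosine similarity between word vectors, a measure commonly used with distributional semantics models. This is an assumption: the exact measure and model behind the widget are not stated here, and the function name and sample vectors are hypothetical.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two word vectors of equal length.

    Values close to 1 mean the two words occur in similar contexts.
    Assumption: the widget's coefficient behaves like this measure.
    """
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # a zero vector carries no context information
    return dot / (norm_u * norm_v)

# Hypothetical two-dimensional vectors, for illustration only.
identical = cosine_similarity([1.0, 0.0], [1.0, 0.0])
unrelated = cosine_similarity([1.0, 0.0], [0.0, 1.0])
```

Here identical is 1.0 (the same contexts) and unrelated is 0.0 (no shared contexts).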

The current version of Similar words works only in the Main, Regional and Middle Russian corpora and only shows semantic associates of the same part of speech for nouns, verbs, adjectives and adverbs. Similar words are not displayed for proper names, toponyms, abbreviations, words with non-standard spellings and words that are rare in the corpus.

The widget is marked with a special sign, "Generated by NeuroRNC". This means that the associates are selected fully automatically, so the lists may contain errors, such as incorrectly formed words or word associations that are not intuitively clear.

Usage examples

The widget contains five usage examples of the word in the corpus. To select the examples, the lexical and grammatical search by lemma and part of speech is used. Display settings:

  • random sorting
  • no more than one example from each document
  • user's subcorpus is not taken into account

By clicking Show more examples, you can switch to the full search results (Concordance mode).

The word portrait is built on the full corpus, without taking into account the user's subcorpus. However, if you specified a subcorpus earlier, it will be applied when you click the Show all examples button. In this case, the usage examples given in the portrait may not match the first examples in the search results.

The Word at a glance feature (word portraits) is available in all corpora in the new interface. Some widgets are not available in some corpora.

Distribution of texts

The pie chart shows in which types of corpus texts the query word occurs. You can select a meta-attribute for which to build a chart from the list of the most representative attributes of the corpus, as well as the unit of size measurement: texts or words. When switching the meta-attribute and/or unit of measurement, the chart is redrawn.

The chart shows the distribution of the top 10 meta-attribute values. The remaining values are merged into the Other category. To the right of the chart is the list of values and their percentages. When you hover the mouse over a sector of the chart, you can see the name of the value and the corresponding number of words or texts that include the query lemma.
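The grouping into the top 10 values plus Other can be sketched as follows. The function name and the input format (a mapping from meta-attribute value to the number of texts or words) are assumptions made for illustration.

```python
def top_values_with_other(counts, top_n=10):
    """Keep the top_n most frequent values and merge the rest into "Other".

    counts: {meta-attribute value: number of texts or words}, hypothetical data.
    Returns a list of (value, count, percentage) tuples, largest first.
    """
    total = sum(counts.values())
    if total == 0:
        return []
    ranked = sorted(counts.items(), key=lambda item: item[1], reverse=True)
    kept = ranked[:top_n]
    other = sum(count for _, count in ranked[top_n:])
    if other > 0:
        kept.append(("Other", other))
    return [(value, count, 100 * count / total) for value, count in kept]
```

With 12 distinct values, the result contains 11 rows: the 10 largest values and one Other row that sums the two smallest.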

Text distribution visualization is not yet available in all corpora.

Distribution by year

In several corpora, a graph of word usage frequency by year (ipm, frequency per million word forms) is available.

You can use a ready-made graph that includes examples of the word usage for all years, or refine the displayed results by changing the time period.

Smoothing the graph allows you to see the overall trend behind random frequency fluctuations. For example, smoothing at 10 years averages the word frequency over the preceding and subsequent 5 years. To get accurate data for each year, you can set smoothing to 0.
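One way to picture the smoothing is a centered moving average, sketched below. This is an assumption about how the smoothing works, not a description of the actual implementation: the sketch averages over the year itself plus the years up to half the smoothing value on each side, and it skips years that are missing from the data. The function name is hypothetical.

```python
def smooth(ipm_by_year, smoothing):
    """Centered moving average over a {year: ipm} mapping.

    smoothing=10 averages each year with up to 5 preceding and 5 subsequent
    years; smoothing=0 returns the raw per-year values.
    Assumption: years absent from the data are left out of the average.
    """
    half = smoothing // 2
    smoothed = {}
    for year in ipm_by_year:
        window = [
            ipm_by_year[y]
            for y in range(year - half, year + half + 1)
            if y in ipm_by_year
        ]
        smoothed[year] = sum(window) / len(window)
    return smoothed
```

For the values {2000: 10.0, 2001: 20.0, 2002: 30.0}, smoothing at 2 gives 15.0, 20.0 and 25.0, while smoothing at 0 leaves the values unchanged.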

By moving the mouse to any point on the line, you can see the relative usage frequency (ipm) for a particular year. The ipm value is defined as the number of occurrences of a word in a year divided by the size of the corpus in that year and multiplied by one million.
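The per-year value can be sketched in the same way. The function name and both input mappings are hypothetical; the formula itself follows the definition above.

```python
def ipm_by_year(occurrences_by_year, size_by_year):
    """Relative frequency (ipm) of a word for each year.

    occurrences_by_year: {year: occurrences of the word in that year}
    size_by_year:        {year: corpus size in word forms for that year}
    Years with no occurrences get 0.0; years with an empty corpus are skipped.
    """
    return {
        year: occurrences_by_year.get(year, 0) * 1_000_000 / size
        for year, size in size_by_year.items()
        if size > 0
    }
```

For example, 3 occurrences in a year whose texts total 1,500,000 word forms give an ipm of 2.0 for that year.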
