Morphology
The morphological standard of the RNC

The presentation of morphological information (part of speech, gender, case, aspect, etc.) in the Corpus is mainly based on the morphological model proposed by Zalizniak in the Grammatical Dictionary of Russian (Moscow, 1977; 4th ed. Moscow, 2003). However, the specific needs of the Corpus as a universal tool for language research sometimes require different solutions, and every departure from Zalizniak's model is motivated by such needs.

The structure of morphological information

Morphological information assigned to a wordform consists of four fields, or groups of tags:

  1.  Lexeme (the dictionary form of the lexeme and its part of speech).
  2.  The lexeme's grammatical features, known as word-classifying features (for example, gender for nouns and transitivity for verbs).
  3.  The wordform's grammatical features, known as word-altering features (for example, case for nouns and number for verbs).
  4.  Information on non-standard forms of the wordform, orthographic variants, etc.

The morphological analysis (or several analyses) assigned to a wordform in the search results is displayed as a tooltip when the mouse cursor hovers over the wordform. In the disambiguated corpus the full analysis is displayed; in the rest of the corpus, the lexeme and the part of speech are displayed.
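
The four fields and the two display modes can be illustrated with a minimal sketch. It assumes a hypothetical `Analysis` record and a hypothetical `tooltip` function; neither is the Corpus's actual data format or interface, and the tooltip layout is invented for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Analysis:
    """One analysis of a wordform (hypothetical model of the four fields)."""
    lemma: str                                              # field 1: dictionary form
    pos: str                                                # field 1: part-of-speech tag
    lexical: list[str] = field(default_factory=list)        # field 2: word-classifying features
    inflectional: list[str] = field(default_factory=list)   # field 3: word-altering features
    extra: list[str] = field(default_factory=list)          # field 4: non-standard form tags


def tooltip(wordform: str, analyses: list[Analysis], disambiguated: bool) -> str:
    """Render hypothetical tooltip text: one entry per analysis, separated by ' | '."""
    entries = []
    for a in analyses:
        if disambiguated:
            # Disambiguated corpus: the full analysis.
            tags = ",".join([a.pos] + a.lexical + a.inflectional + a.extra)
            entries.append(f"{a.lemma} {tags}")
        else:
            # Rest of the corpus: lexeme and part of speech.
            entries.append(f"{a.lemma} {a.pos}")
    return f"{wordform}: " + " | ".join(entries)


stol = Analysis("стол", "S", ["m", "inan"], ["sg", "nom"])
print(tooltip("стол", [stol], disambiguated=True))   # стол: стол S,m,inan,sg,nom
print(tooltip("стол", [stol], disambiguated=False))  # стол: стол S
```

Keeping the word-classifying and word-altering features in separate lists mirrors fields 2 and 3 above, so either group can be shown or queried on its own.
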

The metalanguage of the grammatical features is based on a set of tags designed with a foreign audience in mind. It is also possible to search using the traditional Russian names of grammatical categories.

The following is the inventory of grammatical tags used in the Corpus, with examples in parentheses.

Parts of speech
S noun
A adjective
NUM numeral
ANUM numeral adjective
V verb
ADV adverb
PRAEDIC predicative (жаль, хорошо, пора)
PARENTH parenthesis (кстати, по-моему)
SPRO pronoun (она, что)
APRO adjectival pronoun (который, твой)
ADVPRO adverbial pronoun (где, вот)
PRAEDICPRO predicative pronoun (некого, нечего)
PR preposition (под, напротив)
CONJ conjunction (и, чтобы)
PART particle (бы, же, пусть)
INTJ interjection (увы, батюшки)
Grammatical categories
Gender
m masculine (работник, стол)
f feminine (работница, табуретка)
m-f common (задира, пьяница)
n neuter (животное, озеро)
Animacy
anim animate (человек, ангел, утопленник)
inan inanimate (рука, облако, культура)
Number
sg singular (яблоко, гордость)
pl plural (яблоки, ножницы, детишки)
Case
nom nominative (голова, сын, степь, сани, который)
gen genitive (головы, сына, степи, саней, которого)
dat dative (голове, сыну, степи, саням, которому)
acc accusative (голову, сына, степь, сани, который/которого)
ins instrumental (головой, сыном, степью, санями, которым)
loc locative ([о] голове, сыне, степи, санях, котором)
gen2 second genitive (чашка чаю)
acc2 second accusative (постричься в монахи; по два человека)
loc2 second locative (в лесу, на оси́)
voc vocative (Господи, Серёж, ребят)
adnum “count form”, or adnumerative (два часа́, три шара́)
Short/Full form
brev short form (высок, нежна, прочны, рад)
plen full form (высокий, нежная, прочные, морской)
Degree
comp comparative (глубже)
comp2 prefix по- + comparative (поглубже)
supr superlative (глубочайший)
Aspect
pf perfective (пошёл, встречу)
ipf imperfective (ходил, встречаю)
Transitivity
intr intransitive (ходить, вариться)
tran transitive (вести, варить)
Voice
act active (разрушил, разрушивший)
pass passive (adjectival participles only: разрушаемый, разрушенный)
med middle (verbs ending in -ся: разрушился)
Verb form
inf infinitive (украшать)
partcp participle (украшенный)
ger gerund (украшая)
Mood
indic indicative (украшаю, украшал, украшу)
imper imperative (украшай)
imper2 1st person plural imperative ending in -те (идемте)
Tense
praet past (украшали, украшавший, украсив)
praes present (украшаем, украшающий, украшая)
fut future (украсим)
Person
1p first person (украшаю)
2p second person (украшаешь)
3p third person (украшает)
Other features
persn first name (Иван, Дарья, Леопольд, Эстер, Гомер, Маугли)
patrn patronymic (Иванович, Павловна)
famn family name (Николаев, Волконская, Гумбольдт)
0 indeclinable (шоссе, Седых)
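
To show how the inventory can be used programmatically, here is a minimal sketch that groups the tags above by category and splits a tag string into those categories. The string format (tags joined by commas, with `=` assumed to separate word-classifying from word-altering features, as in `S,m,anim=sg,nom`) is an assumption about serialization, and `CATEGORIES` and `parse_gr` are hypothetical names.

```python
# Hypothetical grouping of the tag inventory above by category.
CATEGORIES: dict[str, set[str]] = {
    "pos": {"S", "A", "NUM", "ANUM", "V", "ADV", "PRAEDIC", "PARENTH", "SPRO",
            "APRO", "ADVPRO", "PRAEDICPRO", "PR", "CONJ", "PART", "INTJ"},
    "gender": {"m", "f", "m-f", "n"},
    "animacy": {"anim", "inan"},
    "number": {"sg", "pl"},
    "case": {"nom", "gen", "dat", "acc", "ins", "loc",
             "gen2", "acc2", "loc2", "voc", "adnum"},
    "form": {"brev", "plen"},
    "degree": {"comp", "comp2", "supr"},
    "aspect": {"pf", "ipf"},
    "transitivity": {"intr", "tran"},
    "voice": {"act", "pass", "med"},
    "verb_form": {"inf", "partcp", "ger"},
    "mood": {"indic", "imper", "imper2"},
    "tense": {"praet", "praes", "fut"},
    "person": {"1p", "2p", "3p"},
    "other": {"persn", "patrn", "famn", "0"},
}

# Reverse index: tag -> category name (every tag belongs to exactly one category).
TAG_TO_CATEGORY = {tag: cat for cat, tags in CATEGORIES.items() for tag in tags}


def parse_gr(gr: str) -> dict[str, list[str]]:
    """Split a tag string such as 'S,m,anim=sg,nom' into categories.

    Assumes tags are separated by ',' or '='; unknown tags raise ValueError.
    """
    result: dict[str, list[str]] = {}
    for tag in gr.replace("=", ",").split(","):
        tag = tag.strip()
        if not tag:
            continue
        if tag not in TAG_TO_CATEGORY:
            raise ValueError(f"unknown tag: {tag!r}")
        result.setdefault(TAG_TO_CATEGORY[tag], []).append(tag)
    return result


print(parse_gr("S,m,anim=sg,nom"))
# {'pos': ['S'], 'gender': ['m'], 'animacy': ['anim'], 'number': ['sg'], 'case': ['nom']}
```

Because each tag belongs to exactly one category, a reverse index is enough to classify tags without relying on their position in the string.
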

A number of these tags, namely second accusative, vocative, count form, prefix по- + comparative, common gender, transitivity, and indeclinability, are available only in the disambiguated corpus.
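
A query tool could warn users before running a search that relies on these tags outside the disambiguated corpus. The sketch below lists the tags named in the paragraph above; `disamb_only_tags` is a hypothetical helper, not part of any Corpus interface.

```python
# Tags available only in the disambiguated corpus, as listed in the text:
# second accusative, vocative, count form, по- + comparative,
# common gender, transitivity, indeclinability.
DISAMB_ONLY = {"acc2", "voc", "adnum", "comp2", "m-f", "intr", "tran", "0"}


def disamb_only_tags(query_tags: set[str]) -> set[str]:
    """Return the query tags that can only match in the disambiguated corpus."""
    return query_tags & DISAMB_ONLY


print(disamb_only_tags({"S", "voc", "sg"}))  # {'voc'}
print(disamb_only_tags({"V", "pf"}))         # set()
```
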

Multiple analyses

In certain cases a wordform receives multiple morphological analyses. Such cases are:

  • Adjectives that coincide with participles (открытый), where both the adjective lexeme (открытый) and the verb (открыть) are suggested.

  • Cases where an unambiguous choice of lexeme or grammatical meaning is impossible in the context (не видел родного отца: gen/acc; манекену: anim/inan; спазмами: lexemes спазм/спазма).

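
When a wordform carries several analyses, it helps to see exactly where they disagree. This is a minimal sketch that assumes each analysis is a plain dictionary of field names to values; `differing_features` and the field names are hypothetical, and the analyses in the examples are written out by hand.

```python
def differing_features(analyses: list[dict[str, str]]) -> dict[str, list[str]]:
    """Return each field on which alternative analyses disagree, with its sorted values."""
    keys = set().union(*analyses)  # every field mentioned by any analysis
    diff: dict[str, list[str]] = {}
    for key in sorted(keys):
        values = {a.get(key, "") for a in analyses}
        if len(values) > 1:
            diff[key] = sorted(values)
    return diff


# отца in "не видел родного отца": same lexeme, case unresolved.
otsa = [
    {"lemma": "отец", "pos": "S", "number": "sg", "case": "gen"},
    {"lemma": "отец", "pos": "S", "number": "sg", "case": "acc"},
]
# спазмами: same grammatical form, lexeme unresolved.
spazmami = [
    {"lemma": "спазм", "pos": "S", "number": "pl", "case": "ins"},
    {"lemma": "спазма", "pos": "S", "number": "pl", "case": "ins"},
]
print(differing_features(otsa))      # {'case': ['acc', 'gen']}
print(differing_features(spazmami))  # {'lemma': ['спазм', 'спазма']}
```

Sorting the values keeps the output deterministic, since Python set iteration order for strings can vary between runs.
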
Nonstandard forms

The disambiguated Corpus employs a number of tags to mark nonstandard or peculiar wordforms. Wordforms without such peculiarities receive the tag 'normal'.

  • anom («Anomalous form»): various morphological anomalies found in old or colloquial, non-literary forms (три дни instead of standard три дня, ляжь instead of standard ляг).

  • distort («Distorted form»): orthographic and/or phonetic distortion of a word, often used to convey peculiarities of pronunciation (дэвушка, това'ищи, про-хо-ди, низнаю).

  • ciph («Numeral recording»): a numeral, numeral adjective, or adjective written fully or partly in numerals (73, LXXIII, 73-й, 22-летний). Such wordforms are assigned to a count form lexeme; number and case are displayed only when an ending is written out (as in 14-му).

  • INIT («Initials»): notations of the type “capital letter and a dot” (M., P.). Initials are not expanded in the lexeme field and receive no grammatical features.

  • abbr («Abbreviation»): an abbreviated notation (тов., гг., ч.). Unlike initials, the abbreviation is expanded in the lexeme field, and its grammatical form is assigned according to the context. Acronyms such as ООН and вуз and clipped words such as зав and зам, which are written without a dot and are not expanded when read aloud, do not receive the abbr tag and are treated as normal words (declinable or indeclinable).
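
The surface patterns behind ciph and INIT can be approximated with regular expressions. The sketch below is a heuristic assumption, not the Corpus's tagging procedure: it ignores context (so a lone "V" is read as a Roman numeral) and does not attempt abbr, which would need a dictionary of expansions. `classify` is a hypothetical name.

```python
import re
from typing import Optional

# Arabic digits or Roman numerals, optionally followed by a hyphen and a
# written-out Cyrillic ending (73-й, 14-му, 22-летний).
CIPH_RE = re.compile(r"^(?P<num>\d+|[IVXLCDM]+)(?:-(?P<ending>[а-яё]+))?$")
# A single capital letter (Latin or Cyrillic) followed by a dot.
INIT_RE = re.compile(r"^[A-ZА-ЯЁ]\.$")


def classify(token: str) -> Optional[tuple[str, Optional[str]]]:
    """Return ('ciph', ending) or ('INIT', None) for matching tokens, else None."""
    m = CIPH_RE.match(token)
    if m:
        # Number and case can only be read off the ending, if one is written.
        return ("ciph", m.group("ending"))
    if INIT_RE.match(token):
        return ("INIT", None)
    return None


print(classify("73"))     # ('ciph', None)
print(classify("14-му"))  # ('ciph', 'му')
print(classify("М."))     # ('INIT', None)
print(classify("тов."))   # None
```

Returning the ending separately reflects the rule above: without a written ending there is nothing from which to infer number and case.
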

In addition, the non-disambiguated Corpus uses a special tag for non-dictionary forms (forms that are not in the parser's dictionary and whose analysis is derived by analogy). As the dictionary is updated, such forms will occur less often. To reduce “noise” in searches of the non-disambiguated corpus, it may be advisable to exclude these forms from the search; for some tasks, however, it may be useful to restrict the search to such forms alone.
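
The two search strategies described above (excluding non-dictionary forms, or keeping only them) amount to a simple filter. In this sketch the marker `NONDICT` is a hypothetical placeholder, since the actual tag name is not given here; `filter_hits` is likewise hypothetical, and the sample hits are invented.

```python
from typing import Iterable

# Hypothetical placeholder for the non-dictionary-form tag (real name not given here).
NON_DICT = "NONDICT"


def filter_hits(hits: Iterable[tuple[str, set[str]]],
                mode: str = "exclude") -> list[tuple[str, set[str]]]:
    """Filter (wordform, tags) hits by the non-dictionary marker.

    mode='exclude' drops non-dictionary forms to reduce noise,
    mode='only' keeps just those forms, mode='all' keeps everything.
    """
    if mode == "all":
        return list(hits)
    if mode == "exclude":
        return [h for h in hits if NON_DICT not in h[1]]
    if mode == "only":
        return [h for h in hits if NON_DICT in h[1]]
    raise ValueError(f"unknown mode: {mode!r}")


# Invented sample hits for illustration.
hits = [("стол", {"S"}), ("загуглил", {"V", NON_DICT})]
print([w for w, _ in filter_hits(hits)])          # ['стол']
print([w for w, _ in filter_hits(hits, "only")])  # ['загуглил']
```
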

Updated on 24.04.2024