Morphology

The morphological standard of the RNC

The presentation of morphological information (part of speech, gender, case, aspect, etc.) in the Corpus is based mainly on the morphological model proposed by Zalizniak in the Grammatical Dictionary of Russian (Moscow, 1977; 4th ed. Moscow, 2003). Nevertheless, because the Corpus serves as a universal language research tool, some cases require different solutions; every departure from Zalizniak's model is motivated by such specific requirements.

The structure of morphological information

Morphological information assigned to a wordform consists of four fields, or groups of tags:

The lexeme (the dictionary form of the lexeme and the part of speech to which it belongs).
A set of the lexeme's grammatical features, known as word-classifying features (for example, gender for nouns and transitivity for verbs).
A set of the wordform's grammatical features, known as word-altering features (for example, case for nouns and number for verbs).
Information about non-standard forms of the word, orthographic variations, etc.
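
As an illustration only, the four fields could be modelled as a small record type. This is a minimal sketch, not the Corpus's internal format: the class name `Analysis`, its field names, and the helper `all_tags` are assumptions introduced here, and the sample analysis of the wordform стола is built by hand from the tags listed further down this page.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Analysis:
    """One morphological analysis of a wordform (hypothetical layout)."""
    lemma: str                                # field 1: dictionary form of the lexeme
    pos: str                                  # field 1: part-of-speech tag, e.g. "S"
    lexeme_features: frozenset = frozenset()  # field 2: word-classifying features
    form_features: frozenset = frozenset()    # field 3: word-altering features
    flags: frozenset = frozenset()            # field 4: nonstandard-form information


def all_tags(analysis):
    """Return every tag of the analysis as one flat set."""
    return ({analysis.pos}
            | analysis.lexeme_features
            | analysis.form_features
            | analysis.flags)


# Hand-built analysis of the wordform "стола" (genitive singular of "стол").
example = Analysis(
    lemma="стол",
    pos="S",
    lexeme_features=frozenset({"m", "inan"}),
    form_features=frozenset({"sg", "gen"}),
    flags=frozenset({"normal"}),
)

print(sorted(all_tags(example)))
```

Keeping the word-classifying and word-altering features in separate fields mirrors the distinction drawn above: the former stay the same across all forms of a lexeme, while the latter vary from wordform to wordform.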

The morphological analysis (or analyses) assigned to a wordform in the search results is displayed as a tooltip when the mouse cursor hovers over the wordform. In the disambiguated corpus the full analysis is displayed; in the rest of the corpus only the lexeme and the part of speech are displayed.

The metalanguage for grammatical features is a set of tags designed with a foreign audience in mind. It is also possible to search using the traditional Russian names of grammatical categories.

The following is the inventory of grammatical tags used in the Corpus, with examples in brackets.

Parts of speech

S	noun
A	adjective
NUM	numeral
ANUM	numeral adjective
V	verb
ADV	adverb
PRAEDIC	predicative (жаль, хорошо, пора)
PARENTH	parenthesis (кстати, по-моему)
SPRO	pronoun (она, что)
APRO	adjectival pronoun (который, твой)
ADVPRO	adverbial pronoun (где, вот)
PRAEDICPRO	predicative pronoun (некого, нечего)
PR	preposition (под, напротив)
CONJ	conjunction (и, чтобы)
PART	particle (бы, же, пусть)
INTJ	interjection (увы, батюшки)

Grammatical categories

Gender

m	masculine (работник, стол)
f	feminine (работница, табуретка)
m-f	common (задира, пьяница)
n	neuter (животное, озеро)

Animacy

anim	animate (человек, ангел, утопленник)
inan	inanimate (рука, облако, культура)

Number

sg	singular (яблоко, гордость)
pl	plural (яблоки, ножницы, детишки)

Case

nom	nominative (голова, сын, степь, сани, который)
gen	genitive (головы, сына, степи, саней, которого)
dat	dative (голове, сыну, степи, саням, которому)
acc	accusative (голову, сына, степь, сани, который/которого)
ins	instrumental (головой, сыном, степью, санями, которым)
loc	locative ([о] голове, сыне, степи, санях, котором)
gen2	second genitive (чашка чаю)
acc2	second accusative (постричься в монахи; по два человека)
loc2	second locative (в лесу, на оси́)
voc	vocative (Господи, Серёж, ребят)
adnum	“count form”, or adnumerative (два часа́, три шара́)

Short/Full form

brev	short form (высок, нежна, прочны, рад)
plen	full form (высокий, нежная, прочные, морской)

Degree

comp	comparative (глубже)
comp2	prefix по- + comparative (поглубже)
supr	superlative (глубочайший)

Aspect

pf	perfective (пошёл, встречу)
ipf	imperfective (ходил, встречаю)

Transitivity

intr	intransitive (ходить, вариться)
tran	transitive (вести, варить)

Voice

act	active (разрушил, разрушивший)
pass	passive (adjectival participles only: разрушаемый, разрушенный)
med	middle (verbs ending in -ся: разрушился)

Verb form

inf	infinitive (украшать)
partcp	participle (украшенный)
ger	gerund (украшая)

Mood

indic	indicative (украшаю, украшал, украшу)
imper	imperative (украшай)
imper2	1st person plural imperative ending in -те (идемте)

Tense

praet	past (украшали, украшавший, украсив)
praes	present (украшаем, украшающий, украшая)
fut	future (украсим)

Person

1p	first person (украшаю)
2p	second person (украшаешь)
3p	third person (украшает)

Other features

persn	first name (Иван, Дарья, Леопольд, Эстер, Гомер, Маугли)
patrn	patronymic (Иванович, Павловна)
famn	family name (Николаев, Волконская, Гумбольдт)
0	indeclinable (шоссе, Седых)

A number of these tags, namely second accusative, vocative, count form, prefix по- + comparative, common gender, transitivity, and indeclinability, are only available for the disambiguated corpus.
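
The restriction above can be expressed as a simple check before a search is run. The following is a minimal sketch under stated assumptions: the tag names are taken from the inventory on this page, while the constant `DISAMBIGUATED_ONLY` and the function `unsupported_tags` are hypothetical names invented for this example and are not part of any Corpus interface.

```python
# Tags that, according to the paragraph above, are only available
# in the disambiguated corpus.
DISAMBIGUATED_ONLY = frozenset({
    "acc2",          # second accusative
    "voc",           # vocative
    "adnum",         # count form
    "comp2",         # prefix по- + comparative
    "m-f",           # common gender
    "intr", "tran",  # transitivity
    "0",             # indeclinability
})


def unsupported_tags(query_tags, disambiguated):
    """Return the query tags that the chosen subcorpus cannot match.

    `disambiguated` says whether the search targets the disambiguated corpus.
    """
    if disambiguated:
        return []
    return sorted(set(query_tags) & DISAMBIGUATED_ONLY)


print(unsupported_tags({"S", "voc", "sg"}, disambiguated=False))
print(unsupported_tags({"S", "voc", "sg"}, disambiguated=True))
```

A non-empty result suggests either dropping those tags from the query or switching to the disambiguated corpus.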

Multiple analyses

In certain cases the tagging will show multiple morphological analyses for one wordform. Such cases are:

Adjectives matching participles (открытый), where both the adjective lexeme (открытый) and the verb (открыть) are suggested.

Cases where an unambiguous choice of lexeme or grammatical meaning is impossible in context (не видел родного отца – gen/acc, манекену – anim/inan, спазмами – lexemes спазм/спазма).
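
When a wordform carries several analyses, it can be useful to see exactly which tags distinguish them. The sketch below assumes each analysis is available as a plain set of tags; the function name `ambiguous_tags` and the hand-written analyses of отца are illustrative assumptions, not Corpus output.

```python
def ambiguous_tags(analyses):
    """Return the tags that are not shared by every analysis of a wordform."""
    tag_sets = [set(tags) for tags in analyses]
    if not tag_sets:
        return []
    shared = set.intersection(*tag_sets)    # tags present in all analyses
    everything = set.union(*tag_sets)       # tags present in any analysis
    return sorted(everything - shared)


# "отца" in "не видел родного отца": genitive or accusative.
otca = [
    {"S", "m", "anim", "sg", "gen"},
    {"S", "m", "anim", "sg", "acc"},
]
print(ambiguous_tags(otca))
```

An empty result means the analyses agree on every tag, so any remaining difference lies in the lexeme, as with спазм/спазма.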

Nonstandard forms

The disambiguated Corpus employs a number of tags to signal nonstandard or peculiar wordforms. The absence of any such feature is marked with the tag 'normal'.

anom («Anomalous form») — various morphological anomalies, possible in the case of old or colloquial, non-literary forms (три дни instead of the norm три дня, ляжь instead of the norm ляг)

distort («Distorted form») — orthographic and/or phonetic distortion of a word, often used to show peculiarities of pronunciation (дэвушка, това'ищи, про-хо-ди, низнаю).

ciph («Numeral recording») — notation of a numeral, a numeral adjective or an adjective (fully or partly) with numbers (73, LXXIII, 73-й, 22-летний). In such cases wordforms are assigned to a count form lexeme; number and case are only displayed in cases where an ending is recorded (as in 14-му).

INIT («Initials») — notations of the type “capital letter and a dot” (M., P.). The initials are not expanded in the lexeme field; no grammatical features are given.

abbr («Abbreviation») — an abbreviated notation (тов., гг., ч.). In the lexeme field the abbreviation is expanded (initials excepted), and a grammatical form is supplied according to the context. Acronyms such as ООН, вуз and shortened words like зав, зам, which are written without a dot and are not expanded in reading, do not receive the abbr tag and are treated like normal words (declinable or indeclinable).
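
Two of the notational tags above, INIT and ciph, are described by surface shape alone, so a rough approximation can be written with regular expressions. This is only a heuristic sketch and not the Corpus's tagging procedure: the patterns and the function name `guess_notation_tag` are assumptions made for this example, and the Roman-numeral pattern will also match ordinary words written entirely in the capitals I, V, X, L, C, D, M.

```python
import re

# "Capital letter and a dot", Latin or Cyrillic.
INITIAL_RE = re.compile(r"^[A-ZА-ЯЁ]\.$")
# Arabic digits, optionally followed by a hyphen and a Cyrillic ending or stem.
ARABIC_RE = re.compile(r"^\d+(-[а-яё]+)?$")
# A run of Roman-numeral letters (not validated as a well-formed numeral).
ROMAN_RE = re.compile(r"^[IVXLCDM]+$")


def guess_notation_tag(token):
    """Guess "INIT" or "ciph" from the shape of a token, else return None."""
    if INITIAL_RE.match(token):
        return "INIT"
    if ARABIC_RE.match(token) or ROMAN_RE.match(token):
        return "ciph"
    return None


for token in ["M.", "73", "LXXIII", "73-й", "22-летний", "стол"]:
    print(token, guess_notation_tag(token))
```

The tags anom, distort, and abbr depend on dictionary knowledge and context, so no comparable shape-based shortcut is attempted for them here.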

In addition, the non-disambiguated Corpus uses a special tag for non-dictionary forms (forms not included in the parser's dictionary but derived by analogy). As the dictionary is updated, these forms will occur less often. To reduce “noise” when searching the non-disambiguated corpus, it may be advisable to exclude these forms from the search; for some tasks, however, the search may be limited to such forms entirely.

Updated on 24.04.2024