The morphological standard of the RNC
The presentation of morphological information (part of speech, gender, case, aspect, etc.) in the Corpus is mainly based on the morphological model proposed by Zalizniak in the Grammatical Dictionary of Russian (Moscow, 1977; 4th ed. Moscow, 2003). However, because the Corpus is intended as a universal tool for language research, some cases require different solutions; every departure from Zalizniak's model is motivated by such requirements.
The structure of morphological information
Morphological information assigned to a wordform consists of four fields, or groups of tags:
- Lexeme (a dictionary form of the lexeme and the part of speech to which it belongs).
- A set of the lexeme's grammatical features, known as word-classifying features (for example, gender for nouns and transitivity for verbs).
- A set of the wordform's grammatical features, known as word-altering features (for example, case for nouns and number for verbs).
- Information concerning non-standard forms, orthographic variations, etc.
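The four fields listed above can be pictured as a small record. The following is only a minimal sketch: the class name `Analysis`, its attribute names, and the helper `all_tags` are hypothetical and are not part of the Corpus itself; only the tags (S, f, inan, sg, ins, normal) come from the inventory in this document.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Analysis:
    """One morphological analysis of a wordform (hypothetical container)."""
    lemma: str                  # field 1: dictionary form of the lexeme
    pos: str                    # field 1: part-of-speech tag, e.g. "S"
    lexeme_features: frozenset  # field 2: word-classifying features
    form_features: frozenset    # field 3: word-altering features
    flags: frozenset = frozenset({"normal"})  # field 4: non-standard form markers


def all_tags(analysis):
    """Collect every tag of an analysis into one set."""
    return ({analysis.pos} | analysis.lexeme_features
            | analysis.form_features | analysis.flags)


# "головой": instrumental singular of the feminine inanimate noun "голова".
golovoj = Analysis(
    lemma="голова",
    pos="S",
    lexeme_features=frozenset({"f", "inan"}),
    form_features=frozenset({"sg", "ins"}),
)
```

Keeping the word-classifying and word-altering features in separate sets mirrors the distinction drawn in the list: the first set is the same for every form of the lexeme, while the second changes from wordform to wordform.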
The morphological analysis (or analyses) assigned to a wordform in the search results is displayed as a tooltip when the mouse cursor hovers over the wordform. In the disambiguated corpus the full analysis is displayed; in the rest of the corpus only the lexeme and the part of speech are displayed.
The metalanguage of the grammatical features is a set of tags designed with a foreign audience in mind. It is also possible to search using the traditional Russian names of grammatical categories.
The following is the inventory of grammatical tags used in the Corpus, with examples in brackets.
Parts of speech
S | noun
A | adjective
NUM | numeral
ANUM | numeral adjective
V | verb
ADV | adverb
PRAEDIC | predicative (жаль, хорошо, пора)
PARENTH | parenthesis (кстати, по-моему)
SPRO | pronoun (она, что)
APRO | adjectival pronoun (который, твой)
ADVPRO | adverbial pronoun (где, вот)
PRAEDICPRO | predicative pronoun (некого, нечего)
PR | preposition (под, напротив)
CONJ | conjunction (и, чтобы)
PART | particle (бы, же, пусть)
INTJ | interjection (увы, батюшки)
Grammatical categories
Gender
m | masculine (работник, стол)
f | feminine (работница, табуретка)
m-f | common (задира, пьяница)
n | neuter (животное, озеро)
Animacy
anim | animate (человек, ангел, утопленник)
inan | inanimate (рука, облако, культура)
Number
sg | singular (яблоко, гордость)
pl | plural (яблоки, ножницы, детишки)
Case
nom | nominative (голова, сын, степь, сани, который)
gen | genitive (головы, сына, степи, саней, которого)
dat | dative (голове, сыну, степи, саням, которому)
acc | accusative (голову, сына, степь, сани, который/которого)
ins | instrumental (головой, сыном, степью, санями, которым)
loc | locative ([о] голове, сыне, степи, санях, котором)
gen2 | second genitive (чашка чаю)
acc2 | second accusative (постричься в монахи; по два человека)
loc2 | second locative (в лесу, на оси́)
voc | vocative (Господи, Серёж, ребят)
adnum | “count form”, or adnumerative (два часа́, три шара́)
Short/Full form
brev | short form (высок, нежна, прочны, рад)
plen | full form (высокий, нежная, прочные, морской)
Degree
comp | comparative (глубже)
comp2 | prefix по- + comparative (поглубже)
supr | superlative (глубочайший)
Aspect
pf | perfective (пошёл, встречу)
ipf | imperfective (ходил, встречаю)
Transitivity
intr | intransitive (ходить, вариться)
tran | transitive (вести, варить)
Voice
act | active (разрушил, разрушивший)
pass | passive (adjectival participles only: разрушаемый, разрушенный)
med | middle (verbs ending in -ся: разрушился)
Verb form
inf | infinitive (украшать)
partcp | participle (украшенный)
ger | gerund (украшая)
Mood
indic | indicative (украшаю, украшал, украшу)
imper | imperative (украшай)
imper2 | 1st person plural imperative ending in -те (идемте)
Tense
praet | past (украшали, украшавший, украсив)
praes | present (украшаем, украшающий, украшая)
fut | future (украсим)
Person
1p | first person (украшаю)
2p | second person (украшаешь)
3p | third person (украшает)
Other features
persn | first name (Иван, Дарья, Леопольд, Эстер, Гомер, Маугли)
patrn | patronymic (Иванович, Павловна)
famn | family name (Николаев, Волконская, Гумбольдт)
0 | indeclinable (шоссе, Седых)
A number of these tags, namely second accusative, vocative, count form, prefix по- + comparative, common gender, transitivity, and indeclinability, are only available for the disambiguated corpus.
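Because these tags exist only in the disambiguated corpus, a query that uses them against the rest of the corpus cannot match. The sketch below shows one way a client script might check a query before sending it. The function name `unavailable_tags` and the constant `DISAMBIGUATED_ONLY` are hypothetical; the tag list itself is taken from the sentence above and the inventory in this document.

```python
# Tags that, per the inventory, are available only in the disambiguated corpus:
# second accusative, vocative, count form, по- + comparative, common gender,
# transitivity (both values), and indeclinability.
DISAMBIGUATED_ONLY = frozenset(
    {"acc2", "voc", "adnum", "comp2", "m-f", "intr", "tran", "0"}
)


def unavailable_tags(query_tags, disambiguated):
    """Return, in sorted order, the query tags that cannot match in the
    chosen part of the corpus (hypothetical helper)."""
    if disambiguated:
        return []
    return sorted(set(query_tags) & DISAMBIGUATED_ONLY)
```

For example, `unavailable_tags(["S", "voc", "sg"], disambiguated=False)` returns `["voc"]`, while the same query against the disambiguated corpus returns an empty list.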
Multiple analyses
In certain cases the tagging will show multiple morphological analyses for one wordform. Such cases are:
- Adjectives matching participles (открытый), where both the adjective lexeme (открытый) and the verb (открыть) are suggested.
- Wordforms for which the context does not allow an unambiguous choice of lexeme or grammatical meaning (не видел родного отца – gen/acc, манекену – anim/inan, спазмами – lexemes спазм/спазма).
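The cases above can be illustrated with a wordform that carries a list of analyses rather than a single one. This is a minimal sketch under stated assumptions: the dictionary layout (`lemma`, `tags`) and the helper names `ambiguous_tags` and `lemmas` are hypothetical and do not describe the Corpus's internal format. The gender tags given for спазм and спазма are the usual dictionary values, added here for illustration.

```python
def ambiguous_tags(analyses):
    """Return the tags that are not shared by every analysis."""
    if not analyses:
        return set()
    tag_sets = [set(a["tags"]) for a in analyses]
    shared = set.intersection(*tag_sets)
    return set.union(*tag_sets) - shared


def lemmas(analyses):
    """Return the distinct lexemes suggested for a wordform, sorted."""
    return sorted({a["lemma"] for a in analyses})


# "отца" in "не видел родного отца": genitive or accusative.
ottsa = [
    {"lemma": "отец", "tags": {"S", "m", "anim", "sg", "gen"}},
    {"lemma": "отец", "tags": {"S", "m", "anim", "sg", "acc"}},
]

# "спазмами": instrumental plural of either "спазм" or "спазма".
spazmami = [
    {"lemma": "спазм", "tags": {"S", "m", "inan", "pl", "ins"}},
    {"lemma": "спазма", "tags": {"S", "f", "inan", "pl", "ins"}},
]
```

Here `ambiguous_tags(ottsa)` isolates the case ambiguity, and `lemmas(spazmami)` lists the two competing lexemes.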
Nonstandard forms
The disambiguated Corpus employs a number of tags to signal nonstandard or peculiar wordforms. A wordform with none of these features is marked with the tag 'normal'.
- anom («Anomalous form»): various morphological anomalies found in old, colloquial, or non-literary forms (три дни instead of the standard три дня, ляжь instead of the standard ляг).
- distort («Distorted form»): orthographic and/or phonetic distortion of a word, often used to show peculiarities of pronunciation (дэвушка, това'ищи, про-хо-ди, низнаю).
- ciph («Numeral recording»): notation of a numeral, a numeral adjective or an adjective (fully or partly) in figures (73, LXXIII, 73-й, 22-летний). In such cases wordforms are assigned to a count form lexeme; number and case are only displayed where an ending is recorded (as in 14-му).
- INIT («Initials»): notations of the type “capital letter and a dot” (M., P.). The initials are not expanded in the lexeme field; no grammatical features are given.
- abbr («Abbreviation»): an abbreviated notation (тов., гг., ч.). In the lexeme field the abbreviation is expanded (except initials); a grammatical form is supplied according to the context. Acronyms such as ООН, вуз and shortened words like зав, зам, which are written without a dot and not expanded in reading, do not receive the abbr tag and are treated like normal words (declinable or indeclinable).
In addition, the non-disambiguated Corpus uses a special tag for non-dictionary forms (forms not included in the dictionary of the parser but derived by analogy). As the dictionary is updated the occurrence of these forms will decrease. To lower the amount of “noise” in searches in the non-disambiguated corpus it may be advisable to exclude these forms from the search; for some tasks, however, the search may be limited to such forms entirely.
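The two search strategies described above (excluding non-dictionary forms, or restricting the search to them) can be sketched as a post-filter over search hits. Everything here is an assumption made for illustration: this document does not name the non-dictionary tag, so `NON_DICTIONARY` is a hypothetical placeholder, and the hit layout, the function `filter_hits`, and the sample hits are likewise invented.

```python
# Hypothetical placeholder: the actual name of the non-dictionary tag
# is not given in this document.
NON_DICTIONARY = "nondict"


def filter_hits(hits, mode="exclude"):
    """Filter hits by the non-dictionary marker.

    mode="exclude" drops non-dictionary forms (less noise),
    mode="only" keeps just the non-dictionary forms,
    mode="all" keeps everything.
    """
    if mode == "all":
        return list(hits)
    if mode == "exclude":
        return [h for h in hits if NON_DICTIONARY not in h["flags"]]
    if mode == "only":
        return [h for h in hits if NON_DICTIONARY in h["flags"]]
    raise ValueError(f"unknown mode: {mode!r}")


# Illustrative hits, not real Corpus output.
hits = [
    {"form": "стол", "flags": {"normal"}},
    {"form": "глокая", "flags": {NON_DICTIONARY}},
]
```

With these sample hits, the default mode keeps only стол, and `mode="only"` keeps only глокая.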