Semantics
Representation of lexical and semantic information
Currently, the Corpus supports searching by the lexical and semantic characteristics of words, because its texts are semantically tagged.
Most words in a text are tagged with semantic and derivational parameters such as “person”, “substance”, “space”, “movement”, “diminutive”, “verbal noun”, etc., and a single word may be assigned characteristics along several different parameters. The texts are tagged by the Semmarkup program (developed by A. E. Poliakov), which uses the Semantic dictionary of the Corpus. Semantic homonymy is not resolved, because disambiguation would have to be done by hand and would be extremely time-consuming; instead, homonyms are assigned several semantic analyses. The semantic tagging is based on the classification developed from 1992 onwards for the Lexicograph database under the leadership of E. V. Paducheva and E. V. Rakhilina at the Department of Linguistic Research of the All-Russian Institute of Scientific and Technical Information of the Russian Academy of Sciences. Since then the dictionary has been substantially expanded, and several new semantic classes and the derivational parameters have been added for the needs of the Corpus.
The Semantic dictionary is based on the morphological dictionary of the DIALING system (120,000 words), which is in turn an expansion of Zalizniak’s Grammatical Dictionary of Russian.
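Because homonyms are not disambiguated, a word may carry several alternative tag sets, and a tag search matches it if any one of them contains the tag. The following Python sketch models this under stated assumptions: the Word class and has_tag function are hypothetical, and the two tag sets for изюминка are assembled from its two entries in the inventory below (singulative 'a raisin' and positively evaluated 'zest, appeal'), not taken from the actual dictionary entry.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Word:
    """A lemma with one tag set per semantic analysis (hypothetical model)."""
    lemma: str
    analyses: list[frozenset[str]] = field(default_factory=list)


def has_tag(word: Word, tag: str) -> bool:
    """True if at least one analysis of the word carries the tag.

    Homonyms keep every analysis, so a match on any of them counts.
    """
    return any(tag in analysis for analysis in word.analyses)


# Illustrative entry: two readings of изюминка, built from inventory examples.
izyuminka = Word("изюминка", [
    frozenset({"r:concr", "d:sing"}),    # 'a raisin' (singulative)
    frozenset({"r:abstr", "ev:posit"}),  # 'zest, appeal' (positive evaluation)
])

print(has_tag(izyuminka, "d:sing"))    # True
print(has_tag(izyuminka, "ev:posit"))  # True
print(has_tag(izyuminka, "t:food"))    # False
```

One consequence of keeping every analysis is that a search for ev:posit also returns occurrences of изюминка meaning 'a raisin', since the analyses are never narrowed down by context.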
The structure of semantic and lexical information
Three groups of tags are assigned to words to reflect lexical and semantic information:
- Class (a proper name, a reflexive pronoun, etc.)
- Lexical and semantic features (a lexeme's thematic class, indications of causality or assessment, etc.)
- Derivational features (a diminutive, an adjectival adverb, etc.)
The set of semantic and lexical parameters differs from one part of speech to another. Moreover, nouns are divided into three subclasses (concrete nouns, abstract nouns, and proper names), each with its own hierarchy of tags.
Lexical and semantic tags are grouped as follows:
- Taxonomy (a lexeme's thematic class) – for nouns, verbs, adjectives and adverbs
- Mereology (“part – whole” and “element – aggregate” relationships) – for concrete and abstract nouns
- Topology – for concrete nouns
- Causation – for verbs
- Auxiliary status – for verbs
- Evaluation – for abstract and concrete nouns, adjectives and adverbs
A word in the semantic dictionary is assigned a set of characteristics along the following parameters:
• taxonomic class; for example: ‘persons’, ‘spaces’, ‘texts’ (for nouns); ‘motion’, ‘location’, ‘emotion’ (for verbs); ‘speed’, ‘duration’, ‘place’ (for adjectives and adverbs);
• mereological class (for nouns): ‘parts’, ‘sets’ etc.;
• topological class (for nouns): ‘containers’, ‘horizontal surfaces’, etc.;
• causative / non-causative (for verbs);
• positive and negative evaluation (for nouns, adjectives and adverbs);
• derivational features:
- Morpho-semantic features, for example: diminutive, caritive, semelfactive, etc.;
- Class of the motivating word, for example: verbal noun, adjectival adverb;
- Taxonomic type of the motivating word, for example: an adverb derived from an adjective of size;
- Morphological type of derivation (substantivization, compound word, etc.).
The tag metalanguage is based on English notation; however, searches can also be made using the traditional Russian category names in the “semantic features” form. Below is an inventory of all currently available tags, with examples in parentheses.
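Tag names are hierarchical, with colons separating levels (t:tool, then t:tool:furn). A minimal Python sketch of matching a query tag against its subtags follows; the function covers is hypothetical, and whether the Corpus search itself expands a parent tag this way is not stated here (the proper-name entry t:hum| t:hum:supernat spells the subclass out explicitly). Comparing whole components matters because t:humq (human qualities) is a separate class that merely starts with the same letters as t:hum.

```python
def covers(query: str, tag: str) -> bool:
    """True if `tag` equals `query` or is one of its subtags.

    Tags are compared component by component on ":" boundaries, so
    "t:hum" covers "t:hum:kin" but not "t:humq" (human qualities),
    which a plain string-prefix test would wrongly accept.
    """
    query_parts = query.split(":")
    tag_parts = tag.split(":")
    return tag_parts[:len(query_parts)] == query_parts


tags = ["t:hum", "t:hum:kin", "t:hum:supernat", "t:humq", "t:tool:furn"]
print([t for t in tags if covers("t:hum", t)])
# ['t:hum', 't:hum:kin', 't:hum:supernat']
```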
Nouns (S)
Categories
r:concr – concrete nouns (девочка, стол, молоко)
r:abstr – abstract nouns (вождение, яркость, время)
r:propn – proper names (Иван, Эйнштейн, Петроград)
Concrete nouns
Taxonomy
t:hum – person (человек, учитель)
t:hum:etn – ethnonyms (эфиоп, итальянка)
t:hum:kin – kinship terms (брат, бабушка)
t:hum:supernat – supernatural creatures (русалка, инопланетянин)
t:animal – animals (корова, жираф, сорока, ящерица, муравей)
t:plant – plants (береза, роза, трава)
t:stuff – substances and materials (вода, песок, тесто, жесть, шелк)
t:space – space and places (космос, город, тайга, овраг, вход)
t:constr – buildings and constructions (дом, шалаш, мост)
t:tool – tools and appliances (молоток, палка, пуговица, машина)
t:tool:instr – tools (штопор, игла, карандаш)
t:tool:device – machinery and devices (телефон, сеялка, градусник)
t:tool:transp – vehicles (автобус, поезд, сани)
t:tool:weapon – weapons (сабля, пистолет, гаубица)
t:tool:mus – musical instruments (рояль, скрипка, колокол)
t:tool:furn – furniture (стол, диван, шкаф)
t:tool:dish – kitchen utensils (чашка, кастрюля, фляжка)
t:tool:cloth – clothes and footwear (платье, шляпа, ботинки)
t:food – food and drinks (пирог, каша, молоко)
t:text – texts (рассказ, книга, афиша)
Mereology
pt:part – parts (верхушка, кончик, половина)
pt:partb& pc:hum – human body parts and organs (голова, сердце, ноготь)
pt:partb& pc:animal – animal body parts and organs (хвост, жало)
pt:part& pc:plant – parts of plants (лист, ветка, корень)
pt:part& pc:constr – parts of buildings and constructions (комната, дверь, арка)
pt:part& pc:tool – parts of tools and appliances (деталь, лопасть, крышка)
pt:part& pc:tool:instr – parts of tools (топорище, лезвие)
pt:part& pc:tool:device – parts of machinery and devices (дисплей, корпус, кнопка)
pt:part& pc:tool:transp – parts of vehicles (руль, колесо, капот)
pt:part& pc:tool:weapon – parts of weapons (дуло, курок, эфес)
pt:part& pc:tool:mus – parts of musical instruments (струна, гриф)
pt:part& pc:tool:furn – parts of furniture (сиденье, подлокотник)
pt:part& pc:tool:dish – parts of kitchen utensils (носик, горлышко)
pt:part& pc:tool:cloth – parts of clothes and footwear (рукав, каблук)
pt:qtm – quanta and portions (капля, комок, порция)
pt:set/ pt:aggr – sets and aggregates (набор, букет, мебель, человечество)
hi:class – classes (животное, ягода, инструмент)
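Several entries above combine tags: pt:partb& pc:hum requires both tags, and the proper-name entry t:hum| t:hum:supernat offers alternatives. The Python sketch below evaluates such strings against a word's tag set, assuming & means AND, | means OR and & binds more tightly; the source does not spell out operator precedence, and the slash in pt:set/ pt:aggr is not handled. The function name is hypothetical, and the tag sets for голова and хвост are likewise hypothetical, built only from their lines in the mereology list.

```python
from __future__ import annotations


def evaluate(expr: str, tags: set[str]) -> bool:
    """Check a tag combination such as "pt:partb& pc:hum" against a tag set.

    Assumed reading: "|" separates alternatives (OR), "&" joins tags that
    must all be present (AND), and "&" binds more tightly than "|".
    Spaces around the operators are ignored.
    """
    return any(
        all(term.strip() in tags for term in alternative.split("&"))
        for alternative in expr.split("|")
    )


golova = {"r:concr", "pt:partb", "pc:hum"}     # hypothetical tag set for голова
khvost = {"r:concr", "pt:partb", "pc:animal"}  # hypothetical tag set for хвост

print(evaluate("pt:partb& pc:hum", golova))   # True
print(evaluate("pt:partb& pc:hum", khvost))   # False
print(evaluate("pc:hum| pc:animal", khvost))  # True
```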
Topology
top:contain – containers (кошелек, комната, озеро, ниша)
top:horiz – horizontal surfaces (пол, площадка)
Evaluation
ev – evaluation (neither positive nor negative) (озорник, махина)
ev:posit – positive evaluation (умница, светило)
ev:neg – negative evaluation (негодяй, вертихвостка)
Derivational tags
d:dim – diminutives (зайчик, коробочка)
d:aug – augmentatives (детина, домище)
d:sing – singulatives (пылинка, изюминка)
d:nag – nomina agentis (писатель, создатель, докладчик)
d:fem – nomina feminina (немка, генеральша, доярка)
Abstract nouns
Taxonomy
t:move – movement (беготня, вынос, качка)
t:move:body – body movement (поклон)
t:put – placement of objects (размещение, расстановка, погрузка, намотка)
t:impact – physical impact (удар, втирание, обмолот)
t:impact:creat – creation of physical objects (лепка, отливка, плетение, сооружение, строительство)
t:impact:destr – destruction (слом, сожжение)
t:changest – change of state or features (укрепление, затвердение, осушение, конденсация, осложнение)
t:be – sphere of being
t:be:exist – existence (жизнь, наличие, бытие)
t:be:appear – start of existence (возникновение, рождение, формирование, учреждение, творение)
t:be:disapp – end of existence (смерть, казнь, ликвидация)
t:loc – location (местоположение)
t:loc:body – body location (лежание)
t:contact – contact and support (прикосновение, объятие)
t:poss – sphere of possession (обладание, приобретение, покупка, потеря, лишение)
t:ment – mental sphere (знание, абстракция, воображение, воспоминание, догадка)
t:perc – perception (осязание, слух, видимость, взгляд, зрелище)
t:psych – psychological states (апатия, безумие, вдохновение, спокойствие)
t:psych:emot – emotions (восторг, раскаяние, печаль)
t:psych:volit – volition (намерение, решение)
t:speech – speech (дискуссия, молва, ахинея, реплика, подковырка)
t:physiol – physiology (жажда, кровоизлияние, судорога, утомление, икота)
t:weather – natural phenomena (зарница, вьюга, зной)
t:sound – sounds (шум, перезвон, хлопок, аплодисменты, диссонанс)
t:color – colours (окраска, колорит, желтизна, прозелень)
t:light – light (луч, полумрак, светлынь, иллюминация)
t:taste – taste (вкуснота, горчинка, кислятина)
t:smell – smells (аромат, перегар)
t:temper – temperature (прохлада, стужа, нагрев)
t:time – time (весна, годовщина, минута, современность)
t:time:period – period of time (межсезонье, путина, сенокос, стаж)
t:time:moment – moment of time (миг, мгновение)
t:time:week – day of week (понедельник)
t:time:month – month (январь)
t:time:age – age (детство, молодость, двадцатилетие)
t:humq – human qualities (порядочность, безволие, остроумие)
t:behav – human behaviour (разгильдяйство, подхалимаж, неповиновение, ребячество, предательство)
t:inter – interaction and interrelation (взаимопомощь, вражда, схватка, драка)
t:action – social events (аукцион, вернисаж, вечеринка, выборы, именины, заседание, культпоход)
t:disease – diseases (ангина, диабет)
t:game – games (жмурки, покер, домино, волейбол)
t:sport – sport (спартакиада, акробатика, баскетбол)
t:param – parameters (высота, грузоподъемность)
t:unit – units of measurement (балл, килограмм, метр, минута)
Mereology
pt:part – part (начало, финал)
pt:qtm – quantum (оборот, прыжок, кивок)
pt:set – set (система, выборка, алгоритм)
Evaluation
ev – evaluation (озорник, махина)
ev:posit – positive evaluation (благоухание, загляденье, изюминка)
ev:neg – negative evaluation (безвкусица, ахинея)
Derivational tags
der:v – verbal nouns (выбор, демонстрация)
der:a – adjectival nouns (краснота, жадность)
Proper names
Taxonomy
t:hum| t:hum:supernat – people (Людмила, Черномор)
t:persn – personal names (Александр)
t:patrn – patronymics (Сергеевич)
t:famn – surnames (Пушкин)
t:topon – toponyms (Европа, Волга, Эльбрус, Москва, Преображенка)
Derivational tags
d:dim – diminutives (Саша, Женечка, Николаич)
Adjectives (A)
Categories
r:qual – qualitative (хороший, большой)
r:rel – relative (деревянный, лунный)
r:poss – possessive (божий, отцов, мужнин)
r:invar – indeclinable (беж, джерси)
Semantic tags
t:size – size (высокий, короткий)
t:size:max – large size (высокий, длинный)
t:size:min – small size (низкий, короткий)
t:size:abs – absolute size (двухэтажный)
t:dist – distance (далекий, соседний)
t:dist:max – long distance (дальний, отдаленный)
t:dist:min – short distance (близкий, недалекий)
t:quant – quantity (большой, достаточный, трехкратный)
t:quant:max – large quantity (обильный, многочисленный)
t:quant:min – small quantity (ничтожный, малочисленный)
t:quant:abs – absolute quantity (двухтысячный, восьмимиллионный)
t:place – place (левый, придорожный, теменной)
t:dir – direction (обратный, подветренный)
t:time – time (прошлый, ночной)
t:time:dur – duration (долгий, краткий)
t:time:dur:max – long duration (долгий, продолжительный)
t:time:dur:min – short duration (краткий, кратковременный)
t:time:dur:abs – absolute duration (восьмичасовой)
t:time:age – age (зрелый)
t:time:age:max – old age (старый, древний)
t:time:age:min – young age (молодой, малолетний)
t:time:age:abs – absolute age (трехлетний)
t:speed – speed (проворный)
t:speed:max – high speed (скорый, быстрый)
t:speed:min – low speed (медленный, тягучий)
t:physq – physical qualities (мягкий, вязкий)
t:physq:form – form (кривой, круглый)
t:physq:color – colour (красный, бесцветный)
t:physq:taste – taste (кислый, приторный)
t:physq:smell – smell (ароматный, тухлый)
t:physq:temper – temperature (горячий, ледяной)
t:physq:weight – weight (тяжелый, легкий)
t:humq – human qualities (умный, верный, ловкий)
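Six of the adjective parameters above (size, distance, quantity, duration, age and speed) are scalar: the parameter tag is refined by :max and :min, and for size, quantity, duration and age also by :abs. A small Python sketch that splits such a tag into parameter and pole, and checks whether two tags sit at opposite ends of the same scale; both function names are hypothetical, and the pairing of poles is read off the tag names, not taken from any Corpus interface.

```python
from __future__ import annotations

POLES = {"max", "min", "abs"}


def split_pole(tag: str) -> tuple[str, str | None]:
    """Split a scalar tag into (parameter, pole).

    "t:time:dur:max" -> ("t:time:dur", "max"); a tag without a
    :max / :min / :abs suffix comes back unchanged with pole None.
    """
    head, sep, last = tag.rpartition(":")
    if sep and last in POLES:
        return head, last
    return tag, None


def opposite_poles(tag_a: str, tag_b: str) -> bool:
    """True if both tags refine the same parameter, one with :max, one with :min."""
    param_a, pole_a = split_pole(tag_a)
    param_b, pole_b = split_pole(tag_b)
    return param_a == param_b and {pole_a, pole_b} == {"max", "min"}


print(split_pole("t:time:dur:max"))                # ('t:time:dur', 'max')
print(split_pole("t:physq:color"))                 # ('t:physq:color', None)
print(opposite_poles("t:size:max", "t:size:min"))  # True (высокий / низкий)
print(opposite_poles("t:size:max", "t:dist:min"))  # False
```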
Evaluation
ev – evaluation (толковый, мешковатый)
ev:posit – positive evaluation (везучий, ладный)
ev:neg – negative evaluation (продажный, сварливый)
Derivational tags
d:dim – diminutives (тихонький, крохотный)
d:aug – augmentatives (здоровенный, злющий)
d:atten – attenuatives (угловатый, жуликоватый)
d:habit – habitives (глазастый, пузатый)
d:carit – caritives (безглазый, бездыханный)
d:potent/ d:impot – potentials (плавучий, недееспособный)
d:potent – possibilitives (плавучий, плодородный, занимательный)
d:impot – impossibilitives (несоизмеримый, недееспособный)
der:s – denominal adjectives (домашний, железный)
der:v – deverbal adjectives (ковкий, навязчивый, кочевой)
der:adv – deadverbial adjectives (поздний, здешний)
Numerals (NUM, A-NUM)
Categories
r:card – cardinal (два, пять, десять)
r:card:pauc – paucal numerals (два, три, четыре, оба, пол, полтора)
r:ord – ordinal (первый, второй, десятый)
Pronouns, including:
S-PRO – substantival pronouns (он, кто)
A-PRO – adjectival pronouns (его, какой)
ADV-PRO – adverbial pronouns (где, как)
Categories
r:pers – personal (я, он)
r:ref – reflexive (себя)
r:poss – possessive (мой, его, свой)
r:rel – interrogative/relative (кто, который, когда)
r:dem – demonstrative (этот, такой)
r:indet – indefinite (некоторый, некогда)
r:neg – negative (никакой, ничей)
r:spec – quantifiers (всякий, каждый, любой)
Verbs (V)
Semantic tags
t:move – movement (бежать, дергаться, бросить, нести)
t:move:body – spatial configuration (согнуть, нагнуться, примоститься)
t:put – placement (положить, вложить, спрятать)
t:impact – physical impact (бить, колоть, вытирать)
t:impact:creat – creation of a physical object (выковать, смастерить, сшить)
t:impact:destr – destruction of a physical object (взорвать, сжечь, зарезать)
t:changest – change of state or property (взрослеть, богатеть, расширить, испачкать)
t:be – sphere of existence (жить, возникнуть, убить)
t:be:exist – existence (жить, происходить)
t:be:appear – start of existence (возникнуть, родиться, сформировать, создать)
t:be:disapp – end of existence (умереть, убить, улетучиться, ликвидировать, искоренить)
t:loc – location (лежать, стоять, положить)
t:loc:body – spatial configuration (сидеть)
t:contact – contact and support (касаться, обнимать, облокотиться)
t:poss – sphere of possession (иметь, дать, подарить, приобрести, лишиться)
t:ment – mental sphere (знать, верить, догадаться, помнить, считать)
t:perc – perception (смотреть, слышать, нюхать, чуять)
t:psych – psychological sphere (гипнотизировать, сочувствовать, настроиться, терпеть)
t:psych:emot – emotion (радоваться, обидеть)
t:psych:volit – volition (решить)
t:speech – speech (говорить, советовать, спорить, каламбурить)
t:behav – human behaviour (куролесить, привередничать)
t:physiol – sphere of physiology (кашлять, икать)
t:weather – natural phenomena (бушевать, вьюжить)
t:sound – sounds (гудеть, шелестеть)
t:light – light (гаснуть, лучиться)
t:smell – smell (пахнуть, благоухать)
Auxiliary verbs
aux:phase – phasal verbs (начать, продолжать, прекратить)
aux:caus – verbs of causation (вызвать, привести <к>)
Causativity
ca:caus – causative verbs (показать, вертеть)
ca:noncaus – non-causative verbs (видеть, вертеться)
Derivational tags
d:pref – prefixal verbs (забегать, оглядеть)
d:semelf – semelfactives (кивнуть, чихнуть, боднуть, качнуться)
d:impf – secondary imperfectives (with -ива-, -ва-, -а-) (выпивать, вбивать, прогонять)
Adverbs (ADV)
Semantic tags
t:place – place (здесь, посередине)
t:dir – direction (туда, наверх)
t:dist – distance (далеко, близко)
t:dist:max – long distance (далеко, вдали, вдалеке)
t:dist:min – short distance (близко, вблизи)
t:time – time (тогда, поздно)
t:time:dur – duration (вечно, недолго)
t:time:dur:max – long duration (вечно, подолгу, всегда)
t:time:dur:min – short duration (временно, недолго)
t:speed – speed (быстро, медленно)
t:speed:max – fast (быстро, мигом)
t:speed:min – slow (медленно, неторопливо)
t:quant – quantity (столько, достаточно)
t:quant:max – large quantity (много, навалом)
t:quant:min – small quantity (мало, чуть-чуть)
Evaluation
ev – evaluation (беспечно, бойко)
ev:posit – positive evaluation (бойко, безупречно)
ev:neg – negative evaluation (бездарно, неловко)
Derivational tags
d:dim – diminutive (немножко, быстренько)
d:atten – attenuative (рановато, суховато)
der:s – denominal adverbs (вверху, дома)
der:v – deverbal adverbs (отродясь, стоймя)
der:a – deadjectival adverbs (быстро, обычно)
Taxonomy of motivating adjectives
der:a& dt:size – size (высоко, коротко)
der:a& dt:size:max – large size (высоко, бесконечно)
der:a& dt:size:min – small size (коротко, низко)
der:a& dt:physq – physical qualities (твердо, плотно)
der:a& dt:physq:form – form (плоско, прямо)
der:a& dt:physq:color – colour (красно, добела)
der:a& dt:physq:taste – taste (горько, вкусно)
der:a& dt:physq:smell – smell (смрадно, зловонно)
der:a& dt:physq:temper – temperature (тепло, прохладно)
der:a& dt:physq:weight – weight (тяжело, легко)
der:a& dt:humq – human qualities (внимательно, грубо)
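The dt: tags of deadjectival adverbs reuse the adjective taxonomy: высоко is listed under der:a& dt:size:max, and высокий under t:size:max. A Python sketch of that mapping follows; the function name is hypothetical, and it deliberately accepts only the dt: values listed above instead of assuming that every adjective class has an adverbial counterpart.

```python
# dt: suffixes that appear in the adverb list above; other adjective
# classes are rejected because the inventory does not list them.
LISTED_DT = {
    "size", "size:max", "size:min",
    "physq", "physq:form", "physq:color", "physq:taste",
    "physq:smell", "physq:temper", "physq:weight",
    "humq",
}


def adverb_tag_for(adjective_tag: str) -> str:
    """Map an adjective taxonomy tag to the deadjectival adverb tag.

    "t:size:max" -> "der:a& dt:size:max"
    """
    prefix = "t:"
    if not adjective_tag.startswith(prefix):
        raise ValueError(f"not a taxonomy tag: {adjective_tag!r}")
    suffix = adjective_tag[len(prefix):]
    if suffix not in LISTED_DT:
        raise ValueError(f"no dt: tag listed for {adjective_tag!r}")
    return "der:a& dt:" + suffix


print(adverb_tag_for("t:size:max"))     # der:a& dt:size:max (высокий -> высоко)
print(adverb_tag_for("t:physq:color"))  # der:a& dt:physq:color (красный -> красно)
```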