RNC News

16.09.2022

Our large corpora now include new annotation layers produced with neural network methods: lemmatization, grammatical annotation with automatic disambiguation, and automatic syntactic parsing. For now, this annotation is searchable in the Regional and International Media corpus; at the next stage it will also become available in the Main and Media corpora.
Morphological homonyms are now disambiguated automatically throughout the Regional corpus. For example, the noun печь ('stove') is tagged differently from the verb печь ('to bake'), and the dative case is distinguished from the prepositional case. Users can search by syntactic parameters such as different types of multiclausal sentences, clauses, complements, copulas, vocatives, and many others. The syntactic annotation in the Regional corpus is organized differently from that of the separate Syntax corpus and is more strongly oriented towards constituent structure.
Feel free to try the new search features, and please report any errors you notice.

The Syntax corpus has been significantly updated with metadata about the texts: the author's gender, the topic and type of the text, its source, and the date the document was annotated and added to the corpus. For sentences containing fixed multiword expressions (such as потому что 'because' or по меньшей мере 'at least'), two variants of the sentence structure are shown: one treating the expression as a single token (i.e. a single structural node) and one treating it as several tokens. The corpus has now reached 1.5 million tokens.
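To illustrate what the two variants mean, here is a minimal Python sketch. It is not the RNC's actual pipeline: it assumes an already tokenized sentence and a small hypothetical list, `MULTIWORD_UNITS`, and simply collapses each matching sequence of tokens into one token, giving the "single token" variant alongside the original "several tokens" variant.

```python
# Hypothetical inventory of fixed multiword expressions (illustrative only,
# not the list actually used by the Syntax corpus).
MULTIWORD_UNITS = [("потому", "что"), ("по", "меньшей", "мере")]


def merge_multiword_units(tokens):
    """Return a new token list in which each known multiword unit is one token."""
    merged = []
    i = 0
    while i < len(tokens):
        for unit in MULTIWORD_UNITS:
            n = len(unit)
            # Compare case-insensitively against the next n tokens.
            if tuple(t.lower() for t in tokens[i:i + n]) == unit:
                merged.append(" ".join(tokens[i:i + n]))
                i += n
                break
        else:
            # No multiword unit starts here: keep the token as-is.
            merged.append(tokens[i])
            i += 1
    return merged


sentence = ["Он", "ушёл", ",", "потому", "что", "устал"]
print(sentence)                         # variant 1: the unit as several tokens
print(merge_multiword_units(sentence))  # variant 2: the unit as a single token
```

In a parse tree, the second variant corresponds to one structural node for потому что instead of two.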

