RNC News

The functionality of the main corpus has been significantly upgraded. It now features lexical and grammatical annotation with automatic homonymy resolution, so grammatical homonyms are disambiguated throughout the main corpus, as well as automatic syntactic annotation. The corpus is also searchable by syntactic parameters such as types of compound sentences, predicative phrases (clauses), complements, copulas, and many others. With this new annotation, all the functions that first appeared in the corpus of regional media are now available in the main corpus: Searching for collocations, Frequency dictionary, Search results types. Frequency.

In addition, the main and newspaper corpora are now searchable for lemmas and word forms using regular expressions (β-version). They also provide corpus and subcorpus statistics: overall size in texts and words, a geographical map (for the Regional media corpus only), and charts of metatextual attributes. These functions allow users to compare a given subcorpus with the rest of the corpus, including visually.

The interface of the Church Slavonic corpus has been substantially updated, and the corpus is connected to the Overview feature.

The multimedia corpus contains 5.7 million tokens.
The parallel corpus contains 168 million tokens. It now features four new language pairs with Russian: two larger South Slavic subcorpora, Serbian and Slovene, and two smaller pilot corpora, Korean and Hindi, both of which come with transliteration and dictionary support. The Korean and Hindi tiers include aligned poetic texts, a new feature within the parallel corpus. The Czech and Spanish language pairs have also been updated.
