RNC News

The Old East Slavic corpus has been updated and now contains 655 thousand tokens. It includes texts from the 11th-14th centuries representing a variety of genres. Among them are such famous works as the Lives of Boris and Gleb, The Testament of Vladimir Monomakh and The Tale of Igor's Campaign, as well as other hagiographic, didactic and canonical texts. A collection of Old Novgorod business documents (gramoty), on both parchment and paper, has been added. The metatextual information for the Old East Slavic corpus now includes the date of the text and the date of the surviving copy.

The corpus of birchbark letters is now a parallel corpus: it presents original texts aligned with their translations into Russian and English.

The poetry corpus has also been updated and now contains 13 million tokens. The update adds poems by A. Vertinsky, G. Sapgir and others.

The parallel corpus now contains almost 163 million words. Two new language pairs have been added: Portuguese-Russian and Romanian-Russian. The Finnish-Russian text collection has been significantly expanded and now includes translations of fiction and journalistic texts, as well as a corpus of international treaties (we thank Mikhail Mikhailov, who provided the texts). The collections of English and German texts in Russian translation have also been expanded.

Within the spoken corpus, a new search field 'Region' is now available.
Within the Old East Slavic corpus, it is now possible to search for homonyms by semantics. In the Middle Russian corpus, a suggestion list has been added to the Lemma field.
