The Russian National Corpus is a representative collection of Russian-language texts containing more than 2 billion tokens, equipped with linguistic annotation and search tools.
Search in corpora
News
The Spoken and Accentological corpora have been updated with new collections of oral speech. The additions include recordings of conversations and interviews with residents of various regions of Russia, featured in documentary films from the series "Letters from the Province" and in video blogs, as well as materials from folklore expeditions.
The corpora have also been expanded with monologue-style memoirs and everyday dialogic speech, including the speech of young people, collected by students of Voronezh State University, Lomonosov Moscow State University, and the State University of Education (Prosvet). This update adds approximately 180,000 tokens in total.
The Spoken Corpus now contains 15 million tokens, while the Accentological Corpus, including naive poetry, has reached a total of 135.9 million tokens.
The Multimedia Corpus has been expanded by almost 120,000 words. The following data have been added:
A large collection of literary reading recordings — stories and novellas by Nikolai Gogol, Alexander Pushkin, Alexander Kuprin, Mikhail Zoshchenko, Vsevolod Ivanov, Yuri Kazakov, and Vasily Shukshin, performed by renowned actors Tatyana Doronina, Natalya Gundareva, Vitaly Solomin, Alexander Kalyagin, Igor Ilyinsky, Evgeny Knyazev, Boris Chirkov, and Sergei Yursky.
Recordings of television interviews and talk shows.
The collection of regional speech recordings has also been expanded. It now includes conversations and interviews with residents of the Voronezh Region, Buryatia, Tatarstan, and the Komi Republic, featured in documentary films from the series "Letters from the Province".
In addition, the corpus now includes recordings of non-public speech — informal conversations with relatives and friends in everyday settings.
The Old East Slavic Corpus has reached a size of 913,000 words. It has been expanded with several additional literary and administrative texts, as well as a new collection of selected extratexts from East Slavic manuscripts. These range from refined passages about a book’s patron and its making to down-to-earth marginalia. Such texts are of great value for the study of both language history and cultural context. The extratexts are reproduced from the most up-to-date scholarly publications (by Vadim Krys'ko, Maria Galchenko, Alexei Gippius, Savva Mikheev, and others) and, where possible, have been cross-checked against digital copies of the original manuscripts. The number of tokens with fully annotated Greek correspondences has reached 237,000.