Russian National Corpus

This website contains a corpus of the modern Russian language incorporating over 300 million words. The corpus of Russian is a reference system based on a collection of Russian texts in electronic form.

The Corpus is intended for everyone interested in the Russian language and related fields: professional linguists, language teachers, school and university students, and foreigners learning Russian.

News

February 22, 2011
A set of shuffled sentences from the disambiguated corpus has been made freely available. It consists of 180,000 tokens: 90,000 tokens of media texts and 30,000 tokens each of fiction, legal and scientific texts.

December 30, 2010
The accentological, spoken and poetic corpora have been updated.

December 28, 2010
The Multimodal Russian Corpus (MURCO) is now fully accessible.

October 9, 2010
The collection of articles "Russian National Corpus: 2006 — 2008" is available online.

July 23, 2010
The Corpus has been updated and its performance improved:

  • annotation in the main disambiguated corpus has been unified (aspect and voice of the verbal lemma, participial and gerundial forms, accent in frequent non-dictionary tokens, annotation of proper names, abbreviations and non-Russian tokens, and other minor corrections);
  • the main non-disambiguated corpus has been updated: fiction and journalistic texts before 1950, memoirs (2nd half of the 20th century), scientific and popular-science texts, political magazines (1950s-1980s), newspapers (1990s), online communication and official texts. The main corpus now amounts to 176 million tokens in total;
  • the poetic corpus has been updated with poets flourishing before the 1840s and in the 1910s (the "Poet's Library" volumes, Myatlev, Scherbina, Bal'mont, Baltrushaitis, Vyacheslav Ivanov and others). The poetic corpus now amounts to 5 million tokens;
  • the parallel corpora have been updated and now amount to 9 million tokens. English (E. Brontë, J. Galsworthy, C. S. Lewis, K. Vonnegut and others) and German (Novalis, J. von Eichendorff, H. Hesse, H. Böll and others) texts are available together with their aligned Russian translations. Ukrainian-Russian and Russian-Ukrainian aligned corpora (500 thousand tokens) have been launched.

July 8, 2010
The KWIC (key word in context) format is now available for search results; use the "KWIC format" or "Settings" links on the search results page.

June 15, 2010
A new version of the deeply annotated corpus of Russian texts, SynTagRus, has been uploaded.

June 4, 2010
Bug reporting is now available. To report an error in a token or a document, click it and select «Сообщить об ошибке» ("Report an error").
The search results page now features links to the corresponding search results in the other corpora.

February 3, 2010
A 100-million-token Corpus of the contemporary Russian press (newspapers and Internet news, 2000-2008) is now available. The texts are courtesy of the Corpus Technologies company.

January 1, 2010
In 2009, two collective volumes were published with extensive contributions from the Corpus team:

  1. Национальный корпус русского языка: 2006-2008. Новые результаты и перспективы [Russian National Corpus: 2006-2008. New Results and Prospects]. St. Petersburg: Nestor-Istoriya, 2009. 502 pp.
  2. Корпусные исследования по русской грамматике [Corpus Studies in Russian Grammar]. Moscow: Probel, 2009. 516 pp.

November 18, 2009
The RNC was awarded a special prize in the electronic media competition «Impeccable Command of the Russian Language in Professional Activity».

November 18, 2009
The website of the Institute of the Russian Language of the Russian Academy of Sciences (RAS) now hosts four dictionaries based on the RNC: the Grammatical Dictionary of Russian Neologisms, the New Russian Frequency Dictionary, the Combinatory Dictionary of Russian Intensifiers, and the Verbal Combinatory Dictionary of Russian Abstract Nouns.

November 18, 2009
A new version of the deeply annotated corpus of Russian texts, SynTagRus, has been uploaded. Compared to the previous version, the corpus has been supplemented with 88 recent popular-science, economic and political articles published in Russian newspapers, journals and magazines in 2007-2008. Some errors have also been detected and corrected. SynTagRus now contains 41,187 tagged sentences.

November 2, 2009
The Educational gateway of the RNC is now available.

November 2, 2009
The poetic corpus has been updated with 18th- and 19th-century texts, including many poetae minores of the 1790s-1830s. A list of the authors, with links to their subcorpora, is available.

February 26, 2009
In the Main Corpus, words can now be searched for both within idiomatic expressions and outside them. An advanced semantic search is also available, which allows users to search for the main and peripheral senses of a word, taking (partial) word-sense disambiguation into account.

February 25, 2009
The parallel corpus has been updated: a German-Russian corpus is now available via the common search form for the parallel corpora.

January 12, 2009
The spoken and accentological corpora have been updated. The accentological corpus now contains about 4.45 million tokens, and the spoken corpus about 7.8 million tokens.

December 25, 2008
The main and poetic corpora have been updated. The poetic corpus now contains more than 3 million tokens, 18th-century texts now total 2.6 million tokens, and texts from the 1900-1950 period have been expanded to 40 million tokens.

December 8, 2008
The English-Russian and Russian-English parallel corpora are searchable again, now on the main RNC site and with standardized markup.

November 25, 2008
An English interface for search and subcorpus customization is now available for three major corpora (main, spoken and syntactic).

November 10, 2008
The English search interface for the main subcorpus of the RNC is now available.

October 24, 2008
The Historical Accentological Corpus is now searchable (Russian interface only).

October 3, 2008
The Dictionary of compound lexical units is now available (Russian interface only).

March 26, 2008
The Corpus of Spoken Russian is now searchable.

March 18, 2008
The Deeply Annotated Corpus is now searchable.

March 17, 2008
Welcome to the English webpage of the Russian National Corpus.

Russian National Corpus
© 2003–2014
info@ruscorpora.ru