This page presents general statistical information about the corpora included in the Russian National Corpus (RNC).
For some RNC corpora, extended statistics are available. In addition to the number of texts and words, they include charts showing the distribution of metadata attributes, geographic maps, and graphs of volume by country and region (for corpora with regional annotation). To access a corpus's statistics, click the icon in the corpus header or click the corpus name on this page.
In corpora with extended statistics, users can also compare their own subcorpora with the entire corpus. To view the comparison, click the icon in the subcorpus header.
Number of texts
Texts by subcorpora
Corpora | Number of texts | Number of sentences | Number of tokens | Percentage of tokens |
---|
Main | 133,554 | 31,626,103 | 389,470,868 | 17.5% |
---|
including manually disambiguated | 2,164 | 514,169 | 5,988,177 | 0.3% |
Media | 2,838,953 | 57,862,754 | 850,630,557 | 38.2% |
---|
National media | 2,728,688 | 55,215,073 | 815,141,029 | 36.6% |
Regional & international | 110,265 | 2,647,681 | 35,489,528 | 1.6% |
SynTagRus | 1,304 | 109,886 | 1,568,027 | 0.1% |
---|
Social networks | 1,768,134 | 14,443,641 | 161,432,452 | 7.3% |
---|
Spoken | 4,598 | 2,052,206 | 14,854,033 | 0.7% |
---|
Accentological | 1,342,851 | 13,559,353 | 135,768,732 | 6.1% |
---|
Multimedia | 1,383 | 1,019,768 | 5,763,881 | 0.3% |
---|
MultiPARC | 56 | 91,341 | 528,851 | 0.0% |
---|
Russian | 26 | 48,236 | 299,520 | 0.0% |
English-Russian | 30 | 43,105 | 229,331 | 0.0% |
Parallel | 13,791 | 16,619,044 | 214,046,070 | 9.6% |
---|
English | 1,556 | 3,476,864 | 51,982,439 | 2.3% |
Armenian | 28 | 126,636 | 1,570,735 | 0.1% |
Bashkir | 124 | 124,270 | 550,387 | 0.0% |
Belarusian | 312 | 1,162,868 | 10,916,697 | 0.5% |
Bulgarian | 59 | 418,986 | 5,159,901 | 0.2% |
Buryat | 7 | 30,750 | 401,516 | 0.0% |
Veps | 989 | 40,780 | 343,133 | 0.0% |
Spanish | 168 | 492,464 | 7,359,538 | 0.3% |
Italian | 126 | 302,264 | 4,930,970 | 0.2% |
Karelian | 2,355 | 125,702 | 1,223,760 | 0.1% |
Chinese | 1,075 | 253,500 | 4,422,747 | 0.2% |
Korean | 185 | 12,300 | 73,752 | 0.0% |
Latvian | 245 | 410,438 | 4,398,564 | 0.2% |
Lithuanian | 65 | 72,244 | 702,471 | 0.0% |
German | 299 | 2,234,024 | 32,276,755 | 1.5% |
Polish | 54 | 501,800 | 6,355,629 | 0.3% |
Portuguese | 38 | 88,572 | 1,602,412 | 0.1% |
Romanian | 31 | 60,140 | 903,375 | 0.0% |
Serbian | 37 | 144,027 | 1,903,176 | 0.1% |
Slovene | 53 | 173,172 | 1,989,641 | 0.1% |
Ukrainian | 865 | 919,426 | 9,383,774 | 0.4% |
Finnish | 320 | 299,184 | 3,741,431 | 0.2% |
French | 67 | 498,180 | 7,631,430 | 0.3% |
Khakas | 331 | 126,710 | 1,194,971 | 0.1% |
Hindi β | 9 | 9,292 | 122,347 | 0.0% |
Romani | 20 | 16,240 | 185,142 | 0.0% |
Czech | 556 | 334,562 | 4,387,470 | 0.2% |
Chuvash | 2,820 | 2,375,948 | 24,168,622 | 1.1% |
Swedish | 787 | 1,344,054 | 16,520,152 | 0.7% |
Estonian | 95 | 192,493 | 2,154,889 | 0.1% |
Japanese | 103 | 31,512 | 453,279 | 0.0% |
Multilingual | 12 | 219,642 | 5,034,965 | 0.2% |
Dialect | 2,014 | 125,156 | 599,258 | 0.0% |
---|
Educational | 1,247 | 1,184,926 | 13,761,608 | 0.6% |
---|
From 2 to 15 | 75 | 413,781 | 4,408,536 | 0.2% |
---|
Poetry | 103,626 | 1,361,340 | 14,097,265 | 0.6% |
---|
Russian classics β | 27,289 | 1,544,467 | 18,556,005 | 0.8% |
---|
Historical | 11,996 | 833,227 | 15,427,893 | 0.7% |
---|
Old East Slavic | 337 | — | 881,706 | 0.0% |
Inscriptions | 749 | — | 6,039 | 0.0% |
Birchbark letters | 1,249 | 1,249 | 23,932 | 0.0% |
Middle Russian | 8,242 | 399,642 | 9,251,633 | 0.4% |
Church Slavonic | 1,419 | 432,336 | 5,264,583 | 0.2% |
Panchronic | 141,035 | 30,890,027 | 384,096,728 | 17.3% |
---|
Total | 6,391,906 | 173,737,020 | 2,225,010,764 | 100% |
---|
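The "Percentage of tokens" column appears to be each corpus's token count divided by the Total row and rounded to one decimal place. The following is a minimal sketch under that assumption, using three token counts copied from the table above; the function name `token_share` is hypothetical and not part of any RNC tool.

```python
# Token counts copied from the table above.
TOTAL_TOKENS = 2_225_010_764
tokens = {
    "Main": 389_470_868,
    "Media": 850_630_557,
    "MultiPARC": 528_851,
}


def token_share(count, total=TOTAL_TOKENS):
    """Return count as a percentage of total, rounded to one decimal place.

    Assumption: the page rounds shares to one decimal place.
    """
    return round(100 * count / total, 1)


shares = {name: token_share(count) for name, count in tokens.items()}

for name, pct in shares.items():
    print(f"{name}: {pct:.1f}%")
```

Under this rounding, any corpus with fewer than roughly 1.1 million tokens is shown as 0.0%, which is why several small corpora in the table carry that value.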
Text types
Texts in the main corpus by type and other metadata attributes
Text type | Number of texts | Number of sentences | Number of tokens | Percentage of tokens |
---|
Non-fiction | 122,251 | 16,721,496 | 231,498,004 | 59.4% |
---|
Fiction | 11,303 | 14,904,607 | 157,972,864 | 40.6% |
---|
Total | 133,554 | 31,626,103 | 389,470,868 | 100% |
---|
Fiction
Genre | Number of texts | Number of sentences | Number of tokens | Percentage of tokens |
---|
Crime | 138 | 817,058 | 7,656,452 | 4.8% |
---|
Children's literature | 860 | 759,540 | 7,006,755 | 4.4% |
---|
Nonfiction | 464 | 1,062,528 | 12,620,560 | 7.9% |
---|
Drama | 306 | 636,194 | 3,440,098 | 2.1% |
---|
Historical prose | 295 | 1,287,535 | 14,918,140 | 9.3% |
---|
Love story | 69 | 190,000 | 1,805,976 | 1.1% |
---|
Medical prose | 3 | 17,773 | 170,643 | 0.1% |
---|
No genre | 6,366 | 8,222,857 | 90,763,917 | 56.5% |
---|
Transliteration | 30 | 43,291 | 696,938 | 0.4% |
---|
Adventure | 273 | 519,559 | 5,506,667 | 3.4% |
---|
Sentimental fiction | 30 | 10,463 | 167,255 | 0.1% |
---|
Thriller | 1 | 6,950 | 60,653 | 0.0% |
---|
Sci-fi | 774 | 1,007,091 | 10,150,031 | 6.3% |
---|
Folklore | 77 | 8,715 | 180,657 | 0.1% |
---|
Humour and satire | 1,560 | 569,944 | 5,614,438 | 3.5% |
---|
Total | 11,246 | 15,159,498 | 160,759,180 | 100% |
---|
Non-fiction
Domain | Number of texts | Number of sentences | Number of tokens | Percentage of tokens |
---|
Day-to-day life | 6,802 | 3,214,744 | 33,709,904 | 14.3% |
---|
Official and business | 3,660 | 353,924 | 5,375,463 | 2.3% |
---|
Technical | 1,211 | 116,853 | 1,639,468 | 0.7% |
---|
Journalism | 98,350 | 9,936,754 | 140,263,010 | 59.4% |
---|
Advertising | 2,153 | 76,326 | 844,061 | 0.4% |
---|
Academic | 8,369 | 2,565,262 | 44,186,194 | 18.7% |
---|
Fiction | 57 | 124,643 | 1,257,854 | 0.5% |
---|
Theological | 1,218 | 332,874 | 5,290,038 | 2.2% |
---|
Electronic communication | 877 | 336,484 | 3,382,171 | 1.4% |
---|
Total | 122,697 | 17,057,864 | 235,948,163 | 100% |
---|
Text topic | Number of texts | Number of sentences | Number of tokens | Percentage of tokens |
---|
Administration and management | 17,487 | 1,430,212 | 17,800,870 | 4.5% |
---|
Anthropology | 10 | 15,284 | 313,667 | 0.1% |
---|
Army and armed conflict | 12,778 | 1,244,133 | 15,577,899 | 4.0% |
---|
Archaeology | 21 | 2,021 | 29,228 | 0.0% |
---|
Astrology, parapsychology, esoterica | 432 | 99,808 | 1,035,462 | 0.3% |
---|
Astronomy | 449 | 41,100 | 648,036 | 0.2% |
---|
Business, commerce, economics, finance | 12,348 | 741,269 | 10,337,185 | 2.6% |
---|
Biology | 1,257 | 297,158 | 4,732,290 | 1.2% |
---|
Military affairs | 13 | 11,495 | 244,685 | 0.1% |
---|
Geography | 470 | 216,106 | 3,711,171 | 0.9% |
---|
Geodesy | 1 | 746 | 15,342 | 0.0% |
---|
Geology | 631 | 128,256 | 1,872,559 | 0.5% |
---|
Mining industry | 393 | 25,102 | 419,263 | 0.1% |
---|
Home and home economy | 1,342 | 130,787 | 1,925,568 | 0.5% |
---|
Leisure and entertainment | 5,878 | 457,482 | 4,844,725 | 1.2% |
---|
Natural science | 679 | 192,826 | 2,293,034 | 0.6% |
---|
Natural history | 30 | 13,084 | 210,379 | 0.1% |
---|
Health and medicine | 6,098 | 498,185 | 6,683,889 | 1.7% |
---|
IT | 691 | 82,659 | 1,318,016 | 0.3% |
---|
Art and culture | 18,724 | 3,473,674 | 41,890,793 | 10.6% |
---|
Art history | 122 | 36,553 | 570,920 | 0.1% |
---|
History | 5,373 | 1,758,270 | 27,607,738 | 7.0% |
---|
Crime | 10,701 | 367,033 | 3,927,182 | 1.0% |
---|
Culturology | 732 | 193,298 | 3,297,462 | 0.8% |
---|
Light industry, food industry | 329 | 23,991 | 371,259 | 0.1% |
---|
Forestry | 94 | 9,430 | 146,011 | 0.0% |
---|
Logic | 1 | 3,478 | 51,815 | 0.0% |
---|
Mathematics | 222 | 41,315 | 608,027 | 0.2% |
---|
Machinery | 25 | 1,987 | 30,883 | 0.0% |
---|
Metallurgy | 21 | 2,078 | 32,288 | 0.0% |
---|
Science and technology | 11,860 | 2,427,720 | 40,645,901 | 10.3% |
---|
Education | 4,146 | 656,607 | 7,610,130 | 1.9% |
---|
Politics and society | 34,650 | 4,016,535 | 54,470,831 | 13.8% |
---|
Political science | 18 | 7,009 | 117,321 | 0.0% |
---|
Law | 3,704 | 301,081 | 4,697,751 | 1.2% |
---|
Nature | 4,621 | 538,074 | 6,278,412 | 1.6% |
---|
Industry | 5,093 | 348,168 | 4,415,324 | 1.1% |
---|
Accidents | 237 | 9,552 | 97,367 | 0.0% |
---|
Psychology | 712 | 176,527 | 2,811,204 | 0.7% |
---|
Travel | 2,337 | 935,298 | 12,756,625 | 3.2% |
---|
Religion | 7,016 | 1,055,934 | 14,963,383 | 3.8% |
---|
Agriculture | 2,186 | 238,562 | 3,204,086 | 0.8% |
---|
Sociology | 513 | 148,612 | 2,440,354 | 0.6% |
---|
Sport | 4,206 | 288,331 | 3,697,728 | 0.9% |
---|
Statistics | 374 | 18,439 | 286,021 | 0.1% |
---|
Construction, architecture | 2,258 | 161,675 | 2,097,443 | 0.5% |
---|
Technology | 8,279 | 585,399 | 7,514,822 | 1.9% |
---|
Transport | 4,994 | 220,232 | 2,441,052 | 0.6% |
---|
Physics | 1,359 | 126,230 | 1,930,682 | 0.5% |
---|
Philology | 1,097 | 385,822 | 6,524,220 | 1.7% |
---|
Philosophy | 893 | 502,349 | 8,862,164 | 2.3% |
---|
Chemical industry | 108 | 8,028 | 114,948 | 0.0% |
---|
Chemistry | 1,168 | 139,780 | 2,067,993 | 0.5% |
---|
Private life | 21,568 | 4,569,054 | 50,039,138 | 12.7% |
---|
Electronics | 748 | 45,599 | 670,143 | 0.2% |
---|
Energy industry | 177 | 18,067 | 277,405 | 0.1% |
---|
Ethnography | 7 | 8,098 | 164,164 | 0.0% |
---|
Total | 221,681 | 29,475,602 | 393,744,258 | 100% |
---|
Dates
Texts in the main corpus by date of creation
Date | Number of texts | Number of sentences | Number of tokens | Percentage of tokens |
---|
1651 - 1700 | 5 | 16,633 | 332,648 | 0.1% |
---|
1701 - 1750 | 382 | 64,591 | 1,255,871 | 0.3% |
---|
1751 - 1800 | 1,931 | 338,302 | 6,631,305 | 1.7% |
---|
1801 - 1850 | 3,323 | 1,195,349 | 18,670,143 | 4.7% |
---|
1851 - 1900 | 4,976 | 4,587,242 | 64,977,239 | 16.2% |
---|
1901 - 1950 | 57,915 | 8,883,268 | 103,840,739 | 26.0% |
---|
1951 - 2000 | 21,998 | 9,984,063 | 111,278,506 | 27.8% |
---|
2001 - 2050 | 43,485 | 7,395,050 | 93,108,359 | 23.3% |
---|
Total | 134,015 | 32,464,498 | 400,094,810 | 100% |
---|
Parts of speech
Tokens by part of speech (manually disambiguated corpus only)
Part of speech | Number of tokens | Percentage of tokens |
---|
Noun | 1,718,410 | 28.7% |
---|
Adjective | 510,957 | 8.5% |
---|
Numeral | 96,851 | 1.6% |
---|
of these, recorded in numbers | 53,817 | 0.9% |
of these, recorded in writing | 43,034 | 0.7% |
Numeral adjective | 24,589 | 0.4% |
---|
Verb | 1,013,248 | 16.9% |
---|
Adverb | 253,573 | 4.2% |
---|
Predicative | 42,762 | 0.7% |
---|
Parenthesis | 26,721 | 0.4% |
---|
Pronoun | 471,700 | 7.9% |
---|
Adjectival pronoun | 280,716 | 4.7% |
---|
Adverbial pronoun | 130,434 | 2.2% |
---|
Predicative pronoun (некого, нечего) | 678 | 0.0% |
---|
Preposition | 626,906 | 10.5% |
---|
Conjunction | 475,769 | 7.9% |
---|
Particle | 266,675 | 4.5% |
---|
Interjection | 8,628 | 0.1% |
---|
Initial | 10,002 | 0.2% |
---|
Other (foreign words, onomatopoeia) | 29,536 | 0.5% |
---|
Total | 5,988,155 | 100% |
---|
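The flattened layout makes it hard to tell which rows in the part-of-speech table are top-level categories and which are breakdowns. The sketch below is a simple consistency check on figures copied from the table. It assumes the two "of these" rows are a breakdown of Numeral and that every other row is a top-level category. The variable names are illustrative only.

```python
# Token counts copied from the part-of-speech table above.
pos_tokens = {
    "Noun": 1_718_410,
    "Adjective": 510_957,
    "Numeral": 96_851,
    "Numeral adjective": 24_589,
    "Verb": 1_013_248,
    "Adverb": 253_573,
    "Predicative": 42_762,
    "Parenthesis": 26_721,
    "Pronoun": 471_700,
    "Adjectival pronoun": 280_716,
    "Adverbial pronoun": 130_434,
    "Predicative pronoun": 678,
    "Preposition": 626_906,
    "Conjunction": 475_769,
    "Particle": 266_675,
    "Interjection": 8_628,
    "Initial": 10_002,
    "Other": 29_536,
}

# Assumed breakdown of the Numeral row (the two "of these" rows).
numeral_breakdown = {
    "recorded in numbers": 53_817,
    "recorded in writing": 43_034,
}

REPORTED_TOTAL = 5_988_155  # value of the Total row

top_level_sum = sum(pos_tokens.values())
breakdown_sum = sum(numeral_breakdown.values())

print("Top-level rows match Total:", top_level_sum == REPORTED_TOTAL)
print("Breakdown matches Numeral:", breakdown_sum == pos_tokens["Numeral"])
```

Both checks hold for the figures shown, which supports reading "Numeral adjective" as a category of its own and not as part of the Numeral breakdown.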