News

28.12.2024

On the last working day of the year, the team of the Russian National Corpus traditionally reflects on the results and recalls what new developments have taken place over the year.

In 2024, the Corpus grew by more than 109 million words. Many corpora now feature search and statistical tools that were previously available only in the Main, Media, and other "advanced" corpora.

We hope that in this image everyone will find tools that make their work with the Corpus even more productive and enjoyable. May the New Year bring you many fascinating discoveries and inspiring insights!

We extend our heartfelt gratitude to the creators of the Corpus of the Chuvash Language, the Open corpus of Veps and Karelian languages, and the Digital Corpus of the Khakas Language for their fruitful collaboration.

With warmest wishes for the New Year,
The Team of the Russian National Corpus

25.12.2024

In the Regional Corpus, keyword annotations in texts have been updated. The use of keywords facilitates the analysis of narrow thematic categories and helps navigate texts of various topics.

The T-lite-instruct-0.1 model, trained on the corpus materials, was used for annotation. The new keywords contain fewer normalization and grammatical errors and more accurately describe the subject matter of the texts. As before, one keyword can consist of a single token (похолодание, гололед) or a two-word combination (таяние снега). A single-token query (община) yields both exact matches and two-word combinations with this word (сельская община). For each text, 5 to 10 keywords have been generated, ranked by relevance.
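The matching rule described above can be illustrated with a minimal sketch. It is not the Corpus's actual implementation: the function name `match_keywords` and the sample keyword list are hypothetical, and the sketch compares literal word forms, whereas the real search may rely on normalized forms.

```python
def match_keywords(query, keywords):
    """Return the keywords that equal the single-token query
    or contain it as one of their words (hypothetical helper)."""
    q = query.lower()
    return [kw for kw in keywords if q in kw.lower().split()]


# Sample keywords taken from the examples in the announcement.
keywords = ["община", "сельская община", "таяние снега", "гололед"]

# A single-token query returns the exact match and the two-word combination.
print(match_keywords("община", keywords))
```

Splitting each keyword on whitespace keeps the check word-based, so a query does not match a keyword merely because it is a substring of a longer word.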

25.12.2024

New texts totaling about 100,000 tokens have been added to the Dialect corpus. They represent the dialects of the north (Arkhangelsk Oblast, Karelian and Komi Republics), the Volga region (a large collection of dialects from the Nizhny Novgorod Oblast) and the south (Smolensk and Kaluga Oblasts, and the Molokans of the Caucasus). The update includes both transcripts from pre-revolutionary and pre-WWII times and field data from recent expeditions. Several hundred audio recordings and ten video recordings have been added, in which one can not only hear the dialects but also see how a boat is tarred in the north or how bees are kept in Azerbaijan.

03.12.2024

We are pleased to announce an important update of the search form on the Russian National Corpus website! Now users can add words before Word 1, thus making it much easier to compose and edit complex queries.

Previously, words could only be added to the right of Word 1 and subsequent words. For example, if you were searching for a construction like “adjective + pronoun + дорога (‘road’)”, having specified the syntactic relationship between these words, but then decided to search for “conjunction + adjective + pronoun + дорога (‘road’)”, you would have to rebuild the query from scratch. Now it is simpler: just click the “+” button to the left of Word 1 and specify any attribute, such as “conjunction”.

Please note: the principle of calculating the distance between words remains unchanged. The distance is always set from left to right: from the new Word 1 to the original Word 1, and then to the subsequent words.
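The left-to-right distance principle can be sketched as follows. This is only an illustrative model, not the website's internal representation: the `QueryWord` class, the `prepend_word` function, and the `(min, max)` distance tuple are all assumptions introduced for the example.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class QueryWord:
    attribute: str
    # Distance to the word on the left as an assumed (min, max) pair;
    # None for Word 1, which has no left neighbour.
    distance_from_previous: Optional[Tuple[int, int]] = None


def prepend_word(query: List[QueryWord], attribute: str,
                 distance: Tuple[int, int] = (1, 1)) -> List[QueryWord]:
    """Add a new Word 1 on the left without modifying the input list.

    The original Word 1 receives a distance measured from the new Word 1;
    the distances of all subsequent words stay as they were.
    """
    if not query:
        return [QueryWord(attribute)]
    shifted_first = QueryWord(query[0].attribute, distance)
    return [QueryWord(attribute), shifted_first] + query[1:]


# "adjective + pronoun + дорога", then a conjunction added on the left.
query = [
    QueryWord("adjective"),
    QueryWord("pronoun", (1, 1)),
    QueryWord("дорога", (1, 1)),
]
extended = prepend_word(query, "conjunction")
print([w.attribute for w in extended])
```

In this model only the former Word 1 gains a distance value, which mirrors the statement that distances are always set from left to right.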