  • 107,811 texts
  • 14,477,581 words
Poetry corpus

The Poetry Corpus has been under development since 2006. It contains poetic works written after 1700, including verse drama and verse translations. It is based on the most complete and authoritative critical editions of culturally significant authors (the Biblioteka Poeta series or collected works); anthologies of minor poets (poetae minores) are also represented. For the 20th century, preference is given to authors who have become the subject of contemporary scholarly research and who have influenced the literary process, including the history of verse. The corpus excludes texts written entirely or predominantly in foreign languages or in zaum (transrational language), texts based on the techniques of visual poetry, and similar material.

As a rule, the compilers do not make independent decisions on textual criticism or attribution; they follow authoritative editions. They do, however, incorporate corrections and clarifications in these areas that were published after those editions appeared.

The corpus includes multiple versions of a single text when the author substantially revised it and/or when the authoritative edition prints each version in full in its main section. It also includes “doublets,” “twin poems,” and other cases, characteristic of 20th-century poetry, in which substantial textual duplication is part of the author’s design.

The corpus carries standard morphological and semantic annotation, similar to that of the corpus with unresolved homonymy. Since 2025, an additional annotation layer has been available as well, with automatically disambiguated morphological analyses that have been partially corrected by hand.

The corpus also has a dedicated verse annotation layer. It lets users build a subcorpus of, for example, texts in amphibrach, texts in accentual verse, texts in five-line stanzas, texts with free (irregular) rhyme schemes, or fixed forms such as the sonnet. From the subcorpus list, the texts of the poems can be opened directly, without constructing a query.

Verse annotation is also applied within the texts themselves. Each line is tagged with the meter it is written in, so a search query can be restricted to lines in a particular meter, and the search results display the meter annotation line by line. Metrical ictuses are marked as well. In rhymed texts, the words that fall in the rhyming zone at the end of the line are also marked, so a search can be limited to words from this zone.

Ictus positions can also be searched: in the “Word form” field, put an apostrophe (') after the vowel that bears the ictus, for example музы'к* or му'зык*.
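To illustrate how such a query might be interpreted, here is a minimal Python sketch. It assumes, hypothetically, that each token is stored as a word form with an apostrophe after the vowel carrying the ictus, and that * matches any (possibly empty) sequence of characters. The token list and the functions query_to_regex and search_ictus are invented for illustration and do not describe how the corpus is actually implemented.

```python
import re


def query_to_regex(query: str) -> re.Pattern:
    """Turn an ictus query such as "музы'к*" into a compiled regex.

    Hypothetical convention: '*' stands for any (possibly empty) sequence of
    characters; everything else, including the ictus mark "'", is literal.
    """
    parts = (re.escape(part) for part in query.split("*"))
    return re.compile(".*".join(parts), re.IGNORECASE)


def search_ictus(query: str, tokens: list[str]) -> list[str]:
    """Return the tokens whose whole marked form matches the query."""
    pattern = query_to_regex(query)
    return [token for token in tokens if pattern.fullmatch(token)]


# Hypothetical tokens: the apostrophe follows the vowel bearing the metrical ictus.
tokens = ["музы'ка", "му'зыка", "музы'ку", "му'зыки", "му'за"]

print(search_ictus("музы'к*", tokens))
print(search_ictus("му'зык*", tokens))
```

The two queries contain the same letters but select different forms, because the apostrophe pins the ictus to a specific vowel.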

The principles of text selection for the corpus and of verse (metrical) annotation are described in more detail in publications on the Poetry Corpus.

Publications

A list of scholarly publications on the Poetry Corpus is available at https://ruscorpora.ru/s/av5kM. Other types of publications about the corpus can be found using the filters in the Publications section.

Updated on 14.06.2026