Corpus "From 2 to 15"

The corpus was launched in 2022. It contains literature in Russian that today's children and teenagers commonly read. The corpus currently includes 75 prose works in Russian, some originally written in Russian and some translated from other languages. The texts were selected based on the results of large-scale surveys of children, teenagers, teachers and parents. Each text is annotated with the age at which it is usually most interesting to read. Annotation of words according to various spelling rules is planned for the future, to support learning and teaching Russian orthography.

Annotation

A neural network model was created to automatically annotate text fragments with the minimum age at which readers are expected to understand them. The model's predictions are quite accurate (in 92% of cases its decisions coincided with the data received from experts), but the model remains experimental, and annotation errors are possible. Moreover, the rate of vocabulary growth, the level of reading proficiency and individual development can vary significantly among children; the current annotation of the corpus reflects the average case.
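
To make the agreement figure concrete, here is a minimal sketch of how such a share could be computed. It assumes that "coincided" means an exact match between the model's minimum age and the expert's for the same fragment; the function name and the sample labels are hypothetical and are not taken from the corpus or from the authors' evaluation.

```python
def agreement_rate(model_ages, expert_ages):
    """Share of fragments where the model's minimum age equals the expert's."""
    if len(model_ages) != len(expert_ages):
        raise ValueError("both lists must cover the same fragments")
    if not model_ages:
        raise ValueError("at least one fragment is required")
    # Count fragments where both labels are identical (exact match is an assumption).
    matches = sum(m == e for m, e in zip(model_ages, expert_ages))
    return matches / len(model_ages)


# Hypothetical minimum-age labels for five fragments (not corpus data).
model_ages = [6, 8, 10, 12, 8]
expert_ages = [6, 8, 10, 10, 8]

rate = agreement_rate(model_ages, expert_ages)
print(f"{rate:.0%}")  # prints 80%
```

Under this reading, a figure of 92% would mean that the model and the experts assigned the same minimum age to 92 out of every 100 fragments compared.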

History

Work on the corpus began in 2020. B.L. Iomdin, D.A. Morozov, N.N. Buylova and A.V. Glazkova worked on launching the corpus.

The authors express their gratitude to all students who participated in the surveys and helped to collect literature for inclusion in the corpus.
 

Publications

A list of scientific publications on the "From 2 to 15" corpus is available at https://ruscorpora.ru/s/boKPk. In the Publications section, use the filters to find other types of publications about the corpus.

Updated on 22.07.2024