RNC News

Keyword annotations for texts in the Regional Corpus have been updated. Keywords make it easier to analyze narrow thematic categories and to navigate texts on a wide range of topics.

Annotation was performed with the T-lite-instruct-0.1 model, trained on corpus materials. The new keywords contain fewer normalization and grammatical errors and describe the subject matter of the texts more accurately. As before, a keyword can be either a single token (похолодание 'cold snap', гололед 'black ice') or a two-word phrase (таяние снега 'snowmelt'). A single-token query (община 'community') returns both exact matches and two-word keywords containing that word (сельская община 'rural community'). Each text has 5 to 10 keywords, ranked by relevance.
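The query rule can be illustrated with a small sketch. It is a hypothetical illustration, not the RNC's search code: it assumes each text's keywords are stored as a list of normalized strings in relevance order, and the function names `keyword_matches` and `find_texts` and the sample texts are invented for this example. Matching is done on whole words, so a query for община matches сельская община but not a keyword that merely contains община as a substring of a longer word.

```python
# Hypothetical sketch of the keyword query rule; not the RNC implementation.
# Assumption: keywords are normalized strings, one or two words each,
# stored per text in order of relevance.

def keyword_matches(query: str, keyword: str) -> bool:
    """Return True if a single-token query matches a keyword.

    Matches an identical single-token keyword, or a two-word keyword
    that contains the query as one of its words (whole-word match).
    """
    return query in keyword.split()


def find_texts(query: str, texts: dict[str, list[str]]) -> dict[str, list[str]]:
    """Map each text id to its matching keywords, keeping relevance order."""
    results = {}
    for text_id, keywords in texts.items():
        hits = [kw for kw in keywords if keyword_matches(query, kw)]
        if hits:
            results[text_id] = hits
    return results


if __name__ == "__main__":
    # Hypothetical sample data: text ids mapped to ranked keyword lists.
    sample = {
        "text_1": ["община", "крестьянство", "земля"],
        "text_2": ["сельская община", "налог"],
        "text_3": ["общинный быт"],  # different word, so no match
    }
    print(find_texts("община", sample))
```

Running the script prints the matches for text_1 (exact match) and text_2 (two-word keyword); text_3 is excluded because the comparison is by whole word rather than by substring.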