Search conditions with regular expressions β

In the lemma and tags search form you can add conditions on a lemma or a form using regular expressions.

A regular expression is a pattern that describes every lemma or word form the search should match. The pattern is built from single-character or multi-character elements and operators.

Reserved characters

The RNC regular expression engine supports all Unicode characters. The following characters are reserved as operators:

. ? + * @ | { } [ ] ( ) ^ \

Elementary operators

The elementary operators describe conditions on a single character or a contiguous sequence of characters within a lemma or form, as well as conditions on their repetition.

Pattern: [character_group]
Description: Matches any single character in character_group.
Example: [аи]
Result: The letter “а” or the letter “и”

Pattern: [^character_group]
Description: Negation: matches any single character that is not in character_group.
Example: [^а]
Result: Any character except “а”

Pattern: [First-Last]
Description: Character range: matches any single character in the range from First to Last.
Example: [а-я]
Result: Any letter in the range from “а” to “я”

Pattern: ^
Description: Placed right after the opening bracket, ^ negates the character or range that follows.
Example: к[^ои]т
Result: “к”, then any character except “о” or “и”, then “т” (so neither “кот” nor “кит”)
Example: [^а-я]
Result: Any character outside the range from “а” to “я” (not a letter)

Pattern: ( … )
Description: Forms a group that is treated as a single element.
Example: (ка)
Result: The letter combination “ка”

Pattern: |
Description: Alternation: matches any one of the elements separated by the vertical bar (|).
Example: а|и
Result: “а” or “и”

Pattern: .
Description: Wildcard: matches any single character, such as a letter or a digit.
Example: с.н
Result: “сын”, “сон”, “сан” and other three-character combinations in which the first character is “с”, the last is “н”, and the middle is any character

Pattern: ?
Description: Matches the previous element zero times or one time. Often used to make the previous element optional.
Example: она?
Result: “он” or “она”

Pattern: +
Description: Matches the previous element one or more times.
Example: не+
Result: “не”, “нее”, “неее”, etc.

Pattern: *
Description: Matches the previous element zero or more times.
Example: .*а
Result: Sequences of characters of any length that end in “а”

Pattern: @
Description: Matches any sequence of characters.
Example: г@г
Result: Sequences of characters of any length that begin and end with “г”

Pattern: {n}
Description: The previous element is repeated exactly n times.
Example: е{3}
Result: “еее”

Pattern: {n,}
Description: The previous element is repeated at least n times.
Example: е{2,}
Result: “ее”, “еее” and longer runs of “е”

Pattern: {n,m}
Description: The previous element is repeated at least n times, but no more than m times.
Example: е{2,3}
Result: “ее”, “еее”
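
The behaviour of these operators can be tried out with a small Python sketch. It only approximates the RNC engine: Python's re module is a different implementation, re.fullmatch is used because a pattern has to match the whole lemma or form, and the "@" operator, which Python does not have, is rewritten to ".*". The helper names to_python and matches are hypothetical.

```python
import re


def to_python(pattern: str) -> str:
    """Rewrite the "@" operator (any sequence of characters) as Python's ".*".

    Simplification: every "@" is treated as an operator, so an escaped or
    bracketed "@" is not handled.
    """
    return pattern.replace("@", ".*")


def matches(pattern: str, word: str) -> bool:
    # fullmatch: the pattern has to cover the whole word, not just a part of it.
    return re.fullmatch(to_python(pattern), word) is not None


print(matches("с.н", "сон"))      # True
print(matches("с.н", "слон"))     # False: four characters, pattern allows three
print(matches("она?", "он"))      # True: the final "а" is optional
print(matches("не+", "неее"))     # True
print(matches("е{2,3}", "ееее"))  # False: more than three repetitions
print(matches("г@г", "гонг"))     # True
print(matches("к[^ои]т", "кот"))  # False: "о" is excluded
```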

Compound patterns

Elementary operators can be combined into patterns that describe entire lemmas or words. To do so, specify the conditions on the individual letters of the word and on their repetition. For example:

Example: Words with the letter combination “ка” occurring two or three times in a row
Pattern: .*(ка){2,3}.*

Example: Five-letter words
Pattern: .....

Example: Words starting with “й”
Pattern: (й).*
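
To see how such compound patterns select words, the following Python sketch applies them to a small word list. The word list and the select helper are made up for illustration, and Python's re.fullmatch stands in for the RNC engine, so this is an approximation rather than a description of how the corpus search is implemented.

```python
import re

# Hypothetical sample of lemmas, chosen only for this illustration.
words = ["какаду", "кукла", "книга", "йогурт", "слово", "йод"]

patterns = {
    "'ка' two or three times in a row": ".*(ка){2,3}.*",
    "five-letter words": ".....",
    "words starting with 'й'": "(й).*",
}


def select(pattern: str, candidates: list[str]) -> list[str]:
    """Keep the candidates that the pattern matches in full."""
    return [w for w in candidates if re.fullmatch(pattern, w)]


for label, pattern in patterns.items():
    print(label, "->", select(pattern, words))
# 'ка' two or three times in a row -> ['какаду']
# five-letter words -> ['кукла', 'книга', 'слово']
# words starting with 'й' -> ['йогурт', 'йод']
```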

Restrictions

The RNC regular expression engine does not support anchor operators such as ^ (beginning of line) or $ (end of line). To find a lemma or form, the pattern must match the entire lemma or form.
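
The whole-match rule means that prefix, suffix and substring searches have to be spelled out with ".*". The Python sketch below illustrates this with re.fullmatch as a stand-in for the RNC engine. The helpers whole_match and contains are hypothetical, and contains assumes that the fragment has no reserved characters.

```python
import re


def whole_match(pattern: str, word: str) -> bool:
    """True only if the pattern covers the entire word."""
    return re.fullmatch(pattern, word) is not None


def contains(fragment: str) -> str:
    """Build a pattern for words that contain fragment anywhere.

    Assumes fragment consists of ordinary (non-reserved) characters.
    """
    return ".*" + fragment + ".*"


print(whole_match("кот", "кот"))                # True
print(whole_match("кот", "котик"))              # False: "ик" is left unmatched
print(whole_match("кот.*", "котик"))            # True: starts with "кот"
print(whole_match(".*кот", "скот"))             # True: ends with "кот"
print(whole_match(contains("кот"), "скотина"))  # True: contains "кот"
```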

Additional conditions on the word, as usual, are set using the menu of grammatical, syntactic and additional features.

β version

The functionality of the β version of regular expression search is limited:

  • The non-extended context of the examples includes only one sentence.
  • In the extended context, extra words may be highlighted.
  • There is no year distribution graph.
