[99% SOLVED] How to match Cyrillic characters with a regular expression

Post by **Debugger** » Sat Apr 06, 2019 9:14 am

Text Editor:
I want to find Russian sentences that contain one or more dots, dashes, digits, commas, and so on. It is very important that each match stays within a single line!

[ЁёА-Яа-я„«—]

example:
Что такое стиль. Настольная книга для писательницы
Что такое стиль. Настольная книга для писательницы - 2
Что такое стиль. Настольная — писательницы -
3. Что такое стиль. Настольная книга для писательницы

Post by **therube** » Sun Apr 07, 2019 11:30 am

(Of course I'm not following, but...)

Separate your letters out first.

regex:[ЁёА-Яа-я] (or regex:[ЁёА-я], I think)

> В цепях древней тайны.mp3
> Славься, Русь!.mp3

Then add your punctuation.
Will that work?

regex:[ЁёА-Яа-я] regex:[,„«—]+

> Славься, Русь!.mp3
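
For anyone who wants to try the two steps outside of Everything, here is a minimal Python sketch. The file names come from the results above, apart from `readme.txt`, which is a made-up counterexample. The variable names are illustrative too, and Python's `re` module stands in for the search engine. It also checks the "I think" part: `А-я` covers the same letters as `А-Яа-я`, because the two ranges sit next to each other in Unicode.

```python
import re

# Ё/ё lie outside the contiguous А..я block, so they are listed explicitly.
CYRILLIC = re.compile(r"[ЁёА-Яа-я]")
SHORT = re.compile(r"[ЁёА-я]")      # shorter spelling of the same set
PUNCT = re.compile(r"[,„«—]+")

names = [
    "В цепях древней тайны.mp3",
    "Славься, Русь!.mp3",
    "readme.txt",                   # hypothetical non-Cyrillic name
]

# Step 1: keep names that contain at least one Cyrillic letter.
has_letters = [n for n in names if CYRILLIC.search(n)]

# Step 2: of those, keep names that also contain the listed punctuation.
has_both = [n for n in has_letters if PUNCT.search(n)]

print(has_letters)
print(has_both)
```

Note that both steps only ask whether the characters occur somewhere in the name; neither one restricts what else the name may contain.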

Post by **Debugger** » Sun Apr 07, 2019 1:59 pm

The regular expression is WRONG.
It does not match all the characters on one line.
It finds practically any text rather than only what it should. It should not match ordinary text by looking for individual characters anywhere in the text; the match must be strictly confined to a single line that contains at least some Russian text.

The regex: operator is not needed.

Example:

Line1: Russian text and or not and other char
Line2: Russian text
Line3: Polish text
(Separator)Line4:===
Line5: Russian text and or not and other char
Line6: Russian text
Line7: Polish text
(Separator)Line8:===

Post by **void** » Mon Apr 08, 2019 2:35 am

Regular Expression Quick Start
pcrepattern specification

Post by **Debugger** » Mon Apr 08, 2019 6:37 am

void - Well, yes, but I cannot find anything there on how to write a regular expression that matches a single line made up only of strictly defined characters (Russian). The line must not contain mixed text in English, Polish, German, or other languages that share some of the same characters.

.+[ЁёА-Яа-я.,„”"«—0-9)(]\n
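
One likely reason this attempt lets other languages through: the leading `.+` accepts any characters, so only the last character before the newline is actually checked against the set. A small Python sketch of that behaviour follows. The Russian sample is taken from the examples earlier in the thread; the Polish and English samples are made up for illustration.

```python
import re

# Pattern from the post, unchanged.
ATTEMPT = re.compile(r'.+[ЁёА-Яа-я.,„”"«—0-9)(]\n')

samples = [
    "Что такое стиль.\n",       # Russian: wanted
    "Zażółć gęślą jaźń.\n",     # Polish: not wanted, but ends in "."
    "Plain English text 2\n",   # English: not wanted, but ends in a digit
]

# Every sample matches, because ".+" swallows the whole line up to its
# last character and only that last character has to be in the set.
leaked = [s for s in samples if ATTEMPT.search(s)]
print(len(leaked), "of", len(samples), "lines matched")
```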

Post by **void** » Mon Apr 08, 2019 7:57 am

Requires PCRE in multiline mode:

^([\p{Cyrillic}]+[\-\.—0-9]+[\p{Cyrillic}\-\.—0-9]*|[\-\.—0-9]+[\p{Cyrillic}]+[\p{Cyrillic}\-\.—0-9]*)$

This also requires at least one Cyrillic character, which I assume you want; otherwise it would match a long string of numbers, dashes, or dots.

^ = match start of string (or line, in multiline mode)
[] = match character in a set
\p{Cyrillic} = match a Cyrillic character
\- = match a literal -
\. = match a literal .
+ = match previous element one or more times.
* = match previous element zero or more times.
$ = match end of string (or line, in multiline mode)
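
The pattern can be tried in Python with a small substitution. This is a sketch under one assumption: the standard `re` module does not support `\p{Cyrillic}`, so the range `\u0400-\u04FF` (the basic Cyrillic block) stands in for it here, which is narrower than the full script property. The names `CYR` and `PATTERN` and the sample lines are illustrative.

```python
import re

# Stand-in for \p{Cyrillic}: the basic Cyrillic block only (assumption).
CYR = r"\u0400-\u04FF"

# Same structure as the pattern above: Cyrillic first, then punctuation or
# digits, OR punctuation or digits first, then Cyrillic.
PATTERN = re.compile(
    rf"^([{CYR}]+[\-\.—0-9]+[{CYR}\-\.—0-9]*"
    rf"|[\-\.—0-9]+[{CYR}]+[{CYR}\-\.—0-9]*)$",
    re.MULTILINE,  # ^ and $ match at line boundaries
)

text = "\n".join([
    "Стиль-2",            # Cyrillic, then dash and digit
    "3.Стиль",            # digit and dot, then Cyrillic
    "Стиль",              # letters only: no dot, dash, or digit
    "12345",              # no Cyrillic at all
    "Styl-2",             # Latin letters
    "Что такое стиль.",   # contains spaces, which are not in the sets
])

matches = [m.group(0) for m in PATTERN.finditer(text)]
print(matches)
```

Only the first two lines match. The last line shows that the pattern as written rejects spaces, so whole sentences need the space added to the sets.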

Post by **Debugger** » Mon Apr 08, 2019 9:06 am

Unfortunately, I do not use PCRE, but I have switched to the Onigmo engine, and it should work there.

I have modified part of the regex:
^([\p{Cyrillic}]+[\-\.\,\!\…\?\„\”\,\;\\\/\*\#\@\&\:\.x{200B}—0-9\s]+[\p{Cyrillic}\-\.\,\!\…\?\„\”\,\;\\\/\*\#\@\&\:\.x{200B}—0-9\s]*|[\-\.\,\…\?\„\”\,\;\\\/\*\#\@\&\:\.x{200B}—0-9\s]+[\p{Cyrillic}]+[\p{Cyrillic}\-\.\,\!\…\?\„\”\,\;\\\/\*\#\@\&\:\.x{200B}—0-9\s]*)$

but the regex is still wrong.

Characters it should include:
\p{Cyrillic}
!
!!
!!!
!!!!
?
??
???
… (unicode)
— (unicode)
-
.
..
...
,
0-9
(
)
„ (unicode)
” (unicode)
"
\s (space)
\
/
\x{200B} or really maybe .\x{200B}
*
#
@
&
:
;
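
A possible way to express this list, sketched in Python rather than Onigmo and not taken from the posts above: put every allowed character into one set, and use a lookahead to require at least one Cyrillic letter on the line. Several things here are assumptions: `\u0400-\u04FF` stands in for `\p{Cyrillic}` (Python's `re` lacks it), the closing quote `”` is taken from the regex above, and the names `ALLOWED` and `LINE` are illustrative. Space and tab are listed instead of `\s`, because `\s` also matches a newline and would let a match run across several lines.

```python
import re

# Every character a line may contain (assumed from the list above).
ALLOWED = r'\u0400-\u04FF!?…—\-.,0-9()„”" \t\\/\u200B*#@&:;'

LINE = re.compile(
    rf"^(?=[^\n]*[\u0400-\u04FF])"  # lookahead: a Cyrillic letter on this line
    rf"[{ALLOWED}]+$",              # the whole line uses allowed characters only
    re.MULTILINE,
)

lines = [
    "Что такое стиль. Настольная книга для писательницы",
    "Что такое стиль. Настольная книга для писательницы - 2",
    "Что такое стиль. Настольная — писательницы -",
    "3. Что такое стиль. Настольная книга для писательницы",
    "Polski tekst, zażółć",   # Latin letters: rejected
    "===",                    # separator: "=" is not allowed
    "12345 ... !!!",          # allowed characters, but no Cyrillic
    "Что такое style",        # mixed Russian and English: rejected
]

matches = [m.group(0) for m in LINE.finditer("\n".join(lines))]
for m in matches:
    print(m)
```

With these sample lines, only the four Russian example lines from the first post are returned.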
