Text Editor:
I need to find Russian sentences that contain one or more dots, dashes, digits, commas, etc., and it is very important that the search looks only within a single line!
[ЁёА-Яа-я„«—]
Example:
Что такое стиль. Настольная книга для писательницы
Что такое стиль. Настольная книга для писательницы - 2
Что такое стиль. Настольная — писательницы -
3. Что такое стиль. Настольная книга для писательницы
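The key requirement is that each line is tested on its own, so a match can never run across a line break. One way to guarantee that is to split the text into lines first and test each line separately. This is a minimal Python sketch, not the text editor's own method; the extra character set (dots, commas, dashes, „ « and digits) is an assumption based on the class above, and the helper name `matching_lines` is hypothetical.

```python
import re

# Russian letters: Ё/ё sit outside the contiguous А-я range, so they are listed separately.
RUSSIAN = re.compile(r"[ЁёА-Яа-я]")
# Assumed "extra" characters: dot, comma, hyphen, em dash, „ « and digits.
EXTRA = re.compile(r"[.,\-—„«0-9]")

def matching_lines(text):
    """Hypothetical helper: return lines containing a Russian letter and an extra character.

    Each line is tested separately, so nothing can match across a line break.
    """
    result = []
    for line in text.splitlines():
        if RUSSIAN.search(line) and EXTRA.search(line):
            result.append(line)
    return result

sample = "Что такое стиль. Настольная книга для писательницы\nЧто такое стиль\nHello, world."
print(matching_lines(sample))
```

Only the first sample line qualifies: the second has no extra character and the third has no Russian letters.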
[99% SOLVED] How to match Cyrillic characters with a regular expression
Last edited by Debugger on Thu Apr 18, 2019 9:15 am, edited 1 time in total.
Re: How to match Cyrillic characters with a regular expression
(Quite possibly I'm not following, but...)
Separate your letters out first.
regex:[ЁёА-Яа-я] (or regex:[ЁёА-я], I think)
> В цепях древней тайны.mp3
> Славься, Русь!.mp3
Then add your punctuation.
Will that work?
regex:[ЁёА-Яа-я] regex:[,„«—]+
> Славься, Русь!.mp3
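Everything treats the space between the two `regex:` terms as AND, so a file name is listed only when both patterns match somewhere in it. The same two-step test can be sketched in Python for plain strings; the helper name `both_match` is hypothetical and this is only an illustration of the logic, not how Everything implements it.

```python
import re

LETTERS = re.compile(r"[ЁёА-Яа-я]")
PUNCT = re.compile(r"[,„«—]+")

def both_match(name):
    # Mirrors "regex:[ЁёА-Яа-я] regex:[,„«—]+": both patterns must match somewhere in the name.
    return bool(LETTERS.search(name)) and bool(PUNCT.search(name))

names = ["В цепях древней тайны.mp3", "Славься, Русь!.mp3"]
print([n for n in names if both_match(n)])
```

Only "Славься, Русь!.mp3" passes, because the first name contains no character from `[,„«—]`.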
Re: How to match Cyrillic characters with a regular expression
The regular expression is WRONG.
It does not match the characters correctly within a single line.
It finds far more text than it should. It should not match ordinary text by picking up the characters anywhere in the text; it must be strictly limited to a single line that contains Russian text.
I don't need the regex: operator.
Example:
Line1: Russian text, with or without other characters
Line2: Russian text
Line3: Polish text
(Separator)Line4:===
Line5: Russian text, with or without other characters
Line6: Russian text
Line7: Polish text
(Separator)Line8:===
Re: How to match Cyrillic characters with a regular expression
void: Well, yes, but I can't find anything on how to make a regular expression require that a line contain only strictly defined (Russian) characters, with no mixed text: no English, Polish, German or other such characters, etc.
.+[ЁёА-Яа-я.,„”"«—0-9)(]\n
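In this pattern `.+` accepts any characters at all (Latin letters included), so only the last character before the line break is actually checked against the class. A quick Python check, assuming `.` does not match a line break (Python's default), shows it accepting a Polish line and an English line along with the Russian one; the sample lines are made up for illustration.

```python
import re

# The pattern from the post above, unchanged.
TOO_BROAD = re.compile(r'.+[ЁёА-Яа-я.,„”"«—0-9)(]\n')

# Made-up sample lines: Russian, Polish, English.
sample = "Что такое стиль\nTo jest polski tekst.\nEnglish text 2\n"
print(TOO_BROAD.findall(sample))
```

All three lines match, because each one simply ends in a character from the class (a Cyrillic letter, a dot, a digit).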
Re: How to match Cyrillic characters with a regular expression
Requires PCRE in multiline mode:
^([\p{Cyrillic}]+[\-\.—0-9]+[\p{Cyrillic}\-\.—0-9]*|[\-\.—0-9]+[\p{Cyrillic}]+[\p{Cyrillic}\-\.—0-9]*)$
This also requires at least one Cyrillic character, which I assume you want; otherwise it would match a line made up only of digits, dashes or dots.
^ = match start of string (or line, in multiline mode)
[] = match character in a set
\p{Cyrillic} = match a Cyrillic character
\- = match a literal -
\. = match a literal .
+ = match previous element one or more times.
* = match previous element zero or more times.
$ = match end of string (or line, in multiline mode)
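The pattern above can be tried out in Python. Python's built-in `re` module does not support `\p{Cyrillic}`, so this sketch assumes the basic Cyrillic block U+0400 to U+04FF is a close enough stand-in; the sample lines are made up.

```python
import re

# Assumption: Python's re has no \p{Cyrillic}, so the basic Cyrillic block
# U+0400-U+04FF (which includes Ё and ё) stands in for it.
CYR = "\u0400-\u04FF"
MARKS = r"\-.—0-9"  # literal hyphen, dot, em dash and digits

# Same shape as the answer: Cyrillic then marks, or marks then Cyrillic,
# with nothing else allowed on the line.
ONE_LINE = re.compile(
    rf"^([{CYR}]+[{MARKS}]+[{CYR}{MARKS}]*|[{MARKS}]+[{CYR}]+[{CYR}{MARKS}]*)$",
    re.MULTILINE,  # ^ and $ anchor at every line, not only the whole string
)

# Made-up sample lines for illustration.
sample = "стиль-2\nстиль.\n3.стиль\nЧто такое стиль.\nstyle-2\n"
print(ONE_LINE.findall(sample))
```

The first three lines match. "Что такое стиль." is rejected because neither character class allows a space, and "style-2" is rejected because it has no Cyrillic letters.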
Re: How to match Cyrillic characters with a regular expression
Unfortunately, I don't use PCRE, but I switched to the Onigmo engine and it works.
I have modified the regex a bit:
^([\p{Cyrillic}]+[\-\.\,\!\…\?\(\)\„\”\,\;\\\/\*\#\@\&\:\.x{200B}—0-9\s]+[\p{Cyrillic}\-\.\,\!\…\?\(\)\„\”\,\;\\\/\*\#\@\&\:\.x{200B}—0-9\s]*|[\-\.\,\…\?\(\)\„\”\,\;\\\/\*\#\@\&\:\.x{200B}—0-9\s]+[\p{Cyrillic}]+[\p{Cyrillic}\-\.\,\!\…\?\(\)\„\”\,\;\\\/\*\#\@\&\:\.x{200B}—0-9\s]*)$
but the regex is still wrong.
Characters it should allow:
\p{Cyrillic}
!
!!
!!!
!!!!
?
??
???
… (unicode)
— (unicode)
-
.
..
...
,
0-9
(
)
„ (unicode)
” (unicode)
"
\s (space)
\
/
\x{200B} (zero-width space), or maybe it should really be .\x{200B}
*
#
@
&
:
;
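Two parts of the regex above look like likely causes of the trouble. `\s` also matches the line break, so `[...\s]*` can run on into the next line, which is exactly the "not limited to one line" symptom. And `\.x{200B}` inside a character class is a literal dot followed by the characters `x`, `{`, `2`, `0`, `B`, `}`, so Latin "x" and "B" become allowed, rather than the escape `\x{200B}`. The sketch below shows one possible fix in Python, not tested against Onigmo: it assumes the basic Cyrillic block U+0400 to U+04FF in place of `\p{Cyrillic}`, writes the zero-width space as `\u200b`, replaces `\s` with a space and a tab, and uses lookaheads instead of the two-way alternation.

```python
import re

# Assumption: the basic Cyrillic block U+0400-U+04FF stands in for \p{Cyrillic}.
CYR = "\u0400-\u04FF"
# Allowed extras from the list above. A space and a tab replace \s, because \s
# also matches "\n" and would let a match continue onto the next line.
# \u200b is the zero-width space (written \x{200B} in Onigmo).
EXTRA = r'!?…—\-.,0-9()„”"\\/*#@&:; \t\u200b'

# Lookaheads: the line needs at least one Cyrillic letter and one extra;
# then every character up to the end of the line must be allowed.
LINE = re.compile(
    rf"^(?=[^\n]*[{CYR}])(?=[^\n]*[{EXTRA}])[{CYR}{EXTRA}]+$",
    re.MULTILINE,
)

# For comparison: with \s in the class, a single match can span two lines.
SPANS = re.compile(rf"^[{CYR}.\s]+$", re.MULTILINE)

sample = (
    "Что такое стиль. Настольная книга для писательницы\n"
    "3. Что такое стиль\n"
    "To jest polski tekst.\n"  # made-up Polish line
    "Что такое style?\n"       # mixed Cyrillic and Latin
)
print(LINE.findall(sample))
print(SPANS.findall("стиль.\nкнига."))
```

`LINE` keeps the two Russian lines and rejects the Polish and mixed ones, while `SPANS` returns "стиль.\nкнига." as one match covering both lines. Because a space counts as an allowed extra here, any multi-word Russian line satisfies the second lookahead.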