Very complex regular expression in multiline

Post by Debugger »

Find all words after the line === not containing Russian letters or Unicode, or any URL.

Line1: ===
Line2: URL
Line3: Title

Example:
NO (should not match, the URL line is present):

===
www.site.ru/Compliments
Про комплименты

YES (should match, the URL line is missing):

===
Про комплименты

Re: Very complex regular expression in multiline

Post by void »

"Find all words after the line === not containing Russian letters or Unicode, or any URL"
Your YES example contains Russian text after ===, so I am not sure what you want.

=== = match literal ===
(\r\n|\n) = match newline
[a-z]+\.[a-z]+ = very loosely match a URL.
www\.[a-z]+\.[a-z]+ = match a URL starting with www.
(http://|https://)?(www\.)?[a-z]+\.[a-z]+ = match a URL starting with possible http:// or https:// and/or www.
[\p{Cyrillic}] = match a Russian letter.
[^\x00-\x7f] = match a non-ASCII character.
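
A minimal sketch of how these building blocks might be combined, written for Python's re module rather than for "Everything" or EmEditor. The name MISSING_URL and the sample text are illustrative only, and because Python's re does not support \p{Cyrillic}, the range \u0400-\u04FF is used as an assumed stand-in for it.

```python
import re

# Building blocks from the list above, as raw-string fragments.
NEWLINE = r"(?:\r\n|\n)"
URL = r"(?:http://|https://)?(?:www\.)?[a-z]+\.[a-z]+"
# Assumption: Python's re has no \p{Cyrillic}; the basic Cyrillic block stands in for it.
CYRILLIC = r"[\u0400-\u04FF]"
NON_ASCII = r"[^\x00-\x7f]"

# Hypothetical combination: a === line whose next line does NOT start with a URL.
# The negative lookahead (?!...) rejects the match when a URL follows the newline.
MISSING_URL = re.compile(
    r"^===" + NEWLINE + r"(?!" + URL + r")(.+)$",
    re.MULTILINE,
)

sample = (
    "===\n"
    "www.site.ru/Compliments\n"
    "Про комплименты\n"
    "===\n"
    "Про комплименты\n"
)

# Collect the line that follows each === when no URL line is present.
titles = [m.group(1) for m in MISSING_URL.finditer(sample)]
print(titles)
```

Only the second === in the sample is reported, because the first one is followed by a URL line. The URL fragment is as loose as described above, so a title that happens to start with text like "abc.def" would be treated as a URL.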

Re: Very complex regular expression in multiline

Post by Debugger »

Example:
Find the entries where the line after === (three equals signs) is missing its URL. The === line is used to separate the entries; each entry has a URL on one line, followed by short or long text.
The text contains several thousand lines.
It should find:
1. Any name in Polish, English or Russian words
2. The URL, including a Unicode URL or any other kind. If it is missing, find or show the name that has the missing URL.
3. The text on a line

Example where the URL is NOT missing:

===
http://liter.org/liter.php?proizvid=8918
Co to jest piękno natury? To lasy, łąki.
KONIEC.
===
https://lifestyle.pl/newsy/alan-andersz-niczego-nie,p1310276229
1. Niczego nie planuje i nie zastanawiam się nad przyszłością. Życie mi pokazało, że wszystko
KONIEC.
===


------------------------

Example where the URL is MISSING:
===
Co to jest piękno natury? To lasy, łąki.
KONIEC.
===
https://lifestyle.pl/newsy/alan-andersz ... 1310276229
1. Niczego nie planuje i nie zastanawiam się nad przyszłością. Życie mi pokazało, że wszystko
KONIEC.
===
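
For a file with several thousand lines, one possible alternative to a single multiline regex is to split the text on the === lines and check the first line of each entry. The sketch below is only an illustration in Python: the function name entries_missing_url and the rule "a URL line starts with http://, https:// or www." are assumptions, not something stated in the posts above.

```python
import re

# Assumption: a URL line starts with http://, https:// or www.
URL_LINE = re.compile(r"^(?:https?://|www\.)\S+")
# A separator is a line holding only ===, with optional trailing spaces or a CR.
SEPARATOR = re.compile(r"^===[ \t\r]*$", re.MULTILINE)


def entries_missing_url(text):
    """Return the first line of every ===-delimited entry that lacks a URL line."""
    missing = []
    for block in SEPARATOR.split(text):
        # Drop empty lines and surrounding whitespace inside the entry.
        lines = [ln.strip() for ln in block.splitlines() if ln.strip()]
        if lines and not URL_LINE.match(lines[0]):
            missing.append(lines[0])
    return missing


sample = (
    "===\n"
    "Co to jest piękno natury? To lasy, łąki.\n"
    "KONIEC.\n"
    "===\n"
    "https://lifestyle.pl/newsy/alan-andersz-niczego-nie,p1310276229\n"
    "1. Niczego nie planuje i nie zastanawiam się nad przyszłością.\n"
    "KONIEC.\n"
    "===\n"
)

found = entries_missing_url(sample)
print(found)
```

This reports the name line of each entry whose URL is missing, which is what point 2 of the list asks for, and it does not depend on how the editor treats CR and LF.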

Multiline search without needing \n.
Regex (EmEditor shows this message box):

[Window Title]
EmEditor

[Main Instruction]
The search string contains CR. Do you want to enable the Treat CR and LF Separately option?

[Content]
===
<!?>(http://|https://)?(www\.)?[a-z]+\.[a-z]+


[ ] Do not show this message again [Enable the Treat CR and LF Separately option, and continue] [Do not enable the Treat CR and LF Separately option, but continue] [Cancel]
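
The message box is about CR and LF being handled separately. Outside EmEditor, one way to sidestep that question is to write the newline as \r?\n so the same pattern works for both Windows and Unix line endings. The following is a hedged Python sketch; the pattern name and sample text are made up for illustration and do not describe EmEditor's own regex engine.

```python
import re

# \r?\n accepts both CRLF and LF; [^\r\n]+ keeps a trailing CR out of the captured line.
# Assumption: a URL line starts with http://, https:// or www.
PATTERN = re.compile(r"^===\r?\n(?!https?://|www\.)([^\r\n]+)", re.MULTILINE)

crlf_text = "===\r\nCo to jest piękno natury?\r\nKONIEC.\r\n"
lf_text = crlf_text.replace("\r\n", "\n")

# The same pattern is applied to both variants of the text.
hits = [PATTERN.findall(t) for t in (crlf_text, lf_text)]
print(hits)
```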