Multiple Regex override capture groups inconsistently

Discussion related to "Everything" 1.5 Alpha.
bit
Posts: 38
Joined: Fri Feb 17, 2023 8:57 am

Multiple Regex override capture groups inconsistently

Post by bit »

This happens when I use file-exists: to filter incomplete aria2 downloads.
file-exists:
folder-exists:

Search for files or folders where the specified filename exists in the index based on a previous regex search.


Use Match Path, path: or a path separator in the regular expression to match full paths and filenames (otherwise, the same path is assumed)
Use \0-\9 to recall captured regex matches.
Use $0:-$9: to recall captured regex matches.
It works fine when I use only one regex.
If I put another regex (A) before the later regex (B), the capture groups can come from either A or B, so file-exists: does not work as intended.

See examples below:

Code:

Z:\  regex:"dl\\"      regex:"^(.+)(?<!aria2)$"  file-exists:\1.aria2   

Code:

Z:\  regex:"dl\\[\d]+[a-f]?\\"       regex:"^(.+)(?<!aria2)$"     
[Attachment: regex_override.png]
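For readers unfamiliar with file-exists:, here is a rough Python sketch of what the first search is meant to do. It is only an illustration, not how Everything implements it: the index is a made-up set of paths, and the regex is matched against the full path for simplicity.

```python
import re

# Hypothetical index contents; these paths are invented for illustration.
index = {
    r"Z:\dl\1a\movie.mkv",
    r"Z:\dl\1a\movie.mkv.aria2",  # aria2 control file: download still running
    r"Z:\dl\2b\song.flac",        # no control file: download finished
}

# Same pattern as in the post: capture everything, reject names ending in "aria2".
pattern = re.compile(r"^(.+)(?<!aria2)$")

def incomplete_downloads(paths):
    """Return paths for which '<capture 1>.aria2' also exists in the index."""
    found = []
    for path in sorted(paths):
        m = pattern.search(path)
        if m and m.group(1) + ".aria2" in paths:
            found.append(path)
    return found

result = incomplete_downloads(index)
print(result)
```

The lookup only works if capture 1 comes from the second regex, which is why it matters which regex sets the capture groups.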
void
Developer
Posts: 16753
Joined: Fri Oct 16, 2009 11:31 pm

Re: Multiple Regex override capture groups inconsistently

Post by void »

regex:"dl\\" is optimized to a subpath ending with dl search.
Nothing is captured here. (not even regmatch 0)

Use the no-fast-regex: search modifier if you want to capture dl\ in regmatch0

For example:
no-fast-regex:regex:"dl\\"



A regex search on the name is considered faster than a regex search on the full path.
Everything will reorder your search terms so that the regex search on the name is done first and the regex search on the full path is done afterwards.

The last regex search will set regmatch0-9.

Use the no-reorder: search modifier to disable search term reordering.

For example:

no-reorder: Z:\ regex:"dl\\[\d]+[a-f]?\\" regex:"^(.+)(?<!aria2)$"



I will make the following changes for the next alpha update to improve results:
When referencing regmatch 0-9, fast regex will be automatically disabled.
When referencing regmatch 0-9, search terms will not be reordered.
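The effect of reordering on the captures can be sketched in Python. This is only a model of the behaviour described above (the last regex to run sets the captures), not Everything's actual code; `run_terms` and the sample path are made up for illustration.

```python
import re

def run_terms(terms, path):
    """Apply ANDed regex terms in order; the last term to run sets the captures."""
    captures = None
    for pattern, target in terms:
        # A "name" term sees only the last path component; a "path" term sees it all.
        subject = path if target == "path" else path.rsplit("\\", 1)[-1]
        m = re.search(pattern, subject)
        if m is None:
            return None  # one term failed, so the item is not a result
        captures = (m.group(0),) + m.groups()
    return captures

path = r"Z:\dl\12a\file.bin"          # hypothetical indexed file
path_term = (r"dl\\[\d]+[a-f]?\\", "path")
name_term = (r"^(.+)(?<!aria2)$", "name")

typed_order = run_terms([path_term, name_term], path)  # order as typed
reordered = run_terms([name_term, path_term], path)    # name regex moved first

print(typed_order)
print(reordered)
```

In the typed order, capture 1 is the file name. After reordering, the path regex runs last, so only match 0 is set and \1 has nothing to recall.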
bit
Posts: 38
Joined: Fri Feb 17, 2023 8:57 am

Re: Multiple Regex override capture groups inconsistently

Post by bit »

void wrote: Mon May 29, 2023 5:31 am
regex:"dl\\" is optimized to a subpath ending with dl search.
1) Can you give a list of conditions and methods on when and how the regex optimization applies?
eg: regex:"a\\|b\\", will it be optimized to <a\|b\>?
2) How much performance improvement does such an optimization achieve?
Regex optimizations will need to be disabled if you would like to use the regex match 0 property without using a special character in your regex pattern.
3) "()" are special characters, so as long as the regex has one group, it won't be optimized, right?
". * + ? []", are these chars also considered special?
Can you provide a list of "special" chars?

A regex search on the name is considered faster than a regex search on the full path.
Everything will reorder your search terms so the regex search on the name is done first, and the full path is done after.
The last regex search will set regmatch0-9.
4) Can we know the priority list of search terms used to re-order?
Knowing which part is faster than another will help us write better searches.
I will make the following changes for the next alpha update to improve results:
When referencing regmatch 0-9, fast regex will be automatically disabled.
When referencing regmatch 0-9, search terms will not be reordered.
5) I won't favor such changes if the performance will drop significantly.
6) For now, when will fast-regex/re-order be disabled implicitly?
void
Developer
Posts: 16753
Joined: Fri Oct 16, 2009 11:31 pm

Re: Multiple Regex override capture groups inconsistently

Post by void »

1) Can you give a list of conditions and methods on when and how the regex optimization applies?
eg: regex:"a\\|b\\", will it be optimized to <a\|b\>?
No optimization is used if the regex expression contains any special regex characters:
[ ]
.
*
{ }
?
+
|
( )
^
$

(it's a little more complicated as Everything does process escaped characters with \, for example: \Q..\E or \.)
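As a rough illustration of the rule above, the check could look like the following Python sketch. The function name and the handling of escapes are assumptions: here an escaped non-alphanumeric character such as \\ or \. counts as a literal, and any other escape (including \Q..\E) is conservatively treated as special. Everything's real logic may differ.

```python
# Special characters taken from the list above.
SPECIAL = set("[].*{}?+|()^$")

def is_plain_literal(pattern):
    """True if the pattern contains no unescaped special regex characters."""
    i = 0
    while i < len(pattern):
        ch = pattern[i]
        if ch == "\\":
            # Assumption: "\x" with a non-alphanumeric x is just a literal x.
            # A trailing backslash or an escape like \d is treated as not plain.
            if i + 1 >= len(pattern) or pattern[i + 1].isalnum():
                return False
            i += 2
            continue
        if ch in SPECIAL:
            return False
        i += 1
    return True

print(is_plain_literal(r"dl\\"))               # candidate for the fast path
print(is_plain_literal(r"a\\|b\\"))            # contains |
print(is_plain_literal(r"^(.+)(?<!aria2)$"))   # contains ^ ( . + ) $
```

Under this sketch, regex:"dl\\" is the only one of the three that would be eligible for the fast path.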


2) How much performance improvement does such an optimization achieve?
Timing and OP code information is reported in the debug console.

Check the performance between:
regex:"dl\\"
no-fast-regex:regex:"dl\\"


3) "()" are special characters, so as long as the regex has one group, it won't be optimized, right?
Correct.


". * + ? []", are these chars also considered special?
Yes.


4) Can we know the priority list of search terms used to re-order?
Tough to do as each search function has many OP codes and each OP code is weighted.
Search reordering is only applied to ANDed op codes where the order doesn't matter.


5) I won't favor such changes if the performance will drop significantly.
Capturing the correct information will always be more important than performance.
Any performance loss will be negligible compared to a search op that recalls a regmatch. (eg: fileexists:)


6) For now, when will fast-regex/re-order be disabled implicitly?
When you reference a regmatch0-9 with fileexists:/folderexists: (in the next alpha update)
When using the expand: search modifier.
When using child: and child-occurrence-count:
When using column assignment, for example: col1:=LEN($regmatch1:)
When comparing regmatches, for example: regmatch1:==regmatch2:
When using the noreorder: search modifier.
When using the regmatch0-9: search function.
When using a sibling: with $1-9:
When using the eval: search function, for example: eval:LEN($regmatch1:)>LEN($regmatch2:)
bit
Posts: 38
Joined: Fri Feb 17, 2023 8:57 am

Re: Multiple Regex override capture groups inconsistently

Post by bit »

void wrote: Mon May 29, 2023 9:22 am
Check the performance between:
regex:"dl\\"
no-fast-regex:regex:"dl\\"
Why is there no result for

Code:

no-fast-regex:regex:"dl\\"
?

Code:

search 'regex:"dl\\"' filter '' sort 1 ascending 1
parse flags 00020002 type 20c00100
TERM dl\\
FOLDER TERM START 00000000062981a8 M 0000000000afe8e0 N 0000000000afea00
00000000062981a8 a0e00904 M 0000000000afe8e0 N 0000000000afea00 OP 29 dl\\
FILE TERM START 00000000062981a8 M 0000000000afe8e0 N 0000000000afea00
00000000062981a8 a0e00904 M 0000000000afe8e0 N 0000000000afea00 OP 29 dl\\
found 1387 folders with 12 threads in 0.023050 seconds
found 26171 files with 12 threads in 0.356370 seconds
SET SORT 1
set sort 1 ascending 1 is valid 1
already sorted
finished sort, time taken 0.002252 seconds
total size 2089805441904, calculated in 0.000427 seconds
update selection 0.000000 seconds
ready

Code:

search 'no-fast-regex:regex:"dl\\"' filter '' sort 1 ascending 1
parse flags 00000000 type 20c00100
TERM no-fast-regex:regex:dl\\
expanded **no-fast-regex:regex:dl**\****
check *\* no-fast-regex:regex:dl**\
check *\* no-fast-regex:regex:dl**\
FOLDER TERM START 000000004dbc1d98 M 0000000000afe8e0 N 0000000000afea00
000000004dbc1d98 a0e00104 M 0000000000afe8e0 N 0000000000afea00 OP 113 no-fast-regex:regex:dl\\
FILE TERM START 000000004dbc1d98 M 0000000000afe8e0 N 0000000000afea00
000000004dbc1d98 a0e00104 M 0000000000afe8e0 N 0000000000afea00 OP 113 no-fast-regex:regex:dl\\
new thread (19)
found 0 folders with 12 threads in 0.059210 seconds
update m 0 00000000151f4020
new thread (20)
update index C:
found 0 files with 12 threads in 0.663917 seconds
SET SORT 1
set sort 1 ascending 1 is valid 1
already sorted
finished sort, time taken 0.000378 seconds
total size 0, calculated in 0.000001 seconds
update selection 0.000000 seconds
USN CREATE config.db-wal
ready
4) Can we know the priority list of search terms used to re-order?
Tough to do as each search function has many OP codes and each OP code is weighted.
Search reordering is only applied to ANDed op codes where the order doesn't matter.
I don't understand this. Why reorder when the order doesn't matter?
Got it. slow_op() && fast_op() can be reordered to fast_op() && slow_op(), which speeds up the search without affecting the final result.
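A small Python sketch of that idea, with made-up predicates standing in for the real search ops. With short-circuit AND, putting the cheap and selective test first gives the same results while calling the expensive test far less often.

```python
calls = {"slow": 0, "fast": 0}

def slow_op(item):
    """Stand-in for an expensive test (e.g. a regex on the full path)."""
    calls["slow"] += 1
    return item % 2 == 0

def fast_op(item):
    """Stand-in for a cheap, selective test (e.g. a plain name match)."""
    calls["fast"] += 1
    return item % 10 == 0

items = range(100)

# Order as typed: the slow test runs for every item.
slow_first = [i for i in items if slow_op(i) and fast_op(i)]
slow_calls_before = calls["slow"]

# Reordered: the slow test runs only for items that pass the fast test.
calls["slow"] = calls["fast"] = 0
fast_first = [i for i in items if fast_op(i) and slow_op(i)]
slow_calls_after = calls["slow"]

print(slow_first == fast_first, slow_calls_before, slow_calls_after)
```

The result lists are identical; only the number of slow_op() calls changes. The catch discussed in this thread is a side effect: if the ops also set capture groups, the order does matter.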
A regex search on the name is considered faster than a regex search on the full path.
Something like this is what I want to know about the reordering. Can you provide more simple rule examples like this?
void
Developer
Posts: 16753
Joined: Fri Oct 16, 2009 11:31 pm

Re: Multiple Regex override capture groups inconsistently

Post by void »

Why is there no result for
no-fast-regex:regex:"dl\\"
?
Oh, Everything is being too pedantic with matching the hyphens (-)

Please try the following search:

nofastregex:regex:"dl\\"

I'll make no-fast-regex: work for the next alpha update.


Can you provide more simple rule examples like this?
Only a broad overview at this stage:

name / indexed size/date-modified / full path
indexed information
filelists
regex/wildcards / child:
regex/wildcards on full path / complex filelists
child-count / complex date comparisons
meta data from disk
content from disk
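Read as a ranking, the overview above could drive a reorder like this Python sketch. The tier labels are shortened paraphrases of the rows above, and it assumes the list runs from cheapest to most expensive. The real weighting is per OP code and more fine-grained.

```python
# Tier labels paraphrased from the overview; assumed cheapest first.
TIERS = [
    "name",
    "indexed",
    "filelist",
    "regex-name",
    "regex-path",
    "child-count",
    "disk-metadata",
    "disk-content",
]

def reorder(terms):
    """Stable sort of (tier, text) pairs so that cheaper tiers run first."""
    return sorted(terms, key=lambda term: TIERS.index(term[0]))

# Hypothetical ANDed search terms, in the order typed.
typed = [
    ("disk-content", "content:foo"),
    ("regex-path", r'regex:"dl\\[\d]+[a-f]?\\"'),
    ("regex-name", 'regex:"^(.+)(?<!aria2)$"'),
]

ordered = [tier for tier, _ in reorder(typed)]
print(ordered)
```

With this ranking the name regex runs before the full-path regex, which matches the reordering described earlier in the thread.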
bit
Posts: 38
Joined: Fri Feb 17, 2023 8:57 am

Re: Multiple Regex override capture groups inconsistently

Post by bit »

void wrote: Mon May 29, 2023 10:40 am
nofastregex:regex:"dl\\"
Still fails...

Code:

search 'nofastregex:regex:"dl\\"' filter '' sort 1 ascending 1
parse flags 00020002 type 20c00100
TERM dl\\
FOLDER TERM START 000000004dbc5538 M 0000000000afe8e0 N 0000000000afea00
000000004dbc5538 80e00900 M 0000000000afe8e0 N 0000000000afea00 OP 275 dl\\
FILE TERM START 000000004dbc5538 M 0000000000afe8e0 N 0000000000afea00
000000004dbc5538 80e00900 M 0000000000afe8e0 N 0000000000afea00 OP 275 dl\\
found 0 folders with 12 threads in 0.015963 seconds
found 0 files with 12 threads in 0.227084 seconds
SET SORT 1
set sort 1 ascending 1 is valid 1
already sorted
finished sort, time taken 0.000284 seconds
total size 0, calculated in 0.000000 seconds
update selection 0.000000 seconds
ready
void
Developer
Posts: 16753
Joined: Fri Oct 16, 2009 11:31 pm

Re: Multiple Regex override capture groups inconsistently

Post by void »

Still fails...
Please try the following:

nofastregex:path:regex:"dl\\"



There's currently an issue with automatically enabling match path when using \\ together with nofastregex:
I had already fixed this, so I didn't see the issue on my end.
bit
Posts: 38
Joined: Fri Feb 17, 2023 8:57 am

Re: Multiple Regex override capture groups inconsistently

Post by bit »

void wrote: Mon May 29, 2023 11:22 am
nofastregex:path:regex:"dl\\"
works now. :D

Code:

search 'nofastregex:path:regex:"dl\\"' filter '' sort 1 ascending 1
parse flags 00020002 type 20c00100
TERM dl\\
FOLDER TERM START 000000004a3d7638 M 000000000078e4d0 N 000000000078e5f0
000000004a3d7638 80e00904 M 000000000078e4d0 N 000000000078e5f0 OP 276 dl\\
FILE TERM START 000000004a3d7638 M 000000000078e4d0 N 000000000078e5f0
000000004a3d7638 80e00904 M 000000000078e4d0 N 000000000078e5f0 OP 276 dl\\
new thread (19)
found 1387 folders with 12 threads in 0.066401 seconds
update m 0 0000000023df9180
found 26171 files with 12 threads in 0.790301 seconds
SET SORT 1
set sort 1 ascending 1 is valid 1
already sorted
update index C:
finished sort, time taken 0.003936 seconds
USN CREATE config.db-wal
total size 2089805441904, calculated in 0.000663 seconds
update selection 0.000000 seconds
USN CREATE config.db-shm
USN DATA_EXTEND CREATE config.db-shm
ready
void
Developer
Posts: 16753
Joined: Fri Oct 16, 2009 11:31 pm

Re: Multiple Regex override capture groups inconsistently

Post by void »

Everything 1.5.0.1348a fixes a few issues with fast regex:

no-fast-regex: search modifier will now work as expected.

Referencing regmatch0-9 will disable fast regex.
Referencing regmatch0-9 will disable search term reordering.

Using \\ or / in a regex search will now correctly enable full path matching.