Search Preprocessor Suggestions

Discussion related to "Everything" 1.5 Alpha.
raccoon
Posts: 1017
Joined: Thu Oct 18, 2018 1:24 am

Search Preprocessor Suggestions

Post by raccoon »

Suggest:

#normalize:<text>
Returns text with non-ASCII Latin characters converted to their nearest ASCII equivalents.
Example: #normalize:"déjà vu" -> deja vu

https://stackoverflow.com/a/10064701/8805628

Code:

$string = 'Ë À Ì Â Í Ã Î Ä Ï Ç Ò È Ó É Ô Ê Õ Ö ê Ù ë Ú î Û ï Ü ô Ý õ â û ã ÿ ç';

$normalizeChars = array(
    'Š'=>'S', 'š'=>'s', 'Ð'=>'Dj','Ž'=>'Z', 'ž'=>'z', 'À'=>'A', 'Á'=>'A', 'Â'=>'A', 'Ã'=>'A', 'Ä'=>'A',
    'Å'=>'A', 'Æ'=>'A', 'Ç'=>'C', 'È'=>'E', 'É'=>'E', 'Ê'=>'E', 'Ë'=>'E', 'Ì'=>'I', 'Í'=>'I', 'Î'=>'I',
    'Ï'=>'I', 'Ñ'=>'N', 'Ń'=>'N', 'Ò'=>'O', 'Ó'=>'O', 'Ô'=>'O', 'Õ'=>'O', 'Ö'=>'O', 'Ø'=>'O', 'Ù'=>'U', 'Ú'=>'U',
    'Û'=>'U', 'Ü'=>'U', 'Ý'=>'Y', 'Þ'=>'B', 'ß'=>'Ss','à'=>'a', 'á'=>'a', 'â'=>'a', 'ã'=>'a', 'ä'=>'a',
    'å'=>'a', 'æ'=>'a', 'ç'=>'c', 'è'=>'e', 'é'=>'e', 'ê'=>'e', 'ë'=>'e', 'ì'=>'i', 'í'=>'i', 'î'=>'i',
    'ï'=>'i', 'ð'=>'o', 'ñ'=>'n', 'ń'=>'n', 'ò'=>'o', 'ó'=>'o', 'ô'=>'o', 'õ'=>'o', 'ö'=>'o', 'ø'=>'o', 'ù'=>'u',
    'ú'=>'u', 'û'=>'u', 'ü'=>'u', 'ý'=>'y', 'þ'=>'b', 'ÿ'=>'y', 'ƒ'=>'f',
    'ă'=>'a', 'ș'=>'s', 'ț'=>'t', 'Ă'=>'A', 'Ș'=>'S', 'Ț'=>'T',
);

//Output: E A I A I A I A I C O E O E O E O O e U e U i U i U o Y o a u a y c
echo strtr($string, $normalizeChars);
I would also have this function convert fancy quotes to ASCII quotes, but leave Cyrillic and other non-Latin scripts alone. Toss in ligatures and digraphs if you're daring.
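
For comparison, here is a minimal Python sketch of the same table-driven idea with the fancy-quote mapping folded in. The table is deliberately tiny, and the names `FOLD_TABLE` and `fold_to_ascii` are illustrative assumptions, not anything Everything provides. Characters that are not in the table, such as Cyrillic, pass through untouched.

```python
# Hypothetical, deliberately small mapping table; extend as needed.
FOLD_TABLE = str.maketrans({
    "\u00e9": "e",    # e with acute
    "\u00e0": "a",    # a with grave
    "\u00e7": "c",    # c with cedilla
    "\u00df": "ss",   # sharp s (one character may map to several)
    "\u00e6": "ae",   # ae ligature
    "\u2018": "'",    # left single quotation mark
    "\u2019": "'",    # right single quotation mark
    "\u201c": '"',    # left double quotation mark
    "\u201d": '"',    # right double quotation mark
})


def fold_to_ascii(text: str) -> str:
    """Replace only the characters listed in FOLD_TABLE; leave the rest alone."""
    return text.translate(FOLD_TABLE)


if __name__ == "__main__":
    print(fold_to_ascii("\u201cd\u00e9j\u00e0 vu\u201d"))
    print(fold_to_ascii("\u041f\u0440\u0438\u0432\u0435\u0442"))  # Cyrillic is unchanged
```

A fixed table like this only ever covers the characters someone remembered to list, which is the main trade-off against a decomposition-based approach.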
Last edited by void on Sat Oct 23, 2021 7:36 am, edited 3 times in total.
Reason: moved to Search Preprocessor Suggestions
void
Developer
Posts: 16755
Joined: Fri Oct 16, 2009 11:31 pm

Re: Search Preprocessor Suggestions

Post by void »

Added to my TODO list.

Thank you for the suggestion.
raccoon
Posts: 1017
Joined: Thu Oct 18, 2018 1:24 am

Re: Search Preprocessor

Post by raccoon »

#fixed got accidentally lost in the "I"s beneath #find/#instr
void
Developer
Posts: 16755
Joined: Fri Oct 16, 2009 11:31 pm

Re: Search Preprocessor Suggestions

Post by void »

Everything 1.5.0.1282a adds a #remove-diacritics: preprocessor search function to remove diacritics from the specified text.

Normalize isn't quite the right function name for the job.
I might add a normalize function similar to the JavaScript normalize function in a future release.
raccoon
Posts: 1017
Joined: Thu Oct 18, 2018 1:24 am

Re: Search Preprocessor Suggestions

Post by raccoon »

I would be interested in seeing/stealing your chosen mapping for #remove-diacritics:
void
Developer
Posts: 16755
Joined: Fri Oct 16, 2009 11:31 pm

Re: Search Preprocessor Suggestions

Post by void »

#remove-diacritics: will normalize the text first with NFKD.

æ becomes ae
ﬃ becomes ffi
Å becomes A + ◌̊
ⓥ becomes v

Any remaining Unicode marks (e.g. ◌̊) are removed.


It is the same function that is used when Match Diacritics is disabled from the Search menu.
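
The two steps described above (NFKD normalization, then dropping the remaining marks) can be approximated with Python's standard `unicodedata` module. This is a sketch under assumptions, not Everything's actual code: the function name `remove_diacritics` is borrowed from the preprocessor name for illustration, and "marks" is taken to mean every character whose general category starts with `M`. Note that standard NFKD leaves æ unchanged, so the "æ becomes ae" result above presumably comes from Everything's own tables.

```python
import unicodedata


def remove_diacritics(text: str) -> str:
    """NFKD-decompose the text, then drop every character in a Mark category."""
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(
        ch for ch in decomposed
        if not unicodedata.category(ch).startswith("M")  # Mn, Mc, Me
    )


if __name__ == "__main__":
    samples = [
        "d\u00e9j\u00e0 vu",  # precomposed accented letters
        "\ufb03",              # LATIN SMALL LIGATURE FFI
        "\u00c5",              # A WITH RING ABOVE
        "\u24e5",              # CIRCLED LATIN SMALL LETTER V
        "\u00e6",              # ae ligature: no Unicode decomposition
    ]
    for sample in samples:
        print(repr(sample), "->", repr(remove_diacritics(sample)))
```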
raccoon
Posts: 1017
Joined: Thu Oct 18, 2018 1:24 am

Re: Search Preprocessor Suggestions

Post by raccoon »

How can I go about generating a complete mapping table?
void
Developer
Posts: 16755
Joined: Fri Oct 16, 2009 11:31 pm

Re: Search Preprocessor Suggestions

Post by void »

Everything doesn't have a simple mapping table.

Everything.Unicode.Tables.txt

Values 0x0300 .. 0x036e map to the decomposition table.
Decompositions from a 3-byte UTF-8 character to 3 characters are hard-coded (ﬃ -> ffi).
Value 0x036f means the Unicode code point is a diacritic.

This table is likely to change during alpha.

These tables are generated from https://unicode.org/Public/UNIDATA/UnicodeData.txt
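
For anyone who just wants a flat character-to-replacement table, one rough way to generate it is to walk a range of code points and record every character that changes under NFKD plus mark removal. The sketch below uses Python's bundled Unicode database (itself derived from UnicodeData.txt) rather than parsing the file, and it does not reproduce the layout of Everything.Unicode.Tables.txt. The names `strip_marks` and `build_mapping` and the default range are assumptions made for illustration.

```python
import unicodedata


def strip_marks(text: str) -> str:
    """NFKD-decompose, then remove characters in any Mark category."""
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(
        ch for ch in decomposed
        if not unicodedata.category(ch).startswith("M")
    )


def build_mapping(first: int = 0x00A0, last: int = 0x024F) -> dict:
    """Map each character in [first, last] that changes to its folded form."""
    table = {}
    for code_point in range(first, last + 1):
        ch = chr(code_point)
        if unicodedata.category(ch).startswith("M"):
            continue  # marks are removed outright, not mapped
        folded = strip_marks(ch)
        if folded != ch:
            table[ch] = folded
    return table


if __name__ == "__main__":
    for source, target in build_mapping().items():
        print(f"U+{ord(source):04X} {source!r} -> {target!r}")
```

Calling `build_mapping(0xFB00, 0xFB06)` covers the Latin ligature block; characters without a Unicode decomposition (æ, ø, and so on) never appear in the result and would need hand-written entries.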
raccoon
Posts: 1017
Joined: Thu Oct 18, 2018 1:24 am

Re: Search Preprocessor Suggestions #number:000

Post by raccoon »

Advanced Rename / Folder Move --> #number:

Suggest: Allow #number:000 for convenient zero padding.
alias for #text:<#number:,000>
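
To illustrate what three-digit zero padding buys in a rename, here is a small Python sketch. It is not Everything's syntax; `padded_names`, the file stem, and the extension are hypothetical. The point is that padded numbers sort correctly as plain text.

```python
def padded_names(stem: str, count: int, width: int = 3, ext: str = ".mp3") -> list:
    """Return names numbered 1..count, each number zero-padded to `width` digits."""
    return [f"{stem} {n:0{width}d}{ext}" for n in range(1, count + 1)]


if __name__ == "__main__":
    names = padded_names("Chapter", 12)
    print(names[0], names[-1])
    # Padded names already sort in numeric order as plain strings.
    print(sorted(names) == names)
```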
void
Developer
Posts: 16755
Joined: Fri Oct 16, 2009 11:31 pm

Re: Search Preprocessor Suggestions

Post by void »

Thank you for the suggestion raccoon.

I have added #number00: and #number000: preprocessor functions for 00 and 000 padding to my TODO list.
void
Developer
Posts: 16755
Joined: Fri Oct 16, 2009 11:31 pm

Re: Search Preprocessor Suggestions

Post by void »

Everything 1.5.0.1290a adds #number00: and #number000: for quick zero-padding.
raccoon
Posts: 1017
Joined: Thu Oct 18, 2018 1:24 am

Re: Search Preprocessor Suggestions

Post by raccoon »

void wrote: Tue Dec 21, 2021 5:35 am
Everything 1.5.0.1290a adds #number00: and #number000: for quick zero-padding.

I can't tell you how many accumulated hours $number000: (formerly #number000:) has saved me over the past 2 years in the Advanced Rename dialog when naming audio files, and especially audiobooks, that I've converted from CD. I just need to express my sincere appreciation here, and I'm shooting you $20 to your Donate page. <3
void
Developer
Posts: 16755
Joined: Fri Oct 16, 2009 11:31 pm

Re: Search Preprocessor Suggestions

Post by void »

Thank you for your donation and support.

I am glad to hear you find #number000: useful!