Properties > Maximum Size but not Minimum Size.

Discussion related to "Everything" 1.5 Alpha.
raccoon
Posts: 1017
Joined: Thu Oct 18, 2018 1:24 am

Post by raccoon »

In adding the SHA-512 hash digest Property to my index, I noticed that there only exists an option to specify a "Maximum size". "Properties will be ignored for files with a size larger than specified."

However, I would like to ignore files smaller than 64 or 128 MB, skipping all the 10's of millions of tiny inconsequential files.

I would also like to only SHA-512 hash files that have the Readonly attribute set, and the Archive attribute unset. This is because their contents should never change (unless there is disk corruption). ((I always toggle static media to readonly=on, archive=off. Any write to a file automatically turns the archive bit back on.))
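The selection rule requested here (large files only, Readonly set, Archive unset) can be sketched as a simple predicate. This is only an illustration in Python, not an Everything feature: the function name `should_hash` and the 64 MB default are assumptions, and the bit values are the documented Windows FILE_ATTRIBUTE_READONLY (0x01) and FILE_ATTRIBUTE_ARCHIVE (0x20).

```python
# Documented Windows file attribute bits.
FILE_ATTRIBUTE_READONLY = 0x01
FILE_ATTRIBUTE_ARCHIVE = 0x20

MIN_SIZE = 64 * 1024 * 1024  # hypothetical threshold: 64 MB


def should_hash(size: int, attributes: int, min_size: int = MIN_SIZE) -> bool:
    """Return True for files that are large, read-only, and have the archive bit clear."""
    if size < min_size:
        return False  # skip the many small files
    readonly = bool(attributes & FILE_ATTRIBUTE_READONLY)
    archive = bool(attributes & FILE_ATTRIBUTE_ARCHIVE)
    return readonly and not archive
```

On Windows, `os.stat(path).st_file_attributes` supplies the attribute bits and `st_size` the size.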

Can you tell me how long Everything will remember these hashes for? What will cause it to hash the files again? This is going to take a couple of weeks to complete, so I only want it to do it once.

Do you store the hash digests in memory as hexadecimal strings, or as binary values to save on space? A SHA-512 digest consumes 128 bytes of memory as a hex string, but only 64 bytes as binary.
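The 128 versus 64 byte figures can be checked with Python's standard `hashlib`. This sketch only illustrates the size difference between the two representations; it says nothing about how Everything stores digests internally.

```python
import hashlib

data = b"example"  # arbitrary sample input

digest = hashlib.sha512(data).digest()         # raw binary digest
hex_digest = hashlib.sha512(data).hexdigest()  # same digest as hex text

binary_len = len(digest)   # bytes needed for the binary form
hex_len = len(hex_digest)  # characters needed for the hex form

print(binary_len, hex_len)
```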
void
Developer
Posts: 16753
Joined: Fri Oct 16, 2009 11:31 pm

Re: Properties > Maximum Size but not Minimum Size.

Post by void »

However, I would like to ignore files smaller than 64 or 128 MB, skipping all the 10's of millions of tiny inconsequential files.
I will consider an option to specify a minimum file size when indexing properties.

For now, please consider indexing SHA-512 for specific extensions, eg: *.mkv;*.mp4;*.iso
This has the added benefit of only storing the hash in the index for files with these extensions. (saving space).


I would also like to only SHA-512 hash files that have the Readonly attribute set, and the Archive attribute unset.
I will consider an option to do this.


Can you tell me how long Everything will remember these hashes for? What will cause it to hash the files again? This is going to take a couple of weeks to complete, so I only want it to do it once.
Forever, until:
you force a rebuild from Tools -> Options.

Everything is likely to go through some database changes during alpha which will also force a rebuild.


Do you store the hash digests in memory as hexadecimal strings, or as binary values to save on space? A SHA-512 digest consumes 128 bytes of memory as a hex string, but only 64 bytes as binary.
In memory as binary.
Any reason for SHA-512? Has a SHA-256 collision even been found yet?
Files that do not match your indexed property filters will not require any additional memory.

Thank you for the suggestions.



What I am personally doing is storing a .sha256 file in each archived folder and using the sha256sum SHA-256 property.
raccoon
Posts: 1017
Joined: Thu Oct 18, 2018 1:24 am

Re: Properties > Maximum Size but not Minimum Size.

Post by raccoon »

Any reason for SHA-512? Has a SHA-256 collision even been found yet?
What I am personally doing is storing a .sha256 file in each archived folder and using the sha256sum SHA-256 property.
In 2016 or 2017, I/we simply decided to go with .sha512 files, due in part to their relative scarcity in the wild, for future-proofing, and because it was an available option. SHA-512 also calculated basically as fast as SHA-256 in testing. I had not yet discovered that the "sha256sum property" reads from .sha256 files. Would you consider support for .sha512 files? Would you consider support for digests that also include filesize in these files as the middle parameter? "FFFFFF 123456 *filename.ext" ... Size was added as an early verification failure check, and to track filenames that were changed.

((If ever someone were to crack SHA-256, you would definitely hear about it. The Bitcoin ledger is SHA-256. Whoever defeats it becomes extremely rich.))
Everything is likely to go through some database changes during alpha which will also force a rebuild.
Thanks. I will keep that in mind!
For now, please consider indexing SHA-512 for specific extensions, eg: *.mkv;*.mp4;*.iso
This has the added benefit of only storing the hash in the index for files with these extensions. (saving space).
Indeed, I have been tinkering with exactly these. And it occurs to me that you do support regex: processing for includes/excludes, but no support for ext:, or other search syntax. Is there anything discouraging you from allowing any arbitrary search syntax for file and folder includes/excludes? Caching an internal 'file list' to work from?

((I copied the file ext: Filters for video, audio and archive media into Notepad++ and then replaced semicolon (;) with semicolon asterisk dot (;*.) to convert from "ext:zip;rar;7z" to "*.zip;*.rar;*.7z". I can see this being tedious for most people, instead of simply "<pic:|video:|audio:|compressed:> size:>64mb attrib:r !attrib:a"))
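The Notepad++ replacement described above can also be scripted. A minimal Python sketch follows; the function name `ext_to_wildcards` is made up for illustration, and it assumes the input is a plain semicolon-separated extension list with an optional "ext:" prefix.

```python
def ext_to_wildcards(ext_filter: str) -> str:
    """Convert 'ext:zip;rar;7z' into '*.zip;*.rar;*.7z'."""
    body = ext_filter.strip()
    if body.lower().startswith("ext:"):
        body = body[4:]  # drop the 'ext:' prefix
    # Split on semicolons and ignore empty entries (e.g. a trailing ';').
    parts = [p.strip() for p in body.split(";") if p.strip()]
    return ";".join(f"*.{p}" for p in parts)


print(ext_to_wildcards("ext:zip;rar;7z"))
```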

Are there any tools for utilizing and exporting the data in property columns? eg: "dupesha512:" syntax.
void
Developer
Posts: 16753
Joined: Fri Oct 16, 2009 11:31 pm

Re: Properties > Maximum Size but not Minimum Size.

Post by void »

Everything 1.5.0.1278a adds support for sha512sum .sha512 files.

Would you consider support for digests that also include filesize in these files as the middle parameter? "FFFFFF 123456 *filename.ext" ... Size was added as an early verification failure check, and to track filenames that were changed.
This doesn't appear to be in the md5sum or shasum standard.
What tool is using size as the middle parameter? Do you have a specification?

no support for ext:, or other search syntax. Is there anything discouraging you from allowing any arbitrary search syntax for file and folder includes/excludes?
Two completely different code paths.
I've put on my TODO list to add ext: support to filters.

The filter needs to be well defined.
pic: or video: expansion might change, which would be a pain to track.

I will consider size: and attrib: support.


Are there any tools for utilizing and exporting the data in property columns? eg: "dupesha512:" syntax.
Someone might know some. Please post any you know here.
You can export all shown columns as CSV from File -> Export -> Save as type -> CSV.
You can find duplicated files in Everything by right clicking the sha512sum SHA-512 column header and clicking Find sha512sum SHA-512 duplicates.
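An exported CSV can also be grouped by hash outside of Everything. The following Python sketch assumes the export has a header row; the column names "Name" and "SHA-512" used in the usage note are hypothetical and depend on which columns were shown at export time.

```python
import csv
import io
from collections import defaultdict


def find_duplicates(csv_text: str, hash_column: str, name_column: str) -> dict:
    """Group rows by hash value and keep only hashes seen more than once."""
    groups = defaultdict(list)
    for row in csv.DictReader(io.StringIO(csv_text)):
        digest = (row.get(hash_column) or "").strip().lower()
        if digest:  # skip files that have no hash value
            groups[digest].append(row[name_column])
    return {d: names for d, names in groups.items() if len(names) > 1}
```

For example, `find_duplicates(text, "SHA-512", "Name")` returns a mapping from each repeated digest to the list of names that share it.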
raccoon
Posts: 1017
Joined: Thu Oct 18, 2018 1:24 am

Re: Properties > Maximum Size but not Minimum Size.

Post by raccoon »

void wrote: Thu Sep 30, 2021 7:43 am
You can export all shown columns as CSV from File -> Export -> Save as type -> CSV.
You can find duplicated files in Everything by right clicking the sha512sum SHA-512 column header and clicking Find sha512sum SHA-512 duplicates.
Thanks, perfect! Those are the "tools" I was referring to.
The filter needs to be well defined.
pic: or video: expansion might change, which would be a pain to track.
I will consider size: and attrib: support.
I've put on my TODO list to add ext: support to filters.
Thank you again!
Would you consider support for digests that also include filesize in these files as the middle parameter? "FFFFFF 123456 *filename.ext" ... Size was added as an early verification failure check, and to track filenames that were changed.
This doesn't appear to be in the md5sum or shasum standard.
What tool is using size as the middle parameter? Do you have a specification?
Rhash is a cross platform CLI tool that lets you specify your own digest file format.

There is no actual standard (ISO, IEEE) besides whatever software is popular in any given decade. I have .sfv files that contain just a single hash value, where the target filename is derived from the .sfv file's own name. The common layouts today are "FFFFF *filename", or "FFFFF filename". I have seen "filename FFFFF" and "filename 1234567890 FFFFF". I use "FFFFF 1234567890 *filename". (where 1234567890 is the filesize).

The Windows programs HashCheck and HashTab support most of these layouts, as they are pretty easy to auto detect, and fall back on any ambiguity.

The best part with the inclusion of filesize is the ability to prematurely abort before even scanning the file contents, because a filesize mismatch is an automatic failure mode. A second benefit is the ability to automatically detect filename changes and make repairs to the digest files, if the filesize and filehashes both match.
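The early-abort idea can be sketched in Python as follows. This is an illustration of the technique only, not how HashCheck, HashTab, RHash or Everything implement it; the function name `verify` and the 1 MiB chunk size are assumptions.

```python
import hashlib
import os


def verify(path: str, expected_hex: str, expected_size: int) -> bool:
    """Check a file against a recorded size and SHA-512 digest."""
    # Cheap check first: a size mismatch fails without reading the contents.
    if os.path.getsize(path) != expected_size:
        return False
    h = hashlib.sha512()
    with open(path, "rb") as f:
        # Read in chunks so large media files are not loaded into memory at once.
        for chunk in iter(lambda: f.read(1024 * 1024), b""):
            h.update(chunk)
    return h.hexdigest() == expected_hex.lower()
```

The same size and hash pair could be used to spot a renamed file: if an unlisted file matches both the recorded size and digest of a missing entry, it is very likely the same file under a new name.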
void
Developer
Posts: 16753
Joined: Fri Oct 16, 2009 11:31 pm

Re: Properties > Maximum Size but not Minimum Size.

Post by void »

Thanks for the size information.


Everything will support:
FFFFF 1234567890 *filename
in the next alpha update.
(must be formatted exactly as <hash> <space> <size> <space> <* or space> <filename>)


Everything will only support:
filename FFFFF
in SFV files.

Everything will support:
FFFFF *filename
FFFFF  filename
FFFFF filename (non standard / single space)
in .md5 and .sha files.
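A rough Python sketch of a parser for the hash-first layouts listed above. It is not Everything's actual parser; the names `DigestEntry` and `parse_digest_line` are made up, and the SFV "filename FFFFF" layout is left out.

```python
import re
from typing import NamedTuple, Optional


class DigestEntry(NamedTuple):
    digest: str
    size: Optional[int]  # None when the line carries no size field
    filename: str


# <hash> <space> <size> <space> <* or space> <filename>
_WITH_SIZE = re.compile(r"^([0-9A-Fa-f]+) (\d+) [ *](.+)$")
# <hash> <space> <optional * or space> <filename>
_PLAIN = re.compile(r"^([0-9A-Fa-f]+) [ *]?(.+)$")


def parse_digest_line(line: str) -> Optional[DigestEntry]:
    """Parse one digest line; return None if it matches no known layout."""
    line = line.rstrip("\r\n")
    m = _WITH_SIZE.match(line)
    if m:
        return DigestEntry(m.group(1).lower(), int(m.group(2)), m.group(3))
    m = _PLAIN.match(line)
    if m:
        return DigestEntry(m.group(1).lower(), None, m.group(2))
    return None
```

The size layout is tried first and requires the "*" or space marker before the filename, so a single-space line such as "abc 2021 report.txt" is still read as a filename that starts with digits. A two-space filename that begins with a number remains ambiguous, which is presumably why the size format has to be matched exactly.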
void
Developer
Posts: 16753
Joined: Fri Oct 16, 2009 11:31 pm

Re: Properties > Maximum Size but not Minimum Size.

Post by void »

Everything 1.5.0.1279a adds support for the
FFFFF 1234567890 *filename
hash file syntax.


ext: was too difficult to add to the exclude/include filters: these filters take a semicolon (;) separated list, and so does ext:, which makes the two ambiguous when combined.
I will investigate other solutions or allow ext: inside double quotes.