Blake3 (hash function) implementation
Hi,
Would it be possible to have an implementation of the BLAKE3 hash function?
Here is a C implementation: https://github.com/BLAKE3-team/BLAKE3?t ... ementation
thanks in advance.
Bye bye
Re: Blake3 (hash function) implementation
(Makes sense if you need a crypto hash. It is crypto, I think?
Though xxHash is much preferred if you don't need crypto.
Since I just did this the other day...
@ office, on SSD, i get 411 MB/s (using FcHash.exe %*) [fchash v5.5.0]
T: @ home 172 MB/s, 171, 176, 174, 166 (w/4 files rather than 2), 55 MB/s when combining T: & slow W:
K: USB Flash Drive (2.0 connector)
T: Toshiba 7200 spinner
O: SSD
X: [same] USB Flash Drive ("3.0" connector, not quite sure?)
B: [same] USB Flash Drive ("3.0" connector, not quite sure? - on back of computer [twas the connector i had my mouse plugged into])
)
Code: Select all
hash --sha1 e*
K: Total: 8files, 2129.6MiB, 64.7sec, 32.9MiB/s
T: Total: 8files, 2129.6MiB, 15.1sec, 140.9MiB/s
T: Total: 8files, 2129.6MiB, 12.3sec, 173.5MiB/s (a 2nd run. 1st was immediately after copying the files to T:)
X: Total: 8files, 2129.6MiB, 11.6sec, 184.2MiB/s
B: Total: 8files, 2129.6MiB, 66.4sec, 32.1MiB/s (so this is AS SLOW as K:)
O: Total: 8files, 2129.6MiB, 4.9sec, 430.7MiB/s
Code: Select all
hash --xxh3 e*
K: Total: 8files, 2129.6MiB, 64.0sec, 33.3MiB/s
T: Total: 8files, 2129.6MiB, 12.2sec, 173.9MiB/s
X: Total: 8files, 2129.6MiB, 11.0sec, 194.5MiB/s
B: Total: 8files, 2129.6MiB, 66.1sec, 32.2MiB/s
O: Total: 8files, 2129.6MiB, 4.4sec, 484.1MiB/s
Code: Select all
hash --sha256 e*
K: Total: 8files, 2129.6MiB, 64.6sec, 32.4MiB/s
T: Total: 8files, 2129.6MiB, 12.3sec, 173.5MiB/s (i sure was not expecting this)
X: Total: 8files, 2129.6MiB, 11.6sec, 183.2MiB/s
B: Total: 8files, 2129.6MiB, 68.2sec, 31.2MiB/s
O: Total: 8files, 2129.6MiB, 7.0sec, 305.4MiB/s (substantially slower)
Re: Blake3 (hash function) implementation
Blake3 actually does rather well on my SSD (essentially the same as xxh).
(Do note that you must clear cache when "benchmarking" or you'll get erroneous results, i.e., reading from cache.)
(Oh, same data set as above.)
Code: Select all
TimeThis : Command Line : fchash --xxh3 e*
TimeThis : Start Time : Mon Jun 10 11:01:34 2024
TimeThis : End Time : Mon Jun 10 11:01:40 2024
TimeThis : Elapsed Time : 00:00:05.709
Code: Select all
TimeThis : Command Line : b3sum e*
TimeThis : Start Time : Mon Jun 10 11:02:06 2024
TimeThis : End Time : Mon Jun 10 11:02:12 2024
TimeThis : Elapsed Time : 00:00:05.877
Re: Blake3 (hash function) implementation
In fact I don't need a cryptographic hash, just a fast algorithm.
I need the hash to find duplicate files.
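For plain duplicate detection, a minimal sketch in Python could group files by size first and hash only the files that share a size. The function names `file_digest` and `find_duplicates` are made up for this example, and `blake2b` from the standard library is only a stand-in, since neither BLAKE3 nor xxHash ships with Python.

```python
import hashlib
import os
from collections import defaultdict


def file_digest(path, algo="blake2b", chunk_size=1 << 20):
    """Hash a file in fixed-size chunks so memory use stays flat."""
    h = hashlib.new(algo)
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def find_duplicates(paths, algo="blake2b"):
    """Return lists of paths whose contents hash identically."""
    by_size = defaultdict(list)
    for p in paths:
        by_size[os.path.getsize(p)].append(p)

    groups = []
    for same_size in by_size.values():
        if len(same_size) < 2:
            continue  # a file with a unique size cannot have a duplicate
        by_hash = defaultdict(list)
        for p in same_size:
            by_hash[file_digest(p, algo)].append(p)
        groups.extend(g for g in by_hash.values() if len(g) > 1)
    return groups
```

Checking sizes first means most files are never read at all, which matters more on a slow disk than the choice of hash.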
It would be more useful to have a hash of the video or audio without the metadata, because I change the metadata often.
With ffmpeg there is a way to calculate the hash of the streams inside a file, while FLAC calculates it automatically.
I should test whether ffmpeg gives the same result as the calculation done by FLAC.
MD5 is probably good enough even though it is broken, but it is slow.
Anyway, I have to sit down and think of a way to identify my media files uniquely even if I change the metadata.
It would be nice to have a quick way to hash video and audio with something other than SHA1 or MD5.
Maybe we can think of a method to do this, but I wouldn't want to burden the development of this product, which is already fantastic as it is!
Re: Blake3 (hash function) implementation
As far as I understand it: hashmedia.bat.
Take a .mp4
Copy (remux, not transcode) it to .mkv
ffmpeg -i input.mp4 -c copy out.mkv
Do the same a second time, writing to a different .mkv filename
ffmpeg -i input.mp4 -c copy out2.mkv
Because .mkv has a ... "Unique ID", the two .mkv files will not have the same file hash:
out.mkv will have a different file hash from out2.mkv
But if you compare the files' media contents, you will see they do match.
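A rough sketch of that media-content comparison in Python, using ffmpeg's hash muxer on one stream copied without re-encoding. The helper names `stream_hash_command` and `stream_hash` are invented for this example, and it assumes `ffmpeg` is on the PATH.

```python
import subprocess


def stream_hash_command(path, stream="0:0", algo="md5"):
    """Build an ffmpeg command that hashes one stream's packets, ignoring the container."""
    return ["ffmpeg", "-hide_banner", "-loglevel", "error",
            "-i", path, "-map", stream, "-c", "copy",
            "-f", "hash", "-hash", algo, "-"]


def stream_hash(path, stream="0:0", algo="md5"):
    """Run ffmpeg and return the hex digest it prints for the selected stream."""
    out = subprocess.run(stream_hash_command(path, stream, algo),
                         check=True, capture_output=True, text=True).stdout
    # The hash muxer prints a single line of the form "algo=hexdigest".
    return out.strip().split("=", 1)[1]


# Usage (needs ffmpeg and the two remuxed files from the steps above):
#   stream_hash("out.mkv") == stream_hash("out2.mkv")
```

Stream `0:0` is simply the first stream of the first input; a file with several audio or subtitle tracks would need one call per stream.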
Re: Blake3 (hash function) implementation
I did a simple test to get a rough idea of whether using one hash over another brings me a clear advantage.
Tool versions used in the test:
Code: Select all
ffmpeg version 6.1.1-full_build-www.gyan.dev Copyright (c) 2000-2023 the FFmpeg developers
built with gcc 12.2.0 (Rev10, Built by MSYS2 project)
configuration: --enable-gpl --enable-version3 --enable-static --pkg-config=pkgconf --disable-w32threads --disable-autodetect --enable-fontconfig --enable-iconv --enable-gnutls --enable-libxml2 --enable-gmp --enable-bzlib --enable-lzma --enable-libsnappy --enable-zlib --enable-librist --enable-libsrt --enable-libssh --enable-libzmq --enable-avisynth --enable-libbluray --enable-libcaca --enable-sdl2 --enable-libaribb24 --enable-libaribcaption --enable-libdav1d --enable-libdavs2 --enable-libuavs3d --enable-libzvbi --enable-librav1e --enable-libsvtav1 --enable-libwebp --enable-libx264 --enable-libx265 --enable-libxavs2 --enable-libxvid --enable-libaom --enable-libjxl --enable-libopenjpeg --enable-libvpx --enable-mediafoundation --enable-libass --enable-frei0r --enable-libfreetype --enable-libfribidi
--enable-libharfbuzz --enable-liblensfun --enable-libvidstab --enable-libvmaf --enable-libzimg --enable-amf --enable-cuda-llvm --enable-cuvid --enable-ffnvcodec --enable-nvdec --enable-nvenc --enable-dxva2 --enable-d3d11va --enable-libvpl --enable-libshaderc --enable-vulkan --enable-libplacebo --enable-opencl --enable-libcdio --enable-libgme --enable-libmodplug --enable-libopenmpt --enable-libopencore-amrwb --enable-libmp3lame --enable-libshine --enable-libtheora --enable-libtwolame --enable-libvo-amrwbenc --enable-libcodec2 --enable-libilbc --enable-libgsm --enable-libopencore-amrnb --enable-libopus --enable-libspeex --enable-libvorbis --enable-ladspa --enable-libbs2b --enable-libflite --enable-libmysofa --enable-librubberband --enable-libsoxr --enable-chromaprint
libavutil 58. 29.100 / 58. 29.100
libavcodec 60. 31.102 / 60. 31.102
libavformat 60. 16.100 / 60. 16.100
libavdevice 60. 3.100 / 60. 3.100
libavfilter 9. 12.100 / 9. 12.100
libswscale 7. 5.100 / 7. 5.100
libswresample 4. 12.100 / 4. 12.100
libpostproc 57. 3.100 / 57. 3.100
Code: Select all
xxhsum.exe 0.8.2 by Yann Collet
compiled as 64-bit x86_64 autoVec little endian with GCC 13.1.0
Code: Select all
b3sum 1.5.1
Tests: (run times take with oh my posh!)
Code: Select all
17 files - 54,8 GB - HDD Toshiba 8TB 7200rpm
[ 3:44.769s] ffmpeg -hide_banner -loglevel error -i $f -map 0:0 -c copy -f md5 -
[ 3:42.909s] ffmpeg -hide_banner -loglevel error -i $f -map 0:0 -c copy -f hash - (default hash SHA256)
[ 3:41.846s] xxhsum.exe -H0 $f
[ 3:41.898s] xxhsum.exe -H1 $f
[ 3:41.898s] xxhsum.exe -H2 $f
[ 7:46.274s] b3sum_windows_x64_bin.exe -l 4 $f
[ 7:37.951s] b3sum_windows_x64_bin.exe -l 8 $f
[ 7:36.662s] b3sum_windows_x64_bin.exe -l 16 $f
Code: Select all
3 files - 10 GB - Ramdrive
[ 2.919s] xxhsum.exe -H0 $f
[ 1.575s] xxhsum.exe -H1 $f
[ 1.26s] xxhsum.exe -H2 $f
[ 0.888s] b3sum_windows_x64_bin.exe -l 4 $f
[ 0.883s] b3sum_windows_x64_bin.exe -l 8 $f
[ 0.877s] b3sum_windows_x64_bin.exe -l 16 $f
[ 9.423s] ffmpeg -hide_banner -loglevel error -i $f -map 0:0 -c copy -f md5 -
[ 19.043s] ffmpeg -hide_banner -loglevel error -i $f -map 0:0 -c copy -f hash - (default hash SHA256)
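To put the ramdrive timings side by side, here is a small Python sketch that parses the elapsed-time strings as printed above and expresses each run relative to the b3sum time. The helper `parse_elapsed` is made up for this example; the numbers are copied from the listing.

```python
def parse_elapsed(text):
    """Convert '[ 3:44.769s]', '3:44.769s' or '1.26s' to seconds."""
    text = text.strip(" []s")
    minutes, _, seconds = text.rpartition(":")
    return (int(minutes) if minutes else 0) * 60 + float(seconds)


# Ramdrive runs, copied from the listing above.
ramdrive = {
    "xxhsum -H0": "2.919s",
    "xxhsum -H1": "1.575s",
    "xxhsum -H2": "1.26s",
    "b3sum -l 16": "0.877s",
}

baseline = parse_elapsed(ramdrive["b3sum -l 16"])
for name, elapsed in ramdrive.items():
    ratio = parse_elapsed(elapsed) / baseline
    print(f"{name}: {ratio:.2f}x the b3sum time")
```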
On the HDD I hashed a medium-sized group of files, a simple if unprofessional way to get past the drive's cache and Windows' disk cache. The disk stands in for a fast HDD, with defragmented files and about 1% free space. There, xxHash, MD5 and SHA256 all take about the same time (roughly 3:42 to 3:45), so a faster hash brings no noticeable gain, and b3sum takes about twice as long (around 7:40).
The big difference shows up on a ramdrive holding 10 GB of video files. At top transfer speed, b3sum finishes in about 0.88 s against 1.26 s to 2.92 s for the xxHash variants, and as expected both outperform MD5 (9.4 s) and SHA256 (19.0 s) computed through ffmpeg.
If you are working with fast disks, implementing these two hashes might make sense.
I haven't yet managed to match the hash calculated by FLAC with the one calculated by ffmpeg. I will have to study this for a while, because the hash FLAC calculates on the audio stream does not depend on the chosen compression ratio, which suggests FLAC hashes the decoded audio while ffmpeg with -c copy hashes the compressed packets.
For a hash of a video stream that excludes the container data that can be changed, ffmpeg seems to me the best choice.
ffmpeg supports only a limited number of hashes, and new ones would have to be requested through its ticket system.
I haven't found a way to pass just the video stream directly to an external hashing program.
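One possible approach, sketched in Python: let ffmpeg write the copied packets to stdout and hash that byte stream with whatever algorithm is at hand. This is untested against real media. It assumes that ffmpeg's raw `data` muxer (`-f data`) writes the copied packets unmodified, and the resulting digest is not guaranteed to match what `-f md5` or `-f hash` report. The helper names are invented, and `blake2b` is a stand-in for BLAKE3 or xxHash, which are not in the Python standard library.

```python
import hashlib
import subprocess


def hash_command_output(cmd, algo="blake2b", chunk_size=1 << 20):
    """Run cmd and hash whatever it writes to stdout, chunk by chunk."""
    h = hashlib.new(algo)
    with subprocess.Popen(cmd, stdout=subprocess.PIPE) as proc:
        for chunk in iter(lambda: proc.stdout.read(chunk_size), b""):
            h.update(chunk)
    # Leaving the with-block waits for the process, so returncode is set here.
    if proc.returncode != 0:
        raise RuntimeError(f"command failed with exit code {proc.returncode}")
    return h.hexdigest()


def video_packets_command(path, stream="0:0"):
    """ffmpeg command that copies one stream to stdout without re-encoding.

    Assumption: the 'data' muxer emits the packets of the mapped stream as-is.
    """
    return ["ffmpeg", "-hide_banner", "-loglevel", "error",
            "-i", path, "-map", stream, "-c", "copy", "-f", "data", "-"]


# Usage (needs ffmpeg on the PATH):
#   hash_command_output(video_packets_command("out.mkv"))
```

The same ffmpeg command could also be piped straight into a command-line hasher that reads standard input, which avoids Python entirely.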
Now I'm trying to decide whether to add custom properties directly in Everything or to store the hash computed with ffmpeg in the tags field of the Matroska file.
For example: V_MD5=xxxxxxxxxxx;City=Rome;Location=Centrum
That way I can search using "tags:".
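A minimal Python sketch of reading and writing such a tags field. It assumes the `key=value;key=value` layout from the example above and that neither keys nor values contain `;` or `=`. The function names are made up for illustration.

```python
def parse_tags(field):
    """Split 'K1=v1;K2=v2' into a dict, skipping items without '='."""
    tags = {}
    for item in field.split(";"):
        key, sep, value = item.partition("=")
        if sep:
            tags[key.strip()] = value.strip()
    return tags


def format_tags(tags):
    """Join a dict back into the 'K1=v1;K2=v2' form."""
    return ";".join(f"{key}={value}" for key, value in tags.items())


tags = parse_tags("V_MD5=xxxxxxxxxxx;City=Rome;Location=Centrum")
tags["V_MD5"] = "0123456789abcdef"  # placeholder value, not a real digest
print(format_tags(tags))
```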
It would of course be more convenient to have a supported Matroska meta tag, to take advantage of Everything's column-level duplicate search.
An ongoing project of mine is to normalize all my videos to Matroska containers, which support almost any kind of stream.
I won't comment on Matroska's own tagging system: a unique mental delirium!
Last edited by Roby.One on Wed Jun 12, 2024 2:55 pm, edited 2 times in total.
Re: Blake3 (hash function) implementation
Code: Select all
T:\WINDOWS>timer b3sum -l 9 T:\WINDOWS\Dell_Win7_professional_64bit_sp1.iso W:\Z\Win10_1809Oct_English_x64.iso
fdfb55611fd6c1d3f8 T:/WINDOWS/Dell_Win7_professional_64bit_sp1.iso
e3783ea5a9d00b1196 W:/Z/Win10_1809Oct_English_x64.iso
Kernel Time = 17.004 = 3%
User Time = 17.175 = 3%
Process Time = 34.179 = 7% Virtual Memory = 12 MB
Global Time = 457.089 = 100% Physical Memory = 5374 MB
Code: Select all
T:\WINDOWS\ :
xxh3 <b3ae76391873db5eba87765c95144f03>: Dell_Win7_professional_64bit_sp1.iso
W:\Z\ :
xxh3 <9a34afbb378239743641e6d8dc31a8d1>: Win10_1809Oct_English_x64.iso
Total: 2files, 10200.1MiB, 173.3sec, 58.9MiB/s
Kernel Time = 0.811 = 0%
User Time = 1.435 = 0%
Process Time = 2.246 = 1% Virtual Memory = 10 MB
Global Time = 173.343 = 100% Physical Memory = 12 MB
457 vs 173 sec -> put into English, that's 7 min 37 sec vs. 2 min 53 sec, which is quite the difference
5374 MB vs 12 MB RAM (Working Set, peak) -> not sure just what that means or how "things" are affected, but it's quite the difference
(depending on one's fileset & other factors, the difference might not be so large)
T: & W: are both spinners, T: 7200 internal, W: ? in a USB 2.0 enclosure [SLOWWW]
(i5-3570K, 16 GB RAM)
just some topics:
https://github.com/BLAKE3-team/BLAKE3/issues/278
https://github.com/BLAKE3-team/BLAKE3/issues/305
https://github.com/BLAKE3-team/BLAKE3/issues/208
Now on an SSD:
(i7-3770S, 8 GB RAM)
(not the same files as above, but close)
Code: Select all
C:\out\mozregression\small>timer b3sum -l 9 C:\out\mozregression\small\7601.24214.180801-1700.win7sp1_ldr_escrow_CLIENT_ULTIMATE_x64FRE_en-us.iso C:\out\KKK\K-ORSAIR-0202\amber-OK\Dell_Win7_professional_64bit_sp1.iso
c1395f4b35ad151961 C:/out/mozregression/small/7601.24214.180801-1700.win7sp1_ldr_escrow_CLIENT_ULTIMATE_x64FRE_en-us.iso
fdfb55611fd6c1d3f8 C:/out/KKK/K-ORSAIR-0202/amber-OK/Dell_Win7_professional_64bit_sp1.iso
Kernel Time = 10.108 = 23%
User Time = 11.637 = 26%
Process Time = 21.746 = 50% Virtual Memory = 13 MB
Global Time = 43.410 = 100% Physical Memory = 5373 MB
Code: Select all
C:\out\mozregression\small\ :
xxh3 <f3d06ac283e42848487d9a7d0d770567>: 7601.24214.180801-1700.win7sp1_ldr_escrow_CLIENT_ULTIMATE_x64FRE_en-us.iso
C:\out\KKK\K-ORSAIR-0202\amber-OK\ :
xxh3 <b3ae76391873db5eba87765c95144f03>: Dell_Win7_professional_64bit_sp1.iso
Total: 2files, 10963.8MiB, 22.2sec, 494.9MiB/s
Kernel Time = 0.733 = 3%
User Time = 1.450 = 6%
Process Time = 2.184 = 9% Virtual Memory = 10 MB
Global Time = 22.173 = 100% Physical Memory = 13 MB
For some reason, that 1st run of blake3 might have been a bit slow.
Later attempts have been faster, though still not as fast as xxhash.
Note that theoretically speaking xxhash -H128 should be a bit faster than the other variants.
(And the implementation in FcHash might even be a bit faster than that of xxhash itself?)
Code: Select all
C:\out\mozregression\small>timer b3sum -l 9 C:\out\mozregression\small\7601.24214.180801-1700.win7sp1_ldr_escrow_CLIENT_ULTIMATE_x64FRE_en-us.iso C:\out\KKK\K-ORSAIR-0202\amber-OK\Dell_Win7_professional_64bit_sp1.iso
c1395f4b35ad151961 C:/out/mozregression/small/7601.24214.180801-1700.win7sp1_ldr_escrow_CLIENT_ULTIMATE_x64FRE_en-us.iso
fdfb55611fd6c1d3f8 C:/out/KKK/K-ORSAIR-0202/amber-OK/Dell_Win7_professional_64bit_sp1.iso
Kernel Time = 9.219 = 26%
User Time = 12.339 = 35%
Process Time = 21.559 = 62% Virtual Memory = 14 MB
Global Time = 34.523 = 100% Physical Memory = 5620 MB
Re: Blake3 (hash function) implementation
Oh, & you thought we were done.
--no-mmap, makes a HUGE difference on my end (i5-3570K, 16 GB RAM)
totally changing the "dynamics" of the hash computation
Code: Select all
T:\WINDOWS>timer b3sum -l 9 T:\WINDOWS\Dell_Win7_professional_64bit_sp1.iso W:\Z\Win10_1809Oct_English_x64.iso --no-mmap
fdfb55611fd6c1d3f8 T:/WINDOWS/Dell_Win7_professional_64bit_sp1.iso
e3783ea5a9d00b1196 W:/Z/Win10_1809Oct_English_x64.iso
Kernel Time = 5.007 = 3%
User Time = 9.968 = 6%
Process Time = 14.976 = 9% Virtual Memory = 1 MB
Global Time = 154.813 = 100% Physical Memory = 3 MB
so now blake3 is FASTER than xxhash, 2:35 vs 2:54 in blake3's favor (so 19 sec quicker) - why ?
(am i not invalidating caches properly [not that i'm sure just HOW to do that?], OR ... ?)
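For what the two modes look like in code, here is a minimal Python sketch of hashing a file through a small read buffer versus through a memory map. It is not b3sum's implementation, and `sha256` is only a stand-in from the standard library. It may help explain the memory figures, though: with a memory map the touched file pages probably count toward the process's working set, while a buffered read only ever holds one chunk.

```python
import hashlib
import mmap


def hash_buffered(path, algo="sha256", chunk_size=1 << 20):
    """Sequential reads through one small buffer (roughly what --no-mmap implies)."""
    h = hashlib.new(algo)
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def hash_mmap(path, algo="sha256"):
    """Map the whole file and hash the mapping (fails on empty files)."""
    h = hashlib.new(algo)
    with open(path, "rb") as f:
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mapped:
            h.update(mapped)
    return h.hexdigest()
```

Both functions return the same digest; only the access pattern and the memory accounting differ. A multi-threaded hasher working on a mapping can also touch several regions of the file at once, which a spinning disk handles badly, but whether that is what happens here is a guess.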
---
With another dataset:
T:\K-ORSAIR\LIB\WIN7-DELL-HomePremium-ISO\sources/*
163 files, 3.2 GB, mostly small files < 10 MB, boot.wim 168 MB, install.wim 2.9 GB
(from fchash) Total: 163files, 3059.6MiB, 16.2sec, 188.9MiB/s
(so 188 MB/s is, i'm guessing, close to theoretical I/O speed for a 7200 spinner)
Code: Select all
blake-no.bat --no-mmap 16.024
xxhash-fc-real.bat fchash --xxh3 16.219
blake.bat 167.557
so if using
4 cores & "memory mapping"
is slower, for me, than
1 core & NOT memory-mapping...
that means, what?
that my memory is "slow", that the usage of multiple cores on my end is not that efficient ?
and to top it off, --num-threads 1:
Code: Select all
T:\WINDOWS>timer b3sum -l 9 T:\WINDOWS\Dell_Win7_professional_64bit_sp1.iso W:\Z\Win10_1809Oct_English_x64.iso --num-threads 1
fdfb55611fd6c1d3f8 T:/WINDOWS/Dell_Win7_professional_64bit_sp1.iso
e3783ea5a9d00b1196 W:/Z/Win10_1809Oct_English_x64.iso
Kernel Time = 19.500 = 7%
User Time = 12.043 = 4%
Process Time = 31.543 = 12% Virtual Memory = 12 MB
Global Time = 251.963 = 100% Physical Memory = 673 MB
--num-threads 1 is way slower than --no-mmap, but way faster than defaults [i've got to check that]
while at the same time using more "Physical Memory" than --no-mmap, but FAR less than defaults
so... now, i'm really confused (scratches:head)
this was from 2day, & quicker than the same from yesterday? (do i have the same files on each system, heh?) --- NO, that's T: & T:, where b4 it was T: & W: (& W: is much slower), so *NOT* a valid comparison!
Code: Select all
T:\WINDOWS>timer b3sum -l 9 T:\WINDOWS\Dell_Win7_professional_64bit_sp1.is
fdfb55611fd6c1d3f8 T:/WINDOWS/Dell_Win7_professional_64bit_sp1.iso
Kernel Time = 10.966 = 3%
User Time = 8.377 = 2%
Process Time = 19.344 = 5% Virtual Memory = 12 MB
Global Time = 329.894 = 100% Physical Memory = 5374 MB
So they know what I found out: b3sum has poor performance for large files on spinning disks when multi-threading is enabled.