Finding a needle in a haystack duplicate

Discussion related to "Everything" 1.5 Alpha.
anmac1789
Posts: 669
Joined: Mon Aug 24, 2020 1:16 pm

Finding a needle in a haystack duplicate

Post by anmac1789 »

Hello, so I've managed to find a lot of duplicates using a custom column:

Code: Select all

ancestor:"C:\Users\main name\path1"|ancestor:"E:\Users\different name\path2" files: add-column:column1 column1:=name:"--"formatfiletime($dc:)"--"formatfiletime($dm:)"--"formatfiletime($da:)"--"size: find-dupes:column1
The total number of results I got is 57,160 files: 28,587 from the C drive path and 28,573 from the E drive path. Shouldn't it find exactly half of the files under each path, i.e. 57,160 / 2 = 28,580? It seems the C drive path has 28,587 - 28,580 = 7 results too many, which either shouldn't be included or are being detected improperly. Similarly, the E drive path has 28,580 - 28,573 = 7 results too few ...


How can I pinpoint what is going on? What are these 7 mysterious files?
raccoon
Posts: 1017
Joined: Thu Oct 18, 2018 1:24 am

Re: Finding a needle in a haystack duplicate

Post by raccoon »

You can have copies of the same file on the same drive, and they will match each other. Nothing in the search requires that matches occur only across drive volumes.

That's why I requested the function Compare-Paths.
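
To illustrate the point about same-tree matches, here is a minimal Python sketch (not Everything syntax) that groups records by a duplicate key and reports the groups that are not exactly one-from-each-tree. The record layout, the root labels "C" and "E", the sample paths, and the function name `unbalanced_groups` are all assumptions made up for this example; in practice the records would come from an exported result list.

```python
from collections import defaultdict

def unbalanced_groups(records, roots=("C", "E")):
    """Return {key: {root: [paths]}} for keys that are not matched
    exactly one-to-one across the given roots.

    records: iterable of (root, key, path) tuples (hypothetical layout).
    """
    groups = defaultdict(lambda: defaultdict(list))
    for root, key, path in records:
        groups[key][root].append(path)

    result = {}
    for key, by_root in groups.items():
        counts = [len(by_root.get(r, [])) for r in roots]
        # A clean cross-tree duplicate has exactly one file per root.
        if any(c != 1 for c in counts):
            result[key] = {r: list(by_root.get(r, [])) for r in roots}
    return result

# Hypothetical sample data: b.jpg is duplicated inside the C tree only.
sample = [
    ("C", "a.jpg--100", r"C:\x\a.jpg"),
    ("E", "a.jpg--100", r"E:\x\a.jpg"),
    ("C", "b.jpg--200", r"C:\x\b.jpg"),
    ("C", "b.jpg--200", r"C:\y\b.jpg"),
]
report = unbalanced_groups(sample)
```

Any key left in `report` is a group where the duplicates sit within one tree, or are unevenly split between the two, which is the kind of file that would throw off a 50/50 count.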
therube
Posts: 4984
Joined: Thu Sep 03, 2009 6:48 pm

Re: Finding a needle in a haystack duplicate

Post by therube »

Wouldn't UNIQUE give you what you want?

<c:/tmp | c:/out> !copy file: abc

(I didn't throw unique: into the search line; instead I right-clicked the column header & picked 'Find ___ Duplicates'.)

I've got trees
I exclude files with the string copy (cause I have 100K of them)
I look for files that contain abc in those two trees

Then I unique: them by some category (Name, Size, whatever), & I'm left with the files that don't fit, the ones that are unique.


(Now, I've left the ancestor: & whatnot out, but I'd think it should work just as well thrown in.)
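
As a rough illustration of the "keep only what is unique" idea, here is a small Python sketch, not Everything's own implementation. The function names `unique_by` and `name_and_size`, and the sample file list, are hypothetical and chosen only for the example.

```python
from collections import Counter

def unique_by(items, key):
    """Return the items whose key value occurs exactly once.

    items must be a list (it is walked twice); key is a function
    mapping an item to the value being compared.
    """
    counts = Counter(key(item) for item in items)
    return [item for item in items if counts[key(item)] == 1]

def name_and_size(entry):
    """Compare by lower-cased file name plus size (an assumed category)."""
    path, size = entry
    return (path.rsplit("/", 1)[-1].lower(), size)

# Hypothetical (path, size) pairs from the two trees.
files = [
    ("c:/tmp/abc1.txt", 10),
    ("c:/out/abc1.txt", 10),
    ("c:/tmp/abc2.txt", 20),
]
leftovers = unique_by(files, name_and_size)
```

Here `leftovers` holds only the entry with no counterpart in the other tree, which is the set one would then inspect by hand.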
anmac1789
Posts: 669
Joined: Mon Aug 24, 2020 1:16 pm

Re: Finding a needle in a haystack duplicate

Post by anmac1789 »

raccoon wrote: Wed Feb 15, 2023 9:28 pm You can have copies of the same file on the same drive, and they will match unto themselves. There is no specification that the matches must occur across drive volumes, only.

That's why I requested the function Compare-Paths
I'm also waiting on this function. It seems highly useful, considering how many duplicate paths there are with similar subfolder names in between.
therube wrote: Wed Feb 15, 2023 9:36 pm Wouldn't UNIQUE give you what you want?

<c:/tmp | c:/out> !copy file: abc

(I didn't throw unique into the search line, instead by right-click the column header & 'Find ___ Duplicates'.)

I've got trees
I exclude files with the string copy (cause I have 100K of them)
I look for files that contain abc in those two trees

Then I unique: them, by some category; Name, Size, whatever, & I'm left with files that - don't fit, that are unique.


(Now, I've left the ancestor: & whatnot out, but I'd think it should work just as well thrown in.)
Could you be a little more specific in your description? What do you mean by 'string copy'?

I found a hack-and-slash kind of way; I am not sure if it will work for you. I had to re-create some Excel functions and then translate them into Everything using a custom column. Here is the custom column I used:

Code: Select all

RIGHT($path:,LEN($path:)-FIND("\Android\",$path:))
I chose this because I had to find the first instance of the Android\ folder in the path, drop everything to its left, and keep the rest of the path through to the end. So far, the results I got were:

57,146 objects: 28,573 for paths beginning with C and 28,573 for paths beginning with E. So far so good...
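
For readers more used to Python than spreadsheet formulas, the RIGHT/LEN/FIND expression above can be sketched roughly as follows. This assumes Everything's functions follow the spreadsheet convention of a 1-based FIND; the function name `path_from_marker` and the example paths are hypothetical.

```python
def path_from_marker(path, marker="\\Android\\"):
    """Return the part of path starting at the marker folder name.

    Mirrors RIGHT(path, LEN(path) - FIND(marker, path)) under the
    assumption that FIND is 1-based: everything after the marker's
    leading backslash is kept. Returns None when the marker is absent.
    """
    pos = path.find(marker)  # 0-based index of the first match, or -1
    if pos == -1:
        return None
    return path[pos + 1:]

# Hypothetical paths from the two trees; only the prefix differs.
c_rel = path_from_marker(r"C:\Users\main name\path1\Android\data\app")
e_rel = path_from_marker(r"E:\Users\different name\path2\Android\data\app")
```

Because both results reduce to the same relative path, adding this value to the duplicate key ties each file to its position under Android\ rather than to its name and timestamps alone.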

Here is the full custom column I have:

Code: Select all

files: add-column:column1 column1:=name:"--"RIGHT($path:,LEN($path:)-FIND("\Android\",$path:))"--"formatfiletime($dc:)"--"formatfiletime($dm:)"--"formatfiletime($da:)"--"size: !find-dupes:column1