Request: about:ext-survey -- File extension forum survey.

Discussion related to "Everything" 1.5 Alpha.
raccoon
Posts: 1017
Joined: Thu Oct 18, 2018 1:24 am

Post by raccoon »

Dear Void,

I would like to conduct a forum survey of willing participants to help identify popular file extensions in the wild, at least within our small sample group. This will give us a glimpse into which file extensions are most prevalent (by rank), and which, if any, are missing from the default Filters included in Everything.

Could you add a search command, about:ext-survey, that scans the user's index for regex:(\.\S{1,12}$), tallies each unique file extension, and then converts that raw tally into a ratio of the total file count, from 0.0001 to 0.9999? Example:

.jpg 0.5611
.png 0.2792
.mp4 0.1446
.xlsx 0.0199
.mkv 0.0170
.xls 0.0068
.tar.gz 0.0026
.exe 0.0014
... you get the idea. We can then add all these numbers together between multiple submissions and arrive at a tidy list of file extensions ranked by relative commonality. Not completely scientific but still useful.
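The tally and the cross-submission sum described above could be sketched roughly as follows. This is only an illustration in Python, not how Everything implements it: the function names `tally_extensions` and `merge_submissions` are hypothetical, and folding extensions to lowercase is an added assumption. Note that the requested regex treats everything after the earliest qualifying dot as the extension, which is why `.tar.gz` shows up as one entry.

```python
import re
from collections import Counter

# Pattern from the request: a dot, then 1 to 12 non-space characters, at the end of the name.
EXT_RE = re.compile(r"(\.\S{1,12}$)")


def tally_extensions(filenames):
    """Return {extension: share of all files}, rounded to 4 decimal places."""
    names = list(filenames)
    counts = Counter()
    for name in names:
        match = EXT_RE.search(name)
        if match:
            # Assumption: case is folded so .JPG and .jpg share one entry.
            counts[match.group(1).lower()] += 1
    total = len(names) or 1  # avoid dividing by zero on an empty index
    return {ext: round(n / total, 4) for ext, n in counts.most_common()}


def merge_submissions(submissions):
    """Sum the per-extension ratios of several submissions, highest first."""
    combined = Counter()
    for ratios in submissions:
        combined.update(ratios)  # Counter.update adds values per key
    return sorted(combined.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    sample = ["a.jpg", "b.JPG", "c.png", "archive.tar.gz"]
    print(tally_extensions(sample))
```

Summing ratios rather than raw counts gives every participant equal weight, regardless of how many files they have indexed.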

about:ext-survey would need to provide a mechanism to copy the results to the clipboard, and should include the bbcode [code] block.

Bonus: Add the Windows Registry description of the file extension if it has one.

.xlsx 0.0068 Microsoft Excel Worksheet
.exe 0.0014 Application
void
Developer
Posts: 16743
Joined: Fri Oct 16, 2009 11:31 pm

Re: Request: about:ext-survey -- File extension forum survey.

Post by void »

Done.

I had some existing code around to do this already.
I used extension frequency to look into database compression.

I need some input on constraints:

Maximum number of extensions: I've set the default to 999, which seems fine.
Minimum number of times an extension must occur to be included: I've set the default to 100.
Maximum extension length: I've set the default to 6.
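
As a rough illustration of how these three limits interact, here is a minimal Python sketch. It is not the code used in Everything; the function name `apply_constraints`, its parameter names, and the sample counts are all hypothetical, and the extension length is assumed to be measured without the leading dot.

```python
def apply_constraints(counts, max_extensions=999, min_count=100, max_length=6):
    """Filter raw {extension: count} tallies using the three survey limits."""
    kept = [
        (ext, n)
        for ext, n in counts.items()
        # Drop rare extensions and anything longer than the length limit.
        if n >= min_count and len(ext) <= max_length
    ]
    # Most frequent first, then cut the list at the maximum number of entries.
    kept.sort(key=lambda item: item[1], reverse=True)
    return kept[:max_extensions]


if __name__ == "__main__":
    # Made-up counts for illustration only.
    sample = {"dll": 5000, "private-data": 400, "arj": 26, "config": 150}
    print(apply_constraints(sample))
```

With the defaults, the made-up `private-data` entry is dropped for its length and `arj` for occurring fewer than 100 times.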

I have privacy concerns. For example, a file named file.private-data would leak private-data.
Restricting the maximum extension length to 6 would mitigate the issue.
I would like the survey to be anonymous.
Users could submit their results through https://www.voidtools.com/contact/ with the email field set to ext-survey.
I could then post the results here.

The total file count can be inferred from the last extension in the list, since its count will be close to the minimum of 100.
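
The arithmetic behind that inference is simply total ≈ count / ratio. A small Python sketch, with a hypothetical helper name `estimate_total_files` and an illustrative ratio of 0.000092 for the last listed extension:

```python
def estimate_total_files(last_ratio, min_count=100):
    """Estimate the total file count from the lowest-ranked ratio.

    Assumes the last extension occurs roughly min_count times, so
    ratio = count / total gives total = count / ratio.
    """
    return round(min_count / last_ratio)


if __name__ == "__main__":
    # Illustrative value: a last-ranked ratio of 0.000092.
    print(estimate_total_files(0.000092))
```

The result is only an estimate: the last extension may occur somewhat more than 100 times, and the ratio is rounded to six decimal places.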



Example output of a fresh Windows 10 install:

001 0.143637 dll
002 0.115428 cat
003 0.105505 mum
004 0.095041 mui
005 0.056205 png
006 0.032803 rtf
007 0.019707 exe
008 0.017229 
009 0.013628 xrm-ms
010 0.013576 xml
011 0.013371 adml
012 0.010506 pri
013 0.008053 inf
014 0.007691 sys
015 0.007525 cdf-ms
016 0.007454 js
017 0.006701 htm
018 0.006343 mfl
019 0.006198 html
020 0.004543 mid
021 0.004189 cdxml
022 0.004177 mof
023 0.003963 etl
024 0.003869 winmd
025 0.003836 dat
026 0.002989 svg
027 0.002823 txt
028 0.002663 man
029 0.002574 ttf
030 0.002465 css
031 0.002407 psd1
032 0.002371 ps1xml
033 0.002367 WMF
034 0.002238 ps1
035 0.002102 bin
036 0.001988 ini
037 0.001832 admx
038 0.001748 mun
039 0.001722 lnk
040 0.001705 cov
041 0.001683 p7x
042 0.001367 jpg
043 0.001320 xsd
044 0.001250 resx
045 0.001241 xbf
046 0.001239 gif
047 0.001112 NLS
048 0.001111 wav
049 0.001087 dl_
050 0.001065 fon
051 0.001061 cur
052 0.001043 LOG1
053 0.001043 LOG2
054 0.001037 loggz
055 0.001012 evtx
056 0.000925 log
057 0.000912 psm1
058 0.000904 ico
059 0.000876 cab
060 0.000860 msc
061 0.000836 lock
062 0.000823 config
063 0.000801 VSSX
064 0.000788 da_
065 0.000776 PNF
066 0.000762 pp_
067 0.000760 odlgz
068 0.000731 json
069 0.000704 aux
070 0.000692 pf
071 0.000621 chm
072 0.000575 tlb
073 0.000569 xsl
074 0.000560 sql
075 0.000503 gpd
076 0.000495 table
077 0.000465 cpl
078 0.000458 VSTX
079 0.000457 ppkg
080 0.000447 ch_
081 0.000426 tif
082 0.000425 pak
083 0.000366 aspx
084 0.000301 h
085 0.000298 DATA
086 0.000290 db
087 0.000283 ax
088 0.000279 qml
089 0.000252 vbs
090 0.000247 bmp
091 0.000238 ocx
092 0.000213 efi
093 0.000199 lex
094 0.000196 POC
095 0.000195 pma
096 0.000194 nlp
097 0.000184 xaml
098 0.000181 in_
099 0.000172 com
100 0.000170 wer
101 0.000168 dds
102 0.000166 Crwl
103 0.000161 msi
104 0.000144 gdl
105 0.000144 rs
106 0.000138 rll
107 0.000138 tmp
108 0.000136 hyb
109 0.000133 jrs
110 0.000133 ttc
111 0.000122 ascx
112 0.000122 sdb
113 0.000120 DPV
114 0.000110 drv
115 0.000110 wpl
116 0.000109 vmsg
117 0.000105 cso
118 0.000104 CFG
119 0.000103 mo
120 0.000099 bat
121 0.000093 HLP
122 0.000092 CMD


Some of this data is already available in Everything:
Right-click the result list column header and click Add columns...
Select Extension Frequency and click OK.
Click the Extension Frequency column header to gather and sort by extension frequency.
raccoon
Posts: 1017
Joined: Thu Oct 18, 2018 1:24 am

Re: Request: about:ext-survey -- File extension forum survey.

Post by raccoon »

I compiled my own results last night using Notepad++ and some scripts. I'll send you my full results with extensions up to 12 characters long. Mind that I have 30 years of old files slopped from every old hard drive I've owned and my parents owned onto new hard drives. Sent to https://www.voidtools.com/contact/

Looking at my data, I'm having second thoughts about relying on the total number of instances alone, and I think relative total byte size may be a necessary metric. Also, leave out the rank numbers (001 002 003) in case someone wants to remove a line anonymously.

;rank size ext description
0.143637 0.000628 dll Application Extension
A huge swath of file extensions are just nonsensical software package support files, such as language translations, manifests, document templates, brushes, and all sorts of browser and cross-platform software (Linux on Windows) library stuff. Millions of tiny files that consume almost no disk space.
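
To make the difference between the two metrics concrete, here is a small Python sketch that computes each extension's share of the file count alongside its share of the total bytes. The function name `count_and_size_shares` and the sample sizes are invented for illustration.

```python
from collections import defaultdict


def count_and_size_shares(files):
    """Map each extension to (share of file count, share of total bytes).

    files is an iterable of (extension, size_in_bytes) pairs.
    """
    counts = defaultdict(int)
    sizes = defaultdict(int)
    for ext, size in files:
        counts[ext] += 1
        sizes[ext] += size
    total_count = sum(counts.values())
    total_size = sum(sizes.values())
    return {
        ext: (
            counts[ext] / total_count,
            sizes[ext] / total_size if total_size else 0.0,
        )
        for ext in counts
    }


if __name__ == "__main__":
    # Made-up sizes: many small support files, one large video.
    sample = [("dll", 100), ("dll", 100), ("dll", 200), ("mkv", 3600)]
    print(count_and_size_shares(sample))
```

In the made-up sample, `dll` accounts for three quarters of the files but only a tenth of the bytes, which is the kind of skew described above.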

Maybe exclude any file extensions that do not have a registry entry. If the user isn't using a file viewer, they probably don't want a filter for those files? Knowing the registry entry will help you group files by filter type, e.g. LibreOffice files.
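
A sketch of that filtering and grouping step, in Python. The registry lookup itself is left out: `descriptions` is a plain dictionary standing in for whatever the registry would return, and the function name `keep_registered` and the sample descriptions are assumptions for illustration.

```python
def keep_registered(ratios, descriptions):
    """Drop extensions with no known description; group the rest by description.

    ratios: {extension: ratio}
    descriptions: {lowercase extension: description}, a stand-in for registry data
    """
    groups = {}
    for ext, ratio in ratios.items():
        description = descriptions.get(ext.lower())
        if description is None:
            continue  # no registry entry, so no filter candidate
        groups.setdefault(description, []).append((ext, ratio))
    return groups


if __name__ == "__main__":
    # Made-up ratios and descriptions.
    ratios = {"odt": 0.02, "ods": 0.01, "mum": 0.10, "EXE": 0.05}
    descriptions = {
        "odt": "LibreOffice Files",
        "ods": "LibreOffice Files",
        "exe": "Application",
    }
    print(keep_registered(ratios, descriptions))
```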

If the data is sent directly to you, I don't know that limits are necessary; at the least they could be loosened. Survey all the extensions, it's voluntary. That may be the only way to spot really unnecessary Filter defaults that don't benefit anyone.

Exclude entries with the @ symbol in case of really short email addresses.

I had 10,428 results. 5,005 with 2+ occurrences. 2,842 with 3+. 2,529 with 4+. 2,186 with 5+. 1,000 had 29+ occurrences, but I only have 26 *.arj and 20 *.ace (Compressed) files that would have been omitted from your proposed constraints.

Can this become an Extension Sidebar? :)
NotNull
Posts: 5461
Joined: Wed May 24, 2017 9:22 pm

Re: Request: about:ext-survey -- File extension forum survey.

Post by NotNull »

Or just get the list of filetypes from here.
void
Developer
Posts: 16743
Joined: Fri Oct 16, 2009 11:31 pm

Re: Request: about:ext-survey -- File extension forum survey.

Post by void »

Thanks for the link NotNull,

Looks like Everything is using most of the common file types from https://fileinfo.com/filetypes/common already.


"Maybe exclude any file extensions that do not have a registry entry"
This is how most of the default filters were generated for Everything 1.5.
I used HKEY_CLASSES_ROOT to build a list of well known extensions from a fresh Windows 10 install.

Everything 1.5.0.1291a adds an about:ext-survey search command.
void
Developer
Posts: 16743
Joined: Fri Oct 16, 2009 11:31 pm

Re: Request: about:ext-survey -- File extension forum survey.

Post by void »

001 0.289801 
002 0.073812 txt
003 0.062824 dll
004 0.023128 mui
005 0.022354 mp3
006 0.021267 cat
007 0.020782 png
008 0.019572 mp4
009 0.019449 mkv
010 0.019423 cbz
011 0.016236 jpg
012 0.015385 html
013 0.015316 js
014 0.015242 doc
015 0.015022 exe
016 0.010706 mum
017 0.008322 aiff
018 0.007955 pdf
019 0.007679 INF
020 0.007530 zip
021 0.007477 GPD
022 0.007308 gif
023 0.005601 man
024 0.005568 CR2
025 0.005549 epub
026 0.005548 264
027 0.005548 x264
028 0.005506 XML
029 0.005430 ini
030 0.004802 sys
031 0.004101 mof
032 0.004092 dtd
033 0.003868 bat
034 0.003226 PNF
035 0.003213 cdf-ms
036 0.003027 csv
037 0.002973 1
038 0.002888 jsm
039 0.002776 change
040 0.002774 dng
041 0.002774 xyz
042 0.002487 htm
043 0.002413 css
044 0.002199 PPD
045 0.002134 xrm-ms
046 0.001950 bmp
047 0.001900 wav
048 0.001863 svg
049 0.001828 slg
050 0.001747 ttf
051 0.001689 dat
052 0.001666 sqlite
053 0.001612 chm
054 0.001567 vim
055 0.001522 xpi
056 0.001500 bbx
057 0.001448 json
058 0.001408 lnk
059 0.001376 NLS
060 0.001372 ico
061 0.001274 ARJ
062 0.001271 bin
063 0.001259 CUR
064 0.001189 rtf
065 0.001175 ps1
066 0.001108 fx
067 0.001106 mo
068 0.001090 lng
069 0.001080 fon
070 0.001080 xul
071 0.001066 brc
072 0.000881 00
073 0.000811 rdf
074 0.000764 utl
075 0.000761 config
076 0.000757 CD
077 0.000731 resx
078 0.000726 pset
079 0.000722 JE
080 0.000715 tlb
081 0.000688 aux
082 0.000685 mfl
083 0.000678 ptxml
084 0.000655 h
085 0.000640 src
086 0.000635 h1s
087 0.000629 CFG
088 0.000567 gbf
089 0.000558 OLD
090 0.000548 log
091 0.000533 isl
092 0.000495 adml
093 0.000495 admx
094 0.000487 AP
095 0.000487 nlp
096 0.000484 ax
097 0.000475 EXP
098 0.000474 tab
099 0.000461 db
100 0.000460 qm
101 0.000455 bcmap
102 0.000441 SQL
103 0.000434 msc
104 0.000419 ICC
105 0.000409 wmv
106 0.000391 aspx
107 0.000391 xls
108 0.000386 plist
109 0.000377 7z
110 0.000375 CAB
111 0.000352 DXT
112 0.000341 pyc
113 0.000330 lst
114 0.000329 XSD
115 0.000326 hlsl
116 0.000325 dl_
117 0.000311 PR
118 0.000298 icm
119 0.000288 xhtml
120 0.000283 cpl
121 0.000281 WMF
122 0.000274 evtx
123 0.000273 DEP
124 0.000258 CASH
125 0.000256 TBL
126 0.000251 GDL
127 0.000245 wsz
128 0.000239 ocx
129 0.000235 COM
130 0.000235 conf
131 0.000231 hlp
132 0.000227 psd1
133 0.000226 spl
134 0.000223 01
135 0.000223 reg
136 0.000222 rar
137 0.000219 MST
138 0.000218 tmp
139 0.000216 rs
140 0.000216 wbb
141 0.000211 ps1xml
142 0.000211 url
143 0.000205 LIB
144 0.000203 qmlc
145 0.000196 nef
146 0.000195 CR
147 0.000195 final
148 0.000195 msi
149 0.000193 py
150 0.000192 xsl
151 0.000191 SALES
152 0.000186 IMD
153 0.000184 dic
154 0.000181 ime
155 0.000181 lang
156 0.000180 vlpset
157 0.000178 hfp
158 0.000178 vbs
159 0.000176 CKA
160 0.000174 pf
161 0.000173 bcm
162 0.000169 INC
163 0.000163 glsl
164 0.000161 court
165 0.000158 2010
166 0.000158 X
167 0.000157 lrc
168 0.000155 pak
169 0.000153 jar
170 0.000149 emf
171 0.000149 iss
172 0.000147 am
173 0.000146 90
174 0.000146 me
175 0.000140 chk
176 0.000139 pub
177 0.000136 ASH
obz
Posts: 6
Joined: Tue Jan 04, 2022 2:27 pm

Re: Request: about:ext-survey -- File extension forum survey.

Post by obz »

my list, re-ordered and commented:

Image formats and sidecars:
011 0.019124 CR2 (and CR3) Canon RAW image files
048 0.002135 ARW Sony RAW image files
029 0.007277 xmp image sidecars
016 0.016346 dop DxO Photolab sidecars

Electronic circuit design:
020 0.012323 asc LTSpice schematics (circuit simulation)
019 0.012340 asy LTSpice symbols
039 0.004015 lbr Eagle libraries
054 0.001832 sch Electronic Schematics
059 0.001558 ulp Eagle programs
060 0.001504 brd Eagle PCB/Board files
085 0.000840 scr Eagle scripts
202 0.000226 epf Eagle control file
211 0.000210 gbr PCB data (copper tracks, drill information)

Programming related:
080 0.000915 ld
130 0.000439 LST
147 0.000360 OBJ
218 0.000202 bas
248 0.000164 rc
253 0.000157 inc
254 0.000157 ls
257 0.000151 HEX

088 0.000788 DWG CAD drawing
144 0.000368 dxf CAD drawing
152 0.000348 stp 3D CAD file
207 0.000214 FCStd Freecad

197 0.000245 kmz Google geodata
249 0.000163 kml Google geodata
262 0.000142 gpx GPS tracks
228 0.000189 cup Waypoint and task file for gliding computers
070 0.001145 IGC GPS tracks from gliding computers

151 0.000352 ps Postscript
261 0.000147 eps Postscript

131 0.000428 db
132 0.000407 bin
164 0.000315 apk Android package
NotNull
Posts: 5461
Joined: Wed May 24, 2017 9:22 pm

Re: Request: about:ext-survey -- File extension forum survey.

Post by NotNull »

obz wrote: Fri Jan 07, 2022 5:11 pm

Electronic circuit design:
211 0.000210 gbr PCB data (copper tracks, drill information)
FWIW: .gb is also a typical Gerber file extension.
Laus
Posts: 17
Joined: Sun Jul 04, 2021 11:44 am

Re: Request: about:ext-survey -- File extension forum survey.

Post by Laus »

Here's another list:

001 0.642841 lnk
002 0.081694 xmp
003 0.081471 NEF
004 0.076360 jpg
005 0.011436 dll
006 0.009308 html
007 0.008648 png
008 0.008521 icm
009 0.008252 
010 0.005384 js
011 0.004547 mp3
012 0.003167 pyc
013 0.002816 mui
014 0.002611 py
015 0.002489 svg
016 0.002016 exe
017 0.001824 tif
018 0.001390 xml
019 0.001340 cat
020 0.001335 json
021 0.001073 bin
022 0.000981 opf
023 0.000909 epub
024 0.000909 pdf
025 0.000862 txt
026 0.000851 gz
027 0.000832 rs
028 0.000703 cdf-ms
029 0.000703 xrm-ms
030 0.000699 inf
031 0.000664 dcp
032 0.000656 qm
033 0.000627 sys
034 0.000624 mum
035 0.000540 qml
036 0.000535 lcp
037 0.000534 md
038 0.000501 pyi
039 0.000499 code
040 0.000476 css
041 0.000418 gif
042 0.000395 vim
043 0.000342 ini
044 0.000333 dat
045 0.000322 ui
046 0.000321 ico
047 0.000307 odt
048 0.000297 ttf
049 0.000295 pri
050 0.000291 mof
051 0.000289 pm
052 0.000279 db
053 0.000278 etl
054 0.000265 ctx
055 0.000260 log
056 0.000254 mo
057 0.000250 cdxml
058 0.000240 acda
059 0.000231 ps1
060 0.000221 resx
061 0.000217 sip
062 0.000217 qmlc
063 0.000211 BMP
064 0.000204 pak
065 0.000201 ps1xml
066 0.000195 h
067 0.000190 winmd
068 0.000171 FRM
069 0.000165 psd1
070 0.000164 LSP
071 0.000161 fp
072 0.000159 pl
073 0.000159 dbf
074 0.000158 cdx
075 0.000150 mfl
076 0.000150 ts
077 0.000148 man
078 0.000145 pyd
079 0.000144 csv
080 0.000140 p7x
081 0.000136 admx
082 0.000134 adml
083 0.000134 mun
084 0.000130 config
085 0.000129 done
086 0.000125 php
087 0.000122 xsd
088 0.000115 tcl
089 0.000112 out
090 0.000104 fon
091 0.000102 chm
092 0.000102 doc
093 0.000100 vbp
094 0.000100 vbw
095 0.000099 cur
096 0.000098 preset
097 0.000097 001
098 0.000097 000
099 0.000096 xsl
100 0.000094 cache
101 0.000094 rtf
102 0.000093 vb
103 0.000092 PNF
104 0.000088 icc
105 0.000086 ctl
106 0.000083 ods
107 0.000083 wav
108 0.000081 sql
109 0.000080 frx
110 0.000080 jar
111 0.000080 msg
112 0.000079 bas
113 0.000078 jsx
114 0.000078 tlb
115 0.000077 cube
116 0.000076 psm1
117 0.000076 xbf
118 0.000075 zip
119 0.000071 aspx
120 0.000069 NLS
121 0.000068 pdb
122 0.000067 jsm
123 0.000067 pf
124 0.000066 dwg
125 0.000065 LOG1
126 0.000065 LOG2
127 0.000064 db_
128 0.000064 eml
129 0.000064 lock
130 0.000064 tree
131 0.000064 cfs
132 0.000064 gen
133 0.000064 ht_
134 0.000064 key_
135 0.000063 woff2
136 0.000063 htm
137 0.000063 psd
138 0.000062 fpt
139 0.000062 pkl
140 0.000059 aux
141 0.000058 psf
142 0.000056 toml
143 0.000055 toc
144 0.000054 zdct
145 0.000053 emf
146 0.000053 yml
147 0.000052 m3u
148 0.000051 cls
149 0.000051 sqlite
150 0.000049 len
151 0.000048 xls
152 0.000047 apl
153 0.000046 3PP
154 0.000046 tlog
155 0.000046 cs
156 0.000045 woff
157 0.000045 lm
158 0.000045 enc
159 0.000044 old
160 0.000044 msc
161 0.000043 evtx
162 0.000043 data
163 0.000042 dic
164 0.000042 nlp
165 0.000042 sample
166 0.000040 1
167 0.000040 3DL
168 0.000039 cso
169 0.000037 dwt
170 0.000037 arx
171 0.000036 crate
172 0.000036 msi
173 0.000036 orig
174 0.000035 table
175 0.000035 vdi
176 0.000034 deploy
177 0.000034 RAF
178 0.000034 cot
179 0.000033 mat
180 0.000032 xba
181 0.000032 cof
182 0.000032 cop
183 0.000032 cpl
184 0.000031 2
185 0.000031 cab
186 0.000031 lng
187 0.000030 ADO
188 0.000030 cos
189 0.000030 cpp
190 0.000030 shx
191 0.000029 ppkg
192 0.000029 glsl
193 0.000029 pma
194 0.000028 msf
195 0.000028 t
196 0.000027 cmd
197 0.000027 lib
198 0.000027 m4a
199 0.000027 DCL
As you can see, I mainly index raw image files (.NEF) and their sidecar files (.xmp), which are stored on network shares. However, the vast majority of the files are tiny shortcut files (.lnk) that I use to map tags stored in the .xmp files into a huge folder tree. Not very elegant, but it works. [sigh] I plan to map the tags into proxy .jpg files instead, which are supported by ET.

Laus
meteorquake
Posts: 500
Joined: Thu Dec 15, 2016 9:44 pm

Re: Request: about:ext-survey -- File extension forum survey.

Post by meteorquake »

An alternative approach is to compile statistics on which extensions are acted upon in some way by the user (double or right-clicked/dragged/renamed etc) since this will usually target what they are actually seeking.
It could even use this to offer adjusted filters to each individual without uploading any results, though a manual upload could send it in.

d
therube
Posts: 4977
Joined: Thu Sep 03, 2009 6:48 pm

Re: Request: about:ext-survey -- File extension forum survey.

Post by therube »

(IMO... a default is a default. And a user can add or remove as their needs dictate & change.
"Acted upon" isn't really relevant, necessarily, as files may just "be" & are never "acted on", or if so, are acted on completely outside of Everything. [Yes, I know, that is unheard of, but ;-).])

(BTW, my list is what void posted above. I can tell, at this later time, by reading down it ;-).
And the listed frequencies are not indicative of what I do, or don't do, with said extensions.)
NotNull
Posts: 5461
Joined: Wed May 24, 2017 9:22 pm

Re: Request: about:ext-survey -- File extension forum survey.

Post by NotNull »

How long should this survey be part of Everything?
raccoon
Posts: 1017
Joined: Thu Oct 18, 2018 1:24 am

Re: Request: about:ext-survey -- File extension forum survey.

Post by raccoon »

I learned what I needed to. I still think it would be useful information to @void if he collected this data, but also collected the file type program associations to get a better idea of what software opens/plays/views what filetypes, to round off the default filters of extensions used by leading software.
void
Developer
Posts: 16743
Joined: Fri Oct 16, 2009 11:31 pm

Re: Request: about:ext-survey -- File extension forum survey.

Post by void »

I'll remove the survey in the next alpha update.

Thank you to those that participated.
void
Developer
Posts: 16743
Joined: Fri Oct 16, 2009 11:31 pm

Re: Request: about:ext-survey -- File extension forum survey.

Post by void »

To quickly find extension frequencies in Everything, please try the following search:

add-columns:Extension;"Extension Frequency" distinct:"Extension" sort:"Extension Frequency"


Extension Frequency