Problem indexing content

Discussion related to "Everything" 1.5 Alpha.
klim
Posts: 13
Joined: Sun Feb 17, 2013 8:09 pm

Problem indexing content

Post by klim »

Hi, this is the first time I have tried to search indexed file content, and I have a problem with that feature.
I don't understand how Everything indexes content.

e.g.
Settings:
Include only folder: X:\RFQ;
Include only files: *.txt

But Everything is indexing files from other (all) locations, as you can see in the attachment. Why?
Attachments
indexing.png (19.38 KiB)
void
Developer
Posts: 16754
Joined: Fri Oct 16, 2009 11:31 pm

Re: Problem indexing content

Post by void »

This tooltip also shows properties being indexed.

Are you indexing any properties under Tools -> Options -> Properties?
PeterPan8
Posts: 23
Joined: Mon May 01, 2023 7:00 pm

Re: Problem indexing content

Post by PeterPan8 »

I have a similar problem with content searches.
Every time I open SE and type something like content:"text to search", the "Indexing properties" process starts right away and takes forever to complete: 2-3 hours or more!
I admit that I have a few GB of data, but I limit my search to a specific folder. For instance, my Tools -> Options -> Content settings are:
Under Content:
Index file content. Checked.
Include only folders: D:\MyHardDrive\Folder2Index
Exclude folders: Empty
Include only files: *.doc;*.docx;*.pdf
Exclude files: Many extensions here.
Exclude not content indexed. Unchecked.
Exclude recall on data access: Checked.
Max size: 20 MB.

I thought the indexed properties would be stored "somewhere", so that SE doesn't re-index every time it starts. But that is not the case, so my computer slows down each time I start SE.

Any solutions?
void
Developer
Posts: 16754
Joined: Fri Oct 16, 2009 11:31 pm

Re: Problem indexing content

Post by void »

Check Tools -> Debug -> Statistics -> Build -> Last rebuild reason.
Check Tools -> Debug -> Statistics -> Build -> Last build date.
Check Tools -> Debug -> Statistics -> Save -> Last save date.
What is shown for these values?
PeterPan8
Posts: 23
Joined: Mon May 01, 2023 7:00 pm

Re: Problem indexing content

Post by PeterPan8 »

Under Build:
Last build date: 17-Sep-24 08:58
Last rebuild reason: invalid parent folder index 1710478 >= 726678 + 8

Under Save:
Last save date: 18-Sep-24 04:22

I stopped the indexing process yesterday and put my computer in Sleep mode before waking it early this morning, so those values may be wrong. About the "Last rebuild reason": what does "invalid parent folder" mean, and what are those numbers?

How can I do a full rebuild and keep that database, so SE doesn't have to redo it each time it starts?
void
Developer
Posts: 16754
Joined: Fri Oct 16, 2009 11:31 pm

Re: Problem indexing content

Post by void »

Last rebuild reason: invalid parent folder index 1710478 >= 726678 + 8
This is database corruption.



Could you please send your Help -> Troubleshooting information to support@voidtools.com
Could you please send your Tools -> Debug -> Statistics to support@voidtools.com

I would like to do some tests on my end.



Everything saves your database to disk on exit. (File -> Exit or shutting down Windows)
Everything saves your database to disk daily. (at 4am, or the next time a UI window is closed)
Something is going wrong with updating your database in memory.
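The daily-save rule matters here because a PC that only sleeps never triggers the save-on-exit path. The following is a minimal Python sketch, not Everything's actual code, assuming a daily save becomes pending once a 4am boundary has passed since the last save; the function names are hypothetical, and the "next time a UI window is closed" trigger itself is not modelled.

```python
from datetime import datetime, time, timedelta

# Daily save hour taken from the post above (4am). Everything also saves on
# exit; that path is not modelled here.
SAVE_HOUR = 4


def last_4am_boundary(now: datetime) -> datetime:
    """Return the most recent 4am at or before `now`."""
    boundary = datetime.combine(now.date(), time(SAVE_HOUR))
    if now < boundary:
        # Before 4am today, so the latest boundary was yesterday at 4am.
        boundary -= timedelta(days=1)
    return boundary


def daily_save_due(last_save: datetime, now: datetime) -> bool:
    """Hypothetical check: a daily save is pending once a 4am boundary has
    passed since the last save (e.g. because the PC slept through 4am)."""
    return last_save < last_4am_boundary(now)


if __name__ == "__main__":
    last_save = datetime(2024, 9, 18, 4, 22)  # Last save date quoted above
    print(daily_save_due(last_save, datetime(2024, 9, 18, 10, 0)))  # False
    print(daily_save_due(last_save, datetime(2024, 9, 19, 7, 0)))   # True
```

Under this model, a machine that slept through 4am on the 19th would have a save pending when it wakes, which the post above says happens the next time a UI window is closed.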

Please try exiting Everything (File -> Exit) and restarting Everything.
Does Everything load immediately?
PeterPan8
Posts: 23
Joined: Mon May 01, 2023 7:00 pm

Re: Problem indexing content

Post by PeterPan8 »

My normal daily routine is to use the Windows Sleep function whenever I stop for the day, which means that some programs are still running, but in sleep mode. About once a week, I shut Windows down completely for a clean restart.

As a translator, I usually keep 2 or 3 Word files open, along with some specialized dictionaries, xChange PDF viewer, Firefox, the Thunderbird email client, and Search Everything, so whenever I start up in the morning they are all readily available. Otherwise, my (rather old) computer takes a very long time to start up: 15-20 minutes, including loading the apps above.

My system runs Windows 10 (latest updates) with 16 GB RAM and no SSD (unfortunately!).

When Everything is loaded, it usually sits in the taskbar, hidden on the right side. I noticed that if I close it, it still resides in memory or as a service, from what I can see in Task Manager. So even after closing Everything, I need to use "End task" in Task Manager to really shut it down, and only then can I open it again.

I'm sending you the Troubleshooting and Statistics files separately.

Thank you for taking the time to look into this. It's appreciated!
void
Developer
Posts: 16754
Joined: Fri Oct 16, 2009 11:31 pm

Re: Problem indexing content

Post by void »

Thank you for the troubleshooting information and statistics.

I found the issue by running Everything with your config from the troubleshooting information.

There is an issue with saving/loading the include-only-files setting when you have over 128 filters.



The issue will be fixed in the next alpha update.
I will have an update soon.



For now, please reduce the number of Tools -> Options -> Exclude -> include-only-file filters to 127 or fewer.
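Before pasting a long filter list back into Tools -> Options -> Exclude -> Include only files, it can help to count its entries. This is a minimal Python sketch, assuming the setting is a semicolon-separated list like the *.doc;*.docx;*.pdf value shown in this thread; the 127 limit is the workaround above, and the helper names are hypothetical.

```python
# Workaround limit from the post above: keep 127 or fewer include-only-file filters.
MAX_SAFE_FILTERS = 127


def parse_filters(setting: str) -> list[str]:
    """Split a semicolon-separated filter list such as '*.doc;*.docx;*.pdf',
    ignoring surrounding spaces and empty entries."""
    return [f.strip() for f in setting.split(";") if f.strip()]


def check_filter_count(setting: str) -> tuple[int, bool]:
    """Return (number of filters, whether the count is within the limit)."""
    count = len(parse_filters(setting))
    return count, count <= MAX_SAFE_FILTERS


if __name__ == "__main__":
    print(check_filter_count("*.doc;*.docx;*.pdf"))  # (3, True)
    too_many = ";".join(f"*.ext{i}" for i in range(190))
    print(check_filter_count(too_many))  # (190, False)
```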
PeterPan8
Posts: 23
Joined: Mon May 01, 2023 7:00 pm

Re: Problem indexing content

Post by PeterPan8 »

Thank you!
For now, I removed all 128+ filters and restarted Everything. It has been indexing non-stop since early this morning (more than 5-6 hours) and is at 51%, as shown in the "Indexing properties" box.

In "Content", I have checked the box "Index file content" because I need to search for text inside many documents. I'm limiting my search to only one very large folder (many GB), and within it I include only docx, pdf and eml files.

Is it normal for indexing to take so long (it's now at 2 GB)?
Or could it be that it's still scanning the former database sitting somewhere?
What happens if I shut down my computer: will indexing restart from the beginning, or continue where it left off?

I'm hoping to start from scratch with the new update.
void
Developer
Posts: 16754
Joined: Fri Oct 16, 2009 11:31 pm

Re: Problem indexing content

Post by void »

Indexing content will take a very long time.

I recommend keeping the index under 1GB.

Please try reducing the number of files content-indexed.



You can exit Everything.
Everything will save the content indexing progress.
Content indexing will resume the next time you start Everything.
PeterPan8
Posts: 23
Joined: Mon May 01, 2023 7:00 pm

Re: Problem indexing content

Post by PeterPan8 »

Everything will save the content indexing progress.
Content indexing will resume the next time you start Everything.
I shut down my computer yesterday while "Indexing properties" was at about 74%. As you said, I was hoping it would resume the indexing process this morning, but it did not. As soon as I opened Everything and searched for some content, it restarted the whole process from scratch, re-indexing the properties. I had to pause it, and it will stay paused until someone finds a solution.

Thanks for continuing to look into this.
void
Developer
Posts: 16754
Joined: Fri Oct 16, 2009 11:31 pm

Re: Problem indexing content

Post by void »

Please check your last rebuild info. What is shown for these values?

Check Tools -> Debug -> Statistics -> Build -> Last rebuild reason.
Check Tools -> Debug -> Statistics -> Build -> Last build date.
Check Tools -> Debug -> Statistics -> Save -> Last save date.
PeterPan8
Posts: 23
Joined: Mon May 01, 2023 7:00 pm

Re: Problem indexing content

Post by PeterPan8 »

Last rebuild reason: invalid parent folder index 1710478 >= 726678 + 8
Last build date: 23-Sep-24 11:18
Last save date: None.

In case it matters: indexing is turned off in Windows on my system.
Should it be on to use Everything?
void
Developer
Posts: 16754
Joined: Fri Oct 16, 2009 11:31 pm

Re: Problem indexing content

Post by void »

Everything doesn't use the Microsoft system index for indexing content.



You can search your system index in Everything with the si: search function:

For example:

si:"My content search"



Are your include only file filters sticking?
-Please check Tools -> Options -> Exclude -> Include only files.
PeterPan8
Posts: 23
Joined: Mon May 01, 2023 7:00 pm

Re: Problem indexing content

Post by PeterPan8 »

I don't quite understand the search function si:"xxx"

Tools -> Options -> Exclude -> Include only files.
I still had the bunch of file extensions (~190), which I thought I had removed!

I'll give it another try without including any extensions, and see how it works.
void
Developer
Posts: 16754
Joined: Fri Oct 16, 2009 11:31 pm

Re: Problem indexing content

Post by void »

si: will search your Microsoft Windows system index.
There is no need to index content in Everything if you use Windows indexing.

Searching the Windows index
si:


Tools -> Options -> Exclude -> Include only files.
I still had the bunch of file extensions (~190), which I thought I had removed!
After removing the filters, please try restarting Everything to see if the setting is saved:
File -> Exit
Restart Everything
Check your Tools -> Options -> Exclude -> "Include only files" setting.
PeterPan8
Posts: 23
Joined: Mon May 01, 2023 7:00 pm

Re: Problem indexing content

Post by PeterPan8 »

You said to keep the indexed data under 1 GB. I am indexing the content of a full hard drive (600 GB), but I use the following setup:

Options - Indexes - Exclude - "Include only files" where I have only those extensions: *.doc;*.docx;*.pdf
and I am also using the same rule under Content - Include only files.

That hard drive may hold about 1 or 2 GB of files of those types.

Would Everything limit the indexing to those file extensions only, or would it still index the whole hard drive?
If it still indexes the whole drive, is there a way to limit the scope of "Indexing properties"?
void
Developer
Posts: 16754
Joined: Fri Oct 16, 2009 11:31 pm

Re: Problem indexing content

Post by void »

Please try to keep the indexed size under 1GB (Tools -> Debug -> Statistics -> File data size)

You can index the full drive (600 GB) as long as this isn't 600 GB of plain text files ;)


Options - Indexes - Exclude - "Include only files" where I have only those extensions: *.doc;*.docx;*.pdf
and I am also using the same rule under Content - Include only files.
This is fine; it will help reduce the indexed size.
Everything indexes only plain text. The plain text inside doc, docx and pdf files should be small compared to the file size, as images and formatting are ignored.



Would Everything limit the indexing to those file extensions only, or would it still index the whole hard drive?
Only the files you specify under Tools -> Options -> Content -> "Include only files" are scanned.
Only indexed files will be scanned (Tools -> Options -> Exclude -> Include only files).
In this case, only doc, docx and pdf files will be scanned.
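The two filters above act together: a file's content is scanned only if it passes both. This is a simplified Python model of that rule, assuming case-insensitive wildcard matching on the file name and, when Content -> Include only folders is set (as in the D:\MyHardDrive\Folder2Index example earlier in the thread), that the file must lie under one of those folders. It is not Everything's implementation; the function names are hypothetical, and it needs Python 3.9+ for `is_relative_to`.

```python
from fnmatch import fnmatchcase
from pathlib import PureWindowsPath


def parse_filters(setting: str) -> list[str]:
    """Split a semicolon-separated filter list and lowercase each wildcard."""
    return [f.strip().lower() for f in setting.split(";") if f.strip()]


def matches(name: str, setting: str) -> bool:
    """True if `name` matches any wildcard; an empty setting matches everything."""
    patterns = parse_filters(setting)
    return not patterns or any(fnmatchcase(name.lower(), p) for p in patterns)


def is_content_scanned(path: str, index_include: str, content_include: str,
                       content_folders: list[str]) -> bool:
    """Hypothetical model: a file's content is scanned only if it passes
    Exclude -> Include only files AND Content -> Include only files, and
    (when set) lies under one of the Content -> Include only folders."""
    p = PureWindowsPath(path.lower())  # lowercase: Windows paths ignore case
    if not matches(p.name, index_include):
        return False  # not in the Everything index at all
    if not matches(p.name, content_include):
        return False  # indexed, but excluded from content scanning
    if content_folders:
        return any(p.is_relative_to(PureWindowsPath(f.lower()))
                   for f in content_folders)
    return True


if __name__ == "__main__":
    filters = "*.doc;*.docx;*.pdf"
    folders = [r"D:\MyHardDrive\Folder2Index"]
    print(is_content_scanned(r"D:\MyHardDrive\Folder2Index\a.docx", filters, filters, folders))  # True
    print(is_content_scanned(r"D:\MyHardDrive\Folder2Index\a.txt", filters, filters, folders))   # False
    print(is_content_scanned(r"D:\Other\b.pdf", filters, filters, folders))                      # False
```

Note that `*.doc` alone does not match `.docx` files, which is why both extensions appear in the filter list.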
PeterPan8
Posts: 23
Joined: Mon May 01, 2023 7:00 pm

Re: Problem indexing content

Post by PeterPan8 »

I'm still trying to understand "deeper" search in Everything. For instance, I just found out how to do multiple and specific searches.

For those interested, that's now what I do to search for any file with extension .doc or .docx, including the word(s) I'm looking for:
*.doc* content:word1 content:word2 content:word3

To find Word documents containing word1 together with a specific string of text, I use a similar search:
*.doc* content:word1 content:"string of characters"

I'm sure there are plenty of other tricks, but I haven't had (or haven't taken) the time to find them!
The more I learn about Search Everything, the more powerful it gets!

I have two questions:
1. Where can I find more info on the types of searches I can do with Everything?
2. You already mentioned that we should limit our search database to <1 Gb. But is there a way to speed up the Indexing process when I have folders containing thousands of Word documents, with a size of more than 50 Gb of data? Reason: I am a writer and translator, and I need all of those for my research. Those files could contain whole books in two languages, research papers, and more.
horst.epp
Posts: 1447
Joined: Fri Apr 04, 2014 3:24 pm

Re: Problem indexing content

Post by horst.epp »

PeterPan8 wrote: Fri Sep 27, 2024 2:08 pm 2. You already mentioned that we should limit our search database to <1 Gb. But is there a way to speed up the Indexing process when I have folders containing thousands of Word documents, with a size of more than 50 Gb of data? Reason: I am a writer and translator, and I need all of those for my research. Those files could contain whole books in two languages, research papers, and more.
This is a job for the Windows indexer.
It's not limited in size and can be searched from Everything using si:
void
Developer
Posts: 16754
Joined: Fri Oct 16, 2009 11:31 pm

Re: Problem indexing content

Post by void »

For those interested, that's now what I do to search for any file with extension .doc or .docx, including the word(s) I'm looking for:
*.doc* content:word1 content:word2 content:word3
The following search does the same:
*.doc* content:<word1 word2 word3>
(should be easier to type)
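The difference between content:<word1 word2 word3> (several separate terms) and content:"string of characters" (one term containing spaces) can be shown with a simplified Python model: the *.doc* wildcard must match the file name, and every term must occur in the file's text, with a quoted phrase treated as a single contiguous term. Case-insensitive substring matching is an assumption made for this sketch; it is not Everything's implementation, and `content_matches` is a hypothetical name.

```python
from fnmatch import fnmatchcase


def content_matches(filename: str, text: str, name_pattern: str,
                    terms: list[str]) -> bool:
    """Simplified model of a search like  *.doc* content:<word1 word2>.

    The file name must match the wildcard, and every term must occur in the
    text. Case-insensitive substring matching is an assumption of this sketch.
    """
    if not fnmatchcase(filename.lower(), name_pattern.lower()):
        return False
    lowered = text.lower()
    return all(term.lower() in lowered for term in terms)


if __name__ == "__main__":
    doc = "The quick brown fox jumps"
    # content:<quick fox>  ->  two separate terms, both must appear
    print(content_matches("notes.docx", doc, "*.doc*", ["quick", "fox"]))  # True
    # content:"quick fox"  ->  one phrase, must appear contiguously
    print(content_matches("notes.docx", doc, "*.doc*", ["quick fox"]))     # False
    # *.doc* does not match a PDF file name
    print(content_matches("notes.pdf", doc, "*.doc*", ["quick"]))          # False
```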

Subexpressions


1. Where can I find more info on the types of searches I can do with Everything?
Everything -> Help -> Search syntax for the basics.
Search functions for the full list of search functions, with examples.


You already mentioned that we should limit our search database to <1 Gb. But is there a way to speed up the Indexing process when I have folders containing thousands of Word documents, with a size of more than 50 Gb of data? Reason: I am a writer and translator, and I need all of those for my research. Those files could contain whole books in two languages, research papers, and more.
Store the files on a good NVMe SSD drive.
No need to use Everything indexing.
Everything will read all the content without indexing in a few seconds.

If you are storing on an SSD, please make sure Everything is using multiple threads under
Tools -> Debug -> Statistics -> NTFS Index (C:) -> Multithreaded

Everything will index files on HDD as fast as possible.

Everything should only index once.