RSS 2.0
Sign In
# Friday, August 12, 2011

1. query.dll vs tquery.dll

We have installed Windows Search 4 on a Windows 2003 server. The goal was to index huge compressed xml files (see Windows Search Notifications). But for some reason it did not want to index content.

No "select System.ItemUrl from SystemIndex where contains('...')" has ever returned a row.

We thought that the problem was in our protocol handler, and tried to localize it, but finally have discovered that Windows Search is not able to find anything within text files.

Registry comparision has shown that *.txt extension was indexed by the IFilter defined in the query.dll, while on the other computers, where everything worked, the implementation was in the tquery.dll.

Both libraries were present on the Windows 2003 server, so we have corrected the registry and everything has started to work.

As far as we understand query.dll is part of legacy Indexing Service, and tquery.dll is up to date implementation.

2. Search index size

We have to index a considerable amout of data. But before we can do it we have to estimate the size of index.

In the past it seems we saw somewhere a statement that search index needs a storage that's about 10% of original data for its purposes. Unfortunatelly we cannot find this estimation at present, neither we cannot find any other estimation. This complicates our planning.

To get empirical estimate we've indexed several thousands *.xml-gz files, which are gz'ed big xmls. The total size of this files is about 4.5GB. Total uncompressed size of xmls ~50GB. Xml contained about 10 millions pages of data.

According to 10% criteria we had to arrive to ~5GB search index.

But what we have discovered is that the index has grown to more than 50GB. That's very disappointing. We cannot afford such expense, as we've commited test on a tiny part of data, which increases over time.

So, the solution is to find out what's wrong, and how can it be cured, or to fulltext index only most recent subset of data.

P.S. We have tried to mark folder with search index as compressed, but it did not work.

P.P.S. We have found the reference to Windows Search 4 index size estimation. It is in Windows Search Frequently Asked Questions, see answer on "What is average size of a user's index?" question.

Friday, August 12, 2011 9:20:18 AM UTC  #    Comments [0] -
Thinking aloud | Tips and tricks | Window Search
All comments require the approval of the site owner before being displayed.
Home page

Comment (Some html is allowed: a@href@title, b, blockquote@cite, em, i, strike, strong, sub, super, u) where the @ means "attribute." For example, you can use <a href="" title=""> or <blockquote cite="Scott">.  

[Captcha]Enter the code shown (prevents robots):

Live Comment Preview
<August 2011>
Total Posts: 387
This Year: 3
This Month: 0
This Week: 0
Comments: 1289
Locations of visitors to this page
The opinions expressed herein are our own personal opinions and do not represent our employer's view in anyway.

© 2024, Nesterovsky bros
All Content © 2024, Nesterovsky bros
DasBlog theme 'Business' created by Christoph De Baene (delarou)