Re: question for FAQ, to creators of htdig search engine


Geoff Hutchison (Geoffrey.R.Hutchison@williams.edu)
Sun, 08 Nov 1998 15:57:17 -0500


Hello...

At 12:55 PM -0500 11/8/98, Sadhunathan Nadesan wrote:
>ok, well i looked for this configuration variable.. i didn't find it. sorry
>if i was looking in the wrong place. can you help?
>
>anyway, this variable mentioned is not in the config file i looked in
>(by which i assume you mean htdig.conf in the conf directory).

Just because the variable isn't in the default htdig.conf file doesn't mean
you can't put it there. :-) I guess I'll revise the text in the htdig.conf
to make that a little more obvious. The conf file installed is meant as an
example.

>anyway, i am thinking that the value of this max_doc_size variable may be
>the problem we are having.

This is a good bet. See below.

>the document mentioned by the editor is called march97.html. it is 159K
...
>now to my question: max size of 100k .. does this mean the files over 100k
>are ignored? or only indexed up to the first 100k? and what should i do if
>my situation is
...

Large files are only indexed up to the limit set by the "max_doc_size"
attribute (a size in bytes); anything past that point is never seen. So if
you have a lot of big files, you'll want to bump this past the size of your
largest file. You don't need to recompile anything. Just put something like:
"max_doc_size: 200000" into your htdig.conf file.

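To make the cutoff concrete, here is a rough Python sketch of the effect. It is
not htdig's actual code: the function name indexed_words, the word-splitting
rule, and the made-up 159,000-byte document are all assumptions for
illustration. It only shows why a word that sits past the limit never reaches
the index.

```python
import re

def indexed_words(document: bytes, max_doc_size: int = 100000) -> set:
    """Hypothetical helper: collect the words found in the first
    max_doc_size bytes of a document and ignore everything after."""
    head = document[:max_doc_size]            # content past the limit is dropped
    text = head.decode("latin-1").lower()
    return set(re.findall(r"[a-z0-9]+", text))

# Made-up 159,000-byte document with the word of interest near the end.
filler = b"lorem ipsum " * 13250              # 12 bytes * 13250 = 159,000 bytes
doc = filler[:150000] + b" chaudhary " + filler[:8989]

print(len(doc))                                    # size of the sample document
print("chaudhary" in indexed_words(doc, 100000))   # limit below the word's offset
print("chaudhary" in indexed_words(doc, 200000))   # limit above the document size
```

With the limit at 100000 the word is missed because it starts after byte
150,000; with the limit raised above the document's size it is picked up.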
>be the problem that the directory file itself is too big? but probably not,
>because this word chaudhary is the only instance we know of, of a word that
>did not get indexed.

The size of the directory is only a problem when you use
automatically-generated directory listings. But with the limit at 100k, I'm
sure there are other words past the cutoff in those large files that are
not indexed either.

-Geoff Hutchison
Williams Students Online
http://wso.williams.edu/


