Re: [htdig] More ht://Dig Evaluation Questions


Geoff Hutchison (ghutchis@wso.williams.edu)
Mon, 29 Mar 1999 12:19:19 -0500 (EST)


On Mon, 29 Mar 1999, Elizabeth Carmack wrote:

> 1. When indexing the site on second and subsequent times, does it just
> search for new files and add those to what it has indexed so far, or does
> it recreate everything from scratch?

This depends. If you run the htdig program with -i, it will recreate
everything from scratch. If the old databases are there and you don't run
with -i, it will update them, picking up new and changed documents
instead of starting over.
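
To make the two modes concrete, here is a minimal sketch of how a wrapper
script might choose between them. The function name, the "htdig" binary
location, and the config path are placeholders of mine, not anything
ht://Dig requires; the only flags used are -c (config file) and -i
(start from scratch).

```python
def build_htdig_command(config_path, from_scratch=False, htdig_bin="htdig"):
    """Return the argument list for one htdig run.

    htdig_bin and config_path are assumptions; adjust them for your install.
    """
    cmd = [htdig_bin, "-c", config_path]
    if from_scratch:
        # -i throws away the old databases and rebuilds everything.
        cmd.append("-i")
    # Without -i, htdig updates the existing databases if they are present.
    return cmd


if __name__ == "__main__":
    # Hypothetical config path, shown only for illustration.
    print(" ".join(build_htdig_command("/etc/htdig/htdig.conf")))
    print(" ".join(build_htdig_command("/etc/htdig/htdig.conf", from_scratch=True)))
```

The returned list can be handed to subprocess.run() as-is if you want the
script to actually start the dig.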

> 2. Can you set it up so that the site will be indexed any time of day you
> like?

Yeah, use 'at' or 'cron.' If you haven't heard of these, type "man cron"
or talk to a sysadmin who knows a little more about UNIX.
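
For the cron route, a crontab entry is five time fields (minute, hour, day
of month, month, day of week) followed by the command. The sketch below
just formats such a line for a once-a-day run; the helper name and the
rundig path are assumptions, so substitute whatever script you actually
use to run the dig.

```python
def crontab_line(minute, hour, command):
    """Format a crontab entry that runs `command` once a day at hour:minute."""
    if not (0 <= minute <= 59 and 0 <= hour <= 23):
        raise ValueError("minute must be 0-59 and hour must be 0-23")
    # Fields: minute hour day-of-month month day-of-week command
    return f"{minute} {hour} * * * {command}"


if __name__ == "__main__":
    # Hypothetical install path; index every night at 02:30.
    print(crontab_line(30, 2, "/usr/local/bin/rundig"))
```

Paste the printed line into the file that "crontab -e" opens.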

> 3. Does it understand a search query involving number ranges?

No.

> 4. When documents are searched for a particular term, does it weigh the
> document content/structure? For instance, does it take into consideration
> relative word ordering, word proximity, and word position in text?

Position, yes. Structure (i.e. markup), yes. Ordering and proximity? Not
yet.

> 5. Will it allow you to create custom concept/acronym definitions?

You bet. See the synonym file.

> 6. Does it understand natural language?

No. But I've tried out "natural language" searches, and I'm not convinced
they're actually "understanding" anything. They seem to simply be ignoring
very common words! (Yes, I'm sure there are people who will tell me I'm
wrong. I'm just telling you what seemed to come out of the "black box.")
So putting words like "the" and "what" in the bad_words file will make
ht://Dig ignore them, which should produce pretty good results.
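
Here is a rough sketch of that idea: drop the very common words from a
query and search on what is left. This is my own illustration of the
effect, not ht://Dig's code, and it assumes a bad-words list with one
word per line; the function names are made up.

```python
def load_bad_words(lines):
    """Build a set of words to ignore from an iterable of lines (one word each)."""
    return {line.strip().lower() for line in lines if line.strip()}


def strip_bad_words(query, bad_words):
    """Return the query's words, lowercased, with the ignored words removed."""
    return [word for word in query.lower().split() if word not in bad_words]


if __name__ == "__main__":
    # Stand-in for the contents of a bad-words file.
    bad_words = load_bad_words(["the\n", "what\n", "is\n"])
    print(strip_bad_words("What is the capital of France", bad_words))
    # prints ['capital', 'of', 'france']
```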

-Geoff Hutchison
Williams Students Online
http://wso.williams.edu/
