Subject: [htdig] Problems with htdig 3.1.4
From: Phillip Morgan (firstname.lastname@example.org)
Date: Sat Jan 01 2000 - 20:11:12 PST
Hi fellow time travellers,
I have recently installed htdig 3.1.4 and I find that it now indexes
only 1300 of my 60,000+ documents that the old v2.xx version I was
I have several urls like so...
and so on.. it only processes the first two. The first one of these has
a directory containing over 60,000 documents. There is a valid trail
leading from one doc to the next.. It used to work on the old version.
Second, the descriptions of some documents contain the word <TITLE>
(not the official title used for the html doc), and htdig spits the
dummy, reporting that this may be search spamming. Is this just a
warning, and does it drop the doc from the index? How can I get rid of
the warning/problem without removing the <TITLE> description (since the
docs are automatically generated)?
Third, it seems to me, despite modifying the valid_punctuation and
extra_word_characters attributes, that any file starting with # is ignored.
In fact, it appears to throw htdig into a frenzy. What it does is report
that the entire directory cannot be found, after about a 30 second
For example, a file #dummy.zip lives at
http://www.netbiz.net.au/SEARCH/#dummy.zip. Htdig says it cannot find
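
A possible explanation for the # problem, offered as a sketch and not a confirmed diagnosis of htdig's behaviour: in a URL, # starts the fragment identifier, so a client reading the address above sees the path /SEARCH/ with fragment dummy.zip, not a file named #dummy.zip. The Python snippet below only illustrates that parsing rule and the usual workaround of percent-encoding # as %23 in links. The helper name encode_filename is hypothetical, and whether htdig 3.1.4 follows such encoded links is an assumption that would need testing.

```python
from urllib.parse import quote, urlsplit

# The URL exactly as written in the post.
raw = "http://www.netbiz.net.au/SEARCH/#dummy.zip"
parts = urlsplit(raw)

# Generic URL parsing treats everything after '#' as a fragment,
# so the path is only the directory and the file name is lost.
print(parts.path)      # /SEARCH/
print(parts.fragment)  # dummy.zip


def encode_filename(base: str, name: str) -> str:
    """Hypothetical helper: append a file name to a base URL,
    percent-encoding reserved characters such as '#'."""
    return base + quote(name, safe="")


encoded = encode_filename("http://www.netbiz.net.au/SEARCH/", "#dummy.zip")
print(encoded)  # http://www.netbiz.net.au/SEARCH/%23dummy.zip
```

If the documents are generated automatically, emitting links in the encoded form (%23dummy.zip) would at least rule out fragment handling as the cause.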
I've tried as many variants of the configurations that I can think of,
but I can't get it to index all the listed urls and all of the docs for
each url. Can anyone offer some assistance?
btw: The system is Slackware 4.0 Linux (kernel 2.2.6), 192 MB RAM,
30 GB disk, etc.
NetBiz Internet Services   | ICQ: 12796450
P.O. Box 449, Croydon 3136 | FTN: 3:633/252
Email: email@example.com   | Vox: +61 3 9876 5295