Subject: Re: [htdig] indexing full text documents
From: Gilles Detillieux (grdetil@scrc.umanitoba.ca)
Date: Thu Mar 16 2000 - 08:06:53 PST


According to Brian Hancock:
> I've increased both max_doc_size and max_head_length. When I run htdig -v
> I see that it follows directories down only 2 levels. For instance
> from the root html directory it follows /tag/lwe/files.html but for
> /projects/spectator/text/files.html it stops at the spectator directory.
> Have I missed something in the configuration?

htdig does not follow directories. It follows hypertext links in
HTML documents. It's a subtle but extremely important difference.
Check the documents where it stops going deeper than you expect, to
see why that's the case. It may simply be that htdig doesn't find any
HTML tags in them that link to other documents.
Also, be aware that htdig doesn't follow JavaScript links.
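
To illustrate the difference, here is a minimal Python sketch of how a
link-following indexer discovers documents. This is not htdig's actual
parser, only an illustration: the LinkCollector class, the extract_links
helper, the sample page, and the www.example.com URL are all made up for
the example, and it looks only at <a href> tags, which is a simplification.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collect href targets of <a> tags, resolved against a base URL."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            # Skip javascript: pseudo-links; there is no document to fetch.
            if name == "href" and value and not value.lower().startswith("javascript:"):
                self.links.append(urljoin(self.base_url, value))


def extract_links(html, base_url):
    """Return the absolute URLs that a link-following crawler would visit next."""
    parser = LinkCollector(base_url)
    parser.feed(html)
    return parser.links


# Hypothetical page: one ordinary link, one JavaScript link, one scripted element.
sample = """
<html><body>
<a href="text/files.html">Text files</a>
<a href="javascript:openIndex()">Index</a>
<span onclick="location='other.html'">Other</span>
</body></html>
"""

links = extract_links(sample, "http://www.example.com/projects/spectator/index.html")
print(links)
```

Only the first link is collected. A file sitting in a subdirectory is
never reached unless some already-indexed page links to it this way.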

As you mentioned that you installed htdig from an RPM, I'd also suggest
that you make sure you installed the correct one. There are three
different i386 binary RPMs, for three different C libraries. This is a
common problem. See http://www.htdig.org/files/binaries/README.RPMS.txt
for details.

-- 
Gilles R. Detillieux              E-mail: <grdetil@scrc.umanitoba.ca>
Spinal Cord Research Centre       WWW:    http://www.scrc.umanitoba.ca/~grdetil
Dept. Physiology, U. of Manitoba  Phone:  (204)789-3766
Winnipeg, MB  R3E 3J7  (Canada)   Fax:    (204)789-3930
