Re: [htdig] htdig-3.2.0b1 - htdig doesn't follow links


Subject: Re: [htdig] htdig-3.2.0b1 - htdig doesn't follow links
From: Geoff Hutchison (ghutchis@wso.williams.edu)
Date: Wed Apr 05 2000 - 17:01:45 PDT


At 11:25 AM +1200 4/6/00, glen.davies@cce.ac.nz wrote:
>I installed htdig-3.2.0b1 on a Dec Alpha running Debian Linux for
>testing.
>Configure ran ok and everything seemed to compile and install to
>the correct directories, but when I run htdig it only grabs the index
>page and doesn't follow the links (I have tried a few different servers
>that don't have any robots.txt files etc and get the same problem).
>Htmerge and htsearch run ok but I have a very small database of 1
>document.

My guess is that you've set start_url to point to a single page and left
limit_urls_to at its default, which is the value of start_url. In that
case only that one page is indexed, because it is the only URL that
matches the limit_urls_to attribute. Example:

start_url: http://www.foo.com/bar.html

-> None of the links on this page start with this URL, so they're all rejected.

Better:

start_url: http://www.foo.com/

-> All links off of this page will likely fall within the same URL-space.

Best:

start_url: http://www.htdig.org/
limit_urls_to: htdig.org

This restricts indexing to URLs that contain htdig.org, so everything
stays within the domain. URLs on other hosts in that domain, like
http://dev.htdig.org/, work too.
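
To make the three cases above concrete, here is a rough Python sketch of
the matching rule. It is not htdig's actual code. It assumes a URL is
accepted when it contains at least one limit_urls_to pattern as a plain
substring, and that the default pattern is start_url itself. The names
is_allowed and default_limit are made up for illustration.

```python
# Hypothetical sketch of the URL filter described above; not htdig's code.
# Assumption: a URL is accepted if it contains any limit_urls_to pattern.

def is_allowed(url, limit_urls_to):
    """Return True if url contains at least one pattern as a substring."""
    return any(pattern in url for pattern in limit_urls_to)


def default_limit(start_url):
    """Assumed default: limit_urls_to is simply the start_url."""
    return [start_url]


# Case 1: start_url is a single page, limit_urls_to left at the default.
page_limit = default_limit("http://www.foo.com/bar.html")
start_page_ok = is_allowed("http://www.foo.com/bar.html", page_limit)
linked_page_ok = is_allowed("http://www.foo.com/baz.html", page_limit)

# Case 2: start_url is the site root, so same-site links match.
root_limit = default_limit("http://www.foo.com/")
same_site_ok = is_allowed("http://www.foo.com/baz.html", root_limit)

# Case 3: an explicit limit_urls_to that covers the whole domain.
domain_limit = ["htdig.org"]
other_host_ok = is_allowed("http://dev.htdig.org/", domain_limit)

print(start_page_ok, linked_page_ok, same_site_ok, other_host_ok)
```

In case 1 only the start page passes the filter, which would explain a
database of exactly one document.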

>Anything obvious that I need to do? (other than sit back and wait
>for the stable release :-) )

There is a stable release: 3.1.5. Last I checked, it was the stable
package for Debian. All previous releases (including 3.2.0b1) have
the security hole. If you missed the details of the hole, see the
Debian security updates.

Cheers,

--
-Geoff Hutchison
Williams Students Online
http://wso.williams.edu/
