RE: [htdig] puzzled by htdig

Subject: RE: [htdig] puzzled by htdig
From: Geoff Hutchison
Date: Thu Oct 05 2000 - 15:31:54 PDT

On Thu, 5 Oct 2000, GYGAX,OTTO (HP-Corvallis,ex1) wrote:

> My limit_urls_to key is set as you have it below (default).
> My start_url is currently set to a list of urls such as http://>/,
> http://>/arch.html, http://>/dir1, http://>/dir2,
> http://>/dir3, ... where arch.html is a simple web page with a href
> pointer to http://>/~arch, the cover page to the Mhonarc mailing tree
> that contains links to every single mailing archive page.

OK, but then ~arch won't fall into the limits as you've set them (since
it's not any of the patterns in start_url). If you want to index all
documents on the server, you may want a more liberal limit_urls_to
directive, e.g.

limit_urls_to: http://>/

> Before I extended the start_url key attr., I only had http://>/ and
> http://>/arch.html, but htdig went as far as the few links off the
> server's index.html file, missing all other directories at the root. At one

OK, that was one of my points--it will follow the links it sees. So if you
index starting with http://server/ then it will follow links from
index.html. Unless you add those directories (as you did) to start_url, it
won't even know they're there.
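
To make the two points above concrete, here is a rough Python sketch of the
idea, not htdig's actual code. It assumes limit_urls_to behaves as a plain
substring match against each URL, and the function name crawl, the LINKS
table, and every URL in it are made up for illustration.

```python
from collections import deque


def crawl(start_urls, limit_patterns, links):
    """Follow links breadth-first from start_urls.

    A URL is kept only if it contains at least one of limit_patterns
    (assumed substring matching). `links` maps a page to the pages it
    links to, standing in for fetching and parsing real documents.
    """
    def allowed(url):
        return any(pattern in url for pattern in limit_patterns)

    seen = set()
    queue = deque(url for url in start_urls if allowed(url))
    while queue:
        url = queue.popleft()
        if url in seen:
            continue
        seen.add(url)
        # Only pages that some visited page links to are ever discovered.
        for target in links.get(url, []):
            if allowed(target) and target not in seen:
                queue.append(target)
    return seen


# Hypothetical site: index.html links to one page only, and dir1 is not
# linked from anywhere.
LINKS = {
    "http://server/": ["http://server/about.html"],
    "http://server/arch.html": ["http://server/~arch/"],
    "http://server/~arch/": ["http://server/~arch/msg001.html"],
    "http://server/dir1/": ["http://server/dir1/page.html"],
}

# Starting from the top page alone never reaches dir1.
only_root = crawl(["http://server/"], ["http://server/"], LINKS)

# A limit that matches only arch.html keeps the crawl out of ~arch.
too_strict = crawl(["http://server/arch.html"],
                   ["http://server/arch.html"], LINKS)

# Listing every entry point, with a liberal limit, reaches everything.
everything = crawl(
    ["http://server/", "http://server/arch.html", "http://server/dir1/"],
    ["http://server/"],
    LINKS,
)
```

In this toy model the liberal limit alone is not enough: dir1 still has to be
listed as a starting point because nothing links to it.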

-Geoff Hutchison
Williams Students Online

