RE: [htdig] puzzled by htdig


Subject: RE: [htdig] puzzled by htdig
From: Geoff Hutchison (ghutchis@wso.williams.edu)
Date: Thu Oct 05 2000 - 15:31:54 PDT


On Thu, 5 Oct 2000, GYGAX,OTTO (HP-Corvallis,ex1) wrote:

> My limit_urls_to key is set as you have it below (default).
> My start_url is currently set to a list of urls such as http://<server>/,
> http://<server>/arch.html, http://<server>/dir1, http://<server>/dir2,
> http://<server>/dir3, ... where arch.html is a simple web page with a href
> pointer to http://<server>/~arch, the cover page to the Mhonarc mailing tree
> that contains links to every single mailing archive page.

OK, but then ~arch won't fall into the limits as you've set them (since
it's not any of the patterns in start_url). If you want to index all
documents on the server, you may want a more liberal limit_urls_to
directive, e.g.

limit_urls_to: http://<server>/

> Before I extended the start_url key attr., I only had http://<server>/ and
> http://<server>/arch.html, but htdig went as far as the few links off the
> server's index.html file, missing all other directories at the root. At one

OK, that was one of my points--it will follow the links it sees. So if you
index starting with http://server/ then it will follow links from
index.html. Unless you add those directories (as you did) to start_url, it
won't even know they're there.
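
As a rough sketch of how the two attributes could fit together in
htdig.conf (the host name "server" and the trailing slashes on the
directory URLs are placeholders I'm assuming here, and dir1 through dir3
are just the names from your message, so adjust all of them to your site):

start_url:      http://server/ http://server/arch.html \
                http://server/dir1/ http://server/dir2/ \
                http://server/dir3/
limit_urls_to:  http://server/

The idea is that start_url names the pages htdig has no links to discover
on its own, while the single, broader limit_urls_to pattern should keep
everything it then finds on that host within bounds.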

--
-Geoff Hutchison
Williams Students Online
http://wso.williams.edu/



