Re: [htdig] puzzled by htdig


Subject: Re: [htdig] puzzled by htdig
From: Geoff Hutchison (ghutchis@wso.williams.edu)
Date: Wed Oct 04 2000 - 19:30:14 PDT


At 2:43 PM -0700 10/4/00, GYGAX,OTTO (HP-Corvallis,ex1) wrote:
>Now it won't work. htdig is able to look up other web pages that reside at the
>root of the web server but cannot traverse down to the ~arch tree.

There are a few points here and it is perhaps better to explain how
htdig follows links rather than to directly address your question.

Two attributes in the htdig.conf file matter for your question:

    start_url:      http://www.foo.com/
    limit_urls_to:  ${start_url}

As set, this would start indexing at http://www.foo.com/ and go from
there. The limit_urls_to attribute requires that every URL htdig
finds match this pattern, so here indexing is limited to pages on
this one server. (You could, for example, set it to "foo.com" to
index all servers in that domain.) But htdig will *only* follow
links. If no page reachable from the start_url links to a given file,
htdig won't index it.
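
To make the two rules above concrete, here is a rough Python sketch of
the idea. It is only a toy model, not htdig's actual code: the crawl()
function, the LINKS table, and the ~arch/index.html page are all made
up for illustration, and I'm assuming the pattern test is a simple
case-insensitive substring match.

```python
from collections import deque

def crawl(start_url, limit_urls_to, links):
    """Toy model: return the URLs reached by following links from
    start_url, keeping only URLs that contain one of the patterns."""
    patterns = [p.lower() for p in limit_urls_to]

    def allowed(url):
        # Assumption: a URL passes if any pattern is a substring of it.
        return any(p in url.lower() for p in patterns)

    seen = []
    queue = deque([start_url])
    while queue:
        url = queue.popleft()
        if url in seen or not allowed(url):
            continue
        seen.append(url)
        # Only links found on visited pages are ever queued.
        queue.extend(links.get(url, []))
    return seen

# Hypothetical site: page URL -> links found on that page.
LINKS = {
    "http://www.foo.com/": ["http://www.foo.com/docs.html",
                            "http://www.bar.com/"],
    "http://www.foo.com/docs.html": ["http://www.foo.com/"],
    # This page exists on the server, but nothing links to it.
    "http://www.foo.com/~arch/index.html": [],
}

indexed = crawl("http://www.foo.com/", ["http://www.foo.com/"], LINKS)
for url in indexed:
    print(url)
```

In this model the www.bar.com link is dropped because it fails the
pattern test, and the ~arch page is never visited because no visited
page links to it, even though it would pass the pattern test.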

Your example is a little unclear to me. My guess is that you are
either not using limit_urls_to correctly or you don't have working
links to the files you're trying to index.

For more information:
http://www.htdig.org/attrs.html#start_url
http://www.htdig.org/attrs.html#limit_urls_to

--
-Geoff Hutchison
Williams Students Online
http://wso.williams.edu/



