Gilles Detillieux (grdetil@scrc.umanitoba.ca)
Tue, 9 Feb 1999 16:55:17 -0600 (CST)
According to mark williamson:
> After having installed and compiled htdig successfully (woohoo!), I came
> across an aspect of the program that's not covered in the docs.
>
> How does/should htdig handle these two urls:
>
> http://www.somedomain.com/
>
> http://www.somedomain.com/products.html
>
> My observation is that it will spider the site if it sees just a domain;
> otherwise, if a page is specified, it will index only that page. The reason
> I ask is that I pointed it to a file which is basically a map of the HTML
> documents of a site, and it did not follow any links. But if I point it to
> the main domain name, it does indeed follow them.
What is your limit_urls_to set to? By default it's ${start_url}, so if you
set start_url to something more restrictive, such as a single page, the
whole dig will be restricted to URLs matching that string, and links to
other pages will be ignored. Try:
    start_url:      http://www.somedomain.com/products.html
    limit_urls_to:  http://www.somedomain.com/
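To see why pointing start_url at a single page stops the crawl, here is a
simplified Python model of the filtering. It is not htdig's actual code: it
assumes limit_urls_to acts as a substring test (a URL is kept if it
contains any of the listed strings) and ignores htdig's other restrictions.
The site map and the widgets/gadgets page names are hypothetical.

```python
from collections import deque
from typing import Dict, List, Optional


def allowed(url: str, limit_urls_to: List[str]) -> bool:
    # Assumed rule: keep the URL if it contains any of the limit strings.
    return any(pattern in url for pattern in limit_urls_to)


def reachable(start_url: str, links: Dict[str, List[str]],
              limit_urls_to: Optional[List[str]] = None) -> List[str]:
    # Default mirrors the "limit_urls_to: ${start_url}" default.
    if limit_urls_to is None:
        limit_urls_to = [start_url]
    order = [start_url]          # URLs in the order they were accepted
    seen = {start_url}           # fast membership check
    queue = deque([start_url])   # breadth-first walk over the link map
    while queue:
        url = queue.popleft()
        for link in links.get(url, []):
            if link not in seen and allowed(link, limit_urls_to):
                seen.add(link)
                order.append(link)
                queue.append(link)
    return order


# Hypothetical site map: products.html links to two other pages,
# and widgets.html links back to the site root.
START = "http://www.somedomain.com/products.html"
SITE = {
    START: ["http://www.somedomain.com/widgets.html",
            "http://www.somedomain.com/gadgets.html"],
    "http://www.somedomain.com/widgets.html": ["http://www.somedomain.com/"],
}

if __name__ == "__main__":
    print(reachable(START, SITE))
    print(reachable(START, SITE, ["http://www.somedomain.com/"]))
```

With the default limit, no linked URL contains the full products.html URL,
so only the start page survives; with limit_urls_to set to the site root,
the linked pages are followed as well.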
This probably belongs on the htdig list, rather than htdig3-dev.
--
Gilles R. Detillieux              E-mail: <grdetil@scrc.umanitoba.ca>
Spinal Cord Research Centre       WWW:    http://www.scrc.umanitoba.ca/~grdetil
Dept. Physiology, U. of Manitoba  Phone:  (204)789-3766
Winnipeg, MB  R3E 3J7  (Canada)   Fax:    (204)789-3930
This archive was generated by hypermail 2.0b3 on Tue Feb 09 1999 - 15:16:46 PST