Re: [htdig3-dev] URL requirements


Gilles Detillieux (grdetil@scrc.umanitoba.ca)
Tue, 9 Feb 1999 16:55:17 -0600 (CST)


According to mark williamson:
> After having installed and compiled htdig successfully (woohoo!), I came
> across an aspect of the program that's not covered in the docs.
>
> How does/should htdig handle these two urls:
>
> http://www.somedomain.com/
>
> http://www.somedomain.com/products.html
>
> My observation has it that it will spider the site if it sees just a domain,
> otherwise it will index the page if one is specified. The reason I ask
> this is that I pointed it to a file which is basically a map of the HTML
> documents of a site, and it did not follow any links. But if I point it to
> the main domain name, it does indeed follow.

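What is your limit_urls_to set to? By default it's ${start_url}, so if you
set start_url to a single page, htdig will only follow links whose URLs
match that page's URL, and the whole dig is restricted to it. To start at
the map page but still index the rest of the site, try: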

start_url: http://www.somedomain.com/products.html
limit_urls_to: http://www.somedomain.com/
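
To illustrate why the default setup follows no links, here is a rough
Python sketch of the filtering. It is not htdig's actual code. It assumes
limit_urls_to behaves as a list of substring patterns, at least one of which
a URL must contain; the function name allowed() and the widgets.html link
are made up for the illustration.

```python
def allowed(url, limit_urls_to):
    """Return True if url contains at least one of the limit patterns.

    Assumption: plain substring matching against each pattern.
    """
    return any(pattern in url for pattern in limit_urls_to)


start_url = "http://www.somedomain.com/products.html"
# Hypothetical link found on the map page.
link = "http://www.somedomain.com/widgets.html"

# Default: limit_urls_to is ${start_url}, so only the start page itself matches.
default_ok = allowed(link, [start_url])

# Widened: limit_urls_to is the site root, so every page under it matches.
widened_ok = allowed(link, ["http://www.somedomain.com/"])

print(default_ok, widened_ok)  # prints: False True
```

With the default, the link is rejected because it does not contain the full
products.html URL; with the site root as the limit, it passes.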

This probably belongs on the htdig list, rather than htdig3-dev.

-- 
Gilles R. Detillieux              E-mail: <grdetil@scrc.umanitoba.ca>
Spinal Cord Research Centre       WWW:    http://www.scrc.umanitoba.ca/~grdetil
Dept. Physiology, U. of Manitoba  Phone:  (204)789-3766
Winnipeg, MB  R3E 3J7  (Canada)   Fax:    (204)789-3930


