Re: [htdig] SQL handling start_url


Subject: Re: [htdig] SQL handling start_url
From: Gilles Detillieux (grdetil@scrc.umanitoba.ca)
Date: Thu Dec 07 2000 - 08:12:37 PST


According to Curtis Ireland:
> Is there any way to have start_url get its list from an SQL back-end?
> Has anyone already built a patch to handle this?
>
> Here are a couple of solutions I can think of to bypass the problem,
> but I'm sure I'm not alone in desiring this feature.
>
> 1) Build a PHP page with links to all the sites we want to index.
> Have htDig use this as its start_url
> 2) Before htDig starts its database build, dump all the links to a text
> file and have the htdig.conf include this file
>
> The one problem with these two solutions is how would the limit_urls_to
> variable work? I want to make sure the links are properly indexed
> without going past the linked site.

Either solution seems workable - it all depends on what your preference
is. For the first solution, you'd need to have a limit_urls_to setting
that's liberal enough to allow through all the links that the PHP script
will spit out. You should probably set your max_hop_count to 1 to avoid
having htdig go beyond the first hop, from the PHP output to the documents
it references.
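
As a rough, untested sketch of the first approach, the relevant part of
htdig.conf might look something like the fragment below. The host names
and the links.php script name are made up for illustration; substitute
whatever your PHP script and target sites actually are.

```
# Sketch only: all URLs here are hypothetical placeholders.
# The PHP page is hop 0; the documents it links to are hop 1.
start_url:      http://www.example.com/links.php
# Must be liberal enough to match every link the PHP page emits.
limit_urls_to:  http://www.example.com/ http://site-one.example.org/ http://site-two.example.org/
max_hop_count:  1
```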

For the second solution, you could probably just leave limit_urls_to as
the default, which is the same as the value of start_url, and set your
max_hop_count to 0.
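
The dump step for the second approach could be sketched along these
lines in Python. This is only an illustration under assumptions: the
"sites" table, its "url" column, and the output file name are all
hypothetical, and sqlite3 merely stands in for whatever SQL back-end
you really use.

```python
import sqlite3


def dump_start_urls(conn, out_path, query="SELECT url FROM sites ORDER BY url"):
    """Write one URL per line to out_path and return how many were written.

    The default query assumes a hypothetical table `sites` with a text
    column `url`; pass your own query to match your real schema.
    """
    cur = conn.execute(query)
    # Skip NULL and blank entries so the list holds only usable URLs.
    urls = [row[0].strip() for row in cur if row[0] and row[0].strip()]
    with open(out_path, "w", encoding="utf-8") as fh:
        for url in urls:
            fh.write(url + "\n")
    return len(urls)

# Assumed htdig.conf settings to go with the generated file, if your
# htdig version supports including a file's contents with backquotes:
#
#   start_url:      `/path/to/start_urls.txt`
#   max_hop_count:  0
```

You would call dump_start_urls(conn, "/path/to/start_urls.txt") with an
open database connection just before each htdig run, so the list is
regenerated from the database every time.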

-- 
Gilles R. Detillieux              E-mail: <grdetil@scrc.umanitoba.ca>
Spinal Cord Research Centre       WWW:    http://www.scrc.umanitoba.ca/~grdetil
Dept. Physiology, U. of Manitoba  Phone:  (204)789-3766
Winnipeg, MB  R3E 3J7  (Canada)   Fax:    (204)789-3930




This archive was generated by hypermail 2b28 : Thu Dec 07 2000 - 08:22:09 PST