Subject: Re: [htdig] SQL handling start_url
From: Gilles Detillieux (firstname.lastname@example.org)
Date: Thu Dec 07 2000 - 08:12:37 PST
According to Curtis Ireland:
> Is there any way to have start_url get its list from an SQL back-end?
> Has anyone already built a patch to handle this?
> Here are a couple of solutions I can think of to bypass the problem,
> but I'm sure I'm not alone in desiring this feature.
> 1) Build a PHP page with links to all the sites we want to index.
> Have htDig use this as its start_url
> 2) Before htDig starts its database build, dump all the links to a text
> file and have the htdig.conf include this file
> The one problem with these two solutions is how would the limit_urls_to
> variable work? I want to make sure the links are properly indexed
> without going past the linked site.
Either solution seems workable - it all depends on what your preference
is. For the first solution, you'd need to have a limit_urls_to setting
that's liberal enough to allow through all the links that the PHP script
will spit out. You should probably set your max_hop_count to 1 to avoid
having htdig go beyond the first hop, from the PHP output to the documents.
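
As a rough sketch of the first approach, here is the link page generated in Python rather than PHP. The function name render_link_page and the example URLs are made up for illustration; in practice the list would come from your SQL query. The htdig.conf lines in the comments show the attributes discussed above, with placeholder values.

```python
from html import escape


def render_link_page(urls):
    """Return a bare HTML page with one link per URL, for htdig to start from."""
    items = "\n".join(
        f'<li><a href="{escape(u, quote=True)}">{escape(u)}</a></li>'
        for u in urls
    )
    return f"<html><body>\n<ul>\n{items}\n</ul>\n</body></html>\n"


# Hypothetical htdig.conf settings to go with this page (values are placeholders):
#   start_url:      http://www.example.com/links.php
#   limit_urls_to:  http://www.example.com/links.php http://site-a.example/ http://site-b.example/
#   max_hop_count:  1

if __name__ == "__main__":
    print(render_link_page(["http://site-a.example/", "http://site-b.example/"]))
```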
For the second solution, you could probably just leave limit_urls_to as
the default, which is the same as the value of start_url, and set your
max_hop_count to 0.
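
For the second approach, a minimal sketch of the dump step might look like the following. It uses Python's sqlite3 module only as a stand-in for whatever SQL back-end you have; the table name "sites", the column name "url", and the function name dump_start_urls are all assumptions for the example. The resulting file could then be pulled into htdig.conf, for instance with the backquoted-filename form of an attribute value, though check the configuration documentation for your htdig version.

```python
import sqlite3


def dump_start_urls(conn, out_path, table="sites", column="url"):
    """Write one URL per line to out_path and return the list written.

    table and column are hypothetical names; they are interpolated into the
    query, so they must come from trusted configuration, not user input.
    """
    cur = conn.execute(f"SELECT {column} FROM {table} ORDER BY {column}")
    # Skip NULL and blank rows, and trim stray whitespace.
    urls = [row[0].strip() for row in cur if row[0] and row[0].strip()]
    with open(out_path, "w", encoding="utf-8") as fh:
        for url in urls:
            fh.write(url + "\n")
    return urls


# Hypothetical htdig.conf settings to go with the dumped file:
#   start_url:      `/path/to/start_urls.txt`
#   max_hop_count:  0
# limit_urls_to is left at its default, which follows start_url.
```

Run it just before each htdig database build so the file reflects the current contents of the table.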
--
Gilles R. Detillieux              E-mail: <email@example.com>
Spinal Cord Research Centre       WWW:    http://www.scrc.umanitoba.ca/~grdetil
Dept. Physiology, U. of Manitoba  Phone:  (204)789-3766
Winnipeg, MB  R3E 3J7  (Canada)   Fax:    (204)789-3930