Subject: Re: [htdig] SQL handling start_url
From: Bill Carlson (wcarlson@vh.org)
Date: Thu Dec 07 2000 - 07:12:38 PST
On Wed, 6 Dec 2000, Curtis Ireland wrote:
> 2) Before htDig starts its database build, dump all the links to a text
> file and have the htdig.conf include this file
>
> The one problem with these two solutions is how would the limit_urls_to
> variable work? I want to make sure the links are properly indexed
> without going past the linked site.
This is the method I used, though in my case the backend was an email full
of links from the person directing the crawl. :)
Write two files, one for start_url and one for limit_urls_to, and include
both in the conf file like so:
start_url: `/home/htdig/conf/start_url_file`
limit_urls_to: `/home/htdig/conf/limit_url_file`
The contents of both files are just links.
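If the links come out of a database or some other backend, a small script can produce both files before each run. The following is only a rough sketch, not part of ht://Dig: the function names and file names are made up for illustration, and it assumes that ht://Dig treats the entries in an included file as a whitespace-separated list and that limiting each crawl to the scheme and host of its start URL is what you want.

```python
from pathlib import Path
from urllib.parse import urlsplit


def site_prefix(url):
    """Return 'scheme://host/' for a URL, to be used as a limit pattern."""
    parts = urlsplit(url)
    return f"{parts.scheme}://{parts.netloc}/"


def write_url_files(links, start_path, limit_path):
    """Write the start URLs and their site prefixes, one entry per line.

    Returns the two lists that were written, in order, without duplicates.
    """
    start_urls = list(dict.fromkeys(links))  # drop duplicates, keep order
    limit_prefixes = list(dict.fromkeys(site_prefix(u) for u in start_urls))
    Path(start_path).write_text("\n".join(start_urls) + "\n")
    Path(limit_path).write_text("\n".join(limit_prefixes) + "\n")
    return start_urls, limit_prefixes


if __name__ == "__main__":
    import tempfile

    # Hypothetical links; in practice these would come from your SQL query.
    links = [
        "http://www.example.org/docs/index.html",
        "http://www.example.org/faq.html",
        "http://other.example.com/",
    ]
    with tempfile.TemporaryDirectory() as tmp:
        start, limit = write_url_files(
            links, Path(tmp) / "start_url_file", Path(tmp) / "limit_url_file"
        )
        print(start)
        print(limit)
```

Writing the host prefix of every start URL into the limit file is one way to keep the crawl from wandering past the linked sites; use longer prefixes if you only want part of a site.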
Good Luck,
Bill Carlson
--
Systems Programmer    bill-carlson@uiowa.edu  | Opinions are mine,
Virtual Hospital      http://www.vh.org/      | not my employer's.
University of Iowa Hospitals and Clinics      |