Re: [htdig] SQL handling start_url


Subject: Re: [htdig] SQL handling start_url
From: Bill Carlson (wcarlson@vh.org)
Date: Thu Dec 07 2000 - 07:12:38 PST


On Wed, 6 Dec 2000, Curtis Ireland wrote:

> 2) Before htDig starts its database build, dump all the links to a text
> file and have the htdig.conf include this file
>
> The one problem with these two solutions is how would the limit_urls_to
> variable work? I want to make sure the links are properly indexed
> without going past the linked site.

This is the method I used, though in my case the backend was an email full
of links from the person directing the crawl. :)

Write 2 files, one for start_url and one for limit_urls_to, and include
both in the conf file like so:

start_url: `/home/htdig/conf/start_url_file`

limit_urls_to: `/home/htdig/conf/limit_url_file`

The contents of both files are just links.
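
For the case in the original question, where the links come out of a database, a minimal sketch of generating the two files might look like the following. This is only an illustration under stated assumptions: the function names `site_prefix` and `write_url_files` are hypothetical, and it assumes that "not going past the linked site" means limiting the crawl to each link's scheme and host. It writes one URL per line.

```python
from pathlib import Path
from urllib.parse import urlsplit


def site_prefix(url):
    """Return the scheme://host/ part of a URL (assumed crawl boundary)."""
    parts = urlsplit(url)
    return f"{parts.scheme}://{parts.netloc}/"


def write_url_files(links, start_path, limit_path):
    """Write the start and limit files, one URL per line.

    Returns (start_urls, limit_prefixes) so the caller can inspect them.
    """
    # Drop blank entries and duplicates while keeping the original order.
    cleaned = (link.strip() for link in links)
    start_urls = list(dict.fromkeys(u for u in cleaned if u))

    # Assumption: each linked site is bounded by its scheme and host.
    limit_prefixes = list(dict.fromkeys(site_prefix(u) for u in start_urls))

    Path(start_path).write_text("\n".join(start_urls) + "\n")
    Path(limit_path).write_text("\n".join(limit_prefixes) + "\n")
    return start_urls, limit_prefixes
```

Run it before the database build with the links pulled from the SQL backend, passing the same two paths that the conf file includes. If a narrower boundary is wanted (a directory rather than a whole host), `site_prefix` is the piece to change.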

Good Luck,

Bill Carlson

-- 
Systems Programmer    bill-carlson@uiowa.edu	|  Opinions are mine,
Virtual Hospital      http://www.vh.org/        |  not my employer's.
University of Iowa Hospitals and Clinics	|



