Re: [htdig] SQL handling start_url

From: Bill Carlson
Date: Thu Dec 07 2000 - 07:12:38 PST

On Wed, 6 Dec 2000, Curtis Ireland wrote:

> 2) Before htDig starts its database build, dump all the links to a text
> file and have the htdig.conf include this file
> The one problem with these two solutions is how would the limit_urls_to
> variable work? I want to make sure the links are properly indexed
> without going past the linked site.

This is the method I used, though in my case the backend was an email full
of links from the person directing the crawl. :)

Write 2 files, one for start_url and one for limit_urls_to, and include
both in the conf file like so:

start_url: `/home/htdig/conf/start_url_file`

limit_urls_to: `/home/htdig/conf/limit_url_file`

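The contents of both files are just links. The backquotes tell htdig to
read the attribute's value from the named file, so the limit file decides
how far the crawl may go: put only the sites you want covered in it.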
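
For the "dump the links to a text file" step, here is a minimal Python sketch. It is an illustration only, not what I actually ran: the function names are made up, and it assumes the links have already been pulled out of the database into a plain list. It writes every link to the start file and one scheme-and-host prefix per site to the limit file, on the assumption that a site-level prefix is the boundary you want.

```python
from urllib.parse import urlsplit


def site_prefix(url):
    """Return the scheme-and-host prefix of a URL, e.g. http://host/ (hypothetical helper)."""
    parts = urlsplit(url)
    return f"{parts.scheme}://{parts.netloc}/"


def build_url_files(links, start_path, limit_path):
    """Write one file of start URLs and one file of per-site limit prefixes.

    `links` is any iterable of URL strings (assumed to come from your own
    database query). Blank entries and duplicates are skipped, and the
    original order is kept.
    """
    starts = []
    limits = []
    for link in links:
        link = link.strip()
        if not link:
            continue
        if link not in starts:
            starts.append(link)
        prefix = site_prefix(link)
        if prefix not in limits:
            limits.append(prefix)

    # One URL per line in each file.
    with open(start_path, "w") as f:
        f.write("\n".join(starts) + "\n")
    with open(limit_path, "w") as f:
        f.write("\n".join(limits) + "\n")
    return starts, limits


# Example call, using the paths from the conf lines above:
# build_url_files(links_from_db,
#                 "/home/htdig/conf/start_url_file",
#                 "/home/htdig/conf/limit_url_file")
```

If a site-wide prefix is too broad for some entry, write a longer prefix (a directory, say) into the limit file for that one instead.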

Good Luck,

Bill Carlson

Systems Programmer                        |  Opinions are mine,
Virtual Hospital                          |  not my employer's.
University of Iowa Hospitals and Clinics  |

