Re: [htdig] Admin Queries

Geoff Hutchison
Tue, 26 Jan 1999 11:23:46 -0500 (EST)

On Tue, 26 Jan 1999, Peter Polkinghorne wrote:

> A: Is there any way of doing incremental updates? This is not a big issue as
> the full scan only takes an hour at the most and use of the -a flag makes the
> index available most of the time.

You bet. My full scans take several hours and the "update" takes about 1/2
hr. You do this by NOT specifying '-i' to htdig and ensuring you keep the
existing database files around. I can post my script if people want it (it's
going to go up in the contrib section of the website, once it gets going).
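
A minimal sketch of the difference between a full dig and an update dig,
written in Python for illustration. This is not the script mentioned above;
the function name and the config path are hypothetical, and it only builds
the htdig command line using the '-i', '-a' and '-c' flags discussed in this
thread rather than running anything.

```python
def build_htdig_command(config_path, full_rebuild=False, use_work_files=True):
    """Return the argument list for one htdig run.

    config_path is a placeholder; substitute the path used by your install.
    """
    cmd = ["htdig", "-c", config_path]
    if use_work_files:
        # -a: dig into alternate work files so the live index stays searchable
        cmd.append("-a")
    if full_rebuild:
        # -i: initial dig, throws away the old databases and starts from scratch
        cmd.append("-i")
    # Without -i, htdig reuses the databases left over from the previous run,
    # which is why those files must be kept around between runs.
    return cmd


if __name__ == "__main__":
    print("update:", " ".join(build_htdig_command("/path/to/htdig.conf")))
    print("full:  ", " ".join(build_htdig_command("/path/to/htdig.conf",
                                                  full_rebuild=True)))
```

The resulting list could be handed to subprocess.run() from a cron job, with
the full rebuild scheduled less often than the update.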

> Do others not worry?
> Build a separate index (from in our case an external IP) and protect the
> internal index?

You can do that. I have a similar setup (some files not available outside).
We assume that search results don't return anything too useful and following
the link will fail for outsiders. If they all fall into neat directories, you
can also specify:

<input name="exclude" type="hidden" value="exclude1|exclude2|exclude3">

This will ensure directories exclude1/ exclude2/ and exclude3/ don't show
up in search results.
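
To illustrate the effect of that hidden field, here is a small Python sketch.
It is not htsearch's code: the function name and the example URLs are made
up, and plain substring matching on the URL is an assumption about how the
'|'-separated patterns are applied.

```python
def filter_results(urls, exclude_value):
    """Drop every URL that contains one of the '|'-separated patterns."""
    patterns = [p for p in exclude_value.split("|") if p]
    return [u for u in urls if not any(p in u for p in patterns)]


if __name__ == "__main__":
    hits = [
        "http://www.example.com/public/index.html",
        "http://www.example.com/exclude1/notes.html",
        "http://www.example.com/exclude3/draft.html",
    ]
    for url in filter_results(hits, "exclude1|exclude2|exclude3"):
        print(url)
```

Under that assumption a pattern such as "exclude1" would also hide a
directory named exclude10/, so a trailing slash in each pattern may be
the safer choice.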

> C: Do people use the useful list of dangling refs htdig produces? If so how?
> Ours is a depressing 3000 or so in number!

You bet. One of my "blue sky" projects is to get a Perl script that reads
in a bunch of regexp and uses them to send broken link reports. Right now
I just have a simple script that goes through and reports broken links for

> D: Are people confident in htsearch as a CGI program in terms of security? My
> C++ is a bit rusty, but looks as though the variables accepted are secured and
> there should be no buffer overflows.

I've been through the code several times. I'd feel safer if a real
security guru went through it. I see no possibilities for buffer overflow
and the variables are pretty carefully screened. For example, the config
field cannot contain a period, so you can't do ../../../../../etc/passwd
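
The period rule can be pictured with a few lines of Python. This is only a
sketch of the idea, not the actual C++ check in htsearch, and the function
name is hypothetical.

```python
def is_acceptable_config_name(name):
    """Accept a config name only if it is non-empty and has no period.

    Rejecting '.' rules out '..' path components, so a value such as
    '../../../../../etc/passwd' never reaches the file system.
    """
    return bool(name) and "." not in name


if __name__ == "__main__":
    for candidate in ["htdig", "intranet", "../../../../../etc/passwd"]:
        verdict = "ok" if is_acceptable_config_name(candidate) else "rejected"
        print(candidate, "->", verdict)
```

The sketch covers nothing beyond the period rule described above; any other
screening htsearch performs is outside its scope.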


This archive was generated by hypermail 2.0b3 on Sun Jan 31 1999 - 10:43:20 PST