Re: [htdig] htdig - returning directory listings

Subject: Re: [htdig] htdig - returning directory listings
From: Bill Carlson
Date: Fri Jul 21 2000 - 11:07:09 PDT

On Thu, 20 Jul 2000, Gilles Detillieux wrote:

> According to alan:
> > whenever i search for some word that exists in a filename htsearch
> > returns the proper results but also returns urls to just directories
> > that contain that file... this allows users to view the directory but i
> > don't want them to... i know you can disable directory listing on the
> > server hosting the site but then htdig isn't able to index the specified
> > directory (indicated in the htdig.conf) anymore since it is not allowed
> > to view the specified directory... so is there a way to prevent htdig
> > from indexing directory listings or prevent htsearch from displaying
> > directory listings??
> No, not that I know of. This has come up several times before, but
> there's no easy fix. I think the solution that some users go with is
> to generate a list of all the URLs they want to index, and feed this
> to htdig's start_url attribute, and use a max_hop_count of 0 to prevent
> indexing anything else. That way, they have more control over what is
> indexed and what isn't.
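
(A minimal sketch of the approach Gilles describes, not a tested
configuration. The file path is hypothetical; it is assumed to hold the
URLs to index, one per line. start_url and max_hop_count are the
attributes named above, and the backquotes ask htdig to read the
attribute's value from that file.)

```
# htdig.conf fragment (sketch; /opt/htdig/conf/urls.txt is a made-up path)

# Read the list of documents to index from a file instead of crawling
# from a single top-level page.
start_url:      `/opt/htdig/conf/urls.txt`

# Do not follow any links found in those documents, so directory
# listings are never reached unless they appear in the list itself.
max_hop_count:  0
```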

If you happen to control the server you are accessing, you can easily
accomplish this. Make a separate server configuration that points to the
same documents but has directory indexing turned off.

For example, I use Apache. When I dig my site, I run htdig against a
separate virtual host that has directory indexing and redirects turned
off, which gives me a clean dig. You need to set up url_part_aliases so
that htdig can handle the different hostname, but it works very well. This
approach also handles the problem of dynamic footers on the pages (more of
a link checker issue).
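
As a rough illustration of that setup, and not my actual configuration:
the hostnames dig.example.com and www.example.com, the document root
path, and the *site placeholder string are all assumptions made up for
this sketch. It only shows turning off directory indexing; how redirects
get disabled depends on how they were configured in the first place, so
that part is left out.

```
# Apache: extra virtual host used only for digging (hypothetical names)
<VirtualHost *:80>
    ServerName dig.example.com
    DocumentRoot /var/www/html

    <Directory /var/www/html>
        # No auto-generated directory listings on this host, so htdig
        # never sees them and cannot index them.
        Options -Indexes
    </Directory>
</VirtualHost>
```

On the htdig side, the dig and the search would each use their own
config file. The two files give url_part_aliases different "from"
strings but the same "to" string, so URLs stored during the dig are
mapped back to the public hostname when results are displayed:

```
# Config file used by htdig (digging), sketch
start_url:          http://dig.example.com/
url_part_aliases:   http://dig.example.com/ *site

# Config file used by htsearch (searching), sketch
url_part_aliases:   http://www.example.com/ *site
```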

If you don't control the server, Gilles is right: there is not much you
can do.

Bill Carlson
Systems Programmer | Opinions are mine,
Virtual Hospital | not my employer's.
University of Iowa Hospitals and Clinics |

