htdig: Regular expressions in "exclude"? Really? (v3.1.0b4)

Gunnar Helliesen
Mon, 4 Jan 1999 20:58:08 +0100


I've been using ht://Dig for some time, but I just joined this mailing
list today.

I have a question regarding using regular expressions in "exclude". I
searched the mailing list archives and found a conversation from August
1998 where this was discussed.

On Thu, 27 Aug 1998 10:06:39 -0600, Gordon Hopper wrote:
> Maren S. Leizaola wrote:
> > I've not tried this myself but you must enter a regular
> > expression for the exclusion.
> I wondered whether it was a regular expression, because it
> doesn't say anything about it in the documentation.
> Also, I don't believe '/' is a special character in regex,
> unless it's used as the delimiter.
> Gordon

This would suggest that regular expressions work. However, I could not
get them to work in version 3.0.8b2. Then I found this:

On Thu, 27 Aug 1998 14:38:23 +0200 (MET DST), J. op den Brouw
wrote:
> On Wed, 26 Aug 1998, Gordon Hopper wrote:
> > htdig version 3.0.8b2
> >
> > Exclude doesn't seem to work at all.
> >
> > (exclude specifies a url, right?) so something like restrict=/~
> > exclude=/~ should return nothing, right? I want to be able to
> > exclude user home pages (which begin with a tilde) from my searches.
> Do you have a clean version of htdig 3.0.8b2? If so, the exclude
> is not working properly. There is a patch available at the
> htdig patch site (don't know it right now).
> --jesse

OK, after reading this I upgraded to version 3.1.0b4 today. "exclude"
now works, but not with regular expressions. I can get htsearch to
exclude a literal string anywhere in the URL, but it doesn't understand
regexps as far as I can tell.
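To make the difference concrete, here is a minimal sketch contrasting the two interpretations of "exclude". It is not htdig code: `excluded_literal` models the literal-substring behaviour observed in 3.1.0b4 (an assumption based only on the test above), `excluded_regex` models the regular-expression behaviour I am after, and both function names and the example URLs are hypothetical.

```python
import re

# Hypothetical URLs: a directory index (ends in a slash) and a page below it.
index_url = "http://www.example.com/docs/"
page_url = "http://www.example.com/docs/page.html"


def excluded_literal(url, pattern):
    # Assumed 3.1.0b4 behaviour: the pattern is a plain substring of the URL.
    return pattern in url


def excluded_regex(url, pattern):
    # Desired behaviour: the pattern is a regular expression searched in the URL.
    return re.search(pattern, url) is not None


for url in (index_url, page_url):
    print(url, excluded_literal(url, "/$"), excluded_regex(url, "/$"))
```

Under the literal interpretation "/$" only matches a URL that actually contains the two characters "/" and "$", so neither URL is excluded. Under the regex interpretation the directory index is excluded and the page below it is kept.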

Here's what I want to do:

I want to exclude all directory indices. In other words, I do not want
the following document to be returned even if it does contain one or
more of the search words:

But I _do_ want all documents below that directory containing any of the
search words to be returned. For example, this document should be
returned:
I tried setting "exclude" to "/$", "\/$" (the escaped slash shouldn't
really be necessary, should it?) and ".*/$", all with no effect.
Directory indices were still returned.
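For what it's worth, if "exclude" did take regular expressions, all three patterns should behave identically. Here is a sketch using Python's `re` module, which is only a stand-in for whatever regex engine htdig might use. `kept_urls` and the URLs are hypothetical, since the real URLs did not survive in this message. As Gordon noted, "/" is not a regex metacharacter, so "\/" is just an escaped literal slash, and a leading ".*" adds nothing to an unanchored search.

```python
import re

# Hypothetical stand-ins for the directory index and a page below it.
urls = [
    "http://www.example.com/docs/",           # directory index: should be excluded
    "http://www.example.com/docs/page.html",  # document below it: should be kept
]

# The three patterns tried above.
patterns = [r"/$", r"\/$", r".*/$"]


def kept_urls(pattern, candidates):
    # Keep only the URLs that the exclude pattern does NOT match anywhere.
    return [u for u in candidates if re.search(pattern, u) is None]


for pattern in patterns:
    print(pattern, kept_urls(pattern, urls))
```

Each pattern keeps only the page and drops the directory index, which is exactly the result I'm not getting from htsearch.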

Now what? Clues, hints, pointers and help needed!


