htdig: Regular expressions in "exclude"? Really? (v3.1.0b4)


Gunnar Helliesen (gunnar@bitcon.no)
Mon, 4 Jan 1999 20:58:08 +0100


Hi!

I've been using ht://Dig for some time, but I just joined this mailing
list today.

I have a question regarding using regular expressions in "exclude". I
searched the mailing list archives and found a conversation from August
98 where this was discussed.

On Thu, 27 Aug 1998 10:06:39 -0600, Gordon Hopper (gordon@byu.edu) wrote:
>
> Maren S. Leizaola wrote:
> > I've not tried this myself but you must enter a regular
> > expression for the exclusion.
>
>
> I wondered whether it was a regular expression, because it
> doesn't say anything about it in the documentation.
>
>
> Also, I don't believe '/' is a special character in regex,
> unless it's used as the delimiter.
>
> Gordon

This would indicate that regular expressions will work. However, I could
not get it to work on version 3.0.8b2. Then I found this:

On Thu, 27 Aug 1998 14:38:23 +0200 (MET DST), J. op den Brouw
(MSQL_User@st.hhs.nl) wrote:
>
> On Wed, 26 Aug 1998, Gordon Hopper wrote:
>
> > htdig version 3.0.8b2
> >
> > Exclude doesn't seem to work at all.
> >
> > (exclude specifies a url, right?) so something like restrict=/~
> > exclude=/~ should return nothing, right? I want to be able to
> > exclude user home pages (which begin with a tilde) from my searches.
>
>
> Do you have a clean version of htdig 3.0.8b2? If so, the exclude
> is not working properly. There is a patch available at the
> htdig patch site (don't know it right now).
>
>
> --jesse

OK, after reading this I upgraded to version 3.1.0b4 today. "exclude"
now works, but not with regular expressions. I can get htsearch to
exclude a literal string anywhere in the URL, but it doesn't understand
regexps as far as I can tell.
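
To make the observed behaviour concrete, here is a minimal Python sketch
of literal substring matching. It is only a model of what I see from the
outside, not htsearch's actual code; the function name excluded_literal
is hypothetical, and the URL is the example one used further down in this
message.

```python
def excluded_literal(url: str, pattern: str) -> bool:
    """Model of the observed behaviour: a plain substring test, no regex."""
    return pattern in url


index_url = "http://www.mydomain.com/archives/199808/"

# A literal string that occurs anywhere in the URL excludes it.
print(excluded_literal(index_url, "/archives/"))

# "/$" taken literally can only match a URL that really contains "/$",
# so under this model it excludes nothing here.
print(excluded_literal(index_url, "/$"))
```

If matching really is literal, that would explain why a pattern with a
"$" anchor has no visible effect.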

Here's what I want to do:

I want to exclude all directory indices. In other words I do not want
the following document to be returned even if it does contain one or
more of the search words:

http://www.mydomain.com/archives/199808/

but I _do_ want all documents below that directory containing any of the
search words to be returned. For example, this document should be
returned:

http://www.mydomain.com/archives/199808/msg00003.html

I tried setting "exclude" to "/$" and "\/$" (the latter shouldn't really
be necessary, should it?) and ".*/$" with no effect. Directory indices
were still returned.

Now what? Clues, hints, pointers and help needed!

Gunnar

--
Gunnar Helliesen   | Bergen IT Consult AS  | NetBSD/VAX on a uVAX II
Systems Consultant | Bergen, Norway        | '86 Jaguar Sovereign 4.2
gunnar@bitcon.no   | http://www.bitcon.no/ | '73 Mercedes 280 (240D)