[htdig] Post-processing removal of dups?

Aaron Turner (aturner@linuxkb.org)
Fri, 18 Jun 1999 10:15:53 -0700 (PDT)

Sorry for the repost, but I haven't seen my post on the list...

---------- Forwarded message ----------
Date: Mon, 14 Jun 1999 10:10:14 -0700 (PDT)
From: Aaron Turner <aturner@linuxkb.org>
To: Geoff Hutchison <ghutchis@wso.williams.edu>
Cc: htdig@htdig.org
Subject: Re: [htdig] more powerful restrict

Thanks Geoff. You can add me to the list of 'very interested people in
backporting the regexp' to the 3.1.x series.

On a similar note, I'm facing a major dilemma. Basically, I have a SQL DB
with content that is accessed via PHP. Each "article" in the DB has a URL


where x, a, b, c, d are positive integers. Basically, the id is a unique
identifier for the article, and loc is its location in the 'tree'. Each
article can be in one or more places in the tree. So:


point to the same content (actually the headers of the page change a
little, but that's not important), but in different places, much like a
link in a Unix filesystem.

We're using restrict to enable 'drill down' searching, so you can drill
down the tree to 1.3 and run a search. Hence if you drill down to 1.3.4
you'll get one hit, but if you only go to 1.3 you'll get two hits.
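To make the drill-down behaviour concrete, here is a small Python simulation. The real URL format isn't shown above, so the example.com URLs, the `id`/`loc` query parameter names, and the helpers `loc_of` and `drill_down` are all hypothetical. htsearch's restrict works on URL patterns; this sketch models drill-down more simply as a prefix test on the dotted `loc` value, which is enough to show why one article yields two hits at 1.3 but only one at 1.3.4.

```python
from urllib.parse import urlparse, parse_qs

# Hypothetical URLs: assumes query parameters "id" (article) and "loc"
# (position in the tree). Article 7 is linked from two places under 1.3.
URLS = [
    "http://example.com/article.php?id=7&loc=1.3.4",
    "http://example.com/article.php?id=7&loc=1.3.5",
    "http://example.com/article.php?id=9&loc=2.1",
]


def loc_of(url: str) -> str:
    """Return the tree location stored in the URL's "loc" parameter."""
    return parse_qs(urlparse(url).query)["loc"][0]


def drill_down(urls: list[str], prefix: str) -> list[str]:
    """Keep URLs whose location is `prefix` or lies beneath it in the tree.

    Compares dotted components, so "1.3" matches "1.3.4" but not "1.34".
    """
    want = prefix.split(".")
    return [u for u in urls if loc_of(u).split(".")[: len(want)] == want]


print(len(drill_down(URLS, "1.3")))    # 2: the same article, twice
print(len(drill_down(URLS, "1.3.4")))  # 1
```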

Since I'm already wrapping htsearch in a mod_perl front end, I figured I'd
just let mod_perl strip out the duplicates. The problem is that while this
is easy to do, there's no way to set the correct total number of hits for
a search whose results span more than one page. There's also no way for
the user to say matchesperpage=25 and be guaranteed 25 hits per page,
since many of the hits may be duplicates. The result is that the user
sees something like:

Search results 1 - 25 of 54

But sees only 13 results on the page, thinks something broke, and keeps
reloading the page over and over again.
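The mismatch comes from deduplicating one page at a time. A minimal Python sketch of one possible alternative follows: deduplicate the whole result list first, then paginate and compute the total from the unique list. It assumes the wrapper can obtain the complete, rank-ordered list of matching URLs from htsearch and that each URL carries the article id in an `id` parameter; the URLs and the helpers `article_id`, `dedupe`, and `paginate` are hypothetical, not part of htdig.

```python
import math
from urllib.parse import urlparse, parse_qs


def article_id(url: str) -> str:
    """Hypothetical dedup key: assumes the article id is the "id" parameter."""
    return parse_qs(urlparse(url).query)["id"][0]


def dedupe(urls: list[str]) -> list[str]:
    """Keep the first URL seen for each article id, preserving rank order."""
    seen: set[str] = set()
    unique = []
    for url in urls:
        key = article_id(url)
        if key not in seen:
            seen.add(key)
            unique.append(url)
    return unique


def paginate(urls: list[str], page: int, per_page: int) -> tuple[list[str], int, int]:
    """Return (hits on this 1-based page, total unique hits, page count)."""
    unique = dedupe(urls)
    start = (page - 1) * per_page
    pages = max(1, math.ceil(len(unique) / per_page))
    return unique[start:start + per_page], len(unique), pages


# Toy data: six raw hits covering four distinct articles.
raw = [
    "http://example.com/a.php?id=1&loc=1.3.4",
    "http://example.com/a.php?id=1&loc=1.3.5",
    "http://example.com/a.php?id=2&loc=1.3",
    "http://example.com/a.php?id=3&loc=1.3.4",
    "http://example.com/a.php?id=3&loc=1.3.6",
    "http://example.com/a.php?id=4&loc=1.3.7",
]

# Deduplicating only the first 3 raw hits would show 2 results under a
# "1 - 3 of 6" banner; deduplicating everything first is consistent:
hits, total, pages = paginate(raw, page=1, per_page=3)
print(len(hits), total, pages)  # 3 4 2
```

With this approach the "of N" figure and the page links would have to come from the wrapper rather than from htsearch's own output.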

I'm at a loss for a good way to fix this. Any ideas are greatly
appreciated.


On Mon, 14 Jun 1999, Geoff Hutchison wrote:

> On Sun, 13 Jun 1999, Aaron Turner wrote:
> > Where you mention a regexp-powered restrict for htsearch and that it was
> > "almost finished". I was wondering if you had any idea about when that
> > feature would be released. I'm hoping to avoid a really ugly hack of
> > using an htsearch parser to create this functionality.
> It is finished in the 3.2 development tree. I'm not sure how much I want
> to backport it, but if there's significant interest, I'll do so.
> -Geoff Hutchison


This archive was generated by hypermail 2.0b3 on Fri Jun 18 1999 - 09:36:17 PDT