Re: [htdig] Post-processing removal of dups?

Gilles Detillieux
Fri, 18 Jun 1999 13:11:17 -0500 (CDT)

According to Aaron Turner:
> Sorry for the repost, but I haven't seen my post on the list...
> Thanks Geoff. Well, add me to the list of people 'very interested in
> backporting the regexp' to the 3.1.x series.
> On a similar note, I'm having a major dilemma. Basically I have a SQL DB
> with content that is accessed via PHP. Each "article" in the DB has a URL
> like:
> /articles/article.php3?id=x&loc=a.b.c.d
> where x, a, b, c, d are positive integers. Basically the id is a unique
> identifier for the article, and loc is the location in the 'tree'. Each
> article can be in 1 or more places in the tree. So:
> /articles/article.php3?id=11&loc=
> /articles/article.php3?id=11&loc=
> point to the same content (actually the headers of the page change a
> little, but that's not important), but in different places. Just like a
> link in the Unix filesystem works.
> We're using restrict to enable 'drill down' in searching. So you can
> drill down the tree to 1.3 and run a search. Hence if you drill down to
> 1.3.4 you'll get one hit, but if you only go to 1.3 you'll get two hits.
> Since I'm already wrapping htsearch in a mod_perl front end, I figured I'd
> just let mod_perl strip out the duplicates. Problem is that while this is
> easy to do, there's no way to properly set the total # of hits for a
> search if the results span more than one page. Also there's no way for
> the user to say matchesperpage=25 and be guaranteed 25 hits per page
> since many of the hits may be duplicates. The result is that the user
> sees something like:
> Search results 1 - 25 of 54
> But only sees 13 results on the page, and thinks something broke and keeps
> reloading the page over and over again.
> I'm at a loss for a good way to fix this. Any ideas are greatly
> appreciated.

I think your best bet would be to customize Display::buildMatchList() in
htsearch/ to do what you want. This would allow you to weed out
the duplicates you want to exclude before they're counted and paginated.
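
To illustrate the ordering (filter first, then count and paginate), here is
a rough sketch in Python rather than in htsearch's own C++. It is not
htsearch code: the function names article_id, dedupe and paginate are
made up for the example, the loc values in the usage note are invented,
and it assumes that two URLs with the same "id" parameter are the same
article and that the match list is already in ranking order.

```python
from urllib.parse import urlsplit, parse_qs


def article_id(url):
    """Return the 'id' query parameter, or the whole URL if there is none.

    Assumption: the 'id' parameter alone identifies the article.
    """
    ids = parse_qs(urlsplit(url).query).get("id")
    return ids[0] if ids else url


def dedupe(matches):
    """Keep only the first (best-ranked) match for each article id."""
    seen = set()
    unique = []
    for url in matches:
        key = article_id(url)
        if key not in seen:
            seen.add(key)
            unique.append(url)
    return unique


def paginate(matches, page, per_page):
    """Drop duplicates first, then count and slice the requested page.

    Returns (total_unique_hits, hits_on_this_page); pages start at 1.
    """
    unique = dedupe(matches)
    start = (page - 1) * per_page
    return len(unique), unique[start:start + per_page]
```

Called as paginate(matches, 1, 25) on a list where article 11 appears
under two hypothetical locations (say loc=1.3.4 and loc=1.3.7), the total
counts that article once, and a page only comes up short when fewer
unique hits remain than matchesperpage.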

Gilles R. Detillieux              E-mail: <>
Spinal Cord Research Centre       WWW:
Dept. Physiology, U. of Manitoba  Phone:  (204)789-3766
Winnipeg, MB  R3E 3J7  (Canada)   Fax:    (204)789-3930

This archive was generated by hypermail 2.0b3 on Fri Jun 18 1999 - 10:31:16 PDT