Re: [htdig] Post-processing removal of dups?


Gilles Detillieux (grdetil@scrc.umanitoba.ca)
Fri, 18 Jun 1999 13:11:17 -0500 (CDT)


According to Aaron Turner:
> Sorry for the repost, but I haven't seen my post on the list...
>
> Thanks Geoff. Well, add me to the list of 'very interested people in
> backporting the regexp' to the 3.1.x series.
>
> On a similar note, I'm having a major dilemma. Basically I have a SQL DB
> with content that is accessed via PHP. Each "article" in the DB has a URL
> like:
>
> /articles/article.php3?id=x&loc=a.b.c.d
>
> where x, a, b, c, d are positive integers. Basically the id is a unique
> identifier for the article, and loc is the location in the 'tree'. Each
> article can be in 1 or more places in the tree. So:
>
> /articles/article.php3?id=11&loc=1.3.4.10
> /articles/article.php3?id=11&loc=1.3.5.7
>
> point to the same content (actually the headers of the page change a
> little, but that's not important), but in different places. Just like a
> link in the Unix filesystem works.
>
> We're using restrict to enable 'drill down' in searching. So you can
> drill down the tree to 1.3 and run a search. Hence if you drill down to
> 1.3.4 you'll get one hit, but if you only go to 1.3 you'll get two hits.
>
> Since I'm already wrapping htsearch in a mod_perl front end, I figured I'd
> just let mod_perl strip out the duplicates. Problem is that while this is
> easy to do, there's no way to properly set the total # of hits for a
> search if the results span more than one page. Also there's no way for
> the user to say matchesperpage=25 and be guaranteed 25 hits per page
> since many of the hits may be duplicates. The result is that the user
> sees something like:
>
> Search results 1 - 25 of 54
>
> But only sees 13 results on the page, and thinks something broke and keeps
> reloading the page over and over again.
>
> I'm at a loss for a good way to fix this. Any ideas are greatly
> appreciated.

I think your best bet would be to customize Display::buildMatchList() in
htsearch/Display.cc to do what you want. This would allow you to weed out
the duplicates you want to exclude before they're counted and paginated.
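
To illustrate the idea, here is a rough, standalone sketch of that kind of
filtering. It is not the actual buildMatchList() code: the Match struct and
the articleId(), dropDuplicateArticles() and pageOf() helpers are hypothetical
names made up for this example, and it assumes the matches are already sorted
by score and that two URLs are duplicates when their "id" parameter is the
same. The point is only that duplicates are dropped first, so the total and
the page slices are both computed from the filtered list.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <unordered_set>
#include <vector>

// Hypothetical stand-in for a search match; ht://Dig's own match class differs.
struct Match {
    std::string url;
    double score;
};

// Return the value of the "id" query parameter, or "" if the URL has none.
std::string articleId(const std::string& url) {
    std::size_t pos = url.find('?');
    while (pos != std::string::npos) {
        std::size_t start = pos + 1;
        std::size_t end = url.find('&', start);
        std::string param = url.substr(
            start, end == std::string::npos ? std::string::npos : end - start);
        if (param.compare(0, 3, "id=") == 0) return param.substr(3);
        pos = end;  // move on to the next parameter, if any
    }
    return "";
}

// Keep only the first (best-ranked) match for each article id.
// Assumes `sorted` is already ordered best match first.
std::vector<Match> dropDuplicateArticles(const std::vector<Match>& sorted) {
    std::unordered_set<std::string> seen;
    std::vector<Match> unique;
    for (const Match& m : sorted) {
        std::string id = articleId(m.url);
        // URLs without an id are never treated as duplicates of each other.
        if (id.empty() || seen.insert(id).second) unique.push_back(m);
    }
    return unique;
}

// Slice one page (1-based) out of the already de-duplicated list.
std::vector<Match> pageOf(const std::vector<Match>& unique,
                          std::size_t page, std::size_t perPage) {
    if (page == 0 || perPage == 0) return {};
    std::size_t first = (page - 1) * perPage;
    if (first >= unique.size()) return {};
    std::size_t last = std::min(first + perPage, unique.size());
    return std::vector<Match>(unique.begin() + first, unique.begin() + last);
}
```

With this arrangement, the "of N" total shown to the user would be the size
of the list returned by dropDuplicateArticles(), and every page except
possibly the last holds exactly matchesperpage entries.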

-- 
Gilles R. Detillieux              E-mail: <grdetil@scrc.umanitoba.ca>
Spinal Cord Research Centre       WWW:    http://www.scrc.umanitoba.ca/~grdetil
Dept. Physiology, U. of Manitoba  Phone:  (204)789-3766
Winnipeg, MB  R3E 3J7  (Canada)   Fax:    (204)789-3930


