Re: [htdig] Post-processing removal of dups?


Aaron Turner (aturner@linuxkb.org)
Sat, 19 Jun 1999 01:51:37 -0700 (PDT)


On Fri, 18 Jun 1999, Gilles Detillieux wrote:

> I think your best bet would be to customize Display::buildMatchList() in
> htsearch/Display.cc to do what you want. This would allow you to weed out
> the duplicates you want to exclude before they're counted and paginated.

Well, I had a friend who knows C and some C++, and he's gagging on the code.
Any pointers? Any chance of convincing any of the developers that this is
important? :-)

The way I envision this is two new parameters passed to htsearch.
First, 'uniqueid', which would be the name of another parameter in the CGI
string. Example:

/cgi-bin/file?id=124&c=1.2.4.6&uniqueid=id&uniqueroot=/cgi-bin/file&...
/cgi-bin/file?id=124&c=1.3.5.6&uniqueid=id&uniqueroot=/cgi-bin/file&...
/cgi-bin/file?id=127&c=1.3.5.6&uniqueid=id&uniqueroot=/cgi-bin/file&...

htsearch would use the values of uniqueid and uniqueroot to determine the
uniqueness of a URL. Any two hits that start with uniqueroot and have the
same value for the parameter named by uniqueid (in this case 'id') are
considered duplicates. In the case above, #1 and #2 are dupes, but #3 is
unique.
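To make the proposed rule concrete, here is a minimal standalone C++ sketch of the filtering that Display::buildMatchList() would need to do. It is not htdig code; paramValue() and dedup() are hypothetical helpers, and the result list is modeled as plain URL strings for illustration:

```cpp
#include <cassert>
#include <set>
#include <string>
#include <vector>

// Extract the value of a named CGI parameter from a URL's query string.
// Returns an empty string if the parameter is absent.
std::string paramValue(const std::string &url, const std::string &name) {
    std::string::size_type q = url.find('?');
    if (q == std::string::npos)
        return "";
    std::string query = url.substr(q + 1);
    std::string::size_type pos = 0;
    while (pos < query.size()) {
        std::string::size_type amp = query.find('&', pos);
        if (amp == std::string::npos)
            amp = query.size();
        std::string pair = query.substr(pos, amp - pos);
        std::string::size_type eq = pair.find('=');
        if (eq != std::string::npos && pair.substr(0, eq) == name)
            return pair.substr(eq + 1);
        pos = amp + 1;
    }
    return "";
}

// Keep only the first hit for each value of the uniqueid parameter among
// URLs that start with uniqueroot; other URLs pass through untouched.
std::vector<std::string> dedup(const std::vector<std::string> &urls,
                               const std::string &uniqueroot,
                               const std::string &uniqueid) {
    std::set<std::string> seen;
    std::vector<std::string> out;
    for (const std::string &url : urls) {
        if (url.compare(0, uniqueroot.size(), uniqueroot) == 0) {
            std::string key = paramValue(url, uniqueid);
            if (!key.empty() && !seen.insert(key).second)
                continue;  // same id under the same root: a duplicate, skip it
        }
        out.push_back(url);
    }
    return out;
}
```

Applied to the three example URLs with uniqueroot=/cgi-bin/file and uniqueid=id, this keeps #1 and #3 and drops #2. Doing the filtering at this stage, before counting and pagination, is what Gilles's suggestion about buildMatchList() would buy.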

The whole point of this is to support dynamic sites that use DBs as their
backend and something like PHP to access the content.

-Aaron

------------------------------------
To unsubscribe from the htdig mailing list, send a message to
htdig@htdig.org containing the single word "unsubscribe" in
the SUBJECT of the message.



This archive was generated by hypermail 2.0b3 on Sat Jun 19 1999 - 01:14:22 PDT