htdig: Sorting results on date (3)


htDig user (htdig@as.westblaak.spirit.nl)
Wed, 16 Dec 1998 16:20:49 +0100 (CET)


I'm still working on my own fix for this date-sorting stuff :-)

I'm working with an index of about 12000 pages. I want to sort them by
date since it concerns the pages of a newspaper :-)

So, I used Gilles' patch (in combination with snapshot 111598).

> Memory: Real: 51M/122M act/tot Virtual: 45M/256M use/tot Free: 632K
> PID USERNAME PRI NICE SIZE RES STATE TIME CPU COMMAND
> 23979 user 42 0 47M 43M WAIT 0:12 17.90% htsearch

Size: 47M (!), free: 632K ... *wowie* This happened when I tried
to retrieve ALL documents from the database (12000). Htsearch isn't able
to sort this many results on date.
I think time_t (or anyway, compareTime) is the problem.
BTW, I pressed CTRL-C when it reached 47M... I'm sure htsearch would
have ended in a core-dump otherwise...

When I run htsearch from the prompt, it asks me for a 'value for sort'.
When I enter 'date' and search on something that should return about 40
results, htsearch DOESN'T sort on date! (Does this have to do with the
snapshot release I use?)

In Display.cc::sort:

   char str[80];
   ResultMatch **array = new ResultMatch*[numberOfMatches];

+ if (numberOfMatches>1000) numberOfMatches=1000;

----

+ for(j=0; j < numberOfMatches; j++)
+ {
+ array[j]->setRef(docDB[array[j]->getURL()]);
+ }

   matches->Release();

   qsort((char *) array, numberOfMatches, sizeof(ResultMatch *),
          Display::compare);

In Display.cc::compare:

int
Display::compare(const void *a1, const void *a2)
{
 /* I use this to sort on date.. don't care about Scores or so...*/
    char buffer1[100];
    char buffer2[100];

    ResultMatch *m1 = *((ResultMatch **) a1);
    ResultMatch *m2 = *((ResultMatch **) a2);

    time_t t1 = m1->getRef()->DocTime();
    struct tm *tm1 = localtime(&t1);
    strftime(buffer1,sizeof(buffer1),"%Y%j",tm1);
    time_t t2 = m2->getRef()->DocTime();
    struct tm *tm2 = localtime(&t2);

    strftime(buffer2,sizeof(buffer2),"%Y%j",tm2);

    return (atol(buffer2)-atol(buffer1));
}
 
I know this is an ugly piece of code :-) Don't bother me with that!

What I do here is (maybe stupid) as follows: I take the four-digit year
and the number of the day within that year (%Y%j, 1998364 for example)
and compare those as numbers. Gilles' patch is better on this I suppose..
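
For comparison, here is a minimal sketch of the same year-plus-day-of-year
key built straight from the struct tm fields, so no strftime/atol round
trip is needed in every compare call. DatedMatch, dayKey, dayKeyFromTime
and sortNewestFirst are names made up for this sketch; they are not part
of ht://Dig, and this is not Gilles' patch. The comparator also returns
-1/0/1 instead of a subtraction, so the long difference is never
squeezed into an int.

```cpp
#include <cstddef>
#include <cstdlib>
#include <ctime>

// Hypothetical stand-in for a search match; NOT ht://Dig's ResultMatch.
struct DatedMatch {
    int  id;      // illustrative identifier
    long dayKey;  // YYYYDDD, computed once per match
};

// Same number that atol() yields on a strftime "%Y%j" string:
// tm_year counts from 1900 and tm_yday from 0, while %j counts from 001.
long dayKey(const std::tm &t)
{
    return (t.tm_year + 1900L) * 1000L + (t.tm_yday + 1);
}

// Convenience wrapper using local time, as the routine above does.
long dayKeyFromTime(std::time_t when)
{
    const std::tm *t = std::localtime(&when);
    return t ? dayKey(*t) : 0;
}

// qsort-style comparator: newest day first.
int compareNewestFirst(const void *a, const void *b)
{
    const DatedMatch *m1 = static_cast<const DatedMatch *>(a);
    const DatedMatch *m2 = static_cast<const DatedMatch *>(b);
    if (m1->dayKey > m2->dayKey) return -1;
    if (m1->dayKey < m2->dayKey) return 1;
    return 0;
}

// Sort an array of matches in place, newest day first.
void sortNewestFirst(DatedMatch *matches, std::size_t count)
{
    std::qsort(matches, count, sizeof(DatedMatch), compareNewestFirst);
}
```

With the key stored per match, qsort only compares two longs per call.
Whether that changes the memory picture at all is a separate question
that this sketch does not answer.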

Using the routine above, I'm able to QUICKLY sort about 1000 documents by
date. Therefore, I have to build a limit into htsearch so that it only
displays 1000 matches, even if it found 12000...

The 1000-limit is because time_t eats too much mem (not really sure, but
when I comment it out htsearch doesn't give me a core-dump)
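
One thing to keep in mind with a hard limit: if the matches are not
already in date order when numberOfMatches is clamped, the 1000 that
survive are whichever come first in the array, not necessarily the 1000
newest. A rough sketch of picking the newest N with std::partial_sort
follows. This is a different technique from the qsort call above, and it
still needs a document time for every match, so it assumes those times
can be had cheaply. DatedHit and keepNewest are hypothetical names, not
ht://Dig code.

```cpp
#include <algorithm>
#include <cstddef>
#include <ctime>
#include <vector>

// Hypothetical stand-in for a match plus its document time.
struct DatedHit {
    int         id;       // illustrative identifier
    std::time_t docTime;  // modification time of the document
};

// Ordering predicate: true when a is newer than b.
static bool newerThan(const DatedHit &a, const DatedHit &b)
{
    return a.docTime > b.docTime;
}

// Keep only the `limit` newest hits, sorted newest first.
void keepNewest(std::vector<DatedHit> &hits, std::size_t limit)
{
    if (hits.size() > limit) {
        // Only the first `limit` positions end up fully ordered.
        std::partial_sort(hits.begin(), hits.begin() + limit,
                          hits.end(), newerThan);
        hits.resize(limit);
    } else {
        std::sort(hits.begin(), hits.end(), newerThan);
    }
}
```

Called as keepNewest(hits, 1000), this leaves at most 1000 entries in
the vector, newest first.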

HtDig is great :-)




This archive was generated by hypermail 2.0b3 on Sat Jan 02 1999 - 16:29:52 PST