Re: [htdig] Indexing big files.


Subject: Re: [htdig] Indexing big files.
From: Gilles Detillieux (grdetil@scrc.umanitoba.ca)
Date: Mon Dec 13 1999 - 13:53:14 PST


According to Petr Ferdus:
> On Mon, 13 Dec 1999, Gilles Detillieux wrote:
> > According to Petr Ferdus:
> > > Is there a way to force htdig to produce more than one result match per
> > > file? In my case I have indexed just a few rather big files
> > > (approx. 2MB each). Search often showed links to all of them with
> > > the excerpts showing first occurrence of searched string. More usable
> > > output would show all occurrences of searched string located within
> > > documents. (with reasonable upper limit)
> > > Can anyone recommend what might bring richer output.
> >
> > Nothing right now. This feature has been requested once before, but I
> > don't know if anyone is currently working on it.
>
> Thank you for quick answer.
>
> Does it mean that you can actually 'dig' from the htdig databases only
> excerpts related to the first occurrence of any given word, or is the feature
> of viewing other word occurrences just not provided by htsearch? (but can be
> implemented, because necessary data are stored there)

All the words of a document get put into the word database. How much of
the document can be used for excerpt highlighting is controlled by the
max_head_length attribute - set it high enough and the entire document
(minus formatting tags) gets stored in the database for use in document
excerpt display.
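
As a rough illustration, the relevant lines in the htdig configuration
file might look like the fragment below. The values are assumptions
picked for files of about 2MB, not recommendations, and the documents
have to be re-indexed after max_head_length is changed before the
stored text reflects it.

```
# Illustrative values only (assumed for ~2MB documents).
# Store enough of each document to cover the whole file.
max_head_length: 3000000
# Number of characters shown around the first matched word.
excerpt_length: 300
```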

The limitation right now is that htsearch will only show one
document excerpt per matching document in a search. It will show
the excerpt_length characters surrounding the first matched word
in the document. If there are other matching words in that excerpt,
it'll highlight them too, but it won't go further than that. You could
of course increase excerpt_length, to get one really big excerpt, but
that's rarely what you want.
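
To make the difference concrete, here is a minimal Python sketch of the
behaviour described above next to the requested one. It is not
htsearch's code (htsearch is written in C++). The function names, the
centring of the excerpt on the match and the limit parameter are all
assumptions made for illustration.

```python
import re


def match_positions(text, word):
    """Start offsets of every case-insensitive occurrence of word."""
    if not word:
        return []
    return [m.start() for m in re.finditer(re.escape(word), text, re.IGNORECASE)]


def first_excerpt(text, word, excerpt_length=300):
    """Roughly the current behaviour: one excerpt around the first match."""
    positions = match_positions(text, word)
    if not positions:
        return None
    # Assumption: the excerpt is centred on the match.
    start = max(0, positions[0] - excerpt_length // 2)
    return text[start:start + excerpt_length]


def all_excerpts(text, word, excerpt_length=300, limit=5):
    """Hypothetical extension: several excerpts per document, up to limit."""
    excerpts = []
    covered_until = 0  # end offset of the last excerpt emitted
    for pos in match_positions(text, word):
        if len(excerpts) >= limit:
            break
        if pos < covered_until:
            # This match starts inside the previous excerpt; skip it.
            continue
        start = max(0, pos - excerpt_length // 2)
        excerpts.append(text[start:start + excerpt_length])
        covered_until = start + excerpt_length
    return excerpts
```

With a long document containing the search word in two places far
apart, first_excerpt() returns only the text around the first one, while
all_excerpts() returns one excerpt for each, which is the kind of output
being asked for.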

The suggestion was made, back in October, to extend htsearch to display
more than one excerpt per document. Jim Cole said he'd give it a shot,
but I haven't heard back from him about this. You can find that thread
in the mailing list archives:

        http://www.htdig.org/cgi-bin/htsearch?words=multiple+hits+document

-- 
Gilles R. Detillieux              E-mail: <grdetil@scrc.umanitoba.ca>
Spinal Cord Research Centre       WWW:    http://www.scrc.umanitoba.ca/~grdetil
Dept. Physiology, U. of Manitoba  Phone:  (204)789-3766
Winnipeg, MB  R3E 3J7  (Canada)   Fax:    (204)789-3930




This archive was generated by hypermail 2b28 : Mon Dec 13 1999 - 14:06:46 PST