Juergen Unger (j.unger@choin.net)
Fri, 19 Jun 1998 00:24:33 +0200
Hi !
> > I am trying to customize ht://Dig a bit for my usage and do have some
> > questions: Is it possible to tell ht://Dig that it should _not_
> > index text which is in between <a href...> and </a> tags ?
> No. To do that you will need to modify HTML.cc and set the weight of words
> there to 0.
I solved it in a slightly different way. Here is the diff:
------------------------------------------------------------------------------
*** HTML.cc.orig Thu Jun 18 09:25:46 1998
--- HTML.cc Fri Jun 19 00:38:53 1998
***************
*** 238,244 ****
  {
  word.lowercase();
  word.remove(valid_punctuation);
! if (word.length() >= minimumWordLength)
  {
  retriever.got_word(word,
  int(offset * 1000 / contents->length()),
--- 238,244 ----
  {
  word.lowercase();
  word.remove(valid_punctuation);
! if ((word.length() >= minimumWordLength) && (in_ref == 0))
  {
  retriever.got_word(word,
  int(offset * 1000 / contents->length()),
------------------------------------------------------------------------------
We found that it doesn't make sense to put the text of links into
the index too. Someone who searches for a specific word normally
wants to find the pages that contain the information, not the pages
that merely link to them ;-)
But it might be best to go a step further and make this configurable,
so that the indexing of link text can be switched on or off from the
config file.
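One way such a switch could look, as a minimal stand-alone sketch. The attribute name `index_link_text`, the `Config` map and both helper functions are hypothetical and not part of ht://Dig; only `in_ref` and the minimum word length correspond to names in the diff above. Defaulting to "index link text" when the attribute is missing would keep the old behaviour.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>

// Hypothetical stand-in for the parsed config file: attribute name -> value.
using Config = std::map<std::string, std::string>;

// Returns true when the attribute is set to "true", "yes" or "1".
// Falls back to default_value when the attribute is not present at all.
bool config_boolean(const Config &config, const std::string &name,
                    bool default_value)
{
    auto it = config.find(name);
    if (it == config.end())
        return default_value;
    const std::string &value = it->second;
    return value == "true" || value == "yes" || value == "1";
}

// Same test as in the patched HTML.cc, but the in_ref check only
// applies when indexing of link text has been switched off.
bool should_index_word(std::size_t word_length,
                       std::size_t minimum_word_length,
                       int in_ref,            // non-zero while inside <a href...> ... </a>
                       bool index_link_text)  // value of the hypothetical config switch
{
    assert(in_ref >= 0);
    if (word_length < minimum_word_length)
        return false;
    return index_link_text || in_ref == 0;
}
```

In HTML.cc the condition of the `if` would then call something like `should_index_word(word.length(), minimumWordLength, in_ref, index_link_text)`, with the flag read once at startup rather than for every word.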
Another important question for me:
What would be the best way to change the code so that the excerpt
is taken from the 'description' meta tag, if it exists, instead of
from the body text? I need to implement this.
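A rough sketch of the selection logic in question, independent of ht://Dig's real classes. `choose_excerpt` and `trim` are made-up names, and it is assumed that the parser has already collected the content of the 'description' meta tag (empty if the page has none) as well as the body text.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Strip leading and trailing whitespace from s.
std::string trim(const std::string &s)
{
    const char *whitespace = " \t\r\n";
    std::size_t first = s.find_first_not_of(whitespace);
    if (first == std::string::npos)
        return "";
    std::size_t last = s.find_last_not_of(whitespace);
    return s.substr(first, last - first + 1);
}

// Prefer the meta description; fall back to the start of the body text
// when the description is missing or contains only whitespace.
// The result is cut to at most max_length characters.
std::string choose_excerpt(const std::string &meta_description,
                           const std::string &body_text,
                           std::size_t max_length)
{
    assert(max_length > 0);
    std::string description = trim(meta_description);
    if (!description.empty())
        return description.substr(0, max_length);
    return body_text.substr(0, max_length);
}
```

Where this decision belongs in the real code (while parsing the document or while building the search results) is left open here.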
Thanks in advance,
-Juergen-Unger-
-- CHOIN! HCT GmbH -- http://www.choin.net