Re: [htdig] Description-problem

Gilles Detillieux
Wed, 10 Mar 1999 10:50:06 -0600 (CST)

According to Antti Rauramo:
> I'm confused: I have the following three test pages:
> with just links to the two latter pages on the index page, and the
> htsearch at
> A copy of the nm2.conf is at
> Now, if you try searching with a word like "iskusana", you'll see that
> the resulting $(DESCRIPTION) is empty, and the $(DESCRIPTIONS) has an
> empty slot, and (valid?) punctuation missing.
> It's obvious that there absolutely are no more than a single link
> pointing to the files, so it seems that the $(DESCRIPTION) is not
> showing the first link text as it should.
> The question is: WHY?!? Help is appreciated! Htdig is 3.1.0 on Solaris
> 2.6.

Very good question. I've looked over the code, and I can't make a lot
of sense out of it. The spaces and punctuation are stripped out of the
$(DESCRIPTIONS) entry because of the

        <meta name="robots" content="noindex">

tag in the index page. For whatever reason, htdig/ still collects
href tags and their description words when indexing is turned off, but
doesn't collect the spaces and punctuation between words in this case.
This seems inconsistent - it ought to collect either the whole description,
or none at all.
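
To make the inconsistency concrete, here is a toy Python model of the
behaviour described above. It is only an illustration, not htdig's actual
code: the function name collect_description and the indexing_enabled flag
are made up for this sketch.

```python
import re


def collect_description(anchor_text, indexing_enabled):
    """Toy model of the observed behaviour; NOT htdig's real code."""
    parts = []
    # Split the link text into words and the separators between them.
    for token in re.findall(r"\w+|\W+", anchor_text):
        if re.match(r"\w", token):
            parts.append(token)      # words are collected either way
        elif indexing_enabled:
            parts.append(token)      # separators kept only while indexing
    return "".join(parts)


print(collect_description("Hello, big world!", True))   # Hello, big world!
print(collect_description("Hello, big world!", False))  # Hellobigworld
```

With the flag off, the words survive but the spaces and punctuation between
them do not, which matches the symptom seen in the $(DESCRIPTIONS) entry.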

I couldn't figure out from the code why there was an empty slot in
$(DESCRIPTIONS). As far as I can see, it only adds empty descriptions
for tags like these:

        <meta http-equiv=refresh content="url=...">
        <frame src="...">
        <area href="...">
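
As a rough illustration of why those tags would give empty entries, here is
a small Python sketch built on the standard html.parser module. It is an
assumption-laden toy, not htdig's parser: the class name LinkDescriptions
and its links attribute are hypothetical. Only <a href> has link text to
record; the other three tags carry a URL but no text.

```python
from html.parser import HTMLParser


class LinkDescriptions(HTMLParser):
    """Toy model: records (url, description) pairs. NOT htdig's parser."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._href = None   # URL of the <a> currently open, if any
        self._text = []     # text seen inside that <a>

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if tag == "a" and "href" in a:
            self._href, self._text = a["href"], []
        elif tag == "frame" and "src" in a:
            self.links.append((a["src"], ""))    # no link text available
        elif tag == "area" and "href" in a:
            self.links.append((a["href"], ""))   # no link text available
        elif tag == "meta" and (a.get("http-equiv") or "").lower() == "refresh":
            content = a.get("content") or ""
            url = content.split("url=", 1)[-1]
            self.links.append((url, ""))         # no link text available

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text)))
            self._href = None


parser = LinkDescriptions()
parser.feed('<a href="a.html">First page</a>'
            '<frame src="b.html"><area href="c.html">'
            '<meta http-equiv=refresh content="0; url=d.html">')
parser.close()
print(parser.links)
```

In this sketch only the a.html entry gets a non-empty description; the
frame, area and meta refresh links all come out with empty strings.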

Also, for whatever reason, htsearch seems to set $(DESCRIPTION) to the
second entry in the $(DESCRIPTIONS) list. I really don't know why that
is. Maybe the author of the code for that feature could shed some light.

Gilles R. Detillieux
Spinal Cord Research Centre
Dept. Physiology, U. of Manitoba  Phone:  (204)789-3766
Winnipeg, MB  R3E 3J7  (Canada)   Fax:    (204)789-3930

This archive was generated by hypermail 2.0b3 on Mon Mar 15 1999 - 08:57:45 PST