Gilles Detillieux (email@example.com)
Wed, 10 Mar 1999 10:50:06 -0600 (CST)
According to Antti Rauramo:
> I'm confused: I have the following three test pages:
> with just links to the two latter pages on the index page, and the
> htsearch at
> A copy of the nm2.conf is at
> Now, if you try searching with a word like "iskusana", you'll see that
> the resulting $(DESCRIPTION) is empty, and the $(DESCRIPTIONS) has an
> empty slot, and (valid?) punctuation missing.
> It's obvious that there absolutely are no more than a single link
> pointing to the files, so it seems that the $(DESCRIPTION) is not
> showing the first link text as it should.
> The question is: WHY?!? Help is appreciated! Htdig is 3.1.0 on Solaris
Very good question. I've looked over the code, and I can't make a lot
of sense out of it. The spaces and punctuation are stripped out of the
$(DESCRIPTIONS) entry because of the
<meta name="robots" content="noindex">
tag in the index page. For whatever reason, htdig/HTML.cc still collects
links (<a href=...> tags) and their description words when indexing is
turned off, but doesn't collect the spaces and punctuation between those
words. This seems inconsistent: it ought to collect either the whole
description or none at all.
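To make the symptom concrete, here is a minimal Python sketch of the behaviour as I understand it. It is not the actual htdig/HTML.cc logic; the function name collect_description, the indexing_enabled flag, and the sample link text are all hypothetical, chosen only to show how keeping words while dropping separators produces a run-together description.

```python
import re

# Hypothetical model of the behaviour described above. This is NOT the
# real htdig/HTML.cc code, only an illustration of the symptom.
TOKEN = re.compile(r"\w+|\W+")  # alternating runs of word / non-word chars


def collect_description(link_text, indexing_enabled):
    """Build a link description from anchor text.

    With indexing enabled, every token is kept. With indexing disabled
    (as with a robots "noindex" meta tag), only word tokens are kept,
    so the spaces and punctuation between them are lost.
    """
    parts = []
    for token in TOKEN.findall(link_text):
        is_word = token[0].isalnum() or token[0] == "_"
        if is_word or indexing_enabled:
            parts.append(token)
    return "".join(parts)


print(collect_description("Test page, number two", True))
print(collect_description("Test page, number two", False))
```

A consistent collector would either return the full text in both cases or return nothing at all when indexing is off, rather than this in-between result.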
I couldn't figure out from the code why there was an empty slot in
$(DESCRIPTIONS). As far as I can see, it only adds empty descriptions
for tags like these:
<meta http-equiv=refresh content="url=...">
Also, for whatever reason, htsearch seems to set $(DESCRIPTION) to the
second entry in the $(DESCRIPTIONS) list. I really don't know why that
is. Maybe the author of the code for that feature could shed some light.
--
Gilles R. Detillieux              E-mail: <firstname.lastname@example.org>
Spinal Cord Research Centre       WWW:    http://www.scrc.umanitoba.ca/~grdetil
Dept. Physiology, U. of Manitoba  Phone:  (204)789-3766
Winnipeg, MB  R3E 3J7  (Canada)   Fax:    (204)789-3930