htdig: htdig mishandles commas in META tag keyword list?


Brad Shelton (brad.shelton@centurytel.com)
Fri, 31 Jul 1998 10:01:34 -0500


Hello,

I'm new to the list, this isn't in the FAQ, and the list archive at
http://wormhole.eosys.com/mail-archives/htdig/ doesn't seem to be
accessible, so I'm in search of some input. =)

The info at http://htdig.sdsu.edu/meta.html gives the htdig-syntax for
meta tag keywords as not having commas between keywords. So far so
good-- some bots/spiders use the commas, some don't.

The problem is, htdig doesn't merely ignore or strip any commas that are
there, but rather lumps them in as part of the keyword (according to the
debug output we've seen). That is, the tag

<META NAME="keywords" CONTENT="guestbook, register, newsletter">

produces the three words

guestbook,
register,
newsletter

A search for the word 'newsletter' would have a positive result (no
trailing comma), but a search for 'guestbook' would not (because htdig
indexed it as 'guestbook,' complete with the comma).
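
To illustrate what I think is going on, here is a rough Python sketch. It
is not htdig's actual code (htdig is C++ and I haven't traced the parser);
the function names are made up, and the whitespace-only split is just my
guess based on the debug output.

```python
import re


def split_on_whitespace(content):
    """Guess at the observed behavior: split on whitespace only,
    so a comma stays glued to the word in front of it."""
    return content.split()


def split_on_commas_and_whitespace(content):
    """What I would expect instead: treat commas as separators too."""
    return [word for word in re.split(r"[,\s]+", content) if word]


content = "guestbook, register, newsletter"

observed = split_on_whitespace(content)
expected = split_on_commas_and_whitespace(content)

print(observed)   # ['guestbook,', 'register,', 'newsletter']
print(expected)   # ['guestbook', 'register', 'newsletter']

# An exact-match lookup for 'guestbook' only succeeds in the second list.
print("guestbook" in observed)   # False
print("guestbook" in expected)   # True
```

With the first splitter only the last keyword in the list is stored without
a comma, which matches what we see in our searches.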

A real live example can be found at http://www.centuryinter.net/. Use
the search tool on the front page to look for the word 'harold'-- no
match. Then, visit http://www.centuryinter.net/links_finance.html and
view the source-- 'harold' is one of the keywords listed (comma
delimited). Finally, use the search tool to search for 'dog'-- match.

We're running the latest version, obtained within the past couple of
weeks from http://www.htdig.org/files/htdig-3.0.8b2.tar.gz. It was
compiled and otherwise set up without incident on a Digital UNIX box.

Is this behavior considered buggy, or are most happy to leave their meta
tags comma-less? I could understand that if htdig would simply ignore
the commas-- but as it is, the commas (which are commonly used in
keyword lists) break the search as demonstrated above.

Is there a patch for this, or any other option to get this corrected?
Any thoughts? Thanks in advance for any information!

Brad Shelton
brad.shelton@centurytel.com


