Brad Shelton (firstname.lastname@example.org)
Fri, 31 Jul 1998 10:01:34 -0500
I'm new to the list, this isn't in the FAQ, and the list archive at
http://wormhole.eosys.com/mail-archives/htdig/ doesn't seem to be
accessible, so I'm in search of some input. =)
The info at http://htdig.sdsu.edu/meta.html describes the htdig syntax for
meta tag keywords as a list with no commas between keywords. So far so
good-- some bots/spiders use the commas, some don't.
The problem is, htdig doesn't merely ignore or strip any commas that are
there, but rather lumps them in as part of the keyword (according to the
debug output we've seen). That is, the tag
<META NAME="keywords" CONTENT="guestbook, register, newsletter">
produces the three words
    guestbook,
    register,
    newsletter
A search for the word 'newsletter' would have a positive result (no
trailing comma), but a search for 'guestbook' would not (because htdig
indexed it as 'guestbook,' complete with the comma).
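To make the symptom concrete, here is a small Python sketch of the two behaviors. It is not htdig's code, and I don't know how htdig actually tokenizes the CONTENT string; the function names are made up for illustration. The first function splits on whitespace only, which reproduces what the debug output shows. The second treats commas and whitespace alike as separators, which is the behavior I was expecting.

```python
import re


def split_on_whitespace(content):
    """Split on whitespace only; any commas stay attached to the words."""
    return content.split()


def split_on_commas_and_whitespace(content):
    """Treat commas and whitespace alike as separators, dropping empty pieces."""
    return [word for word in re.split(r"[,\s]+", content) if word]


# The CONTENT value from the META tag above.
content = "guestbook, register, newsletter"

naive = split_on_whitespace(content)
tolerant = split_on_commas_and_whitespace(content)

print(naive)     # words as they appear to be indexed now
print(tolerant)  # words as one would expect them to be indexed

# An exact-match lookup for 'guestbook' fails against the first list
# and succeeds against the second.
print("guestbook" in naive, "guestbook" in tolerant)
```

With whitespace-only splitting, only the last keyword in a comma-delimited list comes out clean, which matches the search results described above.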
A real live example can be found at http://www.centuryinter.net/. Use
the search tool on the front page to look for the word 'harold'-- no
match. Then, visit http://www.centuryinter.net/links_finance.html and
view the source-- 'harold' is one of the keywords listed (comma
delimited). Finally, use the search tool to search for 'dog'-- match.
We're running the latest version, obtained within the past couple of
weeks from http://www.htdig.org/files/htdig-3.0.8b2.tar.gz. It was
compiled and otherwise set up without incident on a Digital UNIX box.
Is this behavior considered a bug, or are most people happy to leave their
meta tags comma-less? I could understand that if htdig simply ignored
the commas-- but as it is, the commas (which are commonly used in
keyword lists) break the search as demonstrated above.
Is there a patch for this, or any other option to get this corrected?
Any thoughts? Thanks in advance for any information!