Re: htdig: HTML within search strings


Colin Viebrock (cmv@privateworld.com)
Thu, 02 Jul 1998 12:21:03 -0400


Also sprach Andrew Scherpbier (at 08:58 AM 7/2/98 -0700) ...
>> Say I have a website where the code:
>>
>> Sample<I>Code</I>
>>
>> is all over. That's the brandname - including the italics. If I do an
>> htdig search for "SampleCode", I get no matches.
>>
>> Shouldn't htdig strip out all the HTML? Or is there a conf setting I need
>> to do this?
>
>htdig does strip out the HTML, but it has no knowledge of the semantics of the
>HTML tags for those types of markups, so it assumes it is a word break.

Hrm... how about building in a list of which HTML tags should be considered
word breaks and which shouldn't? Or just those that shouldn't, which is
probably the shorter list.
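
To make the idea concrete, here is a rough sketch in Python (not htdig's
actual C++ parser, which I haven't looked at). The tag list and the names
NON_BREAKING_TAGS, WordExtractor and extract_words are made up for the
example. Every tag counts as a word break unless it is on the "doesn't
break" list:

```python
from html.parser import HTMLParser

# Hypothetical list of inline tags that should NOT split a word.
# Any tag not listed here is treated as a word break.
NON_BREAKING_TAGS = {"i", "b", "em", "strong", "u", "tt", "font", "sub", "sup"}


class WordExtractor(HTMLParser):
    """Collects document text, inserting a space at word-breaking tags."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []

    def _tag_boundary(self, tag):
        # HTMLParser lowercases tag names, so <I> arrives here as "i".
        if tag not in NON_BREAKING_TAGS:
            self.parts.append(" ")

    def handle_starttag(self, tag, attrs):
        self._tag_boundary(tag)

    def handle_endtag(self, tag):
        self._tag_boundary(tag)

    def handle_data(self, data):
        self.parts.append(data)


def extract_words(html):
    """Return the list of indexable words found in an HTML fragment."""
    parser = WordExtractor()
    parser.feed(html)
    parser.close()
    return "".join(parser.parts).split()


print(extract_words("Sample<I>Code</I>"))         # ['SampleCode']
print(extract_words("<p>Sample</p><p>Code</p>"))  # ['Sample', 'Code']
```

With that, the italicized brand name indexes as the single word
"SampleCode", while block-level tags such as <p> still separate words.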

>Just out of curiosity, how do other search engines deal with this problem?

Don't know. :)

.........................................................................
Colin Viebrock Creative Director - Private World Communications
cmv@privateworld.com http://www.privateworld.com
ICQ: 11386088

                           Give a man a fish, and you feed him for a day.
             Teach him to use the Net, and he won't bother you for weeks.



This archive was generated by hypermail 2.0b3 on Sat Jan 02 1999 - 16:26:50 PST