Re: [htdig] weird results


Gilles Detillieux (grdetil@scrc.umanitoba.ca)
Thu, 19 Aug 1999 10:38:42 -0500 (CDT)


According to Robert Cerny:
> it's not secret or something like that :))
>
> the word I was searching for is an abbreviation, dlt, and it was found on a
> page whose contents aren't like this one.

OK, here's a possibility: I think that if the document contains a
string like
        d&lt;

which would appear to you as d< but to htdig as d&lt (if translate_lt_gt
is false, which it is by default), then htdig would strip the & (one of
the valid_punctuation characters) and add dlt to the word database for
this document. That may be the cause of this false match.

There are some fixes to the HTML parsing in 3.2 that should make the
translate_* options largely unnecessary. If we remove those options, or
turn them on by default, that should avoid this particular problem.

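Another possibility is any string with the letters d, l and t in that
order, with any of these characters interspersed: .-_/!#$%^&' (the
characters in valid_punctuation). htdig strips these when indexing, so
such a string is indexed as dlt, but the excerpt matching does not, so
the match never shows up in the excerpt. The excerpt matching should
really be fixed to ignore any of these characters in the text.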
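One way the suggested excerpt fix could work is sketched below in Python: build a pattern from the search word that allows any run of valid_punctuation characters between its letters, so the excerpt finds the same text the indexer turned into dlt. This is only an illustration under assumptions, not htdig code. The helper names punctuation_tolerant_pattern and find_match are hypothetical, and the alphanumeric boundary checks are a simplification of htdig's real word boundaries.

```python
import re

# Default valid_punctuation characters, as listed in this message (assumed).
VALID_PUNCTUATION = ".-_/!#$%^&'"


def punctuation_tolerant_pattern(word):
    """Regex matching `word` with valid_punctuation chars between its letters."""
    gap = "[" + re.escape(VALID_PUNCTUATION) + "]*"
    body = gap.join(re.escape(c) for c in word)
    # Require that the match is not embedded in a longer alphanumeric word.
    return re.compile(r"(?<![A-Za-z0-9])" + body + r"(?![A-Za-z0-9])",
                      re.IGNORECASE)


def find_match(text, word):
    """Return the first punctuation-tolerant match of `word`, or None."""
    m = punctuation_tolerant_pattern(word).search(text)
    return m.group(0) if m else None


print(find_match("Use the D.L.T. drive", "dlt"))  # prints D.L.T
print(find_match("x d&lt; y", "dlt"))             # prints d&lt
```

The excerpt code would then highlight the returned span of the original text rather than looking for the literal word.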

-- 
Gilles R. Detillieux              E-mail: <grdetil@scrc.umanitoba.ca>
Spinal Cord Research Centre       WWW:    http://www.scrc.umanitoba.ca/~grdetil
Dept. Physiology, U. of Manitoba  Phone:  (204)789-3766
Winnipeg, MB  R3E 3J7  (Canada)   Fax:    (204)789-3930




This archive was generated by hypermail 2.0b3 on Thu Aug 19 1999 - 08:40:52 PDT