Re: [htdig] 3.1.3 engine on 3.1.5 db


Subject: Re: [htdig] 3.1.3 engine on 3.1.5 db
From: Gilles Detillieux (grdetil@scrc.umanitoba.ca)
Date: Fri Jan 12 2001 - 08:08:53 PST


According to Dave Salisbury:
> > If
> > you created your database with htdig 3.1.5, and want to search it with
> > htsearch 3.1.3, that's a bad idea. The most glaring bug in releases
> > before 3.1.5 is in htsearch, so you really should upgrade it.
>
> I take it one of the worst things is the security hole which allows
> a user to view any file with read permissions ( ouch! )

That's the one!

> Is there any way to correct for this with a wrapper around htsearch?
> Reading the indices using 3.1.3 that were created by a 3.1.5 engine
> seems to work just fine.

There would be, but it might be a tad tricky. The idea is to use a
backslash to quote any left quote (`), dollar sign ($) or backslash
(\) in the query string that is part of an input parameter value that
will get added to the config object as an internal attribute setting.
The lines in htsearch/htsearch.cc that do this are (from a grep):

        config.Add("match_method", input["method"]);
        config.Add("template_name", input["format"]);
            config.Add("matches_per_page", input["matchesperpage"]);
        config.Add("config", input["config"]);
        config.Add("restrict", input["restrict"]);
        config.Add("exclude", input["exclude"]);
        config.Add("keywords", input["keywords"]);
        config.Add("sort", input["sort"]);
        config.Add(form_vars[i], input[form_vars[i]]);

The last one above is the tricky one, as it can be any input parameter
name that you use in allow_in_form. Rather than limiting the backslash
escaping of special characters to only the values of these parameters,
it might be better to do the whole query string, but exclude a few
parameters where this might be undesirable. I'd recommend NOT doing
this for the "words" input parameter, for instance, but I can't think
of any others right off-hand where you would not want to do this.
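
As a rough illustration of the wrapper idea above, here is a minimal
sketch in C++. It is not part of htdig: the function names are made up,
and it assumes the wrapper sees the raw, still URL-encoded query string
with parameters separated by '&'. It puts an encoded backslash (%5C) in
front of every backquote, dollar sign or backslash, whether literal or
percent-encoded, in every parameter value except "words".

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical helper: true if s[i..i+2] is a percent-encoded
// backquote (%60), dollar sign (%24) or backslash (%5C).
static bool isEncodedSpecial(const std::string &s, std::size_t i)
{
    if (i + 2 >= s.size() || s[i] != '%')
        return false;
    char a = s[i + 1];
    char b = static_cast<char>(
        std::toupper(static_cast<unsigned char>(s[i + 2])));
    return (a == '6' && b == '0') || (a == '2' && b == '4') ||
           (a == '5' && b == 'C');
}

// Hypothetical helper: prefix each special character in one
// URL-encoded value with an encoded backslash.
std::string escapeValue(const std::string &v)
{
    std::string out;
    for (std::size_t i = 0; i < v.size(); ++i) {
        if (v[i] == '`' || v[i] == '$' || v[i] == '\\') {
            out += "%5C";
            out += v[i];
        } else if (isEncodedSpecial(v, i)) {
            out += "%5C";
            out.append(v, i, 3);   // copy the %XX sequence unchanged
            i += 2;
        } else {
            out += v[i];
        }
    }
    return out;
}

// Escape every parameter value in the query string except "words".
std::string escapeQueryString(const std::string &qs)
{
    std::string out;
    std::size_t start = 0;
    while (start <= qs.size()) {
        std::size_t end = qs.find('&', start);
        if (end == std::string::npos)
            end = qs.size();
        std::string pair = qs.substr(start, end - start);
        std::size_t eq = pair.find('=');
        if (start > 0)
            out += '&';
        if (eq == std::string::npos || pair.compare(0, eq, "words") == 0)
            out += pair;           // no value, or the excluded parameter
        else
            out += pair.substr(0, eq + 1) + escapeValue(pair.substr(eq + 1));
        start = end + 1;
    }
    return out;
}
```

A wrapper CGI would presumably pass the result to the real htsearch
in QUERY_STRING; that part, and the handling of POST requests, is left
out here. Only a parameter name spelled literally as "words" is skipped,
so a percent-encoded spelling of that name gets escaped like everything
else, which errs on the safe side.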

> Anyone out there want to bash Glimpse before I look into it.
> I'm hoping to get it at least to compile on an SGI.

I won't do any bashing, but if htdig is your preference, I'd suggest not
giving up on it too quickly. Did you have a look at David Adams' recent
post about an "IRIX compile fix"? In it, he forwarded a message from
Bob MacCallum that explains a workaround to some problems on IRIX 6.5,
using cc, not gcc. If you haven't already, you ought to try that before
abandoning htdig.

> > On the other hand, if you have an existing database built with version
> > 3.1.3, and want to use it with the latest htsearch, that should work
> > without any difficulty. However, you'll lose out on several benefits
> > in the latest htdig (better parsing of meta tags, parsing img alt text,
> > fixed parsing of URL parameters, etc.),
>
> Couldn't find what "fixed parsing of URL parameters" means.
> The query string is part of what's indexed??

The query string isn't indexed, but it's part of the URL. 3.1.3 mangled
bare ampersands (&) in the query string in an URL, and versions before
that didn't decode sequences like &eacute; within an URL. I think the
ChangeLog explains it better than the release notes.

Tue Nov 23 19:52:27 1999 Gilles Detillieux <grdetil@scrc.umanitoba.ca>

        * htdig/HTML.cc(transSGML), htdig/SGMLEntities.cc(translateAndUpdate):
        Fix the infamous problem in htdig 3.1.3 of mangling URL parameters that
        contain bare ampersands (&), and not converting &amp; entities in URLs.
...
Wed Sep 1 15:39:41 1999 Gilles Detillieux <grdetil@scrc.umanitoba.ca>

        * htdig/HTML.h, htdig/HTML.cc(do_tag, transSGML): Fix the HTML parser
        to decode SGML entities within tag attributes.
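
To make the two entries above concrete, here is a small sketch of the
intended behaviour. It is not the code from HTML.cc or SGMLEntities.cc:
the function name and the short entity table are assumptions for
illustration only. Known entities in a URL are decoded, while a bare
ampersand that separates URL parameters is left alone.

```cpp
#include <cstddef>
#include <map>
#include <string>

// Hypothetical helper: decode a few named entities in a URL taken from
// a tag attribute, leaving bare ampersands untouched.
std::string decodeEntitiesInUrl(const std::string &url)
{
    // Deliberately tiny table; a real parser knows many more entities.
    static const std::map<std::string, std::string> entities = {
        {"amp", "&"}, {"lt", "<"}, {"gt", ">"}, {"quot", "\""}
    };
    std::string out;
    for (std::size_t i = 0; i < url.size(); ++i) {
        if (url[i] == '&') {
            std::size_t semi = url.find(';', i + 1);
            if (semi != std::string::npos) {
                std::map<std::string, std::string>::const_iterator it =
                    entities.find(url.substr(i + 1, semi - i - 1));
                if (it != entities.end()) {
                    out += it->second;  // recognised entity: decode it
                    i = semi;           // skip past the ';'
                    continue;
                }
            }
        }
        out += url[i];                  // bare '&' or ordinary character
    }
    return out;
}
```

With this sketch, "a.cgi?x=1&amp;y=2" becomes "a.cgi?x=1&y=2", and
"a.cgi?x=1&y=2" comes back unchanged.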

> > which you'll only get if you
> > reindex with htdig 3.1.5. Maybe none of these matter for your site,
> > though. See the release notes and ChangeLog for details.
>
> I don't think they're essential.

Except for the URL parameter mangling fix, of course.

-- 
Gilles R. Detillieux              E-mail: <grdetil@scrc.umanitoba.ca>
Spinal Cord Research Centre       WWW:    http://www.scrc.umanitoba.ca/~grdetil
Dept. Physiology, U. of Manitoba  Phone:  (204)789-3766
Winnipeg, MB  R3E 3J7  (Canada)   Fax:    (204)789-3930



