Re: [htdig3-dev] Regex


Geoff Hutchison (ghutchis@wso.williams.edu)
Wed, 5 May 1999 13:21:54 -0400


>However, it's up to us as developers to decide whether exceptions like
>this are warranted in htdig and htsearch. My feeling is that we shouldn't
>make an exception in htdig (i.e. for limit_urls_to, exclude_urls, etc.,
>because if the user wants an anchored match, that can be specified
>explicitly).

Right. And that's the current behavior with these attributes. Some
incompatibility with future versions is to be expected. But major changes to
limits (e.g. requiring escaping or suddenly using anchored regex) would be
silly IMHO.

>so that may change the user's expectations. Also, economy of syntax
>is more important in a search dialog than in a configuration file.
>My feeling is that for the regex fuzzy algorithm, an anchored match by
>default may make more sense. For an unanchored match, you can add ".*"
>before or after the pattern.

I'd still like to see if we can figure out a nice way to allow searches for
those of us who can't remember how to do POSIX-style regex. People have
asked for searches like "gdbm*" or "ho?se", and these can be done with regexp.
Should we just divide these into "naive" and "full" regexp? (I admit to
falling into the former category.) If so, what do we allow in naive regex?
Do we just say that a '?' really means '.?' and a '*' really means '.*'?
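
To make the question concrete, here is a minimal sketch of the "naive"
mapping, assuming '?' becomes '.?' and '*' becomes '.*' exactly as floated
above, with every other character escaped so it matches literally. The names
naive_to_regex and matching_words are hypothetical and not part of htdig or
htsearch, and Python's re module only stands in for a POSIX regex library
here. The anchored flag illustrates the anchored-by-default idea from the
quoted text.

```python
import re


def naive_to_regex(pattern, anchored=True):
    """Expand a naive wildcard pattern into a regular expression.

    Hypothetical helper for illustration only; this is not htsearch code.
    """
    parts = []
    for ch in pattern:
        if ch == "*":
            parts.append(".*")   # '*' stands for any run of characters
        elif ch == "?":
            parts.append(".?")   # '?' stands for at most one character
        else:
            parts.append(re.escape(ch))  # everything else is literal
    body = "".join(parts)
    # Anchoring makes the pattern match the whole word, not a substring.
    return "^" + body + "$" if anchored else body


def matching_words(pattern, words, anchored=True):
    """Return the words matched by the expanded naive pattern."""
    rx = re.compile(naive_to_regex(pattern, anchored))
    return [w for w in words if rx.search(w)]


if __name__ == "__main__":
    print(naive_to_regex("gdbm*"))
    print(matching_words("ho?se", ["horse", "house", "hose", "hoarse"]))
```

Note that with '.?' the pattern "ho?se" also matches "hose", which differs
from shell globbing, where '?' matches exactly one character. That choice is
part of what "naive" would have to pin down.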

Finally, we'll need some pretty significant changes to htsearch's use of
punctuation to get this to work properly. Remember that valid_punctuation
by default includes a few regex control characters. :-(
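
A quick way to see the overlap is to intersect a punctuation set with the
regex metacharacters. This is only a sketch: the sample string below is an
assumed valid_punctuation value used for illustration (check the value in
your own configuration), and regex_conflicts is a hypothetical helper, not
anything in htsearch.

```python
# Characters with special meaning outside bracket expressions in POSIX
# extended regular expressions.
REGEX_METACHARS = set(".[]\\()*+?{}|^$")


def regex_conflicts(valid_punctuation):
    """List the punctuation characters that are also regex metacharacters.

    Hypothetical helper for illustration only.
    """
    return [ch for ch in valid_punctuation if ch in REGEX_METACHARS]


if __name__ == "__main__":
    # Assumed sample value, not read from any real configuration file.
    sample = ".-_/!#$%^&'"
    print(regex_conflicts(sample))
```

Any character reported here would need either to be dropped from the
punctuation handling for regex queries or to be escaped before the pattern
reaches the regex library.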

-Geoff




This archive was generated by hypermail 2.0b3 on Wed May 05 1999 - 10:34:05 PDT