Re: [htdig3-dev] Regex


Torsten Neuer (tneuer@inwise.de)
Wed, 5 May 1999 09:43:05 +0200


According to Peter D. Gray:
>On Tue, May 04, 1999 at 09:37:15PM -0400, Geoff Hutchison wrote:
>>
>> On Wed, 5 May 1999, Peter D. Gray wrote:
>>
>> > This should (assuming I understand what you want)
>> >
>> > limit_urls_to: .*\.htdig\.org/.*\.html
>>
>> Beautiful! I'm a fool.
>>
>> But what are we going to do about the typical config file:
>>
>> start_url: http://www.htdig.org/
>> limit_urls_to: ${start_url}
>>
>
>I think you will need an option on htdig
>to enable the regular expression stuff or a build option
>with appropriate warnings to installers that they
>will need to update their config files if they
>enable the regex code.
>

Not necessarily.

Since start_url cannot contain a regexp, converting it into one on the fly
should do the job. And since the result would be a trivial regexp (i.e.
one without any real metacharacters), this should be fairly easy, too: just
something like "s/\./\\./g" on a sed(1) command line.

That way you could mix start_url and regexps in limit_urls_to
without major changes.
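
A minimal sketch of that on-the-fly conversion, written in Python purely
for illustration (ht://Dig itself is not Python). The helper name
literal_to_regex is hypothetical, and re.escape stands in for the sed
one-liner; it escapes every regexp metacharacter, not only the dot.

```python
import re

def literal_to_regex(url_prefix: str) -> str:
    """Return a regexp that matches url_prefix as literal text only."""
    # re.escape backslash-escapes all regexp metacharacters, dots included.
    return re.escape(url_prefix)

start_url = "http://www.htdig.org/"
pattern = re.compile(literal_to_regex(start_url))

# The escaped dots no longer match arbitrary characters.
print(bool(pattern.match("http://www.htdig.org/index.html")))  # True
print(bool(pattern.match("http://wwwXhtdig.org/index.html")))  # False
```

Without the escaping step, the second URL would match as well, because an
unescaped "." accepts any character.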

Another possibility would be to add a type to those configuration
variables, either automatically or maybe forced by directive syntax.
The configuration file, however, should stay backward compatible to
avoid trouble for users.

We could do so simply by putting regexps in double quotes; anything
else would be handled as usual, e.g.

start_url: http://www.foo.com/
limit_urls_to: ${start_url} \ # as in start_url
                ".*\.html" \ # regexp match
                /bar/ # again, a normal match

Internally, each entry gets a "type descriptor" that dispatches the
value to the correct handler, i.e. a virtual method.
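
The typed-entry idea could be sketched roughly as follows, again in
Python for illustration only. All names here (LiteralEntry, RegexEntry,
parse_limits, url_allowed) are hypothetical, the "normal" match is
assumed to be a plain substring test, and ${start_url} is expanded by
hand in the example value.

```python
import re
from dataclasses import dataclass

@dataclass
class LiteralEntry:
    text: str
    def matches(self, url: str) -> bool:
        # Assumed classic behaviour: plain substring test.
        return self.text in url

@dataclass
class RegexEntry:
    pattern: str
    def matches(self, url: str) -> bool:
        # Quoted entries are treated as regular expressions.
        return re.search(self.pattern, url) is not None

# A token is either "double quoted" (regexp) or a bare word (literal).
_TOKEN = re.compile(r'"([^"]*)"|(\S+)')

def parse_limits(value: str):
    """Split a limit_urls_to value into typed entries."""
    entries = []
    for quoted, bare in _TOKEN.findall(value):
        entries.append(LiteralEntry(bare) if bare else RegexEntry(quoted))
    return entries

def url_allowed(url: str, entries) -> bool:
    # Each entry dispatches to its own matches() method.
    return any(entry.matches(url) for entry in entries)

limits = parse_limits(r'http://www.foo.com/ ".*\.html" /bar/')
print(url_allowed("http://www.foo.com/about", limits))        # True
print(url_allowed("http://other.example/page.html", limits))  # True
print(url_allowed("http://other.example/page.txt", limits))   # False
```

Unquoted entries keep their old meaning, so existing config files would
continue to work unchanged under this scheme.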

I'd regard the first solution as a quick hack while the latter would
be a proper solution for a production system.

cheers,
  Torsten

--
InWise - Wirtschaftlich-Wissenschaftlicher Internet Service GmbH
Waldhofstraße 14                            Tel: +49-4101-403605
D-25474 Ellerbek                            Fax: +49-4101-403606
E-Mail: info@inwise.de            Internet: http://www.inwise.de


