Torsten Neuer (tneuer@inwise.de)
Wed, 5 May 1999 15:12:23 +0200
According to Geoff Hutchison:
>At 3:43 AM -0400 5/5/99, Torsten Neuer wrote:
>>We could do so simply by putting regexp in double quotes.. anything
>>else will be handled as usual, e.g.
>>
>>start_url: http://www.foo.com/
>>limit_urls_to: ${start_url} \ # as in start_url
>> "\.*.html" \ # regexp match
>> /bar/ # again, a normal match
>>
>>Internally, each entry gets a "type descriptor" that dispatches the
>>value to the correct handler, i.e. a virtual method.
>
>I like this idea. Quotes aren't a bad choice, but it would be nice to pick
>a character that could be used in the htsearch fields too. Maybe:
>
>start_url: http://www.foo.com/
>limit_urls_to: ${start_url} \ # as in start_url
> [\.*.html] \ # regexp match
> /bar/ # again, a normal match
>
>Internally, both types of limits would be regexp, but we'd escape those
>that weren't enclosed in brackets (or whatever). So in the above case, the
>limit becomes:
>
>limit_urls_to: http://www\.foo\.com/|\.*.html|/bar/
>
>Does this make sense?
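
To make the proposal above concrete, here is a minimal sketch of the "escape everything that isn't delimited, then join with |" step. It is in Python purely for illustration; the function name build_limit_pattern and the entry list are assumptions of mine, not existing ht://Dig code.

```python
import re

def build_limit_pattern(entries):
    """Combine limit entries into one alternation (hypothetical helper).

    Entries wrapped in [ ] are taken as regular expressions verbatim;
    everything else is escaped so that it only matches literally.
    """
    parts = []
    for entry in entries:
        if len(entry) >= 2 and entry.startswith("[") and entry.endswith("]"):
            parts.append(entry[1:-1])        # regexp entry: strip the delimiters
        else:
            parts.append(re.escape(entry))   # literal entry: escape metacharacters
    return "|".join(parts)

# The three entries from the example config, after ${start_url} is expanded.
entries = ["http://www.foo.com/", r"[\.*.html]", "/bar/"]
pattern = build_limit_pattern(entries)
```

A URL is then accepted if re.search(pattern, url) finds a match. Note that the exact escaping of the literal entries depends on the regexp library in use.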
Well.. AFAIK brackets are used in regexp to define sets, so you'll
run into trouble with a regexp like [^-%]* (which matches any run of
characters other than '-' and '%'). Parentheses are AFAIK used to create
groups, so they are reserved for regexp as well. I'm not sure about {} though.
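
A small sketch of the kind of trouble I mean, assuming a naive parser that reads from the opening delimiter up to the first closing one. The helper names extract_first and is_valid_regexp are made up for this example, and a real parser could of course be smarter about nesting.

```python
import re

def extract_first(entry, open_delim, close_delim):
    """Naive scan: the body runs from the opening delimiter to the first closing one."""
    start = entry.index(open_delim) + 1
    end = entry.index(close_delim, start)
    return entry[start:end]

def is_valid_regexp(text):
    """Report whether the regexp library accepts the text as a pattern."""
    try:
        re.compile(text)
        return True
    except re.error:
        return False

# With brackets as delimiters, the scan stops at the set's own ']'.
bracketed = extract_first("[[^-%]*]", "[", "]")
# With double quotes, the whole expression survives.
quoted = extract_first('"[^-%]*"', '"', '"')
```

Here bracketed ends up as the truncated fragment [^-% which no longer compiles, while quoted comes out as the intended [^-%]* unchanged.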
cheers,
Torsten
--
InWise - Wirtschaftlich-Wissenschaftlicher Internet Service GmbH
Waldhofstraße 14, D-25474 Ellerbek
Tel: +49-4101-403605
Fax: +49-4101-403606
E-Mail: info@inwise.de
Internet: http://www.inwise.de