Re: [htdig] match part of URL?

Torsten Neuer (
Mon, 21 Jun 1999 19:17:32 +0200

According to Geoff Hutchison:
>Daniel Naber wrote:
>> can you say how difficult it is to add this feature? If you point me to
>> the files to change, and if it's not too difficult, I could try to add
>> this.
>I did send a response, and it's not too difficult. But see below.
>> An example of what I mean: Someone searches for "foobar" and gets
>> as a result, even if that file doesn't
>> contain the string "foobar".
>Now the initial request was more along these lines (which is easier):
>The request was to match "foo" or "bar" or "blah." For your example,
>you'd have to decide if "~" is to be stripped out (I'd say yes) and
>whether you'll just go with prefix matching to get "foobar" from
>If someone submits a function that splits a URL into words, I'll finish
>it. It's a matter of a time tradeoff--I'd rather work on things other
>than that function and it's probably faster for me to put in the correct
>place (in

To add a few quick thoughts on that URL splitting function:
- I'll assume the protocol identifier and the server name to be
  stripped out.
- I'll assume the file extension of the document to be stripped
  out.

This could easily be achieved for trivial URLs with the upcoming
regexp support ;-)
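
For the trivial case, the two assumptions above could be sketched roughly
like this. This is only an illustration in Python, not ht://Dig code: the
name url_words, the example.com URL, and the choice to treat every
non-alphanumeric character (including "~") as a word separator are all
assumptions of mine.

```python
import re
from urllib.parse import urlsplit


def url_words(url):
    """Split the path part of a URL into lowercase words.

    Hypothetical helper, not part of ht://Dig.
    """
    # Taking only the path drops the protocol identifier and server name.
    path = urlsplit(url).path
    # Drop the file extension of the document (the last ".xyz" of the path).
    path = re.sub(r"\.[A-Za-z0-9]+$", "", path)
    # Assumption: anything that is not a letter or digit separates words,
    # so "/", "~", "-" and "_" all disappear.
    return [w.lower() for w in re.split(r"[^A-Za-z0-9]+", path) if w]


if __name__ == "__main__":
    # Made-up URL, for illustration only.
    print(url_words("http://www.example.com/~foobar/docs/index.html"))
```

Exact matching against these words would find "foobar" here; whether
prefix matching is wanted on top of that is a separate decision.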

However, let's think about some more complex URLs:
  ^^^^ ^^^ ^^^ ^^^ !!!! !! ^^^ ^^^^ ?? ??? ??? !!!! ^^^^

(^ = stripped out / ! = included / ? = included, but confusing)

If we want HTTP GET parameters included in this function,
we could run into trouble. But without the parameters, the search
method might not be useful for sites with dynamic content.
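
One possible compromise, sketched in Python purely for illustration, is to
make the parameter handling optional: index the parameter values, and
include the parameter names only on request. The function name
query_words, the include_names switch, and the example URL are
hypothetical and not taken from ht://Dig.

```python
import re
from urllib.parse import urlsplit, parse_qsl


def query_words(url, include_names=False):
    """Collect lowercase words from the HTTP GET parameters of a URL.

    Hypothetical helper, not part of ht://Dig.
    """
    words = []
    # parse_qsl yields (name, value) pairs and skips blank values by default.
    for name, value in parse_qsl(urlsplit(url).query):
        if include_names:
            # Parameter names such as "id" are the confusing part,
            # so they are only added when explicitly requested.
            words.append(name.lower())
        # Assumption: non-alphanumeric characters separate words.
        words.extend(w.lower() for w in re.split(r"[^A-Za-z0-9]+", value) if w)
    return words


if __name__ == "__main__":
    # Made-up URL, for illustration only.
    print(query_words("http://www.example.com/cgi-bin/show.cgi?page=news&id=42"))
```

This does not settle whether values like "42" are worth indexing at all;
it only shows that the decision can be kept out of the path-splitting part.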

So what?


