Re: [htdig] patch to parse URLs?


Torsten Neuer (tneuer@inwise.de)
Thu, 12 Aug 1999 10:44:34 +0200


According to Geoff Hutchison:
>On Wed, 11 Aug 1999, Leonard J. Hunt wrote:
>
>> http://www.learn2.com/cgi-bin/learnline?23@^3290@14%40
>> I am looking for a patch for htdig to take the user id
>> (^3290 in this case) out of the URL before it gets indexed
>> as a unique url. I set the server_max_docs so that htdig
>
>No offense, but it's not entirely obvious what the "user id" is from that
>particular URL. While I think there are people who'd be willing to provide
>such a patch, w/o knowing what part of a query string is to be removed,
>it's not particularly easy.
>
>Most query strings I'm familiar with take the form
>?key1=value&key2=value.... In this form, it's fairly easy to figure out
>what fields are "user id" or whatnot.
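For query strings in the ?key1=value&key2=value form described above, dropping one field before a URL is indexed takes only a few lines. The Python sketch below is not ht://Dig code; a real fix would still be a patch to htdig itself. It only shows the URL rewriting. The key name `uid` and both helper names are hypothetical. The second helper is a guess: nobody in the thread confirms that the learn2.com fields are '@'-separated or that the user id is the field starting with '^'.

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode


def drop_query_keys(url: str, keys: set) -> str:
    """Remove the given keys from a ?key1=value&key2=value query string."""
    parts = urlsplit(url)
    kept = [(k, v) for k, v in parse_qsl(parts.query, keep_blank_values=True)
            if k not in keys]
    # An empty query produces no trailing '?'.
    return urlunsplit(parts._replace(query=urlencode(kept)))


def drop_caret_field(url: str) -> str:
    """HYPOTHETICAL: assumes '@'-separated fields and that the user id
    is the field beginning with '^' (e.g. learnline?23@^3290@14%40)."""
    base, sep, query = url.partition("?")
    if not sep:
        return url  # no query string, nothing to strip
    fields = [f for f in query.split("@") if not f.startswith("^")]
    return base + "?" + "@".join(fields)


if __name__ == "__main__":
    print(drop_query_keys(
        "http://example.com/cgi-bin/page?topic=23&uid=3290&page=14", {"uid"}))
    print(drop_caret_field(
        "http://www.learn2.com/cgi-bin/learnline?23@^3290@14%40"))
```

The parse_qsl/urlencode round trip decodes and then re-encodes the remaining values. As a result, equivalent URLs that differed only in percent-encoding (e.g. %20 vs +) come out normalized.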

It might be better to restructure the page instead.

With Apache, you can use the ForceType directive to force execution
of a script for a directory location, thus providing you with a
virtual directory. Then you can use the exclude/restrict parameters
of htsearch to select or filter documents by any part of that path.
  With this method, your URL might look like
    http://www.learn2.com/cgi-bin/learnline/23/3290/14/
instead, which is much easier to handle (not only for ht://Dig, but
for any other software, too) ;-)
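On the script side, a virtual-directory URL reaches the program through the CGI PATH_INFO environment variable. For http://www.learn2.com/cgi-bin/learnline/23/3290/14/, the web server runs learnline with PATH_INFO set to /23/3290/14/. Below is a minimal Python CGI sketch that reads it. The thread does not say what the three numbers mean (topic, user id, page?), so the script only splits and echoes them. The function name is hypothetical.

```python
import os


def parse_path_info(path_info: str) -> list:
    """Split a PATH_INFO such as '/23/3290/14/' into ['23', '3290', '14']."""
    # Empty strings from leading/trailing/double slashes are dropped.
    return [seg for seg in path_info.split("/") if seg]


def main() -> None:
    # The server puts everything after the script name into PATH_INFO.
    segments = parse_path_info(os.environ.get("PATH_INFO", ""))
    print("Content-Type: text/plain")
    print()  # blank line ends the CGI headers
    # HYPOTHETICAL: a real script would map the segments to its own fields.
    print("segments:", segments)


if __name__ == "__main__":
    main()
```

Because each field is now a path component, htsearch's restrict/exclude (which match on URL strings) can target a section such as /learnline/23/ without having to know about query-string syntax.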

cheers,
  Torsten

--
InWise - Wirtschaftlich-Wissenschaftlicher Internet Service GmbH
Waldhofstraße 14                            Tel: +49-4101-403605
D-25474 Ellerbek                            Fax: +49-4101-403606
E-Mail: info@inwise.de            Internet: http://www.inwise.de


This archive was generated by hypermail 2.0b3 on Thu Aug 12 1999 - 01:51:26 PDT