Re: htdig: Rewriting URL's in db.docs.index possible?


Marjolein Katsma (HSH@taxon.demon.nl)
Tue, 12 Jan 1999 07:41:00 +0100


Doug,

At 16:49 1999-01-11 -0800, you wrote:
>Doug wrote:
>>
>> Greetings, :)
>>
[snip]
>> a variety of reasons, after you access our site for the first time you
>> are redirected from "www.simplenet.com" to "www1.simplenet.com." Part of
>> the reason for this is to append a tracking number, like
>> ...html?000.lotsmorenumbershere. This isn't the end of the world,
>> however what we'd really like to do is rewrite indexed url's of the form
>> "http://www1.simplenet.com/path/file.html?" to
>> "http://www.simplenet.com/path/file.html".
>
> Digging into this some more (Ok, bad pun :) it looks like my best bet
>is to tell htdig to create a text version of the database, modify that,
>then either find a way to make htmerge use it (doesn't look likely) or
>convert the text db to the BDM db that htmerge expects. So.... any hints
>on that? :)

I'd take another approach: define a configuration parameter that takes
pairs of strings, old-string new-string (maybe any number of pairs),
and apply these replacements to each URL just before it is added to the
database. (SWISH-E has a similar mechanism.)

I haven't looked at where in the code that last step happens, though.
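
To make the idea concrete, here is a rough sketch of the replacement
step. It is in Python only for brevity (htdig itself is C++), and the
function names and the whitespace-separated pair syntax are my own
assumptions for illustration, not existing htdig options or code.

```python
# Hypothetical sketch: these names and the config syntax are illustrative
# assumptions, not part of htdig.

def parse_pairs(value):
    """Split a whitespace-separated 'old new old new ...' string into pairs."""
    tokens = value.split()
    if len(tokens) % 2 != 0:
        raise ValueError("expected an even number of strings (old/new pairs)")
    return list(zip(tokens[0::2], tokens[1::2]))


def rewrite_url(url, pairs):
    """Apply each literal old -> new replacement to the URL, in order."""
    for old, new in pairs:
        url = url.replace(old, new)
    return url


# Assumed config value: a single pair mapping www1 back to www.
pairs = parse_pairs("http://www1.simplenet.com/ http://www.simplenet.com/")
print(rewrite_url("http://www1.simplenet.com/path/file.html", pairs))
```

Note that literal pairs like this only cover the host name. The tracking
number after the "?" changes from visit to visit, so removing it would
need a pattern or a separate step rather than a fixed old-string.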

>
>Any suggestions welcome,
>
>Doug
>----------------------------------------------------------------------
>To unsubscribe from the htdig mailing list, send a message to
>htdig-request@sdsu.edu containing the single word "unsubscribe" in
>the body of the message.

Marjolein Katsma
Java Woman - http://javawoman.com
HomeSite Help - http://hshelp.com/



This archive was generated by hypermail 2.0b3 on Wed Jan 13 1999 - 09:13:05 PST