htdig: Rewriting URLs in db.docs.index possible?


Doug (DougB@simplenet.com)
Mon, 11 Jan 1999 13:01:52 -0800


Greetings, :)

        I am evaluating htdig for use in our company and I have to say that
after reviewing multiple search engine products I am quite impressed
with htdig. Barring influence from cosmic rays or some such, this is the
product we are going with.

        I just signed up for this list, so my apologies if this has been
covered. I did search the archives but didn't find anything similar. For
a variety of reasons, after you access our site for the first time you
are redirected from "www.simplenet.com" to "www1.simplenet.com." Part of
the reason for this is to append a tracking number, like
...html?000.lotsmorenumbershere. This isn't the end of the world;
however, what we'd really like to do is rewrite indexed URLs of the form
"http://www1.simplenet.com/path/file.html?" (followed by the tracking
number) to "http://www.simplenet.com/path/file.html".
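For concreteness, here is a minimal sketch of the rewrite rule itself, independent of where it would be hooked into htdig. The function name rewrite_url and the constants OLD_HOST and NEW_HOST are hypothetical; the host names come from the example above. It only touches URLs on the www1 host: it swaps in the www host and drops the query string (the tracking number) and any fragment, while every other URL passes through unchanged. A port number on a www1 URL would also be dropped, which is an assumption about the desired behavior.

```python
from urllib.parse import urlsplit, urlunsplit

# Hypothetical constants: the redirect host and the canonical host.
OLD_HOST = "www1.simplenet.com"
NEW_HOST = "www.simplenet.com"


def rewrite_url(url: str) -> str:
    """Map a www1 URL carrying a tracking query string back to its
    canonical www form; leave every other URL unchanged."""
    parts = urlsplit(url)
    if parts.hostname != OLD_HOST:
        # Not on the redirect host: keep the URL exactly as indexed.
        return url
    # Keep scheme and path, swap the host, and drop the query string
    # (the tracking number) and any fragment.
    return urlunsplit((parts.scheme, NEW_HOST, parts.path, "", ""))
```

For example, rewrite_url("http://www1.simplenet.com/path/file.html?000.123") returns "http://www.simplenet.com/path/file.html". Restricting the query-stripping to the www1 host avoids damaging pages elsewhere whose query strings are meaningful.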

        I first tried writing a Perl script that modifies the database file
directly after the index is complete; however, when it rewrites the URLs
it also corrupts the database. :-/ So the next step is to start digging
into the code and see what I can do, but before I do, I figured I'd ask
whether anyone has tackled a similar problem in the past.

        Any suggestions or comments are welcome. I have some facility with
C, sh, and Perl, so hints, patches, etc. would be fine. :) Of course,
anything we come up with will be contributed back.

TIA,

Doug



This archive was generated by hypermail 2.0b3 on Wed Jan 13 1999 - 09:13:04 PST