Re: [htdig] url_part_aliases


Subject: Re: [htdig] url_part_aliases
From: Jim Cole (greyleaf@yggdrasill.net)
Date: Fri Sep 08 2000 - 00:52:43 PDT


Gilles Detillieux's bits of Thu, 7 Sep 2000 translated to:

>According to Jim Cole:
>> If I have this right, then I believe the last bit of the trick to get
>> things working the way I want is to run htdig on database A just as I
>> have been doing, but run htdig on database B using the url_part_aliases
>> settings in the search config file. Or maybe I am just deluding myself :)
>> Off to dig again.
>
>This sounds like the correct approach, given your description of the
>situation. Please let us know if this does the trick. If it does, then
>it would appear that Stefan's problem, though similar in symptoms, might
>have a different cause, unless he's been doing multiple digs and merges
>and not mentioning that.

Hi - Using the search version of url_part_aliases when digging the second
database appears to have solved part of the problem. Documents from the
second database now show up on the result pages. However, it appears
that the main database suffers from the same problem that was
previously being introduced by the second. That is, if htdig comes across
a URL such as /~name/fileA that should be mapped to /newname/fileA, and there
is also a URL such as /newname/fileB that should be left alone, then at
search time the URL that was correct to begin with is rewritten
by url_part_aliases and can't be found in the database. Maybe this is
the same type of problem Stefan is running into.
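
To make the failure concrete, here is a toy Python model of the behaviour
described above. It is only a sketch and not ht://Dig's actual code: the
dict-based database, the encode() helper, the "*1" token, and the host
name are all assumptions made for illustration. It assumes URLs are
stored under a key built with the dig-time aliases and looked up under a
key built with the search-time aliases.

```python
# Toy model only; none of these names come from ht://Dig itself.

def encode(url, aliases):
    """Replace the first matching 'from' substring with its 'to' token."""
    for src, token in aliases:
        if src in url:
            return url.replace(src, token, 1)
    return url

DIG_ALIASES = [("/~name/", "*1")]       # assumed setting while digging
SEARCH_ALIASES = [("/newname/", "*1")]  # assumed setting while searching

# Two URLs found during the dig: one in the old form, one already correct.
crawled = ["http://host/~name/fileA", "http://host/newname/fileB"]

# Keys are written with the dig-time aliases.
database = {encode(u, DIG_ALIASES): u for u in crawled}

def lookup(url):
    """Search-time lookup: the key is built with the search-time aliases."""
    return database.get(encode(url, SEARCH_ALIASES))

# fileA was stored under the token form, so the search-time key matches.
found_a = lookup("http://host/newname/fileA")

# fileB was stored unencoded, but the search-time key uses the token,
# so the lookup misses even though the URL was correct all along.
found_b = lookup("http://host/newname/fileB")
```

In this model the miss comes from the two alias sets not being applied
symmetrically to URLs that already have the preferred form.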

I suspect I am just trying to do something with url_part_aliases that it
is neither intended for nor capable of :( All I really want to do is
selectively rewrite some URLs. We have a large number of accounts that,
for reasons I won't go into, have two valid URLs for each page, and a lot
of the pages have links that mix the two forms. To filter the results
correctly, I need to make sure the resulting URLs all have a
consistent form.
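
The kind of selective, one-way rewrite described here could look roughly
like the Python sketch below. It is an illustration under assumptions,
not part of ht://Dig: the ACCOUNT_ALIASES table, the canonicalize() name,
and the host name are hypothetical. The idea is that each URL is
normalized once, as it is discovered, and URLs already in the preferred
form pass through unchanged.

```python
from urllib.parse import urlsplit, urlunsplit

# Hypothetical table: old path prefix -> preferred path prefix.
ACCOUNT_ALIASES = {"/~name/": "/newname/"}

def canonicalize(url):
    """Rewrite a URL whose path starts with a known old prefix.

    URLs that do not match any old prefix are returned untouched.
    """
    parts = urlsplit(url)
    for old, new in ACCOUNT_ALIASES.items():
        if parts.path.startswith(old):
            path = new + parts.path[len(old):]
            return urlunsplit(parts._replace(path=path))
    return url

old_form = canonicalize("http://host/~name/fileA")
new_form = canonicalize("http://host/newname/fileB")
```

Because the rewrite only goes in one direction and is idempotent, no
reverse mapping is needed at search time, which sidesteps the lookup
problem described earlier.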

Is there maybe a single point in the htdig code where each new URL can
be accessed as it is added? I am thinking it might be best if I just
hack in my own rewrite code to get me by until 3.2 is ready for prime
time.

Thanks.

Jim
