Re: [htdig] url_part_aliases

Subject: Re: [htdig] url_part_aliases
From: Geoff Hutchison
Date: Wed Sep 06 2000 - 10:31:13 PDT

On Wed, 6 Sep 2000, Gilles Detillieux wrote:

> > As I understand it, url_part_aliases work by substituting a string
> > portion within the URL with a corresponding numeric value. If now
> > 2 databases get merged into a new one and both have different
> > definitions for url_part_aliases, this should cause a conflict.

Yes, this is correct. When I wrote the merge code for 3.1.0, I thought
about the url_part_aliases, but decided to leave it alone. In retrospect,
this may seem a bit unclear, but it would take additional care to make
sure all the URLs are recoded correctly. Since documents are stored by URL
in 3.1.5, you would essentially have to rebuild a new document DB from the
ground up with a new encoding after decoding all URLs in the component
DBs. (This is the worst-case scenario, but I didn't like the prospect of
that either.) This is one reason storing the documents keyed by URL
isn't as useful as storing by DocID.
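
To make the conflict concrete, here is a minimal Python sketch, assuming a
plain dictionary stands in for the URL-keyed document DB. The alias tables,
the URLs, and the helper names encode, decode, and recode are all made up
for illustration; this is not ht://Dig's actual code or on-disk format.

```python
# Hypothetical sketch: NOT ht://Dig's actual code or database format.

def encode(url, aliases):
    """Replace each aliased URL part with its short code."""
    for part, code in aliases.items():
        url = url.replace(part, code)
    return url

def decode(key, aliases):
    """Reverse encode(): expand each short code back to its URL part."""
    for part, code in aliases.items():
        key = key.replace(code, part)
    return key

def recode(db, old_aliases, new_aliases):
    """Rebuild a URL-keyed DB under a new alias table."""
    return {encode(decode(key, old_aliases), new_aliases): doc
            for key, doc in db.items()}

# Two component DBs built with different alias definitions (made-up values).
aliases_a = {"http://www.example.com/": "*1"}
aliases_b = {"http://docs.example.org/": "*1"}

db_a = {encode("http://www.example.com/index.html", aliases_a): "doc A"}
db_b = {encode("http://docs.example.org/index.html", aliases_b): "doc B"}

# Naive merge: both keys encode to the same string, so one document is lost.
naive = {**db_a, **db_b}

# Careful merge: decode each DB with its own table, then re-encode every
# key with one shared table before combining.
merged_aliases = {"http://www.example.com/": "*1",
                  "http://docs.example.org/": "*2"}
merged = {**recode(db_a, aliases_a, merged_aliases),
          **recode(db_b, aliases_b, merged_aliases)}
```

In the naive merge the two different documents collide on one encoded key.
The careful merge keeps both, but only by decoding and re-encoding every
key, which is the rebuild-from-the-ground-up cost described above.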

> The author (I think it was Hans-Peter) merely suggested using values
> like *1, *2, etc., to avoid conflicts with existing patterns in URLs,
> which would then have to be encoded themselves using up more space.

Correct on both counts.

> As for maintaining the same values for url_part_aliases, I'd agree that
> this is a good idea for all the htdig and htmerge operations, though I'm
> not 100% certain the author intended for this to be necessary.

No, but Hans-Peter and I were writing the merge code and url_part_aliases
in parallel. So I didn't think much about his code because I couldn't see
it and vice-versa.

N.B. This shouldn't be an issue in 3.2 since the databases are keyed
differently, but I'll make sure to test this.

In any case, it would be very useful to get platform information to winnow
down the possibilities.

-Geoff Hutchison
Williams Students Online

