[htdig] Re: htdig: htdig URL duplication

Danny Birchall (D.P.Birchall@sussex.ac.uk)
Tue, 26 Jan 1999 15:59:43 +0000 (GMT)

# > For historical reasons we have in our server configuration a number of
# > server aliases which serve to cut out part of the URL to make it
# > shorter: eg www.sussex.ac.uk/Units/foo/ becomes www.sussex.ac.uk/foo/.
# >
# > This causes a problem when running htdig, because inevitably somewhere
# > within our document tree, pages will be referenced both as /Units/foo/
# > and as /foo/. Result: ht://Dig indexes the same pages twice, with
# > different URLs, and when a htsearch is run, each page is returned
# > twice, once with each URL.
# Configure your Web-server to make http://www.sussex.ac.uk/Units/foo/ a
# redirect to http://www.sussex.ac.uk/foo/ instead of an alias.

It would be the other way round (/Units/foo/ is the 'real' directory, /foo/ is
the alias). But we can't do this either (we tried already), because the
configuration of other services depends on these aliases.

Does htdig create a list of URLs before it creates its databases? Could this
list be doctored to remove duplicates before the database is built?
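
To make concrete what I mean by "doctoring" the list, here is a rough sketch in Python. It assumes such a list could be obtained as plain URLs, one per line; I don't know whether htdig actually exposes one. The alias table and the function names are made up for illustration. Each alias prefix is mapped back to its real directory, and only the first occurrence of each resulting URL is kept.

```python
from urllib.parse import urlsplit, urlunsplit

# Hypothetical alias table: short alias prefix -> real directory prefix.
# On our server /Units/foo/ is the real directory and /foo/ is the alias.
ALIASES = {"/foo/": "/Units/foo/"}


def canonicalize(url, aliases=ALIASES):
    """Rewrite an aliased path to its real path; drop any fragment."""
    parts = urlsplit(url)
    path = parts.path
    for alias, real in aliases.items():
        if path.startswith(alias):
            path = real + path[len(alias):]
            break
    return urlunsplit((parts.scheme, parts.netloc.lower(), path, parts.query, ""))


def dedupe(urls, aliases=ALIASES):
    """Return canonical URLs in first-seen order, without duplicates."""
    seen = set()
    unique = []
    for url in urls:
        url = url.strip()
        if not url:
            continue  # skip blank lines in the list
        canon = canonicalize(url, aliases)
        if canon not in seen:
            seen.add(canon)
            unique.append(canon)
    return unique
```

For example, feeding it http://www.sussex.ac.uk/Units/foo/a.html and http://www.sussex.ac.uk/foo/a.html would leave only the /Units/foo/ form. Whether the indexer could then be made to use the cleaned list is exactly what I'm asking.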


Danny Birchall
University of Sussex Information Service

Tel: (0)1273 678745
Fax: (0)1273 678441

