[htdig] Server Aliases, redirects etc.

Walter Hafner (hafner@informatik.tu-muenchen.de)
Fri, 12 Feb 1999 16:50:18 +0100 (MET)


First of all: I just installed 3.1.0 (final version) on my box (FreeBSD
2.2.8 STABLE, P-II/300, 128 MB) and it seems to run flawlessly. It's
still in the indexing stage, but this works ok.

However, I'm still struggling with my ht://Dig setup ...

- more than 300 machines with WWW servers
- more than 200,000 physical pages
- The majority of the servers have one or more aliases.

Several of the machines listen to as many as 4 names, and the WWW server
responds to each of them! Even worse, absolute URLs on these servers
point from one alias to a different one _and_ each server contains
documents with relative links. So I end up indexing servers with
several hundred documents up to four times.

This kind of environment brings ht://Dig to its knees. Disabling virtual
hosts would be a solution, but unfortunately I have lots of _real_
virtual hosts to index. So I just _have_ to enable "allow_virtual_hosts",
which in turn results in lots of unnecessary queries.

The "server_aliases" option is _not_ what I want. It is impossible to
maintain such a big namespace manually.

Last time I counted, ht://Dig reported 450 servers, despite a _long_
list of mappings in the "server_aliases" section of my configuration.
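
To illustrate what maintaining that list involves, here is a minimal
sketch that generates the line from a table instead of typing it by
hand. It assumes the "alias:port=canonical:port" form of
"server_aliases"; the host names and the helper name are made up for
the example, and building the table itself is still the hard part.

```python
def server_aliases_line(alias_to_canonical, port=80):
    """Build a server_aliases line from an alias -> canonical host table.

    Assumption: entries take the form alias:port=canonical:port,
    separated by whitespace. The table is hypothetical example data.
    """
    pairs = [
        f"{alias}:{port}={canonical}:{port}"
        for alias, canonical in sorted(alias_to_canonical.items())
    ]
    return "server_aliases: " + " ".join(pairs)


if __name__ == "__main__":
    table = {
        "web.example.org": "www.example.org",
        "www2.example.org": "www.example.org",
    }
    print(server_aliases_line(table))
```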

Some time ago I wrote about Netscape Compass Server and the way it deals
with server aliases: If two pages with different URLs share the same IP
number, it checks the contents of the pages. If they are the same,
Compass assumes it to be a server alias.
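
The check described above could look roughly like the sketch below. It
is not Compass code, only my reading of the idea: pages are grouped by
IP number and a hash of their contents, and host names that land in the
same group are treated as aliases. Comparing only pages with the same
path is an extra assumption of mine, and the function name and sample
hosts are invented for the example.

```python
import hashlib
from collections import defaultdict
from urllib.parse import urlsplit


def find_alias_groups(pages):
    """Group host names that look like aliases of one server.

    pages: iterable of (url, ip, body) tuples, with body as bytes.
    Two hosts count as aliases if they share an IP number and returned
    byte-identical content for the same path (the path condition is an
    assumption of this sketch).
    """
    seen = defaultdict(set)
    for url, ip, body in pages:
        parts = urlsplit(url)
        digest = hashlib.sha256(body).hexdigest()
        key = (ip, parts.path or "/", digest)
        seen[key].add(parts.netloc.lower())

    groups = []
    for hosts in seen.values():
        # Only report groups with more than one name, and each group once.
        if len(hosts) > 1 and hosts not in groups:
            groups.append(hosts)
    return groups
```

A real virtual host on the same IP number returns different content, so
it ends up in a group of its own and is not merged.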

Is such a behaviour in the queue for ht://Dig? If yes, when will it be
available (I know, I know ...)? Unfortunately I don't know C++ at all...

Today the following idea occurred to me: What if I installed a Squid
proxy and configured ht://Dig to use it? Since Squid is written in
C (hehe), I could patch it to return "301 Moved Permanently" for certain
URLs. This way I could do pretty much anything I want. Well, I have to
code it of course, but I see no problem there.
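
To make the idea concrete, here is a sketch of just the decision the
patched proxy would have to make: given a requested URL, either leave it
alone or answer with a 301 and the same URL on the canonical host. This
is plain Python, not Squid code; the alias table, host names and function
name are hypothetical.

```python
from urllib.parse import urlsplit, urlunsplit

# Hypothetical alias table: alias host -> canonical host.
CANONICAL_HOST = {
    "web.example.org": "www.example.org",
    "www2.example.org": "www.example.org",
}


def redirect_for(url, table=CANONICAL_HOST):
    """Return (301, location) if url uses an alias host, else None."""
    parts = urlsplit(url)
    host = (parts.hostname or "").lower()
    target = table.get(host)
    if target is None:
        return None  # not an alias: let the request through unchanged
    # Keep an explicit port if the original URL had one.
    netloc = target if parts.port is None else f"{target}:{parts.port}"
    location = urlunsplit(
        (parts.scheme, netloc, parts.path, parts.query, parts.fragment)
    )
    return 301, location
```

Whether this helps depends entirely on the answer to the question below.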

Question: How does ht://Dig react to a 301? Does it discard the URL? Does
it follow the new URL?



Walter Hafner__________________________________ hafner@in.tum.de
         http://www.tum.de/~hafner/
  "Multiple exclamation marks," he went on, shaking his head,
"are a sure sign of a diseased mind."  (Terry Pratchett, "Eric")
