Summary and clarification [was: Re: [htdig] Opinion of Microsoft Index Server and Netscape Compass]

Walter Hafner
Thu, 20 May 1999 17:03:31 +0200 (METDST)

Albert Desimone jr writes:

> Anyway, I just had the occasion to look at Infoseek Ultraserver. The
> cost, even with an educational discount, was a bit rich for our blood.
> We might have gone for the initial cost of 24K (70% educational discount;
> you .com folks do the math), but the 12K yearly upgrade/maintenance cost
> was way too much.

I got the same offer (70% off). Couldn't afford it either.

> Sure, if I could have gotten the money, and did not feel that I was
> being fiscally irresponsible, I might have pushed harder for
> Infoseek Ultraserver.


Geoff Hutchison writes:

> What Walter Hafner is asking for is probably better termed server aliases,
> or duplicate hostnames. He has multiple names for the same host
> (essentially 'soft' virtual hosting), but these do not always correspond to
> different sites. So he'd like to have ht://Dig realize that the names are
> different, but the sites are the same. Yes, we already have a
> server_aliases feature, but what he wants (rightly so) is for a more
> automatic solution to the problem. (Please correct me if I'm wrong here
> Walter.)

Yes, that pretty much sums it up! Perhaps my English is too limited to
give a clear impression of my needs.

> Never fear--It's not as easy as we'd like, but I, for one, want general
> duplicate page elimination in the next release. So it will be in there. :-)

I keep my fingers crossed. :-)

Thorsten Neuer writes:

> As for server aliases, I regard them as just a kludge for misconfigured
> servers in most cases. The only cases I can think of where server aliases
> are useful are where you have multiple hosts with the same contents in
> different, i.e. remote, networks which you cannot control directly.
> Alas, this can be handled quite flexibly in a Perl or PHP wrapper using
> regular expression search and replace. I think it will also be quite easy
> to implement in the new ht://Dig versions where regex will be available.
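The regex search-and-replace idea above can be sketched as a small URL canonicalizer. This is a minimal illustration in Python rather than Perl or PHP, and it is not ht://Dig code: the rule list `ALIAS_RULES`, the function `canonicalize`, and all hostnames are hypothetical placeholders, not real servers.

```python
import re
from urllib.parse import urlsplit, urlunsplit

# Hypothetical alias rules: (regex matching an alias hostname, canonical replacement).
# The example.edu / ex.edu names are invented placeholders.
ALIAS_RULES = [
    (re.compile(r"^www-chair(\d+)\.example\.edu$"), r"www\1.example.edu"),
    (re.compile(r"^chair(\d+)\.ex\.edu$"), r"www\1.example.edu"),
]


def canonicalize(url: str) -> str:
    """Rewrite the host part of `url` to its canonical name if an alias rule matches."""
    parts = urlsplit(url)
    host = (parts.hostname or "").lower()
    for pattern, replacement in ALIAS_RULES:
        if pattern.match(host):
            new_host = pattern.sub(replacement, host)
            # Keep an explicit port, if the URL had one.
            netloc = new_host if parts.port is None else f"{new_host}:{parts.port}"
            return urlunsplit((parts.scheme, netloc, parts.path, parts.query, parts.fragment))
    return url  # no rule matched: leave the URL untouched
```

Running every discovered URL through such a function before it is queued means the indexer sees only one name per site. The catch, as the reply below shows, is that someone has to write and maintain the rules for every alias.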

Here is a _real_ example from our university.

The WWW servers of the chairs in the computer science faculty are named
after the official number of the chair:

Some professors weren't satisfied with these names, so the first alias
was introduced:

Then, a shorter domain was introduced for most of the universities:

For orthogonality, the "name" versions had to be available, too:

So, all the CS webservers are known under _at_least_ 4 different names.

All in all, the TU Muenchen has about 350 webservers (physical servers
as well as hard and soft virtual servers) that respond to more than 600

In your next mail you write:

> IMHO aliases for a server which deal with the same host and the same
> machine should generally be handled by the server itself by
> automatically redirecting one base URL to the other.

The servers are administered by the individual faculties. While I agree
with you, I can't force the admins in any way.

> People should think
> of this more, since they cannot configure the "big ones" on the
> internet like they can do with their "local ones", i.e. any search robot
> on the web will regard those sites as being "different" since they
> cannot be told that they are not.

As a matter of fact, the robots _can_ be told - or rather, the robots
can figure it out by themselves. It's just rather time-consuming to
program it. Scan the archive for my description of how it is done in
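One possible heuristic for letting a robot discover aliases on its own is to fetch the same few paths from each hostname and group hostnames whose pages hash identically. This is a sketch under that assumption only, not necessarily the method described in the referenced post. The functions `host_fingerprint` and `group_aliases` and the injected `fetch` callable are hypothetical; `fetch` stands in for whatever HTTP client the robot uses and returns `None` on failure.

```python
import hashlib
from collections import defaultdict
from typing import Callable, Dict, List, Optional, Sequence

Fetch = Callable[[str], Optional[bytes]]  # hypothetical: URL -> body, or None on error


def host_fingerprint(host: str, sample_paths: Sequence[str], fetch: Fetch) -> Optional[str]:
    """Hash the bodies of a few sample pages on `host` into one fingerprint."""
    digest = hashlib.sha1()
    for path in sample_paths:
        body = fetch(f"http://{host}{path}")
        if body is None:
            return None  # unreachable host or missing page: cannot compare
        digest.update(hashlib.sha1(body).digest())
    return digest.hexdigest()


def group_aliases(hosts: Sequence[str], sample_paths: Sequence[str], fetch: Fetch) -> List[List[str]]:
    """Group hostnames whose sample pages are byte-for-byte identical."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for host in hosts:
        fp = host_fingerprint(host, sample_paths, fetch)
        if fp is not None:
            groups[fp].append(host)
    return [sorted(g) for g in groups.values()]
```

Exact hashing breaks on pages that embed dates, counters, or the hostname itself, so a real robot would likely need to normalize such content first; that extra work is part of why it is time-consuming to program.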

Gilles Detillieux writes:

> Could you elaborate on what you mean by "better URL filtering for index
> occlusion?"

Sure. ht://Dig does occlusion based on URLs only.

I'm thinking of features like occlusion based on MIME type, file size (ok,
ht://Dig does that, too), file age, etc.

Compass applies "rules" to URLs to determine whether they are to be indexed
or not. These rules can be tuned by the Compass admin. You can use
boolean expressions based on the above features to create your own
filter functions.
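The idea of building filter functions from boolean expressions over document features can be sketched with small composable predicates. This is an illustration only, not Compass's actual rule syntax or ht://Dig's configuration: `DocInfo`, the combinators, and the thresholds (1 MB, two years) are all hypothetical choices.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Callable


@dataclass
class DocInfo:
    """Hypothetical per-document metadata a robot might have after a HEAD/GET."""
    url: str
    mime_type: str
    size_bytes: int
    last_modified: datetime


Rule = Callable[[DocInfo], bool]


def mime_in(*types: str) -> Rule:
    return lambda d: d.mime_type in types


def max_size(limit: int) -> Rule:
    return lambda d: d.size_bytes <= limit


def newer_than(age: timedelta, now: datetime) -> Rule:
    return lambda d: now - d.last_modified <= age


def url_contains(fragment: str) -> Rule:
    return lambda d: fragment in d.url


def all_of(*rules: Rule) -> Rule:   # boolean AND
    return lambda d: all(r(d) for r in rules)


def any_of(*rules: Rule) -> Rule:   # boolean OR
    return lambda d: any(r(d) for r in rules)


def not_(rule: Rule) -> Rule:       # boolean NOT
    return lambda d: not rule(d)


# Example policy (hypothetical): index HTML, plain text, or PDF under 1 MB,
# modified within the last two years, and not under a /tmp/ path.
NOW = datetime(1999, 5, 20)
should_index = all_of(
    mime_in("text/html", "text/plain", "application/pdf"),
    max_size(1_000_000),
    newer_than(timedelta(days=730), NOW),
    not_(url_contains("/tmp/")),
)
```

Because each rule is just a predicate, an admin can combine URL, MIME type, size, and age conditions freely with `all_of`, `any_of`, and `not_`, which is the kind of flexibility that purely URL-based occlusion lacks.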


-Walter Hafner

Walter Hafner__________________________________
  "Multiple exclamation marks," he went on, shaking his head,
"are a sure sign of a diseased mind."  (Terry Pratchett, "Eric")

This archive was generated by hypermail 2.0b3 on Thu May 20 1999 - 07:16:26 PDT