Re: Summary and clarification [was: Re: [htdig] Opinion of Microsoft Index Server and Netscape Compass]


Shantonu Sen (ssen@mit.edu)
Thu, 20 May 1999 12:18:51 -0400 (EDT)


my two cents on ultraseek:

we have an installation of ultraseek on our campus, and i've had more bad
experiences with it than good. i won't argue with its search capabilities,
since i'm fairly certain that the infoseek people are paid to make them
good.

however, the problem i've run into probably pertains to you. our web
server has four aliases (all of which point to the same "site"). when
ultraseek was first set up, it indexed the site under the canonical name
(which wasn't what we wanted). after a while, it switched everything so
that all search hits pointed to the alias i had originally wanted. then,
after a few more weeks, it discovered another alias for my web server and
switched to that. a few weeks ago i got two more aliases, and ultraseek
is already treating all of my pages as being on one of the NEW aliases!!!

in any event, ultraseek can remove duplicate pages from its index, but
you never know which alias it's going to index your machine under. if it
finds a new alias for your machine, it might just reindex everything,
throw out the old duplicates, and keep cycling like this.

my personal solution (which i implemented too late) is to set up two
separate virtual hosts. the first virtual host is the name you want to
present to the outside world, and the name you want pages indexed under.

the second virtual host container holds all your ServerAlias directives,
so it responds to all the other aliases, and has the same configuration
in general. HOWEVER, the important difference is that it should have:

Alias /robots.txt /wwwfake/robots.txt

and this fake robots.txt should be EXTREMELY restrictive. so basically,
when any spider accesses your machine by one of the aliases, it will be
shut out by the fake robots.txt. when it accesses your machine by the
intended name, it'll get the real robots.txt file.
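here is a minimal sketch of the two containers, assuming apache with
name-based virtual hosting. the host names (www.example.edu,
alias1.example.edu, ...) and the /www/htdocs document root are made up
for illustration; only the Alias line comes from the setup described
above.

```apache
# sketch only: all host names and paths here are hypothetical.
# apache 1.3 and 2.2 need this line for name-based hosting;
# apache 2.4 no longer needs it.
NameVirtualHost *:80

# 1) the canonical name you want spiders to index. it is listed
#    first, so it is also the default for requests that don't
#    match any other ServerName/ServerAlias (e.g. by IP address).
<VirtualHost *:80>
    ServerName www.example.edu
    DocumentRoot /www/htdocs
    # the real /www/htdocs/robots.txt is served as-is here
</VirtualHost>

# 2) every other name for the same machine: same content,
#    but /robots.txt is mapped to the restrictive fake file.
<VirtualHost *:80>
    ServerName alias1.example.edu
    ServerAlias alias2.example.edu alias3.example.edu
    DocumentRoot /www/htdocs
    Alias /robots.txt /wwwfake/robots.txt
</VirtualHost>
```

the fake file only has to shut every robot out, e.g. the two lines
"User-agent: *" and "Disallow: /". on newer apache versions you may also
need a <Directory /wwwfake> section granting access, otherwise the fake
robots.txt can come back as a 403, and crawlers differ in how they treat
an error response for robots.txt.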

i'm not sure this will work for you, since you might not be running
unix web servers. i'm also not sure whether MS IIS has any support for
this (specifically, serving a different robots.txt when the machine is
referenced by a different name).

Shantonu Sen

On Thu, 20 May 1999, Walter Hafner wrote:

>
> Albert Desimone jr writes:
>
> > Anyway, I just had the occasion to look at Infoseek Ultraserver. The
> > cost, even with an educational discount, was a bit rich for our blood.
> > We might have gone for the initial cost of 24K (70% educational discount;
> > you .com folks do the math), but the 12K yearly upgrade/maintenance cost
> > was way too much.
>
> I got the same offer (70% off). Couldn't afford it either.
>
> > Sure, if I could have gotten the money, and did not feel that I was
> > being fiscally irresponsible, I might have pushed harder for
> > Infoseek Ultraserver.
>
> Yup.
>
> Geoff Hutchison writes:
>
> > What Walter Hafner is asking for is probably better termed server aliases,
> > or duplicate hostnames. He has multiple names for the same host
> > (essentially 'soft' virtual hosting), but these do not always correspond to
> > different sites. So he'd like to have ht://Dig realize that the names are
> > different, but the sites are the same. Yes, we already have a
> > server_aliases feature, but what he wants (rightly so) is for a more
> > automatic solution to the problem. (Please correct me if I'm wrong here
> > Walter.)




This archive was generated by hypermail 2.0b3 on Thu May 20 1999 - 08:34:56 PDT