Re: htdig: redirected url


Gilles Detillieux (grdetil@scrc.umanitoba.ca)
Tue, 13 Oct 1998 17:18:27 -0500 (CDT)


According to Phillip Morgan:
>
> More info on the redirect I'm suffering..
>
> when htdig.conf specifies limit_urls as...
>
> limit_urls: http://www.ehcs.com.au/
>
> I get '37:37:3:http://www.ehcs.com.au/~thera: redirect'
>
> and thera's pages are not indexed.
>
> If I specify 'limit_urls: http://www.ehcs.com.au/ http://clam.ehcs.com.au/' I get
>
> New server: clam.ehcs.com.au, 80
> 38:720:3:http://clam.ehcs.com.au/~thera/: +-+ size = 4120
> 40:721:4:http://clam.ehcs.com.au/~thera/review1.htm: -+ size = 6513
> 42:722:4:http://clam.ehcs.com.au/~thera/therawrp.htm:
> -------------------+----------------* size = 8982
>
> and the pages are indexed.
>
> If I have both www.ehcs.com.au and clam.ehcs.com.au in the url list
> am I going to get every page indexed twice?
>
> Another user, ajayh, is not being redirected. They index fine. The
> only difference is thera has a robots.txt file in her homedir,
> whereas ajayh does not. The robots.txt file was added because thera
> used to be a customer of Netspace, and when she left they apparently
> got quite upset and appear to have left her page up so that search
> engines won't catalog her new pages at the new location. At least,
> that's her opinion given that several communications from her to them
> have not resulted in the removal of her pages from their systems.
>
> btw: the robots.txt file simply states 'disallow: http://www.netspace.com.au/~therea'

OK, a couple of points about robots.txt are worth clarifying. First, spiders
normally look for this file only in the document root directory of the server,
not in individual users' home directories, so that file is probably never
being seen. Second, the "disallow" statements apply only to the server
on which the robots.txt file is located. You can't disallow pages on other
servers. Whether Netspace leaves her old pages up or not, that shouldn't
prevent search engines from cataloguing her new pages. However, the
redirects that are occurring might pose a problem if the search engines
can't follow them.
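
To make the two robots.txt points concrete, here is a rough sketch in
Python (not anything htdig itself runs). The helper name robots_url and
the /private/ rule are made up for illustration; the parser is Python's
standard urllib.robotparser, whose behaviour may differ in detail from
any particular spider.

```python
from urllib.parse import urlsplit, urlunsplit
from urllib.robotparser import RobotFileParser


def robots_url(page_url):
    """Return the one robots.txt location a spider would check for page_url.

    Hypothetical helper: it keeps the scheme and host and replaces the
    path with /robots.txt, so a user's home directory is never consulted.
    """
    parts = urlsplit(page_url)
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


# The file is looked up at the server root, not under /~thera/.
print(robots_url("http://www.ehcs.com.au/~thera/review1.htm"))

# Disallow lines carry paths on the same server, not full URLs.
# The /private/ rule below is an invented example.
rules = RobotFileParser()
rules.parse([
    "User-agent: *",
    "Disallow: /private/",
])

# Blocked, because the path starts with /private/ on this server.
print(rules.can_fetch("htdig", "http://www.ehcs.com.au/private/a.htm"))
# Not blocked, because no rule covers /~thera/.
print(rules.can_fetch("htdig", "http://www.ehcs.com.au/~thera/"))
```

Since the rules are matched against the path only, a robots.txt on one
server has no way to name pages on a different one.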

> I'm going to set the limit_urls to 'http://clam.ehcs.com.au/' only and
> see what I get. However, I would really like to know what's causing
> the redirect.

Look at the server's srm.conf file, as well as any .htaccess files under
thera's home directory and subdirectories. Any of these could potentially
contain redirect statements.
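
If there are many files to check, something like the following Python
sketch could list the candidate lines. It assumes the redirects come from
Apache's mod_alias directives (Redirect, RedirectMatch, RedirectTemp,
RedirectPermanent) and does not look at mod_rewrite rules; the function
name find_redirects and the sample file contents are invented for
illustration.

```python
import re

# Directive names from Apache's mod_alias; Apache treats them
# case-insensitively, so the pattern does too.
REDIRECT_RE = re.compile(
    r"^\s*(Redirect(?:Match|Temp|Permanent)?)\s+(.*)$", re.IGNORECASE
)


def find_redirects(config_text):
    """Return (line_number, directive, arguments) for each redirect line."""
    hits = []
    for number, line in enumerate(config_text.splitlines(), start=1):
        match = REDIRECT_RE.match(line)
        if match:
            hits.append((number, match.group(1), match.group(2).strip()))
    return hits


# Hypothetical .htaccess contents, for illustration only.
sample = """\
# comment lines are skipped because they start with '#'
Options Indexes
Redirect /~example http://other.example.com/~example/
"""

for hit in find_redirects(sample):
    print(hit)
```

In practice you would read srm.conf and each .htaccess file from disk and
pass the text to find_redirects; an empty result for every file suggests
the redirect is coming from somewhere else.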

> I suspect it's not a problem of htdig as thera has reported that other
> search engines are also being redirected, but htdig is the one I'm
> using, hence the posting here.

-- 
Gilles R. Detillieux              E-mail: <grdetil@scrc.umanitoba.ca>
Spinal Cord Research Centre       WWW:    http://www.scrc.umanitoba.ca/~grdetil
Dept. Physiology, U. of Manitoba  Phone:  (204)789-3766
Winnipeg, MB  R3E 3J7  (Canada)   Fax:    (204)789-3930