Re: [htdig] Not all domains indexed.


Gilles Detillieux (grdetil@scrc.umanitoba.ca)
Thu, 4 Mar 1999 10:44:02 -0600 (CST)


According to info@edoc.co.za:
> I'm running a number of virtual domains on a server and use htdig to
> index them.
>
> Unfortunately, htdig does not index all of them. Only the domains
> up to (and including) antiquemall are indexed.
>
> I do believe I've done something stupid, but cannot find it.
>
> Thanks for your help in advance.
>
> Nico
>
> My htdig.conf contains the following lines:
>
> start_url: http://www.edoc.co.za/\
> http://smithfield.co.za\
> http://iwd.co.za\
> http://booyens.co.za\
> http://giftacres.co.za\
> http://antiquemall.co.za\
> http://ossewa.co.za\
> http://unclesmiths.co.za\
> http://iad.co.za
>
> limit_urls_to: ${start_url}\
> edoc.co.za\
> smithfield.co.za\
> iwd.co.za\
> booyens.co.za\
> giftacres.co.za\
> antiquemall.co.za\
> ossewa.co.za\
> unclesmiths.co.za\
> iad.co.za
>
> local_urls:http://www.edoc.co.za/=/usr/wwwusers/edoc/edoc/\
> http://www.smithfield.co.za/=/usr/wwwusers/iwd/smithfield/ \
> http://www.iwd.co.za/=/usr/wwwusers/iwd/iwd/\
> http://www.booyens.co.za/=/usr/wwwusers/iwd/booyens/\
> http://www.giftacres.co.za/=/usr/wwwusers/iwd/giftacres/\
> http://www.antiquemall.co.za/=/usr/wwwusers/iwd/antiquemall/\
> http://www.ossewa.co.za/=/usr/wwwusers/iwd/ossewa/\
> http://www.unclesmiths.co.za/=/usr/wwwusers/iwd/unclesmiths/\
> http://www.iad.co.za/=/usr/wwwusers/edoc/iad/

It's a good habit to always put a space before the backslash at the end
of the line. Not only does it make the continuation more visible, but I
believe that if there's no space or tab before or after a backslash, as
is the case in many lines of your local_urls declaration, the strings on
either side of it will get concatenated into one.
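To make that concatenation concrete, here is a minimal Python model of the behaviour described above. It is an assumption, not htdig's actual parser: it supposes that a trailing backslash joins a line to the next one with nothing inserted between them, and that the joined value is then split on whitespace. The function names are hypothetical.

```python
def join_continuations(lines: list[str]) -> str:
    """Hypothetical model: a trailing backslash glues a line to the next
    one, and nothing (not even a space) is inserted in its place."""
    parts = []
    for line in lines:
        if line.endswith("\\"):
            parts.append(line[:-1])  # drop the backslash, keep everything before it
        else:
            parts.append(line)
            break  # a line without a trailing backslash ends the value
    return "".join(parts)


def parse_list(lines: list[str]) -> list[str]:
    """Split the joined value on whitespace, as a URL list would be."""
    return join_continuations(lines).split()
```

Under this model, two URLs on consecutive lines with no space before the backslash come out as one bogus entry ("http://www.edoc.co.za/http://iad.co.za/"), while the same two lines with a space before the backslash come out as two separate URLs.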

It's also not a bad idea to always put the trailing forward slash after
the server name in a URL in the start_url list (as you did on the
first line). This saves htdig from having to fetch and process a
redirect from the server. That shouldn't prevent indexing, though, as
long as the server is running.
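Both habits (a trailing slash after the server name, and a space before each continuation backslash) can be applied mechanically. This Python sketch uses only the standard urllib.parse module; `with_trailing_slash` and `start_url_block` are hypothetical helper names, and the output is meant to be pasted into htdig.conf by hand.

```python
from urllib.parse import urlsplit, urlunsplit


def with_trailing_slash(url: str) -> str:
    """Give a bare server URL ("http://host") the root path "/"."""
    parts = urlsplit(url)
    if parts.path == "":
        parts = parts._replace(path="/")  # only touch URLs with no path at all
    return urlunsplit(parts)


def start_url_block(urls: list[str]) -> str:
    """Build a start_url line with a space before every continuation backslash."""
    fixed = [with_trailing_slash(u) for u in urls]
    return "start_url: " + " \\\n".join(fixed)
```

For example, `print(start_url_block(["http://www.edoc.co.za/", "http://ossewa.co.za", "http://iad.co.za"]))` prints each URL on its own line, each ending in "/", with " \" at the end of every line but the last.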

That last point may be the key. Even if you're using local_urls, htdig
must still make an initial connection to the HTTP server for every
domain (or virtual domain) you index. If your system isn't responding
to HTTP requests for the last three virtual domains (ossewa, unclesmiths
and iad), that would prevent them from being indexed.
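One quick way to check whether the server answers for each virtual domain is to send it a HEAD request yourself. The Python sketch below is a diagnostic of my own, not part of htdig: `probe` is a hypothetical name, it speaks plain HTTP on port 80 unless the URL names another port, and it relies on http.client sending a Host header for the requested name, which is what lets a server tell its virtual domains apart.

```python
import http.client
from urllib.parse import urlsplit


def probe(url: str, timeout: float = 5.0) -> str:
    """Send a HEAD request for `url` and describe the outcome.

    Hypothetical diagnostic helper, not part of htdig. Plain HTTP only.
    """
    parts = urlsplit(url)
    host = parts.hostname or ""
    port = parts.port or 80   # default HTTP port unless the URL names one
    path = parts.path or "/"  # a bare server name means the root page
    # http.client fills in the Host header from `host`, which is what a
    # server uses to choose the right virtual domain.
    conn = http.client.HTTPConnection(host, port, timeout=timeout)
    try:
        conn.request("HEAD", path)
        resp = conn.getresponse()
        return f"{resp.status} {resp.reason}"
    except (OSError, http.client.HTTPException) as exc:
        # Refused connections, socket timeouts and DNS failures land here.
        return f"error: {exc}"
    finally:
        conn.close()
```

To use it, loop over the URLs from start_url and print `probe(url)` for each. An `error:` result or an unexpected status for ossewa, unclesmiths or iad, but not for the other domains, would point at the server or DNS setup rather than at htdig.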

If this doesn't help, try running htdig with -vvv to see if it gives any
feedback as to why it's skipping these domains. (This produces LOTS of
output.)
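Because -vvv is so verbose, it can help to keep only the lines that mention the domains being skipped. This Python sketch assumes `htdig` is on the PATH and accepts `-c` to name the configuration file; the config path, the helper names, and the idea that the relevant lines contain the domain name are all assumptions, since the exact wording of htdig's verbose output isn't shown here.

```python
import subprocess
from collections.abc import Iterable, Iterator


def lines_mentioning(lines: Iterable[str], domains: list[str]) -> Iterator[str]:
    """Yield only the lines that contain one of the given domain names."""
    for line in lines:
        if any(d in line for d in domains):
            yield line.rstrip("\n")


def run_htdig_verbose(config_path: str, domains: list[str]) -> list[str]:
    """Run `htdig -vvv -c config_path` and keep only lines about `domains`.

    Assumes htdig is on PATH and accepts -c <config file>.
    Output is read line by line, so the full log is never held in memory.
    """
    with subprocess.Popen(
        ["htdig", "-vvv", "-c", config_path],
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # merge error messages into the same stream
        text=True,
    ) as proc:
        return list(lines_mentioning(proc.stdout, domains))
```

For example, `run_htdig_verbose("/path/to/htdig.conf", ["ossewa.co.za", "unclesmiths.co.za", "iad.co.za"])`, where the path is a placeholder for wherever your htdig.conf lives.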

-- 
Gilles R. Detillieux              E-mail: <grdetil@scrc.umanitoba.ca>
Spinal Cord Research Centre       WWW:    http://www.scrc.umanitoba.ca/~grdetil
Dept. Physiology, U. of Manitoba  Phone:  (204)789-3766
Winnipeg, MB  R3E 3J7  (Canada)   Fax:    (204)789-3930



This archive was generated by hypermail 2.0b3 on Thu Mar 04 1999 - 09:09:19 PST