Re: [htdig] Not all domains indexed.


Gilles Detillieux (grdetil@scrc.umanitoba.ca)
Fri, 5 Mar 1999 14:56:12 -0600 (CST)


According to info@edoc.co.za:
>
> Thanks for the advice.
>
> I tried it and got the following message
>
> Rejected: URL not in the limits!
> url rejected: (level 1)http://www.iwd.co.za/
>
> This is part of the start url definition.
>
> What am I missing?
>
> Thanks
>
> Nico

These error messages suggest the URL wasn't found in limit_urls_to.
There were errors in the StringMatch class in 3.1.0b*, which may account
for this problem. If you're not running 3.1.1, you may want to try
upgrading, to see if that fixes the problem.

Another thing you may want to try is to set

limit_urls_to: ${start_url}

and nothing else in the list. The URL being rejected above is in your
start_url list, so that should be enough to have it accepted. Still,
unless there's a problem in your StringMatch class, I can't see why
it would reject something that clearly matches an entry in the list.
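
As a rough sketch of that (not a tested configuration), assuming the nine
domains from your original post all answer under their www. names, as your
local_urls entries and the rejected URL above suggest, the two attributes
could look like this, with a trailing slash on each URL and a space before
each backslash:

```
# Assumption: every site is reachable as www.<domain>; adjust if not.
start_url:      http://www.edoc.co.za/ \
                http://www.smithfield.co.za/ \
                http://www.iwd.co.za/ \
                http://www.booyens.co.za/ \
                http://www.giftacres.co.za/ \
                http://www.antiquemall.co.za/ \
                http://www.ossewa.co.za/ \
                http://www.unclesmiths.co.za/ \
                http://www.iad.co.za/

# Nothing else in the list, so only the start URLs define the limits.
limit_urls_to:  ${start_url}
```

If some of the sites only answer without the www. prefix, list them that way
in start_url instead; limit_urls_to will follow along through ${start_url}.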

If the problem persists with htdig 3.1.1, please post your whole
htdig.conf, and hopefully someone on the list with more experience in
the StringMatch class can help out.

> > According to info@edoc.co.za:
> > > I'm running a number of virtual domains on a server and use htdig to
> > > index them.
> > >
> > > Unfortunately htdig does not index all of them. Only the domains
> > > up to (and including) antiquemall are indexed.
> > >
> > > I do believe I've done something stupid, but cannot find it.
> > >
> > > Thanks for your help in advance.
> > >
> > > Nico
> > >
> > > My htdig.conf contains the following lines:
> > >
> > > start_url: http://www.edoc.co.za/\
> > > http://smithfield.co.za\
> > > http://iwd.co.za\
> > > http://booyens.co.za\
> > > http://giftacres.co.za\
> > > http://antiquemall.co.za\
> > > http://ossewa.co.za\
> > > http://unclesmiths.co.za\
> > > http://iad.co.za
> > >
> > > limit_urls_to: ${start_url}\
> > > edoc.co.za\
> > > smithfield.co.za\
> > > iwd.co.za\
> > > booyens.co.za\
> > > giftacres.co.za\
> > > antiquemall.co.za\
> > > ossewa.co.za\
> > > unclesmiths.co.za\
> > > iad.co.za
> > >
> > > local_urls:http://www.edoc.co.za/=/usr/wwwusers/edoc/edoc/\
> > > http://www.smithfield.co.za/=/usr/wwwusers/iwd/smithfield/ \
> > > http://www.iwd.co.za/=/usr/wwwusers/iwd/iwd/\
> > > http://www.booyens.co.za/=/usr/wwwusers/iwd/booyens/\
> > > http://www.giftacres.co.za/=/usr/wwwusers/iwd/giftacres/\
> > > http://www.antiquemall.co.za/=/usr/wwwusers/iwd/antiquemall/\
> > > http://www.ossewa.co.za/=/usr/wwwusers/iwd/ossewa/\
> > > http://www.unclesmiths.co.za/=/usr/wwwusers/iwd/unclesmiths/\
> > > http://www.iad.co.za/=/usr/wwwusers/edoc/iad/
> >
> > It's a good habit to always put a space before the backslash at the end of
> > the line. Not only does it make the continuation more apparent, but I
> > believe that if there's no space or tab before or after a backslash, as is
> > the case in many lines of your local_urls declaration, the strings will get
> > concatenated.
> >
> > It's also not a bad idea to always put the trailing forward slash after
> > the server name in a URL, in the start_url list (as you did on the first
> > line). This avoids having to get and process a redirect from the server.
> > That shouldn't prevent indexing, though, as long as the server is running.
> >
> > That last point may be the key. Even if you're using local_urls, htdig
> > still must make an initial connection with the http server for every
> > domain (or virtual domain) you index. If your system isn't responding to
> > http requests to the last 3 virtual domains, that would prevent them from
> > being indexed.
> >
> > If this doesn't help, try running htdig with -vvv to see if it gives any
> > feedback as to why it's skipping these domains. (This produces LOTS of
> > output.)
> >
> > --
> > Gilles R. Detillieux              E-mail: <grdetil@scrc.umanitoba.ca>
> > Spinal Cord Research Centre       WWW:    http://www.scrc.umanitoba.ca/~grdetil
> > Dept. Physiology, U. of Manitoba  Phone:  (204)789-3766
> > Winnipeg, MB  R3E 3J7  (Canada)   Fax:    (204)789-3930
>
>

-- 
Gilles R. Detillieux              E-mail: <grdetil@scrc.umanitoba.ca>
Spinal Cord Research Centre       WWW:    http://www.scrc.umanitoba.ca/~grdetil
Dept. Physiology, U. of Manitoba  Phone:  (204)789-3766
Winnipeg, MB  R3E 3J7  (Canada)   Fax:    (204)789-3930



This archive was generated by hypermail 2.0b3 on Mon Mar 15 1999 - 08:57:45 PST