Re: [htdig] robots.txt results in not indexing a whole site?


Subject: Re: [htdig] robots.txt results in not indexing a whole site?
From: boerio@arocknid.com
Date: Fri Aug 18 2000 - 08:32:44 PDT


>According to boerio@arocknid.com:
>> I'm using ht://Dig 3.1.4 on a Linux platform, and noticed that only a
>> single site from some of my URL entries was getting indexed. I turned on
>> all the debugging information, and this appears throughout:
>>
>> Rejected: Item in the exclude list: item # 3 length: 1
>
>This error message refers to the third item in the exclude_urls attribute.
>Unfortunately, there is no clear documentation explaining which error
>messages correspond to which config attributes, and the error messages
>themselves are not clear enough, so the only sure way to track down
>some of these errors right now is to search for the messages in the
>source code.
>
>> url rejected: (level 1)http://www.DOMAIN.com/index.html
>
>Again, an unclear message. Level 1 refers to the first round of
>tests the URL must pass, based on bad_extensions, valid_extensions,
>accepted protocol (http only for 3.1.x), exclude_urls, limit_urls_to, and
>bad_querystr. This message appears at verbosity of 2 (-vv) or greater.
>You need verbosity of at least 3 (-vvv) to get a better explanation,
>which you did get in the earlier message above.
>
>This is unrelated to robots.txt, which is checked at a later stage, and
>gives a message about the URL being discarded, i.e.:
>
> robots.txt: discarding http://whatever...
>
>> My problem is likely in this "exclude list" but I don't know where that's
>> coming from. There's nothing in the htdig.conf file that would indicate
>> such a list, and I don't think I'm intentionally doing anything.
>
>The htdig.conf file doesn't come anywhere close to including all possible
>configuration attributes. There are tons of them, and they all have
>compiled-in defaults, so just because an attribute isn't in htdig.conf
>it doesn't mean it's not set or used. You need to check the documentation
>for the default settings of attributes not in your config file.
>
>> I perused htdig.org and the faq, and perhaps I missed something,
>
>http://www.htdig.org/attrs.html
>
>> or perhaps
>> it's fixed in a different version, or more likely, is just something I don't
>> have a clue about :-)
>
>Well, in this case it's not fixed in the latest version, because it's not
>a bug. However, there are some important bug fixes (including a major
>security hole which is patched) in 3.1.5.
>
>See:
>http://www.htdig.org/RELEASE.html
>http://www.htdig.org/ChangeLog
>
>and for some fixes and enhancements since 3.1.5's release:
>
>http://www.htdig.org/FAQ.html#q2.5

Thanks for the responses. In my htdig.conf file, there was a rogue "n" in
my exclude line. This was causing nearly everything to not get indexed!
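
For anyone who hits the same message, here is a rough Python sketch of why
one stray letter does so much damage. It assumes that exclude_urls is a
space-separated list of patterns and that a URL is rejected when it contains
any of them as a substring. The function name, the sample attribute value and
the second URL are made up for illustration; this is not ht://Dig's actual
code, and the real matching may differ in details such as case handling.

```python
def first_matching_exclude(url, exclude_urls):
    """Return (1-based item number, pattern) for the first exclude pattern
    found as a substring of url, or None if no pattern matches."""
    for index, pattern in enumerate(exclude_urls.split(), start=1):
        if pattern in url:
            return index, pattern
    return None


# Hypothetical attribute value: two ordinary patterns plus a stray "n".
exclude_urls = "/cgi-bin/ .cgi n"

for url in ("http://www.DOMAIN.com/index.html", "http://www.example.org/faq/"):
    hit = first_matching_exclude(url, exclude_urls)
    if hit:
        item, pattern = hit
        print(f"rejected {url}: item # {item} length: {len(pattern)}")
    else:
        print(f"accepted {url}")
```

Under those assumptions, any URL with an "n" anywhere in it matches the third
item, which has length 1, and that lines up with the "item # 3 length: 1"
message quoted above.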

     - Jeff



