Re: [htdig] Suse 6.2 + htdig 3.1.5 problems ;-)


Subject: Re: [htdig] Suse 6.2 + htdig 3.1.5 problems ;-)
From: Dave Lers (dave@dalrun.com)
Date: Sat Apr 29 2000 - 08:17:31 PDT


On Sat, 29 Apr 2000, Peter L. Peres wrote:

> by changing permissions as before. The loop was here (note that I have
> cut off some part of the output. These are output lines from htdig -v):
>
> 54531:54531:4:http://myhost/doc/susehilf/gnu/gcc/?N=D:
> 54532:54532:4:http://myhost/doc/susehilf/gnu/gcc/?M=A:
> 54533:54533:4:http://myhost/doc/susehilf/gnu/gcc/?S=A:
> 54534:54534:4:http://myhost/doc/susehilf/gnu/gcc/?D=A:

From a recent post by Geoff:

> *How does Htdig handle those foo/?=D type auto indexes (an Apache thing?)?
> Watching dig I seem to remember a long run of *'s (I ran one search script that
> indexed these as separate URL's)

Sigh. If you have Apache's FancyIndexing turned on, each generated directory
listing gets column-sorting links at the top (the ?N=D, ?M=A, ?S=A and ?D=A
URLs). Since these link to "new pages", you end up with essentially duplicate
copies of each index, though the pages linked from them aren't affected.

I usually add "?" to exclude_urls to get rid of these. There's not much
the indexer can do since they really are different pages.
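
As a rough sketch of that approach, the line in htdig.conf might look like
the following. The first two patterns are what I assume the stock default
to be; keep whatever your own config already lists and just append the "?":

exclude_urls: /cgi-bin/ .cgi ?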

From a recent post by Gilles:

If you need to index any CGI scripts with URL parameters, and therefore
can't exclude all URLs containing a "?", you can add more specific patterns
to exclude_urls to exclude the duplicate index pages. E.g.:

exclude_urls: ?D=A ?D=D ?M=A ?M=D ?N=A ?N=D ?S=A ?S=D
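
To see which URLs a pattern list like this would drop, here is a small
Python sketch. It assumes exclude_urls is a plain substring test (a URL is
rejected if it contains any of the listed patterns); it only models that
behaviour and is not htdig's own code. The function name and the CGI URL
are made up for illustration.

```python
# Rough model of substring-based URL exclusion. This is an assumption
# about how exclude_urls behaves, not code taken from htdig.
EXCLUDE_URLS = "?D=A ?D=D ?M=A ?M=D ?N=A ?N=D ?S=A ?S=D".split()


def is_excluded(url, patterns=EXCLUDE_URLS):
    """Return True if any pattern occurs anywhere in the URL."""
    return any(p in url for p in patterns)


urls = [
    "http://myhost/doc/susehilf/gnu/gcc/?N=D",   # sorted index view
    "http://myhost/doc/susehilf/gnu/gcc/?M=A",   # sorted index view
    "http://myhost/doc/susehilf/gnu/gcc/",       # the plain index
    "http://myhost/cgi-bin/search?q=gcc",        # hypothetical CGI URL
]

# Only the plain index and the CGI URL survive the filter.
kept = [u for u in urls if not is_excluded(u)]
for u in kept:
    print(u)
```

Note that under this model a CGI script whose own query string happens to
start with one of these parameters (say, search?D=A) would be excluded too,
so check the list against the scripts you actually want indexed.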
