Re: [htdig] htdig won't ignore the files I want ignored


Subject: Re: [htdig] htdig won't ignore the files I want ignored
From: Paul Johnson (pauljohn@ukans.edu)
Date: Fri Dec 17 1999 - 09:31:10 PST


OK, this advice worked. Inserting the <META...> tag in the HEAD of
the index.html files stops htdig from including those files in its
index. GREAT!

I found the other part about automating this by editing mhonarc
resources helpful as well.

But the solution Nathaniel proposes is general: it will work for anybody
who wants to stop htdig from including a particular file in an index
when excluding it in the htdig.conf file is not enough. Could I suggest
this as a FAQ, like so:

Q: How can I prevent htdig's index from including files like index.html
that the web server returns automatically for a directory URL when htdig
crawls over a directory structure?

A: In each index.html file you want to exclude, add the following
between the <HEAD> and </HEAD> tags:

  <META NAME="robots" CONTENT="noindex, follow">

The insertion of this line can be made automatic in MHonArc by adding
that line to the HEAD portion of the IDXPGBEGIN and TIDXPGBEGIN
resources in the resource file.
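
For index.html files that are not generated by MHonArc, the same tag
could be added with a small script. What follows is only a rough sketch,
not something from htdig or MHonArc: the function name add_robots_meta
and the two regular expressions are my own assumptions, and it works on
the page text as a plain string rather than parsing the HTML properly.

```python
import re

# The tag recommended above, copied verbatim.
ROBOTS_META = '<META NAME="robots" CONTENT="noindex, follow">'

# Matches an opening <head> tag in any letter case, with optional attributes.
HEAD_OPEN = re.compile(r"<head(\s[^>]*)?>", re.IGNORECASE)
# Matches an existing robots META tag so the page is not tagged twice.
HAS_ROBOTS = re.compile(r"<meta\s[^>]*name\s*=\s*[\"']?robots", re.IGNORECASE)


def add_robots_meta(html: str) -> str:
    """Return html with the robots META tag placed right after <HEAD>.

    The text is returned unchanged if it has no <HEAD> tag or if it
    already carries a robots META tag. (Hypothetical helper.)
    """
    if HAS_ROBOTS.search(html):
        return html
    match = HEAD_OPEN.search(html)
    if match is None:
        return html
    insert_at = match.end()
    return html[:insert_at] + "\n" + ROBOTS_META + html[insert_at:]
```

You would read each index.html, pass its contents through
add_robots_meta, and write the result back. As with the MHonArc route,
the htdig database still has to be rebuilt before the pages drop out of
the index.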

Nathaniel Irons wrote:
>
> On 12/16/99 at 6:41 PM, pauljohn@ukans.edu (Paul E. Johnson) wrote:
>
> > Of course, when you click on that link, and open that directory, you
> > end up reading the index.html file. But even the browser does not
> > include "index.html" as the file being read, it just has
> >
> > http://raven.cc.ukans.edu/~kups/maillist/polsannounce/
> >
> > at the top.
>
> What you want to do is to modify your IDXPGBEGIN and TIDXPGBEGIN
> resources (as well as any other index pages you may have specified) to
> include
>
> <META NAME="robots" CONTENT="noindex, follow">
>
> in the html head. Links on the page will be followed, but the page
> itself will be safely deleted from the index. You'll have to rebuild
> your mhonarc archives (or at least run them over with EDITIDX), and then
> rebuild your htdig archive.
>
> -nat

-- 
Paul E. Johnson                       email: pauljohn@ukans.edu
Dept. of Political Science            http://lark.cc.ukans.edu/~pauljohn
University of Kansas                  Office: (785) 864-9086
Lawrence, Kansas 66045                FAX: (785) 864-5700




This archive was generated by hypermail 2b28 : Fri Dec 17 1999 - 09:46:05 PST