Re: [htdig] indexing subdirectories


Subject: Re: [htdig] indexing subdirectories
From: Gilles Detillieux (grdetil@scrc.umanitoba.ca)
Date: Tue Feb 22 2000 - 07:55:13 PST


According to karin kosina:
> thank you so much - that was, indeed, the point that i had missed.
>
> > It won't find documents that aren't directly or indirectly referenced
> > by <a href=...> tags in your start_url document(s). If you want to index
> > all documents on your site, whether linked or not, you'll need to produce
> > a list of them and use that as your start_url
> that list - does that have to be a list of <a href=...>filename</a> 's ?
> and if yes, how do i get that easily?

There are two ways of doing this, and only one of them requires hrefs as
above.

1) You could generate a plain file of URLs (not hrefs), one per line,
listing each of the documents you want indexed. E.g.:

  find /home/httpd/html -type f -name \*.html -print | \
        sed 's|/home/httpd/html|http://www.mydomain.org|' \
        > /etc/htdig/urls_to_index

and then put this in your htdig.conf (the backquotes tell htdig to read the
attribute's value from the named file):

  start_url: `/etc/htdig/urls_to_index`

2) Alternatively, you could generate a proper HTML document that contains
an <a href=...> link for each and every document you want indexed, and then
use the URL of that generated file as your start URL, e.g.:

  start_url: http://www.mydomain.org/data/linkstoindex.html
  limit_urls_to: http://www.mydomain.org/

In this second case, you need to override limit_urls_to, because it defaults
to the value of start_url, and in my example that would be too restrictive:
only URLs containing the generated file's own URL would be accepted.
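
A rough sketch of generating such a file with a shell function follows. The
function name make_link_page and its two arguments are my own placeholders.
It assumes the document root contains no characters special to sed, and it
does no HTML escaping of the URLs, so treat it as a starting point rather
than a tested recipe:

```shell
# make_link_page DOCROOT BASEURL
# Hypothetical helper: writes to stdout an HTML page with one link per
# *.html file found under DOCROOT, rewriting the path prefix to BASEURL.
make_link_page() {
    docroot=$1
    baseurl=$2
    echo '<html><head><title>Documents to index</title></head><body>'
    # Same find/sed idea as in option 1, then wrap each URL in an anchor tag.
    find "$docroot" -type f -name '*.html' -print | sort |
        sed "s|^$docroot|$baseurl|" |
        while IFS= read -r url; do
            printf '<a href="%s">%s</a><br>\n' "$url" "$url"
        done
    echo '</body></html>'
}
```

You would call it along these lines, assuming the data directory under your
document root is what the start_url above points to:

  make_link_page /home/httpd/html http://www.mydomain.org \
        > /home/httpd/html/data/linkstoindex.html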

See http://www.htdig.org/attrs.html for a description of these and other
config file attributes, to get a better understanding of how they work.

-- 
Gilles R. Detillieux              E-mail: <grdetil@scrc.umanitoba.ca>
Spinal Cord Research Centre       WWW:    http://www.scrc.umanitoba.ca/~grdetil
Dept. Physiology, U. of Manitoba  Phone:  (204)789-3766
Winnipeg, MB  R3E 3J7  (Canada)   Fax:    (204)789-3930



