Re: [htdig] indexing subdirectories

Subject: Re: [htdig] indexing subdirectories
From: Gilles Detillieux
Date: Tue Feb 22 2000 - 07:55:13 PST

According to karin kosina:
> thank you so much - that was, indeed, the point that i had missed.
> > It won't
> > find documents that aren't directly or indirectly referenced by
> > <a href=...> tags in your start_url document(s). If you want to index all documents
> > on your site, whether linked or not, you'll need to produce a list of them
> > and use that as your start_url
> that list - does that have to be a list of <a href=...>filename</a> 's ?
> and if yes, how do i get that easily?

There are two ways of doing this, and only one of them requires hrefs:

1) you could generate a file containing merely URLs (not hrefs), one per line,
for each of the documents you want indexed. E.g.:

  find /home/httpd/html -type f -name \*.html -print | \
        sed 's|/home/httpd/html||' \
        > /etc/htdig/urls_to_index

and then put this in your htdig.conf:

  start_url: `/etc/htdig/urls_to_index`
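
As archived, the sed command above substitutes an empty string, which leaves
site-relative paths rather than full URLs. A minimal sketch of the same idea
that prefixes each path with a site address follows. The function name
list_urls and the address http://www.example.com are placeholders that are
not in the original message, and the sketch assumes the document root
contains no "|" or regular-expression metacharacters that would confuse sed.

```shell
# Hypothetical helper: print one URL per line for every .html file
# found under a document root.
#   $1 = document root on disk        (placeholder value in the example)
#   $2 = base URL of the web site     (placeholder value in the example)
list_urls() {
    docroot=${1%/}      # drop a single trailing slash, if present
    base_url=${2%/}
    find "$docroot" -type f -name '*.html' -print |
        sed "s|^$docroot|$base_url|" |   # swap the disk prefix for the URL prefix
        sort
}

# Example with placeholder values:
# list_urls /home/httpd/html http://www.example.com > /etc/htdig/urls_to_index
```

The output file can then be named in backquotes in start_url, as shown above.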

2) alternatively, you could generate a proper HTML document that contains
the href=... for each and every document you want indexed, and then use
the URL of that generated file as your start URL, e.g.:


In this second case, you need to override limit_urls_to, because it normally
will take the same value as start_url, but in my example that would be too
restrictive.

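For the second approach, the generated HTML document could be produced along
these lines. This is only a sketch: the function name make_link_page and the
address http://www.example.com are placeholders that are not in the original
message, and it assumes file names contain no characters that need HTML
escaping (such as "&" or quotes).

```shell
# Hypothetical helper: write a minimal HTML page to stdout that links to
# every .html file found under a document root.
#   $1 = document root on disk        (placeholder value in the example)
#   $2 = base URL of the web site     (placeholder value in the example)
make_link_page() {
    docroot=${1%/}      # drop a single trailing slash, if present
    base_url=${2%/}
    echo '<html><head><title>Index of documents</title></head><body>'
    find "$docroot" -type f -name '*.html' -print |
        sed "s|^$docroot|$base_url|" |
        sort |
        sed 's|.*|<a href="&">&</a><br>|'   # wrap each URL in an anchor tag
    echo '</body></html>'
}

# Example with placeholder values:
# make_link_page /home/httpd/html http://www.example.com > /home/httpd/html/all_docs.html
```

With a page like this, start_url would be the URL at which the generated file
is served, and limit_urls_to would be set wide enough (for instance, the
site's root URL) that the linked documents stay in scope.
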
See the ht://Dig documentation for a description of these and other
config file attributes, to get a better understanding of how they work.

Gilles R. Detillieux
Spinal Cord Research Centre
Dept. Physiology, U. of Manitoba  Phone:  (204)789-3766
Winnipeg, MB  R3E 3J7  (Canada)   Fax:    (204)789-3930

