Subject: Re: [htdig] indexing subdirectories
From: Gilles Detillieux (firstname.lastname@example.org)
Date: Tue Feb 22 2000 - 07:55:13 PST
According to karin kosina:
> thank you so much - that was, indeed, the point that i had missed.
> > It won't
> > find documents that aren't directly or indirectly referenced by <a
> > tags in your start_url document(s). If you want to index all documents
> > on your site, whether linked or not, you'll need to produce a list of them
> > and use that as your start_url
> that list - does that have to be a list of <a href=...>filename</a> 's ?
> and if yes, how do i get that easily?
There are two ways of doing this, and only one of them requires hrefs:
1) you could generate a file containing merely URLs (not hrefs), one per line,
for each of the documents you want indexed. E.g.:
find /home/httpd/html -type f -name \*.html -print | \
sed 's|/home/httpd/html|http://www.mydomain.org|' \
and then put this in your htdig.conf:
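
As a rough sketch of this first approach: the function name list_urls and the
list file /tmp/htdig-urls.txt below are assumptions made for illustration,
while the document root and domain are the ones used above. It relies on
htdig's config file syntax for reading an attribute value from a file named
in backquotes.

```shell
#!/bin/sh
# Hypothetical helper: print one URL per line for every .html file
# found under a document root.
#   $1 = document root on disk
#   $2 = base URL that maps to that document root
# Note: the document root is used as a sed pattern, so it should not
# contain a '|' character.
list_urls() {
    docroot=$1
    base_url=$2
    find "$docroot" -type f -name '*.html' -print |
        sed "s|^$docroot|$base_url|" |
        sort
}

# Example use (the output file name is an assumption):
#   list_urls /home/httpd/html http://www.mydomain.org > /tmp/htdig-urls.txt
#
# Matching htdig.conf line; the backquotes tell htdig to use the
# contents of the named file as the attribute's value:
#   start_url: `/tmp/htdig-urls.txt`
```

The list has to be regenerated before each htdig run if files are added or
removed, so it may be convenient to call it from whatever script runs the
indexing.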
2) alternatively, you could generate a proper HTML document that contains
the href=... for each and every document you want indexed, and then use
the URL of that generated file as your start URL, e.g.:
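
A rough sketch of how such a page might be generated: the function name
make_link_page and the file name allfiles.html are assumptions made for
illustration, not names from htdig itself. It does not escape special
characters in file names, so it suits sites with plain file names only.

```shell
#!/bin/sh
# Hypothetical helper: write a bare-bones HTML page to stdout that
# links to every .html file found under a document root.
#   $1 = document root on disk
#   $2 = base URL that maps to that document root
make_link_page() {
    docroot=$1
    base_url=$2
    echo '<html><head><title>File list</title></head><body>'
    find "$docroot" -type f -name '*.html' -print |
        sed "s|^$docroot|$base_url|" |
        sort |
        while read -r url; do
            # One link per document; the URL doubles as the link text.
            printf '<a href="%s">%s</a><br>\n' "$url" "$url"
        done
    echo '</body></html>'
}

# Example use (allfiles.html is an assumed name; because the shell
# creates it before find runs, the page will also link to itself,
# which is harmless):
#   make_link_page /home/httpd/html http://www.mydomain.org \
#       > /home/httpd/html/allfiles.html
#
# Matching htdig.conf lines, with limit_urls_to widened to the site:
#   start_url:     http://www.mydomain.org/allfiles.html
#   limit_urls_to: http://www.mydomain.org/
```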
In this second case, you need to override limit_urls_to, because it normally
takes the same value as start_url, but in my example that would be too
restrictive.
See http://www.htdig.org/attrs.html for a description of these and other
config file attributes, to get a better understanding of how they work.
--
Gilles R. Detillieux              E-mail: <email@example.com>
Spinal Cord Research Centre       WWW:    http://www.scrc.umanitoba.ca/~grdetil
Dept. Physiology, U. of Manitoba  Phone:  (204)789-3766
Winnipeg, MB  R3E 3J7  (Canada)   Fax:    (204)789-3930