RE: [htdig] Words and files not being found or indexed


Subject: RE: [htdig] Words and files not being found or indexed
From: Duncan Brannen (dbb@st-andrews.ac.uk)
Date: Fri Dec 15 2000 - 01:09:54 PST


At 18:46 14/12/2000 -0500, Geoff Hutchison wrote:
>You can list as many URLs as you want in the start_url attribute, or you
>can also include a file into the htdig.conf. e.g.:
>
>start_url: `/path/to/urls.txt`

I guess this would be the way to do it; excuse me if I'm stating the obvious.

List the root directory of your web docs recursively, e.g. for /news/archive:
ls -R /news/archive > temp.file

You'll get, e.g.:

/news/archive:
file1
file2
file3

Write a short script to parse temp.file:

find a line that ends in :
strip the :

write to urls.txt

the line that ended in : (minus the colon)/file1
the line that ended in : (minus the colon)/file2
...
until you find another line that ends in :
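
The steps above can be sketched as a short Python script. This is only a
sketch under a few assumptions: temp.file holds one name per line under each
"directory:" header, and the names doc_root and base_url (with the placeholder
host www.example.com) are hypothetical, added here because start_url takes
URLs rather than filesystem paths.

```python
def ls_r_to_urls(listing,
                 doc_root="/news/archive",                         # assumed document root
                 base_url="http://www.example.com/news/archive"):  # placeholder URL
    """Turn the text of an `ls -R` listing into a list of URLs."""
    urls = []
    current_dir = None
    for raw in listing.splitlines():
        line = raw.rstrip()
        if not line:
            continue                      # blank separator between directories
        if line.endswith(":"):
            current_dir = line[:-1]       # strip the trailing colon
            continue
        if current_dir is None:
            continue                      # entry seen before any header; skip it
        # Map the filesystem directory onto the URL space.
        if current_dir.startswith(doc_root):
            rel = current_dir[len(doc_root):]
        else:
            rel = current_dir
        urls.append(base_url.rstrip("/") + rel.rstrip("/") + "/" + line)
    return urls


def write_urls(listing_path="temp.file", out_path="urls.txt"):
    """Read the listing file and write one URL per line."""
    with open(listing_path) as src:
        urls = ls_r_to_urls(src.read())
    with open(out_path, "w") as dst:
        dst.write("\n".join(urls) + "\n")
```

Calling write_urls() after the ls -R step would produce urls.txt. Note that
ls -R also lists subdirectory names as entries, so those end up as URLs too,
and a file whose name ends in a colon would be mistaken for a directory header.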

Actually I think there's a far easier way to do this
in Perl, but I can't think of it off the top of my head.
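
One possibly easier route, sketched here in Python rather than Perl, is to
walk the directory tree directly and skip the ls -R parsing altogether. The
function name walk_to_urls and the base_url parameter are hypothetical, and
the sketch assumes every file under the document root is reachable at the
same relative path on the web server.

```python
import os
from urllib.parse import quote


def walk_to_urls(doc_root, base_url):
    """Return one URL per regular file found under doc_root."""
    urls = []
    for dirpath, dirnames, filenames in os.walk(doc_root):
        dirnames.sort()                               # visit subdirectories in a stable order
        rel_dir = os.path.relpath(dirpath, doc_root)
        for name in sorted(filenames):
            rel = name if rel_dir == "." else os.path.join(rel_dir, name)
            rel = rel.replace(os.sep, "/")            # URLs always use forward slashes
            urls.append(base_url.rstrip("/") + "/" + quote(rel))
    return urls
```

Writing the returned list to urls.txt, one URL per line, gives a file that
can be included from start_url as Geoff describes.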

Maybe a feature request: the ability to give a start directory
and index the files in the directory tree? (Or was that another ht:// product?)

         Dunk

>--
>-Geoff Hutchison
>Williams Students Online
>http://wso.williams.edu/
>
>
>------------------------------------
>To unsubscribe from the htdig mailing list, send a message to
>htdig-unsubscribe@htdig.org
>You will receive a message to confirm this.
>List archives: <http://www.htdig.org/mail/menu.html>
>FAQ: <http://www.htdig.org/FAQ.html>



This archive was generated by hypermail 2b28 : Fri Dec 15 2000 - 01:20:01 PST