RE: [htdig] Words and files not being found or indexed


Subject: RE: [htdig] Words and files not being found or indexed
From: Heriberto Cantu (elinux@elinux.com.mx)
Date: Fri Dec 15 2000 - 16:10:07 PST


Ok.

I suppose that your web server runs Linux or Unix,
and that you can log in to it and get a shell.
This would be your account prompt:

$ _

I also assume that you have permission to write a file in the main
directory of your server.

Then change to that directory with the command:
$ cd /home/httpd/html

I suppose that this is the main directory of your web server.

Now I am going to create an HTML file that lists every file in every
directory below the main directory /home/httpd/html, and I'm going
to name it all_links_of_my_web_site.html, with the command:

$ find . -depth -print | awk '{ print "<a href="$1">"$1"</a>"}' - >
all_links_of_my_web_site.html

That command must be typed on one line; the symbol $ is your prompt.
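
If some of your file names contain spaces, or you only want regular
files (not directories) in the list, the same find/awk idea could be
wrapped in a small shell function like the one below. This is only a
sketch: the name make_link_page and its two arguments are made up for
illustration and are not part of htdig, and it assumes file names
without newlines or HTML special characters.

```shell
# make_link_page DOCROOT OUTFILE
# Hypothetical helper (not part of htdig): writes one <a href> line per
# regular file found below DOCROOT into OUTFILE.
make_link_page() {
    docroot=$1
    outfile=$2
    # Run find in a subshell so the caller's working directory is unchanged.
    # -type f skips directories; sed strips the leading "./" from each path.
    ( cd "$docroot" && find . -type f -print ) |
        sed 's|^\./||' |
        # $0 is the whole line, so paths with spaces stay in one piece;
        # spaces are written as %20 in the href only.
        awk '{ href = $0; gsub(/ /, "%20", href);
               print "<a href=\"" href "\">" $0 "</a>" }' > "$outfile"
}
```

It could then be called as
make_link_page /home/httpd/html /home/httpd/html/all_links_of_my_web_site.html
Note that when the output file lives inside the directory being listed,
it will probably show up as a link to itself, which should be harmless.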

Check the file size and, if necessary, raise max_doc_size in htdig.conf
to a greater value so that the file is read to the end.
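
To see how big the link file came out, something like this could be
used. It is a sketch under my own assumptions: suggest_max_doc_size is
a made-up helper name, and doubling the size is just an arbitrary
safety margin, not an htdig recommendation.

```shell
# suggest_max_doc_size FILE
# Hypothetical helper: prints a max_doc_size line (in bytes) that is
# twice the current size of FILE.
suggest_max_doc_size() {
    # wc -c counts bytes; tr removes the padding some systems add.
    size=$(wc -c < "$1" | tr -d ' ')
    echo "max_doc_size: $((size * 2))"
}
```

Running suggest_max_doc_size all_links_of_my_web_site.html prints a
line that can be compared with the value already in htdig.conf.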

Now you have a file that points to every file in your site. Tell
htdig to start indexing from this file by setting start_url:

start_url: http://your.domain/all_links_of_my_web_site.html
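
Geoff's earlier suggestion in this thread was to include a file of URLs
in start_url with backquotes instead. A plain list of URLs for that
could be produced along these lines. Again only a sketch: make_url_list
and its arguments are hypothetical names, http://your.domain stands in
for your real site, and I have not checked how htdig treats line breaks
in the included file.

```shell
# make_url_list DOCROOT BASE_URL OUTFILE
# Hypothetical helper (not part of htdig): writes one full URL per
# regular file found below DOCROOT into OUTFILE.
make_url_list() {
    docroot=$1
    base_url=$2
    outfile=$3
    ( cd "$docroot" && find . -type f -print ) |
        sed 's|^\./||' |
        # Encode spaces as %20 and put the base URL in front of each path.
        awk -v base="$base_url" '{ gsub(/ /, "%20"); print base "/" $0 }' > "$outfile"
}
```

For example, make_url_list /home/httpd/html http://your.domain /path/to/urls.txt
would write the list to the file named in Geoff's example.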

Good Luck

----
Heriberto Cantu
http://www.elinux.com.mx
Monterrey, Mexico
Tel: (8)129-1121
Cel: 0448-256-8807

At 05:10 p.m. 15/12/00 +0000, you wrote:
>Trying to understand the last message from Heriberto:
>
>Are you saying that you can create a file which contains all URLs for the site and
>thereby aid in indexing the site?
>
>What does $ find . -depth -print mean? Is it supposed to be typed somewhere?
>If so, where?
>
>When you say "complete path to file" do you actually mean "files" (plural)?
>
>Where would you enter or use it? In the server? Under a given sub-directory?
>At the prompt on the browser?
>
>What does "awk" mean?
>
>What do you mean by "pipe the output"?
>
>What do you mean by "print the link"?
>
>What is the significance of "<a href=$1>$1</a>?
>
>What is someone supposed to do with this? Type it? Insert it?
>If so, where?
>
>If you could be more specific, I'll try to follow.
>
>Thanks.
>
>At 04:49 PM 12/15/00 -0600, you wrote:
>>Maybe a better idea is to use find to create such file.
>>$ find . -depth -print
>>
>>And now you have the complete path to the file.
>>
>>You just need to pipe the output to awk and print the
>>link "<a href=$1>$1</a>
>>
>>Good Luck
>>
>>At 09:09 a.m. 15/12/00 +0000, you wrote:
>>>At 18:46 14/12/2000 -0500, Geoff Hutchison wrote:
>>>>You can list as many URLs as you want in the start_url attribute, or you
>>>>can also include a file into the htdig.conf. e.g.:
>>>>
>>>>start_url: `/path/to/urls.txt`
>>>
>>>
>>>I guess this would be the way to do it, excuse me if I'm stating the obvious.
>>>
>>>Go to your root directory (For your web docs) eg /news/archive
>>>ls -R > temp.file
>>>
>>>you'l get eg
>>>
>>>/news/archive:
>>>
>>>file1 file2 file3
>>>
>>>write a short script to parse temp.file
>>>
>>>find a line that ends in :
>>>strip the :
>>>
>>>write to urls.txt
>>>
>>>the line that ended in : (- the colon)/file1
>>>the line that ended in : (- the colon) /file2
>>>...
>>>till you find another line that end in :
>>>
>>>Actually i think there's a far easier way to do this
>>>in perl but I can't think of it off the top of my head.
>>>
>>>Maybe a Feature Request? - Ability to give a start directory
>>>and index the files in the directory tree? (Or was that another ht:// product)
>>>
>>> Dunk
>>>
>>>>--
>>>>-Geoff Hutchison
>>>>Williams Students Online
>>>>http://wso.williams.edu/
>>>
>>----
>>Heriberto Cantu
>>http://www.elinux.com.mx
>>Monterrey, Mexico
>>Tel: (8)129-1121
>>Cel: 0448-256-8807
>
>-------------------------------------------------------------
>The Nationalist Movement
>PO Box 2000
>Learned MS 39154
>(601) 885-2288
>Clinic: http://www.nationalist.org/board/html/index.php
>Crosstarlist: http://www.nationalist.org/docs/resources/list.html
>E-mail: mailto:crosstar@nationalist.org
>Forum: http://www.nationalist.org/forum/index.php
>Home Page: http://www.nationalist.org
>ICQ: 5429992
>Newsgroup: alt.national
>Views not necessarily those of The Nationalist Movement
> 2000 by The Nationalist Movement
>-------------------------------------------------------------
>
>END

------------------------------------
To unsubscribe from the htdig mailing list, send a message to
htdig-unsubscribe@htdig.org
You will receive a message to confirm this.
List archives: <http://www.htdig.org/mail/menu.html>
FAQ: <http://www.htdig.org/FAQ.html>



This archive was generated by hypermail 2b28 : Fri Dec 15 2000 - 16:19:17 PST