RE: [htdig] Words and files not being found or indexed


Subject: RE: [htdig] Words and files not being found or indexed
From: Matthes, Fred (Fred.Matthes@compaq.com)
Date: Thu Dec 14 2000 - 11:08:53 PST


Let me try to help here. This is my first time writing to this type of
mailing list, so please forgive me.

Jason (I hope that I get these names correct), you are talking and thinking
in terms of directory structures: starting at the directory where your home
page is and perusing the files below it.

Dennis is talking about web structures, which I think is how htdig works.
Imagine a spider crawling around a web, perusing the files (links) that it
discovers. It 'recurses' down each link it discovers until it has followed
all possible links. The only difference between this and the directory tree
walk is that you have to keep track of the links you've visited, since many
pages might have a link back to your home page, for example. If you blindly
followed all links, you'd just go round and round.

Now there are many webs out there, just as there are many directories on
your disks. You start somewhere in a directory tree by supplying a
directory to a program that walks through (recurses) the tree visiting all
the leaves below that directory.
 
You start a spider on a particular web by supplying a URL. The spider
begins to walk a web branch (recurses) when it discovers a link on the
current page. When that branch is exhausted, it then continues through the
page until it discovers another link (URL). It does this until it has walked
all links in that particular web.

Now just because you have files in your directory that you use for your web
files does not necessarily mean that those files are on your web. Yes, you
can access them with a browser. But can you go to one page and, just by
clicking on hyperlinks, visit all of these files? As soon as you have to
type in a URL, you have discovered a file that htdig will not find.
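
One way to picture this is to compare the list of files on disk with the set
of pages you can reach by clicking. The sketch below is in Python; the file
names, the link map, and the function names are all hypothetical, and it only
illustrates the idea rather than anything htdig itself provides.

```python
def reachable(links, start):
    """Return the set of pages reachable from start by following links."""
    seen = set()
    stack = [start]
    while stack:
        page = stack.pop()
        if page in seen:
            continue              # already visited, do not follow it again
        seen.add(page)
        stack.extend(links.get(page, []))
    return seen

def unreachable_files(files, links, start):
    """Return the files that no chain of links from start leads to."""
    return sorted(set(files) - reachable(links, start))

# Hypothetical data: four files on disk, but only three are linked together.
files = ["/index.html", "/news.html", "/archive/1999.html", "/archive/old.html"]
links = {
    "/index.html": ["/news.html"],
    "/news.html": ["/archive/1999.html", "/index.html"],
}

print(unreachable_files(files, links, "/index.html"))
```

Here /archive/old.html is the file you would have to type in by hand, so it
is the one a link-following spider would miss.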

These files do not have to be listed on the home page. But they do need to
be accessible via links on your web site. Each one has to appear as a URL in
one of the pages that the spider crawls through. I define a web site as all
of those files connected to some page. I think most people think of that
page as the home page, but it doesn't have to be. So those files that you
want htdig to find must exist somewhere on the web in a link.

I hope that this helps.

-----Original Message-----
From: crosstar [mailto:crosstar@nationalist.org]
Sent: Thursday, December 14, 2000 7:39 AM
To: htdig@htdig.org
Cc: Dennis Director
Subject: Re: [htdig] Words and files not being found or indexed

Thanks, Dennis:

Let me make this quick, then.

We have approximately 3,000 files on our site.

Are you saying that we must list all 3,000 by name, path and
directory on our starting page in order for them to be indexed?

If so, where and how on the "starting page?" Would this refer
to our "index.html" on our site (which is our default home page)?

Or, do you mean to list them all in the htdig.conf file?

If I misunderstand, kindly advise me (at your convenience).
Sorry it is unclear, at this point. But I've never seen another
search engine operate in this manner. Usually, this is all done
automatically.

Thanks, again.

 

At 12:44 PM 12/14/00 -0600, you wrote:
>I'm not sure I'm making myself clear, and unfortunately, I am swamped
>with work. I can't spend any more time on this right now.
>
>You don't have to have files called index.html, you just have to have
>a path of references from the start page, to any other page that you
>want indexed.

-------------------------------------------------------------
The Nationalist Movement
PO Box 2000
Learned MS 39154
(601) 885-2288
Clinic: http://www.nationalist.org/board/html/index.php
Crosstarlist: http://www.nationalist.org/docs/resources/list.html
E-mail: mailto:crosstar@nationalist.org
Forum: http://www.nationalist.org/forum/index.php
Home Page: http://www.nationalist.org
ICQ: 5429992
Newsgroup: alt.national
Views not necessarily those of The Nationalist Movement
2000 by The Nationalist Movement
-------------------------------------------------------------

END

------------------------------------
To unsubscribe from the htdig mailing list, send a message to
htdig-unsubscribe@htdig.org
You will receive a message to confirm this.
List archives: <http://www.htdig.org/mail/menu.html>
FAQ: <http://www.htdig.org/FAQ.html>




This archive was generated by hypermail 2b28 : Thu Dec 14 2000 - 11:19:22 PST