Re: [htdig] Words and files not being found or indexed

From: crosstar (
Date: Thu Dec 14 2000 - 02:56:42 PST

Hi, one additional point regarding "index" files, if I understand your point:

Are you saying that I must have files called "index.html" in
order for the program to index the files under a given subdirectory?

Here is a brief explanation.

In each sub-directory, at present, I have a file named after the
sub-directory itself, which serves as an index.

For instance, if the sub-directory is named "dogs" I have a file
in it called "dogs.html" which then lists the various files in
that subdirectory, with links to them (such as "fleas.html,"
"doghouse.html," etc.).

Now, are you saying that I must change "dogs.html" to
"index.html" in order for htdig to index the site completely?
Or, am I misreading what you said?

This could be done, if necessary, I suppose, but remember, we have
thousands of files and hundreds of sub-directories. The task would
be tantamount to a Florida recount!

Please advise. Thanks.

OK, thanks, Dennis. This explanation is a lot clearer.

I was under the impression that the program simply
indexed the entire site, minus any files or sub-directories
which were specifically excluded.

At least, that appeared to be the implication.

OK, well, I'm glad to know that I was wrong on this.

So, the question now is, how to make it operate to index and
find ALL files? (All of my files are html files, by the way).

Per your example, I do not have many links to other pages on
the main page. In fact, most pages (there are thousands of
them) on my site do not have any links on them at all.

So, per your explanation, these files would not be indexed.
It appears that many are indexed, however, at present,
somehow, which throws me off, a bit.


My main (start-up) page is contained in an "index.html"
page, but that is the only one which has an "index.html" file.
The others just have names like you suggested, cat.html, dog.html, etc.
And, they are all under many, many subdirectories.

So... as to solution...

Are you saying that I should expressly list ALL of my subdirectories in
htdig.conf at:


I want to be sure on this because I have HUNDREDS of subdirectories.

If the answer is "yes," then I will proceed to list the entire directory and
sub-directory structure, there...


One more point.

Can I get by with listing only the MAIN sub-directories, or do I need to list
all sub-sub-sub-directories (some of them going quite deep)?

Hope I'm getting close to a solution.

Thanks for the tips and patience.

At 10:34 AM 12/14/00 -0600, you wrote:
>On Thu, 14 Dec 2000, crosstar wrote:
>> Thanks for taking the time to reply. :)
>> I might sound a bit dense (sorry), but bear with me, as I hope
>> to get this thing operable.
>> What do you mean by:
>> "Path of links on pages?"
>> "Starting at the start page?"
>> I am trying to understand your analysis, but could you,
>> perhaps, simply tell me what exactly to do (such as,
>> "type this," "cut and paste" that, or something practical
>> (rather than just theoretical).
>Well, I can't give you cut and paste steps, but let me try again ..
>If your start page is and on that page
>is a link to say, then those 2
>pages get indexed.
>If the start page has a link to
>, and the pets subdirectory has an
>index.html page which includes links to dogs.html and cats.html, then
>you will index and
>BUT ! if in the pets subdirectory is a page called fish.html and there is
>no link to fish.html in the pets/index.html or the
>, then fish.html will not be indexed, because
>htdig never saw it.
>In other words, htdig gets the start page, looks inside for links, gets
>those pages, looks inside for links, gets those pages, etc.
>A page being accessible on your server is not enough.
>That's the best I can do..
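
For anyone following along, the link-following behaviour Dennis describes can be sketched roughly as below. This is only an illustration, not htdig's actual code: the SITE table, the page names, and the reachable_pages helper are all made up for the example, loosely modelled on the pets/dogs/cats/fish pages above.

```python
from collections import deque

# Hypothetical site: each page maps to the pages it links to.
# These names are assumptions for illustration, not real URLs.
SITE = {
    "index.html": ["pets/index.html"],
    "pets/index.html": ["pets/dogs.html", "pets/cats.html"],
    "pets/dogs.html": [],
    "pets/cats.html": [],
    "pets/fish.html": [],  # exists on the server, but nothing links to it
}


def reachable_pages(site, start):
    """Return the set of pages found by following links from `start`."""
    seen = {start}
    queue = deque([start])
    while queue:
        page = queue.popleft()
        # Look inside the page for links and queue any page not yet seen.
        for link in site.get(page, []):
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return seen


indexed = reachable_pages(SITE, "index.html")
missed = sorted(set(SITE) - indexed)
print("indexed:", sorted(indexed))
print("missed:", missed)
```

In this toy site, fish.html ends up in the "missed" list because no page reachable from the start page links to it, which is the situation described in the quoted reply.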

