RE: [htdig] Words and files not being found or indexed

Subject: RE: [htdig] Words and files not being found or indexed
From: crosstar
Date: Thu Dec 14 2000 - 06:03:23 PST

This is a message for Gilles or anyone who is "senior" enough with the
program to answer.

I had written to Gilles, earlier, and he had said to post the questions here.

The last message I got from Fred indicated that htdig would not index and find
files unless they are listed in some upper-level file which includes
an <a href=".... link to them.

We have thousands of files with no such link to them.

We do have some links by way of a java applet directory; however, Fred
indicates that that will not work in this application.

Fred concludes that there is no way that htdig will work for us, therefore.

Before abandoning the program, altogether, I just wanted to ask one more
time if anyone might know of a way to make it work for us?

The facts are that we have thousands of files in hundreds of sub-directories.
Most are not "linked" in any way to higher up or other files.

I have tried listing the absolute URL path-statements to where these files are
located, as a test (listing these in the htdig.conf file). But this has not been

Any help or guidance appreciated.


Well, I'm just digging into htdig to solve my own problem. I seriously
doubt that any spider is going to discover a url by executing a java applet.
I think it has to be in the form of <a href="... </a>

I think you are out of luck with using htdig in this fashion.


-----Original Message-----
From: crosstar
Sent: Thursday, December 14, 2000 8:33 AM
Cc: Matthes, Fred
Subject: RE: [htdig] Words and files not being found or indexed

Hi, Fred:

Well, I tried adding URLs and path statements, and no luck.

Let me give a quick example.

I have a file called rogues.html

The URL is:


In this file is information about "rogues." The title, even, is
"Rogues Gallery."

However, when I do a keyword search with htdig, the file or text
does not come up. And this appears to be the same with most files
on the site.

Now, there is a file called archives.html

located at:


which lists rogues.html in a java applet (as well as all files in that
particular sub-directory):

Rogues' Gallery;right_frame;/news/archives/2000/rogues.html|

That is the only "link" to the file, however.
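
For illustration only, here is a minimal Python sketch of one way the applet's entries could be turned into ordinary <a href> links that a spider can follow. It assumes each entry has the layout shown above (label;frame;path, terminated by "|"); the function names and the page title are hypothetical, and nothing here is part of htdig itself.

```python
from html import escape


def parse_applet_entries(param):
    """Split 'label;frame;path|' records into (label, frame, path) tuples."""
    entries = []
    for record in param.split("|"):
        record = record.strip()
        if not record:
            continue
        parts = record.split(";")
        if len(parts) != 3:
            continue  # skip anything that does not match the assumed layout
        label, frame, path = (p.strip() for p in parts)
        entries.append((label, frame, path))
    return entries


def to_link_page(entries, title="Site index"):
    """Build a plain HTML page with one <a href> link per applet entry."""
    items = "\n".join(
        '<li><a href="{}">{}</a></li>'.format(escape(path, quote=True), escape(label))
        for label, _frame, path in entries
    )
    return (
        "<html><head><title>{}</title></head>\n"
        "<body>\n<ul>\n{}\n</ul>\n</body></html>\n"
    ).format(escape(title), items)


sample = "Rogues' Gallery;right_frame;/news/archives/2000/rogues.html|"
page = to_link_page(parse_applet_entries(sample))
print(page)
```

A page generated this way would still have to be linked from a page the spider actually visits before it could help.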

Does this help you?

Is there any solution... or do I need to abandon the project
(I would hate to after coming so far!).

I am embarrassed I have to ask all these questions, but
thanks for bearing with me.


Hi, thanks, Fred!

I'm fighting a splitting headache over this, so pardon if I am a bit dense.

OK, I think I see the THEORY you are speaking of. It's still a bit fuzzy in
my mind.

Nonetheless, the question now is: What is the practical SOLUTION (if any)?

In other words, what can I do to have my files indexed and found?

Meanwhile, I am attempting to enter more path and URL statements in
the htdig.conf file and running rundig, once again.

I'll then test to see if more files and text were picked up that way.

If you could kindly afford me a specific solution (such as "type this"
or "add this") that would be a big help. Or, possibly refer me to some
That is about all I can do, insofar as I am not very technical.

We do have links from an "index" of sorts in many sub-directories,
but it is contained in a java applet and I have doubts that it would be
picked up by the "spider" (although I am not sure and am testing now
to determine this).

Thanks, again. I really would like to use this thing!

If I can try to help here. First time writing mail to this type of mailing
list, so please forgive me.

Jason, (I hope that I get these names correct), you are talking and thinking
directory structures. Starting at the directory where your home page is and
perusing files.

Dennis is talking web structures which I think is how htdig works. Imagine
a spider crawling around a web perusing the files (links) that it discovers.
It 'recurses' down a link it discovers until it does all possible links.
The only difference between this and the directory tree walk is that you
have to keep track of the links you've visited since many pages might have a
link back to your home page for example. If you blindly followed all links,
you'd just go round and round.

Now there are many webs out there, just as there are many directories on
your disks. You start somewhere in a directory tree by supplying a
directory to a program that walks through (recurses) the tree visiting all
the leaves below that directory.
You start a spider on a particular web by supplying a url. The spider
begins to walk a web branch (recurses) when it discovers a link on the
current page. When that branch is exhausted, it then continues until it
discovers another link (url). It does this until it has walked all links in
that particular web.
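
The walk described above can be sketched in a few lines of Python. This is not htdig's code, only a toy model: the "web" is a hypothetical dictionary mapping each page to the links found on it, and the sketch uses a queue rather than recursion, which changes the visiting order but not the set of pages reached. The visited set is what keeps the spider from going round and round.

```python
from collections import deque


def crawl(start, links):
    """Return pages reachable from start, in the order they are first visited."""
    visited = set()          # pages already walked, so loops are not followed twice
    queue = deque([start])
    order = []
    while queue:
        page = queue.popleft()
        if page in visited:
            continue
        visited.add(page)
        order.append(page)
        for target in links.get(page, []):
            if target not in visited:
                queue.append(target)
    return order


# Hypothetical site: every page links back to the home page, and one page
# (/news/orphan.html) is on disk but nothing links to it.
site = {
    "/index.html": ["/news/index.html", "/about.html"],
    "/news/index.html": ["/index.html", "/news/story1.html"],
    "/about.html": ["/index.html"],
    "/news/story1.html": [],
    "/news/orphan.html": ["/index.html"],
}

print(crawl("/index.html", site))
```

In this toy site the orphan page never appears in the result, because no visited page links to it.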

Now just because you have files in your directory that you use for your web
files does not necessarily mean that those files are on your web. Yes, you
can access these with a browser. But can you go to one page and, by just
clicking on hyperlinks, visit all of these files? As soon as you have to
type in a url, you have discovered a file that htdig will not find.
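
To make that test concrete, here is a small Python sketch that compares the files on disk with the pages reachable by links from a start page. The link table and file list are made-up examples (the archives page is given no links because its only "links" sit inside an applet), and the function names are hypothetical.

```python
def reachable(start, links):
    """Return the set of pages that can be reached from start by following links."""
    seen = set()
    stack = [start]
    while stack:
        page = stack.pop()
        if page in seen:
            continue
        seen.add(page)
        stack.extend(links.get(page, []))
    return seen


def find_orphans(all_files, start, links):
    """Files that exist but cannot be reached by clicking from the start page."""
    return sorted(set(all_files) - reachable(start, links))


links = {
    "/index.html": ["/news/archives/2000/archives.html"],
    "/news/archives/2000/archives.html": [],  # applet only, no <a href> links
}
files_on_disk = [
    "/index.html",
    "/news/archives/2000/archives.html",
    "/news/archives/2000/rogues.html",
]

print(find_orphans(files_on_disk, "/index.html", links))
```

Every file reported this way is one that a link-following spider would have no way to discover.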

These files do not have to be listed on the home page. But they do need to
be accessible via links on your web site. They have to be in a url in one
of the pages that the spider began crawling through. I define a web site as
all of those files connected to some page. Usually, I think most people
think of that page as the home page but it doesn't have to be. So those
files that you want htdig to find must exist somewhere on the web in a

I hope that this helps.

-----Original Message-----
From: crosstar
Sent: Thursday, December 14, 2000 7:39 AM
Cc: Dennis Director
Subject: Re: [htdig] Words and files not being found or indexed

Thanks, Dennis:

Let me make this quick, then.

We have approximately 3,000 files on our site.

Are you saying that we must list all 3,000 by name, path and
directory on our starting page in order for them to be indexed?

If so, where and how on the "starting page?" Would this refer
to our "index.html" on our site (which is our default home page)?

Or, do you mean to list them all in the htdig.conf file?
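
As a hedged sketch of how the listing could be automated rather than typed by hand: the Python below walks a document directory and writes one HTML page containing an <a href> link to every .html or .htm file it finds. The directory layout, the "/" URL prefix, and the function name are assumptions for illustration. Such a page would only help if the spider reaches it, for example by linking it from the home page or naming it in htdig's start_url setting.

```python
import os
import tempfile
from html import escape
from urllib.parse import quote


def build_link_page(doc_root, url_prefix="/"):
    """Return an HTML page linking to every .html/.htm file under doc_root."""
    links = []
    for dirpath, dirnames, filenames in os.walk(doc_root):
        dirnames.sort()  # make the walk order predictable
        for name in sorted(filenames):
            if not name.lower().endswith((".html", ".htm")):
                continue
            rel = os.path.relpath(os.path.join(dirpath, name), doc_root)
            url = url_prefix + quote(rel.replace(os.sep, "/"))
            links.append(
                '<li><a href="{}">{}</a></li>'.format(escape(url, quote=True), escape(url))
            )
    return (
        "<html><head><title>All pages</title></head>\n"
        "<body>\n<ul>\n{}\n</ul>\n</body></html>\n"
    ).format("\n".join(links))


# Demonstration on a throwaway directory tree (hypothetical file names).
with tempfile.TemporaryDirectory() as root:
    os.makedirs(os.path.join(root, "news", "archives", "2000"))
    for rel in ("index.html", os.path.join("news", "archives", "2000", "rogues.html")):
        with open(os.path.join(root, rel), "w") as fh:
            fh.write("<html></html>")
    page = build_link_page(root)
    print(page)
```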

If I misunderstand, kindly advise me (at your convenience).
Sorry it is unclear, at this point. But I've never seen another
search engine operate in this manner. Usually, this is all done

Thanks, again.


At 12:44 PM 12/14/00 -0600, you wrote:
>I'm not sure I'm making myself clear, and unfortunately, I am swamped
>with work. I can't spend any more time on this right now.
>You don't have to have files called index.html, you just have to have
>a path of references from the start page, to any other page that you
>want indexed.
The Nationalist Movement
PO Box 2000
Learned MS 39154
(601) 885-2288
Home Page:
ICQ: 5429992
Newsgroup: alt.national
Views not necessarily those of The Nationalist Movement
2000 by The Nationalist Movement

