Re: htdig: Missing Files in Search Results

Jeff Hill
Sun, 08 Nov 1998 12:41:39 -0500

Geoff Hutchison wrote:
> Sure. It won't index the new one unless there's a link to the new file as
> well. Unlike some spiders (notably SWISH), ht://Dig will only follow links
> from some source document, the start_url. So if there isn't a link, it
> can't find it.
> Well, if the above explanation sounds reasonable, check to see if there are
> links to everything. If there *are* and it's skipping them... that's a
> bug.

I originally assumed it didn't need the links, since it doesn't need the
user-id and password either. However, I tried again with half a dozen
files, with no success. I did copy a previously indexed file from the
directory where I found the problem to a higher directory, and it was
indexed. The higher directory has about 20 files; the original directory
has 328. All the files are quite small (2K to 10K).

The permissions on the indexed and non-indexed files are identical, so
that seems to be a dead end. I'm invoking rundig as root, so I wouldn't
expect permission problems anyway.

I've made exact duplicates of files that htdig has indexed and ensured
there are links to them. The new copy, in the same directory, is not
indexed by htdig (or it just isn't showing up properly in the hit list).
The old copy is indexed.
A new copy in the directory with only about 20 files is indexed, so the
problem might have something to do with having a large number of files
in one directory. But that doesn't hold true in other directories, where
I have several hundred files and everything seems to be indexed.
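Since ht://Dig only finds files it can reach by following links from the start_url, one way to narrow this down without a full 90-minute rebuild is to compare the directory listing against the hrefs that actually appear in the pages linking into that directory. The sketch below is a hypothetical helper, not part of ht://Dig; the function names `linked_names` and `unlinked_files` are made up for illustration, and it assumes the linking pages are available as HTML text. It matches on the last path component only, so it ignores which directory a link points to.

```python
import os
from html.parser import HTMLParser
from urllib.parse import urlparse


class HrefCollector(HTMLParser):
    """Collect every href value from <a> tags in an HTML document."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def linked_names(html_pages):
    """Return the file names (last path component) linked from the given HTML strings.

    Hypothetical helper: query strings and #fragments are dropped via urlparse,
    and only the basename is kept, so links into other directories can collide.
    """
    names = set()
    for page in html_pages:
        parser = HrefCollector()
        parser.feed(page)
        parser.close()
        for href in parser.hrefs:
            names.add(os.path.basename(urlparse(href).path))
    return names


def unlinked_files(directory_listing, html_pages):
    """Return, sorted, the files in the listing that no page links to."""
    linked = linked_names(html_pages)
    return sorted(name for name in directory_listing if name not in linked)
```

In practice `directory_listing` would come from `os.listdir()` on the problem directory and `html_pages` from reading the index pages that are supposed to link to it. Any file reported as unlinked would explain a missing search result; if the list is empty, the cause lies elsewhere (for example, in how the pages are fetched over HTTP).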

Rebuilding the complete index takes about an hour and a half each time,
so there's a limit to how many different tests I can try in a day. And
I'm out of ideas.

Any suggestions appreciated.

Jeff Hill

********* HR On-Line: The Network for Workplace Issues ********
** Ph:416-604-7251 -- Fax:416-604-4708 ** **

This archive was generated by hypermail 2.0b3 on Sat Jan 02 1999 - 16:28:46 PST