[htdig] Unable to contact server and non-standard extensions


From: Larry Moss (moss@balloonhq.com)
Date: Sun Jan 16 2000 - 21:14:03 PST


I'm a fairly new user of ht://dig, but I like what I've seen of the package
so far. Before I ask my questions, I'd like to thank everyone who has
contributed to its development.

I'm using htdig 3.1.4. At the current time, I'm indexing only a local web
server. I expect that will change in the future, but for now remote
servers are one complication I don't have to worry about. I have
local_urls set so that most of the site is indexed via the local file
system. I can't set local_urls_only because a few URLs require processing
by the server.

I have about 40,000 indexable (non-binary) files on the server to be
indexed. Of those about 13,000 have non-standard file extensions. That's
for historical reasons. Those files are no longer being generated with the
odd names so this problem won't happen with future files. They're of the
form fileprefix.[1-9]*. In other words, everything ends with a number.
htdig sees this as a strange extension and goes to the server for the
content type. Those files contain just plain text. I'd prefer to have
them indexed locally, but I could live with the alternative of requesting
them from the server if it worked properly. What's happening is that about
9,000 of those file requests result in "Unable to contact server" messages.
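
For anyone wanting to size up the same situation on their own tree, here is a
minimal sketch in Python (not part of ht://dig) that counts the files with a
purely numeric extension under a document root. The function names are
hypothetical, and the regular expression is only my reading of the
"everything ends with a number" pattern described above.

```python
import os
import re

# Assumed reading of the naming scheme: a dot followed only by digits
# at the very end of the file name, e.g. "fileprefix.7" or "fileprefix.1234".
NUMERIC_EXT = re.compile(r"\.[0-9]+$")


def has_numeric_extension(filename):
    """Return True if the file name ends in a purely numeric extension."""
    return NUMERIC_EXT.search(filename) is not None


def count_numeric_extensions(doc_root):
    """Walk doc_root and return (numeric_count, total_count) of files seen."""
    numeric = 0
    total = 0
    for _dirpath, _dirnames, filenames in os.walk(doc_root):
        for name in filenames:
            total += 1
            if has_numeric_extension(name):
                numeric += 1
    return numeric, total
```

Calling count_numeric_extensions() on the server's document root gives the two
figures to compare; it does not try to tell binary files from text files.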

The first thought I had was that the server was busy and that requests were
timing out. I set the timeout on htdig to 300 seconds, thinking that if it
was a timeout problem, that would solve it. I raised the number of
allowable server processes can be running on the machine in case I was just
swamping the machine with requests, but the number of server processes
running never hits the maximum I set.

My preferred way of solving this would be to make htdig index them all
locally. The documentation states that only a fixed set of file
extensions is recognized for local retrieval and there's no way around that.
Second to that, I'd like to solve the failure-to-connect problem. Now,
interestingly, if I run htdig with -vvv to try to get as much debugging
information as I can to find out what's going on, I never get any "Unable
to contact" messages. So, I could just always run htdig with debugging on
and send the information to /dev/null, but that doesn't seem like the way
to fix a problem.

The last option I see is renaming all the files and references to the files
that I have a problem with. That's certainly doable since there are no new
files being created in that form. If no one else can give me another
option, this is probably what I'll do. But rather than start messing
around with my data, I thought I'd see if I was missing something. And,
even if I can avoid the network problem now, it could come up in the future
as I start indexing other servers so I'd rather know what's going on.
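
To make the renaming option concrete, here is a rough Python sketch of how the
new names and the updated references could be computed. Everything in it is an
assumption for illustration: the helper names are hypothetical, and appending
".txt" is just one possible choice of an extension that would be treated as
plain text. It only computes strings; it does not touch any files.

```python
import re

# Assumed pattern for the old names: a dot followed only by digits at the end.
NUMERIC_EXT = re.compile(r"\.[0-9]+$")


def renamed(filename, suffix=".txt"):
    """Append suffix to names with a numeric extension; leave others alone."""
    if NUMERIC_EXT.search(filename):
        return filename + suffix
    return filename


def rewrite_references(text, old_names, suffix=".txt"):
    """Append suffix to every whole-name occurrence of old_names in text.

    Longer names are tried first, and the lookarounds keep "notes.1" from
    matching inside "notes.12" or "mynotes.1".
    """
    if not old_names:
        return text
    alternatives = "|".join(
        re.escape(name) for name in sorted(old_names, key=len, reverse=True)
    )
    pattern = re.compile(r"(?<!\w)(?:" + alternatives + r")(?!\w)")
    return pattern.sub(lambda match: match.group(0) + suffix, text)
```

The rewrite is meant to be run once per document; running it a second time
over already-updated text would append the suffix again.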

I appreciate any suggestions anyone can offer. I hope I haven't just
missed something obvious in the documentation.

Thanks in advance,
Larry Moss
BalloonHQ.com



