Re: [htdig] htdig program hangs on one particular URL


Gilles Detillieux (grdetil@scrc.umanitoba.ca)
Thu, 11 Mar 1999 14:16:09 -0600 (CST)


According to Dan Dexter:
> I created a configuration file that would index only that one URL,
> and the htdig program still freezes with no core dump. I used -vvvvvvv as
> you suggested and got the following output, with the cursor just sitting on
> a line by itself at the end of this verbose output:
>
> Header line: HTTP/1.1 200 OK
> Header line: Date: Thu, 11 Mar 1999 03:56:27 GMT
> Header line: Server: Apache/1.3.0 (Unix)
> Header line: Last-Modified: Tue, 06 Oct 1998 15:28:01 GMT
> Translated Tue, 06 Oct 1998 15:28:01 GMT to Tue, 06 Oct 1998 15:28:01 (98)
> And converted to Tue, 06 Oct 1998 15:28:01
> Header line: ETag: "7068f-da0-361a3701"
> Header line: Accept-Ranges: bytes
> Header line: Content-Length: 3488
> Header line: Connection: close
> Header line: Content-Type: text/html
> Header line:
> returnStatus = 0
> Read 3488 from document
> Read a total of 3488 bytes
> Tag: HTML>, matched -1
> Tag: HEAD>, matched -1
> Tag: META name=description content=NASA/JSC Inspection98 Johnson Space
> Centerís
> Student Development Programs>, matched 20
>
> It looks like it is hanging on the second META tag. As you have noticed, the
> META tags do not use quotes around the content attribute, which I believe is
> what is causing the htdig program to hang.

Well, if it gets past the first META tag, it should also output these lines:

META Description: NASA/JSC
meta description: NASA/JSC
word: NASA/JSC@1

and possibly the "Tag:" line for the next meta tag, which it outputs
before it begins processing it. I have another hunch, which you might be
able to confirm by getting a core dump when the program hangs, and doing
a backtrace in the debugger. You can usually get a core dump by using
Ctrl-\, or whatever character on your system generates a QUIT signal.

I think it's actually hanging on the first meta tag, the one with name=description.

My hunch is that on your system, the isalnum() function, called
from Configuration::Add(), does not recognise the ISO-8859-1 i-acute
character in Centerís as alphanumeric, but isalpha() does recognise
it as alphabetic. This would cause Configuration::Add() to go into an
infinite loop, and it's about the only thing I can find that would cause
the behaviour you describe. (I know that in this document, the í is
supposed to be an apostrophe, but that doesn't mean it'll be interpreted
that way - it depends on the character set selected by htdig's locale.)

If this is indeed what's happening, the isalnum(*str) in that function
should be changed to isalpha(*str) || isdigit(*str) to ensure consistency.
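The hunch can be illustrated with a small C++ sketch. The C standard defines isalnum() as true exactly for characters where isalpha() or isdigit() is true, so a mismatch like this would have to come from a non-conforming C library or locale table. Because that can't be reproduced portably, the sketch simulates it with two hypothetical predicates, broken_isalpha() and broken_isalnum(), that disagree about the i-acute byte 0xED. The scanner scan_names() is also hypothetical, only loosely modelled on the kind of loop described above and not htdig's actual Configuration::Add(); instead of spinning forever, it reports a stall when an iteration consumes no input.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Character-class test taking an unsigned char, so bytes >= 0x80 are never
// passed to <cctype> as negative values (which is undefined behaviour).
using CharPred = bool (*)(unsigned char);

// Hypothetical stand-ins for a C library whose tables disagree:
// i-acute (0xED in ISO-8859-1) counts as a letter...
bool broken_isalpha(unsigned char c) {
    return c == 0xED || std::isalpha(c);
}
// ...but not as alphanumeric.
bool broken_isalnum(unsigned char c) {
    return c != 0xED && std::isalnum(c);
}
// The suggested fix: build "alphanumeric" from the same isalpha() test,
// i.e. isalpha(c) || isdigit(c), so the two can never disagree.
bool consistent_isalnum(unsigned char c) {
    return broken_isalpha(c) || std::isdigit(c);
}

// Hypothetical scanner (not htdig's code): skip characters that cannot
// start a name, then collect the name. If an iteration consumes nothing,
// a real loop would repeat forever; here we set `stalled` and stop.
std::vector<std::string> scan_names(const std::string& s, CharPred is_start,
                                    CharPred is_part, bool& stalled) {
    std::vector<std::string> names;
    stalled = false;
    std::size_t i = 0;
    while (i < s.size()) {
        const std::size_t before = i;
        while (i < s.size() && !is_start(static_cast<unsigned char>(s[i])))
            ++i;                                  // skip separators
        std::string name;
        while (i < s.size() && is_part(static_cast<unsigned char>(s[i])))
            name += s[i++];                       // collect the name
        if (!name.empty())
            names.push_back(name);
        if (i == before) {                        // no progress made
            stalled = true;
            break;
        }
    }
    return names;
}
```

With the mismatched pair, scanning "Space Center\xED" "s Student" collects "Space" and "Center", then stops at the 0xED byte with stalled set: the skip loop won't pass it (it looks like a letter) and the collect loop won't take it (it isn't alphanumeric). With consistent_isalnum() as the name test, the same input yields three names ("Space", "Center" plus the 0xED byte and "s", and "Student") and no stall, because any character that can start a name can also be consumed as part of one.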

-- 
Gilles R. Detillieux              E-mail: <grdetil@scrc.umanitoba.ca>
Spinal Cord Research Centre       WWW:    http://www.scrc.umanitoba.ca/~grdetil
Dept. Physiology, U. of Manitoba  Phone:  (204)789-3766
Winnipeg, MB  R3E 3J7  (Canada)   Fax:    (204)789-3930


