Re: [htdig] htdig program hangs on one particular URL


Dan Dexter (ddexter@lincom-asg.com)
Mon, 15 Mar 1999 15:27:11 -0600


At 02:16 PM 3/11/99 -0600, Gilles Detillieux wrote:
>According to Dan Dexter:
>> I created a configuration file that would index only that one URL
>> and the htdig program still freezes with no core dump. I used -vvvvvvv as
>> you suggested and got the following output, with the cursor just sitting on
>> a line by itself at the end of these verbose comments:
>>
>> Header line: HTTP/1.1 200 OK
>> Header line: Date: Thu, 11 Mar 1999 03:56:27 GMT
>> Header line: Server: Apache/1.3.0 (Unix)
>> Header line: Last-Modified: Tue, 06 Oct 1998 15:28:01 GMT
>> Translated Tue, 06 Oct 1998 15:28:01 GMT to Tue, 06 Oct 1998 15:28:01 (98)
>> And converted to Tue, 06 Oct 1998 15:28:01
>> Header line: ETag: "7068f-da0-361a3701"
>> Header line: Accept-Ranges: bytes
>> Header line: Content-Length: 3488
>> Header line: Connection: close
>> Header line: Content-Type: text/html
>> Header line:
>> returnStatus = 0
>> Read 3488 from document
>> Read a total of 3488 bytes
>> Tag: HTML>, matched -1
>> Tag: HEAD>, matched -1
>> Tag: META name=description content=NASA/JSC Inspection98 Johnson Space
>> Centerís
>> Student Development Programs>, matched 20
>>
>> It looks like it is hanging on the second META tag. As you have noticed,
>> the META tags do not use quotes around the content field, which I believe
>> is what is causing the htdig program to hang.
>
>Well, if it gets past the first META tag, it should also output these lines:
>
>META Description: NASA/JSC
>meta description: NASA/JSC
>word: NASA/JSC@1
>
>and possibly the "Tag:" line for the next meta tag, which it outputs
>before it begins processing it. I have another hunch, which you might be
>able to confirm by getting a core dump when the program hangs, and doing
>a backtrace in the debugger. You can usually get a core dump by using
>Ctrl-\, or whatever character on your system generates a QUIT signal.
>
>I think it's hanging on the first meta tag, with name=description.
>
>My hunch is that on your system, the isalnum() function, called
>from Configuration::Add(), does not recognise the ISO-8859-1 i-acute
>character in Centerís as alphanumeric, but isalpha() does recognise
>it as alphabetic. This would cause Configuration::Add() to go into an
>infinite loop, and it's about the only thing I can find that would cause
>the behaviour you describe. (I know that in this document, the í is
>supposed to be an apostrophe, but that doesn't mean it'll be interpreted
>that way - it depends on the character set selected by htdig's locale.)
>
>If this is indeed what's happening, the isalnum(*str) in that function
>should be changed to isalpha(*str) || isdigit(*str) to ensure consistency.

Gilles,

I upgraded to htDig 3.1.1 and still had the problem described above.
My computer is a production machine, so it does not have gdb on it.
Instead, I tried changing isalnum(*str) to isalpha(*str) || isdigit(*str)
as you suggested, and this completely fixed the problem of htdig hanging.
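
For anyone who wants to check their own system without a debugger, a small
test along these lines might confirm the hunch. It is only a sketch: the
function name inconsistent_ctype_chars is made up for illustration and is
not part of htdig. It lists every 8-bit character code that isalpha()
accepts but isalnum() rejects in the currently selected locale.

```cpp
#include <cctype>
#include <vector>

// Hypothetical diagnostic, not part of the htdig source.
// Returns every 8-bit character code for which isalpha() is true but
// isalnum() is false in the current locale. On a consistent C library
// the returned list is empty.
std::vector<int> inconsistent_ctype_chars()
{
    std::vector<int> bad;
    for (int c = 0; c < 256; ++c) {
        // Values 0..255 are valid arguments to the <cctype> functions.
        if (std::isalpha(c) && !std::isalnum(c))
            bad.push_back(c);
    }
    return bad;
}
```

Calling std::setlocale() from <clocale> with the locale name in question
before running the check should exercise that locale's tables. A non-empty
result would be consistent with the kind of inconsistency Gilles described.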

I noticed in the htdig output that it said the locale was not
recognized. The isalpha(*str) || isdigit(*str) change worked while the
locale was not defined correctly. Since then, I have specified the
correct locale for my system in the configuration file, and the patch
still works just fine. I have not tested the original isalnum(*str)
code with the locale properly defined.
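
For the archives, the idea behind the change can be sketched roughly as
follows. This is not the actual Configuration::Add() code; is_name_char
and read_name are hypothetical names used only to illustrate why building
the test from isalpha() and isdigit() keeps the two checks consistent.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical helper, not the actual htdig source.
// Because it is built from isalpha() and isdigit(), any character that
// isalpha() accepts is also accepted here, whatever the locale tables say.
bool is_name_char(char c)
{
    // Cast first: passing a negative char value to the <cctype>
    // functions is undefined behaviour for 8-bit characters.
    unsigned char uc = static_cast<unsigned char>(c);
    return std::isalpha(uc) || std::isdigit(uc);
}

// Hypothetical scanner: returns the run of name characters starting at pos.
// A caller that has already checked isalpha() on s[pos] always gets a
// non-empty result, so a loop built on it cannot stall on that character.
std::string read_name(const std::string &s, std::size_t pos)
{
    std::string name;
    while (pos < s.size() && is_name_char(s[pos]))
        name += s[pos++];
    return name;
}
```

With the original isalnum() test, a character accepted by isalpha() but
rejected by isalnum() could leave such a scanner with an empty name and
no progress, which matches the hang described above.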

Thank you for your help.
Dan