Re: [htdig] htdig program hangs on one particular URL


Dan Dexter (ddexter@lincom-asg.com)
Thu, 11 Mar 1999 09:37:31 -0600


At 12:00 PM 3/10/99 -0600, Gilles Detillieux wrote:
>According to Dan Dexter:
>> I'm running htDig 3.1.0b4 on a Digital UNIX 4.0D system.
>>
>> The htdig program hangs when it tries to index the document
>> http://inspection.jsc.nasa.gov/I98Exhibit/421.html
>>
>> I think it might be caused by the META tags in this document. My solution
>> to htdig hanging on this document is simply to exclude it in the htdig
>> configuration file.
>>
>> I will be upgrading to htDig 3.1.1 soon, but I would like to know if anyone
>> with htDig 3.1.1 can successfully index this particular document.
>>
>> If v3.1.1 can not index this document, then htDig might need to be updated
>> to make it more robust to the broken HTML in this document.
>
>I've tried both htdig 3.1.1 and htdig 3.1.0b4 (Red Hat Linux) on the
>URL above, and neither of them hangs! The META tags are strange in
>that document. Because the content= strings for the meta description
>and meta keywords tags aren't quoted, htdig doesn't grab the whole
>thing (only up to the first space), but that doesn't cause it to hang.
>
>Can you run htdig to index only that document on your system, and if so,
>does it still hang? (Use a temporary config file that sets start_url &
>limit_urls_to to just the one URL, and give htdig the config file name
>with the -c option. Add in many -v's for good measure.)
>
>If it still hangs, use -vvvvvvv to see how far it gets before hanging.
>If you can get a stack backtrace (by running htdig from the debugger,
>or triggering a core dump when it hangs), that may be useful too.
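
For reference, a minimal sketch of the single-URL setup described above,
written in Python. The attribute names start_url, limit_urls_to and
database_dir and the -c and -v options are htdig's own; the helper names, the
file name single_url.conf and the scratch directory are assumptions made for
illustration only.

```python
import os
import tempfile


def write_single_url_config(url, workdir):
    """Write a throwaway htdig config that restricts the dig to one URL.

    Returns the path of the config file. The file name is hypothetical.
    """
    config_path = os.path.join(workdir, "single_url.conf")
    lines = [
        # Assumption: keep the test databases in a scratch directory so the
        # regular index is not touched.
        "database_dir: " + workdir,
        "start_url: " + url,
        "limit_urls_to: " + url,
    ]
    with open(config_path, "w") as handle:
        handle.write("\n".join(lines) + "\n")
    return config_path


def build_htdig_command(config_path, verbosity=7):
    """Build the argument list: htdig -c <config> -vvvvvvv."""
    return ["htdig", "-c", config_path, "-" + "v" * verbosity]


if __name__ == "__main__":
    scratch = tempfile.mkdtemp()
    config = write_single_url_config(
        "http://inspection.jsc.nasa.gov/I98Exhibit/421.html", scratch
    )
    print(" ".join(build_htdig_command(config)))
```

If the command is launched from a script, passing the list to
subprocess.run(..., timeout=60) turns a hang into a TimeoutExpired error
instead of a frozen terminal.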

I created a configuration file that would index only that one URL,
and the htdig program still freezes with no core dump. I used -vvvvvvv as
you suggested and got the following output, with the cursor just sitting on
a line by itself at the end of these verbose comments:

Header line: HTTP/1.1 200 OK
Header line: Date: Thu, 11 Mar 1999 03:56:27 GMT
Header line: Server: Apache/1.3.0 (Unix)
Header line: Last-Modified: Tue, 06 Oct 1998 15:28:01 GMT
Translated Tue, 06 Oct 1998 15:28:01 GMT to Tue, 06 Oct 1998 15:28:01 (98)
And converted to Tue, 06 Oct 1998 15:28:01
Header line: ETag: "7068f-da0-361a3701"
Header line: Accept-Ranges: bytes
Header line: Content-Length: 3488
Header line: Connection: close
Header line: Content-Type: text/html
Header line:
returnStatus = 0
Read 3488 from document
Read a total of 3488 bytes
Tag: HTML>, matched -1
Tag: HEAD>, matched -1
Tag: META name=description content=NASA/JSC Inspection98 Johnson Space Center's Student Development Programs>, matched 20

It looks like it is hanging on the second META tag: the description tag is the
last one reported before the output stops. As you have noticed, the META tags
do not use quotes around the content field, which I believe is what is causing
the htdig program to hang.
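
To illustrate the unquoted-attribute problem, here is a minimal sketch in
Python of a tolerant attribute scanner. It is not htdig's parser (htdig is
written in C++), and the function name parse_attributes is hypothetical. It
shows the behaviour Gilles described, where an unquoted value ends at the
first space, and it is written so that every pass through the loop consumes
input, which is the property a parser needs in order not to hang on broken
HTML.

```python
def parse_attributes(attr_text):
    """Parse name=value pairs from the attribute part of a tag.

    Quoted values run to the closing quote; unquoted values stop at the
    first whitespace character. Bare words become attributes with an
    empty value.
    """
    attrs = {}
    i, n = 0, len(attr_text)
    while i < n:
        start = i
        # Skip whitespace between attributes.
        while i < n and attr_text[i].isspace():
            i += 1
        # Read the attribute name up to '=' or whitespace.
        name_start = i
        while i < n and not attr_text[i].isspace() and attr_text[i] != "=":
            i += 1
        name = attr_text[name_start:i].lower()
        value = ""
        if i < n and attr_text[i] == "=":
            i += 1
            if i < n and attr_text[i] in "\"'":
                quote = attr_text[i]
                i += 1
                end = attr_text.find(quote, i)
                if end == -1:
                    end = n  # unterminated quote: take the rest of the tag
                value = attr_text[i:end]
                i = min(end + 1, n)
            else:
                # Unquoted value: ends at the first whitespace.
                value_start = i
                while i < n and not attr_text[i].isspace():
                    i += 1
                value = attr_text[value_start:i]
        if name:
            attrs[name] = value
        # Defensive guard: never finish a pass without consuming input.
        if i == start:
            i += 1
    return attrs


if __name__ == "__main__":
    tag = ("name=description content=NASA/JSC Inspection98 Johnson Space "
           "Center's Student Development Programs")
    parsed = parse_attributes(tag)
    print(parsed["name"], parsed["content"])  # -> description NASA/JSC
```

With the tag from the output above, content comes back as just "NASA/JSC",
and the remaining words are treated as stray value-less attributes rather
than stalling the scan.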

Later,
Dan

voice: 281.461.2109 fax: 281.488.0191