Re: htdig: Berkeley DB2 and 3.1.0b1


Frank Richter (Frank.Richter@hrz.tu-chemnitz.de)
Tue, 13 Oct 1998 08:43:15 +0200 (MEST)


Hi,

[...]
> it appears as if the data from previous records is "infecting" later
> records--i.e. the description from an early record seems to become the
> default.

I think this is the same bug I tried to explain in this mail:

 Date: Fri, 7 Aug 1998 19:26:02 +0200 (MET DST)
 From: Frank Richter <fri@hrz.tu-chemnitz.de>
 To: "ht://Dig mailing list" <htdig@sdsu.edu>
 Subject: Bug in handling wrong HTML

 I think there is a bug in htdig (3.0.8b2, Solaris 2.6). When it parses a
 document containing invalid HTML (in my example an unclosed comment), it
 stores the beginning of the content of the previously parsed document
 instead. Of course, invalid HTML is a bad thing, but I think htdig should
 store no content (or a warning) for the broken page rather than content
 from another page.

 Example: start_url: http://www.tu-chemnitz.de/~fri/htdigtest/t1.html
 It contains a link to .../t2.html, which has the invalid HTML.

 The resulting db.docs is:
0 u:http://www.tu-chemnitz.de/~fri/htdigtest/t1.html t:Title 1 a:0
m:902509994 s:130 h: HEAD 1 Link to t2 some text l:902509999 L:1
I:130 d: A:

1 u:http://www.tu-chemnitz.de/~fri/htdigtest/t2.html t:Title 2 a:0
m:902509903 s:183 h: HEAD 1 Link to t2 some text l:902509999 L:0
I:183 d:Link to t2 A:
                  ^^^^^^^^^^^^^^^^^^^^^^^ that's wrong!

 As you can see, the second entry (1), for t2.html, holds the content of
 t1.html (h: HEAD 1 instead of HEAD 2).
 Does anyone have a fix or a suggestion where to look in the code?
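
 To check a larger db.docs for the same symptom, something like the
 following might help. It is only a minimal sketch, assuming the text
 layout shown in the dump above: a document number followed by
 single-letter "key:value" fields separated by whitespace. The function
 names are made up for illustration and are not part of htdig, and the
 simple field splitting will misfire if an excerpt itself contains
 something like " x:".

```python
import re

# Assumption: a field starts with a single letter plus ":" that is either at
# the start of the record body or preceded by whitespace (as in the dump above).
FIELD = re.compile(r"(?:^|\s)([A-Za-z]):")


def parse_record(text):
    """Split one db.docs record into (doc_id, {key: value})."""
    text = " ".join(text.split())  # undo line wrapping introduced by mail
    doc_id, _, rest = text.partition(" ")
    marks = list(FIELD.finditer(rest))
    fields = {}
    for i, m in enumerate(marks):
        # A value runs up to the start of the next field marker.
        end = marks[i + 1].start() if i + 1 < len(marks) else len(rest)
        fields[m.group(1)] = rest[m.end():end].strip()
    return int(doc_id), fields


def repeated_excerpts(records):
    """Return (later_id, earlier_id) pairs whose h: excerpts are identical."""
    seen = {}
    hits = []
    for rec in records:
        doc_id, fields = parse_record(rec)
        excerpt = fields.get("h", "")
        if excerpt and excerpt in seen:
            hits.append((doc_id, seen[excerpt]))
        else:
            seen.setdefault(excerpt, doc_id)
    return hits
```

 Fed the two records above, repeated_excerpts() should report document 1
 as repeating the excerpt of document 0. Identical excerpts can also
 occur legitimately (for example, mirrored pages), so a hit is only a
 hint for where to look.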

- Frank

-- 
Email: Frank.Richter@hrz.tu-chemnitz.de  http://www.tu-chemnitz.de/~fri/
Work:  Computing Services, Technical University, 09107 Chemnitz, Germany
