Re: [htdig3-dev] Segmentation fault in long run


Subject: Re: [htdig3-dev] Segmentation fault in long run
From: Gilles Detillieux (grdetil@scrc.umanitoba.ca)
Date: Mon Feb 28 2000 - 12:27:43 PST


According to Geoff Hutchison:
> On Mon, 28 Feb 2000, Gilles Detillieux wrote:
> > > I'm a bit confused here. Why is it running ExternalParser for a file
> > > named .htm--shouldn't it be going through the HTML parser? What are
> > > you using for your external parser?
> >
> > Those functions are clearly not in ExternalParser.cc. The line number
> > reported is the very last line of source in that file, so it's probably
>
> I *know* the functions weren't in ExternalParser. My question was why
> ExternalParser was coming up for a .htm file at all! The functions are in
> the Berkeley code, but before we go pointing a finger, I want to know how
> it got to this situation. Is the ExternalParser somehow spitting up
> something the rest of the code doesn't like?

Please have a close look at the full backtrace again, as Valdas posted...

#0 0x8084294 in __bam_cmp () at ExternalParser.cc:440
#1 0x808b4c9 in __bam_search () at ExternalParser.cc:440
#2 0x80867b1 in __bam_c_search () at ExternalParser.cc:440
#3 0x8084d42 in __bam_c_get () at ExternalParser.cc:440
#4 0x8066459 in __db_put () at ExternalParser.cc:440
#5 0x80839f0 in Db::put () at ExternalParser.cc:440
#6 0x40099de4 in WordList::Put () at WordList.cc:762
#7 0x40050cff in HtWordList::Flush () at HtWordList.cc:149
#8 0x40050073 in DocumentRef::AddDescription () at DocumentRef.cc:250
#9 0x805c09e in Retriever::got_href (this=0xbffff614, url=@0x88ab5e8,
    description=0x8126f78 "Viršun", hops=1) at Retriever.cc:1352
#10 0x80510e3 in HTML::do_tag (this=0x8199da0, retriever=@0xbffff614,
    tag=@0x8199e08) at HTML.cc:541
#11 0x8050342 in HTML::parse (this=0x8199da0, retriever=@0xbffff614,
    baseURL=@0x8358338) at HTML.cc:321
#12 0x805875d in Retriever::RetrievedDocument (this=0xbffff614,
    doc=@0x8122560, ref=0x8399090) at Retriever.cc:738
#13 0x8057d70 in Retriever::parse_url (this=0xbffff614,
    urlRef=@0x83b7e90) at Retriever.cc:591
#14 0x8057168 in Retriever::Start (this=0xbffff614) at Retriever.cc:405
#15 0x8060e98 in main (ac=3, av=0xbffffaa4) at htdig.cc:289

There's not a single _function_ listed in that backtrace that's in
ExternalParser.cc, so the external parser wasn't even being called
at this point. The document was being parsed by HTML.cc's parse()
function, which called do_tag(), which called got_href(), which called
AddDescription(), which called Flush(), and so on. The appearance of
the ExternalParser.cc source file name in the list is probably just an
artifact of incomplete debugging information for modules linked after
ExternalParser.o, most likely because only htdig/*.cc was compiled with
-g, while the db package was not. I'm a bit puzzled by the Db::put ()
line above; perhaps Valdas mistranscribed it and it's really
DB2_db::Put (). Either way, it would seem that htlib was compiled
without -g, while htword had it. I'd recommend a clean rebuild of the
whole package from scratch, using -g for everything, before
investigating further.
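
A rough heuristic for spotting this kind of artifact is to check whether
several unrelated functions all claim the same file:line, since a single
source line is unlikely to belong to six different functions. Here's a
minimal C++ sketch of that check. It assumes gdb-style frame lines with
any wrapped argument lists already joined onto one line; parse_frames()
and suspect_locations() are hypothetical helpers, not part of ht://Dig.

```cpp
#include <map>
#include <regex>
#include <set>
#include <string>
#include <vector>

// One parsed backtrace frame: function name plus its "file:line" location.
struct Frame {
    std::string function;
    std::string location;  // e.g. "ExternalParser.cc:440"
};

// Parse gdb-style frame lines such as
//   "#0 0x8084294 in __bam_cmp () at ExternalParser.cc:440"
// Lines that don't match this shape (e.g. frames with no "at" part) are skipped.
std::vector<Frame> parse_frames(const std::vector<std::string>& lines) {
    static const std::regex frame_re(
        R"(^#\d+\s+0x[0-9a-fA-F]+\s+in\s+(\S+)\s*\(.*\)\s+at\s+(\S+:\d+)\s*$)");
    std::vector<Frame> frames;
    for (const auto& line : lines) {
        std::smatch m;
        if (std::regex_match(line, m, frame_re)) {
            frames.push_back({m[1].str(), m[2].str()});
        }
    }
    return frames;
}

// Return every location claimed by more than one distinct function.
// Recursion (same function, same line) is not flagged, because the
// function names are collected into a set per location.
std::vector<std::string> suspect_locations(const std::vector<Frame>& frames) {
    std::map<std::string, std::set<std::string>> funcs_at;
    for (const auto& f : frames) {
        funcs_at[f.location].insert(f.function);
    }
    std::vector<std::string> suspects;
    for (const auto& [loc, funcs] : funcs_at) {
        if (funcs.size() > 1) {
            suspects.push_back(loc);
        }
    }
    return suspects;
}
```

Run against Valdas's trace, frames #0 through #5 would all be grouped under
the single suspect location ExternalParser.cc:440. It won't catch a lone
mislabeled frame, though, so it's only a hint, not proof.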

It may also be informative to build with --disable-shared, as the
shared libraries may be why the debugging information seems
incomplete, not to mention the possibility that shared library support
is behind yet another strange bug.

-- 
Gilles R. Detillieux              E-mail: <grdetil@scrc.umanitoba.ca>
Spinal Cord Research Centre       WWW:    http://www.scrc.umanitoba.ca/~grdetil
Dept. Physiology, U. of Manitoba  Phone:  (204)789-3766
Winnipeg, MB  R3E 3J7  (Canada)   Fax:    (204)789-3930




This archive was generated by hypermail 2b28 : Mon Feb 28 2000 - 12:32:08 PST