htdig bug?


Webmaster (web@greencedars.com.lb)
Fri, 10 Apr 1998 18:50:42 +0200


Hi Andrew! Greetings from Lebanon.

I would like to thank you for contributing htdig to the world. We have
found it to be a great tool.

I am having a bit of difficulty, but first, the numbers:

   Red Hat Linux 4.2, kernel 2.0.37 (but rebuilt by me for certain devices)
   g++ version 2.7.2.1
   htdig 3.0.7

Well here's the problem: when I send htdig to dig out the following URL,
it dumps core: http://www.ezorder.com.lb/
For some reason it does not like this URL, and it dumps core at the same
spot consistently. Although the makefiles do compile with -g, I can't get at
the symbol table. I would venture to guess that it is attempting to free
a null or uninitialized pointer...
I should mention that I made the binaries with no modification to the
source code, so I would suspect that you should be able to reproduce
the core dump locally...
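One note on my own guess: free(NULL) is defined by ANSI C to be a no-op, so a
crash inside __libc_free more likely points at an uninitialized or
already-freed pointer than at a null one. A minimal sketch of the distinction
(my own illustration, not htdig code):

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* free(NULL) is guaranteed to be a no-op by the C standard, so a crash
   inside __libc_free almost certainly means a garbage (uninitialized or
   already-freed) pointer rather than a NULL one. */
int demo(void) {
    char *p = NULL;
    free(p);                 /* safe: freeing NULL does nothing */

    char *q = malloc(16);
    if (q == NULL) return -1;
    strcpy(q, "ok");
    free(q);                 /* safe: matching malloc/free pair */

    /* By contrast, these are undefined behavior and classic causes of a
       segfault inside the allocator:
         char *r;  free(r);  // freeing an uninitialized pointer
         free(q);            // freeing q a second time (double free)    */
    return 0;
}
```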

[...]
11:11:2:http://www.ezorder.com.lb/joehaddad/joe.htm: ++*** size = 9264
12:12:2:http://www.ezorder.com.lb/linord: redirect
13:13:2:http://www.ezorder.com.lb/laba/: --- size = 4950
14:14:2:http://www.ezorder.com.lb/femmesduliban/: +++-- size = 1667
15:15:2:http://www.ezorder.com.lb/valueplus: redirect
16:16:2:http://www.ezorder.com.lb/eltek: redirect
17:17:2:http://www.ezorder.com.lb/hart: redirect
18:18:2:http://www.ezorder.com.lb/utc/: -------- size = 2870
19:19:2:http://www.ezorder.com.lb/watermaster/: Segmentation fault (core dumped)

[web@liban bin]$ gdb htdig
GDB is free software and you are welcome to distribute copies of it
under certain conditions; type "show copying" to see the conditions.
There is absolutely no warranty for GDB; type "show warranty" for details.
GDB 4.16 (i586-unknown-linux), Copyright 1996 Free Software Foundation, Inc...
(gdb) core-file core
Core was generated by `htdig -c ../conf/lebhost.conf -v -i'.
Program terminated with signal 11, Segmentation fault.
Reading symbols from /usr/lib/libstdc++.so.27.1.4...done.
Reading symbols from /lib/libm.so.5.0.6...done.
Reading symbols from /lib/libc.so.5.3.12...done.
Reading symbols from /lib/ld-linux.so.1...done.
#0 0x4007b459 in __libc_free ()
(gdb) where
#0 0x4007b459 in __libc_free ()
#1 0x80ba87c in ?? ()
#2 0x60 in ?? ()
(gdb)

Any help with this is greatly appreciated! I will be glad to help in any
way possible.

At the bottom of this message is the conf file that I used, in case it
may be helpful.

Regards,
 
   ramzi

######################### begin conf file #####

# Specify where the database files need to go. Make sure that there is
# plenty of free disk space available for the databases. They can get
# pretty big.
#
database_dir: /usr/local/bin/htdig/db/lebhost

#
# This specifies the URL where the robot (htdig) will start. You can specify
# multiple URLs here. Just separate them by some whitespace.
# The example here will cause the ht://Dig homepage and related pages to be
# indexed.
#
#start_url: `/home/web/lib/htdig/lebhost/start_url`
start_url: http://www.ezorder.com.lb

# This attribute limits the scope of the indexing process. The default is to
# set it to the same as the start_url above. This way only pages that are on
# the sites specified in the start_url attribute will be indexed and it will
# reject any URLs that go outside of those sites.
#
# Keep in mind that the value for this attribute is just a list of string
# patterns. As long as a URL contains at least one of the patterns, it will
# be considered part of the scope of the index.
#
limit_urls_to: ${start_url}
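#
# For example, several patterns can be listed, separated by whitespace (the
# second site below is a hypothetical illustration, not part of this setup):
# limit_urls_to: ${start_url} http://www.example.com.lb/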

#
# If there are particular pages that you definitely do NOT want to index, you
# can use the exclude_urls attribute. The value is a list of string patterns.
# If a URL matches any of the patterns, it will NOT be indexed. This is
# useful to exclude things like virtual web trees or database accesses. By
# default, all CGI URLs will be excluded. (Note that the /cgi-bin/ convention
# may not work on your web server. Check the path prefix used on your web
# server.)
#
exclude_urls: /cgi-bin/
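#
# For example, several patterns can be listed, and any URL containing one of
# them is skipped (the extra paths below are hypothetical illustrations):
# exclude_urls: /cgi-bin/ /scripts/ .cgi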

#
# The excerpts that are displayed in long results rely on stored information
# in the index databases. The compiled default only stores 512 characters of
# text from each document (this excludes any HTML markup...) If you plan on
# using the excerpts you probably want to make this larger. The only concern
# here is that more disk space is going to be needed to store the additional
# information. Since disk space is cheap (! :-)) you might want to set this
# to a value so that a large percentage of the documents that you are going
# to be indexing are stored completely in the database. At SDSU we found
# that by setting this value to about 50k the index stored 97% of all
# documents completely, and only 3% were cut off at 50k. You probably want to
# experiment with this value.
# Note that if you want to set this value low, you probably want to set the
# excerpt_show_top attribute to false so that the top excerpt_length characters
# of the document are always shown.
#
max_head_length: 512
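#
# For example, to follow the SDSU figure mentioned above (illustrative only;
# tune this against your own document sizes and disk space):
# max_head_length: 50000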

#
# Depending on your needs, you might want to enable some of the fuzzy search
# algorithms. There are several to choose from and you can use them in any
# combination you feel comfortable with. Each algorithm will get a weight
# assigned to it so that in combinations of algorithms, certain algorithms get
# preference over others. Note that the weights only affect the ranking of
# the results, not the actual searching.
# The available algorithms are:
# exact
# endings
# synonyms
# soundex
# metaphone
# By default only the "exact" algorithm is used with weight 1.
# Note that if you are going to use any of the algorithms other than "exact",
# you need to use the htfuzzy program to generate the databases that each
# algorithm requires.
#
# search_algorithm: exact:1 synonyms:0.5 endings:0.1
search_algorithm: exact:1 synonyms:0.5 endings:0.3 metaphone:0.1
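#
# Reminder: with this setting, htfuzzy must be run for each non-exact
# algorithm before searching, e.g. (invocation per the htdig docs; exact
# flags may vary by version):
#   htfuzzy -c lebhost.conf endings synonyms metaphone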

################ the rest of the conf file is the same as in the distribution ####



This archive was generated by hypermail 2.0b3 on Sat Jan 02 1999 - 16:26:01 PST