[htdig] HTDig can't dig www.dbmsmag.com

Subject: [htdig] HTDig can't dig www.dbmsmag.com
From: Manuel Lemos (mlemos@acm.org)
Date: Sat Jun 17 2000 - 20:24:29 PDT


I am trying to build a local searchable index of the site www.dbmsmag.com,
but htdig stops right after the first page, even though I am able to mirror
the whole site with wget. I am using the following configuration with
htdig 3.1.14. Is there anything wrong or missing in it, or could this be a
bug in htdig?

start_url: http://www.dbmsmag.com/index.shtml
limit_urls: http://www.dbmsmag.com/
search_algorithm: exact:1 endings:0.5
exclude_urls: ?
valid_extensions: .html .htm .shtml
noindex_start: <html>
noindex_end: </html>
database_dir: /extra/dbms/db
bad_extensions: .wav .gz .z .sit .au .zip .tar .hqx .exe .com .gif .jpg .jpeg .aiff .class .map .ram .tgz .bin .rpm .mpg .mov .avi
max_head_length: 10000
max_doc_size: 200000
no_excerpt_show_top: true
valid_punctuation: : .-_/!#$%^&*+;
template_map: htdig htdig /extra/dbms/htdig_template.html
search_results_header: /extra/dbms/htdig_header.html
nothing_found_file: /extra/dbms/htdig_nomatch.html
syntax_error_file: /extra/dbms/htdig_syntaxerror.html
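
To double-check which values are actually set in a configuration like the one above, the attributes can be loaded into a dictionary and inspected. This is only a minimal Python sketch, not part of htdig: the helper name parse_htdig_conf is hypothetical, and it assumes nothing beyond the "name: value" line format shown above plus "#" comment lines.

```python
def parse_htdig_conf(text):
    """Parse 'name: value' lines into a dict (hypothetical helper, not part of htdig)."""
    conf = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and comment lines.
        if not line or line.startswith("#"):
            continue
        # Split on the first colon only, so URLs in the value stay intact.
        name, sep, value = line.partition(":")
        if not sep:
            continue  # not an attribute line
        conf[name.strip()] = value.strip()
    return conf


# A few attributes copied from the configuration above.
sample = """\
start_url: http://www.dbmsmag.com/index.shtml
limit_urls: http://www.dbmsmag.com/
exclude_urls: ?
valid_extensions: .html .htm .shtml
noindex_start: <html>
noindex_end: </html>
"""

conf = parse_htdig_conf(sample)
for name in ("start_url", "limit_urls", "exclude_urls",
             "noindex_start", "noindex_end"):
    print(f"{name} = {conf[name]!r}")
```

Printing the attributes that control which URLs are followed and which parts of a page are skipped makes it easier to compare them against the htdig attribute documentation one by one.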

My dig, merge and fuzzy script outputs this:

2000-06-18 01:11:34 Starting htdig... (/usr/local/htdig/bin/htdig -v 3 -s -a -c /extra/dbms/conf/htdig.conf)
New server: www.dbmsmag.com, 80
0:0:0:http://www.dbmsmag.com/index.shtml: -------------------- size = 5040
htdig: Run complete
htdig: 1 server seen:
htdig: www.dbmsmag.com:80 1 document
2000-06-18 01:11:52 htdig done...
2000-06-18 01:11:52 Starting htmerge... (/usr/local/htdig/bin/htmerge -v -s -a -c /extra/dbms/conf/htdig.conf)
htmerge: Sorting...
htmerge: Merging...
htmerge: 100:orcommentsab
htmerge: Total word count: 149
htmerge: Total documents: 1
htmerge: Total doc db size (in K): 4
2000-06-18 01:11:52 htmerge done...
2000-06-18 01:11:52 Starting htfuzzy... (/usr/local/htdig/bin/htfuzzy -c /extra/dbms/conf/htdig.conf endings)
2000-06-18 01:12:12 htfuzzy done...
2000-06-18 01:12:12 Updating htdig database files
2000-06-18 01:12:12 Updated htdig database files
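
A run like the one above can be flagged automatically by pulling the per-server document count out of the htdig summary. The following is a rough Python sketch under the assumption that the summary lines always look like the "htdig: host:port N document" line in this log; the function name documents_per_server is hypothetical and not part of htdig.

```python
import re

# Matches summary lines such as "htdig: www.dbmsmag.com:80 1 document".
SERVER_LINE = re.compile(r"^htdig:\s+(\S+):(\d+)\s+(\d+)\s+documents?\s*$")


def documents_per_server(log_text):
    """Return {(host, port): document_count} from htdig summary lines."""
    counts = {}
    for line in log_text.splitlines():
        match = SERVER_LINE.match(line.strip())
        if match:
            host = match.group(1)
            port = int(match.group(2))
            counts[(host, port)] = int(match.group(3))
    return counts


# Summary lines copied from the output above.
sample_log = """\
htdig: Run complete
htdig: 1 server seen:
htdig: www.dbmsmag.com:80 1 document
"""

for (host, port), count in documents_per_server(sample_log).items():
    print(f"{host}:{port} -> {count} document(s)")
```

A count of 1 for a site that wget can mirror completely is a quick sign that the crawl stopped at the start page.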

Manuel Lemos

PS: I did a traceroute to the www.htdig.org site and it seems that the
route was looping between contigo-gw.sndgca.pacific.verio.net and
a1-5-0-0-49.a02.sndgca02.us.ra.verio.net, so I suspect that this message
will bounce unless the responsible carrier fixes its routers. If you are
seeing this message, never mind! :-)

Web Programming Components using PHP Classes.
Look at: http://phpclasses.UpperDesign.com/?user=mlemos@acm.org

E-mail: mlemos@acm.org
URL: http://www.mlemos.e-na.net/
PGP key: http://www.mlemos.e-na.net/ManuelLemos.pgp


This archive was generated by hypermail 2b28 : Sun Jun 18 2000 - 13:02:32 PDT