[htdig] htdig doesn't index the whole site


Kai Krebber (Kai.Krebber@syseca.de)
Thu, 30 Sep 1999 12:47:58 +0200


Howdy!

After successfully installing htdig and indexing a single intranet
server (not the doc server), I'm running into the following problem:
Symptom: htsearch doesn't find words that are definitely in the pages
and definitely not in the bad-word list.
Follow-up symptom: the files in the db directory are surprisingly
small.
What I found out so far:
Running htdig in verbose mode shows a good start, but after a while it
stops pushing resolved URLs, although those pages have not been indexed
yet:

1:0:http://intra1/
New server: intra1, 80
Retrieval command for http://intra1/robots.txt: GET /robots.txt HTTP/1.0

User-Agent: htdig/3.1.2 (unconfigured@htdig.searchengine.maintainer)
Host: intra1

Header line: HTTP/1.1 404 Not found - file doesn't exist or is read
protected [even tried multi]
Header line: Server: Lotus-Domino/Release
:
returnStatus = 1
 pushed
pick: intra1, # servers = 1
0:0:0:http://intra1/: Retrieval command for http://intra1/: GET /
HTTP/1.0
:
href: http://intra1/header_homepage.html ()
resolving 'http://intra1/header_homepage.html'

   pushing http://intra1/header_homepage.html
+Tag: FRAMESET BORDER=0 FRAMESPACING=0 FRAMEBORDER=0 COLS="136,*">,
matched -1
Tag: FRAME NAME="left" SRC="./left_homepage.html" SCROLLING="auto"
MARGINWIDTH="2" MARGINHEIGHT="1" FRAMEBORDER="no" BORDER="0" NORESIZE>,
matched 21
href: http://intra1/left_homepage.html ()
resolving 'http://intra1/left_homepage.html'

   pushing http://intra1/left_homepage.html
:
href: http://intra1/cf/header_index.html ()
resolving 'http://intra1/cf/header_index.html'
Tag: FRAMESET BORDER=0 FRAMESPACING=0 FRAMEBORDER=0 COLS="136,*">,
matched -1
Tag: FRAME NAME="left" SRC="./left_index.html" SCROLLING="auto"
MARGINWIDTH="2" MARGINHEIGHT="1" FRAMEBORDER="no" BORDER="0" NORESIZE>,
matched 21
href: http://intra1/cf/left_index.html ()
resolving 'http://intra1/cf/left_index.html'
Tag: FRAME NAME="body" SRC="./body_index.html" SCROLLING="auto"
MARGINWIDTH="2" MARGINHEIGHT="1">, matched 21
href: http://intra1/cf/body_index.html ()
resolving 'http://intra1/cf/body_index.html'
Tag: /FRAMESET>, matched -1

No wonder I can't find the words I searched for: the corresponding
pages don't get indexed. Why?
Does anybody know of a place on the net with information about reading
the verbose output and debugging problems with it?
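
One way to narrow this down is to list every URL that shows up in a
"resolving" line of the verbose output but never in a "pushing" line.
The following is only a minimal sketch in Python, not part of htdig. It
assumes the verbose output was saved to a file and that each URL sits on
a single line, as in the excerpt above; the function name unpushed_urls
and the command-line usage are made up for illustration.

```python
import re
import sys

# Line shapes taken from the verbose excerpt above:
#   resolving 'http://intra1/left_homepage.html'
#      pushing http://intra1/left_homepage.html
RESOLVING = re.compile(r"^\s*resolving '([^']+)'")
PUSHING = re.compile(r"^\s*pushing (\S+)")


def unpushed_urls(lines):
    """Return resolved URLs that never appear in a 'pushing' line.

    URLs are returned in the order they were first resolved.
    """
    resolved = []   # first-seen order of resolved URLs
    pushed = set()  # URLs htdig reported as pushed
    for line in lines:
        match = RESOLVING.match(line)
        if match:
            url = match.group(1)
            if url not in resolved:
                resolved.append(url)
            continue
        match = PUSHING.match(line)
        if match:
            pushed.add(match.group(1))
    return [url for url in resolved if url not in pushed]


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical usage: python unpushed.py saved-verbose-output.txt
    with open(sys.argv[1], errors="replace") as handle:
        for url in unpushed_urls(handle):
            print(url)
```

This only reports which URLs were resolved and then dropped; it does not
say why htdig skipped them. URLs that were already queued earlier in the
run may also appear in the list, so the result is a starting point for
checking the configuration rather than a diagnosis.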
Thanks in advance,
    Kai



