htdig: Can't index web server


James Slater (james@zignzag.demon.co.uk)
Fri, 10 Jul 1998 11:05:56 +0000


Hi all,

Right, still banging my head against the wall with this little problemo.
I've tried setting the start_url to http://<web server>/index.htm and
commenting out the limit_urls_to flag & still no joy.
So, the only differences I can spot between the two log files (first is
the one that doesn't work, second log does) are:

1) Server: Apache/1.2.6 mod_perl/1.10

I can't see why this could cause a problem, it's only an additional
module for Apache.

2) The server that currently can't be indexed is missing lines similar
to:

Header line: HTTP/1.1 200 OK
Header line: Date: Fri, 10 Jul 1998 08:47:11 GMT
Header line: Server: Apache/1.2.6

etc, from where it tries to retrieve index.htm. So, this seems to be
where the problem is occurring, but I can't squeeze out any more
debugging info from htdig. I'll shout a lager lager lager to anyone that
can point me in the right direction ;) I know there's nothing wrong
with the index.htm as I brought it down to my server & htdig indexed it
just fine.
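
One way to check this outside htdig is to send a plain HTTP/1.0 request
by hand and look at what comes back before the first blank line. The
following is only a minimal Python sketch, not anything htdig itself
does: the names split_response and fetch_raw are made up for
illustration, and it assumes the server answers on port 80.

```python
import socket


def split_response(raw: bytes):
    """Split a raw HTTP response into (status_line, header_lines, body).

    status_line is None when the data does not begin with "HTTP/",
    i.e. the server sent the document with no header block at all.
    """
    if not raw.startswith(b"HTTP/"):
        return None, [], raw
    # Headers end at the first blank line; accept CRLF or bare LF.
    for sep in (b"\r\n\r\n", b"\n\n"):
        if sep in raw:
            head, _, body = raw.partition(sep)
            break
    else:
        head, body = raw, b""
    lines = head.decode("latin-1").splitlines()
    return lines[0], lines[1:], body


def fetch_raw(host: str, path: str = "/", port: int = 80,
              timeout: float = 10.0) -> bytes:
    """Send a bare HTTP/1.0 GET and return everything the server sends."""
    request = f"GET {path} HTTP/1.0\r\nHost: {host}\r\n\r\n".encode("ascii")
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(request)
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:  # server closed the connection
                break
            chunks.append(data)
    return b"".join(chunks)


# Usage (hypothetical host name):
#   status, headers, body = split_response(fetch_raw("www.example.com"))
#   print(status if status else "no status line, response starts with body")
```

If status comes back as None for a hand-built request too, the missing
header lines are coming from the server side rather than from htdig.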

Thanks for your time,

James.

# ./htdig -vvvvis

New server: <web server>, 80
Retrieval command for http://<web server>/robots.txt: GET /robots.txt
HTTP/0
User-Agent: htdig/3.0.8b2 (andrew@contigo.com)
Host: <web server>
Header line: HTTP/1.1 404 File Not Found
Header line: Date: Fri, 10 Jul 1998 09:26:01 GMT
Header line: Server: Apache/1.2.6 mod_perl/1.10
Header line: Connection: close
Header line: Content-Type: text/html
Header line:
returnStatus = 1
pick: <web server>:80, # servers = 1
0:0:0:http://<web server>/: Retrieval command for http://0
User-Agent: htdig/3.0.8b2 (andrew@contigo.com)
Host: <web server>

Header line: <html>
Header line: <head><title>Home Page</title></head>
Header line: <frameset rows="65,*" frameborder=yes border=1
framespacing=1>
Header line: <frame src="header.htm" name="headerframe" marginwidth=0
marginh>
Header line: <frame src="tech_support/index.html" name="bodyframe"
marginwidt>
Header line: </frameset>
Header line: </html>
Header line: <p><table border=1 cellspacing=0 cellpadding=3
width=100%><tr><td>>
Header line:
returnStatus = 1
 not found
pick: <web server>:80, # servers = 1
htdig: Run complete
htdig: 1 server seen:
htdig: <web server>:80 1 document

htdig: Errors to take note of:
Not found: http://<web server>/ Ref:

-----

And the server that works....

New server: <web server>, 80
Retrieval command for http://<web server>/robots.txt: GET /robots.txt HTTP/1.0
User-Agent: htdig/3.0.8b2 (andrew@contigo.com)
Host: <web server>

Header line: HTTP/1.1 404 File Not Found
Header line: Date: Fri, 10 Jul 1998 08:47:11 GMT
Header line: Server: Apache/1.2.6
Header line: Connection: close
Header line: Content-Type: text/html
Header line:
returnStatus = 1
pick: <web server>:80, # servers = 1
0:0:0:http://<web server>/: Retrieval command for http://<web server>/:
GET / HTTP/1.0
User-Agent: htdig/3.0.8b2 (andrew@contigo.com)
Host: <web server>

Header line: HTTP/1.1 200 OK
Header line: Date: Fri, 10 Jul 1998 08:47:11 GMT
Header line: Server: Apache/1.2.6
Header line: Last-Modified: Fri, 10 Jul 1998 08:46:50 GMT
Translated Fri, 10 Jul 1998 08:46:50 GMT to Fri, 10 Jul 1998 08:46:50
(98)
And converted to Fri, 10 Jul 1998 08:46:50
Header line: ETag: "34eec-bb-35a5d4fa"
Header line: Content-Length: 187
Header line: Accept-Ranges: bytes
Header line: Connection: close
Header line: Content-Type: text/html
Header line:
returnStatus = 0
Read 187 from document
Read a total of 187 bytes
Tag: HTML>, matched -1
Tag: HEAD>, matched -1
Tag: TITLE>, matched 0
<...snip...>
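
The comparison between the two logs can also be done mechanically. The
sketch below is a hypothetical helper (retrievals_without_status is an
invented name, not part of htdig). It assumes the verbose output format
seen in the logs above, i.e. "Retrieval command for" lines followed by
"Header line:" lines, and lists the retrievals whose first header line
is not an HTTP status line.

```python
def retrievals_without_status(log_text: str):
    """Return the 'Retrieval command for' lines whose first following
    'Header line:' entry does not start with an HTTP status line."""
    suspects = []
    current = None        # retrieval line currently being examined
    seen_header = False   # True once its first header line was checked
    for line in log_text.splitlines():
        if "Retrieval command for " in line:
            current, seen_header = line, False
        elif (line.startswith("Header line:")
              and current is not None and not seen_header):
            seen_header = True
            value = line[len("Header line:"):].strip()
            if not value.startswith("HTTP/"):
                suspects.append(current)
    return suspects


# Usage (hypothetical file name holding saved htdig -v output):
#   with open("htdig.log") as fh:
#       for entry in retrievals_without_status(fh.read()):
#           print("no status line after:", entry)
```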