Re: [htdig] Search results


Gilles Detillieux (grdetil@scrc.umanitoba.ca)
Mon, 16 Aug 1999 14:29:00 -0500 (CDT)


According to peter karlsson:
> > Well, Peter, I don't know what to say. Neither Geoff nor I can
> > reproduce the error from here, so the problem must lie on your system.
>
> I just noticed something, because a web browser I tried (w3m) didn't show
> the pages correctly either: somehow the Squid proxy seems to be removing
> the Content-Type header from some of the pages on the server:
[snip]
> This is strange, though, since the previous indexing was *not* done through
> a proxy. It might be a problem with the web server, though
> (phttpd/0.99.72.1). But when I try to connect directly, I do get a
> Content-Type header:

There are two strange things about this. First of all, as you point out,
the problem started before you started indexing through Squid. Secondly,
if htdig doesn't receive a Content-Type header, it shouldn't even attempt
to index the document at all.
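As a rough illustration of that rule, here is a minimal Python sketch (not htdig's own code, which is C++) that parses a raw HTTP response head and treats the document as indexable only when the status is 200 and a Content-Type header is present. The names `parse_response_head` and `should_index` are hypothetical, and accepting only status 200 is a simplifying assumption. Header names are compared case-insensitively, because HTTP header names are case-insensitive and a proxy may change their case.

```python
from typing import Dict, Tuple


def parse_response_head(raw: bytes) -> Tuple[int, Dict[str, str]]:
    """Split a raw HTTP response into (status code, headers).

    Header names are lowercased because HTTP treats them case-insensitively.
    """
    head = raw.decode("iso-8859-1").split("\r\n\r\n", 1)[0]
    lines = head.split("\r\n")
    status = int(lines[0].split(" ", 2)[1])  # "HTTP/1.0 200 OK" -> 200
    headers: Dict[str, str] = {}
    for line in lines[1:]:
        if ":" not in line:
            continue  # skip malformed header lines
        name, value = line.split(":", 1)
        headers[name.strip().lower()] = value.strip()
    return status, headers


def should_index(raw: bytes) -> bool:
    """Hypothetical rule: index only a 200 response that carries a Content-Type."""
    status, headers = parse_response_head(raw)
    return status == 200 and "content-type" in headers
```

Given a response like the one seen through Squid, with no Content-Type line, `should_index` returns False, so under this sketch the document would be skipped.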

> What headers are htdig sending to the server? It might be one of those that
> interfere with what headers phttpd sends back.

htdig sends these headers, in this order:

GET url-path HTTP/1.0
User-Agent: htdig/3.1.2 (maintainer)
Referer: url <- if referring document is known
If-Modified-Since: date <- if document previously indexed
Authorization: Basic base64(username:password) <- if given with -u option
Host: url-host <- unless allow_virtual_hosts disabled
<blank-line>
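The header list above can be sketched in Python to show the exact layout on the wire. This illustrates the request format only and is not htdig's implementation: `build_request` and its parameters are hypothetical names, `maintainer` stands in for whatever maintainer string htdig is configured with, and the Authorization value uses the standard HTTP Basic encoding (Base64 of `username:password`).

```python
import base64
from typing import Optional


def build_request(url_path: str, host: str, maintainer: str,
                  referer: Optional[str] = None,
                  if_modified_since: Optional[str] = None,
                  credentials: Optional[str] = None,
                  send_host: bool = True,
                  version: str = "3.1.2") -> bytes:
    """Assemble the headers in the order listed above (hypothetical helper)."""
    lines = [f"GET {url_path} HTTP/1.0",
             f"User-Agent: htdig/{version} ({maintainer})"]
    if referer is not None:            # only if the referring document is known
        lines.append(f"Referer: {referer}")
    if if_modified_since is not None:  # only if the document was indexed before
        lines.append(f"If-Modified-Since: {if_modified_since}")
    if credentials is not None:        # "username:password", as given with -u
        token = base64.b64encode(credentials.encode("latin-1")).decode("ascii")
        lines.append(f"Authorization: Basic {token}")
    if send_host:                      # omitted when virtual hosts are disabled
        lines.append(f"Host: {host}")
    # Every line ends with CR/LF; the final CR/LF is the blank line.
    return ("\r\n".join(lines) + "\r\n\r\n").encode("latin-1")
```

For example, `build_request("/index.html", "www.example.com", "webmaster@example.com")` yields just the request line, User-Agent, and Host, followed by the blank line.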

Each line ends with a CR/LF. When you run htdig -vvv, it shows the entire
retrieval command used, with all headers. They're all sent in a single
write operation.
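To compare what the server sends back directly with what arrives through Squid, one could send a small request by hand, with the whole header block passed to one `sendall()` call, and print only the response headers. This is a hypothetical diagnostic sketch, not part of htdig; `build_probe`, `read_head`, and `fetch_head` are illustrative names. When going through a proxy, the request line carries the absolute URL, as HTTP/1.0 proxies expect.

```python
import socket
from typing import Optional, Tuple


def build_probe(host: str, path: str, port: int = 80, via_proxy: bool = False) -> bytes:
    """Minimal HTTP/1.0 request; a proxy needs the absolute URL in the request line."""
    host_hdr = host if port == 80 else f"{host}:{port}"
    target = f"http://{host_hdr}{path}" if via_proxy else path
    return (f"GET {target} HTTP/1.0\r\n"
            f"Host: {host_hdr}\r\n"
            "\r\n").encode("latin-1")


def read_head(sock: socket.socket, limit: int = 65536) -> bytes:
    """Read until the blank line ending the headers (or EOF); return the headers."""
    buf = b""
    while b"\r\n\r\n" not in buf:
        chunk = sock.recv(4096)
        if not chunk:
            break  # connection closed before the blank line
        buf += chunk
        if len(buf) > limit:
            raise ValueError("response head too large")
    return buf.split(b"\r\n\r\n", 1)[0]


def fetch_head(host: str, path: str, port: int = 80,
               proxy: Optional[Tuple[str, int]] = None,
               timeout: float = 10.0) -> str:
    """Fetch and return the response headers, directly or via (proxy_host, proxy_port)."""
    request = build_probe(host, path, port, via_proxy=proxy is not None)
    address = proxy if proxy is not None else (host, port)
    with socket.create_connection(address, timeout=timeout) as sock:
        sock.sendall(request)  # the whole header block handed over in one call
        return read_head(sock).decode("iso-8859-1")
```

Running `print(fetch_head("www.example.com", "/"))` and then the same call with `proxy=("squid.example.com", 3128)` (placeholder host names; 3128 is Squid's default port) would show whether the Content-Type line survives the proxy.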

-- 
Gilles R. Detillieux              E-mail: <grdetil@scrc.umanitoba.ca>
Spinal Cord Research Centre       WWW:    http://www.scrc.umanitoba.ca/~grdetil
Dept. Physiology, U. of Manitoba  Phone:  (204)789-3766
Winnipeg, MB  R3E 3J7  (Canada)   Fax:    (204)789-3930




This archive was generated by hypermail 2.0b3 on Mon Aug 16 1999 - 12:30:11 PDT