Re: [htdig] A few beginner questions

Gilles Detillieux
Mon, 14 Jun 1999 09:52:35 -0500 (CDT)

Hi, just adding a few remarks to what Geoff already said...

According to Mitchell Marks:
> 1. Will the htdig program look on other ports? I have material on :9673
> (which some of you will recognize as Zope), and in htdig.conf I include a
> URL to it:
> start_url:
> I've also tried using the (unusual) default document name:
> start_url:
> Either way, nothing from that port shows up in the indexes, and when I save
> a URL list the URL with alternate port is mentioned only twice, which I
> think is from its mention in links on the standard-port root page.
> If it makes any difference, there aren't pre-made separate files there,
> but they're generated to look like HTML files.

Does the server at port 9673 emit "Content-type: text/html" headers? These
are necessary for htdig to accept and parse the pages as HTML.
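
One way to check this from outside htdig is to look at the header the
server actually sends. The following is only a rough Python sketch, not
anything htdig itself runs: the function names are made up for
illustration, and the exact matching rule htdig applies to the header is
an assumption on my part.

```python
import urllib.request


def is_html_content_type(header_value: str) -> bool:
    """Return True if a Content-Type header value names text/html.

    Parameters such as "; charset=..." are ignored and the comparison
    is case-insensitive. (Assumed rule, for illustration only.)
    """
    media_type = header_value.split(";", 1)[0].strip().lower()
    return media_type == "text/html"


def fetch_content_type(url: str) -> str:
    """Fetch a URL and return the raw Content-Type header ("" if absent)."""
    with urllib.request.urlopen(url) as response:
        return response.headers.get("Content-Type", "")
```

Calling fetch_content_type() on a page from the alternate port and
passing the result to is_html_content_type() would show whether the
generated pages are labelled as HTML.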

> 2. Part of our site is intentionally devoid of directory-default documents,
> and material under them is not being caught. Does htdig strictly only
> follow links found *in documents* by starting at the specified
> start_url? Or is there an option for it to accept the server's
> file-listing of a directory as documents it should also grab and continue
> from.

HTTP servers generate directory listings as HTML documents, so htdig doesn't
even know the difference. No options are needed on htdig's side - as far
as it's concerned, it's just receiving and parsing another HTML page.
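
To illustrate the point, here is a small Python sketch (not htdig's own
parser) that pulls links out of a server-generated listing exactly as it
would from any other page. The sample listing, the class name and the
example.com base URL are all invented for the illustration.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collect absolute URLs from the href attributes of <a> tags."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names for us.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(urljoin(self.base_url, value))


def extract_links(page, base_url):
    """Return every link found in an HTML page, resolved against base_url."""
    collector = LinkCollector(base_url)
    collector.feed(page)
    return collector.links


# A made-up directory listing, similar in shape to what a server might emit.
LISTING = """<html><body><h1>Index of /docs</h1>
<a href="../">Parent Directory</a>
<a href="notes.html">notes.html</a>
<a href="sub/">sub/</a>
</body></html>"""

links = extract_links(LISTING, "http://www.example.com/docs/")
```

Nothing in the sketch needs to know that the page was a directory
listing; the entries are ordinary links that a crawler can follow.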

> 5. (More on local_urls) I've got this in htdig.conf:
> local_urls:
> First off, does that look syntactically correct?

Yes. As long as /pub on your local filesystem is the DocumentRoot for your
HTTP server, and the files have ".html" or ".htm" suffixes, htdig should
access them locally. If it fails to find the files, or they have other
suffixes, it will fall back to the HTTP server. For directories that
don't have a default index document (e.g. index.html), htdig will also
go to the server for that index, but will follow the links therein just
as any other page, using the local filesystem if it can.
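
The lookup described above can be sketched roughly as follows. This is a
simplified Python illustration of the behaviour, not htdig's actual
code: the function name, the example.com URL prefix and the is_file
parameter are hypothetical, and /pub is used only because it is the
directory mentioned above.

```python
import os

# Suffixes that are read straight from the filesystem (per the text above).
LOCAL_SUFFIXES = (".html", ".htm")


def resolve_locally(url, url_prefix, local_dir, is_file=os.path.isfile):
    """Return a local path for url, or None to fall back to the HTTP server.

    url_prefix and local_dir stand for the two halves of a URL-to-directory
    mapping; is_file is injectable so the logic can be tried out without
    touching a real filesystem.
    """
    if not url.startswith(url_prefix):
        return None  # URL is not covered by the mapping
    relative = url[len(url_prefix):].lstrip("/")
    if not relative.endswith(LOCAL_SUFFIXES):
        return None  # other suffixes, and directory URLs, go to the server
    path = os.path.join(local_dir, relative)
    if not is_file(path):
        return None  # file not found locally
    return path
```

A directory URL such as http://www.example.com/docs/ has no ".html" or
".htm" suffix, so the sketch returns None for it, which corresponds to
fetching the server's index over HTTP and then resolving the links found
there one by one.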

