Re: [htdig] A few beginner questions


Geoff Hutchison (ghutchis@wso.williams.edu)
Mon, 14 Jun 1999 08:33:14 -0400 (EDT)


On Sun, 13 Jun 1999, Mitchell Marks wrote:

> 1. Will the htdig program look on other ports? I have material on :9673

Yes.
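
As a rough sketch (the host name www.example.com is made up, and this
assumes the stock behavior where limit_urls_to defaults to the value of
start_url), listing the alternate port explicitly might look like this.
If the port URL does not match a limit_urls_to pattern, htdig will see
the link but will probably reject it rather than follow it:

```
# htdig.conf fragment; www.example.com is a placeholder host
start_url:      http://www.example.com/ http://www.example.com:9673/
# Assumed default shown explicitly: only URLs matching these are followed
limit_urls_to:  ${start_url}
```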

> Either way, nothing from that port shows up in the indexes, and when I save
> a URL list the URL with alternate port is mentioned only twice, which I
> think is from its mention in links on the standard-port root page.

Try running htdig with a command-line option of -vvvv and you'll see more
than you care to see. It *will* show you the HTTP headers and gory detail
about why it's doing what it's doing.

> 2. Part of our site is intentionally devoid of directory-default documents,
> and material under them is not being caught. Does htdig strictly only
> follow links found *in documents* by starting at the specified

Yes. No more, no less.
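
One possible workaround, sketched here with made-up URLs, is to give
htdig extra entry points: start_url accepts a whitespace-separated list,
so documents that nothing links to can be named directly. The paths
below are purely illustrative:

```
# htdig.conf fragment; both URLs are hypothetical
# The second entry names a document inside a directory with no default page
start_url:  http://www.example.com/ http://www.example.com/reports/summary.html
```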

> Does this mean that if I change the start_url entry in htdig.conf, the
> change will not affect subsequent runs unless I erase the existing database
> files?

No. htdig will index the new start_url. But as many people have noted on
the list, if you run update digs, htdig will also reindex the URLs
already in the databases.

> 4. Another htdig.conf point I don't quite get: does use of a local_urls
> entry only define the filesystem equivalents, or does it also tell htdig to
> dig there? That is, if the LHS is something that *would* go in start_url,

It only defines the filesystem equivalence rules for HTTP URLs. It does
*not* add anything to start_url, so it does not tell htdig to dig there.
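
A minimal sketch of the two attributes side by side, assuming a
placeholder host and a document root of /usr/www/htdocs/ (both made up
for illustration):

```
# htdig.conf fragment; host and filesystem path are hypothetical
# Where to start digging
start_url:   http://www.example.com/
# How to map matching URLs onto local files (URL prefix = path prefix)
local_urls:  http://www.example.com/=/usr/www/htdocs/
```

With only the local_urls line, nothing would be dug; it just changes how
URLs reached from start_url are fetched.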

> Any suggestions on how to help htdig find these through the filesystem and
> not have to switch to http so readily?

It switches to HTTP any time it cannot find the file through the
local_urls rules. This often includes default directory files (see
local_default_doc).
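
For example, assuming your server's directory default is index.html
(adjust to whatever your server actually uses), something along these
lines should let directory URLs resolve through the filesystem too:

```
# htdig.conf fragment; the file name is an assumption about your server
# File to try locally when a URL ends in a directory
local_default_doc:  index.html
```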

-Geoff Hutchison
Williams Students Online
http://wso.williams.edu/




This archive was generated by hypermail 2.0b3 on Mon Jun 14 1999 - 04:48:11 PDT