Re: [htdig] accessing sites whose entry pages are not index.html


Gilles Detillieux (grdetil@scrc.umanitoba.ca)
Thu, 29 Apr 1999 16:17:48 -0500 (CDT)


According to Gabriel Fenteany:
> Hi. I am indexing a large number of different servers. Some of the URLs
> point to http://foo.com/ but apparently the index file is not index.html but
> index.htm or default.html. Will htdig dig a site right if http://foo.com/
> uses "index.htm" and not "index.html"? If it would NOT dig the site with
> the more standard index filename, what is the switch I'd use in
> htdig.conf? Point is, I don't want to have to check what the name of the
> entry page is for all these kinds of sites.
>
> I indexed a big list of sites, and most come up...but so far of the ones
> I've checked, only the ones that deviate from "index.html" are not showing
> up when the URL I have for them is http://foo.com/

As Torsten said, a lot of this depends on how the HTTP servers are set up.
In Apache, you can set the DirectoryIndex directive in srm.conf to list
the file names that are valid as directory indexes. E.g.:

DirectoryIndex index.html index.shtml index.cgi index.htm default.htm default.html

If you allow all these, then you may want to make the corresponding
change in your htdig.conf, to the remove_default_doc attribute:

remove_default_doc: index.html index.shtml index.cgi index.htm default.htm default.html

But if you're digging multiple sites, you only want to remove the names
that are allowed as DirectoryIndex on ALL of the sites you dig, i.e. the
intersection of all the sets. Otherwise you may end up stripping off
names that aren't really directory indexes on some of the sites.
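
For example (the two server setups here are made up for illustration),
suppose one site you dig is configured with

DirectoryIndex index.html index.htm default.htm

and another with

DirectoryIndex index.html index.htm index.shtml

Only index.html and index.htm are directory indexes on both, so the
safe setting in htdig.conf would be:

remove_default_doc: index.html index.htm

If default.htm were in that list too, a URL ending in /default.htm on the
second site would be stripped back to the bare directory URL, which on
that server would most likely return a different page.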

Finally, if you use the local_urls attribute, you should set the
local_default_doc attribute to the one name that is most commonly used for
directory indexes on the local file system. For any local directories
that don't have this file, htdig will fall back to the HTTP server to
get the directory index.
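
As a rough sketch, with a made-up server name and document root (adjust
both to your own setup), those two attributes might look like this in
htdig.conf:

local_urls: http://www.foo.com/=/usr/www/htdocs/
local_default_doc: index.html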

-- 
Gilles R. Detillieux              E-mail: <grdetil@scrc.umanitoba.ca>
Spinal Cord Research Centre       WWW:    http://www.scrc.umanitoba.ca/~grdetil
Dept. Physiology, U. of Manitoba  Phone:  (204)789-3766
Winnipeg, MB  R3E 3J7  (Canada)   Fax:    (204)789-3930