Re: [htdig] htdig - returning directory listings


Subject: Re: [htdig] htdig - returning directory listings
From: Gilles Detillieux (grdetil@scrc.umanitoba.ca)
Date: Thu Jul 20 2000 - 11:59:42 PDT


According to alan:
> whenever i search for some word that exists in a filename htsearch
> returns the proper results but also returns urls to just directories
> that contain that file... this allows users to view the directory but i
> dont want them to... i know u can disable directory listing on the
> server hosting the site but then htdig isnt able to index the specified
> directory (indicated in the htdig.conf) anymore since it is not allowed
> to view the specified directory... so is there a way to prevent htdig
> from indexing directory listings or preventing htsearch from displaying
> directory listings??

No, not that I know of. This has come up several times before, but
there's no easy fix. I think the solution that some users go with is
to generate a list of all the URLs they want to index, and feed this
to htdig's start_url attribute, and use a max_hop_count of 0 to prevent
indexing anything else. That way, they have more control over what is
indexed and what isn't.
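
As a rough illustration of the "generate a list of URLs" step, here is a
minimal Python sketch. It assumes the documents live under a local
directory that maps directly onto a base URL; the function name
list_urls, the extension filter, and the example.com address are all
made up for this example and are not part of ht://Dig.

```python
import os
import sys
from urllib.parse import quote


def list_urls(doc_root, base_url, extensions=(".html", ".htm")):
    """Return a sorted list of URLs, one per matching file under doc_root.

    Assumption: doc_root on disk corresponds exactly to base_url on the
    web server. Only files are listed, never the directories themselves.
    """
    base = base_url.rstrip("/")
    urls = []
    for dirpath, _dirnames, filenames in os.walk(doc_root):
        for name in filenames:
            if name.lower().endswith(extensions):
                full_path = os.path.join(dirpath, name)
                rel_path = os.path.relpath(full_path, doc_root)
                # Use forward slashes and percent-encode unsafe characters.
                urls.append(base + "/" + quote(rel_path.replace(os.sep, "/")))
    return sorted(urls)


# Hypothetical command-line use:
#   python list_urls.py /var/www/docs http://www.example.com/docs > urls.txt
if __name__ == "__main__" and len(sys.argv) == 3:
    for url in list_urls(sys.argv[1], sys.argv[2]):
        print(url)
```

The resulting list would then go into start_url, with max_hop_count set
to 0 so htdig does not follow links beyond it. If I remember correctly,
a file name in backquotes in the config is replaced by the file's
contents, which avoids pasting a long list in by hand, but check the
documentation for your version before relying on that.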

> here is an example of one of the results i DO NOT want shown:
>
> Index of /../../..
> ... - [TXT] xxxxxxxx.html 20-Jul-2000 11:29 13k [TXT]
> xxxxxxxx.html 20-Jul-2000 11:30 16k [TXT]
> xxxxxxxx.html 20-Jul-2000 11:30 18k [TXT]
> xxxxxxxx.html 20-Jul-2000 11:31 9k [TXT] xxxxxxxx.html
> 20-Jul-2000 11:31 8k [TXT] xxxxxxxx.html
> 20-Jul-2000 11:31 3k [TXT] xxxxxxxx ...
> http://www.....com/../../../?D=A
> , 2512 bytes
>
> this has a link to http://www.....com/../../../?D=A which basically goes
> to the directory and i dont want it to display any of these kinds of
> results... do u think there is a way to disable this?... and any idea on
> what the ?D=A means?

The ?D=A is something Apache adds to directory listings it generates, so
you can get the directory sorted in different ways. To suppress these,
you can add "?D=A ?D=D ?M=A ?M=D ?N=A ?N=D ?S=A ?S=D" to exclude_urls,
or perhaps even just "/?", but you'd still get the original directory
listing without sort options.
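
To see what those patterns would and would not catch, here is a small
Python sketch that mimics the matching. It assumes exclude_urls does a
plain substring test against each URL, which is my understanding of the
attribute; the is_excluded helper and the example.com URLs are purely
illustrative and not htdig code.

```python
# The sort-option patterns suggested above, split on whitespace the same
# way a space-separated attribute value would be.
SORT_PATTERNS = "?D=A ?D=D ?M=A ?M=D ?N=A ?N=D ?S=A ?S=D".split()


def is_excluded(url, patterns=SORT_PATTERNS):
    """Return True if any pattern occurs anywhere in the URL.

    Assumption: matching is a simple substring test, not a regex.
    """
    return any(pattern in url for pattern in patterns)


if __name__ == "__main__":
    samples = [
        "http://www.example.com/docs/?D=A",    # sorted directory listing
        "http://www.example.com/docs/",        # plain directory listing
        "http://www.example.com/docs/a.html",  # ordinary document
    ]
    for url in samples:
        status = "excluded" if is_excluded(url) else "kept"
        print(status, url)
```

Under that assumption only the first sample is excluded; the plain
directory URL still gets through, which is the limitation mentioned
above.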

-- 
Gilles R. Detillieux              E-mail: <grdetil@scrc.umanitoba.ca>
Spinal Cord Research Centre       WWW:    http://www.scrc.umanitoba.ca/~grdetil
Dept. Physiology, U. of Manitoba  Phone:  (204)789-3766
Winnipeg, MB  R3E 3J7  (Canada)   Fax:    (204)789-3930



