[htdig] Re: Problem with exclude_url


Subject: [htdig] Re: Problem with exclude_url
From: Gilles Detillieux (grdetil@scrc.umanitoba.ca)
Date: Wed Jan 17 2001 - 09:15:45 PST


According to Ulrich.Rebmann@dsv-gruppe.de:
> we have htdig 3.15.
> we wanted to index a big directory of the SAP-documentation
> the structure is as follows:
>
> directory1
>   directory2
>     directory3
>       directory4
>         content.html
>         frameset.html
>
> directory1
>   directory2
>     directory3
>       other_directory4
>         content.html
>         frameset.html
>
> and so on.......
>
> We want to exclude all (!) files named frameset.htm in all directories.
> When I tried: exclude_url: frameset.htm - nothing happened.
> I think that you must give the fully qualified path - but there are so many
> different paths in this case.
>
> I need something like:
> exclude_url: /directory1/directory2/directory3/*/frameset.htm (the asterisk is
> important)
> Is this possible?

First of all, please see http://www.htdig.org/FAQ.html#q1.16
Such questions should go to the list, not to me personally. This isn't a
one-man show.

Secondly, could you elaborate on what you mean by "nothing happened"?
Do you mean that htdig didn't index anything, or that the frameset.htm
or frameset.html files were not excluded? Also, is the above
a typo, or did you really omit the "s" from exclude_urls? See
http://www.htdig.org/attrs.html for correct spellings of attribute names.

Thirdly, there is no wildcard support for exclude_urls. In version 3.2,
we're adding support for regular expressions to exclude_urls and other
attributes, which will be like wildcards only more powerful, but with
a somewhat more complicated syntax. This is still a work in progress,
however.
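
As a rough illustration of why the lack of wildcards matters less than it
might seem: the exclude_urls documentation describes the attribute as a
space-separated list of patterns, and a URL is rejected if it contains any
of them. The Python sketch below only mimics that substring behaviour; it
is not htdig's code, and the function name, the example.com URLs and the
attribute value are all assumptions made up for the example.

```python
# Illustrative only: mimics substring-style exclusion, not htdig's actual code.
def is_excluded(url, patterns):
    """Return True if any pattern occurs anywhere in the URL."""
    return any(p in url for p in patterns)

# Hypothetical value of the exclude_urls attribute, split on whitespace.
exclude_urls = "/cgi-bin/ .cgi frameset.htm".split()

# Hypothetical URLs following the directory layout described in the question.
urls = [
    "http://example.com/directory1/directory2/directory3/directory4/frameset.html",
    "http://example.com/directory1/directory2/directory3/other_directory4/frameset.htm",
    "http://example.com/directory1/directory2/directory3/directory4/content.html",
]

# Both frameset pages match, whatever directory they sit in.
excluded = [u for u in urls if is_excluded(u, exclude_urls)]
```

Under that assumption, the single pattern frameset.htm covers both
frameset.htm and frameset.html in every directory, with no path prefix and
no asterisk needed.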

You shouldn't need wildcards for this case, though, because it's a
pretty simple exclusion you're trying to do here. However, if the only
links to some of your files, such as the content.html files, are in the
frameset.html, then you may not want to exclude them, or you'll end up
missing a whole lot more besides. This is why I asked what you mean by
"nothing happened". If none of the files were indexed, this may be why.
Remember that htdig only follows HTML links from one document to the
next.
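
The side effect described above can be sketched with a toy link graph.
This is only a simplified model of a link-following crawler, assuming that
an excluded page is never fetched and so its links are never seen; the
crawl function, the paths and the link table are hypothetical and do not
come from htdig itself.

```python
from collections import deque

def crawl(start, links, excluded_substrings):
    """Breadth-first walk over a link graph, skipping excluded URLs."""
    seen = set()
    queue = deque([start])
    while queue:
        url = queue.popleft()
        # Skip pages already visited or matching an exclusion pattern.
        if url in seen or any(s in url for s in excluded_substrings):
            continue
        seen.add(url)
        # Only links found on pages we actually visit are followed.
        queue.extend(links.get(url, []))
    return seen

# Hypothetical site: content.html is linked only from frameset.html.
links = {
    "/index.html": ["/directory1/frameset.html"],
    "/directory1/frameset.html": ["/directory1/content.html"],
    "/directory1/content.html": [],
}

with_frames = crawl("/index.html", links, [])
without_frames = crawl("/index.html", links, ["frameset.htm"])
```

In this toy site, excluding the frameset page also drops content.html from
the crawl, because nothing else links to it.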

-- 
Gilles R. Detillieux              E-mail: <grdetil@scrc.umanitoba.ca>
Spinal Cord Research Centre       WWW:    http://www.scrc.umanitoba.ca/~grdetil
Dept. Physiology, U. of Manitoba  Phone:  (204)789-3766
Winnipeg, MB  R3E 3J7  (Canada)   Fax:    (204)789-3930




This archive was generated by hypermail 2b28 : Wed Jan 17 2001 - 09:30:23 PST