Re: [htdig3-dev] how big a list can exclude_urls manage?


Gilles Detillieux (grdetil@scrc.umanitoba.ca)
Wed, 7 Jul 1999 10:25:11 -0500 (CDT)


According to S. Hayles:
> > How were you setting exclude_urls?
>
> I tried a variety of approaches. Initially I created a file starting:
>
> exclude_urls: /cgi-bin/ .cgi \
> /ad/gem/gem.html \
> /adultedu/gem/gem.html \
> /ad/ars1/ \
> /adultedu/ars1/ \
> /ad/info/ \
> /adultedu/info/ \
> /ad/rs50/rs50.html \
> /adultedu/rs50/rs50.html \
> /ad/rs50/index.html \
> /adultedu/rs50/index.html \
> /ad/jrs12/jrs12.html \
> /adultedu/jrs12/jrs12.html \
> /ad/adflag \
> /adultedu/adflag \
> /ad/test1.html~ \
> /adultedu/test1.html~ \
> /ad/test.html \
> /adultedu/test.html \
>
> and used
>
> include: file
>
> I also tried embedding the data in the config file, removing the
> backslashes and putting everything on one line, and including the file list
> using
>
> exclude_urls: `file`
>
> I never saw it reject any URL after the first 9, but in most cases it
> didn't seem to match anything beyond the first 2.
>
> If you can see no reason why it shouldn't work, I'll check everything
> and give it one more go.

If you're using the

   exclude_urls: `file`

approach, then the URLs in the file should NOT all be on one line.
Each line gets folded every 1000 characters, so you want lines to remain
shorter than that. Put one URL per line, and leave it to the
getFileContents method to rejoin the lines into one string.
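
If the list already exists as one long line, or in the backslash-continued
form quoted above, something like the following could rewrite it with one
pattern per line. This is only a rough sketch in Python, not part of ht://Dig:
the function name and MAX_LINE are made up for illustration, and the 1000
character figure is simply the limit mentioned above.

```python
# Hypothetical helper, not part of ht://Dig: rewrites an exclude_urls
# pattern list so that each pattern sits on its own line.

MAX_LINE = 1000  # assumed limit, taken from the folding length mentioned above


def one_pattern_per_line(patterns_text):
    """Return the whitespace-separated patterns, one per line."""
    # Drop stand-alone backslashes left over from line continuations.
    patterns = [p for p in patterns_text.split() if p != "\\"]
    too_long = [p for p in patterns if len(p) >= MAX_LINE]
    if too_long:
        raise ValueError("pattern too long for one line: %r" % too_long[0][:40])
    return "".join(p + "\n" for p in patterns)


if __name__ == "__main__":
    sample = "/cgi-bin/ .cgi \\\n/ad/info/ \\\n/adultedu/info/ \\\n"
    print(one_pattern_per_line(sample), end="")
```

The output can be saved to the file named between the backquotes in the
exclude_urls line, so that no line comes anywhere near the folding limit.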

-- 
Gilles R. Detillieux              E-mail: <grdetil@scrc.umanitoba.ca>
Spinal Cord Research Centre       WWW:    http://www.scrc.umanitoba.ca/~grdetil
Dept. Physiology, U. of Manitoba  Phone:  (204)789-3766
Winnipeg, MB  R3E 3J7  (Canada)   Fax:    (204)789-3930