Gilles Detillieux (firstname.lastname@example.org)
Tue, 6 Jul 1999 11:26:58 -0500 (CDT)
According to S. Hayles:
> This is with ht://Dig 3.1.2 under IRIX 6.5
> I used a script to build a list of all files on our server that were not
> externally accessible - so they could be excluded from the externally
> accessible index. It ended up ~250Kb with ~5000 entries. I wasn't too
> surprised that it didn't seem to work.
> On a quick examination, the only limit in this area I could find was the
> buffer length in Configuration::Read - but increasing this didn't seem to
> help. I tried using robots.txt to restrict indexing, and once max_doc_size
> was adjusted this worked fine - but it seems an unwieldy solution.
> Since they both appear to use StringMatch for comparisons I would have
> expected exclude_urls to work if robots.txt works. Has anyone else had
> problems with exclude_urls?
A list of about 250 KB is going to be somewhat unwieldy, but I can't
see anything in the code that would prevent it from working. The buffer
length in Configuration::Read limits only the length of each individual
line in your config file. If you continue long lines onto following
lines with backslashes, the total length should be unlimited.
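As a rough illustration of the backslash approach, here is a minimal Python sketch that turns a list of URL patterns into an exclude_urls setting spread over several continued lines. The function name format_exclude_urls, the max_line_bytes parameter, and the /private/ patterns are all made up for the example; it also assumes that leading whitespace on a continued line is harmless because the value is a whitespace-separated list, which you should check against your own build.

```python
def format_exclude_urls(patterns, max_line_bytes=200):
    """Return config text that sets exclude_urls using backslash continuations.

    max_line_bytes is a hypothetical safety margin, chosen to stay well
    under whatever per-line buffer the config reader uses.
    """
    lines = []
    current = "exclude_urls:"
    for pattern in patterns:
        candidate = current + " " + pattern
        # The +2 leaves room for the trailing " \" continuation marker.
        if len(candidate.encode("utf-8")) + 2 > max_line_bytes:
            lines.append(current + " \\")
            current = "\t" + pattern
        else:
            current = candidate
    # The final line carries no continuation marker.
    lines.append(current)
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(format_exclude_urls(
        ["/private/a/", "/private/b/", "/private/c/"], max_line_bytes=30
    ), end="")
```

A single pattern longer than max_line_bytes still ends up on a line of its own, so the sketch does not guarantee the limit in that case.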
Similarly, if in your config file you use the `file` mechanism to set
the exclude_urls attribute from the contents of another file, then the
buffer size in ParsedString::getFileContents will limit the length of any
individual line in that file to 1000 bytes (longer lines get "folded",
i.e. the string will be split in two). Other than that, I think the
string length is limited only by available virtual memory. Using the
numbers you gave above, it would seem the average URL length in your
exclude_urls is about 50 bytes, so it seems unlikely that the problem
would be URLs over 1000 bytes getting split in two.
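To double-check that estimate against the real list, something like the following Python sketch could be run over the exclude file. The helper names average_entry_bytes and find_long_lines are hypothetical, and the exact boundary at which a line gets folded depends on how the buffer is read, so the check conservatively flags any line at or over the limit.

```python
def average_entry_bytes(total_bytes, entry_count):
    """Average size of one entry, in bytes."""
    return total_bytes / entry_count


def find_long_lines(lines, limit=1000):
    """Return (line_number, byte_length) for each line at or over the limit.

    The 1000-byte default mirrors the buffer size mentioned above; treating
    a line of exactly `limit` bytes as a problem is a conservative assumption.
    """
    flagged = []
    for number, line in enumerate(lines, start=1):
        length = len(line.rstrip("\n").encode("utf-8"))
        if length >= limit:
            flagged.append((number, length))
    return flagged


if __name__ == "__main__":
    # About 250 KB spread over about 5000 entries.
    print(round(average_entry_bytes(250 * 1024, 5000)))  # 51
    sample = ["/private/a/\n", "x" * 1000 + "\n", "/private/b/\n"]
    print(find_long_lines(sample))  # [(2, 1000)]
```

An empty result from find_long_lines on the actual file would rule out folding as the cause.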
When you say it didn't seem to work, do you mean it wasn't excluding
what you wanted it to, it was excluding stuff you wanted included, or
was it failing in some other way? How were you setting exclude_urls?
-- 
Gilles R. Detillieux              E-mail: <email@example.com>
Spinal Cord Research Centre       WWW:    http://www.scrc.umanitoba.ca/~grdetil
Dept. Physiology, U. of Manitoba  Phone:  (204)789-3766
Winnipeg, MB  R3E 3J7  (Canada)   Fax:    (204)789-3930