Subject: [htdig] Can't exclude URLs with ? with htdig?
From: Manuel Lemos (mlemos@acm.org)
Date: Tue Feb 08 2000 - 20:54:01 PST
Hello,
I want htdig to exclude URLs that contain the ? question mark query
separator. I have the following configuration file but URLs like that
are still being indexed. I am using htdig 3.1.4. Is this a bug?
I know I can exclude URLs like that in htsearch by setting the exclude
query string argument, but I also noticed that if I have it set to
"? /graphics/" the exclusion no longer works.
Does anybody know what the problem is?
The command line is called by PHP like this:
REQUEST_METHOD=GET QUERY_STRING="words=forms&format=htdig&exclude=%3F+%2Fgraphics%2F&matchesperpage=10&method=or&page=1&sort=score" /usr/local/htdocs/htdig/cgi-bin/htsearch -c setup/htdig.conf
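For reference, the exclude argument in that query string is URL-encoded. The following is only a minimal Python sketch (standard library only; the variable names are illustrative and not part of htdig) of what the value looks like once decoded, assuming htsearch applies ordinary CGI form decoding:

```python
from urllib.parse import parse_qs

# Query string copied from the command line above.
query_string = (
    "words=forms&format=htdig&exclude=%3F+%2Fgraphics%2F"
    "&matchesperpage=10&method=or&page=1&sort=score"
)

# parse_qs decodes %XX escapes and turns '+' into a space.
params = parse_qs(query_string)
exclude_value = params["exclude"][0]

# Assumption: the value is then treated as a whitespace-separated pattern list.
exclude_patterns = exclude_value.split()

print(repr(exclude_value))
print(exclude_patterns)
```

So the encoding itself looks right: the decoded value is the string "? /graphics/" with a single space between the two patterns.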
The configuration is this:
database_dir: /usr/local/htdig/db/test
start_url: http://local.test.org/test/
maintainer: info@local.test.org
search_algorithm: exact:1 synonyms:0.5 endings:0.1
exclude_urls: ?
limit_urls_to: http://local.test./test/
bad_extensions: .wav .gz .z .sit .au .zip .tar .hqx .exe .com .gif .jpg .jpeg .aiff .class .map .ram .tgz .bin .rpm .mpg .mov .avi
max_head_length: 10000
max_doc_size: 200000
no_excerpt_show_top: true
valid_punctuation: : .-_/!#$%^&*«»
template_map: htdig htdig library/htdig_template.html
search_results_header: library/htdig_header.html
search_results_footer:
nothing_found_file: library/htdig_nomatch.html
syntax_error_file: library/htdig_syntaxerror.html
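To make clear what I expect from exclude_urls, here is a rough Python sketch of the behaviour I am assuming: the attribute is a whitespace-separated list of patterns, and a URL is rejected if it contains any of them as a plain substring. This is only my reading of the documentation, not htdig's actual code, and the function name and sample URLs are made up for illustration:

```python
def is_excluded(url, exclude_urls):
    """Return True if url contains any whitespace-separated pattern.

    Assumed model of exclude_urls: plain substring match, no wildcards.
    """
    return any(pattern in url for pattern in exclude_urls.split())


# Value taken from the configuration above.
exclude_urls = "?"

# Hypothetical URLs under the start_url, for illustration only.
sample_urls = [
    "http://local.test.org/test/index.html",
    "http://local.test.org/test/list.php?page=2",
]

for url in sample_urls:
    status = "excluded" if is_excluded(url, exclude_urls) else "indexed"
    print(status, url)
```

Under that assumption the second URL should never be indexed, which is not what I am seeing.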
Regards,
Manuel Lemos
Web Programming Components using PHP Classes.
Look at: http://phpclasses.UpperDesign.com/?user=mlemos@acm.org
--
E-mail: mlemos@acm.org
URL: http://www.mlemos.e-na.net/
PGP key: http://www.mlemos.e-na.net/ManuelLemos.pgp
--------------------------------------
To unsubscribe from the htdig mailing list, send a message to
htdig-unsubscribe@htdig.org
You will receive a message to confirm this.