[htdig] Can't exclude URLs with ? with htdig?


Subject: [htdig] Can't exclude URLs with ? with htdig?
From: Manuel Lemos (mlemos@acm.org)
Date: Tue Feb 08 2000 - 20:54:01 PST


Hello,

I want htdig to exclude URLs that contain the ? (question mark) query
separator. I have the following configuration file, but such URLs
are still being indexed. I am using htdig 3.1.4. Is this a bug?

I know I can exclude such URLs in htsearch by setting the exclude
query string argument, but I also noticed that if I set it to
"? /graphics/" the exclusion no longer works.

Does anybody know what the problem is?

The command line is called by PHP like this:

REQUEST_METHOD=GET QUERY_STRING="words=forms&format=htdig&exclude=%3F+%2Fgraphics%2F&matchesperpage=10&method=or&page=1&sort=score" /usr/local/htdocs/htdig/cgi-bin/htsearch -c setup/htdig.conf
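
For reference, the query string above can be decoded to check what htsearch actually receives in the exclude argument. The following is a minimal Python sketch using only the standard library. Splitting the decoded value on whitespace is an assumption made here for illustration; it is not confirmed to be how htsearch 3.1.4 treats the value.

```python
from urllib.parse import parse_qs

# The QUERY_STRING exactly as passed to htsearch in the command line above.
query_string = (
    "words=forms&format=htdig&exclude=%3F+%2Fgraphics%2F"
    "&matchesperpage=10&method=or&page=1&sort=score"
)

# parse_qs percent-decodes the values and turns "+" into a space.
params = parse_qs(query_string)
exclude_value = params["exclude"][0]

# Assumption for illustration only: treat the value as a
# whitespace-separated list of patterns.
exclude_patterns = exclude_value.split()

print(repr(exclude_value))
print(exclude_patterns)
```

The decoded value is the literal string "? /graphics/", so the encoding itself is correct; what remains open is how that single value is interpreted on the htsearch side.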

The configuration is this:

database_dir: /usr/local/htdig/db/test
start_url: http://local.test.org/test/
maintainer: info@local.test.org
search_algorithm: exact:1 synonyms:0.5 endings:0.1
exclude_urls: ?
limit_urls_to: http://local.test./test/
bad_extensions: .wav .gz .z .sit .au .zip .tar .hqx .exe .com .gif .jpg .jpeg .aiff .class .map .ram .tgz .bin .rpm .mpg .mov .avi
max_head_length: 10000
max_doc_size: 200000
no_excerpt_show_top: true
valid_punctuation: : .-_/!#$%^&*
template_map: htdig htdig library/htdig_template.html
search_results_header: library/htdig_header.html
search_results_footer:
nothing_found_file: library/htdig_nomatch.html
syntax_error_file: library/htdig_syntaxerror.html
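
The result expected from the `exclude_urls: ?` line can be sketched as plain substring matching. This only illustrates the intended outcome and is not htdig's actual matching code; the `is_excluded` helper and the sample URLs are hypothetical.

```python
def is_excluded(url, patterns):
    """Return True if the URL contains any of the exclusion patterns."""
    return any(pattern in url for pattern in patterns)

# Pattern list corresponding to "exclude_urls: ?" in the configuration.
exclude_urls = ["?"]

# Hypothetical URLs under the configured start_url.
candidates = [
    "http://local.test.org/test/index.html",
    "http://local.test.org/test/list.php?page=2",
]

# Keep only the URLs that match none of the patterns.
kept = [url for url in candidates if not is_excluded(url, exclude_urls)]
print(kept)
```

Under this reading, any URL carrying a query string would be dropped before indexing, which is the behaviour that is not being observed.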

Regards,
Manuel Lemos

Web Programming Components using PHP Classes.
Look at: http://phpclasses.UpperDesign.com/?user=mlemos@acm.org

--
E-mail: mlemos@acm.org
URL: http://www.mlemos.e-na.net/
PGP key: http://www.mlemos.e-na.net/ManuelLemos.pgp
--



