[htdig] [3.2.0b2] AND operator not working as it should?

Subject: [htdig] [3.2.0b2] AND operator not working as it should?
From: Arthur Prokosch (prokosch@aptima.com)
Date: Wed Aug 02 2000 - 11:53:14 PDT

Hi, all. Thanks to the developers for working on such an ambitious project!

In testing htdig 3.2.0b2 with just one HTML file, the AND operator is
working like OR, as far as I can tell. Whether I select "method=all" or
"method=boolean" with "and"s in the query string, a query like "web fluble"
incorrectly returns the document (which contains "web" but not "fluble"). I
compiled 3.1.5 to see if I was doing anything really stupid, but with the
same document and an essentially identical config file, 3.1.5 returns the
correct results. (However, I want to use phrase matching, so 3.1.5 isn't a
permanent solution for me.)
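
To make the expected behaviour concrete, here is a minimal Python sketch of AND versus OR matching over the one indexed document. It is only an illustration, not htdig's actual code: the names `DOC_WORDS`, `matches_all` and `matches_any` are made up for this example, and the word set assumes trailing punctuation is dropped.

```python
# Hypothetical word set for the single indexed document
# (assumption: trailing punctuation is dropped before matching).
DOC_WORDS = {"hi", "this", "bad", "web", "design"}


def matches_all(query, doc_words):
    """AND semantics: every query word must occur in the document."""
    return all(word in doc_words for word in query.lower().split())


def matches_any(query, doc_words):
    """OR semantics: at least one query word must occur in the document."""
    return any(word in doc_words for word in query.lower().split())


print(matches_all("web fluble", DOC_WORDS))  # False: "fluble" is absent
print(matches_any("web fluble", DOC_WORDS))  # True: "web" is present
```

What I see from 3.2.0b2 looks like the second function even when the first is requested.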

I've already changed permissions on the _weakcmpr database as before, and
simple searches work as expected ("web design" matches the document, "design
web" doesn't, "web" matches, "fluble" doesn't).
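
If those two-word queries are being treated as phrases (an assumption on my part), the results fit ordered, adjacent matching on word positions. A rough Python sketch, using hypothetical names (`POSITIONS`, `matches_phrase`) and positions modelled on the "word@position" pairs in the rundig output, with punctuation dropped:

```python
# Hypothetical word positions, loosely following the "word@position"
# pairs that rundig prints (assumption: punctuation is dropped).
POSITIONS = {"hi": [1], "this": [2], "bad": [3], "web": [4], "design": [5]}


def matches_phrase(phrase, positions):
    """True if the words of `phrase` occur at consecutive positions, in order."""
    words = phrase.lower().split()
    if not words:
        return False
    # Try each occurrence of the first word as a possible starting point.
    for start in positions.get(words[0], []):
        if all(start + offset in positions.get(word, [])
               for offset, word in enumerate(words)):
            return True
    return False


print(matches_phrase("web design", POSITIONS))  # True
print(matches_phrase("design web", POSITIONS))  # False
```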

Has anyone bumped into this before? I checked through the archives of this
list and the ChangeLog from April 12 to May 30, and didn't find anything
similar. My htdig.conf follows; the sample search page is at
<http://www.aptima.com/~cta/search-3.2.html> (although command line searches
return the same results); the one document indexed is index.html.

Also, I noticed that the attribute list in htdoc gives, for each attribute,
the version in which it first appeared, while the list on www.htdig.org
doesn't. Is there a reason for this?

Thanks for any help with this...

Arthur Prokosch, <prokosch@aptima.com>
Usability/Web Intern
Aptima, Inc. <http://www.aptima.com/>
781-935-3966 x26

-- begin htdig.conf (most comments stripped) --
start_url: http://www.aptima.com/~cta/

# use file access for all URLs indexed
#
local_urls: http://www.aptima.com/~cta/=/home/cta/public_html/

# don't fall back to HTTP, as www.aptima.com is unreachable from here
#
local_urls_only: true

limit_urls_to: ${start_url}

exclude_urls: /cgi-bin/ search.html

bad_extensions: .cgi .wav .gz .z .sit .au .zip .tar .hqx .exe .com \
                .gif .jpg .jpeg .aiff .class .map .ram .tgz .bin .rpm .mpg .mov .avi

maintainer: htdig-robot-invalid-address@cta-resource.com

#max_head_length: 10000

max_doc_size: 200000

no_excerpt_show_top: true

#search_algorithm: exact:1 synonyms:0.5 endings:0.1
search_algorithm: exact:1

# disable backlink weighting (which is on by default?)
#
backlink_factor: 0

# we could use synonyms (misspellings, really) when we start enabling
# text-box searches?

template_map: Long long ${common_dir}/long.html \
              Short short ${common_dir}/short.html \
              Custom custom ${common_dir}/custom.html
template_name: custom

next_page_text: '[ Next &gt; ]'
no_next_page_text:
prev_page_text: '[ &lt; Prev ]'
no_prev_page_text:
page_number_text: 1 2 3 4 5 6 7 8 9 10
no_page_number_text: &gt;1&lt; &gt;2&lt; &gt;3&lt; &gt;4&lt; &gt;5&lt; \
                     &gt;6&lt; &gt;7&lt; &gt;8&lt; &gt;9&lt; &gt;10&lt;

# local variables:
# mode: text
# eval: (if (eq window-system 'x) (progn (setq font-lock-keywords (list '("^#.*" . font-lock-keyword-face) '("^[a-zA-Z][^ :]+" . font-lock-function-name-face) '("[+$]*:" . font-lock-comment-face) )) (font-lock-mode)))
# end:

-- end htdig.conf ---

-- begin redirected output from rundig -vvvvvv --
ht://dig Start Time: Wed Aug 2 11:32:38 2000
1:0:http://www.aptima.com/~cta/
New server: www.aptima.com, 80
 - Persistent connections: enabled
 - HEAD before GET: disabled
 - Timeout: 30
 - Connection space: 0
 - Max Documents: -1
 - TCP retries: 1
 - TCP wait time: 5
Trying to retrieve robots.txt file
pushed
pick: www.aptima.com, # servers = 1
> www.aptima.com supports HTTP persistent connections (infinite)
0:2:0:http://www.aptima.com/~cta/: Trying local files
found existing file /home/cta/public_html/index.html
Read 43 from document
Read a total of 43 bytes
Tag: blink, matched -1
word: hi.@1
word: this@2
word: bad@3
word: web@4
Tag: /blink, matched -1
word: design.@5
head: hi. this is bad web design.
size = 43
pick: www.aptima.com, # servers = 1
> www.aptima.com supports HTTP persistent connections (infinite)
ht://dig End Time: Wed Aug 2 11:32:38 2000
ID: 2
URL: http://www.aptima.com/~cta/
-- end redirect --
