Re: [htdig] Searching for "All" versus "Any"]

Subject: Re: [htdig] Searching for "All" versus "Any"]
Date: Wed Apr 05 2000 - 11:03:44 PDT

On Wed, Apr 05, 2000 at 12:43:44PM -0500, Gilles Detillieux wrote:
> According to
> > 'Littérature' returns 54 results, none of which is the page entitled 'Littérature francophone virtuelle' BUT almost all of which contain the target string...
> A few possibilities to look into:
> 1) the page entitled 'Littérature francophone virtuelle' contains a slightly
> different spelling of 'Littérature' than your search string. Check the
> HTML source for the page carefully, to make sure there isn't some difference
> in accents or spelling.

Double-checked this. The search string and the version in the page are the same. In fact, I copied and pasted it directly from my browser window into the search page.

> 2) the SGML entity for the 'é' in the title isn't being converted correctly.
> There were problems with numeric entities in many 3.2 snapshots and the last
> beta.

Hmmmm... The only problem with this line of thought is that if é weren't being converted properly, it wouldn't be converted anywhere on the website, so we'd never see the search string in htdig's results at all. Also, I'm running 3.1.5, not any of the 3.2 snapshots.
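
For anyone who wants to rule out points 1 and 2 outside htdig, here is a minimal sketch in Python (not part of htdig; the function names are hypothetical). It decodes named and numeric entities in the title, composes any decomposed accents, and then compares the result with the search string. It only checks the strings themselves and does not reproduce htdig's own locale-dependent word parsing.

```python
import html
import unicodedata


def normalize_for_compare(text: str) -> str:
    """Decode named and numeric entities (&eacute;, &#233;, &#xE9;) and
    compose accents (NFC), so equivalent spellings become identical strings."""
    return unicodedata.normalize("NFC", html.unescape(text))


def title_contains(title_html: str, query: str) -> bool:
    """True if the query appears in the title once both are normalized."""
    return normalize_for_compare(query) in normalize_for_compare(title_html)


def non_ascii_codepoints(text: str) -> list:
    """List (character, code point) pairs for every non-ASCII character,
    to spot look-alike accents that differ at the code-point level."""
    return [(ch, f"U+{ord(ch):04X}") for ch in text if ord(ch) > 127]


if __name__ == "__main__":
    query = "Litt\u00e9rature"
    # Three ways the same title might appear in the raw HTML source.
    for title in ("Litt&eacute;rature francophone virtuelle",
                  "Litt&#233;rature francophone virtuelle",
                  "Litte\u0301rature francophone virtuelle"):
        print(title_contains(title, query),
              non_ascii_codepoints(html.unescape(title)))
```

If the decoded title matches the search string but htdig still misses the page, that would point away from a spelling or entity difference in the page itself.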

> 3) that page was indexed before you had the locale configured correctly,
> and never reindexed, so the accented letter was lost. Try touching the
> page's source file and reindexing it, or reindexing from scratch.

Actually, I didn't index this particular site until after reconfiguring its locale. Still, to be on the safe side, I reindexed the site, running first htdig -i -c /path/to/config and then htmerge -c /path/to/config. The results of an "ALL" search for 'Littérature francophone virtuelle' remain the same: 54 results, without the target page entitled 'Littérature francophone virtuelle'.
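
The two reindexing routes discussed above (touching one page and doing an update dig, versus digging from scratch with -i) can be sketched as a small Python helper. This is an illustrative wrapper, not an htdig tool: the function names are hypothetical, the config path is the placeholder from the message, and the only flags used are the -i and -c options that appear in this thread. It assumes, as Gilles's suggestion implies, that htdig without -i re-fetches pages whose modification time has changed. By default it only prints the commands.

```python
import subprocess
from pathlib import Path
from typing import List


def reindex_commands(config: str, initial: bool = True) -> List[List[str]]:
    """Build the htdig and htmerge command lines used in this thread.
    initial=True adds -i (ignore the old databases and dig from scratch);
    initial=False omits it, giving an update dig."""
    dig = ["htdig"]
    if initial:
        dig.append("-i")
    dig += ["-c", config]
    merge = ["htmerge", "-c", config]
    return [dig, merge]


def touch(path: str) -> None:
    """Update a file's modification time, like the shell 'touch'."""
    Path(path).touch()


def run_commands(commands: List[List[str]], dry_run: bool = True) -> None:
    """Print each command, or run it and stop on the first failure."""
    for cmd in commands:
        if dry_run:
            print(" ".join(cmd))
        else:
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Dry run of a from-scratch reindex with the placeholder config path.
    run_commands(reindex_commands("/path/to/config", initial=True))
```

For the "touch and update" route, one would call touch() on the page's source file and then run reindex_commands(config, initial=False) with dry_run=False.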


This archive was generated by hypermail 2b28 : Wed Apr 05 2000 - 10:02:27 PDT