Re: [htdig] Searching for "All" versus "Any"]


Subject: Re: [htdig] Searching for "All" versus "Any"]
From: ccouple1@swarthmore.edu
Date: Wed Apr 05 2000 - 11:41:57 PDT


On Wed, Apr 05, 2000 at 01:25:55PM -0500, Gilles Detillieux wrote:

> "htdig -ivvvvc newconfig.conf" to see what htdig is doing when it parses
> the title of this page. Take a look at the resulting db.wordlist as well,
> to see if "littérature" (or some mangled form of it) is getting into the
> database.

Okay, here goes...

From the log of the htdig session:

Tag: HTML>, matched -1
Tag: HEAD>, matched -1
Tag: TITLE>, matched 0
word: Littérature@6
word: francophone@9
word: virtuelle@12
word: ClicNet@15
Tag: /TITLE>, matched 1

title: Littérature francophone virtuelle (ClicNet)
Tag: /HEAD>, matched -1
Tag: BODY BGCOLOR="#FFFFFF" LINK="#060433" ALINK="#060433" VLINK="#0E294B">, matched -1
Tag: center>, matched -1
Tag: IMG SRC="litterature.gif">, matched 18
image: http://clicnet.swarthmore.edu/litterature/litterature.gif
Tag: BR>, matched -1

--------
So it matched the title from the header. Then:

Tag: H2>, matched 5
word: ClicNet@52
word: Littérature@54
word: francophone@57
word: virtuelle@60
Tag: /H2>, matched 11
Tag: /center>, matched -1

It then matches the same title words again inside the <H2></H2> tags.

Littérature does appear in the wordlist database as well (only it is lowercased):

littérature i:0 l:6 w:105469 c:5
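To check db.wordlist for a particular word without eyeballing the whole file, a small script can pull out the matching lines and split them into fields. This is a minimal sketch, not an htdig tool: it assumes each line is the word followed by whitespace-separated key:value integer pairs, as in the line above, and that the file is Latin-1 encoded. My reading of the fields (i = document id, l = location, w = weight, c = count) is an assumption; check it against the htdig version you run. The script name findword.py and the function names are hypothetical.

```python
import sys


def parse_wordlist_line(line):
    """Split one db.wordlist line into (word, fields).

    Assumes the word comes first, followed by whitespace-separated
    key:value pairs with integer values, e.g. "i:0 l:6 w:105469 c:5".
    Returns None for blank lines.
    """
    parts = line.split()
    if not parts:
        return None
    word = parts[0]
    fields = {}
    for part in parts[1:]:
        key, sep, value = part.partition(":")
        if not sep:
            continue  # not a key:value pair, skip it
        try:
            fields[key] = int(value)
        except ValueError:
            continue  # skip fields whose value isn't an integer
    return word, fields


def find_word(path, target, encoding="latin-1"):
    """Return every parsed entry in the wordlist file matching target.

    The comparison is case-insensitive because the words in the
    wordlist appear lowercased.
    """
    target = target.lower()
    matches = []
    with open(path, encoding=encoding) as f:
        for line in f:
            parsed = parse_wordlist_line(line)
            if parsed and parsed[0].lower() == target:
                matches.append(parsed)
    return matches


if __name__ == "__main__":
    if len(sys.argv) != 3:
        print("usage: python findword.py DB_WORDLIST WORD")
    else:
        for word, fields in find_word(sys.argv[1], sys.argv[2]):
            print(word, fields)
```

Run as, e.g., `python findword.py db.wordlist Littérature`; one output line per occurrence means the word is being indexed.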

Is any of this helpful at all?

> Are your title_factor
> and/or heading_factor_1 non-zero?

I'm not sure what you mean by this last bit...
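(For reference: title_factor and heading_factor_1 are attributes in the htdig configuration file, newconfig.conf here. title_factor scales the weight of words found in the <TITLE>, and heading_factor_1 is the first of the heading_factor_N attributes that scale the weight of words in headings. If an attribute is not set in the file, htdig falls back to its built-in default.) A quick way to answer the question is to check whether the config sets them, and to what. The sketch below is hypothetical, not part of htdig: it assumes a flat file of `attribute: value` lines and deliberately ignores include: directives, backslash continuations, and the actual default values, so "not set" only means the default applies.

```python
import re
import sys

# The two attributes asked about in this thread.
FACTORS = ("title_factor", "heading_factor_1")

# "name: value" with optional whitespace around the colon.
ATTR_RE = re.compile(r"([A-Za-z_][A-Za-z0-9_]*)\s*:\s*(.*)$")


def parse_attributes(lines):
    """Collect attribute: value pairs, skipping blanks and # comments.

    Later definitions of the same attribute overwrite earlier ones.
    """
    attrs = {}
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        m = ATTR_RE.match(line)
        if m:
            attrs[m.group(1)] = m.group(2).strip()
    return attrs


def describe_factors(attrs):
    """Report whether each factor is unset, zero, or non-zero."""
    report = {}
    for name in FACTORS:
        value = attrs.get(name)
        if value is None:
            report[name] = "not set (built-in default applies)"
            continue
        try:
            report[name] = "non-zero" if float(value) != 0 else "zero"
        except ValueError:
            report[name] = f"unparseable value {value!r}"
    return report


if __name__ == "__main__":
    if len(sys.argv) != 2:
        print("usage: python checkfactors.py CONFIG_FILE")
    else:
        with open(sys.argv[1], encoding="latin-1") as f:
            for name, status in describe_factors(parse_attributes(f)).items():
                print(f"{name}: {status}")
```

If either factor comes back "zero", words in that part of the page would get no extra weight from it.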

thanks again for all your help,

chris




This archive was generated by hypermail 2b28 : Wed Apr 05 2000 - 10:40:35 PDT