Subject: Re: [htdig] 2 questions: and bad_words
From: Gilles Detillieux (email@example.com)
Date: Wed Mar 01 2000 - 07:30:15 PST
According to Geoff Hutchison:
> At 12:07 PM +0100 3/1/00, Peter Kruijt wrote:
> >initials are on one line and the last name on the next, I put &nbsp;'s
> >in between. ht://Dig does not interpret these as spaces. Is there a way
> >to make ht://Dig interpret hard spaces as ordinary spaces?
> Yes, it won't treat these as spaces because they're not really
> spaces. But, of course, they are white-space! To make the change you
> want, edit htdig/SGMLEntities.cc and change the nbsp line to have a
> space on the right-hand side. The downside is that since this
> involves how the indexer creates the database, you'll need to reindex
> after making the change.
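Here is a minimal Python sketch of the suggested change. It is not ht://Dig code. It uses a hypothetical entity table in place of the one in htdig/SGMLEntities.cc, whose real layout may differ. It also assumes a word splitter that breaks only on ASCII white space, as a parser running in the C locale would. The names ENTITIES_ORIGINAL, ENTITIES_PATCHED, decode_entities and words are illustrative only.

```python
import re

# Hypothetical entity table standing in for htdig/SGMLEntities.cc;
# values are ISO-8859-1 byte codes. The real file's layout may differ.
ENTITIES_ORIGINAL = {"amp": ord("&"), "lt": ord("<"), "gt": ord(">"), "nbsp": 160}
# The suggested edit: give nbsp a plain space on the right-hand side.
ENTITIES_PATCHED = {**ENTITIES_ORIGINAL, "nbsp": ord(" ")}

ENTITY_RE = re.compile(rb"&([A-Za-z]+);")


def decode_entities(html: bytes, table: dict[str, int]) -> bytes:
    """Replace &name; with its byte value; unknown entities are left as-is."""
    def sub(m: re.Match) -> bytes:
        code = table.get(m.group(1).decode("ascii"))
        return bytes([code]) if code is not None else m.group(0)
    return ENTITY_RE.sub(sub, html)


def words(text: bytes) -> list[bytes]:
    """Split on ASCII white space only, as a parser in the C locale would."""
    return text.split()


html = b"J.&nbsp;Smith"
print(words(decode_entities(html, ENTITIES_ORIGINAL)))  # [b'J.\xa0Smith']
print(words(decode_entities(html, ENTITIES_PATCHED)))   # [b'J.', b'Smith']
```

The entity is decoded while documents are being indexed. A change like this therefore only affects documents indexed after it, which is why a reindex is needed.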
Hacking the code may not be necessary. Right now, &nbsp; is mapped
to 160, the ISO-8859-1 non-breaking space character. A properly
defined locale should treat this as space, so if you set your locale
to something that uses that character set, rather than just US-ASCII,
and locales work properly on your system, then the HTML parser should
treat &nbsp; as space. On the other hand, in the default C locale,
160 would be a non-space control character, which I believe the parser
would treat as punctuation.
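The locale effect can be shown with a Python analogy. This is not htdig's own parser. Python's bytes methods recognise only ASCII white space, much like the C locale. Decoding the text as ISO-8859-1 and using str methods applies a character-set-aware classification, under which byte 160 (U+00A0) counts as white space. The function names are hypothetical. Whether a real C library's isspace() treats 160 as space under an ISO-8859-1 locale depends on how that locale is defined on the system, which is the "if locales work properly" caveat above.

```python
NBSP = 160  # the byte that &nbsp; currently maps to


def split_c_locale(data: bytes) -> list[bytes]:
    # Analogy for the C locale: bytes.split() with no argument splits only
    # on ASCII white space, so byte 160 stays inside the word.
    return data.split()


def split_latin1_locale(data: bytes) -> list[str]:
    # Analogy for an ISO-8859-1 locale: decoding maps byte 160 to
    # U+00A0 NO-BREAK SPACE, which str.split() treats as white space.
    return data.decode("iso-8859-1").split()


sample = b"J." + bytes([NBSP]) + b"Smith"
print(split_c_locale(sample))       # [b'J.\xa0Smith']
print(split_latin1_locale(sample))  # ['J.', 'Smith']
```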
> >In other words, what takes precedence, bad_words or htdig-keywords?
> I'm afraid you can't really do this. bad_words takes precedence.
Probably your only option would be to index the documents with the polymer
keyword separately, with a different bad_words file, and then merge the
results. Another kludgy approach would be to add entries for the word
"polymer" to db.wordlist, referring to the docIDs of the documents you
want, between runs of htdig and htmerge; this won't work with 3.2, though.
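The first workaround can be modelled with a toy in-memory index that maps each word to a set of docIDs. The model shows two things. First, a word in bad_words is dropped even when it also comes from a keywords meta tag. Second, a separate run with a different bad_words list can restore the word for selected documents. This is a hypothetical sketch of the idea, not how htdig or htmerge actually store or merge databases. build_index and merge_indexes are illustrative names, and the two runs are assumed to use non-overlapping docIDs.

```python
def build_index(docs: dict[int, list[str]], bad_words: set[str]) -> dict[str, set[int]]:
    """Map each word to the docIDs containing it, skipping bad_words.

    Body words and keywords are treated alike here, so a word listed in
    bad_words is dropped even if it is also a keyword (bad_words wins).
    """
    index: dict[str, set[int]] = {}
    for doc_id, words in docs.items():
        for word in words:
            word = word.lower()
            if word in bad_words:
                continue
            index.setdefault(word, set()).add(doc_id)
    return index


def merge_indexes(*indexes: dict[str, set[int]]) -> dict[str, set[int]]:
    """Union docID sets word by word (docIDs assumed not to clash)."""
    merged: dict[str, set[int]] = {}
    for index in indexes:
        for word, ids in index.items():
            merged.setdefault(word, set()).update(ids)
    return merged


general_docs = {1: ["Polymer", "chemistry", "the"], 2: ["the", "polymer", "lab"]}
polymer_docs = {3: ["polymer", "synthesis"]}

# Main run: "polymer" is in bad_words, as in the original setup.
main_index = build_index(general_docs, bad_words={"the", "polymer"})
# Separate run for the wanted documents, with a bad_words list without it.
extra_index = build_index(polymer_docs, bad_words={"the"})

combined = merge_indexes(main_index, extra_index)
print(sorted(combined["polymer"]))  # [3]
```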
-- 
Gilles R. Detillieux              E-mail: <firstname.lastname@example.org>
Spinal Cord Research Centre       WWW:    http://www.scrc.umanitoba.ca/~grdetil
Dept. Physiology, U. of Manitoba  Phone:  (204)789-3766
Winnipeg, MB  R3E 3J7  (Canada)   Fax:    (204)789-3930
This archive was generated by hypermail 2b28 : Wed Mar 01 2000 - 07:34:38 PST