Re: [htdig] 3.1.1: Does noindex_start, noindex_stop work?


Frank Richter (Frank.Richter@hrz.tu-chemnitz.de)
Fri, 26 Feb 1999 10:13:59 +0100 (MET)


> Yes, it works. As of release version 3.1.0 (or 3.1.1)

Hmm, it doesn't work for me...

> >I'm trying to use the noindex_start, noindex_stop options to eliminate
> >some HTML code from digging, but with no success. Does this really work?
> >I've setup a test page at http://www.tu-chemnitz.de/~fri/test/htdig.html
> >and tried to ignore words within [...]
> ><!--htdig-noindex--> ... but with no success.

This page contains:
<!--htdig-noindex-->
        htdig - don't dig this silly text!
<!--/htdig-noindex-->

"noindex_start" and "..._stop" aren't defined in htdig-build.conf
(<!--htdig-noindex--> should be default).

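To make the expected behaviour concrete, here is a minimal Python sketch of what a noindex filter should do with the test page. It is not htdig's code: strip_noindex and its parameters are hypothetical names, and the page text is reconstructed from the excerpt above. The markers are compared literally, so a marker in the page that differs from the configured one by a single character leaves the text in place. One thing worth checking (an assumption, not confirmed in this thread): the ht://Dig attribute reference appears to name the closing attribute noindex_end and to spell the default markers with an underscore (<!--htdig_noindex-->), so the spelling in the page and in the config should be compared against the documentation for the installed version.

```python
def strip_noindex(html,
                  start="<!--htdig-noindex-->",
                  end="<!--/htdig-noindex-->"):
    """Drop every section between start and end markers (markers included).

    Hypothetical helper for illustration only; marker strings are matched
    literally, exactly as passed in.
    """
    kept = []
    pos = 0
    while True:
        s = html.find(start, pos)
        if s == -1:
            # No further start marker: keep the remainder unchanged.
            kept.append(html[pos:])
            break
        kept.append(html[pos:s])
        e = html.find(end, s + len(start))
        if e == -1:
            # Unterminated section: ignore everything after the start marker.
            break
        pos = e + len(end)
    return "".join(kept)


# Test page body, reconstructed from the excerpt in this message.
page = """<!--htdig-noindex-->
        htdig - don't dig this silly text!
<!--/htdig-noindex-->
Does this work?"""

# Markers in the page match the configured ones: the section is removed.
stripped = strip_noindex(page)

# Configured markers use an underscore, the page uses a hyphen: nothing matches.
mismatch = strip_noindex(page,
                         start="<!--htdig_noindex-->",
                         end="<!--/htdig_noindex-->")
```

With matching markers, stripped keeps only "Does this work?"; with the underscore spelling configured, mismatch still contains "silly", which is the symptom described here.
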
htdigging:
htdig -vvvvvvvvv -i -l -t -s -c ../conf/htdig-build.conf
..
0:0:0:http://www.tu-chemnitz.de/~fri/test/htdig.html: Retrieval command
for http://www.tu-chemnitz.de/~fri/test/htdig.html: GET
/~fri/test/htdig.html HTTP/1.0
User-Agent: htdig/3.1.1 (webmaster@tu-chemnitz.de)
Host: www.tu-chemnitz.de

Header line: HTTP/1.1 200 OK
..
returnStatus = 0
Read 459 from document
Read a total of 459 bytes
Tag: HTML>, matched -1
Tag: HEAD>, matched -1
Tag: TITLE>, matched 0
word: FTP-Archive@52
Tag: /TITLE>, matched 1

title: FTP-Archive
..
Tag: /H1>, matched 10
word: htdig@758
word: don't@775
word: dig@788
word: this@797
word: silly@808
word: text!@821
word: Does@838
word: this@849
word: work@860
Tag: /BODY>, matched -1
Tag: /HTML>, matched -1
head: dummy dummy dummystyle dummystyle Willkommen auf der Testseite!
htdig - don't dig this silly text! Does this work?
 size = 459
pick: www.tu-chemnitz.de, # servers = 1
htdig: Run complete
htdig: 1 server seen:
htdig: www.tu-chemnitz.de:80 1 document
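
The trace already shows the problem before htmerge runs: the words from the noindex section appear as ordinary "word:" lines. A small Python sketch for pulling those lines out of a saved trace follows. TRACE is just the excerpt quoted above, trace_words is a hypothetical helper, and reading the number after @ as a position is an assumption based on how it reappears as the l: value in db.wordlist.

```python
import re

# Excerpt of the verbose htdig trace quoted above.
TRACE = """\
Tag: /H1>, matched 10
word: htdig@758
word: don't@775
word: dig@788
word: this@797
word: silly@808
word: text!@821
word: Does@838
word: this@849
word: work@860
Tag: /BODY>, matched -1
"""

# A "word:" line has the form "word: <token>@<number>".
WORD_LINE = re.compile(r"^word: (.+)@(\d+)$")


def trace_words(trace):
    """Return (token, number) pairs for every 'word:' line in the trace."""
    pairs = []
    for line in trace.splitlines():
        match = WORD_LINE.match(line.strip())
        if match:
            pairs.append((match.group(1), int(match.group(2))))
    return pairs


words = trace_words(TRACE)
# True if the parser handed the supposedly hidden word to the indexer.
silly_seen = any(token == "silly" for token, _ in words)
```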

Then merge:
% htmerge -vvvvvvvv -s -c $DIR/conf/htdig-build.conf

htmerge: Sorting...
htmerge: Merging...
htmerge: Total word count: 12
htmerge: Total documents: 1
htmerge: Total doc db size (in K): 0

db.wordlist contains 13 (not 12!) words - WITH the words inside
<!--htdig-noindex-->:
auf i:0 l:684 w:1580
dig i:0 l:788 w:212
does i:0 l:838 w:162
dont i:0 l:775 w:225
dummy i:0 l:302 w:1383 c:2
dummystyle i:0 l:368 w:1240 c:2
ftparchive i:0 l:52 w:94800
htdig i:0 l:758 w:242
silly i:0 l:808 w:192
testseite i:0 l:701 w:1495
text i:0 l:821 w:179
willkommen i:0 l:660 w:1700
work i:0 l:860 w:140
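
The same check can be scripted against the word list. The sketch below parses the listing above and reports which words from the noindex section ended up in it. parse_wordlist and normalise are hypothetical helpers; the "key:number" field layout and the normalisation (lower case, punctuation dropped) are assumptions inferred from this listing, not from the htdig sources.

```python
# The db.wordlist listing quoted above, one entry per line.
WORDLIST = """\
auf i:0 l:684 w:1580
dig i:0 l:788 w:212
does i:0 l:838 w:162
dont i:0 l:775 w:225
dummy i:0 l:302 w:1383 c:2
dummystyle i:0 l:368 w:1240 c:2
ftparchive i:0 l:52 w:94800
htdig i:0 l:758 w:242
silly i:0 l:808 w:192
testseite i:0 l:701 w:1495
text i:0 l:821 w:179
willkommen i:0 l:660 w:1700
work i:0 l:860 w:140
"""


def parse_wordlist(text):
    """Map each word to a dict of its 'key:number' fields."""
    entries = {}
    for line in text.splitlines():
        parts = line.split()
        if not parts:
            continue
        fields = {}
        for part in parts[1:]:
            key, _, value = part.partition(":")
            fields[key] = int(value)
        entries[parts[0]] = fields
    return entries


def normalise(token):
    """Assumed normalisation: lower case, keep only letters and digits."""
    return "".join(ch for ch in token.lower() if ch.isalnum())


entries = parse_wordlist(WORDLIST)

# Tokens that sit between the noindex markers on the test page.
hidden = ["htdig", "don't", "dig", "this", "silly", "text!"]
leaked = [normalise(t) for t in hidden if normalise(t) in entries]
```

On this listing, leaked comes out as htdig, dont, dig, silly and text; only "this" is absent from the word list.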

And htsearching for "silly" is successful:
http://www.tu-chemnitz.de/cgi-bin/htsearch?words=silly&method=or&format=builtin-long&config=htdig-test
 
So the words between <!--htdig-noindex--> and <!--/htdig-noindex--> are not left out of the index...

> >Any hints available?

- Frank

-- 
Email: Frank.Richter@hrz.tu-chemnitz.de  http://www.tu-chemnitz.de/~fri/
Work:  Computing Services,  Chemnitz University of Technology,  Germany



