Subject: [htdig] htmerge: Deleted, no excerpt problem
From: Andre Dalle (firstname.lastname@example.org)
Date: Thu May 18 2000 - 23:02:06 PDT
Chunks of our web site are failing to index due to being
dropped by htmerge.
I upgraded to ht://dig 3.1.5 but there is no change in behaviour
relating to this particular problem, although I do appreciate many
of the smart new features like local filesystem access!
I have checked the mailing list archives, and I am sure the usual
suspects are not at fault:
- robots.txt does not exclude the file (htdig should never have indexed
it in the first place if that were the case?)
- server_max_docs is not in use and is definitely not at fault
- no 'noindex' or robot meta-tag in the html files
- there are keyword/description tags as well as plenty of text to search
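The robots.txt and meta-tag checks in the list above could be repeated mechanically with something like the following. This is only a rough sketch in Python using the standard library, not anything shipped with ht://dig; the function names and the "htdig" user-agent string are assumptions made for illustration, and the robots.txt rules are evaluated by Python's parser, which may not match htdig's own interpretation in every detail.

```python
from html.parser import HTMLParser
from urllib.robotparser import RobotFileParser


class RobotsMetaFinder(HTMLParser):
    """Collect the directives found in <meta name="robots" content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.directives = []

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = {name.lower(): (value or "") for name, value in attrs}
        if attr.get("name", "").lower() == "robots":
            # content is a comma-separated list such as "noindex, nofollow"
            self.directives += [
                d.strip().lower() for d in attr.get("content", "").split(",")
            ]


def has_noindex(html_text):
    """Return True if the page carries a robots meta tag containing 'noindex'."""
    finder = RobotsMetaFinder()
    finder.feed(html_text)
    return "noindex" in finder.directives


def robots_allows(robots_txt, url, agent="htdig"):
    """Return True if the given robots.txt text permits `agent` to fetch `url`.

    The agent name "htdig" is an assumption; use whatever name the crawler
    is actually configured to send.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)
```

Both functions take text that has already been fetched, so they can be run against saved copies of /robots.txt and of the affected pages without touching the server.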
I am at a loss, though otherwise I am very pleased with ht://dig. I have included
a sample htdig/htmerge run on a small part of the website below, and I dearly
hope that someone can shed some light on my problem!
Note that I am also using large header/document limits. I don't think I'm
hitting any sort of configured limit at all; I've been through the documentation
and I can find no fault in my setup, which is basically the stock htdig.conf
with some of the default limits bumped up. I will attach the file just in case.
Feel free to GET http://www.ncf.ca/rapa/index.html.
I even removed /robots.txt for this run just to be sure.
Initial HTDIG run:
htdig# ./htdig -i -a -v -s
New server: www.ncf.ca, 80
1:1:0:http://www.ncf.ca/rapa/: ++++++** size = 5201
2:2:1:http://www.ncf.ca/rapa/RAPAHistory.html: ****** size = 43660
3:3:1:http://www.ncf.ca/rapa/PlayHist.html: ****** size = 8691
4:4:1:http://www.ncf.ca/rapa/Sponsors.html: ****** size = 3112
5:5:1:http://www.ncf.ca/rapa/SponsorInfo.html: *******- size = 3367
6:6:1:http://www.ncf.ca/rapa/Board.html: ****** size = 2421
7:7:1:http://www.ncf.ca/rapa/WhatsOn.html: ******- size = 2793
htdig: Run complete
htdig: 1 server seen:
htdig: www.ncf.ca:80 8 documents
htdig# ./htmerge -vvv -s -a
htmerge: Removing doc #0
htmerge: Total word count: 1995
Deleted, no excerpt: 0/http://www.ncf.ca/rapa
htmerge: Total documents: 7
htmerge: Total doc db size (in K): 67
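For a larger run, the dropped documents can be pulled out of the verbose htmerge output with a small filter. The following is a minimal sketch in Python; it assumes that every deletion is reported in the same "Deleted, <reason>: <doc id>/<URL>" shape as the single line shown above, which is inferred from this one log and not from the htmerge documentation. The function name is hypothetical.

```python
import re

# Pattern modelled on the one deletion line in the log above; the general
# shape "Deleted, <reason>: <doc id>/<URL>" is an assumption.
DELETED_LINE = re.compile(
    r"^Deleted, (?P<reason>[^:]+): (?P<doc_id>\d+)/(?P<url>\S+)$"
)


def deleted_documents(log_text):
    """Return (doc_id, url, reason) for each 'Deleted, ...' line in the log."""
    found = []
    for line in log_text.splitlines():
        match = DELETED_LINE.match(line.strip())
        if match:
            found.append(
                (int(match["doc_id"]), match["url"], match["reason"])
            )
    return found
```

Saving the output of `./htmerge -vvv -s -a` to a file and passing its contents to `deleted_documents` gives a list that can be compared against the URLs htdig reported while crawling.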
-- Andre Dalle [email@example.com] Systems Administrator, National Capital Freenet [http://www.ncf.ca]