Subject: [htdig] htmerge: Deleted, no excerpt problem
From: Andre Dalle (adalle@freenet.carleton.ca)
Date: Thu May 18 2000 - 23:02:06 PDT
Chunks of our web site are failing to index due to being
dropped by htmerge.
I upgraded to ht://dig 3.1.5, but the behaviour relating to this particular
problem has not changed, although I do appreciate many of the smart new
features like local filesystem access!
I have checked the mailing list archives and am sure the usual suggested
causes are not at fault:
- robots.txt does not exclude the file (and if it did, shouldn't htdig have
refused to index it in the first place?)
- server_max_docs is not in use and is definitely not at fault
- there is no 'noindex' or robots meta tag in the HTML files
- there are keyword/description tags as well as plenty of text to search
I am at a loss; otherwise I am very pleased with ht://dig. Below is a sample
htdig/htmerge run on a small part of the website, and I dearly hope that
someone can shed some light on my problem!
Note that I am also using large header/document limits, so I don't think I'm
hitting any configured limit. I've been through the documentation and can find
no fault in my setup, which is basically the stock htdig.conf with some of the
default limits raised. I will attach the file just in case.
Feel free to GET http://www.ncf.ca/rapa/index.html.
I even removed /robots.txt for this run, just to be sure.
Initial htdig run:
htdig# ./htdig -i -a -v -s
New server: www.ncf.ca, 80
0:0:0:http://www.ncf.ca/rapa: redirect
1:1:0:http://www.ncf.ca/rapa/: ++++++** size = 5201
2:2:1:http://www.ncf.ca/rapa/RAPAHistory.html: ****** size = 43660
3:3:1:http://www.ncf.ca/rapa/PlayHist.html: ****** size = 8691
4:4:1:http://www.ncf.ca/rapa/Sponsors.html: ****** size = 3112
5:5:1:http://www.ncf.ca/rapa/SponsorInfo.html: *******- size = 3367
6:6:1:http://www.ncf.ca/rapa/Board.html: ****** size = 2421
7:7:1:http://www.ncf.ca/rapa/WhatsOn.html: ******- size = 2793
htdig: Run complete
htdig: 1 server seen:
htdig: www.ncf.ca:80 8 documents
htdig# ./htmerge -vvv -s -a
htmerge: Sorting...
htmerge: Removing doc #0
htmerge: Merging...
htmerge: 100:association
htmerge: 200:box
htmerge: 300:churchs
htmerge: 400:critical
htmerge: 500:drama
htmerge: 600:faithfully
htmerge: 700:gathered
htmerge: 800:his
htmerge: 900:jaston
htmerge: 1000:lighted
htmerge: 1100:mears
htmerge: 1200:night
htmerge: 1300:peer
htmerge: 1400:public
htmerge: 1500:robinson
htmerge: 1600:shirts
htmerge: 1700:such
htmerge: 1800:totten
htmerge: 1900:waltons
htmerge: Total word count: 1995
Deleted, no excerpt: 0/http://www.ncf.ca/rapa
1/http://www.ncf.ca/rapa/
6/http://www.ncf.ca/rapa/Board.html
3/http://www.ncf.ca/rapa/PlayHist.html
2/http://www.ncf.ca/rapa/RAPAHistory.html
5/http://www.ncf.ca/rapa/SponsorInfo.html
4/http://www.ncf.ca/rapa/Sponsors.html
7/http://www.ncf.ca/rapa/WhatsOn.html
htmerge: Total documents: 7
htmerge: Total doc db size (in K): 67
-- Andre Dalle [adalle@ncf.ca] Systems Administrator, National Capital Freenet [http://www.ncf.ca]