[htdig] Question about non-absolute start_url


Wendy Phillips (wphillips@answertechnology.com)
Mon, 24 May 1999 10:47:50 +0100


I have indexed three separate URLs and do not understand the results. Could
someone please explain to me exactly what is indexed if the start_url is not
an absolute URL? Below are 3 excerpts from the results of indexing 3 URLs.
The hop count for all 3 was 0.
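
For reference, here is a minimal sketch of what the second configuration
might look like. The actual files are not reproduced in this message, so
the contents below are hypothetical. I am assuming that the hop count
mentioned above corresponds to the max_hop_count attribute, and that
start_url is written as it appears in the log output. Any other settings
are left out.

```
# nrdc.conf (hypothetical contents; the real file is not shown here)
# Assumption: start_url is given exactly as it appears in the log for case 2.
start_url:      http://www.nrdc.org/nrdc
# Assumption: "hop count 0" means max_hop_count is set to 0.
max_hop_count:  0
```

Under the same assumptions, the other two files would differ only in the
start_url value: http://www.nrdc.org/ for case 1 and
http://www.nrdc.org/nrdc/ for case 3.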

1. Index of www.nrdc.org
conf is ../db/www_info/nrdcroot.conf

New server: www.nrdc.org, 80
0:0:0:http://www.nrdc.org/: size = 5158
htdig: Run complete
htdig: 1 server seen:
htdig: www.nrdc.org:80 1 document
..

2. Index of www.nrdc.org/nrdc (note the redirect below)
conf is ../db/www_info/nrdc.conf

New server: www.nrdc.org, 80
0:0:0:http://www.nrdc.org/nrdc: redirect
1:1:-1:http://www.nrdc.org/nrdc/: ++++++++++****+++++++++++- size = 5158
..
27:27:-1:http://www.nrdc.org/nrdc/worldview/: htmerge: Sorting...
..

3. Index of www.nrdc.org/nrdc/ (note the trailing slash)
conf is ../db/www_info/nrdcx.conf

New server: www.nrdc.org, 80
0:0:0:http://www.nrdc.org/nrdc/: size = 5158
htdig: Run complete
htdig: 1 server seen:
htdig: www.nrdc.org:80 1 document

The search results from indexes 1 and 3 are identical. If I use my browser
(IE 4) to visit the 3 URLs I get the exact same page for all 3. I can use
the URL that gets me the results I want, but I want to understand it so I
can review the other 100 URLs I need to index!

Wendy Phillips
Answer Technology, Inc.
wphillips@answertechnology.com




This archive was generated by hypermail 2.0b3 on Mon May 24 1999 - 07:01:46 PDT