[htdig] Spidering .asp sites?


Subject: [htdig] Spidering .asp sites?
From: John Dispirito (JohnD@ugo.com)
Date: Wed Aug 02 2000 - 10:11:10 PDT


Hi,

I'm currently attempting to spider a site which uses .asp files, and it
seems like the spider isn't working properly. I know it's a fairly big
site, but most of the time the output lists at most 20-30 spidered
entries from the site, and then it just stops...

Anyone have any idea why?

Here is a sample of the log, which is edited. I understand that it says
that these files haven't changed, but this is all it's finding...
I took out the htmerge sort info too.

Thanks, John

=-=-=-=-==- Log Info follows =-=-=-=-=-=-=-=-=-=
New server: www.someurl.com, 80
0:0:0:http://www.someurl.com/: retrieved but not changed
1:5:1:http://www.someurl.com/?doc=manufacturer.asp?brand=Giro: retrieved but not changed
2:30:1:http://www.someurl.com/Manufacturer.asp?brand=Bianchi: retrieved but not changed
3:16:1:http://www.someurl.com/Manufacturer.asp?brand=Cannondale: retrieved but not changed
4:29:1:http://www.someurl.com/Manufacturer.asp?brand=HIND: retrieved but not changed
5:17:1:http://www.someurl.com/Manufacturer.asp?brand=Oobe: retrieved but not changed
6:26:1:http://www.someurl.com/Manufacturer.asp?brand=Royal%20Robbins: retrieved but not changed
7:28:1:http://www.someurl.com/Manufacturer.asp?brand=Smith: retrieved but not changed
8:15:1:http://www.someurl.com/Manufacturer.asp?brand=Tachikara: retrieved but not changed
9:27:1:http://www.someurl.com/Manufacturer.asp?brand=Timberland: retrieved but not changed
10:14:1:http://www.someurl.com/Manufacturer.asp?brand=Zoic: retrieved but not changed
11:2:1:http://www.someurl.com/affiliates/: retrieved but not changed
12:12:1:http://www.someurl.com/affiliates/affiliate_info.htm: not changed
13:24:1:http://www.someurl.com/affiliates/someurlfp.htm: not changed
14:25:1:http://www.someurl.com/affiliates/lsfp.htm: not changed
15:1:1:http://www.someurl.com/home.asp: retrieved but not changed
16:13:2:http://www.someurl.com/manufacturer.asp?brand=Giro: retrieved but not changed
17:18:1:http://www.someurl.com/manufacturer.asp?brand=HIND: retrieved but not changed
18:10:1:http://www.someurl.com/product.asp?p=1765: retrieved but not changed
19:21:1:http://www.someurl.com/product.asp?p=1936: retrieved but not changed
20:22:1:http://www.someurl.com/product.asp?p=2126: retrieved but not changed
21:7:1:http://www.someurl.com/product.asp?p=2444: retrieved but not changed
22:20:1:http://www.someurl.com/product.asp?p=3743: retrieved but not changed
23:8:1:http://www.someurl.com/product.asp?p=5482: retrieved but not changed
24:19:1:http://www.someurl.com/product.asp?p=6274: retrieved but not changed
25:23:1:http://www.someurl.com/product.asp?p=6510: retrieved but not changed
26:9:1:http://www.someurl.com/product.asp?p=6555: retrieved but not changed
27:6:1:http://www.someurl.com/product.asp?p=6791: retrieved but not changed
28:3:1:http://www.someurl.com/static/someurlnews.asp: retrieved but not changed
29:11:1:http://www.someurl.com/store.asp?s=1525: retrieved but not changed
htdig: Run complete
htdig: 1 server seen:
htdig: www.someurl.com:80 30 documents
**** MERGE SORT INFO DELETED ****

htmerge: Total word count: 2771
htmerge: 10
htmerge: 20
htmerge: 30

htmerge: Total documents: 30
htmerge: Total doc db size (in K): 744
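
A quick way to see how far the spider got is to tally the log entries by
depth and to look for URLs that differ only in letter case. The following
is a minimal Python sketch, not part of ht://Dig. It assumes each entry
sits on a single line in the form index:doc_id:hop_count:url: status (the
meaning of the three numeric fields is an assumption based on how the
values behave in the log above), and the helper names parse_log,
hop_histogram and case_variants are hypothetical.

```python
import re
from collections import Counter, defaultdict

# Assumed layout of one log entry: index:doc_id:hop_count:url: status
LINE_RE = re.compile(r"^(\d+):(\d+):(\d+):(\S+): (.+)$")


def parse_log(text):
    """Return (index, doc_id, hop_count, url, status) tuples for matching lines."""
    entries = []
    for line in text.splitlines():
        match = LINE_RE.match(line.strip())
        if match:
            index, doc_id, hops, url, status = match.groups()
            entries.append((int(index), int(doc_id), int(hops), url, status))
    return entries


def hop_histogram(entries):
    """Count how many entries were seen at each (assumed) hop count."""
    return Counter(hops for _, _, hops, _, _ in entries)


def case_variants(entries):
    """Group URLs that are identical once letter case is ignored."""
    groups = defaultdict(set)
    for _, _, _, url, _ in entries:
        groups[url.lower()].add(url)
    return [sorted(urls) for urls in groups.values() if len(urls) > 1]


# A few entries copied from the log above.
sample = """\
0:0:0:http://www.someurl.com/: retrieved but not changed
4:29:1:http://www.someurl.com/Manufacturer.asp?brand=HIND: retrieved but not changed
15:1:1:http://www.someurl.com/home.asp: retrieved but not changed
16:13:2:http://www.someurl.com/manufacturer.asp?brand=Giro: retrieved but not changed
17:18:1:http://www.someurl.com/manufacturer.asp?brand=HIND: retrieved but not changed
"""

entries = parse_log(sample)
print(hop_histogram(entries))
print(case_variants(entries))
```

If almost every entry lands at the same shallow depth, the spider is
probably not following links out of those pages. Any groups reported by
case_variants are pages fetched under two spellings; whether they are
really the same page depends on how the web server treats case.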




This archive was generated by hypermail 2b28 : Wed Aug 02 2000 - 00:10:32 PDT