[htdig] htdig 3.1.3 weirdness with cgi-params

Aaron Turner (aturner@linuxkb.org)
Sun, 24 Oct 1999 13:08:13 -0700 (PDT)

start_url: http://localhost/render.php3?op=1&oid=1&type=1&as=5
limit_urls_to: http://localhost/
exclude_urls: /cgi-bin/ .cgi .css
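
As a quick sanity check on the config above, here is a simplified model of the two filters. It assumes plain substring matching for both limit_urls_to and exclude_urls. That is an approximation of how htdig describes these attributes, not htdig's own code, and would_index is a hypothetical helper name.

```python
# Simplified model of the limit_urls_to / exclude_urls settings above.
# ASSUMPTION: both attributes behave as plain substring matches on the URL;
# this is an illustration, not htdig's implementation.
LIMIT_URLS_TO = ["http://localhost/"]
EXCLUDE_URLS = ["/cgi-bin/", ".cgi", ".css"]


def would_index(url: str) -> bool:
    """Hypothetical helper: True if url passes both filters under the assumption."""
    # Must contain at least one limit_urls_to pattern.
    if not any(pattern in url for pattern in LIMIT_URLS_TO):
        return False
    # Must not contain any exclude_urls pattern.
    return not any(pattern in url for pattern in EXCLUDE_URLS)


if __name__ == "__main__":
    for u in [
        "http://localhost/render.php3?op=1&oid=1&type=1&as=5",
        "http://localhost/cgi-bin/search",
        "http://localhost/style.css",
        "http://example.com/about.html",
    ]:
        print(would_index(u), u)
```

Under that assumption the render.php3 start URL passes both filters, so the exclusions by themselves would not account for the missing pages.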

So I run ./htdig -vsi

and I get:

New server: localhost, 80
0:0:0:http://localhost/render.php3?op=1&oid=1&type=1&as=5: -+++++--+***++ size = 9168
1:1:1:http://localhost/: +* size = 328
2:2:1:http://localhost/usertools/: not found
3:3:1:http://localhost/search/help.html: -- size = 4654
4:4:1:http://localhost/about.html: -*------------------- size = 11749
5:5:1:http://localhost/license.html: -+*--***----- size = 15721
6:6:1:http://localhost/kb/render.php3?op=1=1=1=AdvancedSearch: size = 620
7:7:1:http://localhost/kb/render.php3?op=1.3=3=3=5.1: size = 63
8:8:1:http://localhost/kb/render.php3?op=1.4=4=3=5.1: size = 63
9:9:2:http://localhost/render.php3?op=1=1=1=5: size = 62
10:10:2:http://localhost/attribution.html: --- size = 1340
htdig: Run complete
htdig: 1 server seen:
htdig: localhost:80 11 documents

htdig: Errors to take note of:
Not found: http://localhost/usertools/ Ref:

The error is fine, but what's really weird is all the URLs full of "="
signs where they make no sense. What I've figured out is that htdig is
dropping the CGI varnames from the URLs! It's also failing to index the
entire site (which I assume has to do with the missing varnames).
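
The mangled URLs in the log are consistent with every "&name" run being deleted from the query string. The sketch below uses a hypothetical strip_ampersand_names helper to reproduce the 9:9:2: entry from the start URL. It only models the symptom and makes no claim about what htdig does internally. It also shows, with Python's standard parse_qsl, how a server would read the mangled query: as a single op parameter with a garbage value.

```python
import re
from urllib.parse import parse_qsl, urlsplit


def strip_ampersand_names(url: str) -> str:
    """Hypothetical transformation: delete each '&' together with the
    parameter name that follows it (models the symptom, not htdig's code)."""
    return re.sub(r"&[A-Za-z]+", "", url)


original = "http://localhost/render.php3?op=1&oid=1&type=1&as=5"
mangled = strip_ampersand_names(original)

# Compare how a CGI-style parser sees the two query strings.
original_params = parse_qsl(urlsplit(original).query)
mangled_params = parse_qsl(urlsplit(mangled).query)

print(mangled)          # http://localhost/render.php3?op=1=1=1=5
print(original_params)  # [('op', '1'), ('oid', '1'), ('type', '1'), ('as', '5')]
print(mangled_params)   # [('op', '1=1=1=5')]
```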

The other odd thing is that the sizes htdig reports for these pages are
too small. Those pages aren't 62-63 bytes long; they're close to 10K bytes.

Thoughts anyone???

Aaron Turner, Core Developer       http://vodka.linuxkb.org/~aturner/
Linux Knowledge Base Organization  http://linuxkb.org/
Because world domination requires quality open documentation.


This archive was generated by hypermail 2.0b3 on Sun Oct 24 1999 - 14:16:36 PDT