Re: [htdig] htdig and cgi


Subject: Re: [htdig] htdig and cgi
From: Geoff Hutchison (ghutchis@wso.williams.edu)
Date: Wed Mar 29 2000 - 06:29:09 PST


At 11:26 AM +0200 3/29/00, Matthias Kleine - Patzschke + Rasp
Software AG wrote:
>Folders are converted in folder-links, much like in ftp-directories.
>Documents are converted in links - you can click the link (=filename)
>and the document is opened.
>
>Now for my problem:
>Up to now, only the folder-Links are found by the htdig search engine,
>and not the documents. What I don't understand is the mechanism, how
>the database is created. I suppose that this mechanism is getting into
>conflict with our cgi-structure.

Conceptually it's fairly easy. htdig builds the database by following
every link it finds, starting from the URL(s) listed in start_url.
It ignores links that are forbidden by robots.txt files or META robots
tags, links that match a pattern in exclude_urls, and links that don't
match a pattern in limit_urls_to.
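
As a rough illustration of those last two checks, here is a minimal
Python sketch. It is not htdig's actual code: the function name
link_decision, the example.com URLs, and the pattern values are all
made up for the example, and it assumes the patterns are compared as
plain substrings of the URL. The robots.txt and META robots checks are
left out.

```python
def link_decision(url, limit_urls_to, exclude_urls):
    """Return (accepted, reason) for one discovered link.

    Hypothetical helper for illustration only; assumes plain
    substring matching against the URL.
    """
    # A link must match at least one limit_urls_to pattern.
    if not any(pattern in url for pattern in limit_urls_to):
        return False, "not in limit_urls_to"
    # A link must not match any exclude_urls pattern.
    for pattern in exclude_urls:
        if pattern in url:
            return False, "matches exclude_urls pattern " + pattern
    return True, "accepted"


# Illustrative values, not taken from any real htdig.conf.
limit_urls_to = ["http://www.example.com/"]
exclude_urls = ["/cgi-bin/", ".cgi"]

for url in [
    "http://www.example.com/docs/index.html",
    "http://www.example.com/cgi-bin/view?doc=1",
    "http://other.example.org/page.html",
]:
    print(url, link_decision(url, limit_urls_to, exclude_urls))
```

If your documents are only reachable through CGI links, a pattern like
the ones above in your own exclude_urls setting would be one thing to
check, since it would reject those links while still accepting
ordinary ones.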

Of course, if you turn on some of the debugging flags (say, run htdig
-vvv), you'll see the reasons htdig is rejecting links.

--
-Geoff Hutchison
Williams Students Online
http://wso.williams.edu/



