Re: [htdig] website indexer and local file indexer


Subject: Re: [htdig] website indexer and local file indexer
From: Brian Lavender (brian@brie.com)
Date: Sun Feb 27 2000 - 16:29:07 PST


I wrote a simple Perl script that walks an HTML directory tree and
creates a web page showing a site map of all your pages. I then
modified it to insert a chunk of HTML into each page, modified it
again to change that chunk, and, fourthly, made a version that walks
the directory tree and checks for bad links. I did not comment the
code much, but it is written in a fairly straightforward style. This
would solve the problem of htdig possibly not reaching all your
pages: point it at the generated site map and every page is one link
away. And since htdig still fetches the pages over HTTP rather than
from the file system, it will not index pages you have restricted
with .htaccess, so it works well all the way around. The site-map
script is included below.

If you want to see a sample of the output, check
http://www.brie.com/site.html

To get the other scripts check
http://www.brie.com/coinduperl/
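
For reference, here is a rough sketch of how the bad-link pass
mentioned above could work. This is not the actual script from that
URL, just a minimal illustration under the same assumptions
($base_dir, files ending in .html). It uses a naive regex where a
real HTML parser would be more robust, and it only checks links that
resolve to local files.

#!/usr/bin/perl

# Sketch of a local bad-link checker. Adjust $base_dir to match
# your web server. Hypothetical, for illustration only.

use File::Find;
use File::Basename;
use strict;

my $base_dir = "/home/www/htdocs";

finddepth(\&check_links, $base_dir);

sub check_links {
  return unless /\.html$/;
  my $file = $File::Find::name;
  open (HTML, $file) or do { warn "cannot read $file: $!\n"; return };
  local $/;                 # slurp the whole file at once
  my $page = <HTML>;
  close (HTML);

  # Naive href extraction; skips absolute URLs (http:, mailto:, ...)
  # and fragment-only anchors.
  while ($page =~ /href\s*=\s*"([^"#]+)"/gi) {
    my $link = $1;
    next if $link =~ m{^[a-z]+:}i;
    my $target = $link =~ m{^/}
               ? "$base_dir$link"              # site-absolute path
               : dirname($file) . "/$link";    # relative to this page
    print "$file -> $link (missing)\n" unless -e $target;
  }
}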

brian

#!/usr/bin/perl

# Configuration Variables
# Change these variables to match your web server's configuration

use File::Find;
use strict;

my $base_url = "http://www.brie.com";
my $base_dir = "/home/www/htdocs";
my $i = 1;

# Refuse to clobber an existing site map.
if (-f "$base_dir/site.html") {
  die "$base_dir/site.html already exists.\nRemove or move it before running this.\n";
}
open (OUTFILE, ">$base_dir/site.html")
  or die "Cannot open $base_dir/site.html for writing: $!\n";

print OUTFILE << "__END__";
<html>
<head>
<title>Site Map</title>
</head>
<body>
<h1>$base_url Site Map</h1>
<PRE>
__END__

# Walk the document tree; wanted() is called once per file.
finddepth(\&wanted, $base_dir);

print OUTFILE << "__END__";
</PRE>
<P>
Originally developed by: <br>
<address>
<a href="mailto:brian\@brie.com">Brian Lavender</a>
</address>
</body>
</html>
__END__

close(OUTFILE);

# Called by finddepth() for every file under $base_dir. For each
# .html file it emits a numbered link, with a blank line after
# every fifth entry for readability.
sub wanted {

  if (/\.html$/) {
    my $temp = $File::Find::name;
    $temp =~ s/\Q$base_dir//;     # strip the local prefix, leaving the URL path
    printf OUTFILE ("%5d", $i);
    print OUTFILE qq{ <A HREF="}, $base_url, $temp, qq{">}, $temp, "</A>\n";
    if ($i % 5 == 0) { print OUTFILE "\n" };
    $i++;
  }

}
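
For a rough idea of what the generated <PRE> block looks like (with
hypothetical paths; the actual ordering follows finddepth's traversal
of the tree):

    1 <A HREF="http://www.brie.com/docs/intro.html">/docs/intro.html</A>
    2 <A HREF="http://www.brie.com/index.html">/index.html</A>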

On Sun, Feb 27, 2000 at 09:04:43AM -0600, Geoff Hutchison wrote:
> At 1:15 AM -0500 2/27/00, David McMahon wrote:
> >I would like the be able to just specify a
> >directory and have it recurse through looking
> >at every valid file type and index it. I don't see
> >that this is possible with htdig. Is this the case?
>
> This is correct. Ht://Dig follows links through documents, so unless
> you have some sort of directory index, it won't be able to do this.
> Of course, you can write a simple script and either use its output
> as an index, or simply point your start_url setting at it.
>
> The 3.2 branch of the code currently supports file:// URLs and it has
> been suggested that this code could generate a list of files for
> directory URLs, but no one has stepped forward to implement this
> feature. Help is, of course, always welcome.
>
> -Geoff Hutchison
> Williams Students Online
> http://wso.williams.edu/
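
To make the start_url idea above concrete: once the script has
written site.html, a one-line fragment of the htdig configuration
file (URL assumed to match wherever you publish the page) is all it
takes:

# htdig config fragment: start the dig at the generated site map
start_url:      http://www.brie.com/site.html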

-- 
Brian Lavender
http://www.brie.com/brian/
