Subject: [htdig] Pb indexing HTML with htdig 3.1.5
From: André LAGADEC (andre.lagadec@proto.education.gouv.fr)
Date: Tue Dec 05 2000 - 23:54:13 PST


I use htdig 3.1.5 on Red Hat Linux 5.0, and I want to index a new web
site. But when I run rundig, only one document gets indexed.

To see what it is doing, I ran rundig -vvvvvvv and got this output:
Header line: HTTP/1.1 200 OK
Header line: Server: Netscape-Enterprise/3.5.1C
Header line: Date: Wed, 06 Dec 2000 07:32:02 GMT
Header line: Content-type: text/html
Header line: Last-modified: Mon, 15 Nov 1999 10:45:01 GMT
Translated Mon, 15 Nov 1999 10:45:01 GMT to 1999-11-15 10:45:01 (99)
And converted to Mon, 15 Nov 1999 10:45:01
Header line: Content-length: 1258
Header line: Accept-ranges: bytes
Header line: Connection: close
Header line:
returnStatus = 0
Read 1258 from document
Read a total of 1258 bytes
Tag: html>, matched -1
 size = 1258
pick: x.y.z.t, # servers = 1
htdig: Run complete
htdig: 1 server seen:
htdig: x.y.z.t:8000 1 document

I think that htdig doesn't like the HTML code "<!--//" and "//-->":
it sees the beginning of the comment but not the end, and so it ignores
the rest of the page's HTML code.

Am I right? Any other ideas? What can I do?
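
One way to check this hypothesis outside htdig is a small script that
compares two ways of handling the comment. This is only a sketch, not
htdig's actual parser: the function names and the SAMPLE page are
hypothetical, built from the two comment markers and the frame file
names mentioned in this message. If the end of the comment is never
recognized, the frame links after it are lost, which would explain why
only one document is indexed.

```python
import re


def strip_comments_lenient(html):
    """Remove comments, ending each one at the first '-->' after '<!--'."""
    return re.sub(r"<!--.*?-->", "", html, flags=re.DOTALL)


def strip_comments_unterminated(html):
    """Simulate a parser that never finds the end of the first comment:
    everything from '<!--' onward is discarded."""
    start = html.find("<!--")
    return html if start == -1 else html[:start]


def frame_sources(html):
    """Return the src attribute of every <frame> tag (not <frameset>)."""
    return re.findall(r'<frame\b[^>]*\bsrc="([^"]+)"', html, flags=re.IGNORECASE)


# Hypothetical reduced page (an assumption, not the real site's HTML).
SAMPLE = '''<script language="JavaScript">
<!--//
var url="";
//-->
</script>
<frameset cols="155,*">
  <frame name="texte" src="defaultb.htm">
  <frame name="bas" src="basac.htm">
</frameset>'''

# Links visible when the comment is closed correctly.
print(frame_sources(strip_comments_lenient(SAMPLE)))
# Links visible when the comment swallows the rest of the page.
print(frame_sources(strip_comments_unterminated(SAMPLE)))
```

If the second list is empty while the first contains the frame pages,
the behaviour matches what I see in the htdig output.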

N.B.: The HTML code of the first page of the site is below this line.

<title>Accueil DIRECTION</title>
<base target="rtop">
<script language="JavaScript">
var url="";
var nom="";
var bName="";

function Ouvrir()
        bName = navigator.appName
        Version = navigator.appVersion
        Version = Version.substring(0,1)
        browserOK = ((Version >= 2))

        if (browserOK)
if (bName=="Netscape") msgWindow.focus();


<frameset framespacing="0" border="false" frameborder="0" cols="155,*">
  <frame name="gauche" scrolling="no" noresize target="haut_droite"
  marginwidth="0" marginheight="5">
  <frameset rows="*,45">
    <frame name="texte" target="bas_droite" src="defaultb.htm"
    marginwidth="0" marginheight="0" noresize>
    <frame name="bas" src="basac.htm" scrolling="no" marginwidth="7"
  <p>Cette page utilise des cadres, mais votre navigateur ne les prend
pas en charge.</p>

