Re: [htdig] Pb indexing HTML with htdig 3.1.5


Subject: Re: [htdig] Pb indexing HTML with htdig 3.1.5
From: André LAGADEC (andre.lagadec@proto.education.gouv.fr)
Date: Fri Dec 08 2000 - 12:28:13 PST


Hello,

Sorry, I made a mistake: in my htdig.conf I wrote
noindex_start: <SCRIPT>
noindex_stop: </SCRIPT>

and the correct setting is:
noindex_start: <SCRIPT>
noindex_end: </SCRIPT>
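
This kind of slip can be caught with a quick check of the configuration file. Below is a minimal sketch in Python, assuming the simple "name: value" line format shown above. The helper name find_unclosed_noindex is hypothetical and not part of htdig. It only reports a noindex_start that has no noindex_end alongside it.

```python
import re


def find_unclosed_noindex(conf_text):
    """Return True if noindex_start is set but noindex_end is not.

    Hypothetical helper, not part of htdig: it only looks at the
    attribute names at the start of each "name: value" line.
    """
    names = set()
    for line in conf_text.splitlines():
        line = line.strip()
        # Skip blank lines and comment lines.
        if not line or line.startswith("#"):
            continue
        match = re.match(r"([A-Za-z_]+)\s*:", line)
        if match:
            names.add(match.group(1))
    return "noindex_start" in names and "noindex_end" not in names


# The two variants from this message.
bad = "noindex_start: <SCRIPT>\nnoindex_stop: </SCRIPT>\n"
good = "noindex_start: <SCRIPT>\nnoindex_end: </SCRIPT>\n"

print(find_unclosed_noindex(bad))   # True
print(find_unclosed_noindex(good))  # False
```

With the misspelled attribute, nothing in my configuration marked the end of the skipped section, which matches htdig ignoring the rest of the page after the script tag.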

My apologies, and thanks for your help.

Gilles Detillieux wrote:
>
> According to André LAGADEC:
> > I use htdig 3.1.5 on Red Hat Linux 5.0, and I want to index a new web
> > site. But when I run rundig, I get only one document.
> >
> > So, to see what it is doing, I ran rundig -vvvvvvv and got this output:
> > Header line: HTTP/1.1 200 OK
> > Header line: Server: Netscape-Enterprise/3.5.1C
> > Header line: Date: Wed, 06 Dec 2000 07:32:02 GMT
> > Header line: Content-type: text/html
> > Header line: Last-modified: Mon, 15 Nov 1999 10:45:01 GMT
> > Translated Mon, 15 Nov 1999 10:45:01 GMT to 1999-11-15 10:45:01 (99)
> > And converted to Mon, 15 Nov 1999 10:45:01
> > Header line: Content-length: 1258
> > Header line: Accept-ranges: bytes
> > Header line: Connection: close
> > Header line:
> > returnStatus = 0
> > Read 1258 from document
> > Read a total of 1258 bytes
> > Tag: html>, matched -1
> > head:
> > size = 1258
> > pick: x.y.z.t, # servers = 1
> > htdig: Run complete
> > htdig: 1 server seen:
> > htdig: x.y.z.t:8000 1 document
>
> You should be getting much more output than that with a verbosity level of
> 7! Is it possible that there is a NUL byte in the document, soon after the
> "<html>" tag? For some reason, htdig seems to be stopping right after this
> tag, and not getting anywhere close to the other tags in the document. I've
> tried it myself on the document you sent, and on that copy it worked fine.
> The comment around the JavaScript code is correct, and htdig was able to
> handle it. There must be something different in your copy of the document,
> such as a NUL byte, which is causing htdig's parser to end prematurely.
>
> > I think that htdig doesn't like the HTML code "<!--//" and "//-->": it
> > sees the beginning of the comment but not the end, and ignores the rest
> > of the page's HTML code.
> >
> > Am I right? Any other ideas? What can I do?
> >
> > N.B.: The HTML code of the site's first page is below this line.
> > _________________________________________________________________
> > <html>
> >
> > <head>
> > <title>Accueil DIRECTION</title>
> > <base target="rtop">
> > <script language="JavaScript">
> > <!--//
> > var url="";
> > var nom="";
> > var bName="";
> >
> > function Ouvrir()
> > {
> > bName = navigator.appName
> > Version = navigator.appVersion
> > Version = Version.substring(0,1)
> > browserOK = ((Version >= 2))
> >
> > if (browserOK)
> > {
> > this.name="home";
> >
> > msgWindow=window.open("actu/default2.htm","popupdpd","location=no,toolbar=no,status=no,directories=no,scrollbars=yes,width=400,height=450");
> > bName=navigator.appName;
> > if (bName=="Netscape") msgWindow.focus();
> >
> > }
> > }
> > Ouvrir()
> >
> > //-->
> > </script>
> > </head>
> >
> > <frameset framespacing="0" border="false" frameborder="0" cols="155,*">
> > <frame name="gauche" scrolling="no" noresize target="haut_droite"
> > src="defaulta.htm"
> > marginwidth="0" marginheight="5">
> > <frameset rows="*,45">
> > <frame name="texte" target="bas_droite" src="defaultb.htm"
> > scrolling="auto"
> > marginwidth="0" marginheight="0" noresize>
> > <frame name="bas" src="basac.htm" scrolling="no" marginwidth="7"
> > marginheight="15"
> > noresize>
> > </frameset>
> > <noframes>
> > <body>
> > <p>Cette page utilise des cadres, mais votre navigateur ne les prend
> > pas en charge.</p>
> > </body>
> > </noframes>
> > </frameset>
> > </html>
>
> --
> Gilles R. Detillieux E-mail: <grdetil@scrc.umanitoba.ca>
> Spinal Cord Research Centre WWW: http://www.scrc.umanitoba.ca/~grdetil
> Dept. Physiology, U. of Manitoba Phone: (204)789-3766
> Winnipeg, MB R3E 3J7 (Canada) Fax: (204)789-3930
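
The NUL-byte hypothesis in the quoted reply can be checked directly on a saved copy of the page. Below is a minimal sketch in Python. The helper first_nul_offset and the sample byte strings are hypothetical illustrations, not part of htdig.

```python
def first_nul_offset(data: bytes) -> int:
    """Return the offset of the first NUL byte, or -1 if there is none."""
    return data.find(b"\x00")


# Hypothetical samples: a clean start of a page, and one with a NUL
# byte right after the "<html>" tag.
sample_clean = b"<html>\n<head>\n<title>Accueil DIRECTION</title>\n"
sample_bad = b"<html>\x00\n<head>\n"

print(first_nul_offset(sample_clean))  # -1
print(first_nul_offset(sample_bad))    # 6
```

For a real document, read the file in binary mode, for example open(path, "rb").read(), where path is wherever the fetched page was saved, and pass the result to first_nul_offset.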



