Re: htdig: Trouble indexing my site. Help!


Gilles Lorphelin (gilles@mana.pf)
Tue, 26 May 1998 08:54:14 +1000


Hi,

        Here are the details.

        Versions: Linux 2.0.32
                  libg++-2.7.2.8
                  htdig 3.0.8b2

        Here is my config file:

database_dir: /disk1/htdig/db

start_url: http://www.cyclone.pf/

limit_urls_to: ${start_url}

exclude_urls: /cgi-bin/ .cgi

max_head_length: 10000

search_algorithm: exact:1
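
To double-check what the `${start_url}` reference in this config should resolve to, here is a minimal Python sketch. It is not ht://Dig's own parser: the helper names `parse_config` and `expand` are hypothetical, and it assumes simple one-line `name: value` entries with a single pass of `${name}` substitution.

```python
import re

# The attributes from the config file above, copied verbatim.
CONFIG_TEXT = """\
database_dir: /disk1/htdig/db
start_url: http://www.cyclone.pf/
limit_urls_to: ${start_url}
exclude_urls: /cgi-bin/ .cgi
max_head_length: 10000
search_algorithm: exact:1
"""


def parse_config(text):
    """Parse 'name: value' lines into a dict (hypothetical helper).

    Blank lines and lines starting with '#' are skipped. Only the first
    colon separates name from value, so 'exact:1' stays intact.
    """
    attrs = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        name, _, value = line.partition(":")
        attrs[name.strip()] = value.strip()
    return attrs


def expand(attrs):
    """Replace ${name} with the value of attribute 'name' (single pass).

    Unknown names expand to an empty string; that choice is an assumption
    of this sketch, not documented ht://Dig behaviour.
    """
    def lookup(match):
        return attrs.get(match.group(1), "")

    return {k: re.sub(r"\$\{(\w+)\}", lookup, v) for k, v in attrs.items()}


config = expand(parse_config(CONFIG_TEXT))
print(config["limit_urls_to"])
print(config["exclude_urls"].split())
```

Under these assumptions `limit_urls_to` ends up identical to `start_url`, so the crawl should be restricted to URLs beginning with http://www.cyclone.pf/.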

        Here is the output of "htdig -i -v -v -v -v -v -s":

New server: www.cyclone.pf, 80
Retrieval command for http://www.cyclone.pf/robots.txt: GET /robots.txt HTTP/1.0
User-Agent: htdig/3.0.8b2 (andrew@contigo.com)
Host: www.cyclone.pf

Header line: HTTP/1.1 302 Moved Temporarily
Header line: Date: Tue, 26 May 1998 18:40:53 GMT
Header line: Server: Apache/1.2.4
Header line: Location: http://www.mana-online.pf/error.html
Header line: Connection: close
Header line: Content-Type: text/html
Header line:
returnStatus = 3
pick: www.cyclone.pf:80, # servers = 1
0:0:0:http://www.cyclone.pf/: Retrieval command for http://www.cyclone.pf/: GET / HTTP/1.0
User-Agent: htdig/3.0.8b2 (andrew@contigo.com)
Host: www.cyclone.pf

Header line: HTTP/1.1 200 OK
Header line: Date: Tue, 26 May 1998 18:40:53 GMT
Header line: Server: Apache/1.2.4
Header line: Last-Modified: Thu, 15 Jan 1998 19:02:26 GMT
Translated Thu, 15 Jan 1998 19:02:26 GMT to Thu, 15 Jan 1998 19:02:26 (98)
And converted to Thu, 15 Jan 1998 19:02:26
Header line: ETag: "1002-2c1-34be5d42"
Header line: Content-Length: 705
Header line: Accept-Ranges: bytes
Header line: Connection: close
Header line: Content-Type: text/html
Header line:
returnStatus = 0
Read 705 from document
Read a total of 705 bytes
Tag: HTML>, matched -1
Tag: HEAD>, matched -1
Tag: META NAME="creator" CONTENT="kristof@mana.pf">, matched 20

        
        And it gets stuck here.

        Any ideas?
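
To make the trace easier to read, here is a small Python sketch that pulls the status of each request out of the htdig debug output. The function name `summarize` is hypothetical, and it assumes each "Retrieval command" entry sits on one line (the mail client may have wrapped them). It only summarizes what the log already shows and does not establish why the parser stops.

```python
import re

# Relevant lines from the htdig output above, with wrapped lines rejoined.
LOG_TEXT = """\
New server: www.cyclone.pf, 80
Retrieval command for http://www.cyclone.pf/robots.txt: GET /robots.txt HTTP/1.0
Header line: HTTP/1.1 302 Moved Temporarily
Header line: Server: Apache/1.2.4
Header line: Location: http://www.mana-online.pf/error.html
Header line: Content-Type: text/html
returnStatus = 3
0:0:0:http://www.cyclone.pf/: Retrieval command for http://www.cyclone.pf/: GET / HTTP/1.0
Header line: HTTP/1.1 200 OK
Header line: Content-Length: 705
Header line: Content-Type: text/html
returnStatus = 0
Read a total of 705 bytes
"""


def summarize(log):
    """Return one dict per request: URL, HTTP status and Location header."""
    requests = []
    current = None
    for line in log.splitlines():
        # A new request starts at each "Retrieval command for <url>: GET".
        m = re.search(r"Retrieval command for (\S+): GET", line)
        if m:
            current = {"url": m.group(1), "status": None, "location": None}
            requests.append(current)
            continue
        if current is None:
            continue
        m = re.match(r"Header line: HTTP/\d\.\d (\d{3})", line)
        if m:
            current["status"] = int(m.group(1))
        m = re.match(r"Header line: Location: (\S+)", line)
        if m:
            current["location"] = m.group(1)
    return requests


summary = summarize(LOG_TEXT)
for entry in summary:
    print(entry["url"], entry["status"], entry["location"])
```

Run on this trace, it reports that the robots.txt request was answered with a 302 redirect to http://www.mana-online.pf/error.html, while the start URL itself returned 200 with 705 bytes.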

Mario Baetz wrote:
>
> Hi,
>
> Could you give more details about your problems? Otherwise
> one has to index all these sites to find out what's going wrong.
>
> Mario
>
> Gilles Lorphelin wrote:
>
> > Hi
> >
> > I just tried ht://Dig.
> > It compiles fine and runs well against the site htdig.sdsu.edu
> >
> > But it gets stuck while parsing my sites:
> > www.cyclone.pf
> > www.mana.pf
> > www.surf.pf
> > www.imagin.pf
> >
> > It seems to work with some sites:
> > www.yahoo.com
> > www.whitehouse.gov
> >
> > But not with:
> > www.france98.com
> > www.fnac.fr
> >
> > Has anyone experienced the same trouble?
> >
> > Does anyone know why I can't index my sites?
> > Is it due to invalid HTML?
> > Is it due to my Apache server?
> >

-- 
Gilles Lorphelin
Telecoms Mgr. - ISOC Member

MANA S.A. (www.mana.pf)
IAP/ISP - Tahiti & her Islands
Phone: +689 508 888
Fax: +689 508 889
E-mail: gilles@mana.pf


