htdig: Re: Remote traversal with WebGLIMPSE


Aaron Newsome (aaron.d.newsome@wdc.com)
Mon, 04 May 1998 13:29:48 -0700


I wouldn't exactly call these "clues", but here is the output of .wg_err anyway.

Cannot go from remote site pixie.wdc.com to remote site dragon.wdc.com:80...
skipping http://dragon.wdc.com:80/.
Cannot go from remote site pixie.wdc.com to remote site year2000.wdc.com:80...
skipping http://year2000.wdc.com:80/.
Cannot go from remote site pixie.wdc.com to remote site delta2.wdc.com:80...
skipping http://delta2.wdc.com:80/oraimp.
Cannot go from remote site pixie.wdc.com to remote site gatekeeper.wdc.com:80...
skipping http://gatekeeper.wdc.com:80/.
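In every entry above, pixie.wdc.com (the host the archive itself lives on) is described as a *remote* site. One possible reading, not confirmed anywhere in this thread, is that Webglimpse did not recognize its own host as local. On a longer log, tallying entries by source host makes that pattern easy to spot. Here is a minimal Python sketch. The two-line entry format is inferred from this one sample rather than from Webglimpse documentation, and parse_wg_err and count_by_source are hypothetical helper names.

```python
import re
from collections import Counter

# Two-line .wg_err entry, format inferred from the sample above (an assumption):
#   Cannot go from remote site SRC to remote site DST...
#   skipping URL.
ENTRY = re.compile(
    r"Cannot go from remote site (?P<src>\S+) to remote site (?P<dst>\S+)\.\.\.\s+"
    r"skipping (?P<url>\S+)\.[ \t]*$",
    re.MULTILINE,
)

def parse_wg_err(text):
    """Return a list of (source_host, dest_host, skipped_url) tuples."""
    return [(m["src"], m["dst"], m["url"]) for m in ENTRY.finditer(text)]

def count_by_source(entries):
    """Count skipped entries per source host."""
    return Counter(src for src, _, _ in entries)
```

If every entry shares the same source host and that host is the one named in urlpath, the problem is probably in how the local site is identified, not in retrieving the remote pages.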

Thanks for trying, Golda, but are you the only person on this list?

Thanks again, Aaron

Golda Bernstein wrote:

> traverse_type 1 is correct, and from the output Webglimpse did try to
> retrieve your remote links. For some reason the retrieval failed. You
> might want to check the .wg_err file for clues.
>
> --Golda
>
> At 12:35 PM 5/4/98 -0700, Aaron Newsome wrote:
> >I did run confarc, and I did answer yes to "Traverse Remote Links". I also
> >set the explicit option to no. I actually experimented with *all*
> >combinations of options for those two parameters. None would index the remote
> >files.
> >
> >I have a very simple setup so I'll include the files and the output in this
> >email.
> >
> >archive.cfg:
> >=========
> >title Western Digital Global Search
> >urlpath http://pixie.wdc.com/index
> >traverse_type 1
> >explicit_only 0
> >numhops 5
> >nhhops 3
> >local_limit 99999
> >remote_limit 10000
> >addboxes 0
> >vhost default
> >usemaxmem 0
> >urllist http://pixie.wdc.com/index/index.html
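To check which values the indexer will actually see, you can read the file back. This is a minimal Python sketch. It assumes each line is a directive name, then whitespace, then a value, as in the file above. parse_archive_cfg and archive_host are hypothetical helpers. Treating the host in urlpath as the "local" site is an assumption, not documented Webglimpse behavior.

```python
from urllib.parse import urlsplit

def parse_archive_cfg(text):
    """Return {directive: value} for 'directive value' lines; blank lines are skipped."""
    settings = {}
    for raw in text.splitlines():
        parts = raw.split(None, 1)  # directive name, then the rest of the line
        if not parts:
            continue
        settings[parts[0]] = parts[1].strip() if len(parts) > 1 else ""
    return settings

def archive_host(settings):
    """Host name from the urlpath directive (assumed here to be the local site)."""
    return urlsplit(settings["urlpath"]).hostname
```

For the file above, this would report traverse_type as "1", explicit_only as "0", and pixie.wdc.com as the archive host.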
> >
> >http://pixie.wdc.com/index/index.html:
> >===========================
> ><HTML>
> ><HEAD>
> ></HEAD>
> ><BODY>
> ><A HREF="http://gatekeeper.wdc.com">http://gatekeeper.wdc.com</A>
> ><A HREF="http://year2000.wdc.com">http://year2000.wdc.com</A>
> ><A HREF="http://delta2.wdc.com/oraimp">http://delta2.wdc.com/oraimp</A>
> ><A HREF="http://dragon.wdc.com">http://dragon.wdc.com</A>
> ></BODY>
> ></HTML>
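wgreindex reports these links in a normalized form, with an explicit :80 port and a "/" path when none is given (for example http://gatekeeper.wdc.com:80/). The Python sketch below reproduces that normalization and the local/remote split for a page like this one. It illustrates the idea and is not Webglimpse's own logic. It assumes plain http (default port 80), and LinkCollector, normalize and classify are hypothetical names.

```python
from html.parser import HTMLParser
from urllib.parse import urlsplit, urlunsplit

class LinkCollector(HTMLParser):
    """Collect href values from <a> tags."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names, so <A HREF=...> matches too.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

def normalize(url):
    """Add an explicit port (80 assumed, i.e. plain http) and a '/' path if missing."""
    parts = urlsplit(url)
    port = parts.port or 80
    path = parts.path or "/"
    return urlunsplit((parts.scheme, f"{parts.hostname}:{port}", path, parts.query, ""))

def classify(html, local_host):
    """Return (normalized_url, is_remote) for every link in the page."""
    collector = LinkCollector()
    collector.feed(html)
    return [(normalize(u), urlsplit(u).hostname != local_host) for u in collector.links]
```

With local_host set to pixie.wdc.com, all four links on this page come out as remote. That matches the "is remote" lines in the wgreindex output.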
> >
> >Output of wgreindex:
> >===============
> >pixie:/usr/local/apache/html/index# wgreindex
> >No search boxes used
> >Getting remote links, 5 hops...
> >Neighborhood will be 3 hops.
> >Traversing 5 hops...
> >Url http://dragon.wdc.com:80/ is remote...
> >Getting remote url: http://dragon.wdc.com:80/
> >Url http://year2000.wdc.com:80/ is remote...
> >Getting remote url: http://year2000.wdc.com:80/
> >Url http://delta2.wdc.com:80/oraimp is remote...
> >Getting remote url: http://delta2.wdc.com:80/oraimp
> >Url http://gatekeeper.wdc.com:80/ is remote...
> >Getting remote url: http://gatekeeper.wdc.com:80/
> >No more links to traverse.
> >
> >
> >------------------------------------------------------
> >Collected 1 local pages and 0 remote pages.
> >------------------------------------------------------
> >
> >Creating neighborhood for /usr/local/apache/html/index/index.html.
> >No search boxes used
> >
> >This is glimpseindex version 4.1, 1997.
> >
> >Indexing "/usr/local/apache/html/index/index.html
> >http://pixie.wdc.com/index/index.html" ...
> >
> >Size of files being indexed = 508 B, Total #of files = 1
> >
> >Index-directory: "/usr/local/apache/html/index"
> >Glimpse-files created here:
> >-rw-r--r-- 1 root root 91 May 4 12:29 .glimpse_filehash
> >-rw-r--r-- 1 root root 262144 May 4 12:29 .glimpse_filehash_index
> >-rw-r--r-- 1 root root 89 May 4 12:29 .glimpse_filenames
> >-rw-r--r-- 1 root root 4 May 4 12:29 .glimpse_filenames_index
> >-rw-r--r-- 1 root root 4 May 4 12:29 .glimpse_filetimes
> >-rw-r--r-- 1 root root 4 May 4 12:29 .glimpse_filetimes.index
> >-rw-r--r-- 1 root root 175 May 3 17:17 .glimpse_filters
> >-rw------- 1 root root 306 May 4 12:29 .glimpse_index
> >-rw-r--r-- 1 root root 116 May 4 12:29 .glimpse_messages
> >-rw------- 1 root root 58 May 4 12:29 .glimpse_partitions
> >-rw-r--r-- 1 root root 1353 May 4 12:29 .glimpse_statistics
> >-rw-r--r-- 1 root root 262144 May 4 12:29 .glimpse_turbo
> >Zero sized output for: /usr/local/apache/html/index/.nh.index.html
> >hash_misses=0 num_input_filenames=1
> >pixie:/usr/local/apache/html/index#
> >
> >Does anybody have a clue what is going on here? I think I have fought with
> >this too much.
> >
> >No disrespect to the authors, but I have read and re-read all of the docs and
> >still cannot figure this out. I did, however, download and compile ht://Dig.
> >It worked perfectly the first time (on all the remote files), and it has two
> >advantages over webglimpse:
> >
> >* It's free <- This is a big one
> >* It works <- equally important
> >
> >For now ht://Dig meets my needs, but I would still like to understand
> >how to make webglimpse work. I may want to run it at home or something.
> >
> >Thanks for all your help.
> >
> >Golda Bernstein wrote:
> >
> >> At 05:54 PM 5/3/98 -0700, Aaron Newsome wrote:
> >> >I have tried every combination of archive.cfg directives I can think of.
> >> >When I try to archive remote sites I get a message that says:
> >> >
> >> >Skipping non-local url:
> >> >
> >> >Is there any way to make webglimpse index non-local URLs?
> >> >
> >> >Thanks,
> >> >Aaron Newsome
> >> >aaron.d.newsome@wdc.com
> >>
> >> Yes, see the sample archive.cfg file at
> >>
> >> http://tucson.com/webglimpse/sample.archive.cfg
> >>
> >> for more complete docs on what each setting does. When you run confarc it
> >> should also prompt you for whether to index remote pages, and you should
> >> answer Y to get the right archive.cfg setting put in automatically.
> >>
> >> If you're having trouble running confarc on your machine, you may want to
> >> try the latest beta (1.6b2). You can download it from
> >> http://tucson.com/webglimpse.
> >>
> >> --Golda
> >>
> >> ------------------------------------------------------------------
> >> Golda Bernstein mailto:gberns@tucson.com Ph. (520) 620-6878
> >> Internet WorkShop http://tucson.com FAX (520) 620-6841
> >




This archive was generated by hypermail 2.0b3 on Sat Jan 02 1999 - 16:26:16 PST