htdig: Re: Remote traversal with WebGLIMPSE


Golda Bernstein (gberns@tucson.com)
Mon, 04 May 1998 12:07:00 -0700


traverse_type 1 is correct. The output shows that Webglimpse did try to
retrieve all four of your remote links, but every retrieval failed ("0 remote
pages" collected). You might want to check the .wg_err file for clues.
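
One way to separate a network problem from a Webglimpse problem is to try
fetching the same URLs by hand from the machine that runs wgreindex. Below is
a minimal Python sketch of that check. It is not part of Webglimpse; probe and
report are hypothetical helper names, and the URL list is just the four links
from your index.html.

```python
import urllib.error
import urllib.request


def probe(url, timeout=10.0, opener=urllib.request.urlopen):
    """Try to GET one URL; return ("ok", status) or ("error", reason).

    Hypothetical diagnostic helper, not a Webglimpse API. `opener` is
    injectable so the function can be exercised without a network.
    """
    try:
        with opener(url, timeout=timeout) as resp:
            return ("ok", resp.status)
    except urllib.error.HTTPError as exc:
        # The server answered, but with an error status (403, 404, 500, ...).
        return ("error", f"HTTP {exc.code}")
    except OSError as exc:
        # URLError (DNS failure, refused connection) and timeouts land here.
        return ("error", str(getattr(exc, "reason", exc)))


def report(urls, timeout=10.0):
    """Print one line per URL with the outcome of probe()."""
    for url in urls:
        outcome, detail = probe(url, timeout=timeout)
        print(f"{outcome:5} {url}  {detail}")


# The four remote links from the index.html quoted below.
REMOTE_URLS = [
    "http://gatekeeper.wdc.com/",
    "http://year2000.wdc.com/",
    "http://delta2.wdc.com/oraimp",
    "http://dragon.wdc.com/",
]
```

Running report(REMOTE_URLS) on pixie should show whether the hosts are
reachable from there at all. If these fetches fail too, the cause is more
likely DNS, a proxy, or a firewall than anything in archive.cfg.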

--Golda

At 12:35 PM 5/4/98 -0700, Aaron Newsome wrote:
>I did run confarc, and I did answer yes to "Traverse Remote Links". I also
>set the explicit option to no. I actually experimented with *all*
>combinations of options for those two parameters. None of them would index
>the remote files.
>
>I have a very simple setup so I'll include the files and the output in this
>email.
>
>archive.cfg:
>=========
>title Western Digital Global Search
>urlpath http://pixie.wdc.com/index
>traverse_type 1
>explicit_only 0
>numhops 5
>nhhops 3
>local_limit 99999
>remote_limit 10000
>addboxes 0
>vhost default
>usemaxmem 0
>urllist http://pixie.wdc.com/index/index.html
>
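
The archive.cfg above is one directive per line: a keyword, whitespace, then
the value. To double-check which settings a given archive.cfg really contains,
here is a minimal Python sketch that reads that layout. It assumes every
non-blank line is a "keyword value" pair exactly as shown (the real Webglimpse
reader may accept more), and parse_cfg and remote_traversal_enabled are
hypothetical names, not Webglimpse functions.

```python
def parse_cfg(text):
    """Parse archive.cfg-style 'keyword value' lines into a dict of strings.

    Assumption: one directive per line, keyword first, and the value is the
    rest of the line (so a title containing spaces stays intact).
    """
    cfg = {}
    for raw in text.splitlines():
        fields = raw.split(None, 1)
        if not fields:
            continue  # skip blank lines
        cfg[fields[0]] = fields[1].strip() if len(fields) > 1 else ""
    return cfg


def remote_traversal_enabled(cfg):
    # traverse_type 1 is the value confirmed in this thread for following
    # remote links; any other value is treated as "not enabled" here.
    return cfg.get("traverse_type") == "1"


SAMPLE = """\
title Western Digital Global Search
urlpath http://pixie.wdc.com/index
traverse_type 1
explicit_only 0
numhops 5
nhhops 3
local_limit 99999
remote_limit 10000
addboxes 0
vhost default
usemaxmem 0
urllist http://pixie.wdc.com/index/index.html
"""
```

For this file remote_traversal_enabled(parse_cfg(SAMPLE)) is True, which is
consistent with the traversal setting not being what stopped the remote pages.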
>http://pixie.wdc.com/index/index.html:
>===========================
><HTML>
><HEAD>
></HEAD>
><BODY>
><A HREF="http://gatekeeper.wdc.com">http://gatekeeper.wdc.com</A>
><A HREF="http://year2000.wdc.com">http://year2000.wdc.com</A>
><A HREF="http://delta2.wdc.com/oraimp">http://delta2.wdc.com/oraimp</A>
><A HREF="http://dragon.wdc.com">http://dragon.wdc.com</A>
></BODY>
></HTML>
>
>Output of wgreindex:
>===============
>pixie:/usr/local/apache/html/index# wgreindex
>No search boxes used
>Getting remote links, 5 hops...
>Neighborhood will be 3 hops.
>Traversing 5 hops...
>Url http://dragon.wdc.com:80/ is remote...
>Getting remote url: http://dragon.wdc.com:80/
>Url http://year2000.wdc.com:80/ is remote...
>Getting remote url: http://year2000.wdc.com:80/
>Url http://delta2.wdc.com:80/oraimp is remote...
>Getting remote url: http://delta2.wdc.com:80/oraimp
>Url http://gatekeeper.wdc.com:80/ is remote...
>Getting remote url: http://gatekeeper.wdc.com:80/
>No more links to traverse.
>
>
>------------------------------------------------------
>Collected 1 local pages and 0 remote pages.
>------------------------------------------------------
>
>Creating neighborhood for /usr/local/apache/html/index/index.html.
>No search boxes used
>
>This is glimpseindex version 4.1, 1997.
>
>Indexing "/usr/local/apache/html/index/index.html
>http://pixie.wdc.com/index/index.html" ...
>
>Size of files being indexed = 508 B, Total #of files = 1
>
>Index-directory: "/usr/local/apache/html/index"
>Glimpse-files created here:
>-rw-r--r-- 1 root root 91 May 4 12:29 .glimpse_filehash
>-rw-r--r-- 1 root root 262144 May 4 12:29 .glimpse_filehash_index
>-rw-r--r-- 1 root root 89 May 4 12:29 .glimpse_filenames
>-rw-r--r-- 1 root root 4 May 4 12:29 .glimpse_filenames_index
>-rw-r--r-- 1 root root 4 May 4 12:29 .glimpse_filetimes
>-rw-r--r-- 1 root root 4 May 4 12:29 .glimpse_filetimes.index
>-rw-r--r-- 1 root root 175 May 3 17:17 .glimpse_filters
>-rw------- 1 root root 306 May 4 12:29 .glimpse_index
>-rw-r--r-- 1 root root 116 May 4 12:29 .glimpse_messages
>-rw------- 1 root root 58 May 4 12:29 .glimpse_partitions
>-rw-r--r-- 1 root root 1353 May 4 12:29 .glimpse_statistics
>-rw-r--r-- 1 root root 262144 May 4 12:29 .glimpse_turbo
>Zero sized output for: /usr/local/apache/html/index/.nh.index.html
>hash_misses=0 num_input_filenames=1
>pixie:/usr/local/apache/html/index#
>
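
The log above marks each link "is remote" after rewriting it with an explicit
:80 port and a trailing slash. Here is a rough Python sketch of that kind of
local/remote test. It assumes (this is not taken from the Webglimpse source)
that a URL counts as local only when it falls under the urlpath from
archive.cfg; normalize and is_local are hypothetical names.

```python
from urllib.parse import urlsplit, urlunsplit


def normalize(url):
    """Spell out the default port and an empty path, as the wgreindex log does."""
    parts = urlsplit(url)
    host = parts.hostname or ""
    port = parts.port or (443 if parts.scheme == "https" else 80)
    path = parts.path or "/"
    # The fragment is dropped; it never changes which document is fetched.
    return urlunsplit((parts.scheme, f"{host}:{port}", path, parts.query, ""))


def is_local(url, urlpath):
    """Assumed rule: local only if the URL lives under the archive's urlpath."""
    base = normalize(urlpath).rstrip("/") + "/"
    return normalize(url).startswith(base)
```

With urlpath http://pixie.wdc.com/index, is_local("http://dragon.wdc.com",
...) is False, so under this rule all four links take the remote-fetch path,
and that fetch is the step where they failed.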
>Does anybody have a clue what is going on here? I think I have fought with
>this too much.
>
>No disrespect to the authors, but I have read and re-read all of the docs and
>still cannot figure this out. I did, however, download and compile ht://Dig.
>It worked perfectly the first time (on all the remote files), and it has two
>advantages over Webglimpse:
>
>* It's free <- This is a big one
>* It works <- equally important
>
>For now ht://Dig meets my needs, but I would still like to understand
>how to make Webglimpse work. I may want to run it at home or something.
>
>Thanks for all your help.
>
>Golda Bernstein wrote:
>
>> At 05:54 PM 5/3/98 -0700, Aaron Newsome wrote:
>> >I have tried every combination of archive.cfg directives I can think of.
>> >When I try to archive remote sites I get a message that says:
>> >
>> >Skipping non-local url:
>> >
>> >Is there any way to make Webglimpse index non-local URLs?
>> >
>> >Thanks,
>> >Aaron Newsome
>> >aaron.d.newsome@wdc.com
>>
>> Yes, see the sample archive.cfg file at
>>
>> http://tucson.com/webglimpse/sample.archive.cfg
>>
>> for more complete docs on what each setting does. When you run confarc it
>> should also prompt you for whether to index remote pages, and you should
>> answer Y to get the right archive.cfg setting put in automatically.
>>
>> If you're having trouble running confarc on your machine, you may want to
>> try the latest beta (1.6b2). You can download it from
>> http://tucson.com/webglimpse.
>>
>> --Golda
>>
>> ------------------------------------------------------------------
>> Golda Bernstein mailto:gberns@tucson.com Ph. (520) 620-6878
>> Internet WorkShop http://tucson.com FAX (520) 620-6841



This archive was generated by hypermail 2.0b3 on Sat Jan 02 1999 - 16:26:15 PST