Re: [htdig] Pages still not all indexing.


Subject: Re: [htdig] Pages still not all indexing.
From: Karen Reardon (karen.reardon@yale.edu)
Date: Tue Sep 05 2000 - 08:21:12 PDT


If I take out the extra '0' that I added in the max_doc_size, I do not get
the segmentation fault.... but I also do not get the entire page...

-kmr
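
A rough sketch of the byte arithmetic discussed in the quoted messages below, assuming max_doc_size is a limit in bytes and that a larger page is simply cut off at that limit. The function name and the page size are made up for illustration (the thread only says the page is "a little over 1MB"); this is not htdig code.

```python
def indexed_fraction(doc_bytes: int, max_doc_size: int) -> float:
    """Return the share of a document that fits under a byte limit (1.0 = all of it)."""
    if doc_bytes <= 0:
        # An empty document has nothing to truncate.
        return 1.0
    return min(doc_bytes, max_doc_size) / doc_bytes


if __name__ == "__main__":
    # Hypothetical size standing in for "a little over 1MB".
    page_bytes = 1_100_000
    for limit in (500_000, 5_000_000):
        share = indexed_fraction(page_bytes, limit)
        print(f"max_doc_size={limit}: {share:.0%} of the page fits")
```

Under these assumptions a limit of about 500K covers less than half of a 1MB page, while a limit of about 5MB covers all of it.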

At 05:09 PM 9/5/00 +0200, J. op den Brouw wrote:

>Segmentation fault means that the digger (htdig) crashed
>due to illegal address references (e.g. it points to
>an address outside its range).
>
>This is not good.
>
>I'll bet it happened right in the 'J' page. htmerge
>continues, but it cannot 'merge' more than htdig
>put in the database, so you'll miss pages.
>
>Are you running 3.1.5?
>
>Is there a start point that we can use to index
>your site? Maybe we will have a crash too (not good),
>or maybe we can index your site without problems (also
>not good, because you can't).
>
>On Tue, 5 Sep 2000, Karen Reardon wrote:
>
> > I upped the max_doc_size to 50000000 and got this response:
> >
> > ./rundig_ejournals[38]: 15102 Segmentation fault(coredump)
> > htmerge: Total word count: 3102
> > htmerge: Total documents: 37
> > htmerge: Total doc db size (in K): 2451
> >
> > I can't find a reference to 'Segmentation fault' in the documentation on
> > the web site...? Also, I still do not get any hits on Journals further
> > down on the page, so this did not work...
> > Should I run it with a -i to make the database from scratch?
> >
> > -karen reardon
> >
> >
> > At 08:47 AM 9/3/00 -0500, Geoff Hutchison wrote:
> > >At 8:14 AM -0400 9/3/00, Karen Reardon wrote:
> > >>I cannot get the entire pages of the large pages to index.. for example,
> > >>in the letter 'J' (the largest page), I have only about 1/3 of the page in
> > >>the index. I have changed max_doc_size, up to 5000000 and I still don't
> > >>get it. The page is a little over 1MB in page information in Netscape.
> > >>Is there a parameter on how long HtDig will wait for a page to load? (I
> > >>can't find one.)
> > >
> > >Looking at the description right now, it doesn't mention that the number
> > >is in bytes. So if it's really over 1MB in size, then you're only pulling
> > >in ~500K (actually a bit less). Try upping it again (say to ~5MB by adding
> > >another zero).
> > >
> > >--
> > >-Geoff Hutchison
> > >Williams Students Online
> > >http://wso.williams.edu/
> > >
> > >------------------------------------
> > >To unsubscribe from the htdig mailing list, send a message to
> > >htdig-unsubscribe@htdig.org
> > >You will receive a message to confirm this.
> > >List archives: <http://www.htdig.org/mail/menu.html>
> > >FAQ: <http://www.htdig.org/FAQ.html>
> >
>
>--jesse
>--------------------------------------------------------------------
>J. op den Brouw Johanna Westerdijkplein 75
>Haagse Hogeschool 2521 EN DEN HAAG
>Faculty of Engineering Netherlands
>Electrical Engineering +31 70 4458936
>-------------------- J.E.J.opdenBrouw@st.hhs.nl --------------------
>
>Linux - because reboots are for hardware changes
>


This archive was generated by hypermail 2b28 : Tue Sep 05 2000 - 08:22:12 PDT