Re: [htdig] Pages still not all indexing.


Subject: Re: [htdig] Pages still not all indexing.
From: Karen Reardon (karen.reardon@yale.edu)
Date: Tue Sep 05 2000 - 08:15:57 PDT


If I understand what HtDig should do, you can start it from here:

http://www.library.yale.edu/pubstation/alphalist.html

or...
http://130.132.146.110/ejournals2/ejournals.asp?wheretogo=J

which is the J's page. The ASP pages are served from 130.132.146.110, which
is an NT 4.0 box; www.library.yale.edu, where HtDig runs, is an AIX box.
Any help really, really appreciated!
If you then search the pages for 'Journal of the American Chemical Society',
you get no results, and 'American Chemical' gets no results either. Also,
under 'R' you can't get titles lower down the page, like ones that begin
with 'Russian'.

-karen reardon
yale university library
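
For reference, the symptom above (titles lower down a page never turning up in searches) is what you would expect if only the first part of each large page reaches the index. Below is a minimal sketch of that arithmetic, assuming the digger simply cuts a document off at max_doc_size bytes; the function name indexed_fraction and the 1 MB page size are illustrative assumptions, not measurements of the real pages.

```python
def indexed_fraction(page_bytes: int, max_doc_size: int) -> float:
    """Return the fraction of a page that fits within max_doc_size.

    Both arguments are byte counts. This assumes simple truncation at
    max_doc_size, which is an assumption of this sketch.
    """
    if page_bytes <= 0:
        raise ValueError("page_bytes must be positive")
    return min(page_bytes, max_doc_size) / page_bytes


if __name__ == "__main__":
    # Hypothetical figures: a page of about 1 MB against a 500000-byte limit.
    page_size = 1_048_576
    limit = 500_000
    print(f"about {indexed_fraction(page_size, limit):.0%} of the page is read")
```

If the fraction comes out below 1.0 for a given page, anything past the cutoff cannot be found by a search, however the rest of the configuration is set.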

At 05:09 PM 9/5/00 +0200, J. op den Brouw wrote:

>Segmentation fault means that the digger (htdig) crashed
>due to illegal address references (e.g. it points to
>an address outside its range).
>
>This is not good.
>
>I'll bet it happened right in the 'J' page. htmerge
>continues, but it cannot 'merge' more than htdig
>put in the database, so you'll miss pages.
>
>Are you running 3.1.5?
>
>Is there a start point that we can use to index
>your site? Maybe we will have a crash too (not good),
>or maybe we can index your site without problems (also
>not good, because you can't).
>
>On Tue, 5 Sep 2000, Karen Reardon wrote:
>
> > I upped the max_doc_size to 50000000 and got this response:
> >
> > ./rundig_ejournals[38]: 15102 Segmentation fault(coredump)
> > htmerge: Total word count: 3102
> > htmerge: Total documents: 37
> > htmerge: Total doc db size (in K): 2451
> >
> > I can't find a reference to 'Segmentation fault' in the documentation on
> > the web site...? Also, I still do not get any hits on Journals further down
> > on the page, so this did not work...
> > Should I run it with a -i to make the database from scratch?
> >
> > -karen reardon
> >
> >
> > At 08:47 AM 9/3/00 -0500, Geoff Hutchison wrote:
> > >At 8:14 AM -0400 9/3/00, Karen Reardon wrote:
> > >>I cannot get the large pages to index in their entirety. For example,
> > >>in the letter 'J' (the largest page), I have only about 1/3 of the page
> > >>in the index. I have changed max_doc_size, up to 5000000, and I still
> > >>don't get it all. The page is a little over 1MB according to the page
> > >>information in Netscape. Is there a parameter for how long HtDig will
> > >>wait for a page to load? (I can't find one.)
> > >
> > >Looking at the description right now, it doesn't mention that the number
> > >is in bytes. So if it's really over 1MB in size, then you're only pulling
> > >in ~500K (actually a bit less). Try upping it again (say to ~5MB by adding
> > >another zero).
> > >
> > >--
> > >-Geoff Hutchison
> > >Williams Students Online
> > >http://wso.williams.edu/
> > >
> > >------------------------------------
> > >To unsubscribe from the htdig mailing list, send a message to
> > >htdig-unsubscribe@htdig.org
> > >You will receive a message to confirm this.
> > >List archives: <http://www.htdig.org/mail/menu.html>
> > >FAQ: <http://www.htdig.org/FAQ.html>
> >
> >
> >
> >
>
>--jesse
>--------------------------------------------------------------------
>J. op den Brouw Johanna Westerdijkplein 75
>Haagse Hogeschool 2521 EN DEN HAAG
>Faculty of Engineering Netherlands
>Electrical Engineering +31 70 4458936
>-------------------- J.E.J.opdenBrouw@st.hhs.nl --------------------
>
>Linux - because reboots are for hardware changes
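
As a side note on the crash discussed in the quoted thread: when a program dies from a segmentation fault, the operating system reports that it was killed by a signal (SIGSEGV) rather than that it exited with a status code. The sketch below shows one way a wrapper could tell the two cases apart on a POSIX system, so that a later merge step is not run on a partial database. The helper names describe_exit and run_and_report are hypothetical, and the command passed in is a placeholder for whatever invokes the digger directly.

```python
import signal
import subprocess
import sys


def describe_exit(returncode: int) -> str:
    """Turn a subprocess return code into a short description.

    On POSIX, subprocess reports a negative return code -N when the
    child was terminated by signal N.
    """
    if returncode < 0:
        try:
            name = signal.Signals(-returncode).name
        except ValueError:
            # Signal number not known to this platform's signal module.
            name = f"signal {-returncode}"
        return f"killed by {name}"
    return f"exited with status {returncode}"


def run_and_report(cmd: list[str]) -> str:
    """Run cmd, wait for it to finish, and describe how it ended."""
    result = subprocess.run(cmd)
    return describe_exit(result.returncode)


if __name__ == "__main__":
    # Placeholder command: replace with the real digger invocation.
    print(run_and_report([sys.executable, "-c", "print('dig finished')"]))
```

A wrapper along these lines could stop as soon as the report starts with "killed by", instead of carrying on to the merge step with whatever the digger managed to write before it crashed.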




This archive was generated by hypermail 2b28 : Tue Sep 05 2000 - 08:17:03 PDT