[htdig] PDF indexing problem

Subject: [htdig] PDF indexing problem
From: J. op den Brouw (MSQL_User@st.hhs.nl)
Date: Mon Nov 29 1999 - 05:42:29 PST


I'm using Acroread 3.02 for HP-UX to index .pdf files. Indexing itself
seems to work all right, but once htmerge runs, many of the words it
reports are "glued" together into long runs without spaces.

Does anyone have any ideas?

See the attached output:

[msql@chaos scripts]$ ./index_test -v

New server: www.st.hhs.nl, 80
0:0:0:http://www.st.hhs.nl/~e_bro/download/infotheory.pdf: size = 1026323
htdig: Run complete
htdig: 1 server seen:
htdig: www.st.hhs.nl:80 1 document
htmerge: Sorting...
htmerge: Merging...
htmerge: 100:abilit
htmerge: 200:alsaresho
htmerge: 300:andmak
htmerge: 400:arenotjoin
htmerge: 500:atfunction
htmerge: 600:biguous
htmerge: 700:butdow
htmerge: 800:codes
htmerge: 900:curredsofar
htmerge: 1000:eablecompressionwitha
htmerge: 1100:edforlinearcodes
htmerge: 1200:elytobe
htmerge: 1300:eralargebandwidth
htmerge: 1400:erthesame
htmerge: 1500:ethesignalthatw
htmerge: 1600:foragiv
htmerge: 1700:hancethatthecodesma
htmerge: 1800:hiev
htmerge: 1900:inciden
htmerge: 2000:isdesiredtobe
htmerge: 2100:johnscollege
htmerge: 2200:madeupofaw
htmerge: 2300:namelythe
htmerge: 2400:non
htmerge: 2500:oillustrateho
htmerge: 2600:ordsthe
htmerge: 2700:pectedlength
htmerge: 2800:reproduceablewithcom
htmerge: 2900:soforsu
htmerge: 3000:talaxisthet
htmerge: 3100:thatsatisfythekraft
htmerge: 3200:themostprob
htmerge: 3300:thesetracesandarefoundto
htmerge: 3400:tifthesamplemean
htmerge: 3500:totalprobabilit
htmerge: 3600:tuitiv
htmerge: 3700:usespersecondofagaussian
htmerge: 3800:whatisitscapacit
htmerge: 3900:wnpeopledra
htmerge: 4000:ycon
htmerge: 4100:yoftheh
htmerge: 4200:ythatabitof

htmerge: Total word count: 4247

htmerge: Total documents: 1
htmerge: Total doc db size (in K): 1002
[msql@chaos scripts]$
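
To get a rough idea of how many of the reported words are affected, a minimal sketch like the one below could scan the verbose htmerge output for unusually long entries. It assumes the progress lines keep the "htmerge: N:word" shape seen above. The function name `suspicious_words` and the 12-character threshold are made up for illustration and are not part of ht://Dig.

```python
import re

# Matches progress lines such as "htmerge: 1000:eablecompressionwitha".
# Summary lines like "htmerge: Total word count: 4247" do not match,
# because the text after "htmerge:" does not start with a number.
PROGRESS_LINE = re.compile(r"^htmerge:\s*(\d+):(\S+)\s*$")


def suspicious_words(log_text, max_len=12):
    """Return (counter, word) pairs whose word is longer than max_len.

    max_len is an arbitrary cutoff chosen for this sketch; real English
    words can be longer, and short glued runs will not be caught.
    """
    flagged = []
    for line in log_text.splitlines():
        match = PROGRESS_LINE.match(line.strip())
        if match and len(match.group(2)) > max_len:
            flagged.append((int(match.group(1)), match.group(2)))
    return flagged


# A few lines copied from the output above.
sample = """\
htmerge: 800:codes
htmerge: 1000:eablecompressionwitha
htmerge: 2100:johnscollege
htmerge: 3300:thesetracesandarefoundto
htmerge: Total word count: 4247
"""

for counter, word in suspicious_words(sample):
    print(counter, word)
```

This is only a length heuristic: an entry such as "johnscollege" is clearly two words glued together but is exactly 12 characters long, so it is not flagged at the default threshold.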

J. op den Brouw
Haagse Hogeschool 2521 EN DEN HAAG
Sector Techniek Netherlands
Afdeling Elektrotechniek +31 70 4458936
-------------------- J.E.J.opdenBrouw@st.hhs.nl --------------------

Linux - because reboots are for hardware changes

