Re: [htdig] multiple "documents" in one file?


Subject: Re: [htdig] multiple "documents" in one file?
From: Gilles Detillieux (grdetil@scrc.umanitoba.ca)
Date: Fri May 19 2000 - 15:13:46 PDT


According to David Sklar:
> I am attempting to use htdig to index a large number (~100,000) files each of
> which are pretty small (~500 bytes). Running htdig -vv and using strace seems
> to indicate that htdig is spending most of its time opening and closing these
> files, and not actually doing the indexing.
>
> Is there a way (either in 3.1.5, which I'm using, or in 3.2) to concatenate
> all of these individual files into one large file, with some delimiter between
> them, and have htdig be aware of that delimiter to differentiate between the
> files?

Are you making use of the local_urls attribute? When indexing through the
local file system, there shouldn't be that much overhead to opening and
closing files, but if you're going through HTTP, then there would be a
significant overhead for fetching small files like this.
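
A minimal sketch of what such a setting might look like in htdig.conf,
assuming the documents are served from http://www.example.com/ and live
under /var/www/html/ on the machine running htdig. The host name and the
path are placeholders for illustration, not values from the original
question:

```
# htdig.conf fragment (hypothetical host name and path)
start_url:   http://www.example.com/
# Map the URL prefix on the left to the local directory on the right
local_urls:  http://www.example.com/=/var/www/html/
```

With a mapping like this, htdig should read matching documents straight
from disk, while search results still point at the HTTP URLs.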

Unfortunately, htdig is designed to index documents based on unique URLs,
which it needs because each search result must point to a unique page, so
there's no good way of combining documents.

Version 3.2 supports HTTP/1.1 persistent connections, so if your server
supports them too, htdig can reuse one connection for many requests, which
greatly decreases the overhead of opening and closing a server connection
for each file.
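
As a rough illustration only: the 3.2 attribute that appears to control
this is persistent_connections, which is believed to be enabled by default.
Treat the attribute name and its default as assumptions to check against
the attribute reference shipped with your 3.2 release:

```
# htdig.conf fragment for 3.2 (attribute name assumed, verify locally)
persistent_connections: true
```

If the server does not honor keep-alive, each document would still need
its own connection, so this setting alone would not help.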

-- 
Gilles R. Detillieux              E-mail: <grdetil@scrc.umanitoba.ca>
Spinal Cord Research Centre       WWW:    http://www.scrc.umanitoba.ca/~grdetil
Dept. Physiology, U. of Manitoba  Phone:  (204)789-3766
Winnipeg, MB  R3E 3J7  (Canada)   Fax:    (204)789-3930
