[htdig] multiple "documents" in one file?


Subject: [htdig] multiple "documents" in one file?
From: David Sklar (sklar@student.net)
Date: Fri May 19 2000 - 08:24:33 PDT


I am attempting to use htdig to index a large number (~100,000) of files, each
of which is pretty small (~500 bytes). Running htdig -vv and tracing it with
strace seems to indicate that htdig is spending most of its time opening and
closing these files rather than actually indexing them.

Is there a way (either in 3.1.5, which I'm using, or in 3.2) to concatenate
all of these individual files into one large file, with some delimiter between
them, and have htdig be aware of that delimiter to differentiate between the
files?

I.e., instead of foo.html containing

<HEAD><TITLE>foo</TITLE></HEAD>
<BODY><H1>Nice pants</H1><H3>I really like a good pair of pants</H3></BODY>

and bar.html containing

<HEAD><TITLE>bar</TITLE></HEAD>
<BODY><H1>Turtle soup</H1><H3>Eating turtle soup is the cat's meow</H3></BODY>

I could have one file, that might look something like

<HTML>
<HEAD><TITLE>foo</TITLE></HEAD>
<BODY><H1>Nice pants</H1><H3>I really like a good pair of pants</H3></BODY>
</HTML>
<HTML>
<HEAD><TITLE>bar</TITLE></HEAD>
<BODY><H1>Turtle soup</H1><H3>Eating turtle soup is the cat's meow</H3></BODY>
</HTML>
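
For concreteness, here is a minimal Python sketch of how such a combined file
might be produced. It only builds the file; it assumes nothing about whether
htdig can split it back into separate documents, which is the open question.
The function name combine_html_files and the file names in the demo are
made up for illustration.

```python
import tempfile
from pathlib import Path


def combine_html_files(paths, out_path):
    """Concatenate small HTML files into one file.

    Each input is wrapped in its own <HTML>...</HTML> pair, so the
    closing </HTML> tag acts as the delimiter between documents.
    """
    with open(out_path, "w", encoding="utf-8") as out:
        for path in paths:
            body = Path(path).read_text(encoding="utf-8").strip()
            out.write("<HTML>\n" + body + "\n</HTML>\n")


if __name__ == "__main__":
    # Demo with two stand-in files (hypothetical names and contents).
    with tempfile.TemporaryDirectory() as tmp:
        foo = Path(tmp) / "foo.html"
        bar = Path(tmp) / "bar.html"
        foo.write_text("<HEAD><TITLE>foo</TITLE></HEAD>\n<BODY><H1>Nice pants</H1></BODY>\n")
        bar.write_text("<HEAD><TITLE>bar</TITLE></HEAD>\n<BODY><H1>Turtle soup</H1></BODY>\n")

        combined = Path(tmp) / "combined.html"
        combine_html_files([foo, bar], combined)
        print(combined.read_text())
```

Wrapping each document in its own <HTML> pair keeps the delimiter something
an HTML-aware tool could plausibly recognize, rather than an arbitrary marker
string that might also occur in the text.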

-dave



