Frank Guangxin Liu (email@example.com)
Wed, 3 Mar 1999 08:39:44 -0500 (EST)
> On Tue, 2 Mar 1999, Frank Guangxin Liu wrote:
> > Does that mean it won't discover/dig new URLs either?
> It will dig new URLs, (unless you are limiting the # of pages/server, and
> have already maxed this out).
Consider this scenario: on the initial dig, some of my web servers were
down, so the statistics from the original htdig show 0 documents for
those servers. Now, on the update dig, those servers are up. I would expect
htdig to fully dig them; unfortunately, that is not the case.
The statistics from the update htdig don't show those servers at all,
not even with 0 documents.
> I'm testing some mods to htDig to add an ability to ignore URLs in the
> database and start only on the start_url.
> This was easy on the surface, but tricky in practice because I wanted
> to skip unchanged pages, but still follow their links. Adding a list
> of HREFs for each document to the database allowed me to maintain a
> breadth-first search order during an update dig. This is nice for me
> because I want to frequently refresh an index of just the top 500 pages
> of a server without starting from scratch each time.
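The scheme described above can be sketched in a few lines. This is only an illustration of the idea, not htdig's actual code: store each document's HREF list in the database so that an update dig can skip re-fetching unchanged pages while still following their links in breadth-first order. The names `update_dig`, `head`, and `fetch` are hypothetical stand-ins for the retriever's internals.

```python
from collections import deque

def update_dig(start_url, db, head, fetch, max_pages=500):
    """Breadth-first update crawl.

    db maps url -> {"mtime": ..., "hrefs": [...]} from the previous dig.
    head(url) returns the page's last-modified time (cheap check).
    fetch(url) retrieves and parses the page, returning its HREFs.
    Returns the list of URLs visited, in breadth-first order.
    """
    queue = deque([start_url])
    seen = {start_url}
    order = []
    while queue and len(order) < max_pages:
        url = queue.popleft()
        order.append(url)
        mtime = head(url)
        cached = db.get(url)
        if cached and cached["mtime"] == mtime:
            # Unchanged page: skip the full fetch/parse, but reuse the
            # stored link list so the breadth-first walk still continues
            # through it.
            hrefs = cached["hrefs"]
        else:
            hrefs = fetch(url)
            db[url] = {"mtime": mtime, "hrefs": hrefs}
        for href in hrefs:
            if href not in seen:
                seen.add(href)
                queue.append(href)
    return order
```

On a repeat run against an unchanged server, no page is re-fetched, yet every page is still reached, which is what makes frequently refreshing the top N pages cheap.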
> I'd like to add this option to the build if anyone else would be interested.
> (P.S. You might also consider doing an initial dig on your subset and
> then merging the subset data into the full database when it's done)
> Matthew Edwards (firstname.lastname@example.org) | The fuel of innovation and
> Go2Net Inc. 999 Third Ave Suite 4700 | progress is freedom.
> Seattle WA 98104 |
To unsubscribe from the htdig mailing list, send a message to
firstname.lastname@example.org containing the single word "unsubscribe" in
the SUBJECT of the message.
This archive was generated by hypermail 2.0b3 on Thu Mar 04 1999 - 09:09:19 PST