Re: [htdig] Problem in creating database...

Subject: Re: [htdig] Problem in creating database...
From: Gilles Detillieux (
Date: Wed Aug 23 2000 - 08:56:35 PDT

According to Srini Sathya.:
> I am using htdig for building my DB. But when I run htdig and htmerge,
> the database is only 2k and every keyword I enter returns "no
> matches found". I have tried deleting the db and indexing it again, but
> still no luck. Herewith I have attached the shell script/configuration file
> for creating the db. Can anyone shed some light on where I am going wrong?

Nothing is readily apparent from the files you sent. You should try running
htdig -ivvv and looking at its output for reasons why it might be rejecting
URLs you're expecting it to find and index.

However, have a look at my observations below...
(By the way, when submitting a new question to the list, please don't
include a copy of a response to an earlier, unrelated question. It just
adds clutter and confuses the issue.)

> #!/bin/sh
> SHELL=/bin/sh
> HOME=/tmp
> HTBINDIR=../bin
> $HTBINDIR/htsearch

You didn't mention what sort of query string you were passing to this
script, but as you mentioned that the database is tiny, I'll take your
word for it that htdig isn't indexing much at all.

> # Example config file for ht://Dig.
> start_url:

I don't know if a numeric URL here is a problem or not. It shouldn't be,
but if some pages include absolute URLs to this server and mention it by
name rather than by address, those URLs will be excluded because they fall
outside the scope of limit_urls_to, which takes on the value of start_url.
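To make the scope problem concrete, here is a rough shell sketch. It assumes limit_urls_to behaves like a list of plain substrings, at least one of which must appear in a URL for it to be accepted. The addresses are made-up placeholders (192.0.2.10 standing in for the numeric start_url, www.example.com for the same server's name), and htdig's own matching may differ in details such as case handling.

```shell
#!/bin/sh
# Sketch (assumption): limit_urls_to is treated as whitespace-separated
# substrings; a URL is in scope if it contains at least one of them.
# 192.0.2.10 and www.example.com are hypothetical placeholders.
limit_urls_to="http://192.0.2.10/"   # what ${start_url} would expand to

in_scope() {
    url=$1
    for pattern in $limit_urls_to; do
        case $url in
            *"$pattern"*) echo "in scope"; return 0 ;;   # quoted => literal match
        esac
    done
    echo "out of scope"
}

in_scope "http://192.0.2.10/docs/page.html"        # prints "in scope"
in_scope "http://www.example.com/docs/page.html"   # prints "out of scope"
```

Both URLs may point at the very same page, but only the one spelled with the address from start_url passes the check.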

> limit_urls_to: ${start_url}
> exclude_urls: /cgi-bin/ .cgi /images /images/icons /images/search \
> /_vti_cnf /_bframe.htm /_index.htm /_lframe.htm \
> /tframe.htm /index.htm /*.bak

Wildcards aren't supported in this context. The /*.bak above will only
match URLs that literally contain the string of characters "/*.bak"
at some point. However, you could add .bak to bad_extensions to get
the effect you want.
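A small shell sketch of the difference, assuming exclude_urls entries are plain substrings (so "*" has no special meaning) and bad_extensions entries are compared against the end of the URL. The URLs are hypothetical, and only part of the original exclude_urls list is used.

```shell
#!/bin/sh
# Sketch (assumption): exclude_urls = literal substrings anywhere in the URL,
# bad_extensions = literal suffixes. URLs below are hypothetical.
set -f  # stop the shell itself from glob-expanding "/*.bak" when splitting

exclude_urls="/cgi-bin/ .cgi /images /*.bak"   # trimmed from the original list
bad_extensions=".bak"                          # the suggested addition

classify() {
    url=$1
    for pattern in $exclude_urls; do
        case $url in
            *"$pattern"*) echo "excluded by exclude_urls ($pattern)"; return 0 ;;
        esac
    done
    for ext in $bad_extensions; do
        case $url in
            *"$ext") echo "excluded by bad_extensions ($ext)"; return 0 ;;
        esac
    done
    echo "accepted"
}

classify "http://192.0.2.10/docs/old.bak"    # excluded by bad_extensions (.bak)
classify "http://192.0.2.10/*.bak"           # excluded by exclude_urls (/*.bak)
classify "http://192.0.2.10/docs/page.html"  # accepted
```

Note that "/*.bak" only catches the second URL, which contains those six characters literally; a real backup file like old.bak is caught by the suffix check instead.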

> bad_querystr: Port 80

I'm not sure what you're trying to do with this, but what it will do is
reject any URL that contains the string "port" or "80" somewhere in the
query string portion, after a "?" in the URL. Probably not what you
thought it was.

> maintainer: unconfigured@htdig.searchengine.maintainer

You ought to configure this before indexing any site through HTTP.

That's about all I can point out. The -vvv output should give you a better
idea of what's wrong. Note also that the main index page on your server must
contain HTML links to the rest of the pages, either directly or indirectly,
or htdig won't reach them. It doesn't see hidden (non-linked) files, and
it doesn't follow JavaScript links.
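As a rough illustration of why unlinked pages are missed, here is a toy breadth-first crawl over a hypothetical link table standing in for the HTML links htdig would parse. /hidden.html (never linked) and /menu.html (assumed to be reachable only through JavaScript) have no entries in the table, so the crawl never visits them.

```shell
#!/bin/sh
# Sketch: toy link-following crawl. The page names and links are
# hypothetical; htdig itself parses HTML links rather than a table.
set -f  # no pathname expansion when splitting the queue

# HTML links found on each page. /menu.html (JavaScript-only) and
# /hidden.html (not linked anywhere) deliberately never appear.
links_of() {
    case $1 in
        /index.html)      echo "/about.html /docs/intro.html" ;;
        /docs/intro.html) echo "/docs/part2.html /index.html" ;;
        *)                echo "" ;;
    esac
}

crawl() {
    queue=$1
    seen=" "
    while :; do
        set -- $queue               # split the queue into words
        [ $# -eq 0 ] && break       # nothing left to visit
        url=$1
        shift
        queue=$*                    # remaining URLs, first one removed
        case $seen in
            *" $url "*) continue ;; # already visited
        esac
        seen="$seen$url "
        echo "$url"
        queue="$queue $(links_of "$url")"
    done
}

crawl /index.html   # prints /index.html, /about.html, /docs/intro.html, /docs/part2.html
```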

Gilles R. Detillieux              E-mail: <>
Spinal Cord Research Centre       WWW:
Dept. Physiology, U. of Manitoba  Phone:  (204)789-3766
Winnipeg, MB  R3E 3J7  (Canada)   Fax:    (204)789-3930


This archive was generated by hypermail 2b28 : Wed Aug 23 2000 - 08:57:22 PDT