Re: [htdig] Local Director


Subject: Re: [htdig] Local Director
From: Gilles Detillieux (grdetil@scrc.umanitoba.ca)
Date: Wed Aug 30 2000 - 12:40:59 PDT


According to Eric Maquiling:
> Today, The Esteemed Geoff Hutchison gathered electrons and wrote:
> > And I said if you wanted to do this, you'd need to use url_part_aliases:
> > <http://www.htdig.org/attrs.html#url_part_aliases>
>
> Ah, yes, I did this and reindexed. I tried start_url: as ip and as dns
> names and they were showing up on the results page
>
> I had it set up like this:
>
> local_urls_only: true
> start_url: http://web1
> url_part_aliases: http://web1.company.com/ *1 http://www.company.com/ *1
>
> I then ran htdig -vvv and htmerge -vvv

How do I put this diplomatically? It's rather frustrating trying to
explain something by e-mail when the recipient doesn't read or type
carefully. Geoff and I both explained that you need two different
settings of url_part_aliases, in two separate configuration files,
to get this to work for remapping. The documentation is also clear on
this point. Putting the two different settings on one line in one file
will not accomplish anything.

Think of url_part_aliases as a search and replace operation. It goes from
left to right when putting values into the database (i.e. by htdig and
htmerge), and from right to left when taking values out of the database
(i.e. by htsearch). So, by using a different value on the left-hand side
in htsearch's config file, you end up remapping the strings in URLs from
the value in htdig's config file to that in htsearch's. Does that make
things more clear?
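
To make the search-and-replace idea concrete, here is a rough sketch in
Python. It is only a toy model, not ht://Dig's actual code: the encode()
and decode() helpers are hypothetical, and treating "*1" as a literal
placeholder string is a simplifying assumption.

```python
# Toy model of url_part_aliases; NOT ht://Dig's real implementation.
# encode(), decode() and the literal "*1" placeholder are assumptions
# made purely for illustration.

def encode(url, aliases):
    """Left to right: swap each URL part for its placeholder (htdig/htmerge side)."""
    for part, placeholder in aliases:
        url = url.replace(part, placeholder)
    return url

def decode(stored, aliases):
    """Right to left: swap each placeholder back to a URL part (htsearch side)."""
    for part, placeholder in aliases:
        stored = stored.replace(placeholder, part)
    return stored

# One setting per config file, with the same placeholder on the right.
dig_aliases = [("http://web1/", "*1")]                 # htdig's config
search_aliases = [("http://www.company.com/", "*1")]   # htsearch's config

stored = encode("http://web1/docs/index.html", dig_aliases)
shown = decode(stored, search_aliases)

print(stored)  # what ends up in the database
print(shown)   # what htsearch would display
```

Because the two files disagree on the left-hand side but agree on the
placeholder, the URL comes back out with a different prefix than it
went in with.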

Now, can you state more clearly what exactly you're trying to accomplish
as far as substitutions in the URLs? I'm getting the impression that you
want web1.company.com and www.company.com to be treated as equivalent,
but that you can't use either of these as your start_url because of the
Cisco Local Director, which I assume is a firewall of some sort.

Does this mean that you want any web1.company.com or www.company.com
address to be mapped simply to "web1" (unqualified) before any attempt
is made to fetch the document? If so, then server_aliases is probably
called for, but you need to set it up to map all fully-qualified names
to the unqualified one, so that you can fetch them from the local server.
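
As a sketch of the effect I'd expect from such a mapping (Python, for
illustration only; the normalize() helper and the dictionary are
hypothetical, and ht://Dig's own handling may differ in detail):

```python
from urllib.parse import urlsplit, urlunsplit

# Hypothetical table mirroring a server_aliases setting of the form
# from_host:port=to_host:port. An assumption for illustration only.
SERVER_ALIASES = {
    "web1.company.com:80": "web1:80",
    "www.company.com:80": "web1:80",
}

def normalize(url):
    """Rewrite the host:port of an http URL according to SERVER_ALIASES."""
    parts = urlsplit(url)
    host = (parts.hostname or "").lower()
    port = parts.port or 80            # assumes plain http
    key = f"{host}:{port}"
    target = SERVER_ALIASES.get(key, key)
    target_host, target_port = target.rsplit(":", 1)
    # Drop the port again when it is the http default.
    netloc = target_host if target_port == "80" else target
    return urlunsplit((parts.scheme, netloc, parts.path or "/",
                       parts.query, parts.fragment))

for u in ("http://www.company.com/a.html",
          "http://web1.company.com:80/b.html",
          "http://web1/c.html"):
    print(normalize(u))
```

All three names end up as http://web1/..., which is the single
consistent form the rest of the configuration can then rely on.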

Once you have the URLs in a consistent form that can be fetched, you
can select which of these can be mapped to local directories, using
local_urls. This may help with .html and .txt files, but won't help
with the .jsp files unless you modify the Document.cc code (but only
do so if it makes sense not to have the .jsp files be server-parsed
before indexing). But I will mention (again) the importance of the
trailing slashes in this context, as your most recent example of its
use still was missing them, giving me the impression you missed my
earlier point.
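
Here is one way a missing trailing slash can go wrong, sketched in
Python. This assumes the mapping is a plain prefix substitution, which
is my simplification and not a description of ht://Dig's code; the
to_local_path() helper and the host "web10" are made up for the
example.

```python
# Toy model of a local_urls-style mapping: URL prefix -> directory.
# Plain prefix substitution is an ASSUMPTION made for illustration.

def to_local_path(url, local_urls):
    """Return the local file path for url, or None if no prefix matches."""
    for prefix, directory in local_urls:
        if url.startswith(prefix):
            return directory + url[len(prefix):]
    return None

with_slashes = [("http://web1/", "/home/apache/htdocs/")]
without_slashes = [("http://web1", "/home/apache/htdocs")]

# With trailing slashes the prefix ends on a path boundary.
good = to_local_path("http://web1/docs/a.html", with_slashes)

# Without them, a different host that merely starts with "web1" also
# matches, and the resulting path points outside the document root.
stray = to_local_path("http://web10/docs/a.html", without_slashes)
safe = to_local_path("http://web10/docs/a.html", with_slashes)

print(good)
print(stray)
print(safe)
```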

Finally, you need to decide how to set up the two separate settings
of url_part_aliases, for htdig and for htsearch, to do the search and
replace on the URLs in the database. Note that this will not affect
the URL that's used to fetch the document via HTTP. The substitution
will be done afterward. Your example above clearly doesn't make sense
when you see it in terms of a search and replace, because the pattern
you're trying to change (web1.company.com) is larger than the pattern
you're actually using in your URLs (web1), so you'll never get a match.
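
A quick check in Python shows the mismatch. The example URL is
hypothetical; it just stands for whatever htdig fetches when start_url
is http://web1.

```python
# Hypothetical URL in the form htdig sees with start_url: http://web1
fetched_url = "http://web1/docs/index.html"

# Left-hand sides from the single-line setting quoted above.
original_patterns = ["http://web1.company.com/", "http://www.company.com/"]

# A pattern can only be replaced if it occurs in the URL.
matches = [p for p in original_patterns if p in fetched_url]

# A pattern that is no longer than what the URLs really contain.
shorter_pattern = "http://web1/"

print(matches)                         # patterns that would ever apply
print(shorter_pattern in fetched_url)  # whether the shorter one applies
```

Neither of the original patterns occurs in the URL, so no substitution
takes place; the shorter pattern does occur.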

So, if I'm understanding you correctly, something like the following
may do the job. If not, you'll need to explain things more clearly.

For htdig and htmerge...

start_url: http://web1/
local_urls: http://web1/=/home/apache/htdocs/
local_urls_only: true
server_aliases: web1.company.com:80=web1:80 \
                        www.company.com:80=web1:80
url_part_aliases: http://web1/ *1

For htsearch...

url_part_aliases: http://www.company.com/ *1

-- 
Gilles R. Detillieux              E-mail: <grdetil@scrc.umanitoba.ca>
Spinal Cord Research Centre       WWW:    http://www.scrc.umanitoba.ca/~grdetil
Dept. Physiology, U. of Manitoba  Phone:  (204)789-3766
Winnipeg, MB  R3E 3J7  (Canada)   Fax:    (204)789-3930
