Brett Hansen (brett@annis.com)
Mon, 24 May 1999 16:14:05 -0400
We currently have an issue with htdig where the meta tags just don't
seem to work.
This is what we are trying to use:
<meta name="robots" content="noindex,follow">
We run htdig with the correct parameters, and:
1st -- The page itself gets indexed (even though we said noindex)
2nd -- The links on the page are not followed
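To make the expected behavior concrete, here is a minimal Python sketch of how we read the "robots" content value. The function name parse_robots_meta is hypothetical, and this only illustrates the usual meaning of the directives; it is not htdig's own parsing code.

```python
def parse_robots_meta(content):
    """Return (index, follow) booleans for a robots meta 'content' value.

    Illustration only: follows the conventional meaning of the
    directives, not htdig's actual implementation.
    """
    directives = {d.strip().lower() for d in content.split(",") if d.strip()}
    index = True   # default when no directive says otherwise
    follow = True
    if "none" in directives:        # "none" is shorthand for noindex,nofollow
        index = follow = False
    if "noindex" in directives:
        index = False
    if "nofollow" in directives:
        follow = False
    return index, follow


print(parse_robots_meta("noindex,follow"))  # (False, True)
```

In other words, we expect the page itself to be skipped and its links to be followed, which is the opposite of what we are seeing.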
This is our config file:
database_dir: /net/testicsc/data/searchfiles
database_base: ${database_dir}/junk
#/var/lib/htdig
start_url: http://test.icsc.org/srch/indexme.html
limit_urls_to: http://test.icsc.org/srch/
exclude_urls: /cgi-bin/ .cgi .19 .99 .98 /cases_imp/ /articles_imp/
max_head_length: 10000
search_algorithm: exact:1 synonyms:0.5 endings:0.1
#stuff put in by mike to override the default header &footer stuff
search_results_header: /net/netsitedocs/testicsc/searchheader.html
search_results_footer: /net/netsitedocs/testicsc/searchfooter.html
star_image: /graphics/star.gif
star_blank: /graphics/star_blank.gif
nothing_found_file: /net/netsitedocs/testicsc/searchnomatch.html
And here is the file we start indexing, called indexme.html:
<html>
<head>
<meta name="robots" content="noindex,follow">
</head>
<body>
<p>If you need something indexed that isn't pointed to from in srch, even though it is in srch, add it here</p>
<a href="/srch/cgi/memberprint?datafile=aprrer/current/index.html">Asia Pacific Report</a>
<a href="/srch/cgi/memberprint?datafile=logo/logo.html">logo page</a>
</body>
</html>
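As a sanity check that the two links in this file are not simply being filtered out by the config above, here is a rough Python sketch. It assumes limit_urls_to and exclude_urls are both applied as plain substring matches against the full URL; that is an assumption about the behavior, and the function name rejection_reason is hypothetical, not part of htdig.

```python
# Values copied from the config file above.
LIMIT_URLS_TO = ["http://test.icsc.org/srch/"]
EXCLUDE_URLS = ["/cgi-bin/", ".cgi", ".19", ".99", ".98",
                "/cases_imp/", "/articles_imp/"]


def rejection_reason(url):
    """Return why a URL would be skipped, or None if it passes both filters.

    Assumption: both settings are simple substring tests on the URL.
    """
    if not any(pattern in url for pattern in LIMIT_URLS_TO):
        return "not matched by limit_urls_to"
    for pattern in EXCLUDE_URLS:
        if pattern in url:
            return "matched exclude_urls pattern " + pattern
    return None


links = [
    "http://test.icsc.org/srch/cgi/memberprint?datafile=aprrer/current/index.html",
    "http://test.icsc.org/srch/cgi/memberprint?datafile=logo/logo.html",
]
for link in links:
    print(link, "->", rejection_reason(link))
```

Under that assumption neither link is rejected, so the exclusion list does not look like the reason the links are not being followed.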
These are the commands we type to run htdig:
# htdig -v -i -u xxxxx:xxxxx -c /etc/testicschtdigjunk.conf
New server: test.icsc.org, 80
0:0:0:http://test.icsc.org/srch/indexme.html: ++ size = 373
1:1:1:http://test.icsc.org/srch/cgi/memberprint?datafile=aprrer/current/:
size = 14
:2:1:http://test.icsc.org/srch/cgi/memberprint?datafile=logo/logo.html: ---
--------------------- size = 9143
# htmerge -c /etc/testicschtdigjunk.conf
When we search for "asia", the indexme.html page is the only thing that
shows up in the results. Why is this? What am I doing wrong?
Note: If we use the meta tag <meta name="htdig-noindex">, the page doesn't
get indexed, but there is no corresponding "name" value to force htdig to
follow the links.
Any help would be great!
Brett Hansen
The Annis Group
Network Support Technician
email: brett@annis.com