Re: [htdig] Todo Ideas. Spam control, new search options and output


Geoff Hutchison (ghutchis@wso.williams.edu)
Wed, 17 Feb 1999 14:10:39 -0500 (EST)


First off, thanks for your suggestions!

> -link: url: etc..search operators
> is this "Field-base searching" discussed in the TODO list ?

Not quite. Operators like link: and url: are what I meant by "AltaVista
style searches", which I think I left on the TODO list. If not, shame on
me. Field-based searching refers to searching for "Author"
or "Title" based on meta information or tags. Field-based searching would
make ht://Dig work better on online databases.

> -url dependent template
> I'd like to have different templates for certain urls, major
> sponsors, free services, categories from our local directory,
> etc. Right now you can modify only the stars image ...

You can easily set up templates for each site and pick the template in the
search form (using either a config file or the allow_in_form attribute set
to template_name). You can basically set anything on the templates
themselves.

For examples, check out http://www.thesaurus.com/ or my site's search at
http://wso.williams.edu/search/

> It would be nice to have a switch that groups all urls from the same
> site showing only the first hit and perhaps a variable like
> $(SISTER_URLS_LIST) that could be expanded to ... guess ...
> a list of linked url from the same site matching the query. :-)

Interesting idea, that could give search results in an outline form too.
Hmm. :-)
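
To make the grouping idea concrete, here is a rough Python sketch. It is
not ht://Dig code: the function name group_by_site and the example URLs
are made up for illustration, and it assumes the hits arrive already
sorted by score.

```python
from collections import OrderedDict
from urllib.parse import urlsplit


def group_by_site(hits):
    """Group (url, score) hits by host, keeping the incoming rank order.

    Returns a list of (first_hit_url, sister_urls) pairs, one per site.
    """
    groups = OrderedDict()
    for url, _score in hits:
        # Treat the host part of the URL as the "site" (an assumption).
        host = urlsplit(url).netloc.lower()
        groups.setdefault(host, []).append(url)
    # The first URL seen for a host is the best-ranked hit for that site;
    # the rest are the "sister" URLs that could be listed beneath it.
    return [(urls[0], urls[1:]) for urls in groups.values()]


# Hypothetical search results, best score first.
hits = [
    ("http://a.example/one.html", 0.9),
    ("http://b.example/index.html", 0.8),
    ("http://a.example/two.html", 0.7),
]

for first, sisters in group_by_site(hits):
    print(first)
    for url in sisters:
        print("    " + url)
```

Printing the sister URLs indented under the first hit is what would give
the outline form.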

> -strong anti spamming control
> The sites that show this behavior most often make heavy use of
> keywords, descriptions and lots of tricks to get high rankings.
> I'd like to give penalties for such things as:
> keyword spamming, empty content etc.

If you're having a problem with this, I'd suggest setting something like
this:

keywords_factor: 0.5
meta_description_factor: 0.5
(i.e., basically ignore those two fields)

I'm also looking at a variety of search ranking improvements, including
ranking words lower if they're more common. This would decrease the
ranking of documents with frequent "spam" words.
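
One common way to rank frequent words lower is inverse document frequency
weighting. The Python sketch below only illustrates that general technique
under assumed names (idf_weights, score) and toy data; it is not the
ranking code ht://Dig uses or will necessarily use.

```python
import math
from collections import Counter


def idf_weights(documents):
    """Return a weight per word: log(N / number of documents containing it).

    A word that appears in every document gets weight 0.
    """
    n = len(documents)
    doc_freq = Counter()
    for words in documents:
        # Count each word once per document, however often it is repeated.
        doc_freq.update(set(words))
    return {word: math.log(n / df) for word, df in doc_freq.items()}


def score(query, words, weights):
    """Sum, over the query words, of occurrences in the document times weight."""
    counts = Counter(words)
    return sum(counts[w] * weights.get(w, 0.0) for w in query)


# Toy collection: "free" appears in every document, so it carries no weight.
documents = [
    ["free", "free", "free", "cars"],
    ["free", "glossary", "terms"],
    ["free", "search", "engine"],
]
weights = idf_weights(documents)
print(score(["free", "cars"], documents[0], weights))
```

In this toy collection, repeating "free" three times adds nothing to the
first document's score, because the word occurs in every document.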

> -raw excerpts
> We are also using htdig to compile searchable dbs of glossary data.
> If it was possible to have raw excerpts (we obviously have full
> documents in excerpts right now) we could dump the files and have a
> more compact and functional system.
> There is no real need after a search to send the user to the HTML
> page. But this now means losing formatting and anchors.

I'm not sure what you mean. Do you want ht://Dig to store the files in the
database or to not store excerpts at all? If you don't want to send the
user to the page, why don't you customize a template to remove the link to
the URL?

Cheers,
-Geoff Hutchison
Williams Students Online
http://wso.williams.edu/
