This is a partial list of literature relevant to ht://Dig
development, including sources on other web search engines, search
algorithms, databases, fuzzy searching and other topics. It is by no
means a complete bibliography of these topics but should include
some good resources.
-
Agirre, E., K. Gojenola, et al. (1998). ``Towards a single
proposal in spelling correction.'' http://xxx.lanl.gov/ps/cmp-lg/9806010.
- Automatic Resource Compilation by Analyzing Hyperlink Structure and Associated Text
http://theory.stanford.edu/people/raghavan/www7/181.html
- Brin, S., J. Davis, et al. (1995). Copy Detection Mechanisms
for Digital Documents. ACM SIGMOD Annual Conference, San
Francisco, California. http://www-db.stanford.edu/~shiva/copy.pdf.
-
Brin, S. and L. Page (1998). The Anatomy of a Large-Scale
Hypertextual Web Search Engine. The Seventh Annual
International WWW Conference. http://www7.scu.edu.au/programme/fullpapers/1921/com1921.htm.
- Sergey Brin, Rajeev Motwani, Lawrence Page, Terry Winograd. What can you do with a Web in your pocket?
http://www.research.microsoft.com/research/db/debull/98june/webbase.ps
-
Chakrabarti, S., B. Dom, et al. (1998). Automatic Resource
Compilation by Analyzing Hyperlink Structure and Associated
Text. Seventh International WWW Conference, Brisbane,
Australia. http://decweb.ethz.ch/WWW7/1898/com1898.htm.
-
Cho, J., H. Garcia-Molina, et al. (1998). Efficient Crawling
Through URL Ordering. The Seventh Annual International World
Wide Web Conference. http://www-db.stanford.edu/~cho/crawler-paper/.
-
User Interface Engineering (1998). Why On-Site Searching Stinks. http://world.std.com/~uieweb/searchart.htm.
-
Fang, M., N. Shivakumar, et al. (1998). Computing Iceberg
Queries Efficiently. 1998 International Conference on Very
Large Databases (VLDB '98), New York. http://www-db.stanford.edu/~shiva/Pubs/iceberg-full.ps.
-
Golding, A. R. and Y. Schabes (1996). Combining Trigram-based
and Feature-based Methods for Context-Sensitive Spelling
Correction. The 34th Annual Meeting of the Association for
Computational Linguistics, Santa Cruz, CA. http://xxx.lanl.gov/ps/cmp-lg/9605037.
- Monika R. Henzinger, Allan Heydon, Michael Mitzenmacher, Mark Najork. Measuring Index Quality Using Random Walks on the Web
http://www8.org/w8-papers/2c-search-discover/measuring/measuring.html
-
Kleinberg, J. (1998). Authoritative sources in a hyperlinked
environment. The Ninth Annual ACM-SIAM Symposium on Discrete
Algorithms. http://simon.cs.cornell.edu/home/kleinber/auth.ps.
-
Kukich, K. (1992). ``Techniques for automatically correcting words
in text.'' ACM Computing Surveys 24(4): 377-439. http://www.acm.org/pubs/toc/Abstracts/surveys/146380.html.
-
Lawrence, S. and C. L. Giles (1998). Searching the World Wide Web.
Science. 280: 98-100. http://www.sciencemag.org/cgi/content/full/280/5360/98?maxtoshow=&HITS=10&hits=10&RESULTFORMAT=&author1=Lawrence&author2=Giles&searchid=QID_NOT_SET&FIRSTINDEX=.
-
Manber, U. (1997). ``A Text Compression Scheme that Allows Fast
Searching Directly in the Compressed File.'' ACM Transactions
on Information Systems 15(2). ftp://ftp.cs.arizona.edu/people/udi/CAS.ps.
-
Marchiori, M. (1997). The quest for correct information on the
web: Hyper search engines. The Sixth International WWW
Conference, Santa Clara, USA. http://www6.nttlabs.com/HyperNews/get/PAPER222.html.
-
Mayfield, J. (1998). Research on N-Grams in Information Retrieval.
http://www.cs.umbc.edu/ngram/.
- Members of the Clever Project. Hypersearching the Web
http://www.sciam.com/1999/0699issue/0699raghavan.html
-
Muth, R. and U. Manber (1996). Approximate Multiple String
Search. Seventh Annual Combinatorial Pattern Matching
Symposium, Laguna Beach, CA. ftp://ftp.cs.arizona.edu/people/udi/approx-multi.ps.
-
Page, L., S. Brin, et al. (1998). ``The PageRank Citation Ranking:
Bringing Order to the Web.'' (work in progress). http://google.stanford.edu/~backrub/pageranksub.ps.
-
Pinkerton, B. (1994). Finding What People Want: Experiences
with the WebCrawler. The Second International WWW Conference,
Chicago, USA. http://info.webcrawler.com/bp/WWW94.html.
-
Pollock, J. J. and E. M. Zamora (1984). ``Automatic spelling
correction in scientific and scholarly text.'' Communications
of the ACM 27(4): 358-368.
-
Rapp, R. (1997). Text-Detector. c't: 386. http://www.heise.de/ct/english/9704386/.
-
Shivakumar, N. and H. Garcia-Molina (1995). SCAM: A Copy
Detection Mechanism for Digital Documents. 2nd International
Conference in Theory and Practice of Digital Libraries, Austin,
Texas. http://www-db.stanford.edu/~shiva/Pubs/scam.pdf.
-
Shivakumar, N. and H. Garcia-Molina (1996). Building a scalable
and accurate copy detection mechanism. First ACM Conference on
Digital Libraries, Bethesda, Maryland.
-
Shivakumar, N. and H. Garcia-Molina (1998). Finding
near-replicas of documents on the web. Workshop on Web
Databases. http://www-db.stanford.edu/~shiva/Pubs/web.ps.
-
Tillman, H. N. (1997). Evaluating quality on the
net. Internet Librarian, Monterey, California. http://www.hopetillman.com/findqual.html.
-
Wu, S. and U. Manber (1992). ``Fast Text Searching Allowing
Errors.'' Communications of the ACM 35(10), October
1992: 83-91.
-
Wu, S. and U. Manber (1994). ``A Fast Algorithm for Multi-Pattern
Searching.'' ftp://ftp.cs.arizona.edu/reports/1994/TR94-17.ps.
This is a growing list of various useful RFCs and other standards.
Last modified: 2001/06/13 14:31:49