[m-dev.] diff: make htdig self-contained
Peter Ross
petdr at cs.mu.OZ.AU
Tue Feb 15 16:21:41 AEDT 2000
Hi,
===================================================================
Estimated hours taken: 1
Changes to make htdig more self-contained.
htdig/htdig-mercury.conf:
Keep the htdig config file under source control.
htdig/Makefile:
Install the new config file.
include/search.inc:
mailing-lists/include/search.inc:
Changes to use the new configuration file.
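
The new config file carries a note that it must be linked to from the
location where htdig expects its configuration files. A minimal sketch of
that step follows. The function name link_config and both directory
arguments are hypothetical; the diff names neither the htdig configuration
directory nor the value of $(INSTALL_WEBDIR), so the caller has to supply
them.

```python
import os


def link_config(installed_dir, htdig_conf_dir, name="htdig-mercury.conf"):
    """Symlink the installed config file into htdig's config directory.

    installed_dir  -- directory the Makefile copied the file into
                      (assumption: the value of $(INSTALL_WEBDIR))
    htdig_conf_dir -- directory htdig searches for *.conf files
                      (assumption: site specific, not given in the diff)
    """
    target = os.path.join(installed_dir, name)
    link = os.path.join(htdig_conf_dir, name)
    # Replace a stale symlink, but never clobber a regular file:
    # os.symlink raises FileExistsError if a real file is in the way.
    if os.path.islink(link):
        os.remove(link)
    os.symlink(target, link)
    return link


# Example call, with both paths being guesses rather than facts:
#   link_config("/var/www/htdig", "/etc/htdig")
```

With the link in place, the form value config=htdig-mercury used in the
search.inc changes should resolve to this file.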
Index: htdig/Makefile
===================================================================
RCS file: /home/staff/zs/imp/w3/htdig/Makefile,v
retrieving revision 1.2
diff -u -r1.2 Makefile
--- htdig/Makefile 1999/12/06 23:57:22 1.2
+++ htdig/Makefile 2000/02/15 05:13:30
@@ -21,7 +21,7 @@
header_and_footer.html
local_install:
- $(CP) header.inc footer.inc $(INSTALL_WEBDIR)/
+ $(CP) htdig-mercury.conf header.inc footer.inc $(INSTALL_WEBDIR)/
local_clean:
rm -f header.inc footer.inc
Index: htdig/htdig-mercury.conf
===================================================================
RCS file: htdig-mercury.conf
diff -N htdig-mercury.conf
--- /dev/null Wed May 28 10:49:58 1997
+++ htdig-mercury.conf Tue Feb 15 16:19:07 2000
@@ -0,0 +1,135 @@
+#
+# IMPORTANT: this file needs to be linked to from the location that
+# htdig expects to find its configuration files.
+#
+# Example config file for ht://Dig.
+#
+# This configuration file is used by all the programs that make up ht://Dig.
+# Please refer to the attribute reference manual for more details on what
+# can be put into this file. (/usr/doc/htdig/html/confindex.html)
+# Note that most attributes have very reasonable default values so you
+# really only have to add attributes here if you want to change the defaults.
+#
+# What follows are some of the common attributes you might want to change.
+#
+
+#
+# Specify where the database files need to go. Make sure that there is
+# plenty of free disk space available for the databases. They can get
+# pretty big.
+#
+database_dir: /home/mercury5/htdig/
+
+#
+# This specifies the URL where the robot (htdig) will start. You can specify
+# multiple URLs here. Just separate them by some whitespace.
+# The setting here will cause the Mercury homepage and related pages to be
+# indexed.
+#
+start_url: http://www.cs.mu.oz.au/research/mercury/
+
+#
+# This attribute limits the scope of the indexing process. The default is to
+# set it to the same as the start_url above. This way only pages that are on
+# the sites specified in the start_url attribute will be indexed and it will
+# reject any URLs that go outside of those sites.
+#
+# Keep in mind that the value for this attribute is just a list of string
+# patterns. As long as URLs contain at least one of the patterns it will be
+# seen as part of the scope of the index.
+#
+limit_urls_to: ${start_url}
+
+#
+# If there are particular pages that you definitely do NOT want to index, you
+# can use the exclude_urls attribute. The value is a list of string patterns.
+# If a URL matches any of the patterns, it will NOT be indexed. This is
+# useful to exclude things like virtual web trees or database accesses. By
+# default, all CGI URLs will be excluded. (Note that the /cgi-bin/ convention
+# may not work on your web server. Check the path prefix used on your web
+# server.)
+#
+# [I also made it exclude .gz, .deb, and .rpm files,
+# since there doesn't seem to be any point trying to index
+# binary files. -fjh.]
+#
+exclude_urls: /cgi-bin/ .cgi .gz .deb .rpm
+
+#
+# The excerpts that are displayed in long results rely on stored information
+# in the index databases. The compiled default only stores 512 characters of
+# text from each document (this excludes any HTML markup...) If you plan on
+# using the excerpts you probably want to make this larger. The only concern
+# here is that more disk space is going to be needed to store the additional
+# information. Since disk space is cheap (! :-)) you might want to set this
+# to a value so that a large percentage of the documents that you are going
+# to be indexing are stored completely in the database. At SDSU we found
+# that by setting this value to about 50k the index would get 97% of all
+# documents completely and only 3% was cut off at 50k. You probably want to
+# experiment with this value.
+# Note that if you want to set this value low, you probably want to set the
+# excerpt_show_top attribute to true so that the top excerpt_length characters
+# of the document are always shown.
+#
+max_head_length: 10000
+
+#
+# Depending on your needs, you might want to enable some of the fuzzy search
+# algorithms. There are several to choose from and you can use them in any
+# combination you feel comfortable with. Each algorithm will get a weight
+# assigned to it so that in combinations of algorithms, certain algorithms get
+# preference over others. Note that the weights only affect the ranking of
+# the results, not the actual searching.
+# The available algorithms are:
+# exact
+# endings
+# synonyms
+# soundex
+# metaphone
+# By default only the "exact" algorithm is used with weight 1.
+# Note that if you are going to use any of the algorithms other than "exact",
+# you need to use the htfuzzy program to generate the databases that each
+# algorithm requires.
+#
+search_algorithm: exact:1 synonyms:0.5 endings:0.1
+
+#
+# The following are used to change the text for the page index.
+# The defaults are just boring text numbers. These images spice
+# up the result pages quite a bit. (Feel free to do whatever, though)
+#
+next_page_text: <img src=/doc/htdig/images/buttonr.gif border=0 align=middle width=30 height=30 alt=next>
+no_next_page_text:
+prev_page_text: <img src=/doc/htdig/images/buttonl.gif border=0 align=middle width=30 height=30 alt=prev>
+no_prev_page_text:
+page_number_text: "<img src=/doc/htdig/images/button1.gif border=0 align=middle width=30 height=30 alt=1>" \
+ "<img src=/doc/htdig/images/button2.gif border=0 align=middle width=30 height=30 alt=2>" \
+ "<img src=/doc/htdig/images/button3.gif border=0 align=middle width=30 height=30 alt=3>" \
+ "<img src=/doc/htdig/images/button4.gif border=0 align=middle width=30 height=30 alt=4>" \
+ "<img src=/doc/htdig/images/button5.gif border=0 align=middle width=30 height=30 alt=5>" \
+ "<img src=/doc/htdig/images/button6.gif border=0 align=middle width=30 height=30 alt=6>" \
+ "<img src=/doc/htdig/images/button7.gif border=0 align=middle width=30 height=30 alt=7>" \
+ "<img src=/doc/htdig/images/button8.gif border=0 align=middle width=30 height=30 alt=8>" \
+ "<img src=/doc/htdig/images/button9.gif border=0 align=middle width=30 height=30 alt=9>" \
+ "<img src=/doc/htdig/images/button10.gif border=0 align=middle width=30 height=30 alt=10>"
+#
+# To make the current page stand out, we will put a border around the
+# image for that page.
+#
+no_page_number_text: "<img src=/doc/htdig/images/button1.gif border=2 align=middle width=30 height=30 alt=1>" \
+ "<img src=/doc/htdig/images/button2.gif border=2 align=middle width=30 height=30 alt=2>" \
+ "<img src=/doc/htdig/images/button3.gif border=2 align=middle width=30 height=30 alt=3>" \
+ "<img src=/doc/htdig/images/button4.gif border=2 align=middle width=30 height=30 alt=4>" \
+ "<img src=/doc/htdig/images/button5.gif border=2 align=middle width=30 height=30 alt=5>" \
+ "<img src=/doc/htdig/images/button6.gif border=2 align=middle width=30 height=30 alt=6>" \
+ "<img src=/doc/htdig/images/button7.gif border=2 align=middle width=30 height=30 alt=7>" \
+ "<img src=/doc/htdig/images/button8.gif border=2 align=middle width=30 height=30 alt=8>" \
+ "<img src=/doc/htdig/images/button9.gif border=2 align=middle width=30 height=30 alt=9>" \
+ "<img src=/doc/htdig/images/button10.gif border=2 align=middle width=30 height=30 alt=10>"
+
+ # location of the current directory when it is installed.
+search_results_header: /var/www/htdig/header.inc
+search_results_footer: /var/www/htdig/footer.inc
+search_results_wrapper: /var/www/htdig/wrapper.html
+nothing_found_file: /var/www/htdig/nomatch.html
+syntax_error_file: /var/www/htdig/syntax.html
Index: include/search.inc
===================================================================
RCS file: /home/staff/zs/imp/w3/include/search.inc,v
retrieving revision 1.1
diff -u -r1.1 search.inc
--- include/search.inc 1998/11/19 07:54:57 1.1
+++ include/search.inc 2000/02/15 05:14:47
@@ -18,9 +18,9 @@
<option value=builtin-short>Short
</select>
</font>
-<input type=hidden name=config value=htdig>
+<input type=hidden name=config value=htdig-mercury>
<input type=hidden name=restrict value="">
-<input type=hidden name=exclude value="">
+<input type=hidden name=exclude value="mailing-lists">
<br>
Search:
<input type="text" size="30" name="words" value="">
Index: mailing-lists/include/search.inc
===================================================================
RCS file: /home/staff/zs/imp/w3/mailing-lists/include/search.inc,v
retrieving revision 1.1
diff -u -r1.1 search.inc
--- mailing-lists/include/search.inc 1998/11/19 07:55:07 1.1
+++ mailing-lists/include/search.inc 2000/02/15 05:15:22
@@ -16,8 +16,8 @@
<option value=builtin-short>Short
</select>
</font>
-<input type=hidden name=config value=htdig-mlists>
-<input type=hidden name=restrict value="">
+<input type=hidden name=config value=htdig-mercury>
+<input type=hidden name=restrict value="mailing-lists">
<input type=hidden name=exclude value="">
<br>
Search:
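
Both search forms now point at the same htdig-mercury configuration and
differ only in their restrict and exclude values. The sketch below shows how
that split is expected to behave, assuming htsearch treats each value as a
whitespace-separated list of substring patterns matched against result URLs.
That matching rule is my reading of htsearch, not something stated in the
diff, and the function name visible and the two sample URLs are made up for
illustration.

```python
def visible(url, restrict="", exclude=""):
    """Return True if a result URL would survive the form's filters.

    Assumed rule: a URL must contain at least one restrict pattern
    (when any are given) and must contain no exclude pattern.
    """
    restrict_patterns = restrict.split()
    exclude_patterns = exclude.split()
    if restrict_patterns and not any(p in url for p in restrict_patterns):
        return False
    return not any(p in url for p in exclude_patterns)


BASE = "http://www.cs.mu.oz.au/research/mercury/"
# Hypothetical pages, one on each side of the split.
list_page = BASE + "mailing-lists/mercury-developers/index.html"
site_page = BASE + "information/documentation.html"

# include/search.inc: restrict="" exclude="mailing-lists"
main_form = [u for u in (list_page, site_page)
             if visible(u, exclude="mailing-lists")]

# mailing-lists/include/search.inc: restrict="mailing-lists" exclude=""
lists_form = [u for u in (list_page, site_page)
              if visible(u, restrict="mailing-lists")]
```

Under that assumption the main site form returns only non-list pages and the
mailing-lists form returns only list pages, so one index can serve both.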
----
Peter Ross
PhD Student University of Melbourne
http://www.cs.mu.oz.au/~petdr/