[m-dev.] diff: make htdig self-contained

Peter Ross petdr at cs.mu.OZ.AU
Tue Feb 15 16:21:41 AEDT 2000


Hi,


===================================================================


Estimated hours taken: 1

Make the htdig setup more self-contained.

htdig/htdig-mercury.conf:
    Keep the htdig config file under source control.

htdig/Makefile:
    Install the new config file.

include/search.inc:
mailing-lists/include/search.inc:
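    Use the new configuration file for both search forms.  The main
    form now excludes "mailing-lists" and the mailing-list form
    restricts to it, so both forms share one config.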
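
For reviewers, here is a rough sketch of how I expect the two forms to
split the results once they share one config.  It assumes htsearch
treats the restrict and exclude values as plain substrings of the
result URL, which is my reading of the htsearch docs and not something
this patch checks.  The two example URLs and the function name are made
up for illustration.

```python
def shown_in_results(url, restrict="", exclude=""):
    """Approximate htsearch filtering (assumption: substring matching).

    A result is kept if it contains the restrict pattern (when one is
    given) and does not contain the exclude pattern (when one is given).
    """
    if restrict and restrict not in url:
        return False
    if exclude and exclude in url:
        return False
    return True


# Hypothetical URLs below the start_url in htdig-mercury.conf.
site_page = "http://www.cs.mu.oz.au/research/mercury/information/index.html"
list_page = "http://www.cs.mu.oz.au/research/mercury/mailing-lists/index.html"

# include/search.inc: restrict="", exclude="mailing-lists"
main_form = [u for u in (site_page, list_page)
             if shown_in_results(u, exclude="mailing-lists")]

# mailing-lists/include/search.inc: restrict="mailing-lists", exclude=""
list_form = [u for u in (site_page, list_page)
             if shown_in_results(u, restrict="mailing-lists")]

print(main_form)
print(list_form)
```

Under that assumption every indexed page shows up in exactly one of the
two forms, which is the behaviour the old pair of config files gave us.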


Index: htdig/Makefile
===================================================================
RCS file: /home/staff/zs/imp/w3/htdig/Makefile,v
retrieving revision 1.2
diff -u -r1.2 Makefile
--- htdig/Makefile	1999/12/06 23:57:22	1.2
+++ htdig/Makefile	2000/02/15 05:13:30
@@ -21,7 +21,7 @@
 		header_and_footer.html
 
 local_install:
-	$(CP) header.inc footer.inc $(INSTALL_WEBDIR)/
+	$(CP) htdig-mercury.conf header.inc footer.inc $(INSTALL_WEBDIR)/
 
 local_clean:
 	rm -f header.inc footer.inc
Index: htdig/htdig-mercury.conf
===================================================================
RCS file: htdig-mercury.conf
diff -N htdig-mercury.conf
--- /dev/null	Wed May 28 10:49:58 1997
+++ htdig-mercury.conf	Tue Feb 15 16:19:07 2000
@@ -0,0 +1,135 @@
+#
+# IMPORTANT: this file needs to be linked to from the location that
+# htdig expects to find its configuration files.
+#
+# Example config file for ht://Dig.
+#
+# This configuration file is used by all the programs that make up ht://Dig.
+# Please refer to the attribute reference manual for more details on what
+# can be put into this file.  (/usr/doc/htdig/html/confindex.html)
+# Note that most attributes have very reasonable default values so you
+# really only have to add attributes here if you want to change the defaults.
+#
+# What follows are some of the common attributes you might want to change.
+#
+
+#
+# Specify where the database files need to go.  Make sure that there is
+# plenty of free disk space available for the databases.  They can get
+# pretty big.
+#
+database_dir:		/home/mercury5/htdig/
+
+#
+# This specifies the URL where the robot (htdig) will start.  You can specify
+# multiple URLs here.  Just separate them by some whitespace.
+# The setting here will cause the Mercury homepage and related pages to be
+# indexed.
+#
+start_url:		http://www.cs.mu.oz.au/research/mercury/
+
+#
+# This attribute limits the scope of the indexing process.  The default is to
+# set it to the same as the start_url above.  This way only pages that are on
+# the sites specified in the start_url attribute will be indexed and it will
+# reject any URLs that go outside of those sites.
+#
+# Keep in mind that the value for this attribute is just a list of string
+# patterns. As long as URLs contain at least one of the patterns it will be
+# seen as part of the scope of the index.
+#
+limit_urls_to:		${start_url}
+
+#
+# If there are particular pages that you definitely do NOT want to index, you
+# can use the exclude_urls attribute.  The value is a list of string patterns.
+# If a URL matches any of the patterns, it will NOT be indexed.  This is
+# useful to exclude things like virtual web trees or database accesses.  By
+# default, all CGI URLs will be excluded.  (Note that the /cgi-bin/ convention
+# may not work on your web server.  Check the  path prefix used on your web
+# server.)
+#
+# [I also made it exclude .gz, .deb, and .rpm files,
+# since there doesn't seem to be any point trying to index
+# binary files. -fjh.]
+#
+exclude_urls:		/cgi-bin/ .cgi .gz .deb .rpm
+
+#
+# The excerpts that are displayed in long results rely on stored information
+# in the index databases.  The compiled default only stores 512 characters of
+# text from each document (this excludes any HTML markup...)  If you plan on
+# using the excerpts you probably want to make this larger.  The only concern
+# here is that more disk space is going to be needed to store the additional
+# information.  Since disk space is cheap (! :-)) you might want to set this
+# to a value so that a large percentage of the documents that you are going
+# to be indexing are stored completely in the database.  At SDSU we found
+# that by setting this value to about 50k the index would get 97% of all
+# documents completely and only 3% was cut off at 50k.  You probably want to
+# experiment with this value.
+# Note that if you want to set this value low, you probably want to set the
+# excerpt_show_top attribute to true so that the top excerpt_length characters
+# of the document are always shown.
+#
+max_head_length:	10000
+
+#
+# Depending on your needs, you might want to enable some of the fuzzy search
+# algorithms.  There are several to choose from and you can use them in any
+# combination you feel comfortable with.  Each algorithm will get a weight
+# assigned to it so that in combinations of algorithms, certain algorithms get
+# preference over others.  Note that the weights only affect the ranking of
+# the results, not the actual searching.
+# The available algorithms are:
+#	exact
+#	endings
+#	synonyms
+#	soundex
+#	metaphone
+# By default only the "exact" algorithm is used with weight 1.
+# Note that if you are going to use any of the algorithms other than "exact",
+# you need to use the htfuzzy program to generate the databases that each
+# algorithm requires.
+#
+search_algorithm:	exact:1 synonyms:0.5 endings:0.1
+
+#
+# The following are used to change the text for the page index.
+# The defaults are just boring text numbers.  These images spice
+# up the result pages quite a bit.  (Feel free to do whatever, though)
+#
+next_page_text:		<img src=/doc/htdig/images/buttonr.gif border=0 align=middle width=30 height=30 alt=next>
+no_next_page_text:
+prev_page_text:		<img src=/doc/htdig/images/buttonl.gif border=0 align=middle width=30 height=30 alt=prev>
+no_prev_page_text:
+page_number_text:	"<img src=/doc/htdig/images/button1.gif border=0 align=middle width=30 height=30 alt=1>" \
+			"<img src=/doc/htdig/images/button2.gif border=0 align=middle width=30 height=30 alt=2>" \
+			"<img src=/doc/htdig/images/button3.gif border=0 align=middle width=30 height=30 alt=3>" \
+			"<img src=/doc/htdig/images/button4.gif border=0 align=middle width=30 height=30 alt=4>" \
+			"<img src=/doc/htdig/images/button5.gif border=0 align=middle width=30 height=30 alt=5>" \
+			"<img src=/doc/htdig/images/button6.gif border=0 align=middle width=30 height=30 alt=6>" \
+			"<img src=/doc/htdig/images/button7.gif border=0 align=middle width=30 height=30 alt=7>" \
+			"<img src=/doc/htdig/images/button8.gif border=0 align=middle width=30 height=30 alt=8>" \
+			"<img src=/doc/htdig/images/button9.gif border=0 align=middle width=30 height=30 alt=9>" \
+			"<img src=/doc/htdig/images/button10.gif border=0 align=middle width=30 height=30 alt=10>"
+#
+# To make the current page stand out, we will put a border around the
+# image for that page.
+#
+no_page_number_text:	"<img src=/doc/htdig/images/button1.gif border=2 align=middle width=30 height=30 alt=1>" \
+			"<img src=/doc/htdig/images/button2.gif border=2 align=middle width=30 height=30 alt=2>" \
+			"<img src=/doc/htdig/images/button3.gif border=2 align=middle width=30 height=30 alt=3>" \
+			"<img src=/doc/htdig/images/button4.gif border=2 align=middle width=30 height=30 alt=4>" \
+			"<img src=/doc/htdig/images/button5.gif border=2 align=middle width=30 height=30 alt=5>" \
+			"<img src=/doc/htdig/images/button6.gif border=2 align=middle width=30 height=30 alt=6>" \
+			"<img src=/doc/htdig/images/button7.gif border=2 align=middle width=30 height=30 alt=7>" \
+			"<img src=/doc/htdig/images/button8.gif border=2 align=middle width=30 height=30 alt=8>" \
+			"<img src=/doc/htdig/images/button9.gif border=2 align=middle width=30 height=30 alt=9>" \
+			"<img src=/doc/htdig/images/button10.gif border=2 align=middle width=30 height=30 alt=10>"
+
+	# location of the current directory when it is installed.
+search_results_header: /var/www/htdig/header.inc
+search_results_footer: /var/www/htdig/footer.inc
+search_results_wrapper: /var/www/htdig/wrapper.html
+nothing_found_file: /var/www/htdig/nomatch.html
+syntax_error_file: /var/www/htdig/syntax.html
Index: include/search.inc
===================================================================
RCS file: /home/staff/zs/imp/w3/include/search.inc,v
retrieving revision 1.1
diff -u -r1.1 search.inc
--- include/search.inc	1998/11/19 07:54:57	1.1
+++ include/search.inc	2000/02/15 05:14:47
@@ -18,9 +18,9 @@
 <option value=builtin-short>Short
 </select>
 </font>
-<input type=hidden name=config value=htdig>
+<input type=hidden name=config value=htdig-mercury>
 <input type=hidden name=restrict value="">
-<input type=hidden name=exclude value="">
+<input type=hidden name=exclude value="mailing-lists">
 <br>
 Search:
 <input type="text" size="30" name="words" value="">
Index: mailing-lists/include/search.inc
===================================================================
RCS file: /home/staff/zs/imp/w3/mailing-lists/include/search.inc,v
retrieving revision 1.1
diff -u -r1.1 search.inc
--- mailing-lists/include/search.inc	1998/11/19 07:55:07	1.1
+++ mailing-lists/include/search.inc	2000/02/15 05:15:22
@@ -16,8 +16,8 @@
 <option value=builtin-short>Short
 </select>
 </font>
-<input type=hidden name=config value=htdig-mlists>
-<input type=hidden name=restrict value="">
+<input type=hidden name=config value=htdig-mercury>
+<input type=hidden name=restrict value="mailing-lists">
 <input type=hidden name=exclude value="">
 <br>
 Search:
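
A note on the IMPORTANT comment at the top of htdig-mercury.conf: the
Makefile only copies the file into $(INSTALL_WEBDIR), so something
still has to link it into the directory where htdig looks for its
config files.  That directory is site-specific and is not named
anywhere in this patch, so the sketch below takes it as a parameter and
demonstrates the step in a throwaway temporary directory.  The function
name link_config is made up for illustration.

```python
import os
import tempfile


def link_config(installed_conf, htdig_conf_dir):
    """Symlink installed_conf into htdig_conf_dir and return the link path.

    htdig_conf_dir stands for wherever the local htdig build reads its
    *.conf files from (an assumption; check your installation).
    """
    link = os.path.join(htdig_conf_dir, os.path.basename(installed_conf))
    if os.path.islink(link):
        os.remove(link)  # replace a stale link from an earlier install
    os.symlink(os.path.abspath(installed_conf), link)
    return link


# Demonstration in a temporary tree, so nothing on the real system changes.
with tempfile.TemporaryDirectory() as root:
    webdir = os.path.join(root, "webdir")       # stands in for INSTALL_WEBDIR
    confdir = os.path.join(root, "htdig-conf")  # stands in for htdig's config dir
    os.makedirs(webdir)
    os.makedirs(confdir)

    conf = os.path.join(webdir, "htdig-mercury.conf")
    with open(conf, "w") as f:
        f.write("database_dir:\t\t/home/mercury5/htdig/\n")

    link = link_config(conf, confdir)
    print(os.path.basename(link), "->", os.readlink(link))
```

As far as I know, htsearch builds the file name from the form's config
value by appending ".conf", so the link has to keep the name
htdig-mercury.conf for the new search.inc files to find it.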

----
Peter Ross
PhD Student University of Melbourne
http://www.cs.mu.oz.au/~petdr/