RuterSearch Documentation
RuterSearch Perl version 3.5
Programming Language: Perl, version 5; Also available in Php
Database support: MySQL and Text Delimited File
RuterSearch, RuterSoft, all code; copyright © 2001; Weston Ruter
RuterSearch homepage: http://www.ruter.net/soft/rutersearch/
Contents:
- History
- Features
- Installation
- Configuration and Customization
- Running RuterSearch
- Troubleshooting
A while ago, before I was a Perl programmer, I wanted to start a search engine. I searched and
searched, but I couldn't find one that I both liked and could afford. So I settled for a lesser
search engine (for $50) that I thought would serve my needs. Since I was not a programmer, I had
no idea what features a search engine should have, and I paid the price for my naivety. That
search engine was pathetic, a total disappointment, and a rip-off. My mistake.
So when I became a Perl programmer, I started RuterSearch. I wanted it to be better than any
other, and cheaper. So I incorporated everything that I would have wanted, and what I anticipate
you will need. I have spent more time on this project than any other, and I hope it shows!
History
v1.0 Perl (Winter '99)
- Simple search routine developed. Importance based on the matched word's location in the entry (title, description, etc.).
- Addition program for easy addition of new entries
- Very easy customization of design in search and search result pages using substitutions
- CSS used in layout of results
- "Previous/Next page of results", "End of Results", and "No matches" messages scripted
- Helpful error messages
v2.0 Perl (Spring '00)
- Syntax for requiring words (e.g. +required)
- Search history for keeping track of what the user has searched for
v3.0 Perl, v1.0 Php (Summer '00)
- Radical redesign of search. Importance based on number of matches (including all sections) and locations in an entry.
- Numerous new syntax features like quoted phrases ("a phrase"), required words (+required), rejected words (-not), section restrictions (title, description, etc.), and wildcard searches (e.g. Ruter*, ?uter, *zilla, etc.)
- Styling of matched words
- Total customization with result template (resulttemplate.html)
- Administration program for easy configuration of RuterSearch (no editing of code)
- Very simple installation program
v3.5 Perl, v1.5 Php (Winter '00)
- A couple of small bug fixes
- MySQL database support added
- Installation program integrated into administration
- Administration completely redone. Search history, entry management, documentation, and more can all be accessed through administration.
- Website spider for the quick addition of many websites
- Documentation greatly expanded
v4.0 Perl, v2.0 Php (coming up)
- Entry editing program
- Done in 3.5: Website spider
- Error help: online troubleshooting pages
- Date restricted searches
- New additions page
Features
- Installation program: Sometimes large scripts are hard to install, especially for a novice.
So I devised an installation program to automatically set up files for use on your server. All you have
to do is upload all files and folders in the RuterSearch archive. Additionally, if you are using a UNIX type
system, you will need to edit the path to perl on the first line of "admin.cgi" and chmod 755 admin.cgi.
Detailed instructions on how to install RuterSearch can be found in the Installation section.
- Administration: Many search engines require you to edit code in order for everything to be
configured just how you want. Since most people are not programmers, this can be an unfriendly task.
Therefore I created an Administration program to handle most of the configuration for you.
- Search syntax: RuterSearch has more syntax features than even Yahoo! Here is the syntax for
searching RuterSearch:
- Phrases: To search for a certain sequence of words (a phrase), set it off with double quotation marks.
To RuterSearch, phrases work much like regular search words.
Example: "Spanish food"
- Required words: To require that a word be in an entry, put a plus sign before the word.
Examples: +Mozilla, +"perl scripts"
- Rejected words: To reject an entry that has a certain word, put a minus sign before the word.
Examples: -Microsoft, -"Bill Gates"
- Section restrictions: Website entries in RuterSearch are divided into sections:
title, description, url, keywords, and author (also date and type). By default, a search is performed on every section of an entry.
To restrict a search to a specific section of an entry, enter the name of the section
or its first letter, followed by the search query in parentheses.
The searchable sections are title, description, url, keywords, and author. You can use different section restrictors together.
Examples: title(Mozilla) or t(Mozilla); url(netscape.com) or u(netscape.com)
- Wildcards: There are two wildcards, "?" and "*", the same as in Windows and Unix/Linux. A question
mark "?" matches a single character. An asterisk "*" matches multiple characters.
Examples: Ruter*, ?uter, *script, ?ozill*
- Summary: All syntax features can be used together to create complex searches. Here
is the order in which you can combine syntax, from outermost to innermost:
- section restrictions
- required and rejected words
- words and phrases
Examples:
- +elections -local keywords(+politic*)
- title(+browser -"Internet Explorer" -IE) Netscape Mozilla
- +perl -C++ title(RuterSoft) author("Weston Ruter")
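To make the syntax rules above concrete, here is a minimal Perl sketch of how such a query could be broken into terms. It is not RuterSearch's actual code: the function names (wildcard_to_regex, parse_terms, parse_query), the %SECTION abbreviation map, and the term structure are all assumptions made for illustration, and the sketch ignores edge cases such as parentheses inside quoted phrases.

```perl
use strict;
use warnings;

# Hypothetical map from one-letter abbreviations to the searchable sections.
my %SECTION = (t => 'title', d => 'description', u => 'url',
               k => 'keywords', a => 'author');

# Turn a wildcard word into an anchored, case-insensitive regex:
# "?" matches one character, "*" matches any run of characters.
sub wildcard_to_regex {
    my ($word) = @_;
    my $pattern = quotemeta $word;
    $pattern =~ s/\\\*/.*/g;    # escaped "*" becomes ".*"
    $pattern =~ s/\\\?/./g;     # escaped "?" becomes "."
    return qr/\A$pattern\z/i;
}

# Split plain query text into terms, tagging each with its section
# (undef means "search every section").
sub parse_terms {
    my ($text, $section) = @_;
    my @terms;
    while ($text =~ /([+-]?)(?:"([^"]*)"|(\S+))/g) {
        my ($sign, $phrase, $word) = ($1, $2, $3);
        push @terms, {
            section => $section,
            mode    => $sign eq '+' ? 'required'
                     : $sign eq '-' ? 'rejected'
                     :                'optional',
            text    => defined $phrase ? $phrase : $word,
        };
    }
    return @terms;
}

# Peel off section restrictions such as title(...), then parse the rest.
sub parse_query {
    my ($query) = @_;
    my @terms;
    while ($query =~ s/\b([a-z]+)\(([^)]*)\)//i) {
        my ($name, $inner) = (lc($1), $2);
        push @terms, parse_terms($inner, $SECTION{$name} || $name);
    }
    push @terms, parse_terms($query, undef);
    return @terms;
}

for my $term (parse_query('+perl -C++ title(RuterSoft) author("Weston Ruter")')) {
    my $section = defined $term->{section} ? $term->{section} : 'any';
    printf "%-8s %-8s %s\n", $section, $term->{mode}, $term->{text};
}
```

In this sketch, section restrictions are extracted first, so their terms come before the unrestricted ones in the returned list. A term's text can then be passed through wildcard_to_regex before it is compared with the words of an entry.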
- Substitutions: One feature that everyone needs is
customization. Every page needs to look exactly how you want, and substitutions allow you to
do that easily and effectively. This is how substitutions work: let's say you want to
customize the search results page. You would open html/searchresults.html in your favorite
HTML editor and design it however you want, inserting substitutions such as _totalentries_
and _totalresults_ (see below for more) where you want their values displayed. When a search is
performed, the search engine opens searchresults.html and replaces each substitution with
its real value. Five pages are customizable in this way: search.html,
searchresults.html, resulttemplate.html, add.html, and additionresults.html. search.html
is the search home page, which is displayed when no search has been made. searchresults.html and
resulttemplate.html are used together: searchresults.html is the page where the results
are displayed, and the code in resulttemplate.html determines what each result looks like. More information about
customizing the look of RuterSearch can be found in the Configuration and Customization section.
All substitutions are case insensitive. Examples: _totalentries_ replaced with 150; _totalresults_ replaced with 12. Here is a list of all substitutions:
Search results page (searchresults.html): where the results are displayed
_totalentries_ | Total number of entries in your database.
_totalresults_ | Total number of results from the search.
_nextpageurl_ | URL of the next page of results. JavaScript alert if at the end.
_nextpage_ | Returns 'true' if there is another page of results, and 'false' if not.
_previouspageurl_ | URL of the previous page of results. JavaScript alert if at the beginning.
_previouspage_ | Returns 'true' if there is a previous page of results, and 'false' if not.
_stylewords_ | Returns 'true' if styling of words was used, and 'false' if not.
_perpage_ | The number of results displayed per page.
_currentpage_ | The page of results that is being displayed.
_searchquerytext_ | Exactly what the user searched for, in plain text. Not recommended for use in HTML tags.
_searchqueryhtml_ | What the user searched for, but with some characters changed to HTML character entities (" becomes &quot; and so on). This should be used most often. Doesn't cause problems with HTML tags.
_searchqueryencoded_ | What the user searched for, but URL encoded. Example: %2BMozilla+does+%22X+M+L%22
_searchtype_ | File type that was searched for, e.g. html.
_searchresults_ | Displays all the results, either in a predefined format or the way you defined in "resulttemplate.html". See administration.
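The difference between _searchquerytext_, _searchqueryhtml_, and _searchqueryencoded_ can be illustrated with a short Perl sketch. The two helper functions below (html_escape and url_encode) are hypothetical and use only core Perl; they show one plausible way to produce the HTML and URL-encoded forms, not necessarily how RuterSearch does it.

```perl
use strict;
use warnings;

# Escape the characters that are special in HTML so the query can be
# placed safely inside markup (the role described for _searchqueryhtml_).
sub html_escape {
    my ($text) = @_;
    my %entity = ('&' => '&amp;', '<' => '&lt;', '>' => '&gt;', '"' => '&quot;');
    $text =~ s/([&<>"])/$entity{$1}/g;
    return $text;
}

# URL-encode the query (the role described for _searchqueryencoded_):
# spaces become "+", and other reserved characters become %XX.
sub url_encode {
    my ($text) = @_;
    $text =~ s/([^A-Za-z0-9_.~ -])/sprintf('%%%02X', ord($1))/ge;
    $text =~ tr/ /+/;
    return $text;
}

my $query = '+Mozilla does "X M L"';
print html_escape($query), "\n";   # +Mozilla does &quot;X M L&quot;
print url_encode($query),  "\n";   # %2BMozilla+does+%22X+M+L%22
```

The plain-text form is the unmodified query, which is why it should stay out of HTML attributes: an unescaped double quote would end the attribute early.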
Result template (resulttemplate.html): template for displaying results. Must be turned on in administration.
_date_ | Date result was added/modified
_type_ | File type of result
_title_ | Title of result
_description_ | Description of result
_url_ | URL of result
_keywords_ | Keywords of result
_author_ | Author of result
Addition page (add.html): where the user inputs the new entry
_totalentries_ | Total number of entries in your database.
Addition results page (additionresults.html): results of addition
_totalentries_ | Total number of entries in your database.
_date_ | Date entry was added/modified
_type_ | File type of entry
_title_ | Title of entry
_description_ | Description of entry
_url_ | URL of entry
_keywords_ | Keywords of entry
_author_ | Author of entry
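As a rough illustration of the substitution mechanism, the following Perl sketch replaces _name_ placeholders in a template string with supplied values, matching names case insensitively as described above. The function name apply_substitutions and the decision to leave unknown placeholders untouched are assumptions for this example, not details taken from RuterSearch itself.

```perl
use strict;
use warnings;

# Replace every _name_ placeholder in a template with its value.
# Matching is case insensitive; unknown placeholders are left as they are.
sub apply_substitutions {
    my ($template, %values) = @_;
    $template =~ s{_([a-z]+)_}{
        exists $values{lc $1} ? $values{lc $1} : "_${1}_"
    }gie;
    return $template;
}

my $page = 'There are _totalentries_ entries; _TotalResults_ matched.';
print apply_substitutions($page, totalentries => 150, totalresults => 12), "\n";
# prints: There are 150 entries; 12 matched.
```

The same function would serve for resulttemplate.html: call it once per result with that result's _title_, _url_, and so on, and join the pieces to form the value of _searchresults_.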
- Styling of matched words: Another option I built in is the styling of matched words.
If turned on (either by the user or by administration), a word that was matched will be highlighted in
the way you specify. To change the style of matched words, look at the class .matchedWord in
the <style> tag of searchresults.html. More information is in the Configuration and Customization section.
- Databases: RuterSearch comes in two versions, one with MySQL support and one without. Both support a Text Delimited Database File, which was the only available database type in earlier versions. Entries can be added and deleted from Administration.
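For readers unfamiliar with text delimited files, here is a small Perl sketch of reading one. Everything specific in it is an assumption: the "|" delimiter, the field order (borrowed from the order of the result template substitutions), the function names, and the made-up sample entry. The actual RuterSearch file layout may differ.

```perl
use strict;
use warnings;

# Assumed field order and delimiter; the real file layout may differ.
my @FIELDS    = qw(date type title description url keywords author);
my $DELIMITER = '|';

# Turn one line of the database file into a hash keyed by section name.
sub parse_entry {
    my ($line) = @_;
    chomp $line;
    my %entry;
    @entry{@FIELDS} = split /\Q$DELIMITER\E/, $line, scalar @FIELDS;
    return \%entry;
}

# Read every entry from an open filehandle, skipping blank lines.
sub read_entries {
    my ($fh) = @_;
    my @entries;
    while (my $line = <$fh>) {
        next unless $line =~ /\S/;
        push @entries, parse_entry($line);
    }
    return @entries;
}

# Demonstration with an in-memory file and a made-up entry.
my $data = "2000-12-01|html|Example Site|A sample page|http://www.example.com/|sample, demo|Jane Doe\n";
open my $fh, '<', \$data or die "Cannot open in-memory file: $!";
my @entries = read_entries($fh);
close $fh;
print "$entries[0]{title} by $entries[0]{author}\n";
```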
- Query record: From Administration, a record of search queries and other information about the user can be viewed by clicking on the link labeled "Search History".
- Addition page: This is designed to simplify the addition of new entries. From Administration, you can password protect access to the addition page with the Administration password.
- Website spider: The website spider parses out all links in an HTML file. It then visits each of those URLs and gets information such as the title, description, keywords, and author. Finally, it adds each qualified entry to the database. From Administration, you can password protect access to the spider with the Administration password.
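The first two steps of the spider, pulling links out of a page and reading its meta information, might look something like the Perl sketch below. It is only an approximation using regular expressions and core Perl; the function names extract_links and meta_content are hypothetical, fetching the pages is left out, and the meta pattern assumes the name attribute comes before content. A production spider would be better served by a real HTML parser.

```perl
use strict;
use warnings;

# Pull the href of every <a> tag out of an HTML string, keeping each
# absolute http/https link once, in order of first appearance.
sub extract_links {
    my ($html) = @_;
    my (%seen, @links);
    while ($html =~ /<a\b[^>]*?\bhref\s*=\s*(["'])(.*?)\1/gis) {
        my $url = $2;
        next unless $url =~ m{^https?://}i;   # skip relative and other links
        push @links, $url unless $seen{$url}++;
    }
    return @links;
}

# Read a <meta name="..." content="..."> value such as description,
# keywords, or author. Returns undef when the tag is missing.
sub meta_content {
    my ($html, $name) = @_;
    return $html =~ /<meta\b[^>]*\bname\s*=\s*["']\Q$name\E["'][^>]*\bcontent\s*=\s*["']([^"']*)["']/is
        ? $1 : undef;
}

my $page = q{<meta name="author" content="Weston Ruter">
<a href="http://www.example.com/">One</a> <a href="/local">Two</a>};
print join("\n", extract_links($page)), "\n";
print meta_content($page, 'author'), "\n";
```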
Installation
As of RuterSearch Perl version 3.5 and RuterSearch Php version 1.5, the installation program has been
integrated into the administration program admin.cgi. Here are instructions for installing RuterSearch:
- Extract the files from the downloaded archive to a temporary folder.
- Upload all files and folders to the desired location on your server, most likely the CGI-BIN for Perl scripts. Usually this is done with an FTP client such as WS_FTP.
- For UNIX type systems, you will need to modify the first line of admin.cgi to reflect your path to perl. Most likely the first line will need to look like one of the following:
#!/usr/bin/perl -w
or
#!/usr/local/bin/perl -w
If these paths do not work, you can type which perl at a Telnet prompt or just ask your system administrator.
- Additionally for UNIX type systems: set admin.cgi's permissions so that it can be executed (Linux/Unix command chmod 755).
Mode 755 gives the owner read, write, and execute permission, and gives everyone else read and execute permission.
Usually your FTP client will have chmod ability built in and you should be able to find it.
If your FTP client doesn't have chmod built in, you can simply type the command directly at a Telnet prompt. An example of this is:
Login: johnsmith
Password: (password hidden)
[johnsmith@245.27.1.120 johnsmith]$ cd http-home/rutersearch
[johnsmith@245.27.1.120 rutersearch]$ chmod 755 admin.cgi
- Now run admin.cgi from your browser, and it will set up all the files that will be used by RuterSearch.
Configuration and Customization
- After you install RuterSearch, you can configure it however you like.
The best way to do this is through the RuterSearch administration program. Log in to
admin.cgi with your password (the default password is password).
- Basically everything that you might need to do to configure RuterSearch, you can do from within
Administration. But customizing the look of the pages is a bit different, as explained below.
Customization
When RuterSearch runs, it reads specific HTML pages that you have created and substitutes
data into the locations on each page that you specify. This is done through the use of
substitutions. By opening search.html (or any page listed below) in a text editor, you can design the search home page
to look exactly how you want. Simply write HTML as you would normally
for your website. When you want to display the total entries in the database, you would insert
_totalentries_ into the HTML page where you want that text displayed. For more information
about substitutions and a list of all substitutions for each HTML page,
see the Substitutions entry in the Features section.
Below is a list of each editable page with descriptions.
- search.html - If someone accesses search.cgi, the search program, without searching for anything, this page is displayed. It is essentially a home page for the search engine.
- searchresults.html - This is the page that displays the results of a search. Design it how you want it to look then insert _searchresults_ where you want to display the results of the user's search. Each returned result can be customized as well, which is done through editing the following file:
- resulttemplate.html - When a search is performed, this file is used as a template for the search engine to display each result. All results are appended to the substitution _searchresults_, which is then displayed in the previous file, searchresults.html.
- add.html - If you have decided to not restrict access to the addition page, this page will be useful if you want to enable users to add websites to the database.
- additionresults.html - This page shows the results of an addition to the database.
To illustrate how to use substitutions, look at the following example which will display a message like "There are 125 entries in the database."
<html>
<head>
<title>My Search Engine</title>
<style type="text/css">
body {
font-family: Arial, Helvetica;
}
</style>
</head>
<body>
Welcome to my search engine. There are _totalentries_ entries in the database.
</body>
</html>
The search results page is a bit more involved. In the <head> of searchresults.html, there is a <style> tag which contains information on how to display certain elements.
The most important style is the .matchedWord class which tells the user's browser how to highlight a matched word (if that option is enabled). The following code in the <style> tag would highlight matched words in yellow.
<style type="text/css">
.matchedWord {
background-color: yellow;
}
</style>
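On the server side, highlighting comes down to wrapping each matched word in an element that carries the matchedWord class. A minimal Perl sketch of that idea follows; the function name style_matched_words and the choice of a <span> element are assumptions, and the text is assumed to be plain text that has already been HTML-escaped. Because it relies on word boundaries, it will not handle terms that end in punctuation, such as C++.

```perl
use strict;
use warnings;

# Wrap each occurrence of the matched words in a span that carries the
# .matchedWord class. Longer words are tried first so that a short word
# does not pre-empt a longer one that starts the same way.
sub style_matched_words {
    my ($text, @words) = @_;
    return $text unless @words;
    my $alternation = join '|',
        map  { quotemeta }
        sort { length($b) <=> length($a) } @words;
    $text =~ s{\b($alternation)\b}{<span class="matchedWord">$1</span>}gi;
    return $text;
}

print style_matched_words('Mozilla and mozilla.org', 'mozilla'), "\n";
```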
The user needs to be able to navigate back and forth through the results that have been found. This navigation can be appended to the _searchresults_ substitution, or you can create your own navigation by using JavaScript and substitutions together. The searchresults.html that comes with RuterSearch uses the JavaScript-created navigation.
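The values behind the paging substitutions (_perpage_, _currentpage_, _previouspage_, and _nextpage_) could be derived roughly as in the Perl sketch below. The function name page_info and the first_index/last_index keys are hypothetical, and the arithmetic is an assumption about how paging would normally work rather than a description of RuterSearch's code.

```perl
use strict;
use warnings;

# Work out the paging values for one page of results. Most keys mirror
# the substitution names; first_index and last_index are zero-based
# positions of the results shown on the current page.
sub page_info {
    my ($total_results, $per_page, $current_page) = @_;
    my $last_page = int(($total_results + $per_page - 1) / $per_page) || 1;
    my $first = ($current_page - 1) * $per_page;
    my $last  = $first + $per_page - 1;
    $last = $total_results - 1 if $last > $total_results - 1;
    return (
        totalresults => $total_results,
        perpage      => $per_page,
        currentpage  => $current_page,
        previouspage => $current_page > 1          ? 'true' : 'false',
        nextpage     => $current_page < $last_page ? 'true' : 'false',
        first_index  => $first,
        last_index   => $last,
    );
}

my %page = page_info(12, 5, 3);   # 12 results, 5 per page, viewing page 3
print "previous: $page{previouspage}, next: $page{nextpage}\n";
```

A JavaScript navigation block in searchresults.html can then test the 'true' and 'false' strings to decide whether to show the previous and next links.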
Knowledge of HTML, preferably CSS, and possibly JavaScript, is required. But do not worry! These are very simple languages to learn. Pick up a book, or visit the W3C for official specifications.
Running RuterSearch
- To run a part of RuterSearch, do not open the HTML files, because they are only templates.
Rather, you need to run the CGI files, which are the program files. The following are all
the files that can be executed:
- search.cgi - Without any query, this will display the main search page (search.html).
When a search query is made, searchresults.html is displayed with results displayed in the form of
resulttemplate.html.
- add.cgi - Run this program to add entries to the database. Without any query, add.html is displayed.
When an addition is made, additionresults.html is displayed.
- add.cgi - This is another way to add entries to the database. By supplying a URL, the spider will
visit that page and add each link to the database. The spider design cannot be customized as add.cgi and search.cgi can.
- admin.cgi - This allows you to make changes to RuterSearch.
- config.cgi - This is not a program, but rather it is the configuration file. It ends in the extension "cgi" to prevent it from being viewed by people on the web.
Troubleshooting
Error Code | Description and Remedy
000 | Unable to open configuration file. Either run administration (admin.cgi) or reinstall the program.
001 | The database file cannot be found. Please verify that you have the correct filename entered in RuterSearch Administration.
002 | The database type is invalid. Please reset the database type to "text" or "MySQL" in Administration.
003 | The search page (search.html) cannot be found. Please make sure that there is a file named search.html in the "html" folder. If not, you can extract a new one from your RuterSearch distribution archive.
004 | The result template file 'html/resulttemplate.html' could not be found. Make sure that it is in the folder 'html', or you may need to reinstall.
005 | Was not able to add to 'queryrecord.txt'. You may need to set the permissions on 'queryrecord.txt'. Make sure the permissions allow read and write (chmod 666).
006 | Could not open search results file 'html/searchresults.html'. Make sure that the file is there. If it isn't, you should upload another.
007 | Unable to open the file 'html/add.html'. Make sure that the file is there. If it isn't, you should upload another.
008 | Unable to open the file 'html/additionresults.html'. Make sure that the file is there. If it isn't, you should upload another.