A search engine is a program that accepts a query from a user and returns one or more pages that best correspond to the query. Broadly, the search process involves a sequence of four steps: query refinement, index search, match evaluation, and presentation of results. Each of these steps itself comprises smaller steps. Ideally, a search presents all results that match a query, and only those results.
Because much of the content at this web site is dynamic, a full-text search would be impractical. Queries are therefore matched against a static representation of site content, referred to as the site index. At present, this index is developed manually with some automatic assistance.
The step of query refinement attempts to improve a query before matching it to content. Query refinement involves the following steps:
Deleting stoplist words - A stoplist is a list of words that are so common in a language that they are usually considered the least meaningful words in content. These high-frequency, low-content words can be deleted from a query except where they occur in a phrase. (They are retained in a phrase because the phrase is meaningful as a unit, so removing any stopword from it can destroy the meaning of the phrase as a whole.)
Stemming of plurals - Stemming replaces the plural form of a word with its singular form.
Substituting with preferred words - Substituting replaces all forms of a word with a standard form.
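Synonymizing words - Synonymy replaces words having the same meaning with a single representative word.
Hypernymizing words - Hypernymy replaces words that share a hypernym (a more general term) with that hypernym. For example, "car" and "truck" would both be replaced with "vehicle".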
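As a rough illustration, the refinement steps listed above could be chained as in the following Python sketch. The word lists, the function names, and the convention that a phrase is written in double quotes are all assumptions made for this example; none of them is taken from the site's actual implementation. For brevity the sketch applies every step in one pass.

```python
import re

# Hypothetical word lists for illustration only; not the site's real data.
STOPLIST = {"a", "an", "the", "of", "to", "in", "for", "is"}
PREFERRED = {"colour": "color", "e-mail": "email"}
SYNONYMS = {"automobile": "car", "picture": "photo"}
HYPERNYMS = {"car": "vehicle", "truck": "vehicle"}


def tokenize(query):
    """Split a query into double-quoted phrases (kept whole) and single words."""
    pairs = re.findall(r'"([^"]+)"|(\S+)', query.lower())
    return [phrase or word for phrase, word in pairs]


def is_phrase(token):
    # Assumption: any token containing a space came from a quoted phrase.
    return " " in token


def drop_stopwords(tokens):
    """Delete stoplist words, but leave phrases untouched."""
    return [t for t in tokens if is_phrase(t) or t not in STOPLIST]


def stem_plural(token):
    """Naive plural stripping; a real stemmer covers many more cases."""
    if is_phrase(token):
        return token
    if token.endswith("ies") and len(token) > 4:
        return token[:-3] + "y"
    if token.endswith("s") and not token.endswith("ss") and len(token) > 3:
        return token[:-1]
    return token


def replace_from(table, tokens):
    """Replace each token that has an entry in the table."""
    return [table.get(t, t) for t in tokens]


def refine(query):
    """Apply the refinement steps in the order the list above gives them."""
    tokens = drop_stopwords(tokenize(query))
    tokens = [stem_plural(t) for t in tokens]
    for table in (PREFERRED, SYNONYMS, HYPERNYMS):
        tokens = replace_from(table, tokens)
    return tokens


print(refine("the colour of automobiles"))
print(refine('"to be or not to be" pictures'))
```

With these example tables, the first call prints ['color', 'vehicle'] and the second prints ['to be or not to be', 'photo']: the quoted phrase keeps its stopwords, while the loose words are stemmed and replaced.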
The index search step matches a refined query to the site index as efficiently as possible. This search engine interleaves the query refinement and index search steps: one refinement step is performed, and then a search is run. If the search succeeds, the results are evaluated. If it fails, the next refinement step is performed and the search is repeated.
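The refine-then-search loop described above might look like the following sketch. It assumes, purely for illustration, that the site index is a dictionary mapping a keyword to the set of pages listed under it, and that a page counts as a match when it is listed under at least one query term. The index contents, the refinement steps, and all names are hypothetical.

```python
def lookup(index, terms):
    """Return every page the index lists under at least one term."""
    pages = set()
    for term in terms:
        pages |= index.get(term, set())
    return pages


def search_with_refinement(terms, index, steps):
    """Apply one refinement step, search, and stop at the first success."""
    terms = list(terms)
    for step in steps:
        terms = step(terms)
        pages = lookup(index, terms)
        if pages:
            return terms, pages
    return terms, set()  # every step was tried and nothing matched


# Hypothetical index and refinement steps, for illustration only.
SITE_INDEX = {"vehicle": {"/transport.html"}, "color": {"/design.html"}}
STEPS = [
    lambda ts: [t for t in ts if t not in {"the", "of"}],  # stoplist
    lambda ts: [{"cars": "car"}.get(t, t) for t in ts],    # plural stemming
    lambda ts: [{"car": "vehicle"}.get(t, t) for t in ts], # hypernyms
]

print(search_with_refinement(["the", "color"], SITE_INDEX, STEPS))
print(search_with_refinement(["the", "cars"], SITE_INDEX, STEPS))
```

In this sketch the first query succeeds after the stoplist step alone, so later steps never run. The second query only matches after the hypernym step has turned "cars" into "vehicle".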
The match evaluation step ranks the results of an index search. If the search yields only one result, no evaluation is needed. If it yields more than one, the results are scored and the highest-scoring result is presented. If several results tie for the highest score, they are presented together.
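The evaluation rules above can be sketched as follows. The text does not say how results are scored, so this example assumes a simple stand-in: a page's score is the number of query terms under which the index lists it. The index layout (keyword to set of pages) and all names are likewise assumptions.

```python
def score(page, terms, index):
    """Hypothetical score: how many query terms the page is indexed under."""
    return sum(1 for t in terms if page in index.get(t, set()))


def evaluate(pages, terms, index):
    """Return the best page, or all pages tied for the highest score."""
    pages = sorted(pages)
    if len(pages) <= 1:
        return pages  # a single result needs no evaluation
    scores = {p: score(p, terms, index) for p in pages}
    top = max(scores.values())
    return [p for p in pages if scores[p] == top]


# Hypothetical index, for illustration only.
INDEX = {"color": {"/a", "/b"}, "vehicle": {"/a"}, "paint": {"/b"}}

print(evaluate({"/a", "/b"}, ["color", "vehicle"], INDEX))
print(evaluate({"/a", "/b"}, ["color"], INDEX))
```

Here the first call prints ['/a'], because "/a" matches both terms and "/b" only one. The second prints ['/a', '/b'], because the two pages tie and are returned together.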
The result presentation step decides the most useful way to show the user what content was found. At this time, when more than one match occurs, only the best match is returned to the user, by displaying the corresponding page. If no match occurs, the query is sent to Google and the first page of Google's results is displayed to the user.
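A minimal sketch of this fallback, assuming the presentation step only needs to decide which address to display: it returns the best-matching page when there is one, and otherwise builds a Google search URL for the original query. The function name and the use of None for "no match" are assumptions for this example; how the site actually forwards the query to Google is not described here.

```python
from urllib.parse import urlencode


def destination(best_page, query):
    """Pick the address to show: the best match, or a Google search."""
    if best_page is not None:
        return best_page
    # No match in the site index: hand the original query to Google.
    return "https://www.google.com/search?" + urlencode({"q": query})


print(destination("/design.html", "color"))
print(destination(None, "site search"))
```

The second call prints https://www.google.com/search?q=site+search, since urlencode escapes the space in the query as a plus sign.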