Many sites are now generated dynamically from databases, including ecommerce catalog sites, online news, and even entertainment sites. These databases generally have their own search functions, which may appear to take the place of a full-text search engine. But that's not always the case. Database search is not oriented towards text search and relevance ranking: they are great for locating widgets by part number and listing the inventory of leather slippers, but not so good at helping site visitors find the articles on widget quality or comparing leather and fleece slippers. Text search engines are often the right solution, even for database-backed sites.
Database search vs. text search
Searching Many Fields and Tables
Databases store their information neatly organized into fields, such as product name, category, description, price and so on. However most people don't like to choose a field before searching: who knows whether widget is in the name, category, or description field? While databases can set up complex queries to find the search words in all applicable fields, this makes them slower to respond, requires more memory, and is more difficult to program. Text search engines store this information in one index and can find words in any field for a record. Many high-end search engines can also store field information, so searches can be limited to a specific field as well.
Some databases, such as MySQL, are limited to "Or" searches -- they will return all records which match any of the search terms. Others default to searching on exact words or phrases in most databases: in these cases, searchers must type in complex Boolean or SQL (Structured Query Language) commands for other cases.
While some programmers and librarians enjoy the control that query languages give them, most people do not. This is particularly important for searching multiple words: if someone types in brown bear, they probably want to see records with bear, brown in them as well, without typing bear AND brown! Search logs show clearly that most people will not do anything complicated while searching -- they will give up if the search is too hard to use. Full-text search engines offer simple and flexible search options, with most providing an Advanced Search feature for power users.
Flexible Search Morphology
Database search functions tend to look for exact matches in capitalization and characters. If someone searches for pokemon, they won't find records with Pokémon in them. Many text search engines will automatically match lowercase searches with any text, and will adjust for extended and diacritical characters. Some also allow search administrators to set up synonyms for searching (doctor and physician), or automatically perform stemming (find octopi when searching for octopus), fuzzy matching (find colour when searching for color), and sound-alike matching (find Dr. Koop if someone searches for Dr. Coop).
Some databases even return a list of everything they have if a user enters an empty search string -- this requires significant server processing time. Search engines tend to just report that they found no matches.
Many databases are not designed for easy searching. In a wine database, for example, it may be difficult to find wines from a certain region, such as the Napa valley of California, because the location is stored in a table far away from the wine names. Search engines, such as Ripfire, AltaVista, and Inktomi, which bundle related information during indexing, provide a rich data set for searching. Mercado's Intuifind provides options to entirely rearrange a database structure, if that's what's necessary for searching.
Response Time and Database Resources
Databases are optimized to search for exact words and phrases, and they tend to respond very slowly otherwise. So if a searcher wants to find sheepskin or shearling in the same search, databases will tend to do two searches and then merge the results. Full text indexes are designed to store these words together, so a search engine can perform these kinds of searches efficiently and return quickly.
In addition to the time, searching databases requires additional back-end and server resources. By storing a search index on a separate server and searching that rather than the live data, a text search engine can perform queries without additional demands on the database itself.
Results Per Page
Many database search engines will happily display all results on the same page, whether there are 8 or 8,000 records. Text search engines have a mechanism for dividing up the results and providing navigation from the first page to following pages (and back).
Sorting by Relevance
Text search engines sort by relevance, as determined by the number and location of matched words in the result page or record. Database search functions sort by size, or price, or date, or the order in which the items were entered in the database!
Many text search engines can sort results by date as an option, and a few of them can sort by price, size, geographic location, etc.
To add a text search engine to a database-generated site, the engine can connect to the database directly, using SQL, ODBC or JDBC. Or it may crawl the pages generated, viewing the pages as a browser would, in which case it needs a listing of the pages or a site map with links to follow. The indexer should be set to ignore navigation and other non-product data, and update frequently, preferably with a list of new URLs rather than a full crawl.
Search Engines with Database Interfaces
- Albert web - interfaces to Oracle, Sybase, mySQL, PostgreSQL, Informix, MS-SQL Server, ODBC and dBase formats
- AltaVista Search - JDBC, turnkey and SDK versions
- Aurweb combines database and free-text search.
- EasyAsk - All ODBC Data Sources; native SQL for Oracle, Sybase, DB2, Informix and SQL Server; supports Flat Files and Database Views
- Excalibur RetrievalWare and Web Express - native bridges to Oracle, Sybase, Informix and MS SQL, ODBC bridge
- dtSearch Web - ODBC
- interMedia (Oracle text search engine)
- Isys:web - ODBC
- Lotus Domino Search - direct connection Notes databases, ODBC
- Mercado IntuiFind - ODBC, JDBC and more
- MondoSearch - Microsoft Commerce Server partner
- MPS Information Server - various external parsers
- Muscat - Oracle gateway
- ReachCast ReachSite - Oracle 8i integration
- Verity - ODBC, Sybase has a special API
- Thunderstone Webinator and Texis RDBMS - text search integrated into high-end database software