The Professional Development Institute
Harvard University Global System
Harvard® Planner Group

Strategic Intelligence Briefs
Search Engines: How Do They Work?

These notes complement Alain Paul Martin's book Harnessing the Power of Intelligence,
Counterintelligence and Surprise Events
published in 2002 by

Most search engines are a hybrid of at least three expert systems: a gatherer, an indexer and an extractor.(1)

    Electronic Gatherers or Crawlers

    The gatherer visits websites and scans them for content, links, meta tags and images. It electronically fingerprints each information object, then excludes duplicates and objects from sites that are ineligible by policy. With millions of sites to visit, gatherers work in parallel to speed up delivery. They are also called spiders, crawlers or knowledge robots. It takes several days for a search engine to identify new links and web pages.
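The fingerprinting and exclusion steps described above can be sketched as follows. This is a minimal illustration, not how any particular search engine works: it assumes a fingerprint is a SHA-256 digest of normalized text, and the names `INELIGIBLE_HOSTS`, `fingerprint` and `gather`, as well as the sample URLs, are hypothetical.

```python
import hashlib
from urllib.parse import urlparse

# Hypothetical policy list: hosts the gatherer must skip.
INELIGIBLE_HOSTS = {"blocked.example"}


def fingerprint(content: str) -> str:
    """Return a SHA-256 digest of lowercased, whitespace-normalized content."""
    normalized = " ".join(content.lower().split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def gather(pages):
    """Keep one copy of each distinct object from eligible sites.

    `pages` is an iterable of (url, content) pairs standing in for fetched pages.
    """
    seen = set()   # fingerprints already collected
    kept = []      # (url, fingerprint) pairs passed on to the indexer
    for url, content in pages:
        if urlparse(url).hostname in INELIGIBLE_HOSTS:
            continue  # excluded by policy
        digest = fingerprint(content)
        if digest in seen:
            continue  # duplicate of an object gathered earlier
        seen.add(digest)
        kept.append((url, digest))
    return kept


pages = [
    ("http://a.example/1", "Search engines index the web."),
    ("http://b.example/copy", "search  engines index the WEB."),
    ("http://blocked.example/x", "Unique but ineligible."),
    ("http://a.example/2", "A different page."),
]
kept = gather(pages)
print([url for url, _ in kept])
```

An exact hash only catches copies that are identical after normalization. Detecting near-duplicates would need a different technique, which this sketch does not attempt.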

    Electronic Indexers

    The indexer takes over to classify content and map it to context, creating a database called a catalog or a knowledge repository. It cross-references content and other objects by keyword, title, format, source, date and other attributes specific to the search engine. Some indexers connect related objects using background knowledge such as synonyms, clusters such as the SIC (Standard Industrial Classification) codes, or drill-down hierarchical lexicons. In addition to these cross-references, smart engines also map content to subject matter, company, geographic location, people or other contexts and fuzzy sets. This clustering can create rich content/context relationships. Indexers organize and store the product in the search-engine databases. They also work in parallel. They use the frequency of past searches collected by extractors to create new pointers, and they build redundancy into high-density search areas to alleviate search-engine traffic jams. Note that creating an index is a complex task even for humans, whose experience and educational background influence the keywords, phrases and topics they choose to index. Electronic indexers play it safe on keywords, but have a long way to go on subject-matter cataloguing.
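As a rough illustration of cross-referencing by keyword and by attribute, the sketch below builds a small inverted index. It is an assumption-laden toy: the `SYNONYMS` table, the document fields (`title`, `body`, `source`, `year`) and the function names are all hypothetical, and real catalogs are far richer.

```python
import re
from collections import defaultdict

# Hypothetical background knowledge: each variant maps to a canonical keyword.
SYNONYMS = {"automobile": "car", "cars": "car"}


def tokenize(text):
    """Split text into lowercase alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(documents):
    """Map (attribute, value) pairs to the set of document ids that carry them."""
    index = defaultdict(set)
    for doc_id, doc in documents.items():
        for word in tokenize(doc["title"] + " " + doc["body"]):
            # Synonyms collapse onto one keyword so related objects connect.
            index[("keyword", SYNONYMS.get(word, word))].add(doc_id)
        index[("source", doc["source"])].add(doc_id)
        index[("year", doc["year"])].add(doc_id)
    return index


documents = {
    1: {"title": "Car safety", "body": "Ratings for new cars",
        "source": "a.example", "year": 2001},
    2: {"title": "Automobile sales", "body": "Quarterly figures",
        "source": "b.example", "year": 2002},
}
index = build_index(documents)
print(sorted(index[("keyword", "car")]))
```

Keying the index on (attribute, value) pairs keeps keywords, sources and dates in one structure, so a query can intersect any combination of them with ordinary set operations.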

    Extractors and Knowledge Brokers

    Sometimes called knowledge brokers, extractors interact with users. They validate requests to signal potential errors (typos, dates, Boolean operators), suggest correct wording, then search the index and match the query with the right content. They rank the results and report a summary of the findings along with URLs and links for source tracing and detailed browsing. Smart extractors keep track of search patterns, compositions, frequencies, hit rates and other statistics to optimize the overall performance of the search engine.
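The validate, match and rank steps above might look like the following sketch. It assumes a toy catalog of term counts per document and uses the standard-library `difflib.get_close_matches` for spelling suggestions. The names `CATALOG`, `validate` and `search`, and the scoring rule (sum of term counts), are illustrative assumptions rather than a description of any real engine.

```python
import difflib

# Toy catalog standing in for the indexer's output: keyword -> {doc_id: term count}.
CATALOG = {
    "search": {"doc1": 3, "doc2": 1},
    "engine": {"doc1": 2, "doc3": 1},
    "crawler": {"doc2": 4},
}


def validate(query):
    """Split a query into indexed terms and suggestions for unknown terms."""
    known, suggestions = [], {}
    for term in query.lower().split():
        if term in CATALOG:
            known.append(term)
        else:
            close = difflib.get_close_matches(term, list(CATALOG), n=1)
            if close:
                suggestions[term] = close[0]  # likely typo, propose a correction
    return known, suggestions


def search(query):
    """Return documents ranked by summed term counts, plus any suggestions."""
    known, suggestions = validate(query)
    scores = {}
    for term in known:
        for doc_id, count in CATALOG[term].items():
            scores[doc_id] = scores.get(doc_id, 0) + count
    # Highest score first; ties are broken by document id for stable output.
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return ranked, suggestions


results, suggestions = search("search engin")
print(results, suggestions)
```

Here the misspelled term is not searched automatically. It is only reported back as a suggestion, which mirrors the idea of signalling a potential error to the user before matching.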



1. Words in plain English like gatherer (or electronic gatherer) are easier to understand and recall than exotic terms like spider, crawler, scooter, bot or search robot. Likewise, the term extractor is more descriptive than the academic phrase "knowledge broker."

USA  Cambridge, MA, USA.  1-800-HARVARD or +1-819-772-7777
Monday through Thursday: 9 AM to 4:00 PM, Eastern Time. Voicemail: 24 hours 7 days
Canada  Ottawa, ON, CANADA
International  Worldwide Order Center & Main Training Campus: 70 Technology Boulevard
Gatineau, QC J8Z 3H8 CANADA 1-800-HARVARD or +1-819-772-7777
Harvard Planners and Harvard management tools in France: Démarche Harvard University Global System. European Distribution Centre for Harvard Planners: WH Smith, 248, rue de Rivoli, Paris, 75001
Dorothée Ben Tahar: +33 1 44 77 88 99 Extension 1 (Stationery). Concorde Métro Station.
