Colin J. Ansell
Internet Marketing Skills
Second Generation Searching on the Web
Updated: March 2006
This tutorial covers some of the newer search engine services on the Web. It includes a group of search services that make use of technology that organizes search results by peer ranking, or clusters results by concept, site or domain. This is in contrast to the more long-standing method of term relevancy ranking. This newer type of ranking often looks at "off the page" information to determine the retrieval and order of your search results. Search engines that employ this alternative may be thought of as second generation search services. For example:
- Google ranks by the number of links from pages ranked high by the service
- Guidebeam organizes results by keyword and/or concept
Here are a few of the trends to watch with second-generation services:
- The human element: concept processing. Second generation services such as Ask Jeeves and SurfWax apply different kinds of concept processing to a search statement to determine the probable intent of a search. This is often accomplished by the use of human-generated indexes. With these services, the burden of coming up with precise or extensive terminology is shifted from the user to the engine. In effect, these services take on the role of a thesaurus.
- The human element: horizontal presentation of results. Most search tools return results in one long, vertical list. In contrast to this, there is a group of search tools that use concept processing to return results in a horizontal organization. With these tools, you can first review concept categories retrieved by your search before examining the results within particular categories. This can make it easier to zero in on the aspects of your topic that interest you. Examples of these tools include Guidebeam, Query Server, and Vivisimo.
- The human element: peer ranking. Search services such as Google and Teoma derive their results from the behavior and judgment of millions of Web users.
- The human element: directories. First generation search services have gotten into the act by partnering with second generation services and/or by including content from human-gathered directories in their search results, to supplement documents retrieved from the spider-indexed Web. Examples include AltaVista, Lycos and many others.
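As a rough illustration of the concept processing trend described above, the sketch below expands each word of a search statement with alternatives from a small hand-made thesaurus, so the user does not have to supply every term. The thesaurus entries and the function name `expand_query` are hypothetical; this is not how Ask Jeeves or SurfWax is known to work internally.

```python
# Hypothetical, hand-built thesaurus standing in for a human-generated index.
THESAURUS = {
    "car": ["automobile", "vehicle"],
    "movie": ["film", "motion picture"],
}


def expand_query(query, thesaurus=THESAURUS):
    """Return one list of alternatives per query word.

    Words without a thesaurus entry are kept on their own.
    """
    expanded = []
    for word in query.lower().split():
        expanded.append([word] + thesaurus.get(word, []))
    return expanded


for alternatives in expand_query("car insurance"):
    print(" OR ".join(alternatives))
```

Each printed line is one group of interchangeable terms that an engine could search on the user's behalf.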
For a tutorial covering the more basic aspects of Web search engines, including first generation search engines, see Searching the Internet: Recommended Sites and Search Techniques.
Search engines covered in this tutorial
- Google
- Guidebeam
- SurfWax
- Ixquick
- TracerLock
Google
Exercise: Retrieving results by link ranking using Google.
Google - http://www.google.com/
Google ranks results by the number of links from pages ranked high by the service. The more highly ranked pages that link to a certain page, the higher the linked-to page will be ranked by Google. This unique ranking system can be quite effective.
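The idea that a link counts for more when it comes from a highly ranked page can be sketched with a simplified, PageRank-style iteration. This is an illustration under stated assumptions, not Google's actual ranking code: the function name `link_rank`, the damping value and the toy set of pages are all hypothetical.

```python
def link_rank(links, iterations=50, damping=0.85):
    """Score pages so that links from highly scored pages count for more.

    `links` maps each page to the list of pages it links to.
    """
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    n = len(pages)
    scores = {page: 1.0 / n for page in pages}

    for _ in range(iterations):
        # Every page starts each round with a small baseline score.
        new_scores = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page passes its score on, split evenly among its links.
                share = damping * scores[page] / len(targets)
                for target in targets:
                    new_scores[target] += share
            else:
                # A page with no outgoing links spreads its score evenly.
                share = damping * scores[page] / n
                for other in pages:
                    new_scores[other] += share
        scores = new_scores
    return scores


# Hypothetical toy web: three pages link to "hub", one also links to "lonely".
toy_web = {
    "a": ["hub"],
    "b": ["hub"],
    "c": ["hub", "lonely"],
    "hub": ["a", "b"],
}
for page, score in sorted(link_rank(toy_web).items(), key=lambda kv: -kv[1]):
    print(f"{page}: {score:.3f}")
```

In this toy example "hub" collects the most links and so ends up with the highest score, and the pages it links to benefit in turn.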
When to use Google? When you are looking for a specific site or targeted topic and want to take advantage of Google's excellent page ranking technology.
Special Features:
- Returns results ranked by the number of links from pages ranked high by the service; high ranking pages are also determined by the number of links to them
- In determining relevancy ranking, the engine also looks at various textual clues including linking text
- Search results include sites from the Open Directory Project, offering an interesting mix of sites from the wider Web and those chosen by editors for inclusion into the directory. See also Google's own version, the Google Web Directory.
- Requires no syntax: simply type keywords and Google defaults to the Boolean AND with term proximity
- OR searching is supported if "OR" is typed in CAPS, e.g., university OR college; works only with multiple single words
- Attempts to return results in which multiple query words are in close proximity within the source document
- For more refined searches, use quotations for phrases ("El Nino") or a minus sign (-) for the Boolean NOT
- Engine does not stem words; it searches on your word form exactly as it is typed
- Results include the text from the source document that matches your query
- The "I'm Feeling Lucky" option returns the top-ranked source for a query
- Offers searching of Web pages in a number of languages; the Google site itself can also be set to display its tips and instructions in a different language
- Offers a spell check operation. Example: spell:priviledge
- Displays links to news headlines when they are relevant to a search
- Searches the deep Web for such information as:
  - Files in Portable Document Format, Microsoft Word, Excel, and PowerPoint, Rich Text Format and PostScript
  - Images, from the Advanced Web Search interface or from Google Image Search
  - Maps from Yahoo! or MapBlast (enter an address)
  - Phone book entries (enter first and last name, and city or zip)
  - Stock prices (enter a company's ticker symbol)
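The query syntax listed above (implied AND, capitalized OR, quoted phrases, a minus sign for NOT, and no stemming) can be sketched as a small matcher. This is only an illustration of the rules as described here, not Google's implementation; the names `parse_query` and `matches` are hypothetical, and matching is made case-insensitive as a simplifying assumption.

```python
import re

TOKEN = re.compile(r'-?"[^"]+"|\S+')   # quoted phrases or single words
WORD = re.compile(r"[\w']+")


def parse_query(query):
    """Return (required, excluded).

    `required` is a list of groups; each group lists alternatives joined
    by OR. `excluded` lists terms that must not appear.
    """
    required, excluded = [], []
    tokens = TOKEN.findall(query)
    i = 0
    while i < len(tokens):
        token = tokens[i]
        if token == "OR" and required and i + 1 < len(tokens):
            # Capitalized OR joins the previous term with the next one.
            required[-1].append(tokens[i + 1].strip('"').lower())
            i += 2
            continue
        if token.startswith("-") and len(token) > 1:
            excluded.append(token[1:].strip('"').lower())
        else:
            required.append([token.strip('"').lower()])
        i += 1
    return required, excluded


def contains(text_words, term):
    """True if the words of `term` appear consecutively, exactly as typed."""
    term_words = WORD.findall(term)
    span = len(term_words)
    if span == 0:
        return False
    return any(text_words[k:k + span] == term_words
               for k in range(len(text_words) - span + 1))


def matches(query, text):
    words = WORD.findall(text.lower())
    required, excluded = parse_query(query)
    if any(contains(words, term) for term in excluded):
        return False
    # Implied AND: every group needs at least one matching alternative.
    return all(any(contains(words, alt) for alt in group) for group in required)


print(matches("Nixon resignation", "Nixon announced his resignation in 1974"))
print(matches('"El Nino" -surfing', "El Nino surfing forecast"))
```

Because whole words are compared, a query for one word form does not match a different form, which mirrors the no-stemming behavior noted in the list.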
Drawbacks:
- New Web pages will not appear in your results, as it takes time for the creators of other Web pages to link to new resources, and for this activity to be reflected at Google
Query: I'd like to learn more about Richard Nixon's resignation.
Search:
- Type: Nixon resignation [Google defaults to Boolean AND logic]
- Examine results for relevancy
- Note the related categories from the Google Web Directory listed at the top of the results screen
If you like Google, try Teoma. This search engine also uses link ranking to determine relevance. The difference: Teoma looks for links on pages in subject-specific "communities" that are about or related to the same subject. This is called Subject-Specific Popularity. It is interesting to run the same search on Google and Teoma and compare the differences.
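To make the contrast concrete, the sketch below counts only those links that come from pages in the same subject community, which is one simple reading of Subject-Specific Popularity. It is an assumption-laden illustration, not Teoma's actual method: the function name, the page names and the topic labels are all hypothetical.

```python
def subject_specific_popularity(links, topics, subject):
    """Count, for each page about `subject`, the links it receives
    from other pages about the same subject.

    `links` maps a page to the pages it links to.
    `topics` maps a page to its (single) subject label.
    """
    scores = {page: 0 for page, topic in topics.items() if topic == subject}
    for source, targets in links.items():
        if topics.get(source) != subject:
            continue  # links from outside the community are ignored
        for target in targets:
            if target in scores and target != source:
                scores[target] += 1
    return scores


# Hypothetical pages and subject labels.
topics = {
    "nixon-bio": "history",
    "watergate": "history",
    "archive": "history",
    "recipes": "cooking",
}
links = {
    "watergate": ["nixon-bio"],
    "archive": ["nixon-bio", "watergate"],
    "recipes": ["nixon-bio", "watergate"],
}
print(subject_specific_popularity(links, topics, "history"))
```

Here the links from the "recipes" page add nothing to the history scores, whereas a plain link count would include them.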
Guidebeam
Exercise: Grouping of results into concept folders with Guidebeam
Concept grouping engines offer results in a horizontal layout. This means that you can first review concept categories retrieved by your search before examining the results within a particular category. This is in contrast to the more common vertical layout of results, in which you are presented with one long list. In this case, you need to examine each site one by one to determine if it relates to the aspects of the topic that interest you.
Northern Light had led the pack in this category of search tools, but its free engine has been discontinued. Other tools of this type include:
- Guidebeam, which returns a daunting number of categories
- Query Server, a meta engine that also searches certain deep Web sources
- Vivisimo, a meta engine that returns a list of categories that are further organized into subtopics
In this tutorial, we will discuss Guidebeam.
Guidebeam - http://guidebeam.com/
Guidebeam organizes results into categories that represent concepts and/or types of sites. With this type of arrangement, you can ignore the categories that are irrelevant and choose those that fit your query best. This may be more convenient than working through one master list of results.
When to use Guidebeam? When you are researching any topic and want an organized grouping of component subtopics and document sources, e.g., keywords or a particular site related to your topic.
Special Features:
- Sorts search results into categories based on keywords
- Within categories, a new group of categories is generated that represents narrower concepts
- Green categories are at the final level of detail; selecting them will take you directly to the search results
- Takes its results from another search engine index rather than employing its own crawler and index
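A minimal sketch of keyword-based grouping follows: results that share a keyword, other than the query words themselves, are placed in the same category. It is a simplified illustration and not Guidebeam's actual method; the function name `group_by_keyword`, the stopword list and the sample titles are hypothetical.

```python
from collections import defaultdict

# Small hypothetical stopword list; a real service would use a larger one.
STOPWORDS = {"a", "an", "and", "the", "of", "in", "for", "to", "on"}


def group_by_keyword(query, titles, min_size=2):
    """Group result titles under keywords they share.

    Query words and stopwords are not used as category names, and a
    category is kept only if it holds at least `min_size` titles.
    """
    query_words = set(query.lower().split())
    groups = defaultdict(list)
    for title in titles:
        keywords = {word.strip(".,:;!?").lower() for word in title.split()}
        for word in sorted(keywords - query_words - STOPWORDS):
            if word:
                groups[word].append(title)
    return {word: found for word, found in groups.items()
            if len(found) >= min_size}


# Hypothetical result titles for the query used in the exercise.
titles = [
    "Housing discrimination law overview",
    "Fair housing law and enforcement",
    "Housing discrimination complaints in rental markets",
    "Rental discrimination complaints: how to file",
]
for category, members in group_by_keyword("housing discrimination", titles).items():
    print(category, "->", len(members), "results")
```

Running the same function again on the titles inside one category would give the narrower, second-level categories described in the list.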
Drawbacks:
- Sometimes changes the source from which it derives its results; lately has been using Yahoo!, a directory with inadequate quality control for researchers
- Categories are not always well organized
- The number of categories can be overwhelming for certain topics
Query: I'd like to learn more about housing discrimination.
Search:
- Type: "housing discrimination"
- Note the green categories at the top of the page. If you select one of these, you will be taken directly to the search results.
- Select one of the blue categories near the bottom of the page. Note that a new group of categories is generated, based on the original topic.
SurfWax
Exercise: Concept searching with SurfWax
SurfWax - http://www.surfwax.com/
SurfWax is a meta engine that offers options to see a quick view of the content of sites in your search results list, along with search terms to broaden or narrow a subsequent search. It has a somewhat busy interface, but it offers much to the user that is worth exploring.
When to use SurfWax? When you are looking for a specific site or targeted topic and want help choosing search terms as well as a content summary of sites retrieved from your search.
Special Features:
- Offers "SiteSnaps" that display summaries of retrieved sites including Author Description, Key Points, Emphasis and FocusWords
- FocusWords may be chosen to be added to your Personal Searcher for a future search [Note: you must have Preferences set to turn on personalization]
- Focus feature may be applied to your search terms, allowing you to choose broader or narrower search terms to apply to subsequent searches
- Various personalization options are available
Drawbacks:
- Site has a learning curve for first-time users
Query: I'm interested in learning about discrimination.
Search:
- Type: discrimination
- Choose a site from your results list by clicking on the magnifying glass icon
- Explore the information on the right side of the screen. Note the list of Focus Words.
- Choose a Focus Word that you would like added to your search. Click on the word. Notice that it has been added to your search box.
- Click on the Search button to run a new search
There is another way to get additional terms into your search box:
- "Focus" your original search term. Click on the small arrow icon next to "Focus: discrimination" located underneath the search window. A list of related terms will appear. Clicking on subsequent arrow icons will focus the chosen term.
- Explore a term that interests you. Click on a term to add it to your search statement.
- Click on the Search button to run a new search
Go to SurfWax to try this search.
Ixquick
Exercise: Tapping into the ranking schemes of several engines with Ixquick
Ixquick - http://ixquick.com/
Ixquick is a meta search engine that searches multiple engines and directories and returns only those documents that appear in the top 10 results of at least one of them.
When to use Ixquick? When you are looking for a specific site or targeted topic and want to see the most relevant results as ranked by multiple engines and directories on the Web.
Special Features:
- Returns the most relevant results as ranked in the top 10 by a number of individual sources
- Uses a "star" system whereby the number of stars indicates the number of sites ranking each result in the top 10
- Shows the sources that have ranked the page and the placement within the top 10 list, e.g., Google (1)
- Offers a variety of search options including full Boolean, implied Boolean, natural language search, truncation, case sensitivity and field searching; Ixquick sends your query to the engines that support these options
- Also searches for news, MP3 music files and pictures
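The star system can be sketched as a merge of several top 10 lists: each result earns one star per source that ranks it in its top 10, and the sources and placements are kept for display. This is an illustration only; the ordering rule for ties, the function name `merge_top_results` and the sample data are assumptions, not Ixquick's documented behavior.

```python
def merge_top_results(results_by_engine, top_n=10):
    """Merge ranked lists from several sources.

    Returns a list of (url, placements), where placements is a list of
    (engine, position) pairs. The star count is len(placements).
    """
    merged = {}
    for engine, urls in results_by_engine.items():
        for position, url in enumerate(urls[:top_n], start=1):
            merged.setdefault(url, []).append((engine, position))
    # Assumed ordering: more stars first, then the best single placement.
    return sorted(
        merged.items(),
        key=lambda item: (-len(item[1]), min(pos for _, pos in item[1]), item[0]),
    )


# Hypothetical ranked lists; the URLs are placeholders.
sample = {
    "Google": ["mozart.example/bio", "mozart.example/works", "cafe-mozart.example"],
    "AltaVista": ["mozart.example/works", "mozart.example/bio"],
    "Lycos": ["mozart.example/bio", "mozart-chocolate.example"],
}
for url, placements in merge_top_results(sample):
    sources = ", ".join(f"{engine} ({pos})" for engine, pos in placements)
    print("*" * len(placements), url, "-", sources)
```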
Drawbacks:
- Because it offers only the top 10 results from any source, obscure sites will not appear in its results
- Some search syntax options do not work well, e.g., natural language searching is offered but does not always produce useful results
Query: I'm looking for good Web sites about Mozart.
Search:
- Type: Mozart
- Examine results for relevancy
- Note how, without concept processing, different meanings of the term "Mozart" have been returned. Of course, many results relate to the composer.
TracerLock
Exercise: Storing Queries for regularly updated results with TracerLock
TracerLock - http://www.peacefire.org/tracerlock/
TracerLock is a service that saves your search query, processes it at regular intervals, and e-mails you when new pages are found containing your search terms. The service requires users to register. Note that The Informant service has merged with TracerLock and is no longer a separate service.
When to use TracerLock? When you are researching any topic and want to keep up with the newest documents on your topic.
Special Features:
- Stores Boolean search statements for regular processing
- Searches AltaVista every night for pages that match your terms and that AltaVista indexed within a date window: on or after the date stored with your search terms, and on or before the date three days ago; the stored date is adjusted daily to keep the search fresh
- You can reset the search date window at any time
- The first ten results are sent to you by e-mail
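The date window described above can be sketched as a simple filter: a page is reported only if its index date falls on or after the stored date and on or before the date three days ago, and at most ten results are returned. This is a hedged illustration, not TracerLock's code; the function name, the record layout and the sample dates are hypothetical.

```python
from datetime import date, timedelta


def pages_to_report(pages, stored_date, today, lag_days=3, limit=10):
    """Return up to `limit` pages indexed inside the search window.

    `pages` is a list of dicts with "url" and "indexed" (a date) keys.
    The window runs from `stored_date` to `today - lag_days`, inclusive.
    """
    newest_allowed = today - timedelta(days=lag_days)
    hits = [page for page in pages
            if stored_date <= page["indexed"] <= newest_allowed]
    return hits[:limit]


# Hypothetical index records.
pages = [
    {"url": "a.example", "indexed": date(2006, 3, 1)},   # before the window
    {"url": "b.example", "indexed": date(2006, 3, 10)},  # inside the window
    {"url": "c.example", "indexed": date(2006, 3, 14)},  # too recent
]
report = pages_to_report(pages, stored_date=date(2006, 3, 5), today=date(2006, 3, 15))
print([page["url"] for page in report])
```

Resetting the search date window, as mentioned in the list, corresponds to passing a different `stored_date`.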
Exercise: TracerLock
To store a query, go to the site and choose a userid and password. When you log in, follow the instructions for formulating and storing a query.
Note: Karnak is a similar service to try. Its free service will track only one query, but the premium services will track multiple queries. A strength of this service is the significant number of sources it uses to locate documents.
All Rights Reserved for the full content of this site through formal copyright registration.
©1992-2006 Colin J. Ansell.