Searching for the Right Search Engine
Some
perspectives on effective mechanisms for conducting
research on the Web are included in this interesting
article by Robert Berkman that appeared in the
Chronicle of Higher Education's January 21,
2000, issue. Dr. Berkman is the author of Find It Fast:
How to Uncover Expert Information on Any Subject, the
fifth edition of which will be published by HarperCollins
in May. He is a member of the faculty of the graduate
media-studies program at the New School University and
consults and provides training workshops on searching the
Web.
Researchers now
have it all on the World Wide Web: facts on virtually any
topic, available from the far corners of the globe,
unfiltered by reporters, editors, or publishers, and
usually free. But sometimes we feel that we have too much
information, often way too much, and that it may
not be correct.
Despite the
latest flurry of prime-time ads by search-engine vendors
boasting that they can find anything you want online,
search engines can't distinguish among Web pages
based on their contents. The only way researchers can
pinpoint information on the Web is if they learn how to do
efficient Web searches, and which engines are best for
which purposes.
One important
lesson is to understand the range of search tools now
available. Many researchers don't realize that they
can use hierarchical indexes, standard search engines,
alternative search engines, meta search engines, and
databases, and that those tools are not all the same.
In a
hierarchical index (probably the best known is Yahoo, http://www.yahoo.com),
people trained to categorize information,
such as librarians and indexers, examine Web sites and put
them in categories and subcategories. Thus, when you do a
search on a hierarchical index, it is much more likely
that what you find will be relevant to what you are
looking for.
The drawback to
hierarchical indexes is that they are extremely selective.
Because they are created by human beings rather than by
computers, they can include only a tiny portion of what is
available on the Web. Of course, in these days of abundant
information, that may not be such a bad thing.
Yahoo uses a
standard search engine as well. For that reason, the
results of a search on Yahoo are split into several
sections. "Category matches" inform you if your
topic matches one of Yahoo's existing categories. "Site
matches" are the sites that have been indexed and
categorized. "Web pages" provide links to pages
located by the search engine. Yahoo also groups results
into two other sections: "related news," for any
news item it locates on your subject, and "Net
events," which are mostly chat sites.
Yahoo is by no
means the only hierarchical index, and some of the many
others are aimed specifically at academic users. The
latter group includes: AlphaSearch (http://www.calvin.edu/library/as),
BUBL Link (http://www.bubl.ac.uk/link),
and Infomine (http://infomine.ucr.edu).
Then there are
the standard search engines. Popular ones include
AltaVista (http://www.altavista.com),
Excite (http://www.excite.com),
Go Network (http://infoseek.go.com),
and HotBot (http://hotbot.lycos.com).
Unlike hierarchical indexes, standard search engines send
out software "robots" or "spiders" to
search the Web and index the pages in each site they
encounter. The engines then calculate mathematically how
relevant the pages are to your search terms; each engine
uses its own algorithm to rank pages. Factors in the
calculation include the frequency and placement of your
keywords on a page, and their occurrence in the
descriptions that owners write of their pages, which are
invisible to users. The search engine puts the pages that
get the highest score at the top of the list of results.
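To make the ranking idea concrete, here is a minimal Python
sketch of a relevance score built from the factors just
listed: keyword frequency, placement (title versus body),
and occurrence in the owner-written description. The Page
class, the weights, and the sample pages are all
hypothetical, chosen only for illustration; real engines
keep their own algorithms to themselves.

```python
from dataclasses import dataclass


@dataclass
class Page:
    """A hypothetical indexed page; the field names are illustrative."""
    url: str
    title: str
    description: str  # owner-written summary, invisible to readers
    body: str


def score(page: Page, keywords: list[str]) -> float:
    """Toy relevance score: keyword counts weighted by where they appear.

    The weights (3.0, 2.0, 1.0) are arbitrary assumptions.
    """
    total = 0.0
    for word in keywords:
        w = word.lower()
        total += 3.0 * page.title.lower().split().count(w)
        total += 2.0 * page.description.lower().split().count(w)
        total += 1.0 * page.body.lower().split().count(w)
    return total


def rank(pages: list[Page], keywords: list[str]) -> list[Page]:
    """Return the pages sorted so the highest score comes first."""
    return sorted(pages, key=lambda p: score(p, keywords), reverse=True)


# Made-up sample pages for a search on "solar".
pages = [
    Page("http://example.com/b", "Gardening tips", "plants",
         "some gardeners use solar lamps"),
    Page("http://example.com/a", "Solar power basics", "solar energy guide",
         "solar panels convert sunlight"),
]
results = rank(pages, ["solar"])
```

In this sketch the page that mentions the keyword in its
title, description, and body outranks the page that
mentions it only once in passing.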
Savvy
researchers will avoid standard search engines when they
have a very broad subject. Instead, they will use a
hierarchical index, to find just a few relevant,
well-cataloged sites.
Alternative
search engines, which take various approaches to ranking
and sorting the pages that they find, are often more
helpful than standard engines. Northern Light (http://www.northernlight.com),
for instance, ranks Web pages as a standard search engine
does. But instead of displaying all of its results in a
single listing, it sorts pages into categories and groups
the results into folders. As an example, a search for "alternative
energy" creates folders with labels such as "solar
power," "air pollution," and "National
Technical Information Service," which includes
documents from that agency. And the folders contain
subfolders. Within the solar-power folder, for instance,
are folders for "photovoltaic systems" and "government
sites." That arrangement of material can help you
determine which groups of pages are most likely to be
relevant to your needs.
Ask Jeeves (http://www.askjeeves.com)
takes an altogether different approach. You don't
enter keywords, but type a question in plain English,
perhaps "Is there evidence of life on Mars?" Ask
Jeeves has recorded millions of questions that users have
asked it, and has found Web sites that answer those
questions.
The first thing
that Ask Jeeves does after getting your query is to scan
its database of questions and answers. It then gives you a
list of questions that it thinks you want the
answer to. If you select one of them, it lists sites that
contain the answers. Ask Jeeves doesn't always work,
but it can save you time, and it is fun to use.
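The question-matching step can be pictured with a small
Python sketch. It is only an assumption about the general
idea, not a description of how Ask Jeeves actually works:
stored questions are compared with the user's question by
counting shared words, and the closest ones are offered
back. The KNOWN table, its URLs, and the suggest function
are invented for the example.

```python
import re

# Hypothetical stored questions mapped to sites that answer them.
KNOWN = {
    "Is there life on Mars?": ["http://example.org/mars-life"],
    "How far away is Mars?": ["http://example.org/mars-distance"],
    "What is the capital of France?": ["http://example.org/paris"],
}


def words(text: str) -> set[str]:
    """Lowercase a question and return its set of distinct words."""
    return set(re.findall(r"[a-z]+", text.lower()))


def suggest(query: str, limit: int = 2) -> list[str]:
    """Return the stored questions that share the most words with the query."""
    q = words(query)
    scored = [(len(q & words(known)), known) for known in KNOWN]
    scored = [(count, known) for count, known in scored if count > 0]
    # Most shared words first; alphabetical order breaks ties.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [known for _, known in scored[:limit]]


choices = suggest("Is there evidence of life on Mars?")
sites = KNOWN[choices[0]]
```

Picking the first suggested question then leads to the
sites stored for it, which mirrors the two-step flow
described above.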
Google (http://www.google.com)
takes yet another tack. Like other search engines, it
first matches up your keywords to the pages it has
collected in its index. Then, however, it ranks each page
based on how many other pages link to it, and how many
link to those pages in turn. The pages you see at the top
of your list of results are those with the highest number
of links from other pages. The idea is that such popularity
is meaningful, just as a diner that has many trucks parked
in front probably serves better food than the diner whose
parking lot is empty. The approach works. After several
years of being a loyal AltaVista user, I am now a googler.
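A rough Python sketch of link-based ranking follows. It is
a simplified illustration of the general idea, not
Google's actual algorithm: each page repeatedly passes a
share of its score to the pages it links to, so a page
gains from having many inbound links and from being linked
by pages that are themselves well linked. The function
name, the damping value, the number of rounds, and the
sample link table are all assumptions made for the
example.

```python
def link_rank(links: dict[str, list[str]], rounds: int = 20,
              damping: float = 0.85) -> dict[str, float]:
    """Score pages by inbound links, counting well-linked sources for more.

    `links` maps each page to the pages it links to. The damping factor
    and round count are illustrative values only.
    """
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(rounds):
        # Every page starts each round with a small baseline score.
        new = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                share = damping * rank[page] / len(targets)
                for t in targets:
                    new[t] += share
            else:
                # A page with no outgoing links spreads its score evenly.
                for p in pages:
                    new[p] += damping * rank[page] / n
        rank = new
    return rank


# Made-up example: three pages link to one diner, one page to the other.
links = {
    "a": ["busy_diner"],
    "b": ["busy_diner"],
    "c": ["busy_diner"],
    "d": ["quiet_diner"],
}
ranks = link_rank(links)
best = max(ranks, key=ranks.get)
```

Here the page with three inbound links ends up with the
highest score, the page with one inbound link comes
second, and the pages nobody links to trail behind.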
Oingo (http://www.oingo.com)
has an even more radical approach. The site's slogan
is "We know what you mean," and Oingo conducts a
"conceptual search" to make sure that it
understands your request. Ask it to search for "china,"
for example, and it will ask you to choose porcelain
or any of the various geographical Chinas. Once you make a
selection, Oingo will display "directory hits"
and "Web hits." The site combines a hierarchical
index and a search engine (it uses AltaVista), although
the conceptual search applies only to its directory
results.
Search engines
that search other engines are called meta search engines.
Among the popular ones are Dogpile (http://www.dogpile.com),
Inference Find (http://www.inferencefind.com),
and MetaCrawler (http://www.metacrawler.com).
The concept here is that because no single search engine
indexes the entire Web, using a meta search engine allows
a researcher to scan more sites. The downside is that such
an engine needs to use a lowest-common-denominator
search statement, so that all of the search engines that
it searches understand the request. Therefore, meta search
engines are not a very good choice for complex searches,
involving, say, Boolean logic. (Dogpile does include some
Boolean-search capabilities.)
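The trade-off can be seen in a short Python sketch of a
meta search. It is a hypothetical illustration, not the
method of any engine named above: the query is first
reduced to plain keywords that every engine can accept,
then sent to each engine, and the result lists are merged.
The engines here are stand-in functions returning made-up
URLs; a real meta search engine would query live services
over the network.

```python
import re
from typing import Callable

# An "engine" is any function that takes a query string and returns a
# ranked list of URLs. These stand-ins are assumptions for the example.
Engine = Callable[[str], list[str]]

BOOLEAN_WORDS = {"AND", "OR", "NOT", "NEAR"}


def simplify(query: str) -> str:
    """Reduce a query to plain keywords, dropping uppercase operators."""
    words = re.findall(r"[A-Za-z0-9]+", query)
    return " ".join(w for w in words if w not in BOOLEAN_WORDS)


def meta_search(query: str, engines: list[Engine]) -> list[str]:
    """Send one simplified query to every engine and merge the results."""
    plain = simplify(query)
    votes: dict[str, int] = {}
    best_position: dict[str, int] = {}
    for engine in engines:
        # dict.fromkeys drops repeats within one engine, keeping order.
        for position, url in enumerate(dict.fromkeys(engine(plain))):
            votes[url] = votes.get(url, 0) + 1
            best_position[url] = min(best_position.get(url, position), position)
    # URLs found by more engines come first; ties go to the better position.
    return sorted(votes, key=lambda u: (-votes[u], best_position[u], u))


def engine_a(query: str) -> list[str]:
    return ["http://example.com/1", "http://example.com/2"]


def engine_b(query: str) -> list[str]:
    return ["http://example.com/2", "http://example.com/3"]


plain = simplify('(solar AND wind) NOT "nuclear power"')
merged = meta_search('(solar AND wind) NOT "nuclear power"',
                     [engine_a, engine_b])
```

Note what the simplification costs in this sketch: the
NOT is stripped along with the other operators, so the
terms the researcher meant to exclude are searched for
instead.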
A completely
different strategy is to search a database on the Web.
Hundreds of databases originally searchable on CD-ROM or
through proprietary online dial-up services are now
available on the Web, and new databases are continually
being born there as well. That makes it possible to search
rich databases with a standard Web browser, although in
many cases, the researcher must pay a fee or be affiliated
with a university that subscribes to the database. The
fee-based sites typically filter the data they contain,
increasing the likelihood that the results will be
relevant to a search; many also offer superior search
capabilities, so requests can be more precise.
The many new,
free databases on the Web can also be helpful. A site that
does an excellent job of identifying and sorting free
databases is The BigHub (http://www.thebighub.com).
Through its specialty search categories, it
allows you to search more than 1,500 databases on the Web,
many of which are oriented toward academics.
What new tools
for searching the Web are on the horizon? At a recent
conference, I heard about "vortals," vertical
portals that provide information from only a designated
slice of the Web. For example, a vortal might search only
those sites and pages that have to do with health care.
VerticalNet (http://www.verticalnet.com)
offers portals to industries including communications and
advanced technologies. Although the concept is a good one,
the jury is still out on vortals' usefulness.
Farther down the
road are visual representations of search results. Those
search tools display their results graphically, allowing
you to see at a glance which items are the most relevant.
A service called NewsMaps (http://www.newsmaps.com),
for example, displays the results of your search as a
thematic map. Topographical markers indicate clusters of
similar documents; the most similar ones are piled up
into little hills. According to Cartia, the company behind
the technology, the maps are created automatically by an
algorithm that reads documents, extracts the
content, and organizes the collection into a map.
You can view some sample maps at the site.
No matter which
search tool you choose, you will get the best results if
you know what information you need, know the advantages
and disadvantages of the various ways to search the Web,
and regularly practice doing research online. Despite
technological innovation, the best research tool remains
the human brain.
This article
is reprinted with permission from Professor Robert
Berkman. To contact him, call (508) 540-5990 or email
rberkman@aol.com.