Search Engines: Their Necessity and Potential Danger

A Case For Regulation Or The Need For Non-Commercial Alternatives?

James Kovacs

Introduction

The Internet, and specifically the portion known as the World Wide Web, has seen an explosion in popularity over the past several years. Once the tool of just academics and researchers collaborating on scientific endeavors, today the World Wide Web is home to commercial enterprises, news agencies, and entertainment venues. Politicians and entrepreneurs alike sing its praises, claiming that the Internet shall revolutionize how we work, shop, learn, and play in the twenty-first century. Dozens of companies are busily working to develop the next generation of tools with which we are going to navigate the World Wide Web and interface with the Internet. Unfortunately, most of these efforts have been undertaken from a commercial perspective, and all too often they ignore both the concerns of educational psychologists about the usefulness of computers in education and the conclusions of hypertext researchers about the potential limits or drawbacks of hypermedia learning.

As on-line interactions become more complex, people are faced with a growing number of decisions that must be made to complete a single transaction. How best can I find the information for which I am searching? What facts about myself do I feel comfortable revealing to on-line companies that request my identity? Many people are calling for the development of electronic agents to handle these decisions. In the future, electronic agents will search out information, complete negotiations, and monitor Internet traffic. Electronic agents stand poised to become the librarians, power brokers, and police officers of the Internet. Although technological solutions carry a certain allure, there are reasons to be critical of this movement towards electronic agent solutions. If we do not know the underlying algorithms of these agents, what basis is there to trust their results? Should our trust of these agents vary depending on whether we are using them in a commercial setting? Will the widespread use of electronic agents have negative implications for cognitive skills? These questions should be examined throughout the development and adoption of electronic agent technology.

This paper analyzes whether policy makers should consider crafting legislation that requires operators of search engines to disclose facts about the methodology of their engines. Because search engines are the predominant method for navigating the World Wide Web, the way in which they are constructed is an important public concern. Hypertext research has found that the quality of learning in hypertext environments (of which the World Wide Web is one example) depends on the ability of the user to get a sense of the structure of the environment. With the tremendous political pressure to provide Internet access in the public schools, and to incorporate Internet research into the curriculum, concerns about having to rely on the results of commercial search engines are amplified.

First, the paper provides an overview of how search engines work and how the technology may look in the future. This first section aims to impress upon the reader the importance of a technology that so many take for granted. Second, the paper reviews some of the research cognitive psychologists and others have done regarding hypertext. Their research emphasizes the importance structure plays in an individual’s ability to comprehend systems of information. Next, the paper explores some of the concerns that educational groups have about being dependent on commercially focused computer products. This section tries to provide the reader with a sense of the possible, and arguably undesirable, directions in which market forces may push search engine technology. Finally, the paper examines whether we should trust market forces to address these concerns, require companies to disclose important facts about their search engines’ methodology, or promote the development of non-commercial, educational search engines. The paper concludes by suggesting that the discussion outlined here regarding search engines should motivate a deeper inquiry into the consequences of electronic agent technology in general.

Search Engines and Hyperlinks: "Maps" of the Web’s "Roads"

Most users of the World Wide Web are familiar with the idea of a search engine. Except for those individuals whose web-surfing is limited to looking up commercial web site addresses that are advertised on television or typing random words into the URL line of their web browser hoping to be taken to a desirable web site, most people actively make use of search engines on a regular basis to navigate their way around the Web. Some web sites incorporate search engines into their design in order to help the viewer find his way around a complicated site consisting of numerous individual pages. In addition to providing links to the articles appearing in its current print edition, for example, the web site for a newspaper might include a search engine so readers can locate specific stories in the newspaper’s archives. Other web sites function mainly as search engines. The web sites of Yahoo! and Excite are two examples. These web sites provide web surfers a way to locate other sites of interest. This type of web site is often referred to as a portal, as it serves as the launch pad from which the web surfer reaches his other destinations. In sum, if hyperlinks are the roads that allow viewers to travel from web site to web site or web page to web page, search engines are the maps that allow us to find the roads.

Every search engine has several key components. First, there is a catalogue or database of information. This database is what the engine queries when conducting a search. Most often, the database consists of a list of URLs and descriptions of the corresponding web sites. Unlike a library’s card catalog, which similarly compiles titles of books and their corresponding Dewey Decimal numbers, search engine databases are updated continuously. Further, the search engine can be designed so that information about the searches themselves can be added to the database. If almost everyone who searches the phrase "star wars" ends up visiting the web site StarWars.com, the search engine can inform future users searching the phrase that almost everyone else ends up choosing to visit the StarWars.com web site.
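The database component described above can be illustrated with a short sketch. It is not drawn from any actual search engine; the class name `Catalogue` and its methods are hypothetical, and the sketch assumes only what the paragraph describes: a list of URLs with descriptions, plus a record of which site users chose after searching a given phrase.

```python
from collections import Counter, defaultdict


class Catalogue:
    """Hypothetical search engine database: site descriptions plus search feedback."""

    def __init__(self):
        self.entries = {}                   # url -> description of the site
        self.visits = defaultdict(Counter)  # query -> how often each url was chosen

    def add_site(self, url, description):
        # Add or update an entry; a real catalogue would be updated continuously.
        self.entries[url] = description

    def record_visit(self, query, url):
        # Store the fact that a user who searched `query` went on to visit `url`.
        self.visits[query.lower()][url] += 1

    def most_visited(self, query):
        # Return the site most often chosen for this query, or None if unknown.
        counts = self.visits.get(query.lower())
        if not counts:
            return None
        return counts.most_common(1)[0][0]


catalogue = Catalogue()
catalogue.add_site("StarWars.com", "Official Star Wars site")
catalogue.add_site("fans.example", "A hypothetical fan page")
for _ in range(3):
    catalogue.record_visit("star wars", "StarWars.com")
catalogue.record_visit("star wars", "fans.example")
print(catalogue.most_visited("Star Wars"))
```

Even in this toy form, the feedback loop is visible: whatever earlier users chose shapes what later users are shown first.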

Second, there is the interface. Does the user have to enter keywords? Does the computer ask the user to answer a set of questions? Does the computer display a list of headings from which the user can choose one to explore? Each of these interfaces is used on the World Wide Web today, providing different capabilities to the surfer. Keyword search engines are very common. They allow the web surfer to locate pages that are somehow associated with the keyword. This is most useful to the web surfer who already has in mind the type of information he is hoping to retrieve from the search engine. Many search engines, like Yahoo!, combine this keyword function with a display of category headings for the database, allowing a user to search the database according to its hierarchical structure. The third type of search engine is most often encountered on commercial web sites, such as Amazon.com. When you purchase a book from Amazon.com, a search engine checks the database to discover what past purchasers of that book have also selected. Amazon.com lists these titles under its "recommendations" heading. In effect, the search engine has implicitly asked the surfer the question, "What books do you like?" and used the answer to search a database consisting of all of the book titles available through Amazon.com. Although this type of interface has been most used in the commercial setting, it could be applied with just as much effectiveness in a web site like Yahoo! or others.
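The recommendation interface just described can be sketched in a few lines. This is only an illustration of the general idea of counting what past purchasers of a title also bought; it does not claim to reflect how Amazon.com's engine works, and the function name `recommend` and the sample titles are hypothetical.

```python
from collections import Counter


def recommend(purchase_histories, title, limit=3):
    """Suggest titles most often bought by past purchasers of `title`.

    `purchase_histories` is a list of sets, one set of titles per customer.
    """
    counts = Counter()
    for basket in purchase_histories:
        if title in basket:
            # Count every other title this customer also bought.
            counts.update(basket - {title})
    # Most frequent first; ties are broken alphabetically so output is stable.
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [name for name, _ in ranked[:limit]]


histories = [
    {"Book A", "Book B"},
    {"Book A", "Book B", "Book C"},
    {"Book A", "Book D"},
    {"Book C", "Book D"},
]
print(recommend(histories, "Book A"))
```

Nothing in the output tells the customer why a title appears in the list, which is why the ordering rule matters to the discussion that follows.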

Finally, every search engine consists of some algorithm that determines how the engine searches the database and displays the results of the search. For a keyword search, does the engine search for the web pages that use the word with the highest frequency? Does it display the pages alphabetically or according to some other criterion? Unfortunately, little can be said about the exact technique used by most search engines today. The methods are not publicized, largely because the method is one of the company’s most valuable trade secrets. This lack of disclosure raises some concerns when you recognize that these web sites are run by commercial corporations that hope to make a profit on their business enterprises at some point. If you understand the significant role search engines play in the web surfing behavior of Internet users, you can see the value a corporation might place in having its web site be returned as the first and "most appropriate" result of a search engine’s search.
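To make the two ordering questions above concrete, the following sketch ranks the same set of pages once by keyword frequency and once alphabetically. Because commercial engines do not publish their methods, this is purely an assumed illustration; the function names and the sample pages are hypothetical.

```python
import re


def keyword_frequency(text, keyword):
    # Count whole-word occurrences of the keyword, ignoring case.
    words = re.findall(r"[a-z0-9']+", text.lower())
    return words.count(keyword.lower())


def rank_by_frequency(pages, keyword):
    # Pages using the keyword most often come first; ties fall back to URL order.
    hits = [(keyword_frequency(text, keyword), url) for url, text in pages.items()]
    hits = [(count, url) for count, url in hits if count > 0]
    return [url for count, url in sorted(hits, key=lambda h: (-h[0], h[1]))]


def rank_alphabetically(pages, keyword):
    # The same matching pages, listed in alphabetical order of URL.
    return sorted(url for url, text in pages.items()
                  if keyword_frequency(text, keyword) > 0)


pages = {
    "b.example": "Cars, cars, and more cars, plus a few trucks",
    "a.example": "Used cars and boats",
    "c.example": "Boats only",
}
print(rank_by_frequency(pages, "cars"))
print(rank_alphabetically(pages, "cars"))
```

The two functions return the same pages in different orders, and a user who sees only the result list has no way to tell which rule produced it.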

Although this may seem to be of minor concern to some individuals, consider its implications as search engines develop. Today, a user’s encounter with most search engines is static. The web surfer visits the search engine web site, enters a search, and retrieves the result. In the future, though, search engine agents may become more popular. These agents would conduct periodic searches on an individual’s behalf, distilling the resulting information and delivering the content to the person requesting the search.¹ For example, consider a search that has been tailored to scan for news related to someone's business industry.² The electronic agent would use this search to keep the individual abreast of the news — information which the individual might use as the basis for important business decisions. The integrity of the search engine’s algorithm becomes crucial in this context.

Lessons From Research on Hypertext

Despite all of the predictions that the Internet shall play a more important role in education and commerce in the years to come, little attention has been paid to the research of cognitive psychologists on hypertext.³ Although their research is still in its infancy, a number of scientists have been exploring and analyzing the cognitive skills involved in using hypertext, identifying the keys to effective design of hypermedia systems. It is not the purpose of this paper to make an exhaustive review of the results of this research. However, a general discussion of their findings will help further this paper’s premise that search engines are such a crucial component of the World Wide Web that policy makers should watch carefully the evolution of search engines.

One of the main findings of the hypertext researchers is that a reader's ability to comprehend hypertext is improved by having some concept of the overall structure, or map, of the hypertext items. Research has found that without some concept of the structure of the hypertext, readers tend to become disoriented and loop around the various pages, not paying critical attention to the texts or the relationships of pages embodied in the links.⁴ Providing the reader with a picture of the overall structure of the hypertext resulted in better comprehension of the presented material.⁵ Other researchers have found that readers employ strategies to try to maintain the coherence of the hypertext system that they are reading. If an overall map of the system isn't given to them, readers typically choose to read the texts linearly.⁶ When the development of the structure is left to the reader, however, his cognitive workload is increased.

Search engines and Internet portals are the closest things that exist to a "structure map" for the Internet. Domain names provide surfers with little sense of the bounds of the Web, both in terms of its physical size and content. Like a site map for an individual web site, portals try to catalog web pages into some sort of searchable directory. However, with the volume of information on the Internet and its rapid growth, portals cannot just provide surfers with a page that lists hierarchically the contents of the entire Internet. Ultimately, web surfers must rely on search engines to provide part of their sense of the Internet's structure and content.

Thus, the hypertext research highlights the important role that search engines play in the ability of web surfers to intelligently navigate the Internet. Certainly, the design of individual web sites is the most powerful factor in determining the user's ability to analyze the information presented on that site's pages. However, viewing the World Wide Web as one large hypertext document, it is the quality of the search engine's design that shall determine the cognitive ease or burden that navigating the Web imposes on the surfer.

When Educational and Commercial Interests in the Internet Clash

Our fascination with computers has led many to promote them as the solution to the woes of our public education system. Many believe that access to a computer and the Internet shall give disadvantaged school districts a much-needed boost in access to quality information resources in the form of a digital library. Others fear that if our children are not exposed to technology early in their development, and given a chance to acquire the skills necessary to operate the technology, they will be unable to compete in the work force with those who have been exposed early on to computers.⁷

Although the exact scope of the computer's advantages for education is hotly debated, it is fair to say that the computer shall find its place in the education system. Whether it shall live up to its current billing remains to be seen. One of its most promising features is as a gateway to information on the Internet. Via the World Wide Web, educators can access classic works of fiction, view paintings in a digital art gallery, or examine transcripts of White House press briefings. In the future, books might be released directly through the Web. To the extent that access to these digital collections costs less than acquiring the print versions, school districts can give their students access to a larger collection of materials than ever before with just a computer and an Internet connection.

As some educational psychologists have noticed, however, there is only one World Wide Web. And more and more, the look and shape of the World Wide Web is being determined by corporations and the growth of electronic commerce. Competing for the attention of web surfers, companies place advertisements on web sites known for having large numbers of viewers. Search engines and portal web sites are prime locations on the Internet, as companies recognize that surfers are dependent on these sites to navigate and structure their Web experience.⁸ Further, the ads can be linked to key terms. Thus, the advertiser can tailor his advertisements to those most likely to be interested in his site: individuals who have just conducted a search using a related key term.

As a result, the search engine industry has become a key commercial sector of the Internet. The stock value of four of the largest portal and search engine sites (Yahoo!, Excite, Lycos, and Infoseek) is over nine billion dollars.⁹ Yet, few if any of these companies have begun to make sustained profits from their operations.¹⁰ Nevertheless, these companies have drawn the attention of large media conglomerates. For example, earlier this year Disney agreed to purchase forty-three percent of Infoseek.¹¹ Although the "viewing" share of the web surfing community that these sites possess might be sufficient motivation to prompt such investments for the moment, ultimately search engine and portal sites will have to develop viable revenue streams. Currently, these sites sell advertisements at rates ranging from two to five cents per view.¹² However, these companies possess a greater financial asset than advertising: "pay for placement" customization of the search algorithm. How much, for example, would General Motors be willing to pay to have the company's web site placed at the top of the result list of a search of the term "cars" or, even better, as the only item on the result list?

To some, the idea of a corporation tailoring its search engine web site to reflect favorably on those companies that are willing to pay top dollar for placement is disturbing, shocking, and even somewhat unethical. Such reactions highlight the way in which search engines are viewed currently. In other settings, this type of contracting isn't as shocking. Telephone books are used as a research tool, but most people recognize that each business paid to be included in the book. A chamber of commerce is a good source for information on a city’s businesses, but most people understand that each business paid a membership fee to belong to the association. Search engines are not viewed in the same light. Many people place a level of trust in the objectivity of the results of these engines that ordinarily would be reserved for services conducted by a non-profit educational organization or the government.

Is there serious reason to be concerned that search engine web sites will begin selling placements to the highest bidder? Much will depend on whether search engine companies believe that such agreements would pose a long-term risk to their brand name as the search engine of choice. And, in fact, it has occurred already. Several years ago, Open Text Corporation operated a web site that allowed companies to pay for top placement in search results. The company abandoned the practice, however, after receiving numerous customer complaints.¹³ A new company has revisited the idea of the "pay for placement" search engine. Unlike Open Text Corporation, however, Goto.com lists next to the result the amount of money that the linked site paid for the placement.¹⁴ It is too early to tell whether Goto.com shall become a viable player in the search engine industry, but the web site does demonstrate the potentially greater revenue that "pay for placement" offers over advertising.¹⁵ Emerging lawsuits suggest that other sites may be engaging in this form of contracting as well. In the United States District Court for the District of Columbia, GTE New Media Services Inc. has sued, among others, Yahoo!, alleging that Yahoo! violated the Sherman Act by making an agreement with GTE’s competitors in the yellow pages industry to not include GTE’s link on the web site Yahoo! maintained for Netscape called "Netscape’s Guide by Yahoo!"¹⁶ Although the merits of this lawsuit have yet to be addressed, this lawsuit and the other examples suggest that search engines may not deserve their current rosy image, and that their results should be viewed with skepticism.

Even so, what harm to education do these commercial search engines pose? Certainly, educators will need to adjust and teach children to think critically about sources of information. This is a critical skill that everyone should develop, and it shall be of great importance as our economy becomes more of an information economy. To the extent that these skills are difficult to learn, commercial search engines may point students to highly slanted information sources that the students may not be able to put into proper perspective. Of greater concern, however, is the ability to find high-quality educational resources on the Internet. If search engine results become dominated by "pay for placement" links, finding other sites will become burdensome.

The Range of Policy Responses

Suppose for a moment that the dangers of search engines outlined in the previous sections begin to grow. What are the types of responses available to policy makers wishing to address the problem? Ultimately, they take two forms: regulation of search engine algorithms or the creation of non-commercial search engines.

It should be noted that market forces may solve any problems given time or the proper inducements. The anecdote about Open Text Corporation suggests that consumers could apply market pressure on those sites that sell placements in result lists. Of course, this depends on consumers knowing that such sales are occurring. Similarly, non-profit organizations might step up and develop an alternative to the commercial search engines. Government intervention is not the only way by which these concerns can be addressed.

Nevertheless, policy makers may wish to consider regulating search engine algorithms. The most probable form for this regulation would be to require that the search engine sites disclose information about their engines' methodologies. For example, legislation could be passed requiring sites that sell placements to inform visitors up front of that fact. Similarly, one could require the search engine to identify whether it uses a frequency rating to sort results, or whether it uses past information about all previous searches of a given term to order the results of future searches of that term, placing the sites that past visitors most often chose at the top of the list.
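A disclosure requirement of this kind could take many forms. The sketch below shows one hypothetical possibility, in which the engine states its ordering method at the top of the result list and labels every paid placement with the amount paid. The `Result` class, its field names, and the sample figures are assumptions made for illustration, not a description of any existing site or proposed statute.

```python
from dataclasses import dataclass


@dataclass
class Result:
    url: str
    score: float         # relevance under the engine's stated method
    paid_cents: int = 0  # amount paid per hit for placement; 0 means unpaid


def results_with_disclosure(results, method="keyword frequency"):
    """Return display lines that state the ordering rule and flag paid listings."""
    # Paid placements sort first (highest payment on top), then unpaid by score.
    ranked = sorted(results, key=lambda r: (-r.paid_cents, -r.score, r.url))
    lines = [f"Ordering method: paid placements first, then {method}"]
    for r in ranked:
        label = f" [paid placement: {r.paid_cents} cents per hit]" if r.paid_cents else ""
        lines.append(r.url + label)
    return lines


sample = [
    Result("a.example", score=0.9),
    Result("b.example", score=0.2, paid_cents=43),
    Result("c.example", score=0.5),
]
for line in results_with_disclosure(sample):
    print(line)
```

The ordering itself is unchanged by the disclosure; what changes is that the visitor can see which rule was applied and which listings were bought.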

A pointed concern that follows from this is the use of search engines by commercial web sites to offer recommendations of other products to customers. Amazon.com and Columbia House use engines that analyze past purchases for trends or make use of richly cross-referenced databases of products.¹⁷ Covert selling of placements within these search engines is highly objectionable as customers are likely to perceive the "recommendations" of these search engines as being more objective.

In addressing the concerns of educators, the government could subsidize the creation of a non-commercial search engine, or mandate that a government agency, such as the Library of Congress, develop such an engine. Many associations have begun developing lists of sites that may be of interest to educators and researchers. The development of an education search engine takes this a step further. Such an engine will be costly to develop, as sites will need to be screened to ensure their "appropriateness" for educational research. Such screening also suggests that a credible organization will need to be given the task of creating the engine, as users will need to trust, to some extent, the organization's judgment calls on what belongs in the engine. Review policies could be openly published and processes could be established to allow users to refer sites for inclusion in the search engine.

Concluding Thoughts

Whether it will be necessary for the government to intervene and regulate search engines still remains to be seen. However, the above discussion highlights some of the dangers that electronic agent technology might pose. The structure of the World Wide Web, and its vast volume of content, will dictate that electronic agents shall be used to some degree. Yet, like search engines, the exact functioning of the agents can be hidden from the end user. What commercial arrangements might be hidden within the very code of the electronic agent? As we grow more dependent on agents to find information and deal with complex negotiations, what will become of our own cognitive skills? What are the consequences of allowing search engines to not only find the item for which we are looking, but also to suggest the items for which we should be looking? Might we share Carol Gigliotti's concern about agents and their potential damage to culture:

"There's a big difference between knowledge and wisdom[.] .... And a great deal of what's on the Web has to do with information, which is confused with knowledge, which is confused with wisdom. Wisdom can understand knowledge and decipher information. We're allowing [by using agent technology] a very random, nonexperiential machine to say what we should like."¹⁸

Although some reliance on search engines and electronic agents is unavoidable, policy makers should pay close attention to the development of these technologies with an eye to heading off the potential dangers that they might bring.

FOOTNOTES

1. For example, one company is offering a primitive version of this service for gathering news information. See <http://www.newshound.com>.

2. Example from Don Tapscott, Growing Up Digital: The Rise of the Net Generation, (McGraw Hill: New York 1998), p. 33.

3. A hypertext system is a system that organizes information in a network, providing links between various items. The World Wide Web, for example, consists of numerous "pages" that contain texts, graphics, sound, and video files. Pages, and even sections within a single page, are connected to each other with hyperlinks. This paper shall use the term "hypertext" to refer to such systems. For a more complete description of hypertext systems, see generally Rouet et al., Hypertext and Cognition, (Lawrence Erlbaum Associates: New Jersey 1996).

4. See Foss, C. L. (1989). Detecting lost users: Empirical studies on browsing hypertext (INRIA Technical Report No. 972, Programme 8). Sophia-Antipolis, France: INRIA.

5. Diana Dee-Lucas, Effects of Overview Structure on Study Strategies and Text Representations for Instructional Hypertext, in Rouet et al., Hypertext and Cognition, (Lawrence Erlbaum Associates: New Jersey 1996).

6. Peter W. Foltz, Comprehension, Coherence and Strategies in Hypertext and Linear Text, in Rouet et al., Hypertext and Cognition, (Lawrence Erlbaum Associates: New Jersey 1996).

7. For a discussion of some of the issues surrounding computers in the classroom, see Jane Healy, Failure to Connect: How Computers Affect Our Children's Minds - for Better and Worse, (Simon & Schuster: New York 1998).

8. For example, it is estimated that over 30 million US households visit Yahoo! each month, viewing the site for a collective 503 million hours. Saul Hansell, Turning Search Engines Into Money Machines, New York Times on the Web, May 11, 1998, <http://www.nytimes.com>.

9. Id.

10. See Laurie J. Flynn, A Search Engine That Charges For Top Billing, New York Times on the Web, March 16, 1998, <http://www.nytimes.com>.

11. See Reuters, Disney Buys Stake in Infoseek, New York Times on the Web, June 18, 1998, <http://www.nytimes.com>.

12. Advertising rates are from Michael Lesk and Hal Varian, eds., Internet Publishing and Beyond: The Economics of Digital Information and Intellectual Property, (Cambridge, Mass.: MIT Press, 1998).

13. See Laurie J. Flynn, A Search Engine That Charges For Top Billing, New York Times on the Web, March 16, 1998, <http://www.nytimes.com>.

14. See <http://www.Goto.com>. The reference is ambiguous; a casual user of the search engine may not realize that the linked site paid for its placement.

15. For example, the top-listed site for a search of the term "auto" pays forty-three cents per hit. <http://www.Goto.com> (visited December 20, 1998). Compare that to the couple of cents per view for advertising.

16. GTE New Media Services Inc. v. Ameritech Co., No. 97-CV-2314 (RMU), 1998 WL 682984 (D.D.C., Sept. 28, 1998).

17. See Samuel G. Freedman, Asking Software to Recommend a Good Book, New York Times on the Web, June 20, 1998, <http://www.nytimes.com>.

18. Id.