DNS as a Search Engine: A Quantitative Evaluation

Overview

In the course of the Internet's growing popularity, many Internet users have come to use the domain name system (DNS) as a directory and search engine: When trying to reach the web site of a new or unknown company, users often request the web page at the address http://www.companyname.com, replacing "companyname" with a guess as to a site's likely domain name. However, this DNS-based method is imperfect in that users may fail to correctly guess or remember a given company's domain name, instead typically receiving errors or sites operated by other entities.

The research described below suggests that alternative search mechanisms, such as leading search engine Google, provide the content of interest with greater accuracy and reliability than does the DNS. This finding supports the claim offered by, among others, DNS software designer Paul Vixie, that DNS "is not a directory service and was never intended to be used as one." This finding also quantifies Dan Gillmor's "Google effect" whereby Google replaces DNS as the preferred mechanism of locating content online.

The research further suggests that, while DNS offers what some might consider relatively high accuracy in searches for top brands, companies, and organizations, it is substantially less accurate in searches for smaller brands, companies, and organizations.

Data Sources

Analysis considers users' attempts to reach sites from five distinct groups, as follows:

The 100 most valuable brand names worldwide, as reported by Interbrand.
100 randomly-selected US brand names, obtained from Brands and Their Companies, 23rd Edition (2002).
100 randomly-selected companies listed in the 2001-2002 Verizon Boston Area Yellow Pages.
The 100 most selective colleges and universities in the United States, as reported by the College Board.
100 randomly-selected American colleges and universities, from among the listings in Yahoo.

For each brand name, company name, or organization name, the author formed a domain name likely to be guessed by an ordinary Internet user attempting to reach the corresponding site via the DNS. Domain names were formed as follows: ampersands were replaced with the word "and"; all other punctuation marks were removed; and spaces and hyphens were removed. For educational institutions, the words "university" and "college" were ordinarily dropped, along with associated prepositions. The first significant remaining word of the organization's name (or two or more words, when the first word alone would yield a clearly erroneous, vague, or nonsensical domain name) was then used as the domain name for subsequent testing. For brands and companies, testing attempted to access the www host within the resulting second-level domain under the .COM TLD, while educational organizations were tested under .EDU.
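The name-to-domain procedure above can be sketched in Python. This is a minimal illustration under stated assumptions, not the author's actual tooling: the function name `guess_host`, the `num_words` parameter (which stands in for the manual judgment about keeping two or more words), and the exact set of words dropped for schools are all hypothetical.

```python
import re

# Assumption: the study dropped "university", "college", and associated
# prepositions for schools; this exact word list is illustrative only.
EDU_DROP = {"university", "college", "of", "at", "in", "the"}


def guess_host(name: str, educational: bool = False, num_words: int = 1) -> str:
    """Return the www host name a user might guess for `name` (hypothetical helper).

    num_words replaces the study's manual judgment: one word by default,
    more when a single word would give an erroneous or vague domain.
    """
    if num_words < 1:
        raise ValueError("num_words must be at least 1")
    # Ampersands become "and" before the remaining punctuation is stripped.
    text = name.replace("&", " and ")
    # Strip punctuation, including hyphens, so "Hill-Top" becomes "HillTop".
    text = re.sub(r"[^A-Za-z0-9\s]", "", text)
    words = text.lower().split()
    if educational:
        words = [w for w in words if w not in EDU_DROP]
    if not words:
        raise ValueError(f"no usable words in {name!r}")
    # Joining the chosen words removes the spaces between them.
    label = "".join(words[:num_words])
    tld = "edu" if educational else "com"
    return f"www.{label}.{tld}"
```

For example, `guess_host("College of the Holy Cross", educational=True)` yields `www.holy.edu`, the kind of vague result the "two or more words" rule guards against; with `num_words=2` it yields `www.holycross.edu`. Resolving the host and judging whether the returned page pertained to the entity were separate steps, and relevance was judged by a reviewer, so the sketch stops at forming the name.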

Each brand name, company name, and organization name was subsequently entered into the Google search engine. Analysis considers only the first reported result -- the result provided to users who press Google's "I'm Feeling Lucky" button.

Each brand name, company name, and organization name was also tested against the RealNames keyword resolution system. As shown in this screenshot, when a keyword was entered into the Address Bar of an ordinary Microsoft Internet Explorer web browser, the RealNames keyword system was in some instances invoked to provide access to at most a single web site potentially of interest. When such a RealName link was provided on the ordinary Microsoft "can't find requested web site" error page, analysis considers the destination of that link. This is the result that would have been provided to a user employing the RealNames system as a primary method of locating desired web content. (Note that, effective June 28, 2002, RealNames' service is no longer available.)

Results of DNS, Google, and RealNames searches were reviewed for relevance to the requested content. A result was deemed to be a success if it seemed to pertain to the specific brand, company, or organization at issue. In general, successes linked to the default web page of the corresponding entity. However, in the Random Brand Names and Boston Yellow Pages testing, many Google results instead linked to or mentioned the companies at issue, or offered their products for sale; nonetheless, so long as each result at least mentioned the product or organization at issue, it was recorded as a success.

Results

The table below summarizes web site reachability by category of request and by search method:

Category of request	% success via direct DNS access	% success via Google search	% success via RealNames	Sample Size
Top brand names	95%	96%	61%	100
Randomly-selected brand names	5%	21%	1%	100
Randomly-selected Boston Yellow Pages results	14%	46%	7%	100
Most selective colleges and universities in the US	50%	99%	69%	100
Randomly-selected colleges and universities in the US	31%	96%	61%	100
Overall (Average / Total)	39%	72%	40%	500
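The Overall row can be checked directly. Because every category has the same sample size (100 names), the unweighted mean of the five category percentages equals the pooled success rate over all 500 names; the exact values are 39.0%, 71.6%, and 39.8%, which round to the 39%, 72%, and 40% shown. A small Python sketch of that check follows; the variable and function names are illustrative.

```python
# Success rates (percent) per category, copied from the table above:
# (direct DNS, Google "I'm Feeling Lucky", RealNames).
RESULTS = {
    "Top brand names": (95, 96, 61),
    "Randomly-selected brand names": (5, 21, 1),
    "Randomly-selected Boston Yellow Pages results": (14, 46, 7),
    "Most selective colleges and universities in the US": (50, 99, 69),
    "Randomly-selected colleges and universities in the US": (31, 96, 61),
}
SAMPLE_SIZE = 100  # names tested per category


def mean_rates(results):
    """Unweighted mean of the category percentages, per method."""
    n = len(results)
    return [sum(column) / n for column in zip(*results.values())]


def pooled_rates(results, sample_size):
    """Success rate over all names pooled together, per method."""
    total = sample_size * len(results)
    rates = []
    for column in zip(*results.values()):
        # Convert each percentage back into a count of successful names.
        successes = sum(pct * sample_size // 100 for pct in column)
        rates.append(100 * successes / total)
    return rates


dns, google, realnames = mean_rates(RESULTS)
print(f"DNS {dns:.1f}%  Google {google:.1f}%  RealNames {realnames:.1f}%")
# DNS 39.0%  Google 71.6%  RealNames 39.8%
```

If category sizes ever differed, only `pooled_rates` would give the overall success rate; the simple mean would weight small categories too heavily.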

Testing took place on June 19-28, 2002.

Conclusions, Future Work, and Policy Implications

These results suggest that, at least for the categories of content examined, Google is in each instance a more accurate search methodology than are DNS and RealNames. In many instances, Google is significantly more accurate.

DNS, Google, and RealNames are each more successful among well-known brands and organizations than among lesser-known entities.

While DNS has high accuracy among Interbrand's top 100 brand names (95%), and relatively high accuracy among the most selective US colleges and universities (50%), its accuracy among randomly-selected brand names (5%) and Yellow Pages results (14%) is notably lower. Randomly-selected companies and brands are, by and large, small businesses and their products. This result therefore suggests that DNS falls short for small businesses: their domain registrations disproportionately fail to match their brand names, even as large companies' domain registrations do reflect their brand names. One likely cause is that many small businesses use similar or identical brand names, and since only one registrant can hold a given domain, most such businesses cannot register their brand name as is. A small business may instead register a domain with added modifiers or other characters, but those additions make the resulting domain more difficult to guess.

RealNames' greatest strength seems to be among colleges and universities, both the selective institutions and the randomly-selected ones. This may reflect that RealNames offered reduced-price or free keywords to these institutions. RealNames is also notable for its lower rate of success among brand names and company names. For example, among Interbrand's top 10 brands, RealNames provided no listings for any of Nokia, Intel, or Ford, presumably because each of these companies failed to pay RealNames its requested listing fee. While this strategy may or may not have proven profitable on balance for RealNames, it seems to have led to a rate of accuracy among large company listings that is substantially lower than even that of the DNS. This result may reflect a fundamental conflict in the goals of keyword service operators: providing an accurate keyword system requires giving away a large number of registrations, but operating a profitable keyword system requires denying registrations to those who have not paid the specified fees.

(Update: Steve Sturgeon smartly suggests that the combination of a keyword system with a search engine might well address this problem -- providing top-of-list placement for those who pay a keyword fee, but providing comprehensive results even without such a payment. As the Microsoft/RealNames screenshot shows, this is in fact how Microsoft linked to the RealNames service.)
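The combined design described in this update can be expressed as a simple ranking rule. The following Python sketch is hypothetical and is not how Microsoft or RealNames actually implemented it: `hybrid_results`, the `paid_keywords` mapping, and the list of organic search results are assumptions introduced for illustration.

```python
from typing import Dict, List


def hybrid_results(query: str,
                   paid_keywords: Dict[str, str],
                   search_results: List[str]) -> List[str]:
    """Place a paid keyword destination first, then organic results (hypothetical).

    paid_keywords maps a normalized keyword to the URL whose owner paid
    the listing fee; search_results is whatever a search engine returned.
    """
    key = query.strip().lower()
    paid = paid_keywords.get(key)
    if paid is None:
        # No fee paid: the user still receives comprehensive search results.
        return list(search_results)
    # Fee paid: top placement, with the duplicate dropped from the organic list.
    return [paid] + [url for url in search_results if url != paid]
```

Under this rule, declining to pay costs a site its guaranteed top slot but does not remove it from the results, which is the property that addresses the accuracy problem described above.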

Future work might attempt to investigate the following questions:

To what extent Google's accuracy further improves if a success is deemed to take place when Google offers a relevant page among its first several results, first page of results, or first several pages of results.
Whether Google's accuracy improves if searches for brand names include both company name and brand name, rather than brand name alone.
Whether the accuracy rates of DNS and of Google are changing over time, and if so in what directions for what categories of content.
How other keyword systems perform, and what business models and registration practices yield what levels of accuracy and of keyword system profits.
How results vary for brands, marks, and organizations of varying sizes and prominence.
How results vary for brands, marks, and organizations outside the US and in languages other than English.
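The first question above amounts to relaxing the success criterion from "the first result is relevant" to "some result among the first k is relevant." A minimal Python sketch of that metric follows, assuming each query's ranked results have already been judged relevant or not by a reviewer; the function name and the example judgments are hypothetical, not data from this study.

```python
from typing import Sequence


def success_at_k(judgments: Sequence[Sequence[bool]], k: int) -> float:
    """Percent of queries with at least one relevant result in the first k.

    judgments[i][j] is True if result j (0-based rank) for query i was
    judged relevant. k=1 matches the "I'm Feeling Lucky" measure used here.
    """
    if k < 1:
        raise ValueError("k must be at least 1")
    if not judgments:
        raise ValueError("no queries to score")
    hits = sum(1 for ranked in judgments if any(ranked[:k]))
    return 100 * hits / len(judgments)


# Hypothetical relevance judgments for four queries.
example = [[True, False], [False, True], [False, False], [False, False, True]]
```

Because a hit within the first k results is also a hit within the first k+1, this measure can only rise as k grows, so the study's k=1 figures serve as a lower bound on Google's accuracy under looser criteria.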

Discussion in this document speaks only to methods of finding a site the first time or of otherwise coming to know how to reach a desired site. Conditional on knowing a site's domain name, DNS ordinarily remains the most efficient way to access that site since a Google search requires at least one additional step.

Motivation

The purpose of this work is primarily academic -- to document the activity at issue for the benefit of those who seek to make policy decisions on related matters. For example, ICANN may consider the rollout of future TLDs that provide or attempt to provide directory-like services; various organizations may offer services that attempt to make DNS more or less like a directory; large trademark holders may attempt to register their marks in new TLDs in hopes of preserving users' ability to find their sites directly using the DNS. In any of these contexts, it may prove helpful to understand the extent to which DNS is already used as or already functions as a directory.

Thanks to Martin Schwimmer for suggesting this project and providing guidance on helpful data sources.

Ben Edelman
Last Updated: July 1, 2002

This page is hosted on a server operated by the Berkman Center for Internet & Society at Harvard Law School, using space made available to me in my capacity as a Berkman Center affiliate for academic and other scholarly work. The work is my own, and the Berkman Center does not express a position on its contents.