DNS as a Search Engine: A Quantitative Evaluation
As the Internet has grown in popularity, many users have come to use the domain name system (DNS) as a directory and search engine: when trying to reach the web site of a new or unknown company, users often request the web page at the address http://www.companyname.com, replacing "companyname" with a guess as to the site's likely domain name. However, this DNS-based method is imperfect: users may fail to correctly guess or remember a given company's domain name, and typically receive errors or sites operated by other entities instead.
The research described below suggests that alternative search mechanisms, such as leading search engine Google, provide the content of interest with greater accuracy and reliability than does the DNS. This finding supports the claim offered by, among others, DNS software designer Paul Vixie, that DNS "is not a directory service and was never intended to be used as one." This finding also quantifies Dan Gillmor's "Google effect" whereby Google replaces DNS as the preferred mechanism of locating content online.
Research further suggests that, while DNS offers what some might consider relatively high accuracy when conducting searches for top brands, companies, and organizations, DNS is substantially less accurate in searches for smaller brands, companies, and organizations.
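The analysis considers users' attempts to reach sites from five distinct groups: top brand names (Interbrand's top 100 brands), randomly-selected brand names, randomly-selected Boston Yellow Pages results, the most selective colleges and universities in the US, and randomly-selected colleges and universities in the US.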
For each brand name, company name, or organization name, the author formed a domain name likely to be guessed by an ordinary Internet user attempting to reach the corresponding site via the DNS. Domain names were formed as follows: All punctuation marks were removed from each name; ampersands were replaced with the word "and"; spaces and hyphens were removed. For educational institutions, the words "university" and "college" were ordinarily dropped, along with associated prepositions. The first significant remaining word (or two or more words when the first word would lead to clearly-erroneous, vague, or nonsensical domain names) of the organization's title was then used as the domain name for subsequent testing. For brands and companies, testing attempted to access the www machine name within the given second-level domain within the .COM TLD, while educational organizations were tested within .EDU.
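The name-to-domain rules described above can be sketched in Python. This is a minimal illustration rather than the procedure actually used in the study, which was carried out by hand. The function name `guess_hostname`, the `significant_words` parameter, and the list of dropped prepositions (plus the article "the") are assumptions made for this example. It also assumes that brand and company names keep all of their words, while educational names keep only the first significant word or words.

```python
import re

# Words dropped from educational names. The text names "university" and
# "college" plus "associated prepositions"; the exact list here (including
# the article "the") is an assumption.
EDU_DROPPED_WORDS = {"university", "college", "of", "at", "in", "for", "the"}


def guess_hostname(name, educational=False, significant_words=1):
    """Return the hostname a user might guess for the given name."""
    # Ampersands become the word "and".
    text = name.lower().replace("&", " and ")
    # Treat hyphens as word breaks so they vanish once words are joined.
    text = text.replace("-", " ")
    # Remove all remaining punctuation (only ASCII letters and digits are kept).
    text = re.sub(r"[^a-z0-9\s]", "", text)
    words = text.split()
    if educational:
        words = [w for w in words if w not in EDU_DROPPED_WORDS]
        # Keep the first significant word(s). The study chose this count by
        # judgment, so it is a parameter here rather than being inferred.
        words = words[:significant_words]
    if not words:
        raise ValueError(f"no usable words in name: {name!r}")
    tld = "edu" if educational else "com"
    # Joining the words removes the spaces between them.
    return f"www.{''.join(words)}.{tld}"
```

With illustrative inputs (not taken from the study's data), `guess_hostname("Procter & Gamble")` gives `www.procterandgamble.com`, and `guess_hostname("Mount Holyoke College", educational=True, significant_words=2)` gives `www.mountholyoke.edu`. The judgment call about when a single word would be "clearly-erroneous, vague, or nonsensical" is left to the caller.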
Each brand name, company name, and organization name was subsequently entered into the Google search engine. Analysis considers only the first reported result -- the result provided to users who press Google's "I'm Feeling Lucky" button.
Each brand name, company name, and organization name was also tested against the RealNames keyword resolution system. As shown in this screenshot, when a keyword was entered into the Address Bar of an ordinary Microsoft Internet Explorer web browser, the RealNames keyword system was in some instances invoked to provide access to at most a single web site potentially of interest. When such a RealName link was provided on the ordinary Microsoft "can't find requested web site" error page, analysis considers the destination of that link. This is the result that would have been provided to a user employing the RealNames system as a primary method of locating desired web content. (Note that, effective June 28, 2002, RealNames' service is no longer available.)
Results of DNS, Google, and RealNames searches were reviewed for relevance to the requested content. A result was deemed a success if it seemed to pertain to the specific brand, company, or organization at issue. In general, successes linked to the default web page of the corresponding entity. However, in the Random Brand Names and Boston Yellow Pages testing, many Google results were instead other pages that linked to or mentioned the companies at issue, or that offered their products for sale; so long as a result at least mentioned the product or organization at issue, it was nonetheless recorded as a success.
The table below summarizes web site reachability by category of request and by search method:
Category of request | % success via DNS | % success via Google search | % success via RealNames | Sample size
Top brand names | 95% | 96% | 61% | 100
Randomly-selected brand names | 5% | 21% | 1% | 100
Randomly-selected Boston Yellow Pages results | 14% | 46% | 7% | 100
Most selective colleges and universities in the US | 50% | 99% | 69% | 100
Randomly-selected colleges and universities in the US | 31% | 96% | 61% | 100
Overall (Average / Total) | 39% | 72% | 40% | 500
Testing took place on June 19-28, 2002.
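As a quick check, the Overall row of the table can be reproduced as the mean of the five category success rates, rounded to the nearest percent. Because every category has the same sample size (100), the unweighted mean equals the pooled rate over all 500 names. The variable and function names below are illustrative only.

```python
# Success rates (percent) copied from the results table above.
# Each category has a sample size of 100.
rates = {
    "Top brand names": {"DNS": 95, "Google": 96, "RealNames": 61},
    "Randomly-selected brand names": {"DNS": 5, "Google": 21, "RealNames": 1},
    "Randomly-selected Boston Yellow Pages results": {"DNS": 14, "Google": 46, "RealNames": 7},
    "Most selective colleges and universities in the US": {"DNS": 50, "Google": 99, "RealNames": 69},
    "Randomly-selected colleges and universities in the US": {"DNS": 31, "Google": 96, "RealNames": 61},
}


def overall(rates, method):
    """Mean success rate (percent) for one search method across categories."""
    values = [row[method] for row in rates.values()]
    return sum(values) / len(values)


for method in ("DNS", "Google", "RealNames"):
    print(f"{method}: {round(overall(rates, method))}%")
```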
Conclusions, Future Work, and Policy Implications
These results suggest that, at least for the categories of content examined, Google is in each instance a more accurate search methodology than are DNS and RealNames. In many instances, Google is significantly more accurate.
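Since each category rests on a sample of 100 names, the reported percentages carry some sampling uncertainty. The sketch below is not part of the original analysis. It computes a Wilson score interval (a standard approximate confidence interval for a proportion) for two of the reported rates, treating each tested name as an independent trial, which is an assumption. Because every method was tested on the same names, comparing the intervals of two methods is only a rough guide, not a formal significance test.

```python
import math


def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a binomial proportion (z=1.96 is roughly 95%)."""
    p = successes / n
    denom = 1 + z**2 / n
    center = (p + z**2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return center - half, center + half


# Randomly-selected brand names: 5 of 100 reached via DNS, 21 of 100 via Google.
dns_low, dns_high = wilson_interval(5, 100)
google_low, google_high = wilson_interval(21, 100)
print(f"DNS:    {dns_low:.3f} to {dns_high:.3f}")
print(f"Google: {google_low:.3f} to {google_high:.3f}")
```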
DNS, Google, and RealNames are each more successful among well-known brands and organizations than among lesser-known entities.
While DNS has high accuracy among Interbrand's top 100 brand names (95%), and relatively high accuracy among the most selective US colleges and universities (50%), its accuracy among randomly-selected brand names (5%) and Yellow Pages results (14%) is notably lower. Randomly-selected companies and brands constitute, by and large, small businesses and their products. This result therefore suggests that DNS falls short for small businesses -- that domain registrations disproportionately fail to match the brand names of small businesses, even as large companies' domain registrations do reflect their brand names. One likely cause of this result is that many small businesses use similar or identical brand names, typically preventing any particular such small business from registering its particular brand name; while small businesses may register domains with the addition of modifiers or other additional characters, these additional characters render the resulting domain more difficult to guess.
RealNames' greatest strength seems to be within colleges and universities, both among the selective institutions and among randomly-selected institutions. This may reflect that RealNames offered reduced-price or free keywords to these institutions. In contrast, RealNames is also notable for its lower rate of success among brand names and company names. For example, among Interbrand's top 10 brands, RealNames provides no listings for any of Nokia, Intel, or Ford -- presumably because each of these companies failed to pay RealNames its requested listing fee. While this strategy may or may not have proven on balance profitable for RealNames, it seems to have led to a rate of accuracy among large company listings that is substantially lower than even that of the DNS. This result may reflect a fundamental conflict in the goals of keyword service operators -- that providing an accurate keyword system requires giving away a large number of registrations, but that operating a profitable keyword system requires denying registrations to those who have not paid the specified fees.
(Update: Steve Sturgeon smartly suggests that the combination of a keyword system with a search engine might well address this problem -- providing top-of-list placement for those who pay a keyword fee, but providing comprehensive results even without such a payment. As the Microsoft/RealNames screenshot shows, this is in fact how Microsoft linked to the RealNames service.)
Future work might attempt to investigate the following questions:
Discussion in this document speaks only to methods of finding a site the first time or of otherwise coming to know how to reach a desired site. Conditional on knowing a site's domain name, DNS ordinarily remains the most efficient way to access that site since a Google search requires at least one additional step.
The purpose of this work is primarily academic -- to document the activity at issue for the benefit of those who seek to make policy decisions on related matters. For example, ICANN may consider the rollout of future TLDs that provide or attempt to provide directory-like services; various organizations may offer services that attempt to make DNS more or less like a directory; large trademark holders may attempt to register their marks in new TLDs in hopes of preserving users' ability to find their sites directly using the DNS. In any of these contexts, it may prove helpful to understand the extent to which DNS is already used as or already functions as a directory.
Thanks to Martin Schwimmer for suggesting this project and providing guidance on helpful data sources.