Technical Responses to Unilateral
The Deployment of VeriSign "Site Finder" and ISP Response
Jonathan Zittrain and Benjamin Edelman - Berkman Center for Internet & Society - Harvard Law School
Much of the day-to-day functioning of the Internet is thought to be "self-governing": Engineers operating Internet systems at participating institutions (including ISPs) make daily decisions that help keep traffic flowing efficiently, without having to forge formal agreements with each other and without having to adhere to formal rules set out by a governing body. For those functions that are thought to require centralized coordination, organizations like ICANN have come to exist, and ICANN's proper scope of "jurisdiction" remains in tension with the prior self-governing model. Arguments about the need for, and proper scope of, centralized coordination in part depend on the reliability and effectiveness of these informal self-governing alternatives.
A recent action by the registry of domain names ending in .COM and .NET -- the creation of a "Site Finder" service to which Internet users are now directed if they ask for any unassigned name -- has provoked reaction by ICANN as well as by individual network engineers and the institutions that employ them. As ICANN's policy reaction is still unfolding, we sought to find out just how much the summed actions of the Internet engineering community affected Site Finder's adoption. In the absence of any reaction, Site Finder would function for nearly all users seeking .COM and .NET names. However, as network engineers choose to adopt certain "patches," Site Finder's functionality is blocked for users of the corresponding networks. With help from data gathered by Alexa through users of its toolbar browser plug-in, we find that several large networks have already blocked Site Finder and that approximately 9% of users likely therefore no longer receive Site Finder content. We find particular evidence of blocking of Site Finder by networks outside of the United States -- most notably, much of China.
The Internet's legacy domain name system (DNS) provides a way to convert textual identifiers like www.nytimes.com into the numeric IP addresses necessary to reach specific computers connected to the Internet. A set of root servers provides a list of the "registries" responsible for maintaining respective top-level domains (TLDs, e.g. .COM), and each TLD registry's name server in turn provides references to the servers selected by domain name registrants to handle information for each domain listed within that TLD (e.g. nytimes.com).
An Internet user's interactions with the DNS typically pass through the user's Internet Service Provider. In particular, when the user requests a web site or other content associated with a domain name, the user's computer queries the ISP's name server, which in turn queries the appropriate registry's name server. If the user asks for a second-level domain that does not exist in the registry, the typical answer from the registry is one that tells the user's software "no such address" (literally, "RCODE 3" or, for popular DNS server BIND, "NXDOMAIN"). That answer is not mandated by the relevant "RFC" specifications written by those who invented the DNS (RFC 1034), but such an answer is common practice. (RFC 2308) Upon receiving a "no such address" message, the user's software might simply display an error message, or in the case of a browser like Internet Explorer, query an Internet site designated by the browser manufacturer to provide suggestions and alternatives to the name requested by the user. The screen shot at right shows a typical error message shown by Internet Explorer.
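To make the "RCODE 3" answer concrete, the following minimal sketch reads the response code out of a raw DNS message header. It assumes the standard 12-byte header layout of RFC 1035 (a reference beyond the RFCs cited above), and the sample headers are hand-built for illustration, with a hypothetical message ID and no question or answer sections.

```python
import struct

# Names for a few response codes; 3 is the "no such address" answer.
RCODE_NAMES = {0: "NOERROR", 2: "SERVFAIL", 3: "NXDOMAIN"}


def dns_rcode(message: bytes) -> int:
    """Return the 4-bit RCODE from the header of a raw DNS message."""
    if len(message) < 12:
        raise ValueError("a DNS header is 12 bytes; message too short")
    # Header starts with a 16-bit ID followed by a 16-bit flags word.
    _msg_id, flags = struct.unpack("!HH", message[:4])
    # RCODE occupies the low four bits of the flags word.
    return flags & 0x000F


def describe(message: bytes) -> str:
    """Return a readable name for the message's response code."""
    code = dns_rcode(message)
    return RCODE_NAMES.get(code, f"RCODE {code}")


# Hand-built headers (hypothetical ID 0x1234): a response with RCODE 3,
# and a response with RCODE 0 carrying one answer record.
nxdomain_header = struct.pack("!HHHHHH", 0x1234, 0x8183, 1, 0, 0, 0)
noerror_header = struct.pack("!HHHHHH", 0x1234, 0x8180, 1, 1, 0, 0)

print(describe(nxdomain_header))
print(describe(noerror_header))
```

With a wildcard in place, a query for an unregistered name would yield the second kind of header (no error, with an address record) rather than the first.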
Since 1992, the registries for .COM and .NET have been operated by a company called Network Solutions, subsequently purchased by VeriSign, under collaborative agreement with the United States Department of Commerce and also under contract with ICANN. (This contract initially also covered .ORG, control over which has since been assigned to the Public Interest Registry.)
On September 15, 2003, VeriSign introduced a service called Site Finder. With Site Finder in place, VeriSign no longer provided a "no such address" answer to DNS queries for nonexistent names. Instead, Site Finder simply directs otherwise-nonexistent addresses to VeriSign's own server. That answer causes web browsers to display a designated VeriSign site, which in turn suggests alternative sites, categories, sponsored links and other content thought to match the nonexistent domain the user had requested. This is described by VeriSign as "improv[ing] the user web browser experience" (VeriSign Site Finder FAQ) and offering "improved navigation" (VeriSign Response to ICANN) to users who have forgotten, misremembered, or mistyped a desired site name. Site Finder tracks click-throughs to the sponsored links found on its pages and earns commissions accordingly, typically paid by the web sites receiving traffic from Site Finder. (This payment and tracking technology is provided by Overture, recently purchased by Yahoo.)
On October 3, VeriSign announced that it would temporarily disable Site Finder, and VeriSign did so on October 4. As a result, the Site Finder system is not currently in use, and the analysis in this document speaks to conditions observed immediately before the service was suspended.
Site Finder as both a Technical and "Meta Technical" Change to the Internet
Site Finder has been controversial for numerous reasons, set out in greater detail in our separate Listing of Objections to Site Finder. Objections range from the way in which Site Finder's implementation confuses existing programs that use DNS queries to determine whether a given domain has been registered, to "meta-technical" objections as to the scope of and profits from VeriSign's franchise on .COM and .NET registries, which comprise roughly 60% of active second-level domains. (ICANN Analysis)
For example, concerns include Site Finder's unexpected effects on web browsing, email, and other Internet applications; transmission and possible analysis of information that users regard as private; effects on intellectual property rights and consumer confusion; and effects on the existing businesses of other companies.
These concerns have been sufficiently grave within the Internet technical community to prompt a recommendation from the Internet Architecture Board (IAB) calling for removal of the "wildcards" used to direct traffic to Site Finder, except when implemented with informed consent of registrants, as well as a recommendation from ICANN's Security and Stability Advisory Committee that VeriSign voluntarily suspend the service.
Motivation and Purpose
In response to these and other concerns, operators of DNS servers have sought to restore ("patch") their servers to their prior behavior, preventing their servers from receiving Site Finder references from VeriSign's main .COM and .NET name servers, or from passing such references on to users and customers. The Internet Software Consortium, maker of leading DNS server BIND, reports "quite a few" requests for such a patch to its server software. Other DNS servers can also be modified, as documented in an independent VeriSign Countermeasures page offering more than a dozen distinct patches.
Many of the debates surrounding "Internet governance" hinge on the extent to which a technical elite, operating Internet infrastructure, can and should take individual or collective action against perceived problems. This approach presents an alternative to a more formalized governance structure, such as that exemplified by ICANN or by government oversight. Each approach has at least theoretical appeal, but evaluation of their respective merits often focuses on how readily the engineering community can in fact respond to situations that violate norms rather than rules of Internet procedure.
The availability of name server patches represents a technical capacity for individual ISPs to prevent VeriSign's service from being the default for their respective users. We therefore seek to find out the extent to which ISPs are implementing those patches, and at what rate since Site Finder's introduction. Such patching tends to moot concerns about Site Finder as a service, while raising a new set of concerns -- particularly if DNS behavior became unpredictable or varied significantly across networks, violating the principle of "the unique DNS root" (IAB RFC) and reducing the Internet's end-to-end transparency. (End-To-End Arguments in System Design)
The remainder of this document therefore seeks to empirically assess the prevalence of disabling Site Finder at the ISP or network operator level, supplementing anecdotal data collected elsewhere. (Slashdot thread)
Data & Analysis: Quantifying ISPs' Responses to Site Finder
We wish to measure the amount of traffic reaching Site Finder over time, and the sources of this traffic. In general, it is impossible for third parties to know how much traffic a web site receives, and from where -- sites may be unwilling to disclose this information, or may not tabulate this information in ways that are helpful to answering the policy questions at issue. Indeed, upon our request, VeriSign was unwilling to provide detailed Site Finder usage data or log files (even modified to remove individual-identifying information). Nonetheless, we have developed two methods for assessing Site Finder traffic, each using data provided by Alexa Internet.
Alexa provides a free browser toolbar that offers users additional information about the sites they visit. In addition, with users' consent, the toolbar provides information to Alexa about the specific sites and pages visited -- allowing Alexa to draw inferences about site popularity. Alexa's users are not perfectly representative of all Internet users; for example, only users of Internet Explorer are counted, and some countries may be over- or under-represented. (Alexa FAQ) However, experience shows Alexa data generally to be a reliable method of assessing web traffic patterns, especially among popular sites.
Aggregate Usage of Verisign.com and Site Finder
Alexa makes available analysis, charting, and trends of traffic to third party web sites. These statistics are updated daily on Alexa's own web site, allowing relatively granular tracking of changes in site popularity. Alexa reports that the overall traffic ranking of verisign.com jumped from rank 1,559 to rank 19 over the first day of Site Finder's launch, and the site's "reach" (proportion of distinct users who visit the site at least daily) has increased roughly tenfold, from 3,515 to 37,750 per million, such that more than one in every thirty users visits verisign.com, overwhelmingly its Site Finder service, each day.
If usage of Site Finder remains roughly constant, with users continuing to mistype .COM and .NET domains and reach Site Finder content as a result, verisign.com's traffic ranking and reach are likely to remain roughly constant, as pictured in the left graph, below. In contrast, if many ISPs take steps to disable Site Finder for their customers, usage of Site Finder is likely to decrease accordingly, as in the center graph below. To date, through September 29, there has been at most an extremely limited sign of downturn in verisign.com's traffic rankings, as shown in the right graph. Accordingly, whatever effect may result from the name server patches that block the Site Finder service, the effect is not yet apparent in overall aggregate Alexa traffic statistics.
Alexa traffic does reflect a small drop in the "reach" of msn.com, reflecting that somewhat fewer distinct visitors now visit msn.com than visited MSN before introduction of the Site Finder service -- from 237,400 per million to 218,000 per million, as of September 29. This result tends to confirm the theoretical claim that VeriSign's gain from Site Finder is in part other companies' (in particular Microsoft's) loss. (By happenstance, this data also helps establish what part of msn's overall traffic reach can be credited to Internet Explorer's automated redirection service for reported nonexistent domain names, versus actual user requests for content at msn.com.)
Author Edelman's prior posting to CircleID describes this method of analysis, and offers additional conclusions from this data.
Site Finder Usage - Analysis Grouping Users by Networks
With data from Alexa documenting two weeks of accesses to Site Finder by Alexa users, the authors analyzed the specific IP blocks (groups of computers) sending traffic to Site Finder as well as changes over time in such traffic. This extends the analysis above by assessing not only how much traffic is reaching Site Finder but also which ISPs do, and do not, send such traffic.
For most IP blocks, Site Finder traffic increased from nothing prior to September 15, to a level on September 17 at which traffic has subsequently remained relatively constant. (The relatively lower level of observed traffic on September 15 reflects, at least in part, that Site Finder was only operational for a portion of that day, a factor that is exacerbated because Alexa times are measured in GMT and because Site Finder was introduced during the morning, US Eastern time. As to the observed intermediate level of traffic on September 16, we note that the Site Finder web site was particularly slow on that day, likely somewhat reducing its successful traffic, as measured by Alexa. The reason for the reported drop on September 23 is unknown, but might include a DoS attack or other failure of Site Finder servers or a partial failure of Alexa data collection systems.) The chart at right plots total Site Finder page views by all Alexa users.
However, in certain IP blocks, Site Finder traffic increased but subsequently decreased. For example, from Alexa users at China Renmin University, Site Finder traffic reached 38 page requests on September 20, fluctuating somewhat through September 25, on which date Site Finder traffic dropped to zero, where it has remained since. This same result is seen elsewhere in China, almost always with the same dates of blocking, suggesting either coordinated name server patching or network filtering (via routers or packet sniffers), consistent with our prior research in Empirical Analysis of Internet Filtering in China (November 2002). But China is by no means the only country whose networks have blocked Site Finder in this way; Greek ISP OTEnet and Peruvian ISP Comsat also show similar behavior.
We generally include a /16 "Class B" network in our reporting based on its distinctive increase-decrease trend in Site Finder accesses if it meets the following criteria: Site Finder page views on each of at least the final three days in our data set were at most 0.05 times the usage on the peak day in our data set, and peak usage was at least 20 page views. However, some networks were subsequently omitted when manual inspection called into question the conclusion of Site Finder blocking, e.g. because other networks operated by the same ISP seemed to show ordinary trends in Site Finder usage. Additional networks were subsequently added when IP-Whois comparisons yielded adjacent network blocks also blocking Site Finder according to manual inspection.
In the table below, these networks are highlighted in yellow.
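The increase-decrease criterion above can be sketched as a short function. This is an illustrative reading of the stated thresholds, not the authors' actual code; the function name and parameter names are hypothetical, and the manual inspection steps described above are not modeled. The sample series are the daily Site Finder page views for September 15 to 29 reported in this document for China Renmin University and for Adelphia.

```python
def shows_blocking_trend(daily_views, peak_min=20, ratio=0.05, tail_days=3):
    """Apply the increase-decrease test to one network's daily page views.

    A network qualifies when its peak day reached at least `peak_min`
    views and each of the final `tail_days` days is at most `ratio`
    times that peak.
    """
    if not daily_views:
        return False
    peak = max(daily_views)
    if peak < peak_min:
        return False
    threshold = ratio * peak
    return all(views <= threshold for views in daily_views[-tail_days:])


# Daily Site Finder page views, September 15 through 29.
renmin = [0, 0, 11, 9, 11, 38, 27, 12, 1, 36, 0, 0, 0, 0, 0]
adelphia = [1, 31, 172, 210, 99, 0, 1, 0, 7, 60, 98, 236, 161, 216, 242]

print(shows_blocking_trend(renmin))    # peak of 38, final days all zero
print(shows_blocking_trend(adelphia))  # traffic resumed, so no final drop
```

Because the test looks only at the final days of the data set, a network that blocked Site Finder and later restored it would not be flagged.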
Other IP blocks do not show a distinctive upwards-downward trend, but nonetheless provide indications that Site Finder may be disabled. In particular, some network blocks provide far less Site Finder traffic than would be expected based on the number of page views made by Alexa users of these networks. For example, Alexa users on the Chunghwa Telecom network viewed nearly 50,000 web pages on September 22 -- yet viewed Site Finder only once during the period from September 15 to 29.
We generally include a /16 "Class B" network in our reporting based on its exceptionally low Site Finder usage if Alexa users on that network requested at least 20,000 web pages on September 22 yet requested at most 20 Site Finder pages during September 15 to 29. As above, some networks were subsequently removed by manual inspection.
In the table below, these networks are highlighted in green.
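The low-usage criterion above reduces to two threshold comparisons, sketched below. The function and parameter names are hypothetical, and the manual removal step is not modeled; the sample figures are the totals reported in this document's table for the named networks.

```python
def shows_low_usage(site_finder_total, all_pages_sep22,
                    min_pages=20_000, max_site_finder=20):
    """Apply the low-usage test to one network.

    A network qualifies when its Alexa users requested at least
    `min_pages` web pages on September 22 yet at most `max_site_finder`
    Site Finder pages over the whole September 15 to 29 period.
    """
    return all_pages_sep22 >= min_pages and site_finder_total <= max_site_finder


# (Site Finder total for Sept. 15-29, all page views on Sept. 22)
networks = {
    "Chunghwa Telecom": (1, 47638),
    "COMCOR": (1, 93097),
    "China Renmin University": (145, 40294),
    "Comsat Peru": (79, 2842),
}

for name, (site_finder_total, all_pages) in networks.items():
    print(name, shows_low_usage(site_finder_total, all_pages))
```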
The table below reports these and similar results. The remainder of the table is available in a separate listing. Note, however, that these results are not exhaustive -- we have erred on the side of caution, and it is likely that additional networks have also disabled Site Finder. Reporting is particularly likely to exclude small networks and networks with few Alexa users.
Site Finder accesses on September dates, by users with Alexa (columns 15 through 29, with total), and all site accesses on September 22, by users with Alexa (last column):

IP-Whois                                                                      15  16  17  18  19  20  21  22  23  24  25  26  27  28  29  Total  All sites
China Renmin University - CN (APNIC)                                           0   0  11   9  11  38  27  12   1  36   0   0   0   0   0    145      40294
China United Telecommunications Corporation - CN (APNIC)                       0   2  10   6   6  39  39  28   2  25   0   0   0   0   0    157      48189
Comsat Peru - PE (LACNIC) - (LACNIC)                                           0   0   3   3   7   2   4  33   0   1  12  14   0   0   0     79       2842
Data Communication Business Group, Chunghwa Telecom Co., Ltd. - TW (APNIC)     0   0   0   0   0   1   0   0   0   0   0   0   0   0   0      1      47638
Network for COMCOR - RU (RIPE)                                                 0   1   0   0   0   0   0   0   0   0   0   0   0   0   0      1      93097
OTEnet - GR (RIPE)                                                             0   5  33   3   7  12  15  10   1  25  33   6   0   0   0    150      34863
...
(remainder of table available in separate listing)
These two criteria for inferring blocking both assess the same underlying characteristic -- that a network sends less traffic to Site Finder than is expected, based on information about the network's users and their use of the web. We consider the first method of inferring blocking more robust than the second, for sudden change over time in Site Finder accesses is particularly difficult to explain other than via blockage of the Site Finder service. However, the second method of inference is necessary when network administrators blocked Site Finder especially quickly (e.g. before their Alexa users requested many nonexistent .COM and .NET domains, and therefore viewed Site Finder results) and when a network includes few Alexa users.
The chart at right (enlarged version) shows Site Finder traffic from selected network blocks for which traffic fell during the sampled time period (i.e. those network blocks highlighted in yellow).
We find evidence that at least a handful of networks have disabled Site Finder, but that at least some of these networks are extremely large (e.g. China). From the majority of these networks, Site Finder traffic has dropped off significantly since the introduction of the service -- supporting the inference that Site Finder was blocked on these networks sometime subsequent to the service's introduction (typically during the week of September 22). In addition, at a few large networks, Site Finder never reached significant traffic -- supporting the inference that the corresponding ISPs blocked the Site Finder service quickly.
Our analysis indicates that approximately 9% of Alexa users at present do not receive Site Finder when they request a nonexistent .COM or .NET domain. More than half of this proportion results from China's apparent decision, effective beginning September 24-25, to block Site Finder, while the remainder reflects other network operators jointly. We reach these estimates using Alexa data as to web usage by network -- logs that tell us what proportion of web browsing (of sites generally) comes from which networks, allowing us to estimate the amount of web traffic likely to result from the networks we have identified.
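The estimation step described above amounts to a weighted share: the web usage attributed to networks identified as blocking, divided by web usage across all networks. The sketch below illustrates only the arithmetic. The network names and page-view counts are hypothetical toy values, not the Alexa data, and the function name is likewise an assumption.

```python
def blocked_share(pageviews_by_network, blocked_networks):
    """Return the fraction of all page views that come from blocked networks."""
    total = sum(pageviews_by_network.values())
    if total == 0:
        raise ValueError("no page views recorded")
    blocked = sum(views for network, views in pageviews_by_network.items()
                  if network in blocked_networks)
    return blocked / total


# Hypothetical toy values, chosen only to make the arithmetic easy to follow.
toy_pageviews = {"net-a": 500, "net-b": 250, "net-c": 200, "net-d": 50}
toy_blocked = {"net-b"}

print(f"{blocked_share(toy_pageviews, toy_blocked):.0%}")
```

The result is only as representative as the underlying page-view counts, which is the caveat the following paragraph takes up.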
To the extent that Alexa users are generally representative of Internet users, our 9% estimate is an accurate measure of the proportion of overall Internet users not receiving Site Finder content. Of course, this inference relies on certain assumptions -- namely that Site Finder page request counts are proportional to ordinary web browsing traffic, and that Alexa users connect to the Internet via designated networks in proportion to the networks' overall web usage and user base. To the extent that certain networks have disproportionately many or few Alexa users, the second assumption is called into question. (To date we know of only one major network significantly underrepresented: AOL has notably few Alexa users, perhaps due to complications in the interaction between AOL client software and the Alexa toolbar or due to difficulties in transfer of toolbar usage data across the AOL network.)
We observe that the majority of networks blocking access to Site Finder are located outside the United States. To some extent this result may reflect greater centralized coordination of networks in certain countries, e.g. China, allowing faster or more successful response to network changes deemed undesirable. We note, however, that Site Finder is blocked by networks in countries with no special experience at Internet filtering (e.g. Greece, Korea, Russia). We also note that relatively more intense blocking of Site Finder outside the US is precisely as anticipated by two distinct sets of concerns: 1) That Site Finder pages are always presented in English (notwithstanding users' language preferences), and 2) That Site Finder pages are large and therefore slower and more costly to transmit than ordinary error messages. Both these concerns disproportionately affect non-US users -- for whom English web pages are less likely to be useful than pages in native languages, and for whom data transfer cost and speed constraints may be particularly acute. Meanwhile, we consider equally noteworthy our finding that relatively few large US ISPs have made efforts to block Site Finder.
"Arms Race" Blocking
Analysis in this document considers efforts at blocking Site Finder by individual ISPs and network operators, typically by patching name servers. Separately, some discussions have considered blocking Site Finder using null routing, whether on users' PCs or at network backbones. (ICANNWatch thread) When implemented by individual users, such blocking is difficult or impossible to detect using the methods proposed here, since such blocking would be idiosyncratic, affecting individual users but not the entire networks that are the basis of our analysis. Of course, when implemented in a way that impacts entire networks, our methods do allow us to observe the modification -- e.g. the China intervention described above.
DNS server patching could give rise to circumvention efforts by VeriSign. For example, to the extent that DNS server patches check for the specific IP addresses used by Site Finder, VeriSign could circumvent the patches by changing the IP addresses used by Site Finder. DNS server patches might also be rendered ineffective by changes in VeriSign's method of wildcard redirection.
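The fragility of address-based patches can be illustrated with a small sketch. This is not the code of any actual patch: it assumes a hypothetical resolver hook that sees the response code and the list of answer addresses, and the wildcard address shown is a placeholder from a documentation range rather than Site Finder's real address.

```python
NOERROR, NXDOMAIN = 0, 3

# Placeholder standing in for the wildcard target address (an assumption).
WILDCARD_ADDRESSES = {"192.0.2.1"}


def filter_answer(rcode, addresses, wildcard_addresses=WILDCARD_ADDRESSES):
    """Rewrite answers pointing only at known wildcard addresses to NXDOMAIN.

    Returns the (rcode, addresses) pair to pass on to the user.
    """
    if (rcode == NOERROR and addresses
            and all(addr in wildcard_addresses for addr in addresses)):
        return NXDOMAIN, []
    return rcode, list(addresses)


# An answer pointing at the known wildcard address is suppressed.
print(filter_answer(NOERROR, ["192.0.2.1"]))
# An ordinary answer passes through unchanged.
print(filter_answer(NOERROR, ["198.51.100.7"]))
# If the wildcard target moves to a new (hypothetical) address, the same
# filter no longer recognizes it, which is the circumvention described above.
print(filter_answer(NOERROR, ["203.0.113.9"]))
```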
The authors are aware of the possibility that ISPs might allow Site Finder for a time, then patch their name servers, then "unpatch" the servers to restore access. For example, legal pressure from VeriSign might bring about such a result. This possibility is more than speculative: ISP Adelphia disabled Site Finder for September 20-22 (plus a portion of September 19 and 23), but has since re-enabled it, as shown in the table below and in the chart at right.
Site Finder accesses on September dates, by users with Alexa:

Network     15  16  17  18  19  20  21  22  23  24  25  26  27  28  29
Adelphia     1  31 172 210  99   0   1   0   7  60  98 236 161 216 242
Data Anomalies and Improvements to Our Approach
Our results show two patterns in drop-off effects that we initially considered surprising, but that we now understand to be typical. First, we often observe a drop-off over a period of days, rather than an instantaneous and complete drop-off, as might be expected since applying a name server patch is typically a discrete step whose results are immediate for all users of that ISP's name server. However, if a patch were installed other than at precisely midnight, it would necessarily create a day of partial traffic to Site Finder, yielding at least one day of intermediate traffic levels. Second, some Site Finder traffic persists from networks that seem to have patched their name servers to disable Site Finder. However, there is typically no prohibition on using a remote, non-default name server not operated by one's ISP -- indeed, one author does so as a matter of course -- and to the extent that a few sampled Alexa users do so, the observed data gives the expected result of de minimis access continuing to be received from even those networks that have generally disabled Site Finder.
Classifying networks as patched or unpatched in some instances raises difficult questions of technical judgment about the representativeness and completeness of Alexa traffic data and about the reliability of reported network block assignments. In this initial research, we have erred on the side of caution -- reporting only network blocks, like those above, for which the pattern of traffic strongly supports an inference of name server patching. In future work, we will endeavor to introduce a statistical model that compares Site Finder traffic prior to a possible patch with traffic subsequent to that date, determining the likelihood that a patch was in fact applied. More formally, we will test the hypothesis that all Site Finder requests from a given network block are draws from the same distribution, against the alternative that the draws come from a distribution with a mean that drops during the course of the period studied. (For networks that seem to have blocked Site Finder particularly quickly, our analysis will be informed by the total number of requests by Alexa users in an IP block to web sites other than Site Finder.) As additional days and weeks of Site Finder traffic data become available, the power of our inferences will increase, allowing us to conclude with greater certainty that patches have or have not been applied.
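One possible form of the test described above is a likelihood-ratio comparison, sketched below. It assumes daily request counts are Poisson-distributed, which is an assumption of this sketch rather than a model the authors have specified, and all function names are hypothetical. The statistic compares a single constant mean against a mean that drops at some day; because the split day is chosen by scanning, the statistic is a relative measure of evidence and is not converted to a p-value here.

```python
import math


def poisson_loglik(counts, mean):
    """Poisson log-likelihood, omitting the -sum(log k!) term that cancels in ratios."""
    if mean == 0:
        return 0.0 if all(c == 0 for c in counts) else float("-inf")
    return sum(c * math.log(mean) - mean for c in counts)


def best_drop(counts):
    """Scan split days; return (statistic, split_index) for the best mean-drop fit.

    Returns (0.0, None) when no split yields a lower mean afterwards.
    """
    n = len(counts)
    null_ll = poisson_loglik(counts, sum(counts) / n)
    best = (0.0, None)
    for k in range(1, n):
        before, after = counts[:k], counts[k:]
        mean_before = sum(before) / len(before)
        mean_after = sum(after) / len(after)
        if mean_after >= mean_before:
            continue  # the alternative hypothesis covers only a drop
        alt_ll = (poisson_loglik(before, mean_before)
                  + poisson_loglik(after, mean_after))
        stat = 2 * (alt_ll - null_ll)
        if stat > best[0]:
            best = (stat, k)
    return best


# China Renmin University, September 17 through 29 (from the table above),
# omitting the first two days when Site Finder was still ramping up.
renmin = [11, 9, 11, 38, 27, 12, 1, 36, 0, 0, 0, 0, 0]
stat, split = best_drop(renmin)
print(round(stat, 1), split)
```

For this series the best split falls at index 8, which corresponds to September 25, the first day of zero traffic.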
Pattern of Usage of the Site Finder Site
Since Alexa logs retain information about the specific directories and filenames visited on the Site Finder site (though not the URL parameters of HTTP GET requests, i.e. the portion of a URL following a question mark), Alexa logs allow the analysis of which portions of the Site Finder site have proven most popular with users. Among Site Finder requests included in our logs, the breakdown is as follows:
URL                                                     Proportion of Accesses
sitefinder.verisign.com/lpc - default Site Finder page  81.58%
sitefinder.verisign.com/spc                             16.19%
sitefinder.verisign.com                                 1.6%
sitefinder.verisign.com/help.jsp                        0.18%
sitefinder.verisign.com/preferences.jsp                 0.16%
sitefinder.verisign.com/terms.jsp                       0.15%
sitefinder.verisign.com/privacy.jsp                     0.11%
sitefinder.verisign.com/whatsthis.jsp                   <0.01%
sitefinder.verisign.com/pdf/sitefinderdevguide.pdf      <0.01%
Notable among these results is the extreme prominence of /lpc requests for the default pages returned by the Site Finder service in response to user requests for nonexistent domains. These /lpc requests result from users' involuntary entry of nonexistent .COM and .NET domain names, and they typically reflect users' initial entry into the Site Finder site. Second most popular are /spc requests for categories of content or for the Site Finder search engine, both featured prominently on the /lpc listings, and reflecting actual intentional user actions. Other pages of the Site Finder service are considerably less popular.
Alexa log data allow assessment of the rate at which users view Site Finder's Terms of Service document, a question previously raised by Yale Law researcher James Grimmelmann in VeriSign Hijacks DNS Typos ... And Creates Binding Contracts? as well as by Gene Gaines in VeriSign Now Owns Your Use of .COM and .NET. As the table above shows, the Terms of Service are viewed roughly one five-hundredth as often as the main /lpc result listings. Though some users surely visit multiple /lpc pages due to multiple requests for nonexistent domains, the extreme ratio of page views confirms that most users do not read Site Finder's Terms of Service. With an extended analysis of the data on hand, the authors could determine what portion of user-sessions include a visit to Site Finder's Terms of Service.
With additional log data from Alexa, reporting users' browsing immediately after obtaining a Site Finder page, the authors could determine users' specific behavior upon reaching a Site Finder page. In particular, the authors could determine what proportion of users click on a suggested or sponsored link; what proportion enter a search term or choose a Site Finder category; and what proportion continue browsing in some other way (pressing the Back button, retyping a URL, or closing their web browsers). Meanwhile, the results above indicate that at most one fifth of Site Finder visitors use the search engine or category listings (and, to the extent that some visitors perform multiple searches or browse multiple categories, even fewer distinct visitors use these features).
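The two ratios cited above follow directly from the reported page shares, as the short calculation below shows. The percentages are taken from the breakdown in this document; the variable names are illustrative only.

```python
# Share of Site Finder accesses by page, in percent, from the breakdown above.
shares = {"/lpc": 81.58, "/spc": 16.19, "/terms.jsp": 0.15}

# How many /lpc result pages are viewed for each Terms of Service view.
lpc_per_terms = shares["/lpc"] / shares["/terms.jsp"]

# Search and category (/spc) views relative to /lpc entry pages.
spc_per_lpc = shares["/spc"] / shares["/lpc"]

print(f"/lpc views per Terms of Service view: {lpc_per_terms:.0f}")
print(f"/spc views per /lpc view: {spc_per_lpc:.3f}")
```

These are ratios of page views, not of distinct visitors, so they bound rather than measure the share of users who take each action.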
Alexa log files also provide data as to usage of other parts of the verisign.com web site, beyond Site Finder. Between September 15 and 20, Site Finder constituted roughly 86% of page-views on verisign.com.
Submit Additional Data
The authors solicit submissions from ISPs and network operators -- including those who have patched their name servers as well as those who know that they have not patched their name servers. Such information will assist in calibrating our analysis and confirming its accuracy. Submissions are welcomed via email to author Edelman, or via the ISP Response to Site Finder Submission Form.
This project uses data provided by Alexa Internet, generously and without charge. The authors offer their sincere thanks for this data, without which the project would be impossible.
Last Updated: October 7, 2003