Technical Appendix
Empirical Analysis of Internet Filtering in China
This appendix offers technical details beyond those of the main report. It contains the following sections:
Blocking of Entire Web Sites and Entire Servers
Reporting Criteria and the "Blocking Quotient" of Reported Sites
DNS Filtering/Redirection and Its Implications
Independent Filtering Implementations and Corresponding Circumvention Techniques
Other Effects of Chinese Filtering: Routing and Email
Blocking of Entire Web Sites and Entire Servers
We tested only one URL per web host, based on our background knowledge, reinforced by subsequent testing, that when the default page of a site was filtered, the entirety of that site was typically filtered.
To test the hypothesis of entire-site blocking, we formed a sample of web hosts found to be inaccessible, and we checked whether an arbitrary subdirectory on each such site was also inaccessible. We chose a directory name intended not to exist on the servers; an accessible web server nevertheless answers a request for a non-existent page with a "not found" error message, so receiving that error page would show that the host is reachable. We confirmed that these error pages themselves were inaccessible in 99.8% of tests. We attribute the other 0.2% of results to anomalies such as transient network errors that may have wrongly rendered the web host inaccessible in the first instance when the host was not intended to be blocked.
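The subdirectory test described above can be sketched in Python. This is a minimal illustration, not the authors' actual test harness: the function names are hypothetical, and `is_reachable` is an assumed caller-supplied function that returns True whenever any HTTP response arrives through a proxy in China, including a "not found" error page.

```python
import uuid
from typing import Callable, Iterable


def probe_url(host: str) -> str:
    """Build a URL for a directory that almost certainly does not exist on the host."""
    # A random 32-character hex name stands in for the "arbitrary subdirectory".
    return f"http://{host}/{uuid.uuid4().hex}/"


def entire_site_block_rate(
    hosts: Iterable[str],
    is_reachable: Callable[[str], bool],
) -> float:
    """Share of already-inaccessible hosts whose probe URL is also inaccessible.

    `is_reachable` is an assumption of this sketch: it should return True if
    any response (even a "not found" page) comes back for the URL.
    """
    host_list = list(hosts)
    if not host_list:
        raise ValueError("need at least one host")
    blocked = sum(1 for h in host_list if not is_reachable(probe_url(h)))
    return blocked / len(host_list)
```

A rate near 1.0 over the sample of inaccessible hosts would be consistent with entire-site blocking; the remainder would call for retesting, since transient network errors can produce the same symptom.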
At the moment, then, it seems that when the default page ("front page") of a host is blocked, all other pages on that host are also blocked. (Of course, the reverse need not be the case, and the authors have separately confirmed multiple instances in which it is not the case.)
When an entire host is filtered, our data show that this filtering typically operates on the basis of the host's IP addresses rather than on the basis of its one or several domain names. To confirm this, we observed that when many web sites are hosted on a single web server (as is typical in commercial "shared hosting" at the lowest monthly rates), blocking by China of one web site on a given server (with a given IP address) typically entails blocking of all other web sites on that server. For example, we found a total of 308 distinct (by domain name and differing page content) blocked sites all hosted on the server at IP address 216.34.94.186, a parking/redirection server used by domain name registrar Dotster. To the extent that this server in fact hosts additional sites beyond those we tested, it is highly likely that they too were blocked. Indeed, a representative of domain name registrar enom reported to the authors that its primary domain name forwarding service had been blocked by China -- rendering unreachable literally hundreds of thousands of domain names that rely on that server.
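One way to check for IP-based blocking in a set of test results is to group the tested sites by server IP address and look for addresses where only some sites are blocked. The following Python sketch assumes a hypothetical input format of (domain, IP, blocked) tuples; it is an illustration rather than the analysis code used for this report.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def block_counts_by_ip(
    results: Iterable[Tuple[str, str, bool]],
) -> Dict[str, Tuple[int, int]]:
    """Map each server IP to (number of blocked sites, number of sites tested)."""
    counts = defaultdict(lambda: [0, 0])
    for _domain, ip, blocked in results:
        counts[ip][1] += 1          # one more site tested on this IP
        if blocked:
            counts[ip][0] += 1      # one more blocked site on this IP
    return {ip: (b, t) for ip, (b, t) in counts.items()}


def mixed_ips(counts: Dict[str, Tuple[int, int]]) -> List[str]:
    """IPs hosting both blocked and unblocked sites, i.e. evidence against pure IP blocking."""
    return sorted(ip for ip, (b, t) in counts.items() if 0 < b < t)
```

If blocking operates purely by IP address, `mixed_ips` should return few or no addresses, and shared hosting servers should appear with every tested site blocked.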
While filtering of a host's top-level page predicts the filtering of all other pages on that site, such filtering is not technically mandated. Indeed, midway through our testing, the authors learned of and confirmed the blocking of certain pages on otherwise-accessible sites. At least some of this blocking appears to be triggered by one of relatively few keywords in page URLs or contents; this therefore represents a technical layer of blocking wholly distinct from (and seemingly rarer than) that which results in an entire site being made unavailable.
Since blocking typically affects an entire web server, our reporting includes all Yahoo and Google/DMOZ categories that reference any pages on affected web servers.
Reporting Criteria and the "Blocking Quotient" of Reported Sites
To distinguish intentional blocks from unintentional network outages or other variation, we tested candidate URLs multiple times and through multiple proxies. In many cases, sites were unavailable only on one occasion, or unavailable from one proxy in China while available from another. While such phenomena might represent intentional blocking that is simply limited in time or regional scope, we treat a URL as blocked "in China" only when it has been found to be unavailable on at least two occasions, and from at least two distinct proxies, all while still accessible from the United States. Variations in blocking across proxies, if not due to transient network failures, could reflect a distribution of authority to make and implement blocking decisions from one region to the next, or a technical burden or delay in programming key routers across China to block an undesirable URL.
To the extent that blocking varies across networks and across geographic locations, to describe a URL or entire Web site as "blocked in China" may be inexact -- a site can be found accessible in some places and simultaneously inaccessible in others. In the absence of further data about political decision-making and technical implementation, we can be only as precise as the data is accurate -- and we therefore apply a threshold of overall inaccessibility to determine that a site is "blocked in China."
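The reporting threshold stated above (unavailable on at least two occasions, from at least two distinct proxies, while accessible from the United States) can be expressed as a short Python sketch. The `Observation` record and the function name are hypothetical, and counting distinct occasions and proxies among the failed tests is one reasonable reading of the rule rather than a confirmed implementation detail.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class Observation:
    proxy: str       # identifier of the proxy in China used for the test
    occasion: str    # label for the test run, e.g. a date
    reachable: bool  # True if the URL loaded through that proxy


def is_blocked_in_china(
    observations: Iterable[Observation],
    reachable_from_us: bool,
) -> bool:
    """Apply the two-occasion, two-proxy threshold to one URL's test results."""
    failures = [o for o in observations if not o.reachable]
    failed_occasions = {o.occasion for o in failures}
    failed_proxies = {o.proxy for o in failures}
    return (
        reachable_from_us
        and len(failed_occasions) >= 2
        and len(failed_proxies) >= 2
    )
```

Under this rule, a URL that failed twice through the same proxy, or through two proxies in a single test run, is not reported as blocked.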
We have received reports indicating that certain locations -- for example, hotels predominantly frequented by western visitors -- have significantly less stringent filtering policies. Our reporting of sites "blocked in China" should not be taken to describe Internet access from these locations.
Having tested all sites on multiple occasions from multiple distinct locations within China, the authors have found some sites that were blocked consistently -- on all occasions, from all locations -- while other sites were blocked less often. The "blocking quotient" slider in our reporting seeks to characterize this observation: a wide red bar signifies a site blocked more frequently, while a narrower bar denotes intermittent blocking or blocking observed from relatively fewer locations within China. We report this measurement with a slider rather than a number to reflect the uncertainty necessarily associated with these measurements and the resulting analysis.
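The report does not give a formula for the blocking quotient. As a purely illustrative assumption, it could be approximated as the share of tests in which a site was unreachable, with a bar whose width grows with that share. Both functions below are hypothetical.

```python
from typing import Sequence


def blocking_quotient(reachable_flags: Sequence[bool]) -> float:
    """Assumed definition: fraction of tests in which the site was unreachable."""
    if not reachable_flags:
        raise ValueError("no observations")
    failures = sum(1 for ok in reachable_flags if not ok)
    return failures / len(reachable_flags)


def render_bar(quotient: float, width: int = 20) -> str:
    """Draw a text bar; a wider filled portion means more frequent blocking."""
    filled = round(quotient * width)
    return "#" * filled + "." * (width - filled)
```

Presenting the result as a bar rather than a figure follows the reasoning in the text: the underlying measurements are uncertain, and a precise number would overstate their accuracy.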
DNS Filtering/Redirection and Its Implications
For some 1,043 of the sites tested, we confirmed that DNS servers in China report a web server other than the official web server actually designated via each site's authoritative name servers. We call this phenomenon "DNS redirection," though others sometimes refer to the situation as "DNS hijacking." Consistent with prior reporting by Dynamic Internet Technology, our data show that such sites were consistently unreachable in their entirety.
Currently, when a user in China requests a site affected by DNS redirection, the user's computer is told that the site's domain name is associated with the IP address 64.33.88.161. That IP address is associated with the host www.falundafa.ca, the site of a Canadian organization that promotes the practice of Falun Gong. However, that address is itself blocked by Chinese border routers, preventing such requests from reaching either the falundafa server or any other. As a result, Chinese users are unable to reach the entirety of these many sites, including their respective default pages as well as their subsidiary pages.
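Detecting DNS redirection amounts to comparing the addresses returned by a DNS server in China with those returned by the site's authoritative name servers. The Python sketch below shows only the comparison step and assumes both answer lists have already been collected (for example with a tool such as dig); the function name and category labels are hypothetical. The redirection target address is the one given in the text.

```python
from typing import Iterable

# Address that redirected domains resolved to, as reported in the text.
REDIRECT_TARGET = "64.33.88.161"


def classify_dns_answer(
    local_answers: Iterable[str],
    authoritative_answers: Iterable[str],
) -> str:
    """Compare a resolver's A records with the authoritative A records for one domain."""
    local = set(local_answers)
    authoritative = set(authoritative_answers)
    if not local:
        return "no-answer"
    if local & authoritative:
        return "consistent"              # at least one address matches
    if REDIRECT_TARGET in local:
        return "redirected-to-known-target"
    return "mismatch"                    # differs, but not to the known target
```

A plain "mismatch" is not proof of redirection on its own, since sites served from several locations can legitimately return different addresses to different resolvers; only the known-target case matches the behavior described above.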
While the authors cannot know for sure why Chinese network staff implemented this additional method of filtering, we suggest two possible explanations. First, this method of filtering might be intended to supplement border router filtering; depending on the specific method of implementation, it might be in some way more efficient or more easily updated by Chinese network staff, and compliance of ISPs can be more easily monitored remotely via ordinary DNS tools such as dig. Second, this method of filtering is a likely precursor to efforts both to monitor accesses to specific sites and to revise or replace content on those sites with other content specifically provided by Chinese network staff; either approach would rely on proxy servers placed at specified IP addresses and would require that requests for designated sites in some way be redirected to those addresses. While this second theory is largely speculative, it rings true given related efforts to replace Google (see the authors' prior Replacement of Google with Alternative Search Systems in China) and subsequent filtering of certain Google search terms (including the names of key political figures and the terms required to use the Google cache).
Independent Filtering Implementations and Corresponding Circumvention Techniques
We have observed certain idiosyncrasies in Chinese methods of Internet filtering, and in some instances we have found methods to circumvent particular aspects of filtering. Based on this data, we can draw inferences about particular methods of filtering. In this section, we detail these anomalies as well as their implications.
These final two methods of filtering -- on the basis of keywords in URLs and HTML responses -- are not the primary focus of our reporting. Instead, our current work focuses on web sites filtered in their entirety; in future work, we will seek to document the specific keywords found to be prohibited in searches, URLs, and HTML response pages, and more important, the evolving prevalence of each type of filtering.
Other Effects of Chinese Filtering: Routing and Email
Routing. The authors have observed that some American ISPs route packets through China towards destinations beyond China (in particular, to Hong Kong). When the desired web servers are blocked from China, such routing typically subjects an American user's request to filtering by network equipment in China. Affected American ISPs can address the situation by manually altering the routes used to reach hosts in Hong Kong and elsewhere. However, affected ISPs are often unaware of the situation, and an effective response entails delay and/or additional expense as an affected ISP finds the necessary partner ISPs and establishes peering relationships with them.
Email. When border routers in China discard packets destined to or received from certain hosts, we understand that they typically do so without regard for the specified protocol of communications. As a result, email messages are typically filtered when sent to or received from blocked sites. The authors understand that additional filtering efforts may specifically target certain controversial emails, and the authors plan to document this situation in detail in future work.
Other Protocols. Filtering on the basis of server IP address can restrict additional protocols of Internet communications. For example, FTP is as affected as the web by blocking of a requested server's IP address. The authors have also received reports of failures of instant messaging software, likely reflecting difficulty in passing packets to and from designated servers.