Google search result exclusions
Statement of issues and call for data
Jonathan Zittrain* and Benjamin Edelman** - Berkman Center for Internet & Society, Harvard Law School
The authors are studying exclusions from search engine results, and have found some 113 sites excluded, in whole or in part, from the French google.fr and German google.de but still listed on google.com. Learn more about the situation and context, test the exclusions for yourself, and submit further sites suspected to be excluded.
The authors are studying Internet filtering worldwide, primarily in the context of affirmative actions by governments to restrict the sites viewed by their respective users. We are also interested in requests or demands to private parties that they assist in preventing particular citizens' exposure to locally unwanted or illicit data or activities. A well-known example is the attempt by the French judiciary, acting on complaint of a French NGO, to prevent those on French territory from viewing Yahoo auctions that include the display of Nazi memorabilia, and Yahoo's response.
Another developing set of examples, more commonly worked out through informal arrangements and understandings than through lawsuits, relates to search engines. Though search engines cannot prevent direct access to a site of interest, an exclusion from a search engine may nonetheless have a similar effect on a site's ability to reach its intended visitors. Private parties have sought the omission of search results whose corresponding sites allegedly infringe copyright or other rights. For example, the Church of Scientology, invoking the "safe harbor" provisions of the U.S. Digital Millennium Copyright Act, convinced Google to remove references to certain web pages critical of the religion and allegedly containing copyrighted material; another example may be found in the Internet Archive's policy of possibly (but not necessarily) acceding to requests to remove its caches of a given Web site should the site's authors (or perhaps others) object on the basis of copyright infringement. While a few removals may be widely publicized by third parties noticing the exclusions (as in the case of the Church of Scientology requests), most others are systemically unknown, and we know of no direct way to determine a full list of all sites or pages excluded from country-specific versions of Google (or from archive.org or from other web services and search engine results). We have asked for Google's assistance in this matter, since the company is in a position to directly reveal the list of filtered material; we are told the request is under consideration. Google does notify the Chilling Effects Clearinghouse of DMCA-based requests for removals, which are then posted on the Chilling Effects site.
The policy for removals based on government invocation of local laws itself remains somewhat shrouded. For example, while Google's terms of service explain that it removes search results in response to DMCA notifications and set forth some other situations where Google might remove search results, they make no mention of government-mandated (or -requested) removals, though the prospect is mentioned in at least one apparent message from Google support staff. This may well be because there are no industry "best practices" or other easy means of determining an ideal course of action for search engines confronted with requests or demands for exclusions grounded in part in alleged legal requirements.
The authors are currently seeking to document differences between results generated at google.com and those at google.fr and google.de, Google's counterparts intended for French and German audiences. Specifically, we have noticed that while google.fr and google.de use google.com's database concordance of 2,469,940,685 web pages (Google's count as of Oct. 20, 2002), the French and German sites seem to screen search results corresponding to sites with content that might be sensitive or illegal in the respective countries. Such filtering by Google at the technical level on the basis of threatened or implied legal liability or responsibility is wholly separate from national, third-party interception and filtering of Google searches and results pages, as in the September 2002 replacement of google.com with other search engines in China (see documentation and screenshots) and subsequent unfolding restrictions on certain Google searches in China.
Google.com vs. google.fr & google.de
To help understand the sorts of pressures placed upon intermediaries like search engines and their respective reactions, the authors began looking for discrepancies between results from google.com and those from google.fr and google.de. We conducted this search using a list of several thousand sites known or likely to be controversial, most for their inclusion of white supremacy or related content, including one site for which we had been alerted that discrepancies existed. For each site, we then searched google.com, google.fr, and google.de (in a manner described below) to determine the number of pages each reported as indexed. We have created a list of those sites found to yield different results in google.com versus google.fr and/or google.de. This list reflects testing of October 4 to 21, 2002. Many such sites seem to offer Neo-Nazi, white supremacy, or other content objectionable or illegal in France and Germany, though other affected sites are more difficult to cleanly categorize.
Readers can review a specific listing of sites found to be excluded from google.fr and .de. Note that each site in the listing includes a "confirm" link that lets interested readers check whether google.fr and .de still deliver more limited results than google.com.
A note on search criteria: The authors' searches use standard Google search syntax to request 1) pages on the specified web site (using the site:stormfront.org restriction), and 2) pages that lack a phrase of gibberish (using the exclusion syntax -asdfasdf), since some search term must be specified. Similar searches for other sites confirm that these search criteria provide a reliable estimate of the number of pages indexed by Google on a given web site.
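The search criteria above can be sketched in a few lines of Python. This is only an illustration, not the authors' actual testing code: the function names, the host table, and the https://<host>/search?q=... URL pattern are assumptions made for the example. The query itself (the site: restriction plus an excluded gibberish term) follows the description above.

```python
from urllib.parse import urlencode

# Assumed hostnames for the three Google versions compared in the study.
GOOGLE_HOSTS = {
    "com": "www.google.com",
    "fr": "www.google.fr",
    "de": "www.google.de",
}


def build_query(site, gibberish="asdfasdf"):
    """Return a query that restricts results to `site` and excludes a
    nonsense term, since some search term must be specified."""
    return f"site:{site} -{gibberish}"


def build_search_urls(site):
    """Return one search URL per Google version for the same query.

    The /search?q= URL pattern is an assumption of this sketch.
    """
    query = urlencode({"q": build_query(site)})
    return {key: f"https://{host}/search?{query}" for key, host in GOOGLE_HOSTS.items()}


if __name__ == "__main__":
    for version, url in build_search_urls("stormfront.org").items():
        print(version, url)
```

Issuing the same query against each version and comparing the reported page counts is the comparison the text describes; this sketch only constructs the URLs and does not fetch or parse any results.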
The implication of these results -- confirmed in our subsequent searches on google.com versus google.fr and .de for the terms at issue -- is that the French and German versions of Google simply omit search results from the affected sites, and that this anecdotally appears to be because of pressure applied by, or perceived to come from, the respective governments. Compare a search for Stormfront on google.com with the same search on google.de. Notice that the google.de results simply omit the three stormfront.org results that are at the top of the google.com list. (Confirm this for yourself with popups -- left window gives google.com results, right is google.de. Or, view our screenshot of the comparison.)
Of the sites excluded from Google results in France and Germany, some contain content known to be controversial, but the exclusion of others is less obvious. Seth Finkelstein points out that some of these names may have been transferred from one registrant to another, resulting in a significant change in the content available; however, Google may have failed to update its filtering list to reflect such transfers. Seth also notes that Google's filtering systems seem to fail to remove all pages that specify a port number (www4.stormfront.org:81 for example), suggesting that the filtering may be a relatively simple end-of-process add-on attached to the ordinary Google search logic.
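To illustrate how a simple add-on filter could miss URLs that specify a port number, consider the following hypothetical sketch. It is not Google's implementation, which is unknown to us; the blocklist, the function names, and the matching rule are all assumptions made for the example. It shows only that a filter comparing the raw host-and-port string against a list of domains would let www4.stormfront.org:81 through, while a filter that strips the port first would not.

```python
from urllib.parse import urlsplit

# Hypothetical blocklist; the real list and matching rules are not public.
BLOCKED_DOMAINS = {"stormfront.org"}


def _matches(host):
    """True if `host` is a blocked domain or a subdomain of one."""
    return any(host == d or host.endswith("." + d) for d in BLOCKED_DOMAINS)


def naive_is_blocked(url):
    """Compare the raw network location, which still includes any port."""
    return _matches(urlsplit(url).netloc.lower())


def port_aware_is_blocked(url):
    """Compare only the hostname, with any port number stripped."""
    return _matches((urlsplit(url).hostname or "").lower())


if __name__ == "__main__":
    for url in ("http://www.stormfront.org/", "http://www4.stormfront.org:81/"):
        print(url, naive_is_blocked(url), port_aware_is_blocked(url))
```

Under these assumptions the naive check blocks the first URL but not the second, which is consistent with the behavior Seth describes, though other explanations are possible.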
Call for data
Since the filtering documented above may be but a part of such practices, we hope to augment our public database of such examples by seeding new searches with sites already known to be restricted, perhaps because someone simply searched on a known site and was surprised to find no results. We have built this page as a means for those with further information on this subject to submit details for verification and inclusion in our study. As an experiment in "open research," we will analyze submissions and publish results promptly on this page.
The authors offer two ways to get involved with research and testing in this area. First, those with information about Google filtering generally (or filtering by other search engines) can use this form to submit such information to the authors for inclusion in this site. Second, anyone interested can use the Real-Time Testing System, below, to test google.fr and google.de filtering of a specified site. Note that uses of this system are logged for future study, analysis, and publication.
See a separate listing of user-submitted sites found inaccessible to date.
Additional notes, comments, ideas, and errata
Note that Internet users in France and Germany need not use google.fr or google.de. While Google's geolocation systems typically automatically offer these sites to users in the corresponding countries, users retain the option to use the ordinary English-language google.com site.
In the authors' testing, every site found to be removed from German google.de results (65 sites total) was also removed from French google.fr results. A further 48 sites were removed only from google.fr results. However, the authors found no sites blocked only in German results but listed in France. The authors have further found that google.ch (Switzerland) exclusions seem to match results in France.
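The breakdown above can be expressed as a small classification over per-version page counts. The sketch below is a minimal illustration with made-up site names and counts, not the authors' data or code; it assumes a site counts as excluded "in whole or in part" from a country version when that version reports fewer indexed pages than google.com. It also checks that the two figures reported above (65 and 48) sum to 113.

```python
from collections import Counter


def classify(counts):
    """Classify one site from its page counts on google.com, .fr, and .de.

    Assumption of this sketch: fewer pages than google.com means the
    site is excluded, in whole or in part, from that country version.
    """
    in_fr = counts["fr"] < counts["com"]
    in_de = counts["de"] < counts["com"]
    if in_fr and in_de:
        return "fr and de"
    if in_fr:
        return "fr only"
    if in_de:
        return "de only"
    return "not excluded"


def tally(results):
    """Count how many sites fall into each category."""
    return Counter(classify(counts) for counts in results.values())


# Made-up example data for illustration only.
SAMPLE = {
    "site-a.example": {"com": 120, "fr": 0, "de": 0},
    "site-b.example": {"com": 40, "fr": 0, "de": 40},
    "site-c.example": {"com": 15, "fr": 15, "de": 15},
}

# Figures reported in the text: 65 sites removed from both versions,
# 48 removed from google.fr only, none from google.de only.
REPORTED = {"fr and de": 65, "fr only": 48, "de only": 0}

if __name__ == "__main__":
    print(tally(SAMPLE))
    print(sum(REPORTED.values()))
```

A tally with an empty "de only" category corresponds to the observation that the German exclusions were a subset of the French ones.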
In addition to the search result exclusions described above, google.com and google.de/.fr may also differ in other respects. The authors have confirmed that the images.google.de advanced image search form fails to offer the user the option to enable or disable SafeSearch (filtering of sexually explicit images), while the corresponding images.google.com page lets the user choose whether or not to invoke this filtering. Sources in Germany suggest that all google.de image searches are performed with SafeSearch engaged to filter images thought to be sexually explicit. The authors note, however, that images.google.fr's image search does offer the user a choice as to the inclusion of sexually explicit images. The authors further note a divergence in results on images.google.com versus the .de and .fr sites (confirm via popup - left window gives google.com results, right gives google.de; confirm via screenshot). However, the cache feature of google.de and .fr seems to continue to provide archives even of the web sites excluded from these versions of Google.
Support for this project is provided by the Berkman Center for Internet & Society at Harvard Law School. Thanks to Wolfgang Bleh, Seth Finkelstein, Alvar Freude, and Aaron Swartz for guidance and suggestions.
* Jack N. and Lillian R. Berkman Assistant Professor of Entrepreneurial Legal Studies, Harvard Law School.
** J.D. Candidate, Harvard Law School, 2005.