Sites Blocked by ADL HateFilter
Benjamin Edelman - Berkman Center for Internet & Society - Harvard Law School
This research is part of a series of projects with Professor Jonathan Zittrain.


Abstract

Like numerous other Internet filtering programs, the Anti-Defamation League's HateFilter attempts to prevent users from knowing which specific web sites are deemed off-limits. However, this research presents a method for efficiently determining which specific sites are blocked, and this site reports results. Numerous sites are blocked that no longer offer content meeting ADL's definitions (if they ever did), including sites now offering other substantive content, sites that offer only error messages, and sites that no longer exist.

 


Background

As Internet use has increased over the past decade, some Internet users have become increasingly concerned with undesired exposure to content they deem offensive. In the United States, this concern typically focuses on exposure to pornography, prompting a series of laws (CDA, COPA, CIPA) intended to limit access to sexually explicit materials. But in Europe and among certain communities within the United States, this concern often focuses on hate speech -- sites that, as the ADL puts it, "advocate hatred, bigotry or even violence towards ... groups on the basis of their ... immutable characteristics."

In response to controversial materials available on the Internet, concerned citizens, companies, and governments have pursued several distinct methods for preventing access.

Internet filters operate by checking each web page request against a list of prohibited sites, refusing those requests that match the prohibitions. In principle, filters might block only content consistent with their definitions. In practice, empirical study of filtering systems shows wide-scale overblocking of content beyond filters' definitions. For example, the author found in his 2001 Sites Blocked by Internet Filtering Programs that the commercial filtering programs Cyberpatrol, N2H2, Smartfilter, and Websense each overblocked substantial amounts of content, totaling many tens of thousands of sites. See also related work by Peacefire and by Seth Finkelstein. These filtering errors are problematic in that they restrict access to content that should, according to filtering administrators' intentions, remain accessible.

With a few exceptions (e.g. NetNanny), the makers of filtering software keep their lists secret -- unavailable for review by users, system administrators, or the policy staff who select filtering systems for institutions like libraries and public schools. In many instances there is also reason to fear skewed incentives -- that filters emphasize blocking all content deemed objectionable, while placing less emphasis on preserving access to unobjectionable content. To increase transparency as to the quality of filtering lists, the author therefore endeavors to document the specific sites blocked -- first for the purpose of documenting overblocks, and second for the purpose of giving potential or current users of filtering software specific guidance as to what web content is blocked by which filtering programs.

 


Methodology

The Filter and Its Implementation

ADL HateFilter consists of two distinct components combined by the Anti-Defamation League: First, ADL provides a software program called ICRAfilter, developed by the Internet Content Rating Association (a British non-profit receiving funding from, among other sources, the European Union's Safer Internet Action Plan). Second, ADL provides an encoded list of sites that "in the judgment of the Anti-Defamation League, advocate hatred, bigotry or even violence towards Jews or other groups on the basis of their religion, race, ethnicity, sexual orientation or other immutable characteristics." (cite)

Like other filtering systems, HateFilter seeks to keep secret its list of blocked sites. The ICRAfilter framework provides explicit guidance as to how to do so: the ICRA Template Creation Pack instructions link to a specification for encoding of URLs via hashing. As the Template Creation Pack explains, this process "put[s] each URL through a one-way encryption process," although it notes that "Hash is the correct term, not encryption." (p 11) The Whatis.com glossary provides a more explicit definition of hashing: "the transformation of a string of characters into a ... fixed-length value or key that represents the original."

The ICRA Template Creation Pack continues by suggesting that hashing is "the electronic equivalent of making an omelette; there's no way you can take an omelette and reverse the process to put the eggs back in their shells." (p 11) As described in greater detail below -- this claim seems to overstate the effectiveness of the ICRA hashing system.

Research Methods

The ADL HateFilter installation includes a file called ADL20020318.PRF that follows the format specified in the ICRA Template Creation Pack and lists a total of 434 hashed URLs. Each hashed URL appears roughly as follows in the file:

"hashed:sha1=5BB879B924B85E0F38E9EABF2B25CC6CA46A3C4C"

Consistent with the labeling in this file and with the hashing specification, the 40-character "5BB..." string is the SHA1 hash of the URL http://1488.com. (Confirm using a web-based SHA1 applet: enter http://1488.com into the "password" box and observe that the resulting string, labeled "digest as hexadecimal," matches the "5BB..." value listed above.)
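The same check can be run with any SHA1 implementation. A minimal sketch in Python -- assuming, as the .PRF labeling suggests, that the URL is hashed byte-for-byte as written, with no further canonicalization:

```python
import hashlib

# Hash listed for this entry in ADL20020318.PRF (quoted above)
listed = "5BB879B924B85E0F38E9EABF2B25CC6CA46A3C4C"

# SHA-1 the URL exactly as written; any trailing-slash or case
# canonicalization performed by ICRA is assumed away in this sketch
digest = hashlib.sha1(b"http://1488.com").hexdigest().upper()

print(digest, digest == listed)
```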

The author obtained from the ADL20020318.PRF file all 434 distinct hashed URLs to be blocked by ADL HateFilter. He then hashed a large number of domain names and URLs in an attempt to find input strings generating these hashed outputs. Initial input strings included all COM, NET, and ORG domain names that existed as of the summer of 2002. Subsequent input strings included all domain names, URLs, and truncated URLs (truncated hierarchically at directory boundaries) that Google suggested were related (using Google's related: syntax) to the matches found from the initial input strings. Subsequent input strings also included all IP addresses currently associated with the matches from prior inputs.
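In effect, this procedure is a dictionary attack against the hashed list. A minimal sketch of the matching step in Python -- all site names below are hypothetical stand-ins for the actual candidate inputs and .PRF entries:

```python
import hashlib

def sha1_url(url):
    """SHA-1 a candidate URL as uppercase hex, matching the .PRF encoding."""
    return hashlib.sha1(url.encode("ascii")).hexdigest().upper()

# Hypothetical stand-ins for the 434 "hashed:sha1=..." entries
# that would be read out of ADL20020318.PRF
blocked_hashes = {
    sha1_url("http://example-blocked.com"),
    sha1_url("http://another-blocked.org"),
}

# Candidate inputs: domain names, related URLs, IP addresses, etc.
candidates = [
    "http://example-blocked.com",
    "http://innocuous-site.net",
    "http://another-blocked.org",
]

# A candidate is "found" when its hash appears in the blocked set
matches = [url for url in candidates if sha1_url(url) in blocked_hashes]
print(matches)
```

Hashing each candidate is cheap, so the real cost of this method is assembling a good candidate list -- hence the reliance on zone files and Google's related: results.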

These methods yielded a total of 226 matches -- domain names (with and without directory specifications) and IP addresses found to be blocked by ADL HateFilter. The targets of the remaining 208 hashed URLs remain unknown.

Research Methods in Context

In the past, testing of filtering software has followed two divergent methodologies. First, some researchers have sought to directly analyze the entire list of sites blocked by a given program, typically by decrypting its encrypted list ("the decryption method"). This was the method used by Eddy Jansson and Matthew Skala, co-authors of The Breaking of CyberPatrol. Second, others have sought to indirectly analyze blocked sites by attempting to access a large number of URLs through a filtered Internet connection, monitoring which requests succeed and which are blocked ("the empirical method"). This was the method used by the author in his previous Sites Blocked by Internet Filtering Programs as well as his subsequent studies of filtering in China and Saudi Arabia.

The current project reflects something of a combination of these two methods. The current project follows the decryption method in that it begins by analyzing the actual block list, as encoded by the creators of the filtering software. However, it follows the empirical method in that entries on the block list are found by guessing -- by inquiring as to a series of URLs that might be blocked. Finally, it follows a novel method in that the method of determining whether a given URL is blocked is not simply comparing the URL to the unencrypted block list (as in the decryption method) nor actually seeking to access that URL (as under the empirical method) but instead performing a series of mathematical operations on the URL and comparing the result with the encoded entry on the block list.

Legal Concerns

This general line of research presents significant legal concerns. However, the author believes his methods do not give rise to liability.

This research does not give rise to liability under the DMCA because the author did not circumvent a technological protection system. Indeed, the author did not decrypt anything. Instead, this research merely reflects performing a series of mathematical operations (the hash function described above) on certain URLs, followed by observing that certain hashed results match entries in a file.

This research does not give rise to liability under the ADL HateFilter license agreement for at least two reasons. First, the author obtained the necessary ADL HateFilter configuration files without assenting to the ADL HateFilter license agreement. Instead, he obtained the ADL20020318.PRF file in the following way: Rather than selecting "I Agree" and pressing "Next" when prompted by the HateFilter installation program, the author browsed the contents of his \Temp directory, observed the presence of an icra.cab file, and made a copy of it for subsequent decompression and study; he then cancelled the HateFilter installation without accepting its license agreement. Second, even had the author assented to the ADL HateFilter license agreement, its effectiveness is disputable. For example, accepting the license agreement entails affirming that the software will be used only by children under the installer's legal guardianship -- a term most courts would be particularly unlikely to enforce.

 


Results

The listing below reports 226 distinct web sites and URLs found to be blocked by ADL HateFilter. Each listing includes screenshots, Google Directory category listings, and inbound link counts (where available).

Listing of specific blocked sites

Note that this research has matched only 226 of the 434 entries blocked by ADL HateFilter -- slightly more than half of the specific URLs blocked by HateFilter. The remaining blocked URLs are unknown.

The ADL HateFilter block list exists as a file on the hard disk of a computer with HateFilter installed, and the current design of HateFilter does not allow the automatic updating of this file from any remote server. (In contrast, certain commercial filters offer updates as often as once per day.) Indeed, the ADL HateFilter block list has not changed since March 18, 2002.

A portion of the specific sites blocked by HateFilter offer content that is likely to be controversial and that may well meet the ADL's filtering criteria. But other blocked sites definitely do not meet HateFilter's blocking criteria.

At present, ADL HateFilter's blocks seem to include the following categories of content:

Category | Number of Such Sites | % of Such Sites
Content meeting ADL's blocking criteria - "advocate hatred, bigotry or even violence towards ... groups on the basis of their ... immutable characteristics" | |
Substantive content beyond ADL's criteria | |
Nonsubstantive content - domain registration services, sponsored links | |
Error messages | |

HateFilter includes the ability to block particular portions of sites (directories and/or specific files) as well as entire domain names. In some instances, ADL uses this functionality to block only portions of targeted sites, while leaving the remainder available. For example, HateFilter blocks dspace.dial.pipex.com/finalconflict but does not block other pages on the dspace.dial.pipex.com server, a server shared by many users of the Pipex service. However, in other instances HateFilter's blocks may be unnecessarily broad -- blocking the entirety of a sensitive site when in fact only certain pages meet ADL's blocking criteria.
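Directory-level blocking implies that a filter must test not just the requested URL but each of its hierarchical truncations, down to the bare host, against the hash list -- the same truncations used as candidate inputs in the research above. A sketch of that truncation in Python (the page name below is hypothetical):

```python
from urllib.parse import urlsplit

def directory_truncations(url):
    """Yield the URL itself plus each parent path, truncated at
    directory boundaries, down to the bare host -- each a candidate
    to hash and compare against the block list."""
    parts = urlsplit(url)
    segments = [s for s in parts.path.split("/") if s]
    for n in range(len(segments), -1, -1):
        path = "/" + "/".join(segments[:n]) if n else ""
        yield f"{parts.scheme}://{parts.netloc}{path}"

# Hypothetical page under the directory reported blocked above
for candidate in directory_truncations(
        "http://dspace.dial.pipex.com/finalconflict/index.html"):
    print(candidate)
```

Under this scheme, a hash match on any truncation (here, on the /finalconflict directory) suffices to block the page, while other paths on the shared server produce no match.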

 


Future Work

This research parallels that contemplated in the author's Edelman v. N2H2, a case seeking a declaratory judgment to protect research on the controversial commercial filtering program N2H2. However, the corresponding N2H2 research is expected to require full decryption, unlike the hashing that sufficed for this project. In addition, the author has received threats of suit from N2H2, whereas ADL has issued no such threats.

Prior work, including the author's Sites Blocked by Internet Filtering Programs, documents thousands of mistakes made by numerous Internet filters. That even ADL's HateFilter makes similar mistakes, despite a far narrower substantive focus and a far smaller block list, confirms the author's prior conclusion that filtering is bound to result in numerous inaccurate blocking decisions. The author has identified a series of specific factors that tend to exacerbate the errors made by filtering programs:

[relate to CIPA being upheld, importance]

That ADL attempts to keep its block list secret is particularly noteworthy given ADL's practice in other contexts of "naming names." (See, among others, ADL's Terrorist Organization Profiles, Audits of Anti-Semitic Incidents, and reports of recent hate crimes.)

Future work will compare HateFilter with other programs, track changes (or the lack of changes) over time, and attempt to match the remainder of the block list.

 

 

This page is hosted on a server operated by the Berkman Center for Internet & Society at Harvard Law School, using space and resources made available to the author in his capacity as a Berkman Center student fellow for academic and other scholarly work. The work is his own, and the Berkman Center does not express a position on its contents.

Thumbnails by ABCDrawHTML.


Last Updated: September 14, 2003