Mirror As You Link

From i4bi wiki

Mirror As You Link (MAYL)

The Problem

Currently, a wide range of people would like to have online content preserved in some fashion, either because they fear that their own content is at risk or because they rely in some way on someone else's content. Their circumstances include:

  - individuals and organizations who fear being attacked or censored through distributed denial-of-service (DDoS) attacks
  - political activists, demonstrators, and reporters who fear censorship by governments or other political entities
  - smaller websites that may suddenly become popular enough to be overloaded (commonly called the Slashdot effect)
  - scholars and academics who wish to preserve copies of cited sources
  - anyone interested more generally in online archiving, preservation, and web robustness


The Idea

Mirror As You Link (MAYL) is a project that seeks to help people preserve online content by making it easier (and more common) to mirror content. The goal is not the cluttered redundancy common on the early Internet, where the content of one website was reproduced more or less wholesale on a number of other sites, but rather to keep copies in the background, ready for the moment when the original website is no longer available.

In practice, MAYL will work like this: an independent reporter links to the website of a political activist from Iran and uses MAYL to ensure that, if the activist's website goes down, the content she linked to will still be available. While the activist's website is up, the mirror is neither visible nor intrusive. Later, however, the activist's website goes down as the result of a DDoS attack. Without MAYL, the reporter could no longer give readers access to the activist's content. With MAYL, however, when a reader clicks the link to the activist's website and finds that it is down, the reader is automatically redirected to the mirror, so the DDoS attack fails to silence the activist.
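The redirect described above can be sketched in a few lines of Python. This is a minimal illustration, not part of the proposal: the function names `is_reachable` and `resolve_link` are hypothetical, and using an HTTP HEAD request as the "is it down?" test is an assumption (some servers answer HEAD differently from GET, and a real deployment would want caching and retries).

```python
import http.client
import urllib.request
from typing import Callable


def is_reachable(url: str, timeout: float = 5.0) -> bool:
    """Hypothetical liveness check: True if a HEAD request completes without an error status."""
    try:
        request = urllib.request.Request(url, method="HEAD")
        # urlopen follows redirects and raises HTTPError (an OSError subclass)
        # for 4xx/5xx responses, so getting a response means the page is up.
        with urllib.request.urlopen(request, timeout=timeout):
            return True
    except (OSError, ValueError, http.client.HTTPException):
        # Connection refused, DNS failure, timeout, HTTP error, or malformed URL.
        return False


def resolve_link(original: str, mirror: str,
                 check: Callable[[str], bool] = is_reachable) -> str:
    """Return where the reader should go: the original if it is up, else the mirror."""
    return original if check(original) else mirror
```

For example, `resolve_link("https://activist.example/post", "https://mirror.example/mayl/post")` returns the first URL while the original answers and the second once it stops responding. The `check` parameter is injectable so the decision can be tested without network access.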

Such an implementation would give people who run servers a means to preserve online content, but it would not help the bulk of Internet users do so. We therefore also envision a tool, much like TinyURL or WebCite, that lets users without websites of their own create mirrors of content. This service could be marketed as an online source-preservation tool. Although the initial implementation would be centralized, the end goal is an Apache extension that anyone wishing to offer a similar service could adopt. Once implemented, this kind of content mirroring would enhance web robustness in general and open opportunities for wider applications, e.g., in scholarship and archives.

Implementation Proposal

STAGE 1: Apache Extension

TECHNICAL:

A. An open-source MAYL Apache extension is available through package managers, SourceForge, etc.

B. The extension provides the following two backend calls for Web page authors:

   		<SNAPSHOT ref=http... name="mysnapshot-9/23/11">: If no snapshot with this unique (ref, name) pair exists yet, make a server-side copy of the page and present a MAYL link.
   		<BACKUP ref=... name=... every=24h>: Like SNAPSHOT, but refresh the copy every 24h.
   		Various other configuration options might exist.
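One way to read the SNAPSHOT/BACKUP semantics is as a store keyed by the unique (ref, name) pair: SNAPSHOT copies a page once, while BACKUP re-copies it when the stored copy is older than its interval. The sketch below assumes that reading; `SnapshotStore`, `fetch`, and `clock` are hypothetical names, page fetching is injected rather than implemented, and `every=24h` is expressed as 86400 seconds.

```python
import time
from dataclasses import dataclass
from typing import Callable, Dict, Optional, Tuple


@dataclass
class Snapshot:
    content: bytes
    taken_at: float                # Unix time the copy was made
    every: Optional[float] = None  # refresh interval in seconds (BACKUP only)


class SnapshotStore:
    """Hypothetical server-side store keyed by the unique (ref, name) pair."""

    def __init__(self, fetch: Callable[[str], bytes],
                 clock: Callable[[], float] = time.time) -> None:
        self._fetch = fetch   # downloads the page at `ref` (injected)
        self._clock = clock   # current time in seconds (injected for testing)
        self._snapshots: Dict[Tuple[str, str], Snapshot] = {}

    def snapshot(self, ref: str, name: str) -> Snapshot:
        """<SNAPSHOT>: copy the page only if (ref, name) has no copy yet."""
        key = (ref, name)
        if key not in self._snapshots:
            self._snapshots[key] = Snapshot(self._fetch(ref), self._clock())
        return self._snapshots[key]

    def backup(self, ref: str, name: str, every: float) -> Snapshot:
        """<BACKUP>: like snapshot(), but re-fetch once the copy is `every` seconds old."""
        key = (ref, name)
        now = self._clock()
        current = self._snapshots.get(key)
        if current is None or now - current.taken_at >= every:
            self._snapshots[key] = Snapshot(self._fetch(ref), now, every)
        return self._snapshots[key]
```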

C. The server administrator (as opposed to Web page authors) can install a server policy that overrides author-created MAYL links, e.g.:

   		"Update backups no more than once every 48h."
   		"Use no more than 1MB for snapshots."
   		etc.
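Administrator policy can be modeled as limits that clamp whatever an author requests. In this hypothetical sketch the effective refresh interval is the longest of the author's, the administrator's, and (optionally) the target's own limit, and a snapshot is stored only if it fits the quota. The field and function names are illustrative, and "1MB" is taken to mean 1,000,000 bytes, which is an assumption.

```python
from dataclasses import dataclass
from typing import Optional

HOUR = 3600  # seconds


@dataclass
class ServerPolicy:
    """Hypothetical administrator policy; it takes precedence over author requests."""
    min_backup_interval: int = 48 * HOUR  # "Update backups no more than once every 48h."
    snapshot_quota: int = 1_000_000       # "Use no more than 1MB" (assumed 10**6 bytes)


def effective_interval(author_every: int, policy: ServerPolicy,
                       target_min: Optional[int] = None) -> int:
    """Refresh no more often than the author, the admin, or the target allows."""
    limits = [author_every, policy.min_backup_interval]
    if target_min is not None:  # e.g. a limit read from the target's mirrors.txt
        limits.append(target_min)
    # The longest interval is the only one that satisfies every party.
    return max(limits)


def may_store(used_bytes: int, new_bytes: int, policy: ServerPolicy) -> bool:
    """Accept a new snapshot only if total usage stays within the quota."""
    return used_bytes + new_bytes <= policy.snapshot_quota
```

For example, an author's `every=24h` under the 48h policy above yields a 48-hour interval, while a 72h request is left unchanged.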

D. The snapshot target (the site being mirrored) can also publish its own policy in a "mirrors.txt" file, e.g.:

   		"Make a backup of me no more than once every 48h."
   		"Also mirror the following pages..."
   		"Contact me@gmail.com if you have any questions."
   		etc.
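The proposal does not define a syntax for mirrors.txt. Purely as an illustration, the sketch below assumes a robots.txt-style format of `Field: value` lines with `#` comments, and invents three field names (`Backup-Interval`, `Mirror`, `Contact`) corresponding to the examples above; none of these names come from the original.

```python
import re
from dataclasses import dataclass, field
from typing import List, Optional

# Hypothetical duration syntax: a whole number followed by h (hours) or d (days).
_DURATION = re.compile(r"^(\d+)\s*([hd])$", re.IGNORECASE)


@dataclass
class MirrorsPolicy:
    min_backup_interval: Optional[int] = None  # seconds
    extra_pages: List[str] = field(default_factory=list)
    contact: Optional[str] = None


def parse_duration(text: str) -> int:
    """Convert '48h' or '2d' into seconds."""
    match = _DURATION.match(text.strip())
    if not match:
        raise ValueError(f"unrecognized duration: {text!r}")
    amount, unit = int(match.group(1)), match.group(2).lower()
    return amount * (3600 if unit == "h" else 86400)


def parse_mirrors_txt(text: str) -> MirrorsPolicy:
    """Parse the assumed 'Field: value' format; unknown fields are ignored."""
    policy = MirrorsPolicy()
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and surrounding space
        if not line or ":" not in line:
            continue
        # Split at the first colon only, so URLs in the value stay intact.
        key, value = (part.strip() for part in line.split(":", 1))
        key = key.lower()
        if key == "backup-interval":
            policy.min_backup_interval = parse_duration(value)
        elif key == "mirror":
            policy.extra_pages.append(value)
        elif key == "contact":
            policy.contact = value
    return policy
```

With this assumed syntax, a file containing `Backup-Interval: 48h` and `Contact: me@gmail.com` would parse to a 172,800-second minimum interval and that contact address. Because `#` starts a comment, a URL fragment in a `Mirror` line would be cut off; a real format would need to handle that.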

E. When a reader follows a MAYL link, the server shows the target page in an iframe together with a notice that says:

   		"A MAYL snapshot exists of this page. Would you like to see it? Yes/Dismiss/About"

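A minimal sketch of the wrapper page in step E, assuming the server renders plain HTML: the live target is loaded in an iframe, and the notice offers Yes (go to the snapshot), Dismiss (hide the notice), and About links. The function name `mayl_frame` and its three URL parameters are hypothetical.

```python
import html


def mayl_frame(target_url: str, snapshot_url: str, about_url: str) -> str:
    """Build a hypothetical page showing the live target in an iframe under a MAYL notice."""
    # Escape every URL so characters such as & and " cannot break the markup.
    t, s, a = (html.escape(u, quote=True) for u in (target_url, snapshot_url, about_url))
    return f"""<!DOCTYPE html>
<html>
<head><meta charset="utf-8"><title>MAYL</title></head>
<body style="margin:0">
  <div id="mayl-notice">
    A MAYL snapshot exists of this page. Would you like to see it?
    <a href="{s}">Yes</a> /
    <a href="#" onclick="this.parentNode.remove(); return false;">Dismiss</a> /
    <a href="{a}">About</a>
  </div>
  <iframe src="{t}" style="border:0;width:100%;height:95vh"></iframe>
</body>
</html>"""
```

Note that many sites forbid being framed via the `X-Frame-Options` header or a Content-Security-Policy `frame-ancestors` directive, so a real implementation would need a fallback, such as linking to the target instead of framing it.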
LEGAL/SOCIAL: Our hope is to encourage server adoption by separating administrator liability from author liability. In particular, we want a server to be able to state in its terms of service, "You are responsible for all content mirrored by MAYL links," and to qualify for DMCA safe harbor by complying with takedown requests on specific links. This seems no different from other content on the pages themselves (e.g., pictures), so we don't believe servers would be inheriting any new legal responsibilities.

We also hope that mirrors.txt will serve as an opt-in notice, and that its contact information will give servers an address to which legal requests (e.g., DMCA takedowns) can be forwarded.

We could pitch this to the Wikimedia Foundation as "Install this extension and you will have time-aware snapshot citations."


STAGE 2: Web Service

TECHNICAL:

A. We provide a Web site almost identical to TinyURL, running the MAYL Apache extension.

B. Users with an account can create MAYL snapshot and backup "tinyurls" to use on their own pages.

C. Optionally, like TinyURL, we can provide browser extensions that plug into this service.
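TinyURL-style snapshot links need a short identifier for each snapshot. One possible scheme, shown here as an assumption rather than how TinyURL or the proposal actually works, hashes the user, `ref`, and `name` with SHA-256 and keeps the first eight base-62 characters. The names `short_code` and `base62` are hypothetical.

```python
import hashlib
import string

ALPHABET = string.digits + string.ascii_letters  # 62 URL-safe characters


def base62(n: int) -> str:
    """Encode a non-negative integer in base 62."""
    if n == 0:
        return ALPHABET[0]
    digits = []
    while n:
        n, rem = divmod(n, 62)
        digits.append(ALPHABET[rem])
    return "".join(reversed(digits))


def short_code(user: str, ref: str, name: str, length: int = 8) -> str:
    """Derive a stable short code for one user's (ref, name) snapshot."""
    digest = hashlib.sha256(f"{user}\n{ref}\n{name}".encode("utf-8")).digest()
    return base62(int.from_bytes(digest, "big"))[:length]
```

Because the code is truncated, two different snapshots can in principle collide, so the service would still need to check for an existing entry before issuing a code.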

LEGAL/SOCIAL: The purpose of this service is to encourage MAYL adoption by authors who do not control their own servers. Our terms of service state that users are responsible for the copies they make. We do not provide a content-discovery service: users are responsible for making others aware of their copies. If necessary, we could refuse outright to snapshot pages that have no mirrors.txt installed. My goal is for the service to have the same legal status as Dropbox and TinyURL.
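The strict opt-in rule (refuse to snapshot pages without mirrors.txt) could look like the following. The sketch assumes, as with robots.txt, that mirrors.txt sits at the root of the target site; the proposal does not say where it lives, and `may_snapshot` and its helpers are hypothetical.

```python
import http.client
import urllib.request
from typing import Callable
from urllib.parse import urljoin, urlsplit


def mirrors_txt_url(page_url: str) -> str:
    """Assumed location: the site root, as for robots.txt."""
    return urljoin(page_url, "/mirrors.txt")


def has_mirrors_txt(page_url: str, timeout: float = 5.0) -> bool:
    """True if the target site serves a mirrors.txt without an error status."""
    try:
        with urllib.request.urlopen(mirrors_txt_url(page_url), timeout=timeout):
            return True
    except (OSError, ValueError, http.client.HTTPException):
        return False


def may_snapshot(page_url: str,
                 opted_in: Callable[[str], bool] = has_mirrors_txt) -> bool:
    """Strict mode: only snapshot http(s) pages whose site publishes mirrors.txt."""
    if urlsplit(page_url).scheme not in ("http", "https"):
        return False
    return opted_in(page_url)
```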


STAGE 3: Distribution

TECHNICAL:

A. A mirrors.txt file may optionally identify a BitTorrent/DHT tracker.

B. If so, a MAYL-enabled server joins the torrent rather than taking snapshots.

C. When the snapshot is requested, the server serves the copy of the page distributed in the torrent.

D. The torrent is cryptographically signed so that only the owner of the page can update it.
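Step D amounts to an update rule: a mirror accepts a new version of the page only if it is newer than the one it holds and carries a valid signature from the page owner. BitTorrent's mutable DHT items (BEP 44, used for updatable torrents in BEP 46) follow a similar pattern of Ed25519 signatures plus an increasing sequence number. The sketch below is hypothetical: the message layout is invented, and signature checking is passed in as a `verify` function, which in practice could wrap a library such as `cryptography` (whose `Ed25519PublicKey.verify` raises `InvalidSignature` on a bad signature).

```python
from dataclasses import dataclass
from typing import Callable, Optional

# verify(public_key, message, signature) -> bool. Hypothetical interface; a real
# deployment would back it with a public-key signature library.
Verifier = Callable[[bytes, bytes, bytes], bool]


@dataclass(frozen=True)
class SignedPage:
    seq: int          # version number; must increase with every update
    content: bytes
    signature: bytes  # owner's signature over signed_message(seq, content)


def signed_message(seq: int, content: bytes) -> bytes:
    """Bind the version number to the content so old versions cannot be replayed."""
    return seq.to_bytes(8, "big") + content


def accept_update(current: Optional[SignedPage], candidate: SignedPage,
                  owner_key: bytes, verify: Verifier) -> bool:
    """Accept a new version only if it is newer and signed by the page owner."""
    if current is not None and candidate.seq <= current.seq:
        return False  # stale or replayed version
    message = signed_message(candidate.seq, candidate.content)
    return verify(owner_key, message, candidate.signature)
```

Using public-key signatures matters here: every mirror can check an update with the owner's public key, but only the holder of the private key can produce one.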

LEGAL/SOCIAL: This final stage of the project provides robustness against content inspection, secure updates of the target content, etc.