[dvd-discuss] Fwd: Net Archive Turns Back 10 Billion Pages of Time

>X-Sender: rahettinga@earthlink.net
>Date: Thu, 25 Oct 2001 14:14:51 -0400
>To: Digital Bearer Settlement List <dbs@philodox.com>, dcsb@ai.mit.edu
>From: "R. A. Hettinga" <rah@shipwright.com>
>Subject: Net Archive Turns Back 10 Billion Pages of Time
>Sender: bounce-dcsb@reservoir.com
>Reply-To: "R. A. Hettinga" <rah@shipwright.com>
>Net Archive Turns Back 10 Billion Pages of Time
>Web: Project launching today offers more text than Library of Congress.
>Times Staff Writer
>October 25 2001
>SAN FRANCISCO -- An Internet archive containing more text than any library
>in history will open its digital doors today, giving researchers and the
>public access to just about everything posted on the World Wide Web over
>the last five years.
>The free archive, created by a San Francisco computer entrepreneur named
>Brewster Kahle, allows academics to conduct the electronic equivalent of
>archeological digs, rooting through reams of material illustrating the
>evolution of the Web and its role in American society.
>The Internet Archive, informally called the Wayback Machine, holds more
>than 10 billion Web pages dating to 1996, including millions that had
>vanished as dot-coms collapsed, big companies scaled back or updated their
>offerings, and hobbyist Webmasters lost interest.
>Researchers and academics have likened Kahle to a modern-day Andrew
>Carnegie, the steel baron who endowed many of the nation's finest libraries.
>"Libraries are dedicated to collecting and making available the permanent
>historical record," said Diane Kresh, the Library of Congress' director for
>public service collections. She said trolling the Net is as significant as
>gathering books or periodicals.
>Want to see what the Heaven's Gate cult page looked like before the group's
>mass suicide? There it is. Want to see how Yahoo's pages have changed since
>1996? Step this way. Pages published by everyone from Fortune 500 companies
>to renegade porn merchants are stashed in the Internet Archive.
>The five-year, multimillion-dollar project has amassed five times as much
>text as the Library of Congress, which helped fund the archive along with
>Compaq Computer Corp., the National Science Foundation and the Smithsonian
>Institution. The more than 100 terabytes of data are housed on 300 modified
>Hewlett-Packard desktop computers in a basement at San Francisco's Presidio.
>The effort to record Internet history has been directed and largely
>financed by Kahle, a 41-year-old former supercomputer technologist who sold
>one Web firm to America Online and another to Amazon.com.
>"The opportunity of our time is to offer universal access to all of human
>knowledge," Kahle said Wednesday from his office in the Presidio, a
>decommissioned military base near the Golden Gate Bridge. "We're at a
>unique point in time to offer universal access to anyone who walks into a
>library in Uganda."
>The Internet Archive uses automated "bots" to scour the Web. They capture
>sites and return what they find to the computers at the Presidio. The
>archive updates every two months. Once captured, the sites are organized
>chronologically. Users type in a Web address, and the archive displays
>versions of that site since 1996.
>Sites that require passwords or block bots are not captured. And if someone
>objects to their site being copied, the archive removes it.
>As smaller, less accessible versions of the archive were being compiled,
>Kahle's 30 staffers got a few complaints. After the staff explained that it
>wasn't personal, that they were copying everyone's sites, the vast majority
>decided they didn't mind, Kahle said.
>"Most people say, 'You're crazy, but go for it,' " Kahle said. "People want
>to be part of history."
>Candidates to use the service, at web.archive.org, include academics,
>journalists and researchers.
>"It will allow researchers to study the evolution of the Web in a way that
>is unprecedented," said research scientist Ed Chi of the Xerox Palo Alto
>Research Center. He said Xerox PARC scientists already are working on new
>user interfaces based on what the archive showed them about how people
>looked for information.
>Early on, "we suspect people will go look for their own pages and see if
>they can get copies of things that they've lost," Kahle said. "We're not
>exactly sure how this is going to be used. We're looking forward to being
>Like many Internet pioneers, however, Kahle faces unfamiliar risks along
>with the opportunities. The Internet Archive may be a massive violation of
>copyright law.
>"Brewster is taking an extraordinarily personal risk, because this is
>potentially a criminal offense," said Lawrence Lessig, an expert on
>intellectual property in cyberspace at Stanford University.
>Kahle doesn't anticipate getting sued, let alone serving jail time. His
>plan is to post whatever he can--and keep the archive growing.
>"We're not here to test laws," Kahle said. "We're trying to build a world
>we want to live in. The world without a library is a world without a
>memory, and that would be tragic."
>The legal questions may take years to resolve, Kahle and Lessig said.
>Consider the Industry Standard. At least some of that defunct magazine's
>articles are back online through Kahle's archive. But shareholder IDG paid
>more than $1 million for the Standard's assets, including rights to those
>stories. An IDG spokeswoman declined to say whether the company would ask
>the archive to drop the articles.
>Kahle said he isn't worrying about the hypotheticals. He's more excited
>about finding early www.whitehouse.gov pages from 1996 that dealt with
>airport safety and bioterrorism.
>Even better is what's to come.
>"The woman who is going to be elected president in 2024 is in high school
>now, and I bet she has a home page," Kahle said. "We have the future
>president's home page!"
>R. A. Hettinga <mailto: rah@ibuc.com>
>The Internet Bearer Underwriting Corporation <http://www.ibuc.com/>
>44 Farquhar Street, Boston, MA 02131 USA
>"... however it may deserve respect for its usefulness and antiquity,
>[predicting the end of the world] has not been found agreeable to
>experience." -- Edward Gibbon, 'Decline and Fall of the Roman Empire'