Library Innovation Lab

From Berkman Klein Google Summer of Code Wiki
Revision as of 16:47, 8 March 2012 by WikiSysop (talk | contribs)

About

The Library Innovation Lab is a small group within the Harvard University Library system that builds software to try out ideas for making libraries more valuable. We hack libraries... in the good sense of discovering and delivering more capability and value.

Find more information on the Library Innovation Lab here: librarylab.law.harvard.edu

STRUCTURED SITE SCRAPER: From site to collection

Many local libraries, historical societies, and cultural groups have created web sites displaying collections of digitized photos, scanned documents, oral histories, audio files, etc. These local treasures often sit on sites designed purely for end-user browsing, and they would be far more useful if they were more widely searchable and browsable.

The team developing the Digital Public Library of America's software platform (a metadata server) would like to gather metadata about such sites. The scraper would discover the heritage items a site points to, capture as much of the explicit metadata as possible (captions, labels, etc.), and use the structure of the site as a heuristic for inferring the collection's structure. Before import, the local curators would be shown the data as parsed so they can correct its content and structure. The metadata would then be mapped to the appropriate schema and imported into the DPLA's meta-catalog. In addition, a site map would be generated for the local curators.
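
As a rough illustration of the extraction step described above, the following Python sketch pulls image items and their explicit captions out of a single page and uses the page's URL path as a stand-in for the collection hierarchy. It is only a sketch under assumptions: the names ItemExtractor, collection_path, and extract_items are hypothetical, the "directory segments equal collection levels" heuristic is just one possible reading of "the structure of the site", and nothing here reflects the DPLA platform's actual schema or API.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


def collection_path(page_url):
    """Heuristic (an assumption): directory segments of the URL name the collection levels."""
    segments = [s for s in urlparse(page_url).path.split("/") if s]
    # Drop a trailing file name such as "index.html" so only directories remain.
    if segments and "." in segments[-1]:
        segments = segments[:-1]
    return segments


class ItemExtractor(HTMLParser):
    """Hypothetical helper: collect <img> items and their explicit metadata from one page."""

    def __init__(self, page_url):
        super().__init__()
        self.page_url = page_url
        self.items = []

    def handle_starttag(self, tag, attrs):
        if tag != "img":
            return
        attr = dict(attrs)
        src = attr.get("src")
        if not src:
            return  # an image without a source cannot be identified as an item
        self.items.append({
            # Resolve relative links against the page that contains them.
            "url": urljoin(self.page_url, src),
            # Prefer alt text, fall back to the title attribute, else empty.
            "caption": attr.get("alt") or attr.get("title") or "",
            "collection": collection_path(self.page_url),
        })


def extract_items(page_url, html):
    """Return a list of item records found in the given HTML text."""
    parser = ItemExtractor(page_url)
    parser.feed(html)
    return parser.items
```

Records in this form could be shown to a local curator for correction before being mapped to a target schema. A real scraper would also need to crawl linked pages, handle documents and audio files as well as images, and look at surrounding text for captions that are not stored in attributes.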