Library Innovation Lab: Difference between revisions

From Berkman Klein Google Summer of Code Wiki
(New page: Two potential projects: 1. Syllabus parser. Design, structure and populate an open repository of the information in college syllabi. [Note that this project will be done in conjunction w...)
 
(moved a project)
 
(9 intermediate revisions by 2 users not shown)
Two potential projects:
=About=
The Library Innovation Lab is a small group within the Harvard University Library system that implements, in software, ideas about how libraries can be ever more valuable. We hack libraries... in the good sense of discovering and delivering more capability and value. Find more information on the Library Innovation Lab here: [http://librarylab.law.harvard.edu/ librarylab.law.harvard.edu]


==Long Tail Browser (Library Interestingness)==


1. Syllabus parser. Design, structure and populate an open repository of the information in college syllabi. [Note that this project will be done in conjunction with the Harvard Library Innovation Lab.]
The Harvard Library system has available a rich set of metadata about the 12 million books and other items in its collection. This includes "event" data such as circulation records broken down by school and borrower type, which works have been called back from loans early, items on reserve, items ordered by the Harvard Coop, and more. It isn't hard to come up with useful ranking algorithms that employ this data. It is more challenging to devise interestingness algorithms. And it is more challenging still to devise algorithms that will find interesting and relevant works in the long tail. We are therefore proposing a project that explores using every scrap of metadata to provide search results based on interestingness, and that discerns interestingness in items in the long tail.


*Assuming we get permission, figure out how to retrieve syllabi from Google. (If we don't get permission, we have a starter set of 500,000+ syllabi.)
*Figure out how to parse the multiple and free-form formats syllabi are found in.
*Design an appropriate and open data model for the information in syllabi.
*Build a Web site that provides useful end-user and API access to the syllabus data.


Mentor: [mailto:mphillips@law.harvard.edu mphillips@law.harvard.edu]


2. Scholarly semantic web builder. The aim is to crawl the Google Books corpus looking for useful relationships among scholarly works. Such relationships only begin with citations/footnotes. What other semantic cues can be unearthed to see how scholarly books relate? [Note that this project will be done in conjunction with the Harvard Library Innovation Lab.]
General Questions: [mailto:berkmancenterharvard@gmail.com berkmancenterharvard@gmail.com]
 
*Research the sorts of relations between books that would be of high value to scholars and researchers, in addition to footnotes.
*Crawl the Google Books corpus to discover these relations.
*Make these relations accessible in an open way, especially in conjunction with the ShelfLife app that provides community-based wayfaring through Harvard Library's holdings for scholars and researchers.
*Create interesting and understandable analytics based on the discovered relationships.

Latest revision as of 13:55, 20 March 2012

About

The Library Innovation Lab is a small group within the Harvard University Library system that implements, in software, ideas about how libraries can be ever more valuable. We hack libraries... in the good sense of discovering and delivering more capability and value. Find more information on the Library Innovation Lab here: librarylab.law.harvard.edu

Long Tail Browser (Library Interestingness)

The Harvard Library system has available a rich set of metadata about the 12 million books and other items in its collection. This includes "event" data such as circulation records broken down by school and borrower type, which works have been called back from loans early, items on reserve, items ordered by the Harvard Coop, and more. It isn't hard to come up with useful ranking algorithms that employ this data. It is more challenging to devise interestingness algorithms. And it is more challenging still to devise algorithms that will find interesting and relevant works in the long tail. We are therefore proposing a project that explores using every scrap of metadata to provide search results based on interestingness, and that discerns interestingness in items in the long tail.
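
To make the distinction between ranking and long-tail interestingness concrete, here is a minimal sketch of one possible scoring approach. It is not the Lab's algorithm. The field names, the weights, and the logarithmic popularity discount are all assumptions chosen for illustration; the only thing taken from the description above is the kind of event data involved (circulation, early recalls, reserves, borrower types).

```python
import math
from dataclasses import dataclass


@dataclass
class ItemEvents:
    """Hypothetical per-item event counts; field names are illustrative only."""
    title: str
    checkouts: int = 0       # total circulation count
    recalls: int = 0         # times the item was called back early from a loan
    on_reserve: int = 0      # number of courses holding the item on reserve
    borrower_types: int = 0  # number of distinct borrower types that used it


def interestingness(item, recall_weight=3.0, reserve_weight=2.0, breadth_weight=1.0):
    """Weighted sum of 'demand' signals, discounted by raw popularity.

    The weights are arbitrary assumptions. Dividing by log(2 + checkouts)
    keeps heavily circulated items from dominating, so rarely borrowed
    items with strong signals can still rank near the top.
    """
    signal = (
        recall_weight * item.recalls
        + reserve_weight * item.on_reserve
        + breadth_weight * item.borrower_types
    )
    return signal / math.log(2 + item.checkouts)


def rank(items):
    """Return items sorted from most to least interesting."""
    return sorted(items, key=interestingness, reverse=True)
```

Under these assumptions, two items with identical recall, reserve, and borrower-type signals are ordered so that the one with fewer checkouts ranks higher, which is one simple way to push attention toward the long tail. A real implementation would need to validate any such weighting against actual use.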


Mentor: mphillips@law.harvard.edu

General Questions: berkmancenterharvard@gmail.com