Library Innovation Lab

From Berkman Klein Google Summer of Code Wiki
Revision as of 16:05, 8 March 2011 by WikiSysop

Two potential projects:


1. Syllabus parser. Design, structure and populate an open repository of the information in college syllabi. [Note that this project will be done in conjunction with the Harvard Library Innovation Lab.]

  • Assuming we get permission, figure out how to retrieve syllabi from Google. (If we don't get permission, we have a starter set of 500,000+ syllabi.)
  • Figure out how to parse syllabi, which come in many different, free-form formats.
  • Design an appropriate and open data model for the information in syllabi.
  • Build a Web site that provides useful end-user and API access to the syllabus data.
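A minimal sketch of how an open data model for syllabi and a first-pass parser might fit together, assuming the syllabus has already been converted to plain text. The `Syllabus` and `Reading` fields, the regular expressions, and the `parse_syllabus` and `to_json` functions are all hypothetical illustrations, not part of the project description. Real syllabi arrive as PDF, HTML, and word-processor files in many layouts, so a production parser would need text extraction and far more robust heuristics.

```python
import json
import re
from dataclasses import asdict, dataclass, field
from typing import List, Optional


@dataclass
class Reading:
    citation: str  # raw citation text as written in the syllabus


@dataclass
class Syllabus:
    # Hypothetical field set; a real open data model would need many more fields.
    course_code: Optional[str] = None
    title: Optional[str] = None
    instructor: Optional[str] = None
    readings: List[Reading] = field(default_factory=list)


# Illustrative heuristics for one plain-text layout; not a general parser.
COURSE_RE = re.compile(r"\b([A-Z]{2,5})[ -]?(\d{2,4}[A-Z]?)\b")
INSTRUCTOR_RE = re.compile(r"^(?:instructor|professor|lecturer)\s*:\s*(.+)$", re.IGNORECASE)
READINGS_RE = re.compile(r"^(?:required\s+)?(?:readings?|texts?|bibliography)\s*:?$", re.IGNORECASE)


def parse_syllabus(text: str) -> Syllabus:
    """Heuristically extract a few fields from a plain-text syllabus."""
    syl = Syllabus()
    in_readings = False
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            in_readings = False  # assume a blank line closes the readings list
            continue
        if syl.course_code is None:
            m = COURSE_RE.search(line)
            if m:
                # Treat the first course-code line as "CODE: Title".
                syl.course_code = f"{m.group(1)} {m.group(2)}"
                syl.title = line[m.end():].strip(" :-") or None
                continue
        m = INSTRUCTOR_RE.match(line)
        if m:
            syl.instructor = m.group(1).strip()
            continue
        if READINGS_RE.match(line):
            in_readings = True
            continue
        if in_readings:
            # Drop common bullet characters before storing the citation.
            syl.readings.append(Reading(line.lstrip("-*• ").strip()))
    return syl


def to_json(syl: Syllabus) -> str:
    """Serialize the model to JSON, e.g. for an API response."""
    return json.dumps(asdict(syl), indent=2, ensure_ascii=False)


if __name__ == "__main__":
    sample = """HIST 101: Introduction to World History
Instructor: Jane Doe

Readings:
- McNeill, The Rise of the West
- Braudel, The Mediterranean
"""
    print(to_json(parse_syllabus(sample)))
```

Keeping the data model as plain dataclasses makes the same structure usable both for storage and, through `to_json`, for the API access the last bullet calls for.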


2. Scholarly semantic web builder. The aim is to crawl the Google Books corpus looking for useful relationships among scholarly works. Citations and footnotes are only the beginning of such relationships. What other semantic cues can be unearthed to see how scholarly books relate? [Note that this project will be done in conjunction with the Harvard Library Innovation Lab.]

  • Beyond footnotes, research the sorts of relations between books that would be of high value to scholars and researchers.
  • Crawl the Google Books corpus to discover these relations.
  • Make these relations accessible in an open way, especially in conjunction with the ShelfLife app that provides community-based wayfaring through Harvard Library's holdings for scholars and researchers.
  • Create interesting and understandable analytics based on the discovered relationships.
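One way to represent the discovered relations is as a typed, directed graph between books. The sketch below assumes each book already has a stable identifier and that relations have been extracted from the corpus upstream; `Relation`, `BookGraph`, and labels such as `"cites"` and `"reviews"` are hypothetical. As one example of the analytics mentioned above, it computes co-citation counts (how many works cite both books of a pair), a standard bibliometric measure that the project description itself does not name.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass
from itertools import combinations
from typing import Dict, Optional, Set


@dataclass(frozen=True)
class Relation:
    source: str  # hypothetical identifier of the referring work
    target: str  # hypothetical identifier of the referred-to work
    kind: str    # illustrative labels such as "cites" or "reviews"


class BookGraph:
    """In-memory directed graph of typed relations between books."""

    def __init__(self) -> None:
        self.out_edges: Dict[str, Set[Relation]] = defaultdict(set)

    def add(self, rel: Relation) -> None:
        self.out_edges[rel.source].add(rel)

    def related(self, book: str, kind: Optional[str] = None) -> Set[str]:
        """Books that `book` points to, optionally filtered by relation kind."""
        return {r.target for r in self.out_edges.get(book, set())
                if kind is None or r.kind == kind}

    def most_cited(self) -> Counter:
        """In-degree over "cites" edges: how many works cite each book."""
        return Counter(r.target for rels in self.out_edges.values()
                       for r in rels if r.kind == "cites")

    def co_citation_counts(self) -> Counter:
        """Number of works citing both books of each (a, b) pair, with a < b."""
        counts: Counter = Counter()
        for rels in self.out_edges.values():
            cited = sorted({r.target for r in rels if r.kind == "cites"})
            for pair in combinations(cited, 2):
                counts[pair] += 1
        return counts
```

For instance, if works `"A"` and `"B"` both cite `"X"` and `"Y"`, then `co_citation_counts()[("X", "Y")]` is 2. Keeping the relation kind on every edge leaves room for the non-citation relations the project hopes to discover.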