From Berkman Klein Google Summer of Code Wiki
Paper Machines is the project of a metaLAB-affiliated scholar seeking to develop a scripting, analysis, and visualization toolkit for rapidly transforming the ephemeral, paper-based archives of development and advocacy organizations into digital textual archives durable and flexible enough to be used by scholars, journalists, and political actors.
Working with the scholar, the coder will specify and develop tools for batch-processing large numbers of scanned documents into corpora that can be parsed by regular expressions, named entities, geospatial references, and other data types, and will deliver those corpora to a web-based interface for displaying analyses and visualizations.
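As a rough illustration of the batch-processing step, the sketch below loads a directory of plain-text files into a corpus and searches it with a regular expression. It assumes the scanned documents have already been run through OCR and saved as `.txt` files; the function names `load_corpus` and `search_corpus` are hypothetical and are not part of any existing Paper Machines code.

```python
import re
from pathlib import Path


def load_corpus(directory):
    """Read every .txt file in `directory` into a {filename: text} dict.

    Assumption: OCR has already been run, one text file per document.
    """
    corpus = {}
    for path in sorted(Path(directory).glob("*.txt")):
        # Replace undecodable bytes rather than failing on noisy OCR output.
        corpus[path.name] = path.read_text(encoding="utf-8", errors="replace")
    return corpus


def search_corpus(corpus, pattern):
    """Return (filename, matched text) pairs for every regex match."""
    regex = re.compile(pattern, re.IGNORECASE)
    hits = []
    for name, text in corpus.items():
        for match in regex.finditer(text):
            hits.append((name, match.group(0)))
    return hits
```

A call such as `search_corpus(load_corpus("scans/"), r"\b(19|20)\d{2}\b")` would list every four-digit year from 1900 to 2099 found in the corpus, which could then be handed to a web front end for display. The directory name here is only a placeholder.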
The incumbent will report to Joann Guldi, historian (Harvard Society of Fellows), and Matthew Battles (metaLAB).
Skills desired include Python, Ruby, Processing, and development of web interfaces and applications, along with an understanding of the needs of data-driven digital humanities research using large textual corpora in PDF, plain-text, HTML, and XML formats.
Revision as of 21:57, 13 March 2012