Paper Machines

From Berkman Klein Google Summer of Code Wiki

Latest revision as of 13:39, 19 March 2012

Paper Machines is the project of a metaLAB-affiliated scholar seeking to develop a scripting, analysis, and visualization toolkit for rapidly transforming the ephemeral, paper-based archives of development and advocacy organizations into digital textual archives durable and flexible enough to be used by scholars, journalists, and political actors.

Working with the scholar, the coder will specify and develop tools for batch-processing large numbers of scanned documents into corpora that can be parsed by regular expression, named entity, geospatial reference, and other data types, and will deliver those corpora to a web-based interface for displaying analyses and visualizations.
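
As a rough illustration of the batch-processing step described above, the following Python sketch assumes the scanned documents have already been converted to plain-text files (OCR is not covered here). The function names `build_corpus` and `search_corpus` and the one-directory layout are hypothetical and are not part of Paper Machines itself.

```python
import re
from pathlib import Path


def build_corpus(text_dir):
    """Read every .txt file in text_dir into a {filename: text} dict.

    Assumption: each scanned document has already been OCR'd to one
    UTF-8 plain-text file; undecodable bytes are replaced, not fatal.
    """
    corpus = {}
    for path in sorted(Path(text_dir).glob("*.txt")):
        corpus[path.name] = path.read_text(encoding="utf-8", errors="replace")
    return corpus


def search_corpus(corpus, pattern):
    """Return (filename, matched text) pairs for every regex match."""
    regex = re.compile(pattern)
    hits = []
    for name, text in corpus.items():
        for match in regex.finditer(text):
            hits.append((name, match.group(0)))
    return hits
```

A call such as `search_corpus(build_corpus("scans_txt"), r"\b(?:19|20)\d{2}\b")` would list every four-digit year beginning with 19 or 20, file by file. The directory name `scans_txt` is only an example. Named-entity and geospatial parsing would need additional libraries and are left out of this sketch.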

The incumbent will report to Jo Guldi, historian (Harvard Society of Fellows), and Matthew Battles (metaLAB).

Skills:

  • Python, Ruby, Processing
  • HTML & CSS
  • Understanding of data-driven digital humanities research and the processing of large textual corpora in PDF, plain-text, HTML, and XML formats


Mentor: matthew@metalab.harvard.edu

General Questions: berkmancenterharvard@gmail.com