https://cyber.harvard.edu/gsoc/api.php?action=feedcontributions&user=BerkmanSysop&feedformat=atomBerkman Klein Google Summer of Code Wiki - User contributions [en]2024-03-28T13:01:14ZUser contributionsMediaWiki 1.39.5https://cyber.harvard.edu/gsoc/?title=MediaCloud&diff=668MediaCloud2019-04-02T18:37:34Z<p>BerkmanSysop: /* Build a tool to do some cool visualizations */</p>
<hr />
<div>[https://mediacloud.org/ Media Cloud] is an open source platform for studying media ecosystems. Media Cloud is joint project between [https://cyber.harvard.edu The Berkman Klein Center for Internet and Society at Harvard University] and [https://civic.mit.edu The Center for Civic Media] at MIT's Media Lab. <br />
<br />
By tracking hundreds of millions of stories published online or broadcast via television, our suite of tools allows researchers to track how stories and ideas spread through media, and how different corners of the media ecosystem report on stories.<br />
<br />
Our platform is designed to aggregate, analyze, deliver and visualize information, answering complex quantitative and qualitative questions about the content of online media.<br />
<br />
*'''Aggregate''': We have aggregated billions of online stories from an ever-growing set of 1,200,000+ digital media sources. We ingest data via RSS feeds and a set of robots that spider the web to fetch information from a variety of sources in near real-time.<br />
<br />
*'''Analyze''': To query our extensive library of data, we have developed a suite of analytical tools that allow you to explore relationships between professional and citizen media, and between online and offline sources.<br />
<br />
*'''Deliver and Visualize''': Our suite of tools provides opportunities to present data in formats that you can visualize in your own interfaces. These include the use of graphs, geographic maps, word clouds, network visualizations.<br />
<br />
'''Project URL''': https://mediacloud.org/<br />
<br />
'''Project on GitHub''': https://github.com/berkmancenter/mediacloud<br />
<br />
'''Project Mentors''': [mailto:linas@media.mit.edu Linas Valiukas], [mailto:hroberts@cyber.law.harvard.edu Hal Roberts]<br />
<br />
=Project Ideas=<br />
<br />
==Create a self-contained, browser-based page HTML -> article HTML extractor==<br />
'''Problem Statement:'''<br />
For every fetched news article, we have to figure out which part of the page HTML contains the article body itself. We currently use readability-lxml (https://github.com/buriy/python-readability) for that task. However, readability-lxml is aging fast and is not necessarily still the best library around to extract body of the article from the HTML page. Also, more and more articles get loaded using JavaScript due to an ongoing "frontend everywhere!" frenzy, and our Python extractor doesn’t execute or support JavaScript. Lastly, various CDNs, e.g. Cloudflare, are blocking our crawler just because our user agent doesn't have JavaScript enabled.<br />
<br />
I think inevitably we'll have to switch to running a headless browser, loading each and every downloaded story in it, and then applying a well-supported third-party library, e.g. Mozilla's Readability (https://github.com/mozilla/readability), to extract article title, author and body.<br />
<br />
===Development Tasks===<br />
<br />
*Set up a chromeless browser<br />
*Set up Readability<br />
*Develop a HTTP service that accepts a parameter URL (and/or HTML body), loads it in the browser, runs Readability's magic, and returns the extracted HTML back to the requester.<br />
*Package everything in a nice Docker image<br />
<br />
There are similar projects like this, e.g. https://github.com/schollz/readable, so you’ll need to do some research into whether such a thing exists already before submitting a proposal. Maybe an existing tool could be improved instead of redoing everything from scratch?<br />
<br />
<br />
<br />
<br />
==Write a spec to a new generation of our API==<br />
<br />
'''Problem Statement:'''<br />
Create a specification for a new version of our API (https://github.com/berkmancenter/mediacloud/tree/release/doc/api_2_0_spec). Our existing API (implemented in Perl) is inconsistent among its different major parts, and is goofily un-REST-ish in several places. We would like to reimplement it to Python and use a modern framework for API specification (OpenAPI), implementation, and testing.<br />
<br />
===Development Tasks===<br />
*With the help of the team, identify which API calls can be renamed to more sensible names, extended or deprecated<br />
*With the help of the team, rewrite API call descriptions to make them more comprehensible<br />
*Rewrite a API spec using a chosen tool (e.g. OpenAPI) using best RESTful practices<br />
*Set up API demo (e.g. using Swagger UI)<br />
<br />
<br />
<br />
==Rewrite Ultimate Sitemap Parser to yield results instead of returning them==<br />
'''Problem Statement:'''<br />
Ultimate Sitemap Parser (https://github.com/berkmancenter/mediacloud-ultimate_sitemap_parser) is our Python module that we use to fetch trees of website sitemaps (https://www.sitemaps.org/). The current implementation of the sitemap parser fetches all of the sitemap links and returns it to a caller in a single easy-to-use object. However, it turns out that some websites have *massive* sitemap trees! In those cases, the sitemap parser uses up lots of RAM for its operation, and the client is forced to wait for a long time to get sitemap fetching + parsing results. For those reasons, we’d like the sitemap parser to “yield” links found in a sitemap instead of “returning” them while also maintaining a nice, comprehensible interface to the sitemap parser: https://github.com/berkmancenter/mediacloud-ultimate_sitemap_parser/issues/2<br />
<br />
===Development Tasks===<br />
* Rewrite sitemap parser to yield found sitemap links instead of returning them to conserve memory and make results usable faster<br />
<br />
<br />
<br />
==Make Ultimate Sitemap Parser use asyncio==<br />
'''Problem Statement:'''<br />
Ultimate Sitemap Parser (https://github.com/berkmancenter/mediacloud-ultimate_sitemap_parser) is our Python module that we use to fetch trees of website sitemaps (https://www.sitemaps.org/). In production, fetching and parsing XML sitemaps it’s mostly a CPU-intensive operation as the most time gets spent on gunzipping sitemaps, parsing XML and creating objects out of them, but my guess is that the sitemap parser could be made 10-20% faster by doing I/O (namely the fetching part) asynchronously.<br />
<br />
===Development Tasks===<br />
*Rewrite sitemap parser to fetch sitemaps asynchronously<br />
*Find other ways where I/O could be made asynchronous<br />
<br />
==Detect sitemap if it’s not linked from robots.txt in Ultimate Sitemap Parser==<br />
'''Problem Statement:'''<br />
Ultimate Sitemap Parser (https://github.com/berkmancenter/mediacloud-ultimate_sitemap_parser) is our Python module that we use to fetch trees of website sitemaps (https://www.sitemaps.org/). Most of sitemaps are being linked to in website’s robots.txt, but some are not. We would like to try common paths of sitemaps (e.g. /sitemap.xml[.gz]) on every site nonetheless: https://github.com/berkmancenter/mediacloud-ultimate_sitemap_parser/issues/8<br />
<br />
=== Development Tasks ===<br />
* Update the module in such a way that it tries common sitemap locations independently from robots.txt<br />
<br />
<br />
<br />
==Add RSS / Atom sitemap support to our Ultimate Sitemap Parser==<br />
'''Problem Statement:'''<br />
Ultimate Sitemap Parser (https://github.com/berkmancenter/mediacloud-ultimate_sitemap_parser) is our Python module that we use to fetch trees of website sitemaps (https://www.sitemaps.org/). Most of those sitemaps are implemented in Sitemap XML protocol, but a small number of sitemaps are published in RSS / Atom, and we’d like to have support for those too: https://github.com/berkmancenter/mediacloud-ultimate_sitemap_parser/issues/3<br />
<br />
=== Development Tasks ===<br />
*Add RSS / Atom support to Ultimate Sitemap Parser<br />
<br />
<br />
<br />
==Build a tool to do some cool visualizations==<br />
'''Problem Statement''' Since 2008, we have collected more than a half billion news articles that we have post-processed and indexed. We know quite a lot about them -- which news articles were the most linked to from other similar articles, the most and least popular / influential articles (based on shares on Facebook, tweet count, or clicks on an article's Bit.ly shortened link), specific language and terms used to describe the subject matter in each of the articles, etc., and there's a lot of potential to learn much more. Can you use your design and coding skills to help us out in visualising some of this data, e.g. create a cool network map visualization tool?<br />
<br />
===Development Tasks===<br />
*Build any visualization tool based on our extensive data and tool set:<br />
**Figure out what you'd like to visualise and how are you going to do it<br />
**Use Gephi, a tool of your choice, or create your very own tool to implement your visualisation<br />
<br />
==Create PostgreSQL-based job queue==<br />
''Problem Description''. In more than eight (or is it nine by now?) years since we've been running Media Cloud, we have tried multiple job queue tools (e.g. Gearman) that we could use for dividing and conquering our workload. Unfortunately, all the tools (including the current one -- go look into the codebase to figure out which one it is now) have left us deeply unhappy because of one reason or another. If there's one tool which hasn’t let us down, it’s PostgreSQL. So, we'd like to also try running our job queue on Postgres. Can you implement it for us?<br />
<br />
===Development Tasks===<br />
*Write a spec, complete with code samples, on how to implement the following job queue:<br />
**Preferably programming language-agnostic, i.e. should run as a bunch of PL/pgSQL functions.<br />
***Maybe that's a bad idea, I don't know, you tell us.<br />
*Features:<br />
**Add jobs with names and JSON arguments<br />
**Cancel jobs by their ID<br />
**Track job's progress (and log?) by their ID<br />
**Get job ID using its JSON parameters<br />
**Merge jobs with identical JSON arguments into a single job<br />
**See job stats per task, i.e. how many jobs are queued for every task<br />
**Retry failed jobs<br />
**Report job failure, complete with error messages<br />
**Proper locking (for inspiration, see https://github.com/chanks/que)<br />
**Doesn't catch fire with tens of millions of queued jobs<br />
*(Bonus points) Actually implement the queue! If you don't get to doing this over the summer, it's fine, we would be happy with a proven spec.<br />
<br />
<br />
<br />
==Implement a method to detect subtopics of a topic==<br />
*Problem Statement.* As described elsewhere, a "topic" is subject discussed by the media that we are researching. Almost every big topic contains subtopics, e.g. the matters of immigration, racism, email server security and a plethora of other subjects were discussed during the last US election. We would like to investigate ways of how we could automatically detect those subtopics, possibly using the [Louvain method](https://en.wikipedia.org/wiki/Louvain_Modularity).<br />
<br />
===Development Tasks===<br />
Develop a proof of concept (un)supervised ML tool for detecting subtopics of a chosen subject ("topic").<br />
<br />
<br />
<br />
==Do your own freehand project==<br />
Problem Statement. If you had more than half a billion (!) news articles from all around the world stored in a single place, extracted from HTML into text, split into sentences, words, and made searchable, what would you do? Propose us something we didn't think of, and we will surely consider it!<br />
<br />
===Development Tasks. ===<br />
Left as an exercise to the student.<br />
<br />
<br />
<br />
=Skill Requirements for Potential Candidates=<br />
*Working knowledge of Perl or Python<br />
*Familiarity with relational databases, preferably PostgreSQL<br />
*Some pedantism<br />
*Willingness to propose, debate and object to ideas<br />
*Keen to work with us on writing your GSoC project proposal, as opposed to just submitting a long shot without any feedback and hoping for the best<br />
*Shown effort into learning what Media Cloud is all about; some ideas:<br />
**Make a pull request to our main code repository (https://github.com/berkmancenter/mediacloud),<br />
**Craft us an email with a smart question or two,<br />
**Try our our tools (see https://mediacloud.org/),<br />
**Run Media Cloud yourself and collect some news articles (see https://github.com/berkmancenter/mediacloud/blob/master/doc/vagrant.markdown),<br />
**Sign up and check out our API (see https://github.com/berkmancenter/mediacloud/blob/master/doc/api_2_0_spec/api_2_0_spec.md, https://github.com/berkmancenter/mediacloud/blob/master/doc/api_2_0_spec/topics_api_2_0_spec.md, and the API client at https://pypi.python.org/pypi/mediacloud/).</div>BerkmanSysophttps://cyber.harvard.edu/gsoc/?title=MediaCloud&diff=667MediaCloud2019-04-02T18:36:29Z<p>BerkmanSysop: </p>
<hr />
<div>[https://mediacloud.org/ Media Cloud] is an open source platform for studying media ecosystems. Media Cloud is joint project between [https://cyber.harvard.edu The Berkman Klein Center for Internet and Society at Harvard University] and [https://civic.mit.edu The Center for Civic Media] at MIT's Media Lab. <br />
<br />
By tracking hundreds of millions of stories published online or broadcast via television, our suite of tools allows researchers to track how stories and ideas spread through media, and how different corners of the media ecosystem report on stories.<br />
<br />
Our platform is designed to aggregate, analyze, deliver and visualize information, answering complex quantitative and qualitative questions about the content of online media.<br />
<br />
*'''Aggregate''': We have aggregated billions of online stories from an ever-growing set of 1,200,000+ digital media sources. We ingest data via RSS feeds and a set of robots that spider the web to fetch information from a variety of sources in near real-time.<br />
<br />
*'''Analyze''': To query our extensive library of data, we have developed a suite of analytical tools that allow you to explore relationships between professional and citizen media, and between online and offline sources.<br />
<br />
*'''Deliver and Visualize''': Our suite of tools provides opportunities to present data in formats that you can visualize in your own interfaces. These include the use of graphs, geographic maps, word clouds, network visualizations.<br />
<br />
'''Project URL''': https://mediacloud.org/<br />
<br />
'''Project on GitHub''': https://github.com/berkmancenter/mediacloud<br />
<br />
'''Project Mentors''': [mailto:linas@media.mit.edu Linas Valiukas], [mailto:hroberts@cyber.law.harvard.edu Hal Roberts]<br />
<br />
=Project Ideas=<br />
<br />
==Create a self-contained, browser-based page HTML -> article HTML extractor==<br />
'''Problem Statement:'''<br />
For every fetched news article, we have to figure out which part of the page HTML contains the article body itself. We currently use readability-lxml (https://github.com/buriy/python-readability) for that task. However, readability-lxml is aging fast and is not necessarily still the best library around to extract body of the article from the HTML page. Also, more and more articles get loaded using JavaScript due to an ongoing "frontend everywhere!" frenzy, and our Python extractor doesn’t execute or support JavaScript. Lastly, various CDNs, e.g. Cloudflare, are blocking our crawler just because our user agent doesn't have JavaScript enabled.<br />
<br />
I think inevitably we'll have to switch to running a headless browser, loading each and every downloaded story in it, and then applying a well-supported third-party library, e.g. Mozilla's Readability (https://github.com/mozilla/readability), to extract article title, author and body.<br />
<br />
===Development Tasks===<br />
<br />
*Set up a chromeless browser<br />
*Set up Readability<br />
*Develop a HTTP service that accepts a parameter URL (and/or HTML body), loads it in the browser, runs Readability's magic, and returns the extracted HTML back to the requester.<br />
*Package everything in a nice Docker image<br />
<br />
There are similar projects like this, e.g. https://github.com/schollz/readable, so you’ll need to do some research into whether such a thing exists already before submitting a proposal. Maybe an existing tool could be improved instead of redoing everything from scratch?<br />
<br />
<br />
<br />
<br />
==Write a spec to a new generation of our API==<br />
<br />
'''Problem Statement:'''<br />
Create a specification for a new version of our API (https://github.com/berkmancenter/mediacloud/tree/release/doc/api_2_0_spec). Our existing API (implemented in Perl) is inconsistent among its different major parts, and is goofily un-REST-ish in several places. We would like to reimplement it to Python and use a modern framework for API specification (OpenAPI), implementation, and testing.<br />
<br />
===Development Tasks===<br />
*With the help of the team, identify which API calls can be renamed to more sensible names, extended or deprecated<br />
*With the help of the team, rewrite API call descriptions to make them more comprehensible<br />
*Rewrite a API spec using a chosen tool (e.g. OpenAPI) using best RESTful practices<br />
*Set up API demo (e.g. using Swagger UI)<br />
<br />
<br />
<br />
==Rewrite Ultimate Sitemap Parser to yield results instead of returning them==<br />
'''Problem Statement:'''<br />
Ultimate Sitemap Parser (https://github.com/berkmancenter/mediacloud-ultimate_sitemap_parser) is our Python module that we use to fetch trees of website sitemaps (https://www.sitemaps.org/). The current implementation of the sitemap parser fetches all of the sitemap links and returns it to a caller in a single easy-to-use object. However, it turns out that some websites have *massive* sitemap trees! In those cases, the sitemap parser uses up lots of RAM for its operation, and the client is forced to wait for a long time to get sitemap fetching + parsing results. For those reasons, we’d like the sitemap parser to “yield” links found in a sitemap instead of “returning” them while also maintaining a nice, comprehensible interface to the sitemap parser: https://github.com/berkmancenter/mediacloud-ultimate_sitemap_parser/issues/2<br />
<br />
===Development Tasks===<br />
* Rewrite sitemap parser to yield found sitemap links instead of returning them to conserve memory and make results usable faster<br />
<br />
<br />
<br />
==Make Ultimate Sitemap Parser use asyncio==<br />
'''Problem Statement:'''<br />
Ultimate Sitemap Parser (https://github.com/berkmancenter/mediacloud-ultimate_sitemap_parser) is our Python module that we use to fetch trees of website sitemaps (https://www.sitemaps.org/). In production, fetching and parsing XML sitemaps it’s mostly a CPU-intensive operation as the most time gets spent on gunzipping sitemaps, parsing XML and creating objects out of them, but my guess is that the sitemap parser could be made 10-20% faster by doing I/O (namely the fetching part) asynchronously.<br />
<br />
===Development Tasks===<br />
*Rewrite sitemap parser to fetch sitemaps asynchronously<br />
*Find other ways where I/O could be made asynchronous<br />
<br />
==Detect sitemap if it’s not linked from robots.txt in Ultimate Sitemap Parser==<br />
'''Problem Statement:'''<br />
Ultimate Sitemap Parser (https://github.com/berkmancenter/mediacloud-ultimate_sitemap_parser) is our Python module that we use to fetch trees of website sitemaps (https://www.sitemaps.org/). Most of sitemaps are being linked to in website’s robots.txt, but some are not. We would like to try common paths of sitemaps (e.g. /sitemap.xml[.gz]) on every site nonetheless: https://github.com/berkmancenter/mediacloud-ultimate_sitemap_parser/issues/8<br />
<br />
=== Development Tasks ===<br />
* Update the module in such a way that it tries common sitemap locations independently from robots.txt<br />
<br />
<br />
<br />
==Add RSS / Atom sitemap support to our Ultimate Sitemap Parser==<br />
'''Problem Statement:'''<br />
Ultimate Sitemap Parser (https://github.com/berkmancenter/mediacloud-ultimate_sitemap_parser) is our Python module that we use to fetch trees of website sitemaps (https://www.sitemaps.org/). Most of those sitemaps are implemented in Sitemap XML protocol, but a small number of sitemaps are published in RSS / Atom, and we’d like to have support for those too: https://github.com/berkmancenter/mediacloud-ultimate_sitemap_parser/issues/3<br />
<br />
=== Development Tasks ===<br />
*Add RSS / Atom support to Ultimate Sitemap Parser<br />
<br />
<br />
<br />
==Build a tool to do some cool visualizations==<br />
''Problem Statement'' Since 2008, we have collected more than a half billion news articles that we have post-processed and indexed. We know quite a lot about them -- which news articles were the most linked to from other similar articles, the most and least popular / influential articles (based on shares on Facebook, tweet count, or clicks on an article's Bit.ly shortened link), specific language and terms used to describe the subject matter in each of the articles, etc., and there's a lot of potential to learn much more. Can you use your design and coding skills to help us out in visualising some of this data, e.g. create a cool network map visualization tool?<br />
<br />
===Development Tasks===<br />
*Build any visualization tool based on our extensive data and tool set:<br />
**Figure out what you'd like to visualise and how are you going to do it<br />
**Use Gephi, a tool of your choice, or create your very own tool to implement your visualisation<br />
<br />
<br />
<br />
==Create PostgreSQL-based job queue==<br />
''Problem Description''. In more than eight (or is it nine by now?) years since we've been running Media Cloud, we have tried multiple job queue tools (e.g. Gearman) that we could use for dividing and conquering our workload. Unfortunately, all the tools (including the current one -- go look into the codebase to figure out which one it is now) have left us deeply unhappy because of one reason or another. If there's one tool which hasn’t let us down, it’s PostgreSQL. So, we'd like to also try running our job queue on Postgres. Can you implement it for us?<br />
<br />
===Development Tasks===<br />
*Write a spec, complete with code samples, on how to implement the following job queue:<br />
**Preferably programming language-agnostic, i.e. should run as a bunch of PL/pgSQL functions.<br />
***Maybe that's a bad idea, I don't know, you tell us.<br />
*Features:<br />
**Add jobs with names and JSON arguments<br />
**Cancel jobs by their ID<br />
**Track job's progress (and log?) by their ID<br />
**Get job ID using its JSON parameters<br />
**Merge jobs with identical JSON arguments into a single job<br />
**See job stats per task, i.e. how many jobs are queued for every task<br />
**Retry failed jobs<br />
**Report job failure, complete with error messages<br />
**Proper locking (for inspiration, see https://github.com/chanks/que)<br />
**Doesn't catch fire with tens of millions of queued jobs<br />
*(Bonus points) Actually implement the queue! If you don't get to doing this over the summer, it's fine, we would be happy with a proven spec.<br />
<br />
<br />
<br />
==Implement a method to detect subtopics of a topic==<br />
*Problem Statement.* As described elsewhere, a "topic" is subject discussed by the media that we are researching. Almost every big topic contains subtopics, e.g. the matters of immigration, racism, email server security and a plethora of other subjects were discussed during the last US election. We would like to investigate ways of how we could automatically detect those subtopics, possibly using the [Louvain method](https://en.wikipedia.org/wiki/Louvain_Modularity).<br />
<br />
===Development Tasks===<br />
Develop a proof of concept (un)supervised ML tool for detecting subtopics of a chosen subject ("topic").<br />
<br />
<br />
<br />
==Do your own freehand project==<br />
Problem Statement. If you had more than half a billion (!) news articles from all around the world stored in a single place, extracted from HTML into text, split into sentences, words, and made searchable, what would you do? Propose us something we didn't think of, and we will surely consider it!<br />
<br />
===Development Tasks. ===<br />
Left as an exercise to the student.<br />
<br />
<br />
<br />
=Skill Requirements for Potential Candidates=<br />
*Working knowledge of Perl or Python<br />
*Familiarity with relational databases, preferably PostgreSQL<br />
*Some pedantism<br />
*Willingness to propose, debate and object to ideas<br />
*Keen to work with us on writing your GSoC project proposal, as opposed to just submitting a long shot without any feedback and hoping for the best<br />
*Shown effort into learning what Media Cloud is all about; some ideas:<br />
**Make a pull request to our main code repository (https://github.com/berkmancenter/mediacloud),<br />
**Craft us an email with a smart question or two,<br />
**Try our our tools (see https://mediacloud.org/),<br />
**Run Media Cloud yourself and collect some news articles (see https://github.com/berkmancenter/mediacloud/blob/master/doc/vagrant.markdown),<br />
**Sign up and check out our API (see https://github.com/berkmancenter/mediacloud/blob/master/doc/api_2_0_spec/api_2_0_spec.md, https://github.com/berkmancenter/mediacloud/blob/master/doc/api_2_0_spec/topics_api_2_0_spec.md, and the API client at https://pypi.python.org/pypi/mediacloud/).</div>BerkmanSysophttps://cyber.harvard.edu/gsoc/?title=MediaCloud&diff=666MediaCloud2019-04-02T18:35:51Z<p>BerkmanSysop: </p>
<hr />
<div>[https://mediacloud.org/ Media Cloud] is an open source platform for studying media ecosystems. Media Cloud is joint project between [https://cyber.harvard.edu The Berkman Klein Center for Internet and Society at Harvard University] and [https://civic.mit.edu The Center for Civic Media] at MIT's Media Lab. <br />
<br />
By tracking hundreds of millions of stories published online or broadcast via television, our suite of tools allows researchers to track how stories and ideas spread through media, and how different corners of the media ecosystem report on stories.<br />
<br />
Our platform is designed to aggregate, analyze, deliver and visualize information, answering complex quantitative and qualitative questions about the content of online media.<br />
<br />
*Aggregate: We have aggregated billions of online stories from an ever-growing set of 1,200,000+ digital media sources. We ingest data via RSS feeds and a set of robots that spider the web to fetch information from a variety of sources in near real-time.<br />
<br />
*Analyze: To query our extensive library of data, we have developed a suite of analytical tools that allow you to explore relationships between professional and citizen media, and between online and offline sources.<br />
<br />
*Deliver and Visualize: Our suite of tools provides opportunities to present data in formats that you can visualize in your own interfaces. These include the use of graphs, geographic maps, word clouds, network visualizations.<br />
<br />
Project URL: https://mediacloud.org/<br />
<br />
Project on GitHub: https://github.com/berkmancenter/mediacloud<br />
<br />
Project Mentors: [mailto:linas@media.mit.edu Linas Valiukas], [mailto:hroberts@cyber.law.harvard.edu Hal Roberts]<br />
<br />
=Project Ideas=<br />
<br />
==Create a self-contained, browser-based page HTML -> article HTML extractor==<br />
'''Problem Statement:'''<br />
For every fetched news article, we have to figure out which part of the page HTML contains the article body itself. We currently use readability-lxml (https://github.com/buriy/python-readability) for that task. However, readability-lxml is aging fast and is not necessarily still the best library around to extract body of the article from the HTML page. Also, more and more articles get loaded using JavaScript due to an ongoing "frontend everywhere!" frenzy, and our Python extractor doesn’t execute or support JavaScript. Lastly, various CDNs, e.g. Cloudflare, are blocking our crawler just because our user agent doesn't have JavaScript enabled.<br />
<br />
I think inevitably we'll have to switch to running a headless browser, loading each and every downloaded story in it, and then applying a well-supported third-party library, e.g. Mozilla's Readability (https://github.com/mozilla/readability), to extract article title, author and body.<br />
<br />
===Development Tasks===<br />
<br />
*Set up a chromeless browser<br />
*Set up Readability<br />
*Develop a HTTP service that accepts a parameter URL (and/or HTML body), loads it in the browser, runs Readability's magic, and returns the extracted HTML back to the requester.<br />
*Package everything in a nice Docker image<br />
<br />
There are similar projects like this, e.g. https://github.com/schollz/readable, so you’ll need to do some research into whether such a thing exists already before submitting a proposal. Maybe an existing tool could be improved instead of redoing everything from scratch?<br />
<br />
<br />
<br />
<br />
==Write a spec to a new generation of our API==<br />
<br />
'''Problem Statement:'''<br />
Create a specification for a new version of our API (https://github.com/berkmancenter/mediacloud/tree/release/doc/api_2_0_spec). Our existing API (implemented in Perl) is inconsistent among its different major parts, and is goofily un-REST-ish in several places. We would like to reimplement it to Python and use a modern framework for API specification (OpenAPI), implementation, and testing.<br />
<br />
===Development Tasks===<br />
*With the help of the team, identify which API calls can be renamed to more sensible names, extended or deprecated<br />
*With the help of the team, rewrite API call descriptions to make them more comprehensible<br />
*Rewrite a API spec using a chosen tool (e.g. OpenAPI) using best RESTful practices<br />
*Set up API demo (e.g. using Swagger UI)<br />
<br />
<br />
<br />
==Rewrite Ultimate Sitemap Parser to yield results instead of returning them==<br />
'''Problem Statement:'''<br />
Ultimate Sitemap Parser (https://github.com/berkmancenter/mediacloud-ultimate_sitemap_parser) is our Python module that we use to fetch trees of website sitemaps (https://www.sitemaps.org/). The current implementation of the sitemap parser fetches all of the sitemap links and returns it to a caller in a single easy-to-use object. However, it turns out that some websites have *massive* sitemap trees! In those cases, the sitemap parser uses up lots of RAM for its operation, and the client is forced to wait for a long time to get sitemap fetching + parsing results. For those reasons, we’d like the sitemap parser to “yield” links found in a sitemap instead of “returning” them while also maintaining a nice, comprehensible interface to the sitemap parser: https://github.com/berkmancenter/mediacloud-ultimate_sitemap_parser/issues/2<br />
<br />
===Development Tasks===<br />
* Rewrite sitemap parser to yield found sitemap links instead of returning them to conserve memory and make results usable faster<br />
<br />
<br />
<br />
==Make Ultimate Sitemap Parser use asyncio==<br />
'''Problem Statement:'''<br />
Ultimate Sitemap Parser (https://github.com/berkmancenter/mediacloud-ultimate_sitemap_parser) is our Python module that we use to fetch trees of website sitemaps (https://www.sitemaps.org/). In production, fetching and parsing XML sitemaps it’s mostly a CPU-intensive operation as the most time gets spent on gunzipping sitemaps, parsing XML and creating objects out of them, but my guess is that the sitemap parser could be made 10-20% faster by doing I/O (namely the fetching part) asynchronously.<br />
<br />
===Development Tasks===<br />
*Rewrite sitemap parser to fetch sitemaps asynchronously<br />
*Find other ways where I/O could be made asynchronous<br />
<br />
==Detect sitemap if it’s not linked from robots.txt in Ultimate Sitemap Parser==<br />
'''Problem Statement:'''<br />
Ultimate Sitemap Parser (https://github.com/berkmancenter/mediacloud-ultimate_sitemap_parser) is our Python module that we use to fetch trees of website sitemaps (https://www.sitemaps.org/). Most of sitemaps are being linked to in website’s robots.txt, but some are not. We would like to try common paths of sitemaps (e.g. /sitemap.xml[.gz]) on every site nonetheless: https://github.com/berkmancenter/mediacloud-ultimate_sitemap_parser/issues/8<br />
<br />
=== Development Tasks ===<br />
* Update the module in such a way that it tries common sitemap locations independently from robots.txt<br />
<br />
<br />
<br />
==Add RSS / Atom sitemap support to our Ultimate Sitemap Parser==<br />
'''Problem Statement:'''<br />
Ultimate Sitemap Parser (https://github.com/berkmancenter/mediacloud-ultimate_sitemap_parser) is our Python module that we use to fetch trees of website sitemaps (https://www.sitemaps.org/). Most of those sitemaps are implemented in Sitemap XML protocol, but a small number of sitemaps are published in RSS / Atom, and we’d like to have support for those too: https://github.com/berkmancenter/mediacloud-ultimate_sitemap_parser/issues/3<br />
<br />
=== Development Tasks ===<br />
*Add RSS / Atom support to Ultimate Sitemap Parser<br />
<br />
<br />
<br />
==Build a tool to do some cool visualizations==<br />
''Problem Statement'' Since 2008, we have collected more than a half billion news articles that we have post-processed and indexed. We know quite a lot about them -- which news articles were the most linked to from other similar articles, the most and least popular / influential articles (based on shares on Facebook, tweet count, or clicks on an article's Bit.ly shortened link), specific language and terms used to describe the subject matter in each of the articles, etc., and there's a lot of potential to learn much more. Can you use your design and coding skills to help us out in visualising some of this data, e.g. create a cool network map visualization tool?<br />
<br />
===Development Tasks===<br />
*Build any visualization tool based on our extensive data and tool set:<br />
**Figure out what you'd like to visualise and how are you going to do it<br />
**Use Gephi, a tool of your choice, or create your very own tool to implement your visualisation<br />
<br />
<br />
<br />
==Create PostgreSQL-based job queue==<br />
''Problem Description''. In more than eight (or is it nine by now?) years since we've been running Media Cloud, we have tried multiple job queue tools (e.g. Gearman) that we could use for dividing and conquering our workload. Unfortunately, all the tools (including the current one -- go look into the codebase to figure out which one it is now) have left us deeply unhappy because of one reason or another. If there's one tool which hasn’t let us down, it’s PostgreSQL. So, we'd like to also try running our job queue on Postgres. Can you implement it for us?<br />
<br />
===Development Tasks===<br />
*Write a spec, complete with code samples, on how to implement the following job queue:<br />
**Preferably programming language-agnostic, i.e. should run as a bunch of PL/pgSQL functions.<br />
***Maybe that's a bad idea, I don't know, you tell us.<br />
*Features:<br />
**Add jobs with names and JSON arguments<br />
**Cancel jobs by their ID<br />
**Track job's progress (and log?) by their ID<br />
**Get job ID using its JSON parameters<br />
**Merge jobs with identical JSON arguments into a single job<br />
**See job stats per task, i.e. how many jobs are queued for every task<br />
**Retry failed jobs<br />
**Report job failure, complete with error messages<br />
**Proper locking (for inspiration, see https://github.com/chanks/que)<br />
**Doesn't catch fire with tens of millions of queued jobs<br />
*(Bonus points) Actually implement the queue! If you don't get to doing this over the summer, it's fine, we would be happy with a proven spec.<br />
<br />
<br />
<br />
==Implement a method to detect subtopics of a topic==<br />
*Problem Statement.* As described elsewhere, a "topic" is subject discussed by the media that we are researching. Almost every big topic contains subtopics, e.g. the matters of immigration, racism, email server security and a plethora of other subjects were discussed during the last US election. We would like to investigate ways of how we could automatically detect those subtopics, possibly using the [Louvain method](https://en.wikipedia.org/wiki/Louvain_Modularity).<br />
<br />
===Development Tasks===<br />
Develop a proof of concept (un)supervised ML tool for detecting subtopics of a chosen subject ("topic").<br />
<br />
<br />
<br />
==Do your own freehand project==<br />
Problem Statement. If you had more than half a billion (!) news articles from all around the world stored in a single place, extracted from HTML into text, split into sentences, words, and made searchable, what would you do? Propose us something we didn't think of, and we will surely consider it!<br />
<br />
===Development Tasks. ===<br />
Left as an exercise to the student.<br />
<br />
<br />
<br />
=Skill Requirements for Potential Candidates=<br />
*Working knowledge of Perl or Python<br />
*Familiarity with relational databases, preferably PostgreSQL<br />
*Some pedantism<br />
*Willingness to propose, debate and object to ideas<br />
*Keen to work with us on writing your GSoC project proposal, as opposed to just submitting a long shot without any feedback and hoping for the best<br />
*Shown effort into learning what Media Cloud is all about; some ideas:<br />
**Make a pull request to our main code repository (https://github.com/berkmancenter/mediacloud),<br />
**Craft us an email with a smart question or two,<br />
**Try our our tools (see https://mediacloud.org/),<br />
**Run Media Cloud yourself and collect some news articles (see https://github.com/berkmancenter/mediacloud/blob/master/doc/vagrant.markdown),<br />
**Sign up and check out our API (see https://github.com/berkmancenter/mediacloud/blob/master/doc/api_2_0_spec/api_2_0_spec.md, https://github.com/berkmancenter/mediacloud/blob/master/doc/api_2_0_spec/topics_api_2_0_spec.md, and the API client at https://pypi.python.org/pypi/mediacloud/).</div>BerkmanSysophttps://cyber.harvard.edu/gsoc/?title=MediaCloud&diff=665MediaCloud2019-04-02T18:35:01Z<p>BerkmanSysop: </p>
<hr />
<div>[https://mediacloud.org/ Media Cloud] is an open source platform for studying media ecosystems. Media Cloud is joint project between [https://cyber.harvard.edu The Berkman Klein Center for Internet and Society at Harvard University] and [https://civic.mit.edu The Center for Civic Media] at MIT's Media Lab. <br />
<br />
By tracking hundreds of millions of stories published online or broadcast via television, our suite of tools allows researchers to track how stories and ideas spread through media, and how different corners of the media ecosystem report on stories.<br />
<br />
Our platform is designed to aggregate, analyze, deliver and visualize information, answering complex quantitative and qualitative questions about the content of online media.<br />
<br />
Aggregate. We have aggregated billions of online stories from an ever-growing set of 1,200,000+ digital media sources. We ingest data via RSS feeds and a set of robots that spider the web to fetch information from a variety of sources in near real-time.<br />
<br />
Analyze. To query our extensive library of data, we have developed a suite of analytical tools that allow you to explore relationships between professional and citizen media, and between online and offline sources.<br />
<br />
Deliver and Visualize. Our suite of tools provides opportunities to present data in formats that you can visualize in your own interfaces. These include the use of graphs, geographic maps, word clouds, network visualizations.<br />
<br />
Project URL: https://mediacloud.org/<br />
<br />
Project on GitHub: https://github.com/berkmancenter/mediacloud<br />
<br />
Project Mentors: [mailto:linas@media.mit.edu Linas Valiukas], [mailto:hroberts@cyber.law.harvard.edu Hal Roberts]<br />
<br />
=Project Ideas=<br />
<br />
==Create a self-contained, browser-based page HTML -> article HTML extractor==<br />
'''Problem Statement:'''<br />
For every fetched news article, we have to figure out which part of the page HTML contains the article body itself. We currently use readability-lxml (https://github.com/buriy/python-readability) for that task. However, readability-lxml is aging fast and is not necessarily still the best library around to extract body of the article from the HTML page. Also, more and more articles get loaded using JavaScript due to an ongoing "frontend everywhere!" frenzy, and our Python extractor doesn’t execute or support JavaScript. Lastly, various CDNs, e.g. Cloudflare, are blocking our crawler just because our user agent doesn't have JavaScript enabled.<br />
<br />
I think inevitably we'll have to switch to running a headless browser, loading each and every downloaded story in it, and then applying a well-supported third-party library, e.g. Mozilla's Readability (https://github.com/mozilla/readability), to extract article title, author and body.<br />
<br />
===Development Tasks===<br />
<br />
*Set up a chromeless browser<br />
*Set up Readability<br />
*Develop a HTTP service that accepts a parameter URL (and/or HTML body), loads it in the browser, runs Readability's magic, and returns the extracted HTML back to the requester.<br />
*Package everything in a nice Docker image<br />
<br />
There are similar projects like this, e.g. https://github.com/schollz/readable, so you’ll need to do some research into whether such a thing exists already before submitting a proposal. Maybe an existing tool could be improved instead of redoing everything from scratch?<br />
<br />
<br />
<br />
<br />
==Write a spec to a new generation of our API==<br />
<br />
'''Problem Statement:'''<br />
Create a specification for a new version of our API (https://github.com/berkmancenter/mediacloud/tree/release/doc/api_2_0_spec). Our existing API (implemented in Perl) is inconsistent among its different major parts, and is goofily un-REST-ish in several places. We would like to reimplement it to Python and use a modern framework for API specification (OpenAPI), implementation, and testing.<br />
<br />
===Development Tasks===<br />
*With the help of the team, identify which API calls can be renamed to more sensible names, extended or deprecated<br />
*With the help of the team, rewrite API call descriptions to make them more comprehensible<br />
*Rewrite a API spec using a chosen tool (e.g. OpenAPI) using best RESTful practices<br />
*Set up API demo (e.g. using Swagger UI)<br />
<br />
<br />
<br />
==Rewrite Ultimate Sitemap Parser to yield results instead of returning them==<br />
'''Problem Statement:'''<br />
Ultimate Sitemap Parser (https://github.com/berkmancenter/mediacloud-ultimate_sitemap_parser) is our Python module that we use to fetch trees of website sitemaps (https://www.sitemaps.org/). The current implementation of the sitemap parser fetches all of the sitemap links and returns it to a caller in a single easy-to-use object. However, it turns out that some websites have *massive* sitemap trees! In those cases, the sitemap parser uses up lots of RAM for its operation, and the client is forced to wait for a long time to get sitemap fetching + parsing results. For those reasons, we’d like the sitemap parser to “yield” links found in a sitemap instead of “returning” them while also maintaining a nice, comprehensible interface to the sitemap parser: https://github.com/berkmancenter/mediacloud-ultimate_sitemap_parser/issues/2<br />
<br />
===Development Tasks===<br />
* Rewrite sitemap parser to yield found sitemap links instead of returning them to conserve memory and make results usable faster<br />
<br />
<br />
<br />
==Make Ultimate Sitemap Parser use asyncio==<br />
'''Problem Statement:'''<br />
Ultimate Sitemap Parser (https://github.com/berkmancenter/mediacloud-ultimate_sitemap_parser) is our Python module that we use to fetch trees of website sitemaps (https://www.sitemaps.org/). In production, fetching and parsing XML sitemaps it’s mostly a CPU-intensive operation as the most time gets spent on gunzipping sitemaps, parsing XML and creating objects out of them, but my guess is that the sitemap parser could be made 10-20% faster by doing I/O (namely the fetching part) asynchronously.<br />
<br />
===Development Tasks===<br />
*Rewrite sitemap parser to fetch sitemaps asynchronously<br />
*Find other ways where I/O could be made asynchronous<br />
<br />
==Detect sitemap if it’s not linked from robots.txt in Ultimate Sitemap Parser==<br />
'''Problem Statement:'''<br />
Ultimate Sitemap Parser (https://github.com/berkmancenter/mediacloud-ultimate_sitemap_parser) is our Python module that we use to fetch trees of website sitemaps (https://www.sitemaps.org/). Most of sitemaps are being linked to in website’s robots.txt, but some are not. We would like to try common paths of sitemaps (e.g. /sitemap.xml[.gz]) on every site nonetheless: https://github.com/berkmancenter/mediacloud-ultimate_sitemap_parser/issues/8<br />
<br />
=== Development Tasks ===<br />
* Update the module in such a way that it tries common sitemap locations independently from robots.txt<br />
<br />
<br />
<br />
==Add RSS / Atom sitemap support to our Ultimate Sitemap Parser==<br />
'''Problem Statement:'''<br />
Ultimate Sitemap Parser (https://github.com/berkmancenter/mediacloud-ultimate_sitemap_parser) is our Python module that we use to fetch trees of website sitemaps (https://www.sitemaps.org/). Most of those sitemaps are implemented in Sitemap XML protocol, but a small number of sitemaps are published in RSS / Atom, and we’d like to have support for those too: https://github.com/berkmancenter/mediacloud-ultimate_sitemap_parser/issues/3<br />
<br />
=== Development Tasks ===<br />
*Add RSS / Atom support to Ultimate Sitemap Parser<br />
<br />
<br />
<br />
==Build a tool to do some cool visualizations==<br />
''Problem Statement'' Since 2008, we have collected more than a half billion news articles that we have post-processed and indexed. We know quite a lot about them -- which news articles were the most linked to from other similar articles, the most and least popular / influential articles (based on shares on Facebook, tweet count, or clicks on an article's Bit.ly shortened link), specific language and terms used to describe the subject matter in each of the articles, etc., and there's a lot of potential to learn much more. Can you use your design and coding skills to help us out in visualising some of this data, e.g. create a cool network map visualization tool?<br />
<br />
===Development Tasks===<br />
*Build any visualization tool based on our extensive data and tool set:<br />
**Figure out what you'd like to visualise and how are you going to do it<br />
**Use Gephi, a tool of your choice, or create your very own tool to implement your visualisation<br />
<br />
<br />
<br />
==Create PostgreSQL-based job queue==<br />
''Problem Description''. In more than eight (or is it nine by now?) years since we've been running Media Cloud, we have tried multiple job queue tools (e.g. Gearman) that we could use for dividing and conquering our workload. Unfortunately, all the tools (including the current one -- go look into the codebase to figure out which one it is now) have left us deeply unhappy because of one reason or another. If there's one tool which hasn’t let us down, it’s PostgreSQL. So, we'd like to also try running our job queue on Postgres. Can you implement it for us?<br />
<br />
===Development Tasks===<br />
*Write a spec, complete with code samples, on how to implement the following job queue:<br />
**Preferably programming language-agnostic, i.e. should run as a bunch of PL/pgSQL functions.<br />
***Maybe that's a bad idea, I don't know, you tell us.<br />
*Features:<br />
**Add jobs with names and JSON arguments<br />
**Cancel jobs by their ID<br />
**Track job's progress (and log?) by their ID<br />
**Get job ID using its JSON parameters<br />
**Merge jobs with identical JSON arguments into a single job<br />
**See job stats per task, i.e. how many jobs are queued for every task<br />
**Retry failed jobs<br />
**Report job failure, complete with error messages<br />
**Proper locking (for inspiration, see https://github.com/chanks/que)<br />
**Doesn't catch fire with tens of millions of queued jobs<br />
*(Bonus points) Actually implement the queue! If you don't get to doing this over the summer, it's fine, we would be happy with a proven spec.<br />
<br />
<br />
<br />
==Implement a method to detect subtopics of a topic==<br />
*Problem Statement.* As described elsewhere, a "topic" is subject discussed by the media that we are researching. Almost every big topic contains subtopics, e.g. the matters of immigration, racism, email server security and a plethora of other subjects were discussed during the last US election. We would like to investigate ways of how we could automatically detect those subtopics, possibly using the [Louvain method](https://en.wikipedia.org/wiki/Louvain_Modularity).<br />
<br />
===Development Tasks===<br />
Develop a proof of concept (un)supervised ML tool for detecting subtopics of a chosen subject ("topic").<br />
<br />
<br />
<br />
==Do your own freehand project==<br />
Problem Statement. If you had more than half a billion (!) news articles from all around the world stored in a single place, extracted from HTML into text, split into sentences, words, and made searchable, what would you do? Propose us something we didn't think of, and we will surely consider it!<br />
<br />
===Development Tasks. ===<br />
Left as an exercise to the student.<br />
<br />
<br />
<br />
=Skill Requirements for Potential Candidates=<br />
*Working knowledge of Perl or Python<br />
*Familiarity with relational databases, preferably PostgreSQL<br />
*Some pedantism<br />
*Willingness to propose, debate and object to ideas<br />
*Keen to work with us on writing your GSoC project proposal, as opposed to just submitting a long shot without any feedback and hoping for the best<br />
*Shown effort into learning what Media Cloud is all about; some ideas:<br />
**Make a pull request to our main code repository (https://github.com/berkmancenter/mediacloud),<br />
**Craft us an email with a smart question or two,<br />
**Try our our tools (see https://mediacloud.org/),<br />
**Run Media Cloud yourself and collect some news articles (see https://github.com/berkmancenter/mediacloud/blob/master/doc/vagrant.markdown),<br />
**Sign up and check out our API (see https://github.com/berkmancenter/mediacloud/blob/master/doc/api_2_0_spec/api_2_0_spec.md, https://github.com/berkmancenter/mediacloud/blob/master/doc/api_2_0_spec/topics_api_2_0_spec.md, and the API client at https://pypi.python.org/pypi/mediacloud/).</div>BerkmanSysophttps://cyber.harvard.edu/gsoc/?title=MediaCloud&diff=664MediaCloud2019-04-02T18:32:20Z<p>BerkmanSysop: /* Projects */</p>
<hr />
<div>[https://mediacloud.org/ Media Cloud] is an open source platform for studying media ecosystems. Media Cloud is joint project between [https://cyber.harvard.edu The Berkman Klein Center for Internet and Society at Harvard University] and [https://civic.mit.edu The Center for Civic Media] at MIT's Media Lab. <br />
<br />
By tracking hundreds of millions of stories published online or broadcast via television, our suite of tools allows researchers to track how stories and ideas spread through media, and how different corners of the media ecosystem report on stories.<br />
<br />
Our platform is designed to aggregate, analyze, deliver and visualize information, answering complex quantitative and qualitative questions about the content of online media.<br />
<br />
Aggregate. We have aggregated billions of online stories from an ever-growing set of 1,200,000+ digital media sources. We ingest data via RSS feeds and a set of robots that spider the web to fetch information from a variety of sources in near real-time.<br />
<br />
Analyze. To query our extensive library of data, we have developed a suite of analytical tools that allow you to explore relationships between professional and citizen media, and between online and offline sources.<br />
<br />
Deliver and Visualize. Our suite of tools provides opportunities to present data in formats that you can visualize in your own interfaces. These include the use of graphs, geographic maps, word clouds, network visualizations.<br />
<br />
Project URL: https://mediacloud.org/<br />
<br />
Project on GitHub: https://github.com/berkmancenter/mediacloud<br />
<br />
Project Mentors: [mailto:linas@media.mit.edu Linas Valiukas], [mailto:hroberts@cyber.law.harvard.edu Hal Roberts]<br />
<br />
=Project Ideas=<br />
<br />
==Create a self-contained, browser-based page HTML -> article HTML extractor==<br />
'''Problem Statement:'''<br />
For every fetched news article, we have to figure out which part of the page HTML contains the article body itself. We currently use readability-lxml (https://github.com/buriy/python-readability) for that task. However, readability-lxml is aging fast and is not necessarily still the best library around to extract body of the article from the HTML page. Also, more and more articles get loaded using JavaScript due to an ongoing "frontend everywhere!" frenzy, and our Python extractor doesn’t execute or support JavaScript. Lastly, various CDNs, e.g. Cloudflare, are blocking our crawler just because our user agent doesn't have JavaScript enabled.<br />
<br />
I think inevitably we'll have to switch to running a headless browser, loading each and every downloaded story in it, and then applying a well-supported third-party library, e.g. Mozilla's Readability (https://github.com/mozilla/readability), to extract article title, author and body.<br />
<br />
===Development Tasks===<br />
<br />
*Set up a chromeless browser<br />
*Set up Readability<br />
*Develop a HTTP service that accepts a parameter URL (and/or HTML body), loads it in the browser, runs Readability's magic, and returns the extracted HTML back to the requester.<br />
*Package everything in a nice Docker image<br />
<br />
There are similar projects like this, e.g. https://github.com/schollz/readable, so you’ll need to do some research into whether such a thing exists already before submitting a proposal. Maybe an existing tool could be improved instead of redoing everything from scratch?<br />
<br />
==Write a spec to a new generation of our API==<br />
<br />
'''Problem Statement:'''<br />
Create a specification for a new version of our API (https://github.com/berkmancenter/mediacloud/tree/release/doc/api_2_0_spec). Our existing API (implemented in Perl) is inconsistent among its different major parts, and is goofily un-REST-ish in several places. We would like to reimplement it to Python and use a modern framework for API specification (OpenAPI), implementation, and testing.<br />
<br />
===Development Tasks===<br />
*With the help of the team, identify which API calls can be renamed to more sensible names, extended or deprecated<br />
*With the help of the team, rewrite API call descriptions to make them more comprehensible<br />
*Rewrite a API spec using a chosen tool (e.g. OpenAPI) using best RESTful practices<br />
*Set up API demo (e.g. using Swagger UI)<br />
<br />
<br />
==Rewrite Ultimate Sitemap Parser to yield results instead of returning them==<br />
'''Problem Statement:'''<br />
Ultimate Sitemap Parser (https://github.com/berkmancenter/mediacloud-ultimate_sitemap_parser) is our Python module that we use to fetch trees of website sitemaps (https://www.sitemaps.org/). The current implementation of the sitemap parser fetches all of the sitemap links and returns it to a caller in a single easy-to-use object. However, it turns out that some websites have *massive* sitemap trees! In those cases, the sitemap parser uses up lots of RAM for its operation, and the client is forced to wait for a long time to get sitemap fetching + parsing results. For those reasons, we’d like the sitemap parser to “yield” links found in a sitemap instead of “returning” them while also maintaining a nice, comprehensible interface to the sitemap parser: https://github.com/berkmancenter/mediacloud-ultimate_sitemap_parser/issues/2<br />
<br />
===Development Tasks===<br />
* Rewrite sitemap parser to yield found sitemap links instead of returning them to conserve memory and make results usable faster<br />
<br />
<br />
==Make Ultimate Sitemap Parser use asyncio==<br />
'''Problem Statement:'''<br />
Ultimate Sitemap Parser (https://github.com/berkmancenter/mediacloud-ultimate_sitemap_parser) is our Python module that we use to fetch trees of website sitemaps (https://www.sitemaps.org/). In production, fetching and parsing XML sitemaps it’s mostly a CPU-intensive operation as the most time gets spent on gunzipping sitemaps, parsing XML and creating objects out of them, but my guess is that the sitemap parser could be made 10-20% faster by doing I/O (namely the fetching part) asynchronously.<br />
<br />
===Development Tasks===<br />
*Rewrite sitemap parser to fetch sitemaps asynchronously<br />
*Find other ways where I/O could be made asynchronous<br />
<br />
==Detect sitemap if it’s not linked from robots.txt in Ultimate Sitemap Parser==<br />
'''Problem Statement:'''<br />
Ultimate Sitemap Parser (https://github.com/berkmancenter/mediacloud-ultimate_sitemap_parser) is our Python module that we use to fetch trees of website sitemaps (https://www.sitemaps.org/). Most of sitemaps are being linked to in website’s robots.txt, but some are not. We would like to try common paths of sitemaps (e.g. /sitemap.xml[.gz]) on every site nonetheless: https://github.com/berkmancenter/mediacloud-ultimate_sitemap_parser/issues/8<br />
<br />
=== Development Tasks ===<br />
* Update the module in such a way that it tries common sitemap locations independently from robots.txt<br />
<br />
<br />
==Add RSS / Atom sitemap support to our Ultimate Sitemap Parser==<br />
'''Problem Statement:'''<br />
Ultimate Sitemap Parser (https://github.com/berkmancenter/mediacloud-ultimate_sitemap_parser) is our Python module that we use to fetch trees of website sitemaps (https://www.sitemaps.org/). Most of those sitemaps are implemented in Sitemap XML protocol, but a small number of sitemaps are published in RSS / Atom, and we’d like to have support for those too: https://github.com/berkmancenter/mediacloud-ultimate_sitemap_parser/issues/3<br />
<br />
=== Development Tasks ===<br />
*Add RSS / Atom support to Ultimate Sitemap Parser<br />
<br />
==Build a tool to do some cool visualizations==<br />
''Problem Statement'' Since 2008, we have collected more than a half billion news articles that we have post-processed and indexed. We know quite a lot about them -- which news articles were the most linked to from other similar articles, the most and least popular / influential articles (based on shares on Facebook, tweet count, or clicks on an article's Bit.ly shortened link), specific language and terms used to describe the subject matter in each of the articles, etc., and there's a lot of potential to learn much more. Can you use your design and coding skills to help us out in visualising some of this data, e.g. create a cool network map visualization tool?<br />
<br />
===Development Tasks===<br />
*Build any visualization tool based on our extensive data and tool set:<br />
**Figure out what you'd like to visualise and how are you going to do it<br />
**Use Gephi, a tool of your choice, or create your very own tool to implement your visualisation<br />
<br />
==Create PostgreSQL-based job queue==<br />
''Problem Description''. In more than eight (or is it nine by now?) years since we've been running Media Cloud, we have tried multiple job queue tools (e.g. Gearman) that we could use for dividing and conquering our workload. Unfortunately, all the tools (including the current one -- go look into the codebase to figure out which one it is now) have left us deeply unhappy because of one reason or another. If there's one tool which hasn’t let us down, it’s PostgreSQL. So, we'd like to also try running our job queue on Postgres. Can you implement it for us?<br />
<br />
===Development Tasks===<br />
*Write a spec, complete with code samples, on how to implement the following job queue:<br />
**Preferably programming language-agnostic, i.e. should run as a bunch of PL/pgSQL functions.<br />
***Maybe that's a bad idea, I don't know, you tell us.<br />
*Features:<br />
**Add jobs with names and JSON arguments<br />
**Cancel jobs by their ID<br />
**Track job's progress (and log?) by their ID<br />
**Get job ID using its JSON parameters<br />
**Merge jobs with identical JSON arguments into a single job<br />
**See job stats per task, i.e. how many jobs are queued for every task<br />
**Retry failed jobs<br />
**Report job failure, complete with error messages<br />
**Proper locking (for inspiration, see https://github.com/chanks/que)<br />
**Doesn't catch fire with tens of millions of queued jobs<br />
*(Bonus points) Actually implement the queue! If you don't get to doing this over the summer, it's fine, we would be happy with a proven spec.<br />
<br />
<br />
==Implement a method to detect subtopics of a topic==<br />
*Problem Statement.* As described elsewhere, a "topic" is subject discussed by the media that we are researching. Almost every big topic contains subtopics, e.g. the matters of immigration, racism, email server security and a plethora of other subjects were discussed during the last US election. We would like to investigate ways of how we could automatically detect those subtopics, possibly using the [Louvain method](https://en.wikipedia.org/wiki/Louvain_Modularity).<br />
<br />
===Development Tasks===<br />
Develop a proof of concept (un)supervised ML tool for detecting subtopics of a chosen subject ("topic").<br />
<br />
==Do your own freehand project==<br />
Problem Statement. If you had more than half a billion (!) news articles from all around the world stored in a single place, extracted from HTML into text, split into sentences, words, and made searchable, what would you do? Propose us something we didn't think of, and we will surely consider it!<br />
<br />
===Development Tasks. ===<br />
Left as an exercise to the student.<br />
<br />
===Skill Requirements for Potential Candidates===<br />
*Working knowledge of Perl or Python<br />
*Familiarity with relational databases, preferably PostgreSQL<br />
*Some pedantism<br />
*Willingness to propose, debate and object to ideas<br />
*Keen to work with us on writing your GSoC project proposal, as opposed to just submitting a long shot without any feedback and hoping for the best<br />
*Shown effort into learning what Media Cloud is all about; some ideas:<br />
**Make a pull request to our main code repository (https://github.com/berkmancenter/mediacloud),<br />
**Craft us an email with a smart question or two,<br />
**Try our our tools (see https://mediacloud.org/),<br />
**Run Media Cloud yourself and collect some news articles (see https://github.com/berkmancenter/mediacloud/blob/master/doc/vagrant.markdown),<br />
**Sign up and check out our API (see https://github.com/berkmancenter/mediacloud/blob/master/doc/api_2_0_spec/api_2_0_spec.md, https://github.com/berkmancenter/mediacloud/blob/master/doc/api_2_0_spec/topics_api_2_0_spec.md, and the API client at https://pypi.python.org/pypi/mediacloud/).<br />
<br />
==Requirements==<br />
*Working knowledge of Perl or Python<br />
*Familiarity with relational databases, preferably PostgreSQL<br />
*Some pedantism<br />
*Willingness to propose, debate and object to ideas<br />
*Keen to work with us on writing your GSoC project proposal, as opposed to just submitting a long shot without any feedback and hoping for the best<br />
*Shown effort into learning what Media Cloud is all about, e.g.:<br />
**Make a pull request to our main code repository; (https://github.com/berkmancenter/mediacloud),<br />
**Craft us an email with a smart question or two,<br />
**Try our our tools (see https://dashboard.mediacloud.org/#demo, https://sources.mediacloud.org/, http://globe.mediameter.org/, http://focus.mediameter.org/)<br />
**Run Media Cloud yourself and collect some news articles (see https://github.com/berkmancenter/mediacloud/blob/master/doc/vagrant.markdown),<br />
**Sign up and check out our API (see https://github.com/berkmancenter/mediacloud/blob/master/doc/api_2_0_spec/api_2_0_spec.md, https://github.com/berkmancenter/mediacloud/blob/master/doc/api_2_0_spec/topics_api_2_0_spec.md, and the API client at https://pypi.python.org/pypi/mediacloud/).</div>BerkmanSysophttps://cyber.harvard.edu/gsoc/?title=MediaCloud&diff=663MediaCloud2019-04-02T18:27:54Z<p>BerkmanSysop: /* =Make Ultimate Sitemap Parser use asyncio */</p>
<hr />
<div>[https://mediacloud.org/ Media Cloud] is an open source platform for studying media ecosystems. Media Cloud is joint project between [https://cyber.harvard.edu The Berkman Klein Center for Internet and Society at Harvard University] and [https://civic.mit.edu The Center for Civic Media] at MIT's Media Lab. <br />
<br />
By tracking hundreds of millions of stories published online or broadcast via television, our suite of tools allows researchers to track how stories and ideas spread through media, and how different corners of the media ecosystem report on stories.<br />
<br />
Our platform is designed to aggregate, analyze, deliver and visualize information, answering complex quantitative and qualitative questions about the content of online media.<br />
<br />
Aggregate. We have aggregated billions of online stories from an ever-growing set of 1,200,000+ digital media sources. We ingest data via RSS feeds and a set of robots that spider the web to fetch information from a variety of sources in near real-time.<br />
<br />
Analyze. To query our extensive library of data, we have developed a suite of analytical tools that allow you to explore relationships between professional and citizen media, and between online and offline sources.<br />
<br />
Deliver and Visualize. Our suite of tools provides opportunities to present data in formats that you can visualize in your own interfaces. These include the use of graphs, geographic maps, word clouds, network visualizations.<br />
<br />
Project URL: https://mediacloud.org/<br />
<br />
Project on GitHub: https://github.com/berkmancenter/mediacloud<br />
<br />
Project Mentors: [mailto:linas@media.mit.edu Linas Valiukas], [mailto:hroberts@cyber.law.harvard.edu Hal Roberts]<br />
<br />
===Projects===<br />
<br />
====Create a self-contained, browser-based page HTML -> article HTML extractor====<br />
'''Problem Statement'''<br />
For every fetched news article, we have to figure out which part of the page HTML contains the article body itself. We currently use readability-lxml (https://github.com/buriy/python-readability) for that task. However, readability-lxml is aging fast and is not necessarily still the best library around to extract body of the article from the HTML page. Also, more and more articles get loaded using JavaScript due to an ongoing "frontend everywhere!" frenzy, and our Python extractor doesn’t execute or support JavaScript. Lastly, various CDNs, e.g. Cloudflare, are blocking our crawler just because our user agent doesn't have JavaScript enabled.<br />
<br />
I think inevitably we'll have to switch to running a headless browser, loading each and every downloaded story in it, and then applying a well-supported third-party library, e.g. Mozilla's Readability (https://github.com/mozilla/readability), to extract article title, author and body.<br />
<br />
=====Development Tasks=====<br />
<br />
*Set up a chromeless browser<br />
*Set up Readability<br />
*Develop a HTTP service that accepts a parameter URL (and/or HTML body), loads it in the browser, runs Readability's magic, and returns the extracted HTML back to the requester.<br />
*Package everything in a nice Docker image<br />
<br />
There are similar projects like this, e.g. https://github.com/schollz/readable, so you’ll need to do some research into whether such a thing exists already before submitting a proposal. Maybe an existing tool could be improved instead of redoing everything from scratch?<br />
<br />
====Write a spec to a new generation of our API====<br />
<br />
'''Problem Statement'''<br />
Create a specification for a new version of our API (https://github.com/berkmancenter/mediacloud/tree/release/doc/api_2_0_spec). Our existing API (implemented in Perl) is inconsistent among its different major parts, and is goofily un-REST-ish in several places. We would like to reimplement it to Python and use a modern framework for API specification (OpenAPI), implementation, and testing.<br />
<br />
=====Development Tasks=====<br />
*With the help of the team, identify which API calls can be renamed to more sensible names, extended or deprecated<br />
*With the help of the team, rewrite API call descriptions to make them more comprehensible<br />
*Rewrite a API spec using a chosen tool (e.g. OpenAPI) using best RESTful practices<br />
*Set up API demo (e.g. using Swagger UI)<br />
<br />
<br />
====Rewrite Ultimate Sitemap Parser to yield results instead of returning them====<br />
'''Problem Statement'''<br />
Ultimate Sitemap Parser (https://github.com/berkmancenter/mediacloud-ultimate_sitemap_parser) is our Python module that we use to fetch trees of website sitemaps (https://www.sitemaps.org/). The current implementation of the sitemap parser fetches all of the sitemap links and returns it to a caller in a single easy-to-use object. However, it turns out that some websites have *massive* sitemap trees! In those cases, the sitemap parser uses up lots of RAM for its operation, and the client is forced to wait for a long time to get sitemap fetching + parsing results. For those reasons, we’d like the sitemap parser to “yield” links found in a sitemap instead of “returning” them while also maintaining a nice, comprehensible interface to the sitemap parser: https://github.com/berkmancenter/mediacloud-ultimate_sitemap_parser/issues/2<br />
<br />
=====Development Tasks=====<br />
* Rewrite sitemap parser to yield found sitemap links instead of returning them to conserve memory and make results usable faster<br />
<br />
<br />
====Make Ultimate Sitemap Parser use asyncio====<br />
'''Problem Statement'''<br />
Ultimate Sitemap Parser (https://github.com/berkmancenter/mediacloud-ultimate_sitemap_parser) is our Python module that we use to fetch trees of website sitemaps (https://www.sitemaps.org/). In production, fetching and parsing XML sitemaps it’s mostly a CPU-intensive operation as the most time gets spent on gunzipping sitemaps, parsing XML and creating objects out of them, but my guess is that the sitemap parser could be made 10-20% faster by doing I/O (namely the fetching part) asynchronously.<br />
<br />
=====Development Tasks=====<br />
*Rewrite sitemap parser to fetch sitemaps asynchronously<br />
*Find other ways where I/O could be made asynchronous<br />
<br />
=====Detect sitemap if it’s not linked from robots.txt in Ultimate Sitemap Parser=====<br />
'''Problem statement'''<br />
Ultimate Sitemap Parser (https://github.com/berkmancenter/mediacloud-ultimate_sitemap_parser) is our Python module that we use to fetch trees of website sitemaps (https://www.sitemaps.org/). Most of sitemaps are being linked to in website’s robots.txt, but some are not. We would like to try common paths of sitemaps (e.g. /sitemap.xml[.gz]) on every site nonetheless: https://github.com/berkmancenter/mediacloud-ultimate_sitemap_parser/issues/8<br />
<br />
===== Development Tasks =====<br />
* Update the module in such a way that it tries common sitemap locations independently from robots.txt<br />
<br />
<br />
====Add RSS / Atom sitemap support to our Ultimate Sitemap Parser====<br />
'''Problem Statement'''<br />
Ultimate Sitemap Parser (https://github.com/berkmancenter/mediacloud-ultimate_sitemap_parser) is our Python module that we use to fetch trees of website sitemaps (https://www.sitemaps.org/). Most of those sitemaps are implemented in Sitemap XML protocol, but a small number of sitemaps are published in RSS / Atom, and we’d like to have support for those too: https://github.com/berkmancenter/mediacloud-ultimate_sitemap_parser/issues/3<br />
<br />
===== Development Tasks =====<br />
*Add RSS / Atom support to Ultimate Sitemap Parser<br />
<br />
====Build a tool to do some cool visualizations====<br />
''Problem Statement'' Since 2008, we have collected more than a half billion news articles that we have post-processed and indexed. We know quite a lot about them -- which news articles were the most linked to from other similar articles, the most and least popular / influential articles (based on shares on Facebook, tweet count, or clicks on an article's Bit.ly shortened link), specific language and terms used to describe the subject matter in each of the articles, etc., and there's a lot of potential to learn much more. Can you use your design and coding skills to help us out in visualising some of this data, e.g. create a cool network map visualization tool?<br />
<br />
=====Development Tasks=====<br />
*Build any visualization tool based on our extensive data and tool set:<br />
**Figure out what you'd like to visualise and how are you going to do it<br />
**Use Gephi, a tool of your choice, or create your very own tool to implement your visualisation<br />
<br />
====Create PostgreSQL-based job queue====<br />
''Problem Description''. In more than eight (or is it nine by now?) years since we've been running Media Cloud, we have tried multiple job queue tools (e.g. Gearman) that we could use for dividing and conquering our workload. Unfortunately, all the tools (including the current one -- go look into the codebase to figure out which one it is now) have left us deeply unhappy because of one reason or another. If there's one tool which hasn’t let us down, it’s PostgreSQL. So, we'd like to also try running our job queue on Postgres. Can you implement it for us?<br />
<br />
=====Development Tasks=====<br />
*Write a spec, complete with code samples, on how to implement the following job queue:<br />
**Preferably programming language-agnostic, i.e. should run as a bunch of PL/pgSQL functions.<br />
***Maybe that's a bad idea, I don't know, you tell us.<br />
*Features:<br />
**Add jobs with names and JSON arguments<br />
**Cancel jobs by their ID<br />
**Track job's progress (and log?) by their ID<br />
**Get job ID using its JSON parameters<br />
**Merge jobs with identical JSON arguments into a single job<br />
**See job stats per task, i.e. how many jobs are queued for every task<br />
**Retry failed jobs<br />
**Report job failure, complete with error messages<br />
**Proper locking (for inspiration, see https://github.com/chanks/que)<br />
**Doesn't catch fire with tens of millions of queued jobs<br />
*(Bonus points) Actually implement the queue! If you don't get to doing this over the summer, it's fine, we would be happy with a proven spec.<br />
<br />
<br />
====Implement a method to detect subtopics of a topic====<br />
*Problem Statement.* As described elsewhere, a "topic" is subject discussed by the media that we are researching. Almost every big topic contains subtopics, e.g. the matters of immigration, racism, email server security and a plethora of other subjects were discussed during the last US election. We would like to investigate ways of how we could automatically detect those subtopics, possibly using the [Louvain method](https://en.wikipedia.org/wiki/Louvain_Modularity).<br />
<br />
=====Development Tasks=====<br />
Develop a proof of concept (un)supervised ML tool for detecting subtopics of a chosen subject ("topic").<br />
<br />
====Do your own freehand project====<br />
Problem Statement. If you had more than half a billion (!) news articles from all around the world stored in a single place, extracted from HTML into text, split into sentences, words, and made searchable, what would you do? Propose us something we didn't think of, and we will surely consider it!<br />
<br />
=====Development Tasks. =====<br />
Left as an exercise to the student.<br />
<br />
===Skill Requirements for Potential Candidates===<br />
*Working knowledge of Perl or Python<br />
*Familiarity with relational databases, preferably PostgreSQL<br />
*Some pedantism<br />
*Willingness to propose, debate and object to ideas<br />
*Keen to work with us on writing your GSoC project proposal, as opposed to just submitting a long shot without any feedback and hoping for the best<br />
*Shown effort into learning what Media Cloud is all about; some ideas:<br />
**Make a pull request to our main code repository (https://github.com/berkmancenter/mediacloud),<br />
**Craft us an email with a smart question or two,<br />
**Try our our tools (see https://mediacloud.org/),<br />
**Run Media Cloud yourself and collect some news articles (see https://github.com/berkmancenter/mediacloud/blob/master/doc/vagrant.markdown),<br />
**Sign up and check out our API (see https://github.com/berkmancenter/mediacloud/blob/master/doc/api_2_0_spec/api_2_0_spec.md, https://github.com/berkmancenter/mediacloud/blob/master/doc/api_2_0_spec/topics_api_2_0_spec.md, and the API client at https://pypi.python.org/pypi/mediacloud/).<br />
<br />
==Requirements==<br />
*Working knowledge of Perl or Python<br />
*Familiarity with relational databases, preferably PostgreSQL<br />
*Some pedantism<br />
*Willingness to propose, debate and object to ideas<br />
*Keen to work with us on writing your GSoC project proposal, as opposed to just submitting a long shot without any feedback and hoping for the best<br />
*Shown effort into learning what Media Cloud is all about, e.g.:<br />
**Make a pull request to our main code repository; (https://github.com/berkmancenter/mediacloud),<br />
**Craft us an email with a smart question or two,<br />
**Try our our tools (see https://dashboard.mediacloud.org/#demo, https://sources.mediacloud.org/, http://globe.mediameter.org/, http://focus.mediameter.org/)<br />
**Run Media Cloud yourself and collect some news articles (see https://github.com/berkmancenter/mediacloud/blob/master/doc/vagrant.markdown),<br />
**Sign up and check out our API (see https://github.com/berkmancenter/mediacloud/blob/master/doc/api_2_0_spec/api_2_0_spec.md, https://github.com/berkmancenter/mediacloud/blob/master/doc/api_2_0_spec/topics_api_2_0_spec.md, and the API client at https://pypi.python.org/pypi/mediacloud/).</div>BerkmanSysophttps://cyber.harvard.edu/gsoc/?title=MediaCloud&diff=662MediaCloud2019-04-02T18:27:33Z<p>BerkmanSysop: /* Projects */</p>
<hr />
<div>[https://mediacloud.org/ Media Cloud] is an open source platform for studying media ecosystems. Media Cloud is joint project between [https://cyber.harvard.edu The Berkman Klein Center for Internet and Society at Harvard University] and [https://civic.mit.edu The Center for Civic Media] at MIT's Media Lab. <br />
<br />
By tracking hundreds of millions of stories published online or broadcast via television, our suite of tools allows researchers to track how stories and ideas spread through media, and how different corners of the media ecosystem report on stories.<br />
<br />
Our platform is designed to aggregate, analyze, deliver and visualize information, answering complex quantitative and qualitative questions about the content of online media.<br />
<br />
Aggregate. We have aggregated billions of online stories from an ever-growing set of 1,200,000+ digital media sources. We ingest data via RSS feeds and a set of robots that spider the web to fetch information from a variety of sources in near real-time.<br />
<br />
Analyze. To query our extensive library of data, we have developed a suite of analytical tools that allow you to explore relationships between professional and citizen media, and between online and offline sources.<br />
<br />
Deliver and Visualize. Our suite of tools provides opportunities to present data in formats that you can visualize in your own interfaces. These include the use of graphs, geographic maps, word clouds, network visualizations.<br />
<br />
Project URL: https://mediacloud.org/<br />
<br />
Project on GitHub: https://github.com/berkmancenter/mediacloud<br />
<br />
Project Mentors: [mailto:linas@media.mit.edu Linas Valiukas], [mailto:hroberts@cyber.law.harvard.edu Hal Roberts]<br />
<br />
===Projects===<br />
<br />
====Create a self-contained, browser-based page HTML -> article HTML extractor====<br />
'''Problem Statement'''<br />
For every fetched news article, we have to figure out which part of the page HTML contains the article body itself. We currently use readability-lxml (https://github.com/buriy/python-readability) for that task. However, readability-lxml is aging fast and is not necessarily still the best library around to extract body of the article from the HTML page. Also, more and more articles get loaded using JavaScript due to an ongoing "frontend everywhere!" frenzy, and our Python extractor doesn’t execute or support JavaScript. Lastly, various CDNs, e.g. Cloudflare, are blocking our crawler just because our user agent doesn't have JavaScript enabled.<br />
<br />
I think inevitably we'll have to switch to running a headless browser, loading each and every downloaded story in it, and then applying a well-supported third-party library, e.g. Mozilla's Readability (https://github.com/mozilla/readability), to extract article title, author and body.<br />
<br />
=====Development Tasks=====<br />
<br />
*Set up a chromeless browser<br />
*Set up Readability<br />
*Develop a HTTP service that accepts a parameter URL (and/or HTML body), loads it in the browser, runs Readability's magic, and returns the extracted HTML back to the requester.<br />
*Package everything in a nice Docker image<br />
<br />
There are similar projects like this, e.g. https://github.com/schollz/readable, so you’ll need to do some research into whether such a thing exists already before submitting a proposal. Maybe an existing tool could be improved instead of redoing everything from scratch?<br />
<br />
====Write a spec to a new generation of our API====<br />
<br />
'''Problem Statement'''<br />
Create a specification for a new version of our API (https://github.com/berkmancenter/mediacloud/tree/release/doc/api_2_0_spec). Our existing API (implemented in Perl) is inconsistent among its different major parts, and is goofily un-REST-ish in several places. We would like to reimplement it to Python and use a modern framework for API specification (OpenAPI), implementation, and testing.<br />
<br />
=====Development Tasks=====<br />
*With the help of the team, identify which API calls can be renamed to more sensible names, extended or deprecated<br />
*With the help of the team, rewrite API call descriptions to make them more comprehensible<br />
*Rewrite a API spec using a chosen tool (e.g. OpenAPI) using best RESTful practices<br />
*Set up API demo (e.g. using Swagger UI)<br />
<br />
<br />
====Rewrite Ultimate Sitemap Parser to yield results instead of returning them====<br />
'''Problem Statement'''<br />
Ultimate Sitemap Parser (https://github.com/berkmancenter/mediacloud-ultimate_sitemap_parser) is our Python module that we use to fetch trees of website sitemaps (https://www.sitemaps.org/). The current implementation of the sitemap parser fetches all of the sitemap links and returns it to a caller in a single easy-to-use object. However, it turns out that some websites have *massive* sitemap trees! In those cases, the sitemap parser uses up lots of RAM for its operation, and the client is forced to wait for a long time to get sitemap fetching + parsing results. For those reasons, we’d like the sitemap parser to “yield” links found in a sitemap instead of “returning” them while also maintaining a nice, comprehensible interface to the sitemap parser: https://github.com/berkmancenter/mediacloud-ultimate_sitemap_parser/issues/2<br />
<br />
=====Development Tasks=====<br />
* Rewrite sitemap parser to yield found sitemap links instead of returning them to conserve memory and make results usable faster<br />
<br />
<br />
====Make Ultimate Sitemap Parser use asyncio===<br />
'''Problem Statement'''<br />
Ultimate Sitemap Parser (https://github.com/berkmancenter/mediacloud-ultimate_sitemap_parser) is our Python module that we use to fetch trees of website sitemaps (https://www.sitemaps.org/). In production, fetching and parsing XML sitemaps it’s mostly a CPU-intensive operation as the most time gets spent on gunzipping sitemaps, parsing XML and creating objects out of them, but my guess is that the sitemap parser could be made 10-20% faster by doing I/O (namely the fetching part) asynchronously.<br />
<br />
=====Development Tasks=====<br />
*Rewrite sitemap parser to fetch sitemaps asynchronously<br />
*Find other ways where I/O could be made asynchronous<br />
<br />
=====Detect sitemap if it’s not linked from robots.txt in Ultimate Sitemap Parser=====<br />
'''Problem statement'''<br />
Ultimate Sitemap Parser (https://github.com/berkmancenter/mediacloud-ultimate_sitemap_parser) is our Python module that we use to fetch trees of website sitemaps (https://www.sitemaps.org/). Most of sitemaps are being linked to in website’s robots.txt, but some are not. We would like to try common paths of sitemaps (e.g. /sitemap.xml[.gz]) on every site nonetheless: https://github.com/berkmancenter/mediacloud-ultimate_sitemap_parser/issues/8<br />
<br />
===== Development Tasks =====<br />
* Update the module in such a way that it tries common sitemap locations independently from robots.txt<br />
<br />
<br />
====Add RSS / Atom sitemap support to our Ultimate Sitemap Parser====<br />
'''Problem Statement'''<br />
Ultimate Sitemap Parser (https://github.com/berkmancenter/mediacloud-ultimate_sitemap_parser) is our Python module that we use to fetch trees of website sitemaps (https://www.sitemaps.org/). Most of those sitemaps are implemented in Sitemap XML protocol, but a small number of sitemaps are published in RSS / Atom, and we’d like to have support for those too: https://github.com/berkmancenter/mediacloud-ultimate_sitemap_parser/issues/3<br />
<br />
===== Development Tasks =====<br />
*Add RSS / Atom support to Ultimate Sitemap Parser<br />
<br />
====Build a tool to do some cool visualizations====<br />
''Problem Statement'' Since 2008, we have collected more than a half billion news articles that we have post-processed and indexed. We know quite a lot about them -- which news articles were the most linked to from other similar articles, the most and least popular / influential articles (based on shares on Facebook, tweet count, or clicks on an article's Bit.ly shortened link), specific language and terms used to describe the subject matter in each of the articles, etc., and there's a lot of potential to learn much more. Can you use your design and coding skills to help us out in visualising some of this data, e.g. create a cool network map visualization tool?<br />
<br />
=====Development Tasks=====<br />
*Build any visualization tool based on our extensive data and tool set:<br />
**Figure out what you'd like to visualise and how are you going to do it<br />
**Use Gephi, a tool of your choice, or create your very own tool to implement your visualisation<br />
<br />
====Create PostgreSQL-based job queue====<br />
''Problem Description''. In more than eight (or is it nine by now?) years since we've been running Media Cloud, we have tried multiple job queue tools (e.g. Gearman) that we could use for dividing and conquering our workload. Unfortunately, all the tools (including the current one -- go look into the codebase to figure out which one it is now) have left us deeply unhappy because of one reason or another. If there's one tool which hasn’t let us down, it’s PostgreSQL. So, we'd like to also try running our job queue on Postgres. Can you implement it for us?<br />
<br />
=====Development Tasks=====<br />
*Write a spec, complete with code samples, on how to implement the following job queue:<br />
**Preferably programming language-agnostic, i.e. should run as a bunch of PL/pgSQL functions.<br />
***Maybe that's a bad idea, I don't know, you tell us.<br />
*Features:<br />
**Add jobs with names and JSON arguments<br />
**Cancel jobs by their ID<br />
**Track job's progress (and log?) by their ID<br />
**Get job ID using its JSON parameters<br />
**Merge jobs with identical JSON arguments into a single job<br />
**See job stats per task, i.e. how many jobs are queued for every task<br />
**Retry failed jobs<br />
**Report job failure, complete with error messages<br />
**Proper locking (for inspiration, see https://github.com/chanks/que)<br />
**Doesn't catch fire with tens of millions of queued jobs<br />
*(Bonus points) Actually implement the queue! If you don't get to doing this over the summer, it's fine, we would be happy with a proven spec.<br />
<br />
<br />
====Implement a method to detect subtopics of a topic====<br />
*Problem Statement.* As described elsewhere, a "topic" is subject discussed by the media that we are researching. Almost every big topic contains subtopics, e.g. the matters of immigration, racism, email server security and a plethora of other subjects were discussed during the last US election. We would like to investigate ways of how we could automatically detect those subtopics, possibly using the [Louvain method](https://en.wikipedia.org/wiki/Louvain_Modularity).<br />
<br />
=====Development Tasks=====<br />
Develop a proof of concept (un)supervised ML tool for detecting subtopics of a chosen subject ("topic").<br />
<br />
====Do your own freehand project====<br />
Problem Statement. If you had more than half a billion (!) news articles from all around the world stored in a single place, extracted from HTML into text, split into sentences, words, and made searchable, what would you do? Propose us something we didn't think of, and we will surely consider it!<br />
<br />
=====Development Tasks. =====<br />
Left as an exercise to the student.<br />
<br />
<br />
===Skill Requirements for Potential Candidates===<br />
*Working knowledge of Perl or Python<br />
*Familiarity with relational databases, preferably PostgreSQL<br />
*Some pedantism<br />
*Willingness to propose, debate and object to ideas<br />
*Keen to work with us on writing your GSoC project proposal, as opposed to just submitting a long shot without any feedback and hoping for the best<br />
*Shown effort into learning what Media Cloud is all about; some ideas:<br />
**Make a pull request to our main code repository (https://github.com/berkmancenter/mediacloud),<br />
**Craft us an email with a smart question or two,<br />
**Try our our tools (see https://mediacloud.org/),<br />
**Run Media Cloud yourself and collect some news articles (see https://github.com/berkmancenter/mediacloud/blob/master/doc/vagrant.markdown),<br />
**Sign up and check out our API (see https://github.com/berkmancenter/mediacloud/blob/master/doc/api_2_0_spec/api_2_0_spec.md, https://github.com/berkmancenter/mediacloud/blob/master/doc/api_2_0_spec/topics_api_2_0_spec.md, and the API client at https://pypi.python.org/pypi/mediacloud/).<br />
<br />
==Requirements==<br />
*Working knowledge of Perl or Python<br />
*Familiarity with relational databases, preferably PostgreSQL<br />
*Some pedantism<br />
*Willingness to propose, debate and object to ideas<br />
*Keen to work with us on writing your GSoC project proposal, as opposed to just submitting a long shot without any feedback and hoping for the best<br />
*Shown effort into learning what Media Cloud is all about, e.g.:<br />
**Make a pull request to our main code repository; (https://github.com/berkmancenter/mediacloud),<br />
**Craft us an email with a smart question or two,<br />
**Try our our tools (see https://dashboard.mediacloud.org/#demo, https://sources.mediacloud.org/, http://globe.mediameter.org/, http://focus.mediameter.org/)<br />
**Run Media Cloud yourself and collect some news articles (see https://github.com/berkmancenter/mediacloud/blob/master/doc/vagrant.markdown),<br />
**Sign up and check out our API (see https://github.com/berkmancenter/mediacloud/blob/master/doc/api_2_0_spec/api_2_0_spec.md, https://github.com/berkmancenter/mediacloud/blob/master/doc/api_2_0_spec/topics_api_2_0_spec.md, and the API client at https://pypi.python.org/pypi/mediacloud/).</div>BerkmanSysophttps://cyber.harvard.edu/gsoc/?title=MediaCloud&diff=661MediaCloud2019-04-02T18:16:10Z<p>BerkmanSysop: /* Projects */</p>
<hr />
<div>[https://mediacloud.org/ Media Cloud] is an open source platform for studying media ecosystems. Media Cloud is joint project between [https://cyber.harvard.edu The Berkman Klein Center for Internet and Society at Harvard University] and [https://civic.mit.edu The Center for Civic Media] at MIT's Media Lab. <br />
<br />
By tracking hundreds of millions of stories published online or broadcast via television, our suite of tools allows researchers to track how stories and ideas spread through media, and how different corners of the media ecosystem report on stories.<br />
<br />
Our platform is designed to aggregate, analyze, deliver and visualize information, answering complex quantitative and qualitative questions about the content of online media.<br />
<br />
Aggregate. We have aggregated billions of online stories from an ever-growing set of 1,200,000+ digital media sources. We ingest data via RSS feeds and a set of robots that spider the web to fetch information from a variety of sources in near real-time.<br />
<br />
Analyze. To query our extensive library of data, we have developed a suite of analytical tools that allow you to explore relationships between professional and citizen media, and between online and offline sources.<br />
<br />
Deliver and Visualize. Our suite of tools provides opportunities to present data in formats that you can visualize in your own interfaces. These include the use of graphs, geographic maps, word clouds, network visualizations.<br />
<br />
Project URL: https://mediacloud.org/<br />
<br />
Project on GitHub: https://github.com/berkmancenter/mediacloud<br />
<br />
Project Mentors: [mailto:linas@media.mit.edu Linas Valiukas], [mailto:hroberts@cyber.law.harvard.edu Hal Roberts]<br />
<br />
===Projects===<br />
<br />
====Create a self-contained, browser-based page HTML -> article HTML extractor====<br />
'''Problem Statement'''<br />
For every fetched news article, we have to figure out which part of the page HTML contains the article body itself. We currently use readability-lxml (https://github.com/buriy/python-readability) for that task. However, readability-lxml is aging fast and is not necessarily still the best library around to extract body of the article from the HTML page. Also, more and more articles get loaded using JavaScript due to an ongoing "frontend everywhere!" frenzy, and our Python extractor doesn’t execute or support JavaScript. Lastly, various CDNs, e.g. Cloudflare, are blocking our crawler just because our user agent doesn't have JavaScript enabled.<br />
<br />
I think inevitably we'll have to switch to running a headless browser, loading each and every downloaded story in it, and then applying a well-supported third-party library, e.g. Mozilla's Readability (https://github.com/mozilla/readability), to extract article title, author and body.<br />
<br />
=====Development Tasks=====<br />
<br />
*Set up a chromeless browser<br />
*Set up Readability<br />
*Develop a HTTP service that accepts a parameter URL (and/or HTML body), loads it in the browser, runs Readability's magic, and returns the extracted HTML back to the requester.<br />
*Package everything in a nice Docker image<br />
<br />
There are similar projects like this, e.g. https://github.com/schollz/readable, so you’ll need to do some research into whether such a thing exists already before submitting a proposal. Maybe an existing tool could be improved instead of redoing everything from scratch?<br />
<br />
====Build a tool to do some cool visualizations====<br />
''Problem Statement''. Since 2008, we have collected more than a half billion news articles that we have post-processed and indexed. We know quite a lot about them -- which news articles were the most linked to from other similar articles, the most and least popular / influential articles (based on shares on Facebook, tweet count, or clicks on an article's Bit.ly shortened link), specific language and terms used to describe the subject matter in each of the articles, etc., and there's a lot of potential to learn much more. Can you use your design and coding skills to help us out in visualising some of this data, e.g. create a cool network map visualization tool?<br />
<br />
=====Development Tasks=====<br />
*Build any visualization tool based on our extensive data and tool set:<br />
**Figure out what you'd like to visualise and how are you going to do it<br />
**Use Gephi, a tool of your choice, or create your very own tool to implement your visualisation<br />
<br />
<br />
====Create PostgreSQL-based job queue====<br />
''Problem Description''. In more than eight (or is it nine by now?) years since we've been running Media Cloud, we have tried multiple job queue tools (e.g. Gearman) that we could use for dividing and conquering our workload. Unfortunately, all the tools (including the current one -- go look into the codebase to figure out which one it is now) have left us deeply unhappy because of one reason or another. If there's one tool which hasn’t let us down, it’s PostgreSQL. So, we'd like to also try running our job queue on Postgres. Can you implement it for us?<br />
<br />
=====Development Tasks=====<br />
*Write a spec, complete with code samples, on how to implement the following job queue:<br />
**Preferably programming language-agnostic, i.e. should run as a bunch of PL/pgSQL functions.<br />
***Maybe that's a bad idea, I don't know, you tell us.<br />
*Features:<br />
**Add jobs with names and JSON arguments<br />
**Cancel jobs by their ID<br />
**Track job's progress (and log?) by their ID<br />
**Get job ID using its JSON parameters<br />
**Merge jobs with identical JSON arguments into a single job<br />
**See job stats per task, i.e. how many jobs are queued for every task<br />
**Retry failed jobs<br />
**Report job failure, complete with error messages<br />
**Proper locking (for inspiration, see https://github.com/chanks/que)<br />
**Doesn't catch fire with tens of millions of queued jobs<br />
*(Bonus points) Actually implement the queue! If you don't get to doing this over the summer, it's fine, we would be happy with a proven spec.<br />
<br />
<br />
====Implement a method to detect subtopics of a topic====<br />
*Problem Statement.* As described elsewhere, a "topic" is subject discussed by the media that we are researching. Almost every big topic contains subtopics, e.g. the matters of immigration, racism, email server security and a plethora of other subjects were discussed during the last US election. We would like to investigate ways of how we could automatically detect those subtopics, possibly using the [Louvain method](https://en.wikipedia.org/wiki/Louvain_Modularity).<br />
<br />
=====Development Tasks=====<br />
Develop a proof of concept (un)supervised ML tool for detecting subtopics of a chosen subject ("topic").<br />
<br />
====Do your own freehand project====<br />
Problem Statement. If you had more than half a billion (!) news articles from all around the world stored in a single place, extracted from HTML into text, split into sentences, words, and made searchable, what would you do? Propose us something we didn't think of, and we will surely consider it!<br />
<br />
=====Development Tasks. =====<br />
Left as an exercise to the student.<br />
<br />
==Requirements==<br />
*Working knowledge of Perl or Python<br />
*Familiarity with relational databases, preferably PostgreSQL<br />
*Some pedantism<br />
*Willingness to propose, debate and object to ideas<br />
*Keen to work with us on writing your GSoC project proposal, as opposed to just submitting a long shot without any feedback and hoping for the best<br />
*Shown effort into learning what Media Cloud is all about, e.g.:<br />
**Make a pull request to our main code repository; (https://github.com/berkmancenter/mediacloud),<br />
**Craft us an email with a smart question or two,<br />
**Try our our tools (see https://dashboard.mediacloud.org/#demo, https://sources.mediacloud.org/, http://globe.mediameter.org/, http://focus.mediameter.org/)<br />
**Run Media Cloud yourself and collect some news articles (see https://github.com/berkmancenter/mediacloud/blob/master/doc/vagrant.markdown),<br />
**Sign up and check out our API (see https://github.com/berkmancenter/mediacloud/blob/master/doc/api_2_0_spec/api_2_0_spec.md, https://github.com/berkmancenter/mediacloud/blob/master/doc/api_2_0_spec/topics_api_2_0_spec.md, and the API client at https://pypi.python.org/pypi/mediacloud/).</div>BerkmanSysophttps://cyber.harvard.edu/gsoc/?title=MediaCloud&diff=660MediaCloud2019-04-02T18:11:58Z<p>BerkmanSysop: </p>
<hr />
<div>[https://mediacloud.org/ Media Cloud] is an open source platform for studying media ecosystems. Media Cloud is joint project between [https://cyber.harvard.edu The Berkman Klein Center for Internet and Society at Harvard University] and [https://civic.mit.edu The Center for Civic Media] at MIT's Media Lab. <br />
<br />
By tracking hundreds of millions of stories published online or broadcast via television, our suite of tools allows researchers to track how stories and ideas spread through media, and how different corners of the media ecosystem report on stories.<br />
<br />
Our platform is designed to aggregate, analyze, deliver and visualize information, answering complex quantitative and qualitative questions about the content of online media.<br />
<br />
Aggregate. We have aggregated billions of online stories from an ever-growing set of 1,200,000+ digital media sources. We ingest data via RSS feeds and a set of robots that spider the web to fetch information from a variety of sources in near real-time.<br />
<br />
Analyze. To query our extensive library of data, we have developed a suite of analytical tools that allow you to explore relationships between professional and citizen media, and between online and offline sources.<br />
<br />
Deliver and Visualize. Our suite of tools provides opportunities to present data in formats that you can visualize in your own interfaces. These include the use of graphs, geographic maps, word clouds, network visualizations.<br />
<br />
Project URL: https://mediacloud.org/<br />
<br />
Project on GitHub: https://github.com/berkmancenter/mediacloud<br />
<br />
Project Mentors: [mailto:linas@media.mit.edu Linas Valiukas], [mailto:hroberts@cyber.law.harvard.edu Hal Roberts]<br />
<br />
===Projects===<br />
<br />
====Build a tool to do some cool visualizations====<br />
''Problem Statement''. Since 2008, we have collected more than a half billion news articles that we have post-processed and indexed. We know quite a lot about them -- which news articles were the most linked to from other similar articles, the most and least popular / influential articles (based on shares on Facebook, tweet count, or clicks on an article's Bit.ly shortened link), specific language and terms used to describe the subject matter in each of the articles, etc., and there's a lot of potential to learn much more. Can you use your design and coding skills to help us out in visualising some of this data, e.g. create a cool network map visualization tool?<br />
<br />
=====Development Tasks=====<br />
*Build any visualization tool based on our extensive data and tool set:<br />
**Figure out what you'd like to visualise and how are you going to do it<br />
**Use Gephi, a tool of your choice, or create your very own tool to implement your visualisation<br />
<br />
<br />
====Create PostgreSQL-based job queue====<br />
''Problem Description''. In more than eight (or is it nine by now?) years since we've been running Media Cloud, we have tried multiple job queue tools (e.g. Gearman) that we could use for dividing and conquering our workload. Unfortunately, all the tools (including the current one -- go look into the codebase to figure out which one it is now) have left us deeply unhappy because of one reason or another. If there's one tool which hasn’t let us down, it’s PostgreSQL. So, we'd like to also try running our job queue on Postgres. Can you implement it for us?<br />
<br />
=====Development Tasks=====<br />
*Write a spec, complete with code samples, on how to implement the following job queue:<br />
**Preferably programming language-agnostic, i.e. should run as a bunch of PL/pgSQL functions.<br />
***Maybe that's a bad idea, I don't know, you tell us.<br />
*Features:<br />
**Add jobs with names and JSON arguments<br />
**Cancel jobs by their ID<br />
**Track job's progress (and log?) by their ID<br />
**Get job ID using its JSON parameters<br />
**Merge jobs with identical JSON arguments into a single job<br />
**See job stats per task, i.e. how many jobs are queued for every task<br />
**Retry failed jobs<br />
**Report job failure, complete with error messages<br />
**Proper locking (for inspiration, see https://github.com/chanks/que)<br />
**Doesn't catch fire with tens of millions of queued jobs<br />
*(Bonus points) Actually implement the queue! If you don't get to doing this over the summer, it's fine, we would be happy with a proven spec.<br />
<br />
<br />
====Implement a method to detect subtopics of a topic====<br />
*Problem Statement.* As described elsewhere, a "topic" is subject discussed by the media that we are researching. Almost every big topic contains subtopics, e.g. the matters of immigration, racism, email server security and a plethora of other subjects were discussed during the last US election. We would like to investigate ways of how we could automatically detect those subtopics, possibly using the [Louvain method](https://en.wikipedia.org/wiki/Louvain_Modularity).<br />
<br />
=====Development Tasks=====<br />
Develop a proof of concept (un)supervised ML tool for detecting subtopics of a chosen subject ("topic").<br />
<br />
====Do your own freehand project====<br />
Problem Statement. If you had more than half a billion (!) news articles from all around the world stored in a single place, extracted from HTML into text, split into sentences, words, and made searchable, what would you do? Propose us something we didn't think of, and we will surely consider it!<br />
<br />
=====Development Tasks. =====<br />
Left as an exercise to the student.<br />
<br />
==Requirements==<br />
*Working knowledge of Perl or Python<br />
*Familiarity with relational databases, preferably PostgreSQL<br />
*Some pedantism<br />
*Willingness to propose, debate and object to ideas<br />
*Keen to work with us on writing your GSoC project proposal, as opposed to just submitting a long shot without any feedback and hoping for the best<br />
*Shown effort into learning what Media Cloud is all about, e.g.:<br />
**Make a pull request to our main code repository; (https://github.com/berkmancenter/mediacloud),<br />
**Craft us an email with a smart question or two,<br />
**Try our our tools (see https://dashboard.mediacloud.org/#demo, https://sources.mediacloud.org/, http://globe.mediameter.org/, http://focus.mediameter.org/)<br />
**Run Media Cloud yourself and collect some news articles (see https://github.com/berkmancenter/mediacloud/blob/master/doc/vagrant.markdown),<br />
**Sign up and check out our API (see https://github.com/berkmancenter/mediacloud/blob/master/doc/api_2_0_spec/api_2_0_spec.md, https://github.com/berkmancenter/mediacloud/blob/master/doc/api_2_0_spec/topics_api_2_0_spec.md, and the API client at https://pypi.python.org/pypi/mediacloud/).</div>BerkmanSysophttps://cyber.harvard.edu/gsoc/?title=GSoC_FAQ&diff=659GSoC FAQ2019-03-28T19:49:09Z<p>BerkmanSysop: </p>
<hr />
<div>This is the page for some of the frequently asked questions by the prospective participants of the Google Summer of Code.<br />
<br />
__TOC__ <br />
==General Questions==<br />
<br />
===Am I required to be local?===<br />
'''Q:''' If someone is selected as a student coder through the Summer of Code, will they need to be in the Boston area over the summer?<br />
<br />
'''A:''' No, we are not asking anyone to move to Boston for the summer. While you are most welcome to come and work at the Berkman Klein Center if you are selected for an internship, we will not force anyone accepting an internship to move. We have worked with students from around the world.<br />
<br />
===Do I need to keep regular business hours?===<br />
'''Q:''' If I'm selected, can I work any time of the day that I want?<br />
<br />
'''A:''' We are going to favor coders that are available for a significant amount of time Monday to Friday during [[http://en.wikipedia.org/wiki/Eastern_Standard_Time_Zone EST]] business hours. We have found that synchronous communication is key to working together effectively. Consider this a requirement.<br />
<br />
===Can I have other jobs/internships/consulting gigs?===<br />
'''Q:''' I've got a consulting gig, an internship or another job lined up. Cool?<br />
<br />
'''A:''' No. We want your full attention for the summer - it's a real job, with a real commitment from both sides.<br />
<br />
===Do I have to be Harvard-affiliated already?===<br />
'''Q:''' Is this limited to Harvard and/or Berkman Klein coders?<br />
<br />
'''A:''' No. This is open to any and all that would like to apply and we warmly welcome new folks to our community. In fact we very much look foroward to contributors joining from out side the Harvard community as it gives us a fresh perspective.<br />
<br />
===Will applying to a certain project give me an advantage?===<br />
'''Q:''' Are some projects a higher priority than others? If so, what are they?<br />
<br />
'''A:''' No. Our selection depends on the strength of the applicants and the strength of the applications. We are most interested in finding the right candidates.<br />
<br />
===Are there any preferred languages/frameworks?===<br />
'''Q:''' What are they?<br />
<br />
'''A:''' We prefer that the language and framework for a proposal match the language or framework an existing project is written in. For projects that are orthogonal to an existing code base, we prefer Ruby, JavaScript, Python, or PHP. We prefer MVC frameworks or micro-frameworks: Rails or Sinatra for Ruby, Django for Python, et cetera. Our PHP work mostly extends Drupal or WordPress. There is some flexibility in frameworks but less in languages. We are not interested in proprietary languages at all, nor in .NET.<br />
<br />
These preferences are based on the skill set of the Berkman Klein geek team and our long-term ability to maintain and host a limited set of languages.<br />
<br />
===Do you accept late applications?===<br />
'''Q:''' I'm really late to this. Can I still apply after the deadline?<br />
<br />
'''A:''' No. April 9, 2019 at 13:00 (EDT) / 18:00 (UTC) is a [https://developers.google.com/open-source/gsoc/timeline hard deadline]. Google's policies do not permit us to consider late applications.<br />
<br />
===Where can I get more information?===<br />
'''Q:''' I'm still confused. Where can I find out more about GSoC program specifics?<br />
<br />
'''A:''' No fear! The [https://summerofcode.withgoogle.com GSoC homepage] has some great general resources for interested students. We also suggest checking out:<br />
<br />
* The [https://google.github.io/gsocguides/student/ GSoC Guides]<br />
* The official [https://developers.google.com/open-source/gsoc/faq GSoC FAQ]<br />
<br />
===How will the Berkman Klein Center for Internet & Society be evaluating students?===<br />
<br />
'''Q:''' How can I show that I am interested, energized, knowledgeable and likeable? I really want to work on one of the projects that is listed on your ideas page and I’m unsure of where to show my interest.<br />
<br />
'''A:''' We are looking for a student that is technically skilled, has good communication skills, is a hard worker, and has set enough time aside to allow them to succeed. Student applicants can show technical skill and knowledge by sharing code for projects they have written previously or are currently working on. We appreciate previous experience in free/open source projects (e.g. contributions visible in GitHub), in projects related to the one applying, or in previous GSoCs. Student applicants can also check out the codebase, if one for the projects exists, and become familiar with the underlying languages and technologies.<br />
<br />
Student applicants can show their communication skills by having thoughtful discussions with the project mentors and responding to communications in a timely fashion. We would love for our student applicants to be able to ask good questions. By this, we mean having taken the diligence to check out the FAQs, documentation, taken time to check out the code, and tried to install and/or run the app. <br />
<br />
We’d love you to be able to be familiar with the project itself so that you know you would be excited to work on it over the course of the summer. We want to provide you with the information you need to be excited about the project and know it is a fit for you.<br />
<br />
===Can I start contributing now?!===<br />
<br />
'''Q:''' Can I start submitting pull requests? What can I do to get ahead in the application process? I want to get started!<br />
<br />
'''A:''' We appreciate the excitement, but we'd prefer applicants hold off. We trust you can fix the small bugs, but we're more interested in your approach to solving larger problems. The bigger things are more interesting to you and to us. Think about how you would spend your time this summer. What would you do to improve the project besides the small fixes? We are interested both in your technical skill, but also your holistic understanding of the project. Ultimately, we will judge your application by the thoughtfulness of your proposal.<br />
<br />
===What does “loosely defined” mean?===<br />
<br />
'''Q:''' I want a specific list of issues to tackle. Why don’t you provide any of that for the “loosely defined” projects?<br />
<br />
'''A:''' For some projects, part of the exercise for you this summer will be figuring out your own path forward. For spme projects we’re in the very early stages--we haven’t even figured out schema yet. We want you to be creative, define your own methods, and identify areas you think are worth exploring. While we welcome fresh and new ideas this blue sky approach can be very difficult. If one of these "loosely defined" projects is something you want to pursue we highly recommend taking the time to think very thoroughly through the project and be in contact with the mentor to discuss the project in more detail.<br />
<br />
===How many projects can I apply to?===<br />
<br />
'''Q:''' Wow there are a lot of very interesting projects that The Berkman Klein Center has and I want to apply to work on a few of them. What should I do?<br />
<br />
'''A:''' We will certainly accept applications to multiple projects and applying to more then one doesn't mean that a candidate will be rejected. That said we encourage students to focus on one project as it often yields better and more thought out responses to the challenges posed by each of the projects. If you are accepted you will only be working on one project.<br />
<br />
<br />
===What should I include in my application?===<br />
<br />
'''Q:''' I don't know what to include in my application?<br />
<br />
'''A:''' We have provided an application template that you can use. This includes some basic information for us to get to know you and a section to see what your proposal is. In your proposal we would love to see well thought out approaches to how you plan to implement the features you wish to take on for the summe. A well thought out application will include a timeline with milestones for your work throughout the summer.</div>BerkmanSysophttps://cyber.harvard.edu/gsoc/?title=GSoC_FAQ&diff=658GSoC FAQ2019-03-28T18:39:00Z<p>BerkmanSysop: /* Do you accept late applications? */</p>
<hr />
<div>This is the page for some of the frequently asked questions by the prospective participants of the Google Summer of Code.<br />
<br />
__TOC__ <br />
==General Questions==<br />
<br />
===Am I required to be local?===<br />
'''Q:''' If someone is selected as a student coder through the Summer of Code, will they need to be in the Boston area over the summer?<br />
<br />
'''A:''' No, we are not asking anyone to move to Boston for the summer. While you are most welcome to come and work at the Berkman Klein Center if you are selected for an internship, we will not force anyone accepting an internship to move. We have worked with students from around the world.<br />
<br />
===Do I need to keep regular business hours?===<br />
'''Q:''' If I'm selected, can I work any time of the day that I want?<br />
<br />
'''A:''' We are going to favor coders that are available for a significant amount of time Monday to Friday during [[http://en.wikipedia.org/wiki/Eastern_Standard_Time_Zone EST]] business hours. We have found that synchronous communication is key to working together effectively. Consider this a requirement.<br />
<br />
===Can I have other jobs/internships/consulting gigs?===<br />
'''Q:''' I've got a consulting gig, an internship or another job lined up. Cool?<br />
<br />
'''A:''' No. We want your full attention for the summer - it's a real job, with a real commitment from both sides.<br />
<br />
===Do I have to be Harvard-affiliated already?===<br />
'''Q:''' Is this limited to Harvard and/or Berkman Klein coders?<br />
<br />
'''A:''' No. This is open to any and all that would like to apply and we warmly welcome new folks to our community. In fact we very much look foroward to contributors joining from out side the Harvard community as it gives us a fresh perspective.<br />
<br />
===Will applying to a certain project give me an advantage?===<br />
'''Q:''' Are some projects a higher priority than others? If so, what are they?<br />
<br />
'''A:''' No. Our selection depends on the strength of the applicants and the strength of the applications. We are most interested in finding the right candidates.<br />
<br />
===Are there any preferred languages/frameworks?===<br />
'''Q:''' What are they?<br />
<br />
'''A:''' We prefer that the language and framework for a proposal match the language or framework an existing project is written in. For projects that are orthogonal to an existing code base, we prefer Ruby, JavaScript, Python, or PHP. We prefer MVC frameworks or micro-frameworks: Rails or Sinatra for Ruby, Django for Python, et cetera. Our PHP work mostly extends Drupal or WordPress. There is some flexibility in frameworks but less in languages. We are not interested in proprietary languages at all, nor in .NET.<br />
<br />
These preferences are based on the skill set of the Berkman Klein geek team and our long-term ability to maintain and host a limited set of languages.<br />
<br />
===Do you accept late applications?===<br />
'''Q:''' I'm really late to this. Can I still apply after the deadline?<br />
<br />
'''A:''' No. April 9, 2019 at 13:00 (EDT) / 18:00 (UTC) is a [https://developers.google.com/open-source/gsoc/timeline hard deadline]. Google's policies do not permit us to consider late applications.<br />
<br />
===Where can I get more information?===<br />
'''Q:''' I'm still confused. Where can I find out more about GSoC program specifics?<br />
<br />
'''A:''' No fear! The [https://summerofcode.withgoogle.com GSoC homepage] has some great general resources for interested students. We also suggest checking out:<br />
<br />
* The [https://google.github.io/gsocguides/student/ GSoC Guides]<br />
* The official [https://developers.google.com/open-source/gsoc/faq GSoC FAQ]<br />
<br />
===How will the Berkman Klein Center for Internet & Society be evaluating students?===<br />
<br />
'''Q:''' How can I show that I am interested, energized, knowledgeable and likeable? I really want to work on one of the projects that is listed on your ideas page and I’m unsure of where to show my interest.<br />
<br />
'''A:''' We are looking for a student that is technically skilled, has good communication skills, is a hard worker, and has set enough time aside to allow them to succeed. Student applicants can show technical skill and knowledge by sharing code for projects they have written previously or are currently working on. We appreciate previous experience in free/open source projects (e.g. contributions visible in GitHub), in projects related to the one applying, or in previous GSoCs. Student applicants can also check out the codebase, if one for the projects exists, and become familiar with the underlying languages and technologies.<br />
<br />
Student applicants can show their communication skills by having thoughtful discussions with the project mentors and responding to communications in a timely fashion. We would love for our student applicants to be able to ask good questions. By this, we mean having taken the diligence to check out the FAQs, documentation, taken time to check out the code, and tried to install and/or run the app. <br />
<br />
We’d love you to be able to be familiar with the project itself so that you know you would be excited to work on it over the course of the summer. We want to provide you with the information you need to be excited about the project and know it is a fit for you.<br />
<br />
===Can I start contributing now?!===<br />
<br />
'''Q:''' Can I start submitting pull requests? What can I do to get ahead in the application process? I want to get started!<br />
<br />
'''A:''' We appreciate the excitement, but we'd prefer applicants hold off. We trust you can fix the small bugs, but we're more interested in your approach to solving larger problems. The bigger things are more interesting to you and to us. Think about how you would spend your time this summer. What would you do to improve the project besides the small fixes? We are interested both in your technical skill, but also your holistic understanding of the project. Ultimately, we will judge your application by the thoughtfulness of your proposal.<br />
<br />
===What does “loosely defined” mean?===<br />
<br />
'''Q:''' I want a specific list of issues to tackle. Why don’t you provide any of that for the “loosely defined” projects?<br />
<br />
'''A:''' For some projects, part of the exercise for you this summer will be figuring out your own path forward. For spme projects we’re in the very early stages--we haven’t even figured out schema yet. We want you to be creative, define your own methods, and identify areas you think are worth exploring. While we welcome fresh and new ideas this blue sky approach can be very difficult. If one of these "loosely defined" projects is something you want to pursue we highly recommend taking the time to think very thoroughly through the project and be in contact with the mentor to discuss the project in more detail.<br />
<br />
===How many projects can I apply to?===<br />
<br />
'''Q:''' Wow there are a lot of very interesting projects that The Berkman Klein Center has and I want to apply to work on a few of them. What should I do?<br />
<br />
'''A:''' We will certainly accept applications to multiple projects and applying to more then one doesn't mean that a candidate will be rejected. That said we encourage students to focus on one project as it often yields better and more thought out responses to the challenges posed by each of the projects. If you are accepted you will only be working on one project.</div>BerkmanSysophttps://cyber.harvard.edu/gsoc/?title=Redmine_Application_Tracker&diff=656Redmine Application Tracker2019-03-18T14:55:59Z<p>BerkmanSysop: old project</p>
<hr />
<div>{{Template:Oldproject}}<br />
<br />
Application Tracker an open source tool to help us manage applications to our employment, fellowship, and internship opportunities. It is a Redmine plugin that serves as an integral part of our internal workflow, while also allowing applicants to better manage and control their application materials and profile.<br />
<br />
Application Tracker was originally built for us as a summer project for GSoC 2010. Check out the initial project description here: http://cyber.law.harvard.edu/gsoc2010/Application_Tracker<br />
<br />
The code can be found here: https://github.com/berkmancenter/redmine_app_tracker<br />
<br />
More information on Redmine can be found here: http://www.redmine.org/<br />
<br />
There are several small and large enhancements we would like to make to this tool. The biggest would be better integration with Redmine itself. This project will be a great opportunity to learn more about Ruby on Rails as well as plugins for Redmine. We would love to hear your thoughts and suggestions so if you are interested in this project, contact our organization and the mentor can discuss the enhancements further.</div>BerkmanSysophttps://cyber.harvard.edu/gsoc/?title=Paper_Machines/&diff=655Paper Machines/2019-03-18T14:55:48Z<p>BerkmanSysop: old project</p>
<hr />
<div>{{Template:Oldproject}}<br />
<br />
Paper Machines is the project of a metaLAB-affiliated scholar seeking to develop a scripting, analysis, and visualization toolkit for rapidly transforming the ephemeral, paper-based archives of development and advocacy organizations into digital textual archives durable and flexible enough to be used by scholars, journalists, and political actors.<br />
<br />
Working with the scholar, the coder will specify and develop tools for batch-processing large numbers of scanned documents into corpora parsable by regular expression, named entities, geospatial, and other data types, and deliver those copora to a web-based interface for displaying analyses and visualizations.<br />
<br />
The incumbent will report to Joann Guldi, historian (Harvard Society of Fellows) and Matthew Battles (metaLAB).<br />
<br />
Skills desired include: Python, Ruby, Processing, and development of web interfaces and applications. Understanding of the needs of data-driven digital humanities research using large textual corpora in pdf, plain-text, html, and xml formats.</div>BerkmanSysophttps://cyber.harvard.edu/gsoc/?title=Video_Player&diff=654Video Player2019-03-18T14:55:03Z<p>BerkmanSysop: old project</p>
<hr />
<div>{{Template:Oldproject}}<br />
<br />
If you are interested in the mechanics behind video-heavy sites like TED, Ustream, and YouTube - this is the place for you!<br />
<br />
Some of the features we are looking to code/build into our video player:<br />
<br />
*Embedding/sharing/social bookmarking options<br />
*Custom embedding option for longer content - video snippets via user selected start and end points, and auto-generating embed code<br />
*Allow user to "pop-out", resize, select viewing quality options on video window<br />
*Allow "chaptering" and linking to timecode within video timeline<br />
*Ability to embed basic text and links on top of video<br />
*Ability to broadcast slides and presenter side-by-side<br />
*Enhancements to our live webcast environment, including spaces for live chatting and twitter hashtag stream<br />
<br />
We'd love to pick a few of the above enhancements to work on over the summer. We will share the results both in our interactive space, and with the world as an open source features library.</div>BerkmanSysophttps://cyber.harvard.edu/gsoc/?title=Tools_for_Time_Travel&diff=653Tools for Time Travel2019-03-18T14:54:29Z<p>BerkmanSysop: old project</p>
<hr />
<div>{{Template:Oldproject}}<br />
<br />
Tools for Time Travel is a project, in cooperation with the [http://librarylab.law.harvard.edu/ Harvard Library Innovation Lab], to build tools for "time capsule encryption." Time capsule encryption allows messages to be sent securely into the future so they cannot be read by anyone, including their intended recipient, until a particular date or event.<br />
<br />
We are researching a number of separate tools and techniques:<br />
<br />
*A distributed network for generating and publishing distributed M-of-N public/private keypairs according to a set schedule.<br />
*A desktop tool for sharding and recombining files using mathematically secure secret sharing techniques.<br />
*A tool for storing digital shards in a visual format on paper for longterm archival storage.<br />
<br />
=== Ideal candidate criteria ===<br />
<br />
We are looking for candidates who want to work on one of these projects or develop new tools and techniques related to time capsule encryption (anything from cryptocurrency integration to undersea data beacons -- you tell us).<br />
<br />
In your proposal, please explain what specific strategies or algorithms you believe will be most successful for time capsule encryption; what adversaries your strategy is designed to defend against; and why users would benefit from using your strategy over existing options.</div>BerkmanSysophttps://cyber.harvard.edu/gsoc/?title=Teem&diff=652Teem2019-03-18T14:54:09Z<p>BerkmanSysop: old project</p>
<hr />
<div>{{Template:Oldproject}}<br />
<br />
==Teem==<br />
Teem, an app focused on increasing the participation and sustainability of commons-based peer production communities (e.g. wikis, makerspaces), open online communities (networks, open organizations) or social movements (social centers, collectives). After doing intensive social research and prototype testing, we are aware of the main needs of the different roles within a community (following the classical 1-9-90 rule: core, occasional collaborators and users), and the tools they typically lack (related to management and internal organization). The app is grounded on these findings to reduce the frustrations of all participants and increase participation (90s=>9, 9s=>1)... while providing a kind-of project management tool for communities (but informal/liquid/open to fit the context) together with a work-space with collaborative edition (like a google-doc) and a group chat (like a whatsapp/telegram group). You have a quick presentation of this in http://tiny.cc/teem-slides and the current web-app in http://teem.works There is also an Android app encapsulating the web-app: http://tiny.cc/teemapp Code in https://github.com/P2Pvalue/teem<br />
<br />
===Ideal candidate:===<br />
Teem is interested in proactive candidates with experience in Javascript, HTML and CSS, and ideally experience with the AngularJS framework. Qualities that we would welcome are initiative, creativity, and interest/experience with communities and/or social movements. You may check Github’s open issues and the project ideas below to have an overall ideas of the possible evolutions of Teem. Of course, GSoC candidates are encouraged to adapt our proposals to their interests and we are very open to new ideas or unexpected evolutions of chosen ones.<br />
<br />
==Project ideas:==<br />
<br />
====Reputation-based or gratitude-based immaterial rewards====<br />
Participants in collaborative communities are sometimes rewarded in several ways, e.g. reputation (e.g. Ebay), badges (e.g. Stack Exchange), thanks (e.g. Open Subtitles). We would like to experiment with different types of rewards, in order to see how participants react to them. We are especially interested in exploring the “thanks” as a reward. This may derive in a research article eventually.<br />
<br />
''Knowledge recommended'': Javascript (ideally AngularJS), HTML and CSS.<br />
<br />
''Mentors'': Samer [mailto:shassan@cyber.law.harvard.edu shassan@cyber.law.harvard.edu], Antonio [mailto:antoniotenorio@ucm.es antoniotenorio@ucm.es]<br />
<br />
====Meeting minutes tool====<br />
<br />
Meetings are crucial in collaborative communities, but there does not exist a proper tool that provides an appropriate solution to taking meeting minutes and its multiple issues: sorting the agenda, prioritize certain points, curating a good record after several meetings, filtering tasks and agreements to communicate them in an efficient way to people who couldn't attend… Teem has already a real-time collaborative space (“pad”, from etherpad) ready for each working group, and the project is already being extended for the specific use of taking minutes in face to face meetings. This project will build onto the (yet to be developed) tools for face to face minute taking tools, adding features, and expanding it in a direction of interest by the GSoC student. <br />
<br />
''Knowledge recommended:'' Javascript (ideally AngularJS), HTML and CSS. <br><br />
''Mentors'': Samer [mailto:shassan@cyber.law.harvard.edu shassan@cyber.law.harvard.edu], Antonio [mailto:antoniotenorio@ucm.es antoniotenorio@ucm.es]<br />
<br />
====Update Teem to cutting-edge technologies====<br />
Web technologies evolves in a frenetic speed, and what was state of the art technologies 3-4 years ago might be deprecated today. Far from just blind following fancy fashions, some of the innovations deserve to be considered in order to bring stability, scalability and sustainability to Teem. In this project, the GSoC student will acquire knowledge of cutting-edge technologies. We are considering three technology update candidates: 1) push mobile notifications without GCM, 2) SwellRT and 3) Angular 2. The GSoC project proposal may consider one, two, or the three of them. <br />
<br />
# Push notifications in Android have traditionally relied in the privative and centralized service Google Cloud Messaging (GCM) (nowadays renamed Firebase Cloud Messaging). This has been an important issue for the Free/Open Source Software communities and advocates, as it forces all Android apps to communicate their activity to Google. Recent efforts have been done by other projects to provide free/open source working alternatives for mobile push notifications, the most recent and famous being Signal’s anouncement https://github.com/WhisperSystems/Signal-Android/commit/1669731329bcc32c84e33035a67a2fc22444c24b. Teem would benefit from this changes by providing a more Free/Open Source Software and privacy friendly tool that can be endorsed by organizations such as the Free Software Foundation or the F-Droid repositories. In addition, if done general enough, it would facilitate the path to other tools willing to make the same effort. This transition difficulty is a-priori considered as '''medium-hard'''.<br />
# One of the most critical dependencies Teem has is '''SwellRT''', the distributed real-time collaboration framework we developed for Teem. SwellRT has evolved a lot and nowadays has a new API, more developer friendly. However, Teem is still using the old deprecated version, and an update to the new version is needed. This transition difficulty is a priori considered to be '''easy-medium'''.<br />
# Teem’s code is developed with '''Angular''', a JavaScript framework which a few months ago proposed a new version (Angular 2) which is not backwards compatible. This idea project would study the adequacy and viability of migrating Teem to Angular 2 and to update Teem if the benefits are clear. There are multiple tutorials to migrate from Angular to Angular 2, which would facilitate the process. This transition difficulty is a priori considered as '''medium''', although with high investment of time/effort.<br />
<br />
''Knowledge recommended'': Javascript (ideally Angular), HTML and CSS. For GCM, Android Java.<br />
''Mentors'': Samer [mailto:shassan@cyber.law.harvard.edu shassan@cyber.law.harvard.edu], Antonio [mailto:antoniotenorio@ucm.es antoniotenorio@ucm.es]<br />
<br />
<br />
===Integrate Teem with third-party apps===<br />
Internet applications do not exist as independent islands but rather as a part of a live ecosystem where everything is connected. Teem aims to be the best tool for communities to collaborate and share their projects. In order to become so, it has to be seamlessly integrated with existing communities tools and workflows. As an example of a success case, Slack has become the chosen chat tool for many groups due to their wide integration options.<br />
<br />
Teem can be extended in many ways to provide an interconnected and smooth experience for communities. Some integrations ideas are:<br />
# Providing context information about links (e.g. like Facebook or Telegram integrate inserted links, recognizing it as a video, a piece of news, a document...).<br />
# Sharing tasks and sentences from Teem’s projects in Social Networks (e.g. as Medium blog posts sentences can be shared in Twitter).<br />
# Integrate with other collaborative project management tools used by communities such as Trello by sharing the tasks and their state across tools.<br />
<br />
Despite these, other integrations with third-party apps could be explored by the project.<br />
Knowledge recommended: Javascript (ideally Angular), HTML and CSS.<br />
''Mentors'': Samer [mailto:shassan@cyber.law.harvard.edu shassan@cyber.law.harvard.edu], Antonio [mailto:antoniotenorio@ucm.es antoniotenorio@ucm.es]</div>BerkmanSysophttps://cyber.harvard.edu/gsoc/?title=SwellRT&diff=651SwellRT2019-03-18T14:53:54Z<p>BerkmanSysop: old project</p>
<hr />
<div>{{Template:Oldproject}}<br />
<br />
SwellRT, is a free/open source backend as a service that allows faster development of apps providing -out of the box- common features found in any modern application (auth, real-time storage, event based integration...). In addition, servers can be federated on Internet through the Matrix protocol, enabling decentralization of data and users, and interoperability among providers. It has similarities with Google Firebase, Meteor or Realm frameworks, but it provides stronger capabilities for collaborative text editing and a simpler API. More at http://swellrt.org and https://github.com/P2Pvalue/swellrt<br />
<br />
<br />
===Ideal candidate:===<br />
SwellRT is interested in proactive candidates with experience in backend development and mobile platforms. Depending on the project, different sets of programming languages are required (as specified in the ideas below), although SwellRT core components are developed using Java and JavaScript. Qualities that we would welcome are initiative, creativity, and interest/experience with decentralized / federated / distributed systems or protocols. You may check Github’s open issues and the project ideas below to have an overall idea of the possible evolutions of SwellRT. Of course, GSoC candidates are encouraged to adapt our proposals to their interests and we are very open to new ideas or unexpected evolutions of chosen ones.<br />
<br />
===Project ideas:===<br />
<br />
====Android Native Client====<br />
To develop a native client library of SwellRT for the Android platform. Starting with the current Java client implementation, some platform dependent components should be replaced with native ones for Android, mainly, the HTTP and Websocket client libraries. This work would also require knowledge of Android's threading model, Gradle tool, and Java/Android application packaging models. <br />
Knowledge recommended: Java and Android, HTTP and Websocket protocols.<br />
<br><br />
'''Mentors:''' Samer [mailto:shassan@cyber.law.harvard.edu shassan@cyber.law.harvard.edu], Pablo [mailto:pablojan@ucm.es pablojan@ucm.es]<br />
<br />
====Access control for real-time JSON objects====<br />
SwellRT provides a real-time storage of JSON objects that can be edited by different users at the same time. In some scenarios, some properties of JSON objects are not desirable to be accessed (or edited) by anyone except for specific users. In this regard, the GSoC intern would need to design and implement an access control layer for JSON objects that can be accessed by different users. That layer would be integrated in the SwellRT implementation in Java. Also, the JavaScript client will be used for testing. To perform this task, the intern would have to understand the concepts of the Wave Protocol.<br />
''Knowledge recommended:'' Java, JavaScript and security policies.<br />
<br><br />
'''Mentors:''' Samer [mailto:shassan@cyber.law.harvard.edu shassan@cyber.law.harvard.edu], Pablo [mailto:pablojan@ucm.es pablojan@ucm.es]<br />
<br />
====iOS Native Client====<br />
To develop a native client library of SwellRT for the iOS platform. Starting with the current Java client implementation, the base Objective-C version should be generated using the Java2Objc tool. After this, some platform dependent components should be replaced with native ones for iOS , mainly, the HTTP and Websocket client libraries. This work would also require knowledge of iOS threading model and application packaging. <br />
Knowledge recommended: Java and Objective-C, HTTP and Websocket protocols.<br />
<br><br />
'''Mentors:''' Samer [mailto:shassan@cyber.law.harvard.edu shassan@cyber.law.harvard.edu], Pablo [mailto:pablojan@ucm.es pablojan@ucm.es]<br />
<br />
====High-performance Server ====<br />
SwellRT server is a Jetty-based Java application. It has an internal bus architecture and a custom thread model. As long as SwellRT is based on processing live transformation operations we would like to refactor current code base to provide a better scalable implementation, ideally based on a non-blocking IO server and an actor based architecture.<br />
Knowledge recommended: Java, HTTP, Websocket, non blocking IO servers and actor based frameworks.<br />
<br><br />
'''Mentors:''' Samer [mailto:shassan@cyber.law.harvard.edu shassan@cyber.law.harvard.edu], Pablo [mailto:pablojan@ucm.es pablojan@ucm.es]</div>BerkmanSysophttps://cyber.harvard.edu/gsoc/?title=Online_Media_Legal_Network&diff=650Online Media Legal Network2019-03-18T14:53:42Z<p>BerkmanSysop: old project</p>
<hr />
<div>{{Template:Oldproject}}<br />
<br />
The software that runs the site [http://www.omln.org www.omln.org] is built as a plug-in to the excellent open-source Rails CMS [http://www.browsercms.org BrowserCMS]. It has always been our intention to release the plugin under an open-source license. The application has a fairly flexible system of [http://www.browsercms.org/doc/guides/html/developer_guide.html#creating-a-custom-content-block ContentBlocks] and [http://www.browsercms.org/doc/guides/html/developer_guide.html#creating-a-custom-portlet Portlets] that integrate to create admin and user-level tools.<br />
<br />
The core purpose of the OMLN application is to allow lawyers and law school clinics that provide free or low-cost legal services to get matched via a web application with clients - primarily online publishers and independent journalists - that need legal help. As a bonus, it contains a full-fledged Content Management System because of it's integration with BrowserCMS. The OMLN application could be useful to legal aid providers and/or organizations providing counseling or other professional services. <br />
<br />
<br />
'''Definitions:'''<br />
* Clients are seeking services.<br />
* Matters are the services they are seeking.<br />
* Members provide the services to clients.<br />
* Managers manage everything.<br />
<br />
<br />
'''Possible projects:'''<br />
* Help clean up the codebase and make it ready for public release,<br />
* Create a simpler way to deploy and configure the plugin, sort of a pre-configured "toolkit" approach for different service providers,<br />
* Create a richer interface for the management of Clients, Members, and Matters.<br />
* Create a set of tools to make accepting new Matters and Client possible via the web. Currently they are vetted manually and then entered into the application.<br />
* Create a richer full-text search interface for Members and Managers.<br />
<br />
Your ideas are welcome!</div>BerkmanSysophttps://cyber.harvard.edu/gsoc/?title=Next_Generation_Video_Player&diff=649Next Generation Video Player2019-03-18T14:53:31Z<p>BerkmanSysop: old project</p>
<hr />
<div>{{Template:Oldproject}}<br />
<br />
The Berkman Center has hundreds of multimedia files from influential speakers on copyright, the internet, the intersection of law and technology and multitudinous other topics - start [[http://cyber.law.harvard.edu/interactive here]] for a sample.<br />
<br />
If you're interested in the mechanics behind video-heavy sites like TED, Ustream, and YouTube - this is the place for you!<br />
<br />
We're looking to build a next-generation client-side video player in flash and - ideally - HTML5, giving users the options to toggle their preference.<br />
<br />
Some of the features we are looking to code/build into our video player:<br />
<br />
* Embedding/sharing/social bookmarking options<br />
* Custom embedding option for longer content - video snippets via user selected start and end points, and auto-generating embed code<br />
* Allow user to "pop-out", resize, select viewing quality options on video window<br />
* Allow "chaptering" and linking to timecode within video timeline<br />
* Ability to embed basic text and links on top of video<br />
* Ability to broadcast slides and presenter side-by-side<br />
* Enhancements to our live webcast environment, including spaces for live chatting and twitter hashtag stream<br />
<br />
We're considering starting from the awesome (and open source) gStreamer based [[http://flumotion.org flumotion]] streaming framework - so you'd have a chance to help deploy a fairly large corpus of video in a linux environment.<br />
<br />
We'd love to pick a few of the above enhancements to work on over the summer. We will share the results both in our interactive space, and with the world under an open source license.<br />
<br />
Ideas?</div>BerkmanSysophttps://cyber.harvard.edu/gsoc/?title=ListenLog&diff=648ListenLog2019-03-18T14:52:49Z<p>BerkmanSysop: old project</p>
<hr />
<div>{{Template:Oldproject}}<br />
<br />
ListenLog is a way for individuals to log their own listening activity -- to online streams, to podcasts, to other audio sources. It logs in a standard and open format, capturing an individual's listening actions from multiple applications. ListenLog is unique in that '''its aim is to give the user the ability to accumulate and control the use of his or her own listening data'''. In other words, the data is the user's. It does not belong to any vendor or intermediary. Not is it meant to provide marketing fodder to any seller.<br />
<br />
With ListenLog the idea is that the user alone has control over where his or her data lives, what applications write to it, what gets logged, who to share it with, and how it can be used.<br />
<br />
The ListenLog concept was developed by [http://projectvrm.org ProjectVRM], of the Berkman Center. An early version is currently being prototyped for the [http://publicradioplayer.org Public Radio Player]. In the long run, however, it should log any listening activity on any device.<br />
<br />
The code developed for ListenLog should also apply to other forms of [http://cyber.law.harvard.edu/projectvrm/Media_Logging media logging].<br />
<br />
[http://cyber.law.harvard.edu/projectvrm/ListenLog Here is the ListenLog development page] at the [http://cyber.law.harvard.edu/projectvrm/ ProjectVRM wiki].</div>BerkmanSysophttps://cyber.harvard.edu/gsoc/?title=H2O&diff=647H2O2019-03-18T14:52:39Z<p>BerkmanSysop: old project</p>
<hr />
<div>{{Template:Oldproject}}<br />
<br />
H2O is a suite of tools for teaching online. The major components are:<br />
<br />
*The question tool: A backchannel application for use during a class. It has much broader utility as an asynchronous discussion system and conference backchannel application.<br />
*The rotisserie: A tool to manage discussion around a particular topic in a way that ensures no one user dominates.<br />
*The playlist: A tool to aggregate content from within an h2o instance and (ideally, in a delicious like fashion) from around the web.<br />
*Collages: A tool that allows arbitrary text to be annotated with tagged layers. This is most useful to create "social casebooks" used in the teaching of specific legal concepts, but has broader application for the annotation and discussion of complex topics.<br />
<br />
Most items can be forked and copied in a rudimentary github fashion, and retain their lineage in an effort to build out a social, collaborative experience. The source code will be released very soon, most likely under an AGPL license.<br />
<br />
The alpha-level h2o system is in use here ( link to http://h2odev.law.harvard.edu/ ).<br />
<br />
Some possible directions include:<br />
*Building out the playlist system to create a richer UI,<br />
*Building out a better playlisting UI for adding arbitrary items from around the web - youtube, vimeo, flickr images, wikipedia pages, arbitrary HTML pages, etc.<br />
*Create a delicious-like bookmarklet for adding and categorizing an item in a playlist<br />
*Work to improve javascript / ajax performance in the collage system.<br />
*Extract the question tool to function as a standalone application.<br />
*Improve the API to allow for the federation of separate H2O installations.<br />
*Improve the test suite.</div>BerkmanSysophttps://cyber.harvard.edu/gsoc/?title=Data_Portraits&diff=646Data Portraits2019-03-18T14:51:59Z<p>BerkmanSysop: old project</p>
<hr />
<div>{{Template:Oldproject}}<br />
<br />
The long-term goal of the project is to develop a series of visualizations of people based on their digital data. (See http://vivatropolis.com/judith/papers/DataPortraits.Siggraph.Leonardo.pdf )<br />
<br />
This project will focus on portraying Twitter users. The goal of the project is to create a visualization that gives the viewer a more intuitive sense of the interests of a Twitter user and their role in the community.<br />
<br />
<br />
The first stage of the project is data collection: <br />
*writing the code to download a given user's tweets<br />
*download the tweets of those they follow<br />
*summarize who follows their followers<br />
**how many followers they had<br />
**how many they were following<br />
<br />
<br />
The second part of the project is visualization: <br />
*designing and coding an evocative, legible and visually appealing representation of this data<br />
*topic-modeling and other NLP analysis of the subject's postings<br />
*recommend using [http://processing.org/ Processing], but open to other suggestions<br />
<br />
<br />
Key skills:<br />
*linguistic analysis<br />
*graphic design and animation<br />
*database management<br />
<br />
<br />
==UPDATE==<br />
<br />
Several of you have asked for more detail about how I envision the<br />
project and about what information I am seeking in the application.<br />
<br />
Here is one scenario:<br />
<br />
One of the things that is interesting about Twitter is the asymmetric<br />
social structure. Someone who follows all and only those people who<br />
follow them will have an entirely reciprocal network, but most people<br />
have a mix of reciprocal relationships and one way ones (both followers<br />
whom they do not not follow and followees who do not follow them).<br />
<br />
The scale and balance of these relationships is revealing. Someone with<br />
far more followers than they follow uses Twitter more in a publishing<br />
mode or is a celebrity. Those with predominantly reciprocal<br />
relationships may use it more socially. So, the portraits should show<br />
this scale and balance.<br />
<br />
What the subject says is also interesting. So another challenge is<br />
represented compactly the gist of what they say. There are many<br />
possible approaches here - from simple representations of typical words<br />
to topic modeling or sentiment analysis. Are they someone who posts<br />
about politics? TV shows? What they ate for breakfast? The rhythm of<br />
their postings is also relevant.<br />
<br />
Similarly, what the people the subject follows have to say is also<br />
interesting, for it shows what the subject sees when using Twitter. So,<br />
along with the number of followees, we want to show something of the<br />
stream that the subject sees.<br />
<br />
The subject's followers are of interest especially in terms of what they<br />
say about the subject's reach. Do they themselves have many followers?<br />
<br />
Is the subject one of a few people they follow, or is he or she mostly<br />
followed by people who follow lots of others? Is the subject's words<br />
retweeted?<br />
<br />
Can we find patterns of interaction: retweets, @ mentions? Does the<br />
subject use #s ? Are they for say conferences or the more social<br />
topic of the moment ones?<br />
<br />
Such a portrait could help people make sense of others they see on<br />
twitter - say someone retweets you or a friend of yours is in a<br />
discussion with them - who is this person? This portrait could answer<br />
that at a glance.<br />
<br />
It would make most sense as part of a group of portraits, where you<br />
could see how people differed from each other in this depiction.<br />
<br />
It could be multi-layered and interactive: something with a compact<br />
initial version that you could explore more deeply. For example, it<br />
might go from a small picture portraying the highlights of the data, to<br />
a big and detailed one that could then be explored to show the the<br />
network of connections and their inter-relationships starting with this<br />
user.<br />
<br />
A version of this could be an exhibition piece - data portrait as art<br />
exhibit. I would certainly like it to be a publicly accessible, web<br />
application, that people could use to see themselves and others on<br />
twitter in a new way.<br />
<br />
It should be a model for people to realize the potential of a richly<br />
detailed visualization as an "avatar". In particular, given the rush to<br />
insist on "real names" in many discussion groups, I think such portraits<br />
can make the argument that a pseudonymous identity with an extensive and<br />
intuitively depicted data history can be for many purposes a better form<br />
of identification.<br />
<br />
-----<br />
<br />
In your application, be sure to include your relevant background. What<br />
coding projects have you done, what what technologies and languages are<br />
you familiar with? Give me an idea of what you find interesting about<br />
this project. You might want to outline how you would approach<br />
implementing it. <br />
<br />
<br />
Mentor: [mailto:jdonath@cyber.law.harvard.edu jdonath@cyber.law.harvard.edu]<br />
<br />
General Questions: [mailto:berkmancenterharvard@gmail.com berkmancenterharvard@gmail.com]</div>BerkmanSysophttps://cyber.harvard.edu/gsoc/?title=Curarium&diff=645Curarium2019-03-18T14:51:46Z<p>BerkmanSysop: old project</p>
<hr />
<div>{{Template:Oldproject}}<br />
<br />
Curarium is a platform for exploring, analyzing, and making arguments about collections and the objects they comprise. It leverages the power of collections to tell stories by giving users tools ranging from item-level annotations to comprehensive, repository-wide visualizations, allowing them to bring both objects and the communities to which they belong into dialogue with one another.<br />
<br />
Curarium isn’t an online exhibition platform, but an environment for pursuing and sharing collections-based research nimbly, intuitively, and iteratively. Browse vast numbers of objects, using an expanding library of visualization tools to generate dynamic data portraits of collections. Annotate records and images, curating them to highlight relationships and juxtapositions. Assemble those records into trays of objects, images, and visualizations to share and work collaboratively with your social circles, and transform trays into published spotlights that unlock the stories and arguments bound up in collections.<br />
<br />
More information: [http://www.curarium.com http://www.curarium.com]<br />
<br />
GitHub repo: https://github.com/berkmancenter/curarium<br />
<br />
===Ideal candidate criteria===<br />
<br />
Curarium is interested in candidates with experience with Ruby on Rails, JavaScript, HTML5, Booststrap, AJAX, JSON, and PostgreSQL.<br />
<br />
Example sub-projects include:<br />
<br />
*Visualizations of works of art (thumbnails, titles, topics, other properties) within and across library collections (including brainstorming, sketching, implementing, and testing)<br />
<br />
*Curarium as a Learning Tools Interoperability (LTI) component for an LTS (e.g. [http://www.canvaslms.com/ Canvas])<br />
**embed individual works into LTS<br />
**embed tray of images/annotations into LTS<br />
**embed works visualization (whole collection or search results) into LTS<br />
<br />
<br />
*Tighter integration between WAKU (spotlight/story creation web app) via JSON APIs<br />
<br />
*Collection extraction from libraries and subsequent importing to Curarium; not just the act but also improving the online process</div>BerkmanSysophttps://cyber.harvard.edu/gsoc/?title=Cohort&diff=644Cohort2019-03-18T14:51:24Z<p>BerkmanSysop: old project</p>
<hr />
<div>{{Template:Oldproject}}<br />
<br />
We're looking to continue our work on the Cohort CRM. It's a tag-based CRM targetted to the needs of small teams in a non-sales environment - so non-profits and/or mission-based organizations looking to track their interactions and relationships with individuals and other organizations. It implements a hierarchical tagging system that allows for free tagging, where those tags live in a hiearchy as well. It's a modern Rails3 application targetted for ruby 1.8.7. The gems it's using include:<br />
<br />
*Formtastic<br />
*Sunspot<br />
*A customized version of acts_as_taggable_on<br />
*ancestry<br />
<br />
It has a rich jquery / ajax interface and is around 15 engineering hours away from an early release. The source code repository lives here ( link to https://github.com/berkmancenter/cohort_ng ).<br />
<br />
Some possible directions include:<br />
*Build a well modelled custom field system<br />
*Build a rich de-duplication system<br />
*Finish the mailing system from the Rails 2 version of cohort<br />
*Create a lightweight "custom form" system that allows third parties to add and/or remove themselves from tags inside Cohort in a secure fashion.<br />
*Build out a bulletproof suite of tests (rspec?) that test thoroughly against both Postgresql and MySQL</div>BerkmanSysophttps://cyber.harvard.edu/gsoc/?title=Chilling_Effects&diff=643Chilling Effects2019-03-18T14:51:12Z<p>BerkmanSysop: old project</p>
<hr />
<div>{{Template:Oldproject}}<br />
<br />
Chilling Effects is a website, database and research project studying cease and desist letters concerning online content. Our goals are to conduct and facilitate research on the notices, to educate the public about the different kinds of cease and desist letters--both legitimate and questionable--that are being sent to Internet publishers, and to provide as much transparency as possible about the “ecology” of such notices, in terms of who is sending them and why, and to what effect. <br />
<br />
More information: https://www.chillingeffects.org/.<br />
<br />
GitHub repo: https://github.com/berkmancenter/chillingeffects/.<br />
<br />
=== Ideal candidate criteria ===<br />
Chilling Effects is interested in candidates with coding skills to help Berkman developers work on and improve the project's website and database. An ideal candidate for will have experience with Ruby and/or Ruby On Rails or experience with other MVC frameworks, postgreSQL, and elasticsearch and/or Solr. Experience with large data sets, visualization libraries and/or continuous integration and test suites a plus.<br />
<br />
Example sub-projects include:<br />
<br />
*a bulk action tool for admins<br />
*a CMS for admins<br />
*automated redaction tools<br />
*improving search for users<br />
*expanding admin filter functions</div>BerkmanSysophttps://cyber.harvard.edu/gsoc/?title=Book-a-Nook&diff=642Book-a-Nook2019-03-18T14:50:52Z<p>BerkmanSysop: old project</p>
<hr />
<div>{{Template:Oldproject}}<br />
<br />
Book a Nook is an online tool to activate community spaces, with a particular focus on libraries. It’s approach is differentiated from similar tools in the following ways:<br />
* Networks spaces: Supports searching across libraries / systems<br />
* Data for advocacy and evaluation: Aggregates reservation data to inform space usage, advocacy, and experimentation, while respecting patrons’ privacy.<br />
* Connection: Provides an open API so that libraries can better integrate their resources with online organizational platforms (e.g. Meetup, Eventbrite)<br />
<br />
The project aims to expand libraries' digital presence and to deepen their integration within an online ecosystem.<br />
<br />
Github repo: [http://github.com/berkmancenter/bookanook http://github.com/berkmancenter/bookanook]<br />
<br />
More information about Book a Nook can be found on the [https://github.com/berkmancenter/bookanook/wiki github wiki].<br />
<br />
<br />
==Potential summer projects:==<br />
An initial round of development was recently completed providing core functionality. We’re currently looking for a developer to both refine core user experience as well as support creative use of the tool. Below are some development features listed in order of priority to the project. Each link points to a specific issue on the project's gitHub page.<br />
<br />
===High priority===<br />
*[https://github.com/berkmancenter/bookanook/issues/19 Different views for users in different timezones]<br />
*[https://github.com/berkmancenter/bookanook/issues/42 Integrate with Google Calendar API]<br />
*[https://github.com/berkmancenter/bookanook/issues/14 Prevent confirmation of conflicting reservations]<br />
*[https://github.com/berkmancenter/bookanook/issues/177 Allow admin to upload Multiple images for Nooks]<br />
*[https://github.com/berkmancenter/bookanook/issues/172 Allow users to control which types of emails they receive] and <br />
[https://github.com/berkmancenter/bookanook/issues/171 Add unsubscribe link to all emails]<br />
*[https://github.com/berkmancenter/bookanook/issues/123 Add nook search facility based on max/min capacity]<br />
*[https://github.com/berkmancenter/bookanook/issues/62 Differentiation between study room/event reservations]<br />
<br />
===Medium priority===<br />
*[https://github.com/berkmancenter/bookanook/issues/55 Allow user to favourite Nooks and locations]<br />
*[https://github.com/berkmancenter/bookanook/issues/50 Providing reason for rejecting a reservation request]<br />
*[https://github.com/berkmancenter/bookanook/issues/47 Provide Library's emergency contact info]<br />
*[https://github.com/berkmancenter/bookanook/issues/40 Use Google Maps API to show nearby libraries]<br />
<br />
===Low priority===<br />
*[https://github.com/berkmancenter/bookanook/issues/81 Create admin page to edit room policy]<br />
*[https://github.com/berkmancenter/bookanook/issues/77 Reservation calendar: include holidays?]<br />
*[https://github.com/berkmancenter/bookanook/issues/67 Enable patron reporting of empty search]<br />
*[https://github.com/berkmancenter/bookanook/issues/36 Add user reviews and comments on Libraries Page]<br />
*[https://github.com/berkmancenter/bookanook/issues/29 Add User details Page]<br />
*[https://github.com/berkmancenter/bookanook/issues/170 Allow Patron/Admin to request for repeatable event]</div>BerkmanSysophttps://cyber.harvard.edu/gsoc/?title=Application_Tracker&diff=641Application Tracker2019-03-18T14:50:42Z<p>BerkmanSysop: old project</p>
<hr />
<div>{{Template:Oldproject}}<br />
<br />
Basic tool requirements:<br />
<br />
General<br />
<br />
*Create a database of profiles for people interested in applying to open Berkman positions,<br />
*Accept application materials – CVs, cover letters, other various attachments/files including links and multimedia.<br />
<br />
Administrators / decision makers<br />
<br />
*Can manage job listings,<br />
*Have the ability to report on candidates and the materials they've submitted,<br />
*Can classify and download all of a certain kind of file (eg: all CVs, or all proposals, or all letters of recommendation, etc.),<br />
*Are able to download all materials for a specific candidate (eg: get all of John Smith's application materials),<br />
*Have the ability to send responses to applicants individually and as a group (eg: when a position has been filled, email all other applicants of the change in status),<br />
*A basic "workflow" to track decisions and where an applicant stands in the process.<br />
<br />
The applicant<br />
<br />
*Can manage their profile and complete it incrementally,<br />
*Can apply to multiple positions and tailor their application/letters/files to each application,<br />
*Allow non-applicants – primarily people submitting letters of recommendation on an applicant’s behalf – to submit materials that would link to the applicant’s application<br />
<br />
==More info of a more technical bent==<br />
<br />
We want this application to work perfectly on both Postgres and MySQL.<br />
<br />
We'd probably have these models, minimally:<br />
* '''User''' - has many Jobs. This keeps track of administrators. Ideally it'd be able to use pluggable authentication - specifically LDAP.<br />
* '''Applicant'''<br />
* '''JobApplicationStatus''' - A basic workflow allowing Users to accept, reject, and move JobApplications through a set of stages. . .<br />
* '''Job''' - A job, with file attachments.<br />
* '''JobApplicationFileRequirements''' - Allows a User to define what JobApplicationFiles are required for an Applicant to apply.<br />
* '''JobFile''' A job posting in PDF format. A background document. A white paper. <br />
* '''JobFileCategory'''<br />
* '''JobCategory'''<br />
* '''JobMessages''' - A message to be sent via html and plain text multipart email to an applicant.<br />
* '''JobMessageType''' - Defines a set of "triggers" and relates to a JobMessage. This would allow for a set of customized messages per job and at points relating to an applicant. . . the "thanks for applying" message, the "sorry, job is filled" message, etc.<br />
* '''JobApplication''' - Relates jobs and applicants.<br />
* '''JobApplicationFile''' - A file submitted along with an JobApplicant by an Applicant. A customized CV, a letter of recommendation, a tarball of sample code, etc.<br />
* '''JobApplicationFileCategory''' Categories could probably be a single polymorphic model using a tree model for organization.<br />
* '''JobReference'''<br />
<br />
In narrative -<br />
* A User has many Jobs.<br />
* A Job has many categorized JobFiles and is organized in (at least one) JobCategory. A Job also has JobApplicationFileRequirements.<br />
* An Applicant has many JobApplications.<br />
* A JobApplication has many categorized JobFiles<br />
* A Job has many JobMessages, invoked by the backend at various stages according to their JobMessageType.<br />
* A JobApplication has a JobApplicationStatus.<br />
* An Applicant has many JobReferences. These JobReferences will have an email address, allowing the reference to authenticate and create a JobFile with a "letter of reference" category. We might consider relating this model to JobApplications - we haven't through it through yet.<br />
<br />
And then we'd want a set of administrator-level tools to do things like:<br />
* get all the CVs for a job application,<br />
* get all the letters of recommendation,<br />
* fill (or close) a job position and auto-email all the applicants,<br />
* Deactivate all the JobApplications for an Applicant if we're just not<br />
interested in them,<br />
* Send custom messages to job applicants.<br />
<br />
==Wrapping Up==<br />
<br />
We're expecting that the GSoC participant would help us define a scope/requirements doc to the level of detail they need to be productive - it doesn't have to be spelled out in exquisite detail in your application.<br />
<br />
We envision this being written in Rails, and are open to it being a plugin to an existing application (eg. Redmine) or a standalone application altogether. We are not entirely beholden to Rails as the implementation language, though we do have a strong preference for it. Put it this way: If we do not get a qualified applicant to build the application in Rails, then we're willing to consider building it in something else.<br />
<br />
Your ideas are GREATLY appreciated!</div>BerkmanSysophttps://cyber.harvard.edu/gsoc/?title=APN_on_rails_Gem_Upgrade&diff=640APN on rails Gem Upgrade2019-03-18T14:50:32Z<p>BerkmanSysop: old project</p>
<hr />
<div>{{Template:Oldproject}}<br />
<br />
Upgrade the Ruby on Rails gem APN_on_rails that makes it easy to integrate the server side of Apple Push Notifications into a Rails application. The gem is currently actively being used by many applications that manage push notifications for iPhone apps. <br />
<br />
Code on Github: https://github.com/PRX/apn_on_rails<br />
<br />
The Project: <br />
<br />
*make the gem Rails 3 compliant<br />
*add support for management of Apple's In-App Purchasing<br />
*address existing bugs and make other feature improvements</div>BerkmanSysophttps://cyber.harvard.edu/gsoc/?title=Digital_Public_Library_of_America&diff=639Digital Public Library of America2019-03-18T14:49:02Z<p>BerkmanSysop: old project template</p>
<hr />
<div>{{Template:Oldproject}}<br />
<br />
== Structured Site Scraper: From site to collection ==<br />
<br />
Many local libraries, historical societies, and cultural groups have created web sites displaying collections of digitized photos, scanned documents, oral histories, audio files, etc. Frequently these local treasures are on sites designed purely with end-user browsing in mind. They would be far more useful if they were more widely searchable and browsable. The team developing the Digital Public Library of America's software platform -- a metadata server -- would like to be able to gather metadata about such sites, discovering the heritage items they point to, capturing as much of the explicit metadata as possible (captions, labels, etc.), and using the structure of the site as a heuristic for parsing the collection's structure. This metadata would then be assimilated into the appropriate schema and would be imported into the DPLA's meta-catalog. The local curators would first be shown the data as parsed so they can make corrections to the content and structure. In addition, a site map would be generated for the local curators.<br />
<br />
<br />
Mentors: [mailto:self@evident.com self@evident.com], [mailto:jclark@cyber.law.harvard.edu jclark@cyber.law.harvard.edu]<br />
<br />
General Questions: [mailto:berkmancenterharvard@gmail.com berkmancenterharvard@gmail.com]<br />
<br />
<br />
==Library Item Matching Service==<br />
<br />
The Digital Public Library of America software platform is gathering<br />
metadata about items in collections in libraries, museums, archives,<br />
and online cultural collections. Many of these items have identifiers<br />
in various standard namespaces such as ISBN numbers, OCLC identifiers,<br />
and Open Library IDs. The DPLA platform would like to offer a service<br />
through its API by which developers could query with the information<br />
they have about a particular item and have returned to them any or all<br />
of the identifiers known to the DPLA. If the developer has one of the<br />
standard IDs, then it will just take a table lookup to find the<br />
others, although this might require accessing the API of other such<br />
services, such as OCLC.org's. The problem becomes more difficult when<br />
the query does not include an identification number, but does include<br />
other metadata such as author, title, publisher, year, etc. Then the<br />
matching will be probabilistic since records often vary in these<br />
details, or are incomplete, or the query may contain errors or<br />
variations. This project would consist of building a useful service<br />
that takes all this into account and returns results along with a<br />
numeric expression of the degree of confidence the system has in the<br />
results.<br />
<br />
Find more information about the Digital Public Library of America at: [http://dp.la/ dp.la]<br />
<br />
<br />
Mentor: [mailto:mphillips@law.harvard.edu mphillips@law.harvard.edu]<br />
<br />
General Questions: [mailto:berkmancenterharvard@gmail.com berkmancenterharvard@gmail.com]</div>BerkmanSysophttps://cyber.harvard.edu/gsoc/?title=Dotplot&diff=638Dotplot2019-03-18T14:47:16Z<p>BerkmanSysop: switching to jsd temporarily</p>
<hr />
<div>Technical mentor: [mailto:jsd@cyber.harvard.edu jsd@cyber.harvard.edu]<br />
<br />
Dotplot is a [https://i.imgur.com/hePKkim.gifv visualization] that allows one to tell a story about data. Our primary use case is mapping a diverse network of people in a playful and interactive way. All too often, the only way to get a sense for the attendees of an event or the members of a distributed network is to do a significant amount of searching for resumes and background information. This research takes excessive time and effort, and it makes what should be an exciting and fun--meeting new people and finding potential collaborators--into a chore, something to be avoided or put off.<br />
<br />
This is where dotplot intervenes, bringing a sense of joy and discovery back into the process and helping to facilitate even more discoveries of commonalities and uniqueness among group members. This tool takes a spreadsheet of information about individuals gathered through survey questions and generates a dynamic visualization of the responses.<br />
<br />
Examples of dotplot in action: http://brk.mn/dcviz, http://brk.mn/casviz<br />
<br />
GitHub repo: https://github.com/berkmancenter/dotplot <br />
<br />
===Ideal candidate criteria===<br />
dotplot is interested in developers to improve both typical user experience and creative uses of the tool. Currently, dotplot requires specific backend programming to input spreadsheet information, but we would like it to be a tool available for conference organizers and event hosts to use without difficulty or extensive programming experience. The ideal candidate has experience with JavaScript, D3.js, HTML/CSS/SVG, and a server-side programming language of the student’s choice (Javascript, Ruby, PHP, Python).<br />
<br />
Suggested sub-projects include:<br />
*Design and build a system for collecting structured data from users<br />
*Connect the collected data to the existing JS visualization<br />
*Expand and improve upon the existing JS visualization<br />
*Bring creative and thoughtful ideas about potential enhancements<br />
*Ingest data from Google Sheets<br />
<br />
Other potential sub-projects include:<br />
*Easy to use, interactive interface for data input and manipulation<br />
*Offer users choices for how the data is visualized by survey and by question:<br />
**Display locations on a map<br />
**Track a specific respondent through multiple questions (in progress)<br />
**See the relationship between multiple questions using color, space, etc.<br />
**Zoom in and out on specific clusters<br />
**Hover to reveal more information about respondents or clusters (in progress)</div>BerkmanSysophttps://cyber.harvard.edu/gsoc/?title=Main_Page&diff=637Main Page2019-03-01T19:32:37Z<p>BerkmanSysop: /* Project Opportunities */</p>
<hr />
<div><div class="width90">[[Image:GSoC2016Logo.jpg|class=responsive]]</div><br />
<br />
<br />
== Welcome to Berkman Klein Center Google Summer of Code 2019==<br />
<br />
===Who are we?===<br />
The [https://cyber.harvard.edu/ Berkman Klein Center for Internet & Society at Harvard University] was founded to explore cyberspace, share in its study, and help pioneer its development. We represent a network of faculty, students, fellows, entrepreneurs, lawyers, and virtual architects working to identify and engage with the challenges and opportunities of cyberspace.<br />
<br />
We [https://cyber.harvard.edu/projects-tools investigate] the real and possible boundaries in cyberspace between open and closed systems of code, of commerce, of governance, and of education, and the relationship of law to each. We do this through active rather than passive research, believing that the best way to understand cyberspace is to actually build out into it.<br />
<br />
Our [https://cyber.harvard.edu/people?field_role_value=Faculty+Associate&field_related_topics_target_id=All faculty], [https://cyber.harvard.edu/people?field_role_value=Fellow&field_related_topics_target_id=All fellows], students, and affiliates engage with a wide spectrum of Net issues, including governance, privacy, intellectual property, antitrust, content control, and electronic commerce. Our diverse research interests cohere in a common understanding of the Internet as a social and political space where constraints upon inhabitants are determined not only through the traditional application of law, but, more subtly, through technical architecture ("code").<br />
<br />
As part of our active research mission, we build, use, and freely share open software platforms for free [https://cyber.harvard.edu/events online lectures and discussions]. We also sponsor gatherings, ranging from informal lunches to international conferences, that bring together members of our diverse network of participants to swap insights – and sometimes barbs – as they stake out their respective visions for what the Net can become. We also [https://cyber.harvard.edu/education teach], seeking out online and global opportunities, as well as supporting the traditional Harvard Law School curriculum, through our [https://cyber.harvard.edu/teaching/cyberlawclinic Cyberlaw Clinic], and in conjunction with other Harvard schools and MIT.<br />
<br />
Read more about the Berkman Klein Center at [https://cyber.harvard.edu our homepage].<br />
<br />
<br />
==Project Opportunities and Ideas==<br />
There are several GSoC 2019 projects at the Berkman Klein Center. We also recommend checking back here through out the GSoC "shopping" and application period and ideas may be refined and updated. We hope that this will provide transparency into our thinking.<br />
<br />
===[[Media Cloud]]===<br />
[https://mediacloud.org/ Media Cloud] ([https://github.com/berkmancenter/mediacloud github]) is an open source platform for studying media ecosystems, run jointly by the Berkman Klein Center at Harvard University and the [https://www.media.mit.edu/groups/civic-media/overview/ Center for Civic Media] at the MIT Media Lab. <br />
<br />
===[[Lumen]]===<br />
The [https://lumendatabase.org/ Lumen Database] is an archive of requests for removal of online content. This lets lawyers, journalists, and the general public study threats to speech online and understand their rights. ([https://github.com/berkmancenter/lumendatabase github]).<br />
<br />
===[[Ayanda]]===<br />
Ayanda is an Open Source Android Library that makes it easy to discover nearby devices and share files through a simple API.<br />
<br />
===[[Question Tool]]===<br />
A tool for asking and voting on questions during events or classes ([https://github.com/berkmancenter/question_tool github]). Written in Javascript using the Meteor.js framework.<br />
<br />
===[[Dotplot]]===<br />
Dotplot is a D3-based visualization tool that lets you tell stories about data ([https://github.com/berkmancenter/dotplot github]).<br />
<br />
==What is GSoC?==<br />
It's where you spend your summer writing code for awesome open source projects: <br />
<blockquote>Google Summer of Code is a global program focused on introducing students to open source software development. Students work on a 3 month programming project with an open source organization during their break from university.<br />
<br />
Since its inception in 2005, the program has brought together almost 11,000 student participants and 10,000 mentors from over 113 countries worldwide. Google Summer of Code has produced over 50 million lines of code for 515 open source organizations.<br />
<br />
As a part of Google Summer of Code, student participants are paired with a mentor from the participating organizations, gaining exposure to real-world software development and techniques. Students have the opportunity to spend the break between their school semesters earning a stipend while working in areas related to their interests.<br />
<br />
In turn, the participating organizations are able to identify and bring in new developers who implement new features and hopefully continue to contribute to open source even after the program is over. Most importantly, more code is created and released for the use and benefit of all</blockquote><br />
<br />
The [https://summerofcode.withgoogle.com official GSoC homepage] describes how it works and what it involves.<br />
<br />
The [https://en.wikipedia.org/wiki/Google_Summer_of_Code GSoC Wikipedia entry] also includes some interesting background information.<br />
<br />
<br />
==How to Apply==<br />
Applications open March 25, 2019 at 13:00 (EDT) / 18:00 (UTC). You must submit your application via GSoC: https://summerofcode.withgoogle.com/get-started. We will not be able to accept or process any application in any other way. Please use the below application template when submitting your application.<br />
<br />
All proposals must be submitted by April 9, 2019 at 13:00 (EDT) / 18:00 (UTC).<br />
<br />
===Application Template===<br />
[[Application_Template|Berkman Klein Application template for GSoC.]] This is the preferred template for submitting your application to work on a Berkman Klein Center project.<br />
<br />
==Contact Us==<br />
We prefer email, though we also run an IRC channel.<br><br />
'''Email:''' [mailto:gsoc@cyber.harvard.edu gsoc@cyber.harvard.edu] <br><br />
<br />
==FAQ==<br />
[[GSoC_FAQ|Answers to commonly asked questions.]] This includes a set of requirements around working hours, who can apply, and other commitments you might have for the summer. Please read!</div>BerkmanSysophttps://cyber.harvard.edu/gsoc/?title=Main_Page&diff=636Main Page2019-03-01T19:27:17Z<p>BerkmanSysop: /* Project Opportunities */</p>
<hr />
<div><div class="width90">[[Image:GSoC2016Logo.jpg|class=responsive]]</div><br />
<br />
<br />
== Welcome to Berkman Klein Center Google Summer of Code 2019==<br />
<br />
===Who are we?===<br />
The [https://cyber.harvard.edu/ Berkman Klein Center for Internet & Society at Harvard University] was founded to explore cyberspace, share in its study, and help pioneer its development. We represent a network of faculty, students, fellows, entrepreneurs, lawyers, and virtual architects working to identify and engage with the challenges and opportunities of cyberspace.<br />
<br />
We [https://cyber.harvard.edu/projects-tools investigate] the real and possible boundaries in cyberspace between open and closed systems of code, of commerce, of governance, and of education, and the relationship of law to each. We do this through active rather than passive research, believing that the best way to understand cyberspace is to actually build out into it.<br />
<br />
Our [https://cyber.harvard.edu/people?field_role_value=Faculty+Associate&field_related_topics_target_id=All faculty], [https://cyber.harvard.edu/people?field_role_value=Fellow&field_related_topics_target_id=All fellows], students, and affiliates engage with a wide spectrum of Net issues, including governance, privacy, intellectual property, antitrust, content control, and electronic commerce. Our diverse research interests cohere in a common understanding of the Internet as a social and political space where constraints upon inhabitants are determined not only through the traditional application of law, but, more subtly, through technical architecture ("code").<br />
<br />
As part of our active research mission, we build, use, and freely share open software platforms for free [https://cyber.harvard.edu/events online lectures and discussions]. We also sponsor gatherings, ranging from informal lunches to international conferences, that bring together members of our diverse network of participants to swap insights – and sometimes barbs – as they stake out their respective visions for what the Net can become. We also [https://cyber.harvard.edu/education teach], seeking out online and global opportunities, as well as supporting the traditional Harvard Law School curriculum, through our [https://cyber.harvard.edu/teaching/cyberlawclinic Cyberlaw Clinic], and in conjunction with other Harvard schools and MIT.<br />
<br />
Read more about the Berkman Klein Center at [https://cyber.harvard.edu our homepage].<br />
<br />
<br />
==Project Opportunities==<br />
There are several GSoC 2019 projects at the Berkman Klein Center. We also recommend checking back here through out the GSoC "shopping" and application period and ideas may be refined and updated. We hope that this will provide transparency into our thinking.<br />
<br />
===[[Media Cloud]]===<br />
[https://mediacloud.org/ Media Cloud] ([https://github.com/berkmancenter/mediacloud github]) is an open source platform for studying media ecosystems, run jointly by the Berkman Klein Center at Harvard University and the [https://www.media.mit.edu/groups/civic-media/overview/ Center for Civic Media] at the MIT Media Lab. <br />
<br />
===[[Lumen]]===<br />
The [https://lumendatabase.org/ Lumen Database] is an archive of requests for removal of online content. This lets lawyers, journalists, and the general public study threats to speech online and understand their rights. ([https://github.com/berkmancenter/lumendatabase github]).<br />
<br />
===[[Ayanda]]===<br />
Ayanda is an Open Source Android Library that makes it easy to discover nearby devices and share files through a simple API.<br />
<br />
===[[Question Tool]]===<br />
A tool for asking and voting on questions during events or classes ([https://github.com/berkmancenter/question_tool github]). Written in Javascript using the Meteor.js framework.<br />
<br />
===[[Dotplot]]===<br />
Dotplot is a D3-based visualization tool that lets you tell stories about data ([https://github.com/berkmancenter/dotplot github]).<br />
<br />
==What is GSoC?==<br />
It's where you spend your summer writing code for awesome open source projects: <br />
<blockquote>Google Summer of Code is a global program focused on introducing students to open source software development. Students work on a 3 month programming project with an open source organization during their break from university.<br />
<br />
Since its inception in 2005, the program has brought together almost 11,000 student participants and 10,000 mentors from over 113 countries worldwide. Google Summer of Code has produced over 50 million lines of code for 515 open source organizations.<br />
<br />
As a part of Google Summer of Code, student participants are paired with a mentor from the participating organizations, gaining exposure to real-world software development and techniques. Students have the opportunity to spend the break between their school semesters earning a stipend while working in areas related to their interests.<br />
<br />
In turn, the participating organizations are able to identify and bring in new developers who implement new features and hopefully continue to contribute to open source even after the program is over. Most importantly, more code is created and released for the use and benefit of all</blockquote><br />
<br />
The [https://summerofcode.withgoogle.com official GSoC homepage] describes how it works and what it involves.<br />
<br />
The [https://en.wikipedia.org/wiki/Google_Summer_of_Code GSoC Wikipedia entry] also includes some interesting background information.<br />
<br />
<br />
==How to Apply==<br />
Applications open March 25, 2019 at 13:00 (EDT) / 18:00 (UTC). You must submit your application via GSoC: https://summerofcode.withgoogle.com/get-started. We will not be able to accept or process any application in any other way. Please use the below application template when submitting your application.<br />
<br />
All proposals must be submitted by April 9, 2019 at 13:00 (EDT) / 18:00 (UTC).<br />
<br />
===Application Template===<br />
[[Application_Template|Berkman Klein Application template for GSoC.]] This is the preferred template for submitting your application to work on a Berkman Klein Center project.<br />
<br />
==Contact Us==<br />
We prefer email, though we also run an IRC channel.<br><br />
'''Email:''' [mailto:gsoc@cyber.harvard.edu gsoc@cyber.harvard.edu] <br><br />
<br />
==FAQ==<br />
[[GSoC_FAQ|Answers to commonly asked questions.]] This includes a set of requirements around working hours, who can apply, and other commitments you might have for the summer. Please read!</div>BerkmanSysophttps://cyber.harvard.edu/gsoc/?title=Main_Page&diff=635Main Page2019-03-01T19:20:55Z<p>BerkmanSysop: </p>
<hr />
<div><div class="width90">[[Image:GSoC2016Logo.jpg|class=responsive]]</div><br />
<br />
<br />
== Welcome to Berkman Klein Center Google Summer of Code 2019==<br />
<br />
===Who are we?===<br />
The [https://cyber.harvard.edu/ Berkman Klein Center for Internet & Society at Harvard University] was founded to explore cyberspace, share in its study, and help pioneer its development. We represent a network of faculty, students, fellows, entrepreneurs, lawyers, and virtual architects working to identify and engage with the challenges and opportunities of cyberspace.<br />
<br />
We [https://cyber.harvard.edu/projects-tools investigate] the real and possible boundaries in cyberspace between open and closed systems of code, of commerce, of governance, and of education, and the relationship of law to each. We do this through active rather than passive research, believing that the best way to understand cyberspace is to actually build out into it.<br />
<br />
Our [https://cyber.harvard.edu/people?field_role_value=Faculty+Associate&field_related_topics_target_id=All faculty], [https://cyber.harvard.edu/people?field_role_value=Fellow&field_related_topics_target_id=All fellows], students, and affiliates engage with a wide spectrum of Net issues, including governance, privacy, intellectual property, antitrust, content control, and electronic commerce. Our diverse research interests cohere in a common understanding of the Internet as a social and political space where constraints upon inhabitants are determined not only through the traditional application of law, but, more subtly, through technical architecture ("code").<br />
<br />
As part of our active research mission, we build, use, and freely share open software platforms for free [https://cyber.harvard.edu/events online lectures and discussions]. We also sponsor gatherings, ranging from informal lunches to international conferences, that bring together members of our diverse network of participants to swap insights – and sometimes barbs – as they stake out their respective visions for what the Net can become. We also [https://cyber.harvard.edu/education teach], seeking out online and global opportunities, as well as supporting the traditional Harvard Law School curriculum, through our [https://cyber.harvard.edu/teaching/cyberlawclinic Cyberlaw Clinic], and in conjunction with other Harvard schools and MIT.<br />
<br />
Read more about the Berkman Klein Center at [https://cyber.harvard.edu our homepage].<br />
<br />
<br />
==Project Opportunities==<br />
There are several GSoC 2019 projects at the Berkman Klein Center:<br />
<br />
===[[Media Cloud]]===<br />
[https://mediacloud.org/ Media Cloud] ([https://github.com/berkmancenter/mediacloud github]) is an open source platform for studying media ecosystems, run jointly by the Berkman Klein Center at Harvard University and the [https://www.media.mit.edu/groups/civic-media/overview/ Center for Civic Media] at the MIT Media Lab. <br />
<br />
===[[Lumen]]===<br />
The [https://lumendatabase.org/ Lumen Database] is an archive of requests for removal of online content. This lets lawyers, journalists, and the general public study threats to speech online and understand their rights. ([https://github.com/berkmancenter/lumendatabase github]).<br />
<br />
===[[Ayanda]]===<br />
Ayanda is an Open Source Android Library that makes it easy to discover nearby devices and share files through a simple API.<br />
<br />
===[[Question Tool]]===<br />
A tool for asking and voting on questions during events or classes ([https://github.com/berkmancenter/question_tool github]). Written in Javascript using the Meteor.js framework.<br />
<br />
===[[Dotplot]]===<br />
Dotplot is a D3-based visualization tool that lets you tell stories about data ([https://github.com/berkmancenter/dotplot github]).<br />
<br />
<br />
==What is GSoC?==<br />
It's where you spend your summer writing code for awesome open source projects: <br />
<blockquote>Google Summer of Code is a global program focused on introducing students to open source software development. Students work on a 3 month programming project with an open source organization during their break from university.<br />
<br />
Since its inception in 2005, the program has brought together almost 11,000 student participants and 10,000 mentors from over 113 countries worldwide. Google Summer of Code has produced over 50 million lines of code for 515 open source organizations.<br />
<br />
As a part of Google Summer of Code, student participants are paired with a mentor from the participating organizations, gaining exposure to real-world software development and techniques. Students have the opportunity to spend the break between their school semesters earning a stipend while working in areas related to their interests.<br />
<br />
In turn, the participating organizations are able to identify and bring in new developers who implement new features and hopefully continue to contribute to open source even after the program is over. Most importantly, more code is created and released for the use and benefit of all</blockquote><br />
<br />
The [https://summerofcode.withgoogle.com official GSoC homepage] describes how it works and what it involves.<br />
<br />
The [https://en.wikipedia.org/wiki/Google_Summer_of_Code GSoC Wikipedia entry] also includes some interesting background information.<br />
<br />
<br />
==How to Apply==<br />
Applications open March 25, 2019 at 13:00 (EDT) / 18:00 (UTC). You must submit your application via GSoC: https://summerofcode.withgoogle.com/get-started. We will not be able to accept or process any application in any other way. Please use the below application template when submitting your application.<br />
<br />
All proposals must be submitted by April 9, 2019 at 13:00 (EDT) / 18:00 (UTC).<br />
<br />
===Application Template===<br />
[[Application_Template|Berkman Klein Application template for GSoC.]] This is the preferred template for submitting your application to work on a Berkman Klein Center project.<br />
<br />
==Contact Us==<br />
We prefer email, though we also run an IRC channel.<br><br />
'''Email:''' [mailto:gsoc@cyber.harvard.edu gsoc@cyber.harvard.edu] <br><br />
<br />
==FAQ==<br />
[[GSoC_FAQ|Answers to commonly asked questions.]] This includes a set of requirements around working hours, who can apply, and other commitments you might have for the summer. Please read!</div>BerkmanSysophttps://cyber.harvard.edu/gsoc/?title=Main_Page&diff=634Main Page2019-03-01T19:20:21Z<p>BerkmanSysop: </p>
<hr />
<div><div class="width90">[[Image:GSoC2016Logo.jpg|class=responsive]]</div><br />
<br />
<br />
== Welcome to Berkman Klein Center Google Summer of Code 2019==<br />
<br />
==Who are we?==<br />
The [https://cyber.harvard.edu/ Berkman Klein Center for Internet & Society at Harvard University] was founded to explore cyberspace, share in its study, and help pioneer its development. We represent a network of faculty, students, fellows, entrepreneurs, lawyers, and virtual architects working to identify and engage with the challenges and opportunities of cyberspace.<br />
<br />
We [https://cyber.harvard.edu/projects-tools investigate] the real and possible boundaries in cyberspace between open and closed systems of code, of commerce, of governance, and of education, and the relationship of law to each. We do this through active rather than passive research, believing that the best way to understand cyberspace is to actually build out into it.<br />
<br />
Our [https://cyber.harvard.edu/people?field_role_value=Faculty+Associate&field_related_topics_target_id=All faculty], [https://cyber.harvard.edu/people?field_role_value=Fellow&field_related_topics_target_id=All fellows], students, and affiliates engage with a wide spectrum of Net issues, including governance, privacy, intellectual property, antitrust, content control, and electronic commerce. Our diverse research interests cohere in a common understanding of the Internet as a social and political space where constraints upon inhabitants are determined not only through the traditional application of law, but, more subtly, through technical architecture ("code").<br />
<br />
As part of our active research mission, we build, use, and freely share open software platforms for free [https://cyber.harvard.edu/events online lectures and discussions]. We also sponsor gatherings, ranging from informal lunches to international conferences, that bring together members of our diverse network of participants to swap insights – and sometimes barbs – as they stake out their respective visions for what the Net can become. We also [https://cyber.harvard.edu/education teach], seeking out online and global opportunities, as well as supporting the traditional Harvard Law School curriculum, through our [https://cyber.harvard.edu/teaching/cyberlawclinic Cyberlaw Clinic], and in conjunction with other Harvard schools and MIT.<br />
<br />
Read more about the Berkman Klein Center at [https://cyber.harvard.edu our homepage].<br />
<br />
<br />
==Project Opportunities==<br />
There are several GSoC 2019 projects at the Berkman Klein Center:<br />
<br />
===[[Media Cloud]]===<br />
[https://mediacloud.org/ Media Cloud] ([https://github.com/berkmancenter/mediacloud github]) is an open source platform for studying media ecosystems, run jointly by the Berkman Klein Center at Harvard University and the [https://www.media.mit.edu/groups/civic-media/overview/ Center for Civic Media] at the MIT Media Lab. <br />
<br />
===[[Lumen]]===<br />
The [https://lumendatabase.org/ Lumen Database] is an archive of requests for removal of online content. This lets lawyers, journalists, and the general public study threats to speech online and understand their rights. ([https://github.com/berkmancenter/lumendatabase github]).<br />
<br />
===[[Ayanda]]===<br />
Ayanda is an Open Source Android Library that makes it easy to discover nearby devices and share files through a simple API.<br />
<br />
===[[Question Tool]]===<br />
A tool for asking and voting on questions during events or classes ([https://github.com/berkmancenter/question_tool github]). Written in Javascript using the Meteor.js framework.<br />
<br />
===[[Dotplot]]===<br />
Dotplot is a D3-based visualization tool that lets you tell stories about data ([https://github.com/berkmancenter/dotplot github]).<br />
<br />
<br />
==What is GSoC?==<br />
It's where you spend your summer writing code for awesome open source projects: <br />
<blockquote>Google Summer of Code is a global program focused on introducing students to open source software development. Students work on a 3 month programming project with an open source organization during their break from university.<br />
<br />
Since its inception in 2005, the program has brought together almost 11,000 student participants and 10,000 mentors from over 113 countries worldwide. Google Summer of Code has produced over 50 million lines of code for 515 open source organizations.<br />
<br />
As a part of Google Summer of Code, student participants are paired with a mentor from the participating organizations, gaining exposure to real-world software development and techniques. Students have the opportunity to spend the break between their school semesters earning a stipend while working in areas related to their interests.<br />
<br />
In turn, the participating organizations are able to identify and bring in new developers who implement new features and hopefully continue to contribute to open source even after the program is over. Most importantly, more code is created and released for the use and benefit of all</blockquote><br />
<br />
The [https://summerofcode.withgoogle.com official GSoC homepage] describes how it works and what it involves.<br />
<br />
The [https://en.wikipedia.org/wiki/Google_Summer_of_Code GSoC Wikipedia entry] also includes some interesting background information.<br />
<br />
<br />
==How to Apply==<br />
Applications open March 25, 2019 at 13:00 (EDT) / 18:00 (UTC). You must submit your application via GSoC: https://summerofcode.withgoogle.com/get-started. We will not be able to accept or process any application in any other way. Please use the below application template when submitting your application.<br />
<br />
All proposals must be submitted by April 9, 2019 at 13:00 (EDT) / 18:00 (UTC).<br />
<br />
===Application Template===<br />
[[Application_Template|Berkman Klein Application template for GSoC.]] This is the preferred template for submitting your application to work on a Berkman Klein Center project.<br />
<br />
==Contact Us==<br />
We prefer email, though we also run an IRC channel.<br><br />
'''Email:''' [mailto:gsoc@cyber.harvard.edu gsoc@cyber.harvard.edu] <br><br />
<br />
==FAQ==<br />
[[GSoC_FAQ|Answers to commonly asked questions.]] This includes a set of requirements around working hours, who can apply, and other commitments you might have for the summer. Please read!</div>BerkmanSysophttps://cyber.harvard.edu/gsoc/?title=Main_Page&diff=633Main Page2019-03-01T19:19:37Z<p>BerkmanSysop: </p>
<hr />
<div><div class="width90">[[Image:GSoC2016Logo.jpg|class=responsive]]</div><br />
<br />
<br />
== Welcome to Berkman Klein Center Google Summer of Code 2019==<br />
<br />
==Project Opportunities==<br />
There are several GSoC 2019 projects at the Berkman Klein Center:<br />
<br />
===[[Media Cloud]]===<br />
[https://mediacloud.org/ Media Cloud] ([https://github.com/berkmancenter/mediacloud github]) is an open source platform for studying media ecosystems, run jointly by the Berkman Klein Center at Harvard University and the [https://www.media.mit.edu/groups/civic-media/overview/ Center for Civic Media] at the MIT Media Lab. <br />
<br />
===[[Lumen]]===<br />
The [https://lumendatabase.org/ Lumen Database] is an archive of requests for removal of online content. This lets lawyers, journalists, and the general public study threats to speech online and understand their rights. ([https://github.com/berkmancenter/lumendatabase github]).<br />
<br />
===[[Ayanda]]===<br />
Ayanda is an Open Source Android Library that makes it easy to discover nearby devices and share files through a simple API.<br />
<br />
===[[Question Tool]]===<br />
A tool for asking and voting on questions during events or classes ([https://github.com/berkmancenter/question_tool github]). Written in Javascript using the Meteor.js framework.<br />
<br />
===[[Dotplot]]===<br />
Dotplot is a D3-based visualization tool that lets you tell stories about data ([https://github.com/berkmancenter/dotplot github]).<br />
<br />
<br />
==What is GSoC?==<br />
It's where you spend your summer writing code for awesome open source projects: <br />
<blockquote>Google Summer of Code is a global program focused on introducing students to open source software development. Students work on a 3 month programming project with an open source organization during their break from university.<br />
<br />
Since its inception in 2005, the program has brought together almost 11,000 student participants and 10,000 mentors from over 113 countries worldwide. Google Summer of Code has produced over 50 million lines of code for 515 open source organizations.<br />
<br />
As a part of Google Summer of Code, student participants are paired with a mentor from the participating organizations, gaining exposure to real-world software development and techniques. Students have the opportunity to spend the break between their school semesters earning a stipend while working in areas related to their interests.<br />
<br />
In turn, the participating organizations are able to identify and bring in new developers who implement new features and hopefully continue to contribute to open source even after the program is over. Most importantly, more code is created and released for the use and benefit of all</blockquote><br />
<br />
The [https://summerofcode.withgoogle.com official GSoC homepage] describes how it works and what it involves.<br />
<br />
The [https://en.wikipedia.org/wiki/Google_Summer_of_Code GSoC Wikipedia entry] also includes some interesting background information.<br />
<br />
==Who are we?==<br />
The [https://cyber.harvard.edu/ Berkman Klein Center for Internet & Society at Harvard University] was founded to explore cyberspace, share in its study, and help pioneer its development. We represent a network of faculty, students, fellows, entrepreneurs, lawyers, and virtual architects working to identify and engage with the challenges and opportunities of cyberspace.<br />
<br />
We [https://cyber.harvard.edu/projects-tools investigate] the real and possible boundaries in cyberspace between open and closed systems of code, of commerce, of governance, and of education, and the relationship of law to each. We do this through active rather than passive research, believing that the best way to understand cyberspace is to actually build out into it.<br />
<br />
Our [https://cyber.harvard.edu/people?field_role_value=Faculty+Associate&field_related_topics_target_id=All faculty], [https://cyber.harvard.edu/people?field_role_value=Fellow&field_related_topics_target_id=All fellows], students, and affiliates engage with a wide spectrum of Net issues, including governance, privacy, intellectual property, antitrust, content control, and electronic commerce. Our diverse research interests cohere in a common understanding of the Internet as a social and political space where constraints upon inhabitants are determined not only through the traditional application of law, but, more subtly, through technical architecture ("code").<br />
<br />
As part of our active research mission, we build, use, and freely share open software platforms for free [https://cyber.harvard.edu/events online lectures and discussions]. We also sponsor gatherings, ranging from informal lunches to international conferences, that bring together members of our diverse network of participants to swap insights – and sometimes barbs – as they stake out their respective visions for what the Net can become. We also [https://cyber.harvard.edu/education teach], seeking out online and global opportunities, as well as supporting the traditional Harvard Law School curriculum, through our [https://cyber.harvard.edu/teaching/cyberlawclinic Cyberlaw Clinic], and in conjunction with other Harvard schools and MIT.<br />
<br />
Read more about the Berkman Klein Center at [https://cyber.harvard.edu our homepage].<br />
<br />
==How to Apply==<br />
Applications open March 25, 2019 at 13:00 (EDT) / 18:00 (UTC). You must submit your application via GSoC: https://summerofcode.withgoogle.com/get-started. We will not be able to accept or process any application in any other way. Please use the below application template when submitting your application.<br />
<br />
All proposals must be submitted by April 9, 2019 at 13:00 (EDT) / 18:00 (UTC).<br />
<br />
===Application Template===<br />
[[Application_Template|Berkman Klein Application template for GSoC.]] This is the preferred template for submitting your application to work on a Berkman Klein Center project.<br />
<br />
==Contact Us==<br />
We prefer email, though we also run an IRC channel.<br><br />
'''Email:''' [mailto:gsoc@cyber.harvard.edu gsoc@cyber.harvard.edu] <br><br />
<br />
==FAQ==<br />
[[GSoC_FAQ|Answers to commonly asked questions.]] This includes a set of requirements around working hours, who can apply, and other commitments you might have for the summer. Please read!</div>BerkmanSysophttps://cyber.harvard.edu/gsoc/?title=Main_Page&diff=632Main Page2019-03-01T19:15:51Z<p>BerkmanSysop: </p>
<hr />
<div><div class="width90">[[Image:GSoC2016Logo.jpg|class=responsive]]</div><br />
<br />
<br />
== Welcome to Berkman Klein Center Google Summer of Code 2019==<br />
===What is GSoC?===<br />
It's where you spend your summer writing code for awesome open source projects: <br />
<blockquote>Google Summer of Code is a global program focused on introducing students to open source software development. Students work on a 3 month programming project with an open source organization during their break from university.<br />
<br />
Since its inception in 2005, the program has brought together almost 11,000 student participants and 10,000 mentors from over 113 countries worldwide. Google Summer of Code has produced over 50 million lines of code for 515 open source organizations.<br />
<br />
As a part of Google Summer of Code, student participants are paired with a mentor from the participating organizations, gaining exposure to real-world software development and techniques. Students have the opportunity to spend the break between their school semesters earning a stipend while working in areas related to their interests.<br />
<br />
In turn, the participating organizations are able to identify and bring in new developers who implement new features and hopefully continue to contribute to open source even after the program is over. Most importantly, more code is created and released for the use and benefit of all</blockquote><br />
<br />
The [https://summerofcode.withgoogle.com official GSoC homepage] describes how it works and what it involves.<br />
<br />
The [https://en.wikipedia.org/wiki/Google_Summer_of_Code GSoC Wikipedia entry] also includes some interesting background information.<br />
<br />
===Who are we?===<br />
The [https://cyber.harvard.edu/ Berkman Klein Center for Internet & Society at Harvard University] was founded to explore cyberspace, share in its study, and help pioneer its development. We represent a network of faculty, students, fellows, entrepreneurs, lawyers, and virtual architects working to identify and engage with the challenges and opportunities of cyberspace.<br />
<br />
We [https://cyber.harvard.edu/projects-tools investigate] the real and possible boundaries in cyberspace between open and closed systems of code, of commerce, of governance, and of education, and the relationship of law to each. We do this through active rather than passive research, believing that the best way to understand cyberspace is to actually build out into it.<br />
<br />
Our [https://cyber.harvard.edu/people?field_role_value=Faculty+Associate&field_related_topics_target_id=All faculty], [https://cyber.harvard.edu/people?field_role_value=Fellow&field_related_topics_target_id=All fellows], students, and affiliates engage with a wide spectrum of Net issues, including governance, privacy, intellectual property, antitrust, content control, and electronic commerce. Our diverse research interests cohere in a common understanding of the Internet as a social and political space where constraints upon inhabitants are determined not only through the traditional application of law, but, more subtly, through technical architecture ("code").<br />
<br />
As part of our active research mission, we build, use, and freely share open software platforms for free [https://cyber.harvard.edu/events online lectures and discussions]. We also sponsor gatherings, ranging from informal lunches to international conferences, that bring together members of our diverse network of participants to swap insights – and sometimes barbs – as they stake out their respective visions for what the Net can become. We also [https://cyber.harvard.edu/education teach], seeking out online and global opportunities, as well as supporting the traditional Harvard Law School curriculum, through our [https://cyber.harvard.edu/teaching/cyberlawclinic Cyberlaw Clinic], and in conjunction with other Harvard schools and MIT.<br />
<br />
Read more about the Berkman Klein Center at [https://cyber.harvard.edu our homepage].<br />
<br />
==Project Opportunities==<br />
There are several GSoC 2019 projects at the Berkman Klein Center:<br />
<br />
===[[Media Cloud]]===<br />
[https://mediacloud.org/ Media Cloud] ([https://github.com/berkmancenter/mediacloud github]) is an open source platform for studying media ecosystems, run jointly by the Berkman Klein Center at Harvard University and the [https://www.media.mit.edu/groups/civic-media/overview/ Center for Civic Media] at the MIT Media Lab. <br />
<br />
===[[Lumen]]===<br />
The [https://lumendatabase.org/ Lumen Database] is an archive of requests for removal of online content. This lets lawyers, journalists, and the general public study threats to speech online and understand their rights. ([https://github.com/berkmancenter/lumendatabase github]).<br />
<br />
===[[Ayanda]]===<br />
Ayanda is an Open Source Android Library that makes it easy to discover nearby devices and share files through a simple API.<br />
<br />
===[[Question Tool]]===<br />
A tool for asking and voting on questions during events or classes ([https://github.com/berkmancenter/question_tool github]). Written in Javascript using the Meteor.js framework.<br />
<br />
===[[Dotplot]]===<br />
Dotplot is a D3-based visualization tool that lets you tell stories about data ([https://github.com/berkmancenter/dotplot github]).<br />
<br />
==How to Apply==<br />
Applications open March 25, 2019 at 13:00 (EDT) / 18:00 (UTC). You must submit your application via GSoC: https://summerofcode.withgoogle.com/get-started. We will not be able to accept or process any application in any other way. Please use the below application template when submitting your application.<br />
<br />
All proposals must be submitted by April 9, 2019 at 13:00 (EDT) / 18:00 (UTC).<br />
<br />
===Application Template===<br />
[[Application_Template|Berkman Klein Application template for GSoC.]] This is the preferred template for submitting your application to work on a Berkman Klein Center project.<br />
<br />
===Contact Us===<br />
We prefer email, though we also run an IRC channel.<br><br />
'''Email:''' [mailto:gsoc@cyber.harvard.edu gsoc@cyber.harvard.edu] <br><br />
<br />
==FAQ==<br />
[[GSoC_FAQ|Answers to commonly asked questions.]] This includes a set of requirements around working hours, who can apply, and other commitments you might have for the summer. Please read!</div>BerkmanSysophttps://cyber.harvard.edu/gsoc/?title=Question_Tool&diff=631Question Tool2019-03-01T18:01:08Z<p>BerkmanSysop: /* Project Goals */</p>
<hr />
<div>===Project Description===<br />
Question Tool is a discussion tool for groups including classrooms and organizations holding meetings rooted in the [https://en.wikipedia.org/wiki/Socratic_method Socratic method ]. Users are able to make posts that can be voted on or discussed by participants either with attribution or anonymously. <br />
<br />
'''GitHub Repo:''' [https://github.com/berkmancenter/question_tool https://github.com/berkmancenter/question_tool]<br />
<br />
'''Demo instance:''' [https://questions.dev.berkmancenter.org/ https://questions.dev.berkmancenter.org/]<br />
<br />
===Ideal Candidate===<br />
Question Tools is written in Javascript using the Meteor.js framework. In addition to javascript the ideal candidate will have extensive experience with front-end web development technologies like CSS3, HTML5, and CSS animations. An ideal candidate will also have design experience and be familiar with mobile responsive design UX and UI best practices.<br />
<br />
==Project Goals==<br />
Here are some ideas, suggestions, and thoughts for directions you might take working on the Question Tool for a summer. We recognize that there are many other way to make question tool better and would love to hear your ideas as well. Please note that a project proposal does not need to take on all of the ideas below but rather should rather be well thought out approaches to improving the platform and realistic about your summer goals.<br />
<br />
===General use UI/UX Improvements===<br />
* Improve on the UX of the instance creation process to better articulate the constraints on an instance and what settings are best for different types of conversations.<br />
* Allow features to be editable by an instance owner after the creation process.<br />
* Improvements to the management process for admin, owner, and mod roles.<br />
** Streamlining account creation for moderators<br />
** Implantation of a password reset functionality for Admin and Mods<br />
* Continued improvements to mobile UI/UX<br />
* Implementation of an archive functionality an Admin can take advantage of.<br />
** How can an Admin download an archive version of a Question Tool instance?<br />
** What would an archive format look like? <br />
** How can the instance be represented without having to re-instantiate that instance? <br />
** Please keep in mind that the Question Tool is NOT intended as a long term hosting platform and we would rather avoid having to "re-upload" that archive to view it.<br />
<br />
===Improvements to Question Tool's classroom presentation UI/UX===<br />
One of Question Tool's primary uses is in the classroom. Question Tool already has a "full screen" mode for use when projected at the front of a classroom or lecture hall. We would like to improve on and refine this functionality. We see several ways that this can be improved and welcome other ideas you may have related to this use case. <br />
<br />
*Implementation of QR codes for easy access to Question Tool instances URLs from a mobile device.<br />
**How would this process look? <br />
**When and how would best be the time and place for a QR code to be displayed? <br />
**Is it something that can always be represented for an instance in a meaningful way?<br />
*Remote control of a projected instance from another device.<br />
**Allow an Admin or possibly a Mod to control and highlight questions that are projected.<br />
**How can the real time aspects of meteor.js be leveraged for this?<br />
**What does this process look like from an Admin's perspective?<br />
**How does this functionality work with multiple Admins and Mods? Can it?</div>BerkmanSysophttps://cyber.harvard.edu/gsoc/?title=Question_Tool&diff=630Question Tool2019-03-01T18:00:04Z<p>BerkmanSysop: /* Improvements to Question Tool's classroom presentation UI/UX */</p>
<hr />
<div>===Project Description===<br />
Question Tool is a discussion tool for groups including classrooms and organizations holding meetings rooted in the [https://en.wikipedia.org/wiki/Socratic_method Socratic method ]. Users are able to make posts that can be voted on or discussed by participants either with attribution or anonymously. <br />
<br />
'''GitHub Repo:''' [https://github.com/berkmancenter/question_tool https://github.com/berkmancenter/question_tool]<br />
<br />
'''Demo instance:''' [https://questions.dev.berkmancenter.org/ https://questions.dev.berkmancenter.org/]<br />
<br />
===Ideal Candidate===<br />
Question Tools is written in Javascript using the Meteor.js framework. In addition to javascript the ideal candidate will have extensive experience with front-end web development technologies like CSS3, HTML5, and CSS animations. An ideal candidate will also have design experience and be familiar with mobile responsive design UX and UI best practices.<br />
<br />
==Project Goals==<br />
Here are some ideas, suggestions, and thoughts for directions you might take working on the Question Tool for a summer. We recognize that there are many other way to make question tool better and would love to here your ideas as well. Please note that a project proposal does not need to take on all of the ideas below but rather should rather be well thought out approaches to improving the platform and realistic about your summer goals.<br />
<br />
===General use UI/UX Improvements===<br />
* Improve on the UX of the instance creation process to better articulate the constraints on an instance and what settings are best for different types of conversations.<br />
* Allow features to be editable by an instance owner after the creation process.<br />
* Improvements to the management process for admin, owner, and mod roles.<br />
** Streamlining account creation for moderators<br />
** Implantation of a password reset functionality for Admin and Mods<br />
* Continued improvements to mobile UI/UX<br />
* Implementation of an archive functionality an Admin can take advantage of.<br />
** How can an Admin download an archive version of a Question Tool instance?<br />
** What would an archive format look like? <br />
** How can the instance be represented without having to re-instantiate that instance? <br />
** Please keep in mind that the Question Tool is NOT intended as a long term hosting platform and we would rather avoid having to "re-upload" that archive to view it.<br />
<br />
===Improvements to Question Tool's classroom presentation UI/UX===<br />
One of Question Tool's primary uses is in the classroom. Question Tool already has a "full screen" mode for use when projected at the front of a classroom or lecture hall. We would like to improve on and refine this functionality. We see several ways that this can be improved and welcome other ideas you may have related to this use case. <br />
<br />
*Implementation of QR codes for easy access to Question Tool instances URLs from a mobile device.<br />
**How would this process look? <br />
**When and how would best be the time and place for a QR code to be displayed? <br />
**Is it something that can always be represented for an instance in a meaningful way?<br />
*Remote control of a projected instance from another device.<br />
**Allow an Admin or possibly a Mod to control and highlight questions that are projected.<br />
**How can the real time aspects of meteor.js be leveraged for this?<br />
**What does this process look like from an Admin's perspective?<br />
**How does this functionality work with multiple Admins and Mods? Can it?</div>BerkmanSysophttps://cyber.harvard.edu/gsoc/?title=Question_Tool&diff=629Question Tool2019-03-01T17:58:29Z<p>BerkmanSysop: /* General use UI/UX Improvements */</p>
<hr />
<div>===Project Description===<br />
Question Tool is a discussion tool for groups including classrooms and organizations holding meetings rooted in the [https://en.wikipedia.org/wiki/Socratic_method Socratic method ]. Users are able to make posts that can be voted on or discussed by participants either with attribution or anonymously. <br />
<br />
'''GitHub Repo:''' [https://github.com/berkmancenter/question_tool https://github.com/berkmancenter/question_tool]<br />
<br />
'''Demo instance:''' [https://questions.dev.berkmancenter.org/ https://questions.dev.berkmancenter.org/]<br />
<br />
===Ideal Candidate===<br />
Question Tools is written in Javascript using the Meteor.js framework. In addition to javascript the ideal candidate will have extensive experience with front-end web development technologies like CSS3, HTML5, and CSS animations. An ideal candidate will also have design experience and be familiar with mobile responsive design UX and UI best practices.<br />
<br />
==Project Goals==<br />
Here are some ideas, suggestions, and thoughts for directions you might take working on the Question Tool for a summer. We recognize that there are many other way to make question tool better and would love to here your ideas as well. Please note that a project proposal does not need to take on all of the ideas below but rather should rather be well thought out approaches to improving the platform and realistic about your summer goals.<br />
<br />
===General use UI/UX Improvements===<br />
* Improve on the UX of the instance creation process to better articulate the constraints on an instance and what settings are best for different types of conversations.<br />
* Allow features to be editable by an instance owner after the creation process.<br />
* Improvements to the management process for admin, owner, and mod roles.<br />
** Streamlining account creation for moderators<br />
** Implantation of a password reset functionality for Admin and Mods<br />
* Continued improvements to mobile UI/UX<br />
* Implementation of an archive functionality an Admin can take advantage of.<br />
** How can an Admin download an archive version of a Question Tool instance?<br />
** What would an archive format look like? <br />
** How can the instance be represented without having to re-instantiate that instance? <br />
** Please keep in mind that the Question Tool is NOT intended as a long term hosting platform and we would rather avoid having to "re-upload" that archive to view it.<br />
<br />
===Improvements to Question Tool's classroom presentation UI/UX===<br />
One of Question Tools primary uses is in the classroom. Question Tool already has a "full screen" mode for use when projected at the front of a classroom or lecture hall. We would like to improve on and refine this functionality. We see several ways that this can be improved and welcome other ideas you may have related to this use case. <br />
<br />
*Implementation of QR codes for easy access to Question Tool instances URLs from a mobile device.<br />
**How would this process look? <br />
**When and how would best be the time and place for a QR code to be displayed? <br />
**Is it something that can always be represented for an instance in a meaningful way?<br />
*Remote control of a projected instance from another device.<br />
**Allow an Admin or possibly a Mod to control and highlight questions that are projected.<br />
**How can the real time aspects of meteor.js be leveraged for this?<br />
**What does this process look like from an Admin's perspective?<br />
**How does this functionality work with multiple Admins and Mods? Can it?</div>BerkmanSysophttps://cyber.harvard.edu/gsoc/?title=Question_Tool&diff=628Question Tool2019-03-01T17:57:18Z<p>BerkmanSysop: /* Improvements to Question Tool's classroom presentation UI/UX */</p>
<hr />
<div>===Project Description===<br />
Question Tool is a discussion tool for groups including classrooms and organizations holding meetings rooted in the [https://en.wikipedia.org/wiki/Socratic_method Socratic method ]. Users are able to make posts that can be voted on or discussed by participants either with attribution or anonymously. <br />
<br />
'''GitHub Repo:''' [https://github.com/berkmancenter/question_tool https://github.com/berkmancenter/question_tool]<br />
<br />
'''Demo instance:''' [https://questions.dev.berkmancenter.org/ https://questions.dev.berkmancenter.org/]<br />
<br />
===Ideal Candidate===<br />
Question Tools is written in Javascript using the Meteor.js framework. In addition to javascript the ideal candidate will have extensive experience with front-end web development technologies like CSS3, HTML5, and CSS animations. An ideal candidate will also have design experience and be familiar with mobile responsive design UX and UI best practices.<br />
<br />
==Project Goals==<br />
Here are some ideas, suggestions, and thoughts for directions you might take working on the Question Tool for a summer. We recognize that there are many other way to make question tool better and would love to here your ideas as well. Please note that a project proposal does not need to take on all of the ideas below but rather should rather be well thought out approaches to improving the platform and realistic about your summer goals.<br />
<br />
===General use UI/UX Improvements===<br />
* Improve on the UX of the instance creation process to better articulate the constraints on an instance and what settings are best for different types of conversations.<br />
* Allow features to be editable by an instance owner after the creation process.<br />
* Improvements to the management process for admin, owner, and mod roles.<br />
** Streamlining account creation for moderators<br />
** Implantation of a password reset functionality for Admin and Mods<br />
* Continued improvements to mobile UI/UX<br />
* Implementation of an archive functionality an Admin can take advantage of.<br />
** How can an Admin download an archive version of a Question Tool instance?<br />
** What would an archive format look like? How can the instance be represented without having to re-instantiate that instance? Please keep in mind that the Question Tool is NOT intended as a long term hosting platform and we would rather avoid having to "re-upload" that archive to view it.<br />
<br />
===Improvements to Question Tool's classroom presentation UI/UX===<br />
One of Question Tools primary uses is in the classroom. Question Tool already has a "full screen" mode for use when projected at the front of a classroom or lecture hall. We would like to improve on and refine this functionality. We see several ways that this can be improved and welcome other ideas you may have related to this use case. <br />
<br />
*Implementation of QR codes for easy access to Question Tool instances URLs from a mobile device.<br />
**How would this process look? <br />
**When and how would best be the time and place for a QR code to be displayed? <br />
**Is it something that can always be represented for an instance in a meaningful way?<br />
*Remote control of a projected instance from another device.<br />
**Allow an Admin or possibly a Mod to control and highlight questions that are projected.<br />
**How can the real time aspects of meteor.js be leveraged for this?<br />
**What does this process look like from an Admin's perspective?<br />
**How does this functionality work with multiple Admins and Mods? Can it?</div>BerkmanSysophttps://cyber.harvard.edu/gsoc/?title=Question_Tool&diff=627Question Tool2019-03-01T17:56:14Z<p>BerkmanSysop: /* Project Goals */</p>
<hr />
<div>===Project Description===<br />
Question Tool is a discussion tool for groups including classrooms and organizations holding meetings rooted in the [https://en.wikipedia.org/wiki/Socratic_method Socratic method ]. Users are able to make posts that can be voted on or discussed by participants either with attribution or anonymously. <br />
<br />
'''GitHub Repo:''' [https://github.com/berkmancenter/question_tool https://github.com/berkmancenter/question_tool]<br />
<br />
'''Demo instance:''' [https://questions.dev.berkmancenter.org/ https://questions.dev.berkmancenter.org/]<br />
<br />
===Ideal Candidate===<br />
Question Tools is written in Javascript using the Meteor.js framework. In addition to javascript the ideal candidate will have extensive experience with front-end web development technologies like CSS3, HTML5, and CSS animations. An ideal candidate will also have design experience and be familiar with mobile responsive design UX and UI best practices.<br />
<br />
==Project Goals==<br />
Here are some ideas, suggestions, and thoughts for directions you might take working on the Question Tool for a summer. We recognize that there are many other way to make question tool better and would love to here your ideas as well. Please note that a project proposal does not need to take on all of the ideas below but rather should rather be well thought out approaches to improving the platform and realistic about your summer goals.<br />
<br />
===General use UI/UX Improvements===<br />
* Improve on the UX of the instance creation process to better articulate the constraints on an instance and what settings are best for different types of conversations.<br />
* Allow features to be editable by an instance owner after the creation process.<br />
* Improvements to the management process for admin, owner, and mod roles.<br />
** Streamlining account creation for moderators<br />
** Implantation of a password reset functionality for Admin and Mods<br />
* Continued improvements to mobile UI/UX<br />
* Implementation of an archive functionality an Admin can take advantage of.<br />
** How can an Admin download an archive version of a Question Tool instance?<br />
** What would an archive format look like? How can the instance be represented without having to re-instantiate that instance? Please keep in mind that the Question Tool is NOT intended as a long term hosting platform and we would rather avoid having to "re-upload" that archive to view it.<br />
<br />
===Improvements to Question Tool's classroom presentation UI/UX===<br />
One of Question Tools primary uses is in the classroom. Question Tool already has a "full screen" mode for use when projected at the front of a classroom or lecture hall. We would like to improve on and refine this functionality. We see several ways that this can be improved and welcome other ideas you may have related to this use case. <br />
<br />
*Implementation of QR codes for easy access to Question Tool instances URLs from a mobile device.<br />
**How would this process look? <br />
**When and how would best be the time and place for a QR code to be displayed? Is it something that can always be represented for an instance in a meaningful way?<br />
*Remote control of a projected instance from another device.<br />
**Allow an Admin or possibly a Mod to control and highlight questions that are projected.<br />
**How can the real time aspects of meteor.js be leveraged for this?<br />
**What does this process look like from an Admin's perspective?<br />
**How does this functionality work with multiple Admins and Mods? Can it?</div>BerkmanSysophttps://cyber.harvard.edu/gsoc/?title=Dotplot&diff=626Dotplot2019-02-28T16:13:59Z<p>BerkmanSysop: </p>
<hr />
<div>Technical mentor: [mailto:jclark@cyber.harvard.edu jclark@cyber.harvard.edu]<br />
<br />
Dotplot is a [https://i.imgur.com/hePKkim.gifv visualization] that allows one to tell a story about data. Our primary use case is mapping a diverse network of people in a playful and interactive way. All too often, the only way to get a sense for the attendees of an event or the members of a distributed network is to do a significant amount of searching for resumes and background information. This research takes excessive time and effort, and it makes what should be an exciting and fun--meeting new people and finding potential collaborators--into a chore, something to be avoided or put off.<br />
<br />
This is where dotplot intervenes, bringing a sense of joy and discovery back into the process and helping to facilitate even more discoveries of commonalities and uniqueness among group members. This tool takes a spreadsheet of information about individuals gathered through survey questions and generates a dynamic visualization of the responses.<br />
<br />
Examples of dotplot in action: http://brk.mn/dcviz, http://brk.mn/casviz<br />
<br />
GitHub repo: https://github.com/berkmancenter/dotplot <br />
<br />
===Ideal candidate criteria===<br />
dotplot is interested in developers to improve both typical user experience and creative uses of the tool. Currently, dotplot requires specific backend programming to input spreadsheet information, but we would like it to be a tool available for conference organizers and event hosts to use without difficulty or extensive programming experience. The ideal candidate has experience with JavaScript, D3.js, HTML/CSS/SVG, and a server-side programming language of the student’s choice (Javascript, Ruby, PHP, Python).<br />
<br />
Suggested sub-projects include:<br />
*Design and build a system for collecting structured data from users<br />
*Connect the collected data to the existing JS visualization<br />
*Expand and improve upon the existing JS visualization<br />
*Bring creative and thoughtful ideas about potential enhancements<br />
*Ingest data from Google Sheets<br />
<br />
Other potential sub-projects include:<br />
*Easy to use, interactive interface for data input and manipulation<br />
*Offer users choices for how the data is visualized by survey and by question:<br />
**Display locations on a map<br />
**Track a specific respondent through multiple questions (in progress)<br />
**See the relationship between multiple questions using color, space, etc.<br />
**Zoom in and out on specific clusters<br />
**Hover to reveal more information about respondents or clusters (in progress)</div>BerkmanSysophttps://cyber.harvard.edu/gsoc/?title=Dotplot&diff=625Dotplot2019-02-28T15:51:12Z<p>BerkmanSysop: /* Ideal candidate criteria */</p>
<hr />
<div>Technical mentor: [mailto:jclark@cyber.harvard.edu jclark@cyber.harvard.edu]<br />
<br />
dotplot is a [https://i.imgur.com/hePKkim.gifv visualization] that allows one to tell a story about data. Our primary use case is mapping a diverse network of people in a playful and interactive way. All too often, the only way to get a sense for the attendees of an event or the members of a distributed network is to do a significant amount of searching for resumes and background information. This research takes excessive time and effort, and it makes what should be an exciting and fun--meeting new people and finding potential collaborators--into a chore, something to be avoided or put off.<br />
<br />
This is where dotplot intervenes, bringing a sense of joy and discovery back into the process and helping to facilitate even more discoveries of commonalities and uniqueness among group members. This tool takes a spreadsheet of information about individuals gathered through survey questions and generates a dynamic visualization of the responses.<br />
<br />
Examples of dotplot in action: http://brk.mn/dcviz, http://brk.mn/casviz<br />
<br />
GitHub repo: https://github.com/berkmancenter/dotplot <br />
<br />
===Ideal candidate criteria===<br />
dotplot is interested in developers to improve both typical user experience and creative uses of the tool. Currently, dotplot requires specific backend programming to input spreadsheet information, but we would like it to be a tool available for conference organizers and event hosts to use without difficulty or extensive programming experience. The ideal candidate has experience with JavaScript, D3.js, HTML/CSS/SVG, and a server-side programming language of the student’s choice (Javascript, Ruby, PHP, Python).<br />
<br />
Suggested sub-projects include:<br />
*Design and build a system for collecting structured data from users<br />
*Connect the collected data to the existing JS visualization<br />
*Expand and improve upon the existing JS visualization<br />
*Bring creative and thoughtful ideas about potential enhancements<br />
*Ingest data from Google Sheets<br />
<br />
Other potential sub-projects include:<br />
*Easy to use, interactive interface for data input and manipulation<br />
*Offer users choices for how the data is visualized by survey and by question:<br />
**Display locations on a map<br />
**Track a specific respondent through multiple questions (in progress)<br />
**See the relationship between multiple questions using color, space, etc.<br />
**Zoom in and out on specific clusters<br />
**Hover to reveal more information about respondents or clusters (in progress)</div>BerkmanSysophttps://cyber.harvard.edu/gsoc/?title=Dotplot&diff=624Dotplot2019-02-28T15:49:48Z<p>BerkmanSysop: </p>
<hr />
<div>Technical mentor: [mailto:jclark@cyber.harvard.edu jclark@cyber.harvard.edu]<br />
<br />
dotplot is a [https://i.imgur.com/hePKkim.gifv visualization] that allows one to tell a story about data. Our primary use case is mapping a diverse network of people in a playful and interactive way. All too often, the only way to get a sense for the attendees of an event or the members of a distributed network is to do a significant amount of searching for resumes and background information. This research takes excessive time and effort, and it makes what should be an exciting and fun--meeting new people and finding potential collaborators--into a chore, something to be avoided or put off.<br />
<br />
This is where dotplot intervenes, bringing a sense of joy and discovery back into the process and helping to facilitate even more discoveries of commonalities and uniqueness among group members. This tool takes a spreadsheet of information about individuals gathered through survey questions and generates a dynamic visualization of the responses.<br />
<br />
Examples of dotplot in action: http://brk.mn/dcviz, http://brk.mn/casviz<br />
<br />
GitHub repo: https://github.com/berkmancenter/dotplot <br />
<br />
===Ideal candidate criteria===<br />
dotplot is interested in developers to improve both typical user experience and creative uses of the tool. Currently, dotplot requires specific backend programming to input spreadsheet information, but we would like it to be a tool available for conference organizers and event hosts to use without difficulty or extensive programming experience. The ideal candidate has experience with JavaScript, D3.js, HTML/CSS/SVG, and a server-side programming language of the student’s choice (Javascript, Ruby, PHP, Python).<br />
<br />
Suggested sub-projects include:<br />
*Design and build a system for collecting structured data from users<br />
*Connect the collected data to the existing JS visualization<br />
*Expand and improve upon the existing JS visualization<br />
*Bring creative and thoughtful ideas about potential enhancements<br />
*Include "Ingest data from Google Sheets" as a suggested sub-project<br />
<br />
Other potential sub-projects include:<br />
*Easy to use, interactive interface for data input and manipulation<br />
*Offer users choices for how the data is visualized by survey and by question:<br />
**Display locations on a map<br />
**Track a specific respondent through multiple questions (in progress)<br />
**See the relationship between multiple questions using color, space, etc.<br />
**Zoom in and out on specific clusters<br />
**Hover to reveal more information about respondents or clusters (in progress)</div>BerkmanSysophttps://cyber.harvard.edu/gsoc/?title=Dotplot&diff=623Dotplot2019-02-28T15:49:10Z<p>BerkmanSysop: </p>
<hr />
<div>Technical mentor: [mailto:jclark@cyber.harvard.edu jclark@cyber.harvard.edu]<br />
<br />
dotplot is a [https://i.imgur.com/hePKkim.gifv visualization] that allows one to tell a story about data. Our primary use case is mapping a diverse network of people in a playful and interactive way. All too often, the only way to get a sense for the attendees of an event or the members of a distributed network is to do a significant amount of searching for resumes and background information. This research takes excessive time and effort, and it makes what should be an exciting and fun--meeting new people and finding potential collaborators--into a chore, something to be avoided or put off.<br />
<br />
This is where dotplot intervenes, bringing a sense of joy and discovery back into the process and helping to facilitate even more discoveries of commonalities and uniqueness among group members. This tool takes a spreadsheet of information about individuals gathered through survey questions and generates a dynamic visualization of the responses.<br />
<br />
Examples of dotplot in action: http://brk.mn/dcviz, http://brk.mn/casviz<br />
<br />
GitHub repo: https://github.com/berkmancenter/dotplot <br />
<br />
===Ideal candidate criteria===<br />
dotplot is interested in developers to improve both typical user experience and creative uses of the tool. Currently, dotplot requires specific backend programming to input spreadsheet information, but we would like it to be a tool available for conference organizers and event hosts to use without difficulty or extensive programming experience. The ideal candidate has experience with JavaScript, D3.js, HTML/CSS/SVG, and a server-side programming language of the student’s choice (Javascript, Ruby, PHP, Python).<br />
<br />
Suggested sub-projects include:<br />
*Design and build a system for collecting structured data from users<br />
*Connect the collected data to the existing JS visualization<br />
*Expand and improve upon the existing JS visualization<br />
*Bring creative and thoughtful ideas about potential enhancements<br />
<br />
Other potential sub-projects include:<br />
*Easy to use, interactive interface for data input and manipulation<br />
*Offer users choices for how the data is visualized by survey and by question:<br />
**Display locations on a map<br />
**Track a specific respondent through multiple questions (in progress)<br />
**See the relationship between multiple questions using color, space, etc.<br />
**Zoom in and out on specific clusters<br />
**Hover to reveal more information about respondents or clusters (in progress)</div>BerkmanSysophttps://cyber.harvard.edu/gsoc/?title=MediaCloud&diff=622MediaCloud2019-02-27T18:18:19Z<p>BerkmanSysop: </p>
<hr />
<div>Media Cloud is an open source platform for studying media ecosystems.<br />
<br />
By tracking hundreds of millions of stories published online or broadcast via television, our suite of tools allows researchers to track how stories and ideas spread through media, and how different corners of the media ecosystem report on stories.<br />
<br />
Our platform is designed to aggregate, analyze, deliver and visualize information, answering complex quantitative and qualitative questions about the content of online media.<br />
<br />
Aggregate. We have aggregated billions of online stories from an ever-growing set of 1,200,000+ digital media sources. We ingest data via RSS feeds and a set of robots that spider the web to fetch information from a variety of sources in near real-time.<br />
<br />
Analyze. To query our extensive library of data, we have developed a suite of analytical tools that allow you to explore relationships between professional and citizen media, and between online and offline sources.<br />
<br />
Deliver and Visualize. Our suite of tools provides opportunities to present data in formats that you can visualize in your own interfaces. These include the use of graphs, geographic maps, word clouds, network visualizations.<br />
<br />
Project URL: https://mediacloud.org/<br />
<br />
Project on GitHub: https://github.com/berkmancenter/mediacloud<br />
<br />
Project Mentors: Linas Valiukas [[mailto:linas@media.mit.edu]], Hal Roberts [[mailto:hroberts@cyber.law.harvard.edu]]<br />
<br />
===Projects===<br />
<br />
====Build a tool to do some cool visualizations====<br />
''Problem Statement''. Since 2008, we have collected more than a half billion news articles that we have post-processed and indexed. We know quite a lot about them -- which news articles were the most linked to from other similar articles, the most and least popular / influential articles (based on shares on Facebook, tweet count, or clicks on an article's Bit.ly shortened link), specific language and terms used to describe the subject matter in each of the articles, etc., and there's a lot of potential to learn much more. Can you use your design and coding skills to help us out in visualising some of this data, e.g. create a cool network map visualization tool?<br />
<br />
=====Development Tasks=====<br />
*Build any visualization tool based on our extensive data and tool set:<br />
**Figure out what you'd like to visualise and how are you going to do it<br />
**Use Gephi, a tool of your choice, or create your very own tool to implement your visualisation<br />
<br />
<br />
====Create PostgreSQL-based job queue====<br />
''Problem Description''. In more than eight (or is it nine by now?) years since we've been running Media Cloud, we have tried multiple job queue tools (e.g. Gearman) that we could use for dividing and conquering our workload. Unfortunately, all the tools (including the current one -- go look into the codebase to figure out which one it is now) have left us deeply unhappy because of one reason or another. If there's one tool which hasn’t let us down, it’s PostgreSQL. So, we'd like to also try running our job queue on Postgres. Can you implement it for us?<br />
<br />
=====Development Tasks=====<br />
*Write a spec, complete with code samples, on how to implement the following job queue:<br />
**Preferably programming language-agnostic, i.e. should run as a bunch of PL/pgSQL functions.<br />
***Maybe that's a bad idea, I don't know, you tell us.<br />
*Features:<br />
**Add jobs with names and JSON arguments<br />
**Cancel jobs by their ID<br />
**Track job's progress (and log?) by their ID<br />
**Get job ID using its JSON parameters<br />
**Merge jobs with identical JSON arguments into a single job<br />
**See job stats per task, i.e. how many jobs are queued for every task<br />
**Retry failed jobs<br />
**Report job failure, complete with error messages<br />
**Proper locking (for inspiration, see https://github.com/chanks/que)<br />
**Doesn't catch fire with tens of millions of queued jobs<br />
*(Bonus points) Actually implement the queue! If you don't get to doing this over the summer, it's fine, we would be happy with a proven spec.<br />
<br />
<br />
====Implement a method to detect subtopics of a topic====<br />
*Problem Statement.* As described elsewhere, a "topic" is subject discussed by the media that we are researching. Almost every big topic contains subtopics, e.g. the matters of immigration, racism, email server security and a plethora of other subjects were discussed during the last US election. We would like to investigate ways of how we could automatically detect those subtopics, possibly using the [Louvain method](https://en.wikipedia.org/wiki/Louvain_Modularity).<br />
<br />
=====Development Tasks=====<br />
Develop a proof of concept (un)supervised ML tool for detecting subtopics of a chosen subject ("topic").<br />
<br />
====Do your own freehand project====<br />
Problem Statement. If you had more than half a billion (!) news articles from all around the world stored in a single place, extracted from HTML into text, split into sentences, words, and made searchable, what would you do? Propose us something we didn't think of, and we will surely consider it!<br />
<br />
=====Development Tasks. =====<br />
Left as an exercise to the student.<br />
<br />
==Requirements==<br />
*Working knowledge of Perl or Python<br />
*Familiarity with relational databases, preferably PostgreSQL<br />
*Some pedantism<br />
*Willingness to propose, debate and object to ideas<br />
*Keen to work with us on writing your GSoC project proposal, as opposed to just submitting a long shot without any feedback and hoping for the best<br />
*Shown effort into learning what Media Cloud is all about, e.g.:<br />
**Make a pull request to our main code repository; (https://github.com/berkmancenter/mediacloud),<br />
**Craft us an email with a smart question or two,<br />
**Try our our tools (see https://dashboard.mediacloud.org/#demo, https://sources.mediacloud.org/, http://globe.mediameter.org/, http://focus.mediameter.org/)<br />
**Run Media Cloud yourself and collect some news articles (see https://github.com/berkmancenter/mediacloud/blob/master/doc/vagrant.markdown),<br />
**Sign up and check out our API (see https://github.com/berkmancenter/mediacloud/blob/master/doc/api_2_0_spec/api_2_0_spec.md, https://github.com/berkmancenter/mediacloud/blob/master/doc/api_2_0_spec/topics_api_2_0_spec.md, and the API client at https://pypi.python.org/pypi/mediacloud/).</div>BerkmanSysophttps://cyber.harvard.edu/gsoc/?title=Ayanda&diff=621Ayanda2019-02-27T15:54:00Z<p>BerkmanSysop: </p>
<hr />
<div>Ayanda Ayanda is an Open Source Android Library that makes it easy to discover nearby devices and share files through a simple API. Ayanda is meant to detect nearby devices using WiFi and Bluetooth technology. Currently the Ayanda library uses [https://urldefense.proofpoint.com/v2/url?u=https-3A__developer.android.com_training_connect-2Ddevices-2Dwirelessly_wifi-2Ddirect&d=DwMFaQ&c=WO-RGvefibhHBZq3fL85hQ&r=cislNqXq-cCcJPAw7lrFVbjDiuL6uodM-0xhJJXaJps&m=BqCjHwhBG01zpo6IWidSN4IBOqewZ2qPfytDYPlL_0U&s=Ip0iZNcreBFqK7W8RrqhifZPw_VeGl9ifyNEsWAFCPM&e= Wifi-Direct] and [https://urldefense.proofpoint.com/v2/url?u=https-3A__developer.android.com_guide_topics_connectivity_bluetooth&d=DwMFaQ&c=WO-RGvefibhHBZq3fL85hQ&r=cislNqXq-cCcJPAw7lrFVbjDiuL6uodM-0xhJJXaJps&m=BqCjHwhBG01zpo6IWidSN4IBOqewZ2qPfytDYPlL_0U&s=OrP69cPN8bW02ps5WJ4VOkGGpm6FydrIsug5F-oO9ZY&e= Bluetooth] to pair to nearby enabled Android devices and send files between devices. This library can be useful for creating apps that can respond to nearby users and provide proximity based services. It also is essential in allowing for Offline communication in a situation such as the internet is censored or shutdown completely -- A mesh network can be built on this.<br />
<br />
'''Project goals:'''<br />
Work on current [https://urldefense.proofpoint.com/v2/url?u=https-3A__github.com_sabzo_ayanda_issues&d=DwMFaQ&c=WO-RGvefibhHBZq3fL85hQ&r=cislNqXq-cCcJPAw7lrFVbjDiuL6uodM-0xhJJXaJps&m=BqCjHwhBG01zpo6IWidSN4IBOqewZ2qPfytDYPlL_0U&s=7kqXS9orDwNJpBJCurV_yZr8ZolYZx-6l26V74IYMM8&e= issues].<br />
<br />
'''Project goals:'''<br />
Make project stable<br />
<br />
[https://github.com/sabzo/ayanda GitHub Repo]</div>BerkmanSysophttps://cyber.harvard.edu/gsoc/?title=Ayanda&diff=620Ayanda2019-02-27T15:52:50Z<p>BerkmanSysop: </p>
<hr />
<div>Ayanda Ayanda is an Open Source Android Library that makes it easy to discover nearby devices and share files through a simple API. Ayanda is meant to detect nearby devices using WiFi and Bluetooth technology. Currently the Ayanda library uses [https://urldefense.proofpoint.com/v2/url?u=https-3A__developer.android.com_training_connect-2Ddevices-2Dwirelessly_wifi-2Ddirect&d=DwMFaQ&c=WO-RGvefibhHBZq3fL85hQ&r=cislNqXq-cCcJPAw7lrFVbjDiuL6uodM-0xhJJXaJps&m=BqCjHwhBG01zpo6IWidSN4IBOqewZ2qPfytDYPlL_0U&s=Ip0iZNcreBFqK7W8RrqhifZPw_VeGl9ifyNEsWAFCPM&e= Wifi-Direct] and [https://urldefense.proofpoint.com/v2/url?u=https-3A__developer.android.com_guide_topics_connectivity_bluetooth&d=DwMFaQ&c=WO-RGvefibhHBZq3fL85hQ&r=cislNqXq-cCcJPAw7lrFVbjDiuL6uodM-0xhJJXaJps&m=BqCjHwhBG01zpo6IWidSN4IBOqewZ2qPfytDYPlL_0U&s=OrP69cPN8bW02ps5WJ4VOkGGpm6FydrIsug5F-oO9ZY&e= Bluetooth] to pair to nearby enabled Android devices and send files between devices. This library can be useful for creating apps that can respond to nearby users and provide proximity based services. It also is essential in allowing for Offline communication in a situation such as the internet is censored or shutdown completely -- A mesh network can be built on this.<br />
<br />
'''Project goals:'''<br />
Work on current [https://urldefense.proofpoint.com/v2/url?u=https-3A__github.com_sabzo_ayanda_issues&d=DwMFaQ&c=WO-RGvefibhHBZq3fL85hQ&r=cislNqXq-cCcJPAw7lrFVbjDiuL6uodM-0xhJJXaJps&m=BqCjHwhBG01zpo6IWidSN4IBOqewZ2qPfytDYPlL_0U&s=7kqXS9orDwNJpBJCurV_yZr8ZolYZx-6l26V74IYMM8&e= issues].<br />
<br />
'''Project goals:'''<br />
Make project stable</div>BerkmanSysophttps://cyber.harvard.edu/gsoc/?title=Ayanda&diff=619Ayanda2019-02-27T15:52:14Z<p>BerkmanSysop: /* Project goals: */</p>
<hr />
<div>Ayanda Ayanda is an Open Source Android Library that makes it easy to discover nearby devices and share files through a simple API. Ayanda is meant to detect nearby devices using WiFi and Bluetooth technology. Currently the Ayanda library uses [https://urldefense.proofpoint.com/v2/url?u=https-3A__developer.android.com_training_connect-2Ddevices-2Dwirelessly_wifi-2Ddirect&d=DwMFaQ&c=WO-RGvefibhHBZq3fL85hQ&r=cislNqXq-cCcJPAw7lrFVbjDiuL6uodM-0xhJJXaJps&m=BqCjHwhBG01zpo6IWidSN4IBOqewZ2qPfytDYPlL_0U&s=Ip0iZNcreBFqK7W8RrqhifZPw_VeGl9ifyNEsWAFCPM&e= Wifi-Direct] and [https://urldefense.proofpoint.com/v2/url?u=https-3A__developer.android.com_guide_topics_connectivity_bluetooth&d=DwMFaQ&c=WO-RGvefibhHBZq3fL85hQ&r=cislNqXq-cCcJPAw7lrFVbjDiuL6uodM-0xhJJXaJps&m=BqCjHwhBG01zpo6IWidSN4IBOqewZ2qPfytDYPlL_0U&s=OrP69cPN8bW02ps5WJ4VOkGGpm6FydrIsug5F-oO9ZY&e= Bluetooth] to pair to nearby enabled Android devices and send files between devices. This library can be useful for creating apps that can respond to nearby users and provide proximity based services. It also is essential in allowing for Offline communication in a situation such as the internet is censored or shutdown completely -- A mesh network can be built on this.<br />
<br />
'''Project goals:'''</br><br />
Work on current [https://urldefense.proofpoint.com/v2/url?u=https-3A__github.com_sabzo_ayanda_issues&d=DwMFaQ&c=WO-RGvefibhHBZq3fL85hQ&r=cislNqXq-cCcJPAw7lrFVbjDiuL6uodM-0xhJJXaJps&m=BqCjHwhBG01zpo6IWidSN4IBOqewZ2qPfytDYPlL_0U&s=7kqXS9orDwNJpBJCurV_yZr8ZolYZx-6l26V74IYMM8&e= issues].<br />
<br />
'''Project goals:'''</br><br />
Make project stable</div>BerkmanSysophttps://cyber.harvard.edu/gsoc/?title=Ayanda&diff=618Ayanda2019-02-27T15:50:20Z<p>BerkmanSysop: </p>
<hr />
<div>Ayanda Ayanda is an Open Source Android Library that makes it easy to discover nearby devices and share files through a simple API. Ayanda is meant to detect nearby devices using WiFi and Bluetooth technology. Currently the Ayanda library uses [https://urldefense.proofpoint.com/v2/url?u=https-3A__developer.android.com_training_connect-2Ddevices-2Dwirelessly_wifi-2Ddirect&d=DwMFaQ&c=WO-RGvefibhHBZq3fL85hQ&r=cislNqXq-cCcJPAw7lrFVbjDiuL6uodM-0xhJJXaJps&m=BqCjHwhBG01zpo6IWidSN4IBOqewZ2qPfytDYPlL_0U&s=Ip0iZNcreBFqK7W8RrqhifZPw_VeGl9ifyNEsWAFCPM&e= Wifi-Direct] and [https://urldefense.proofpoint.com/v2/url?u=https-3A__developer.android.com_guide_topics_connectivity_bluetooth&d=DwMFaQ&c=WO-RGvefibhHBZq3fL85hQ&r=cislNqXq-cCcJPAw7lrFVbjDiuL6uodM-0xhJJXaJps&m=BqCjHwhBG01zpo6IWidSN4IBOqewZ2qPfytDYPlL_0U&s=OrP69cPN8bW02ps5WJ4VOkGGpm6FydrIsug5F-oO9ZY&e= Bluetooth] to pair to nearby enabled Android devices and send files between devices. This library can be useful for creating apps that can respond to nearby users and provide proximity based services. It also is essential in allowing for Offline communication in a situation such as the internet is censored or shutdown completely -- A mesh network can be built on this.<br />
<br />
===Project goals:===<br />
Work on current [https://urldefense.proofpoint.com/v2/url?u=https-3A__github.com_sabzo_ayanda_issues&d=DwMFaQ&c=WO-RGvefibhHBZq3fL85hQ&r=cislNqXq-cCcJPAw7lrFVbjDiuL6uodM-0xhJJXaJps&m=BqCjHwhBG01zpo6IWidSN4IBOqewZ2qPfytDYPlL_0U&s=7kqXS9orDwNJpBJCurV_yZr8ZolYZx-6l26V74IYMM8&e= issues].<br />
<br />
Project goals:<br />
Make project stable</div>BerkmanSysop