Media Cloud

From Berkman Klein Google Summer of Code Wiki

Media Cloud is a system that lets you see the flow of the media. The Internet is fundamentally altering the way news is produced and distributed, but there are few comprehensive approaches to understanding the nature of these changes. Media Cloud automatically builds an archive of news stories and blog posts from the web, applies language processing, and gives you ways to analyze and visualize the data. Media Cloud aims to track news content comprehensively while providing open, free, and flexible tools. These tools will allow unprecedented quantitative analysis of media trends. For instance, some of our driving questions are:

Do bloggers introduce storylines into mainstream media or the other way around?

What parts of the world are being covered or ignored by different media sources?

Where do stories begin?

How are competing terms for the same event used in different publications?

Can we characterize the overall mix of coverage for a given source?

How do patterns differ between local and national news coverage?

Can we track news cycles for specific issues?

Do online comments shape the news?

Media Cloud is capable of crawling and analyzing arbitrary online news media and blogs. At a high level, we monitor RSS feed updates and then direct our crawler to download the corresponding web pages. These web pages are saved and later analyzed. Among other types of analysis, we currently do entity extraction, word frequency analysis, and clustering, but we are open to ideas for other types of text analysis.
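To make the pipeline above concrete for applicants, here is a minimal Python sketch of two of its steps: detecting new items in an RSS feed (the URLs a crawler would then download) and a simple word frequency count over story text. This is not Media Cloud's actual code. The function names `new_story_urls` and `word_frequencies`, the small `STOPWORDS` list, and the in-memory `seen` set are hypothetical stand-ins. The sketch skips the actual page download and storage, and it does not cover entity extraction or clustering.

```python
import re
import xml.etree.ElementTree as ET
from collections import Counter

# Hypothetical stopword list; a real system would use a much larger one.
STOPWORDS = {"the", "a", "an", "and", "of", "to", "in", "is", "on", "for"}


def new_story_urls(rss_xml: str, seen: set) -> list:
    """Return links of RSS <item>s not seen before, in feed order.

    Newly found links are added to `seen`, so polling the same feed again
    only yields items that appeared since the previous poll. These are the
    URLs a crawler would be directed to download.
    """
    root = ET.fromstring(rss_xml)
    fresh = []
    for item in root.iter("item"):
        link = item.findtext("link")
        if not link:
            continue  # skip items without a <link> element
        link = link.strip()
        if link not in seen:
            seen.add(link)
            fresh.append(link)
    return fresh


def word_frequencies(text: str, top_n: int = 5) -> list:
    """Count lowercase word tokens, ignoring stopwords; return the top_n."""
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(w for w in words if w not in STOPWORDS)
    return counts.most_common(top_n)


if __name__ == "__main__":
    feed = """<rss version="2.0"><channel>
      <item><title>A</title><link>http://example.com/a</link></item>
      <item><title>B</title><link>http://example.com/b</link></item>
    </channel></rss>"""
    seen = set()
    print(new_story_urls(feed, seen))  # ['http://example.com/a', 'http://example.com/b']
    print(new_story_urls(feed, seen))  # [] (nothing new since the last poll)

    story = "The senate passed the bill. The bill now goes to the house."
    print(word_frequencies(story, 1))  # [('bill', 2)]
```

Tracking seen links per feed is what keeps a polling crawler from re-downloading stories it already archived. A production system would persist that state in a database rather than in memory.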

The main site for Media Cloud is There, you can see some simple visualizations generated by our system, but the project is under very active development and there is much more under the hood. Applicants are encouraged to examine the SourceForge project page and Subversion repository.