Media Cloud

From Google Summer of Code 2009
Revision as of 23:30, 13 March 2009 by Geeks

Overview

Media Cloud is a project that tracks news content comprehensively and provides open, free, and flexible tools for studying it. These tools will allow unprecedented quantitative analysis of media trends. Some of our driving questions are:

  1. Do bloggers introduce storylines into mainstream media or the other way around?
  2. What parts of the world are being covered or ignored by different media sources?
  3. Where do stories begin?
  4. How are competing terms for the same event used in different publications?
  5. Can we characterize the overall mix of coverage for a given source?
  6. How do patterns differ between local and national news coverage?
  7. Can we track news cycles for specific issues?
  8. Do online comments shape the news?

You can see some simple visualizations from our system on the main Media Cloud site, but the project is under very active development and there is much more under the hood.

Ideas

Data Analysis and Visualization

We have terabytes of data and millions of archived stories. How can we construct queries that work efficiently on this data set and generate interesting and compelling results? For instance, we are currently experimenting with time-sequence analysis of different terms across different media sources:

[[Image:http://chart.apis.google.com/chart?chd=s:AABAACDFAAAAAABDNILLPhnqsqYT8xkdZOJWQLKJEDFIHEEDCDDCEDCDEEFGEABCDDCDEDLIMOKEGQPONJEDNMFBFCBFINLJFDLOQSRGGHECCKFCA,AAAAABGBAAAAAAABEFMGRMTekedSROPLLJHGaOJFBBDPFFDCEDCDFCAEDBCCFCCCBMPHELGMHEGCELFKFEJKYLGKBBADCEDCCEIGEHDCBBGAAKEAA,AAAAADBCAAAAAAADHHOGNMSkz6jWXQRQRKJITSQFDBDHFDECGDDDIABFFBBBDACEBFGCEAAAAABABBAABAAAABAAAAAAAAAAAAAABAAAAAAAAAAAA,AABAADCDAAAAAABFFGJHGKOSSNHIURKJQIILKIHGCCEHGEDDCCDCDCBBDCCEDCACDEDCBEEEFGFCDGHIDCCCJGGBCAACCEEFBCEDCEFCCBBABGCBB,AAAABDGGBAAAAABCFFLIPVXOPNLJUKFFGHCIHHJFDCJIFBCCABBADEDCCEBFCAABCBCBCEIFLKGBADGGEEFCMFGDFBACDFCDAACCICCCBCBAAECAA,AAAAABBCAAAAAABEGGNHNNOVVSHJZTQPKIGQPIIHEEIIFFFGCCDDFCBEBCCCBAABCCBBCCCDCDECBEEFDDDBGECBBAABBCCBAABBBCCBABAAACBAA,AAAAADCBBAAAAAAADEIJKNPUeULHTUJHJGDEFCDFCEDFFCEIDBBBCABBCABBBAAAAEDCBGFJFHGDAEDDJCBCIEBBAAABABCEDBECDDEDDDCBBKHCA,AABAAGLOCABAABCDGEIMPRRUSWLLUNJFHFDGGJOECCDECBCCACBBEBBAAABDDAABABCABADDJDEAABCDCBBAFGHBBBABBEECAAACABDBAAAAADBAA,AAAAABABAAAAAAACEEJHLGJNLOFGPPJLMGDJIHGDDCGGFEDEDCCCCCACDBCCDBABDEEDCEDEDDFCCFFHFDEDQIHEBAADCDDDBBDDCCEBBBBABECBA,AAAAABBBAAAAAAAACCGIMOPSTOIGSWMJIGDEDCDCBBDCBABBABBBCBAACBBBBAAABDCDBFEFFFDCBHFIFGDCFDDABAADGJHEDBFGEDECCCBBAFCAA&chdl=bailout%7Cobama%7Cmccain%7Ceconomies%7Ctreasuries%7Ccrisis%7Cbush%7Cmortgage%7Ceconomic%7Ccongress&chxt=x&chxl=0:%7C2008-09-01%7C2008-10-08%7C2008-11-15%7C2008-12-22&chs=600x250&cht=lc&chco=ff0000,00ff00,0000ff,ff8888,88ff88,8888ff,88ffff,ff88ff,ffff88,888888]]
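As a rough illustration of what "time-sequence analysis of different terms across different media sources" could mean in code, here is a minimal sketch that counts how often each tracked term appears per source per day. The `(day, source, text)` tuple shape, the function name, and the sample stories are assumptions made up for this example; they are not Media Cloud's actual schema or data, and a real implementation over terabytes of stories would push this work into the database or a batch job.

```python
import re
from collections import Counter, defaultdict
from datetime import date


def term_counts_by_source_and_day(stories, terms):
    """Return {(source, term): Counter({day: count})}.

    `stories` is an iterable of (day, source, text) tuples. This input
    shape is hypothetical, chosen only to keep the sketch self-contained.
    """
    wanted = {t.lower() for t in terms}
    series = defaultdict(Counter)
    for day, source, text in stories:
        # Naive tokenization: lowercase runs of letters and apostrophes.
        for word in re.findall(r"[a-z']+", text.lower()):
            if word in wanted:
                series[(source, word)][day] += 1
    return series


# Made-up sample stories, for illustration only.
stories = [
    (date(2008, 9, 15), "source_a", "Bailout talks stall as the crisis deepens"),
    (date(2008, 9, 15), "source_b", "Congress debates bailout; bailout vote expected"),
    (date(2008, 9, 16), "source_a", "Crisis spreads to mortgage lenders"),
]

series = term_counts_by_source_and_day(stories, ["bailout", "crisis", "mortgage"])
```

Each `Counter` in `series` is one line of a chart like the one above: sort its days, fill missing days with zero, and the result is a plottable sequence for that source and term.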

How would we go about visualizing some of the questions above with the data we have? We currently use the Google Visualizations API to generate our charts.
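The chart URL above passes its data in a `chd=s:` parameter, which is the "simple encoding" of the Google Chart API: each data point is one character from `A-Z`, `a-z`, `0-9`, standing for a value from 0 to 61, with `_` marking a missing point. The sketch below shows one way such a URL could be built from per-term series. The function names and the choice to scale all series against a shared maximum are assumptions for illustration, not a description of how Media Cloud generates its charts; the endpoint and parameter names are taken from the URL shown above.

```python
from urllib.parse import urlencode

# Simple-encoding alphabet: index in this string is the encoded value (0-61).
SIMPLE_ALPHABET = (
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    "abcdefghijklmnopqrstuvwxyz"
    "0123456789"
)


def simple_encode(values, max_value):
    """Scale values into 0..61 and map each to one character."""
    chars = []
    for v in values:
        if v is None:
            chars.append("_")  # missing data point
            continue
        scaled = round(v * 61 / max_value) if max_value > 0 else 0
        chars.append(SIMPLE_ALPHABET[max(0, min(61, scaled))])
    return "".join(chars)


def simple_decode(encoded):
    """Inverse mapping: characters back to 0..61 (None for '_')."""
    return [None if c == "_" else SIMPLE_ALPHABET.index(c) for c in encoded]


def chart_url(series_by_term, size="600x250"):
    """Build a line-chart URL with one series per term (hypothetical helper)."""
    present = [v for vals in series_by_term.values() for v in vals if v is not None]
    max_value = max(present, default=0)
    data = ",".join(simple_encode(vals, max_value) for vals in series_by_term.values())
    params = {
        "cht": "lc",                          # line chart
        "chs": size,                          # width x height in pixels
        "chd": "s:" + data,                   # simple-encoded series
        "chdl": "|".join(series_by_term),     # legend labels
    }
    return "http://chart.apis.google.com/chart?" + urlencode(params, safe=":,")


url = chart_url({"bailout": [0, 61], "obama": [61, 0]})
```

Because simple encoding has only 62 levels, scaling every series against a shared maximum keeps the lines comparable but flattens low-frequency terms. Scaling each series separately would show the shape of each term's curve at the cost of comparability.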