What started out as an attempt to curate arguments surrounding same-sex marriage turned into more of a data analysis piece (or struggle). But we still found the results interesting. Given the huge number of memes, pictures of protests, and pro and con arguments for same-sex marriage, we wanted to find a way to make sense of them and look at how liberal and conservative media are framing the debate. Our initial idea was to scrape Media Cloud data for the images that accompany relevant articles and compare their captions and headlines. (For those of you not familiar with Media Cloud, it's a joint project of Harvard's Berkman Center for Internet & Society and MIT's Center for Civic Media to quantitatively study online media. Media Cloud continuously downloads, archives, and analyzes articles from over 20,000 sources.) Obtaining stories related to gay marriage turned out to be more time-consuming than expected.
(If you'd like to know the technical details of our Media Cloud struggles, read this paragraph. Otherwise, you can safely skip it.) We ran into some difficulties obtaining images from these articles. Although Media Cloud retains an archive of the raw HTML of the articles it downloads, only the text is stored in the database, so we needed to write a separate script to grab the HTML. Looking at every image on an article's HTML page produced too much noise: we got ads and navigation elements. Media Cloud has a facility to extract just the article content, but it is optimized for text-only analysis and often excluded key images. Finally, there was the added difficulty of writing a script to parse the extracted HTML, resolve links, and download the images. The image analysis thus wasn't feasible within our time constraints.
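For readers curious what that last script involves, here is a minimal sketch using only Python's standard library. It is not Media Cloud code and not the script we wrote; it assumes you already have an article's HTML (ideally just the extracted content) plus the article's URL. The `NOISE_HINTS` list and the helper names (`article_image_urls`, `download_images`) are hypothetical illustrations, and a URL-substring filter like this is only a crude stand-in for real ad and navigation filtering.

```python
import os
import urllib.request
from html.parser import HTMLParser
from urllib.parse import urljoin

# Hypothetical URL fragments that tend to mark ads, logos and navigation art.
# These are guesses for illustration; real filters would need per-site tuning.
NOISE_HINTS = ("/ads/", "banner", "logo", "sprite", "icon")


class ImageCollector(HTMLParser):
    """Collects absolute URLs of <img> tags found in an HTML fragment."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.urls = []

    def handle_starttag(self, tag, attrs):
        if tag != "img":
            return
        src = dict(attrs).get("src")
        if not src:
            return  # skip <img> tags without a usable src
        # Resolve relative links against the article's own URL.
        absolute = urljoin(self.base_url, src)
        # Drop images whose URL matches one of the noise hints.
        if not any(hint in absolute.lower() for hint in NOISE_HINTS):
            self.urls.append(absolute)


def article_image_urls(html, base_url):
    """Returns filtered, absolute image URLs from an article's HTML."""
    collector = ImageCollector(base_url)
    collector.feed(html)
    collector.close()
    return collector.urls


def download_images(urls, dest_dir):
    """Saves each image to dest_dir and returns the local file paths."""
    os.makedirs(dest_dir, exist_ok=True)
    paths = []
    for i, url in enumerate(urls):
        path = os.path.join(dest_dir, f"image_{i}")
        with urllib.request.urlopen(url, timeout=30) as resp, open(path, "wb") as out:
            out.write(resp.read())
        paths.append(path)
    return paths


if __name__ == "__main__":
    html = ('<p>Story text</p><img src="/images/rally.jpg">'
            '<img src="https://ads.example.com/banner.gif"><img alt="no src">')
    # The relative link is resolved; the banner and the src-less tag are dropped.
    print(article_image_urls(html, "https://news.example.com/2013/03/story.html"))
```

Even this toy version shows where the effort goes: the parsing and link resolution are easy, but deciding which images are "noise" is a per-site judgment call.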
So instead of analyzing images, we parsed the titles and texts of articles from 1) the top 25 mainstream media sources, 2) all political blogs, 3) all popular blogs, 4) left-leaning political blogs, 5) “center” political blogs, and 6) right-leaning political blogs. We ran two searches: articles/posts in which a) “gay” and “marriage” or b) “marriage” and “equality” appeared in the same sentence. We wanted to see what other words frequently appear in these articles and what patterns we could detect. We used ManyEyes Version 2 to create word clouds for each of these categories (24 in total: six source categories, two searches, titles and texts), taking out the word “marriage”, since it was the most dominant word in all of the clouds. What we found was that in left-leaning political blog titles, the word “equality” dominated (for both searches). In the texts, it was “people”. In the center and right-leaning blog titles and texts, the dominant words were “gay”, “court” and “supreme”. When we looked at word trees, we found that in the texts of left-leaning political blogs, the word “marriage” is most often followed by “equality”, whereas in center and right-leaning blogs, “marriage” is most frequently followed by a period, i.e., it usually ends the sentence.
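To make the pipeline above concrete, here is a rough local approximation in Python. It is a sketch, not what Media Cloud's search or ManyEyes actually do: it assumes each article is a plain string, uses a naive regex sentence splitter, and uses a small hypothetical stopword list. `matches` mimics the same-sentence search criterion, `word_counts` gives word-cloud-style frequencies with “marriage” removed, and `next_word_counts` tallies the first branch of a word tree rooted at “marriage”, keeping the period as its own token.

```python
import re
from collections import Counter

# Small illustrative stopword list (an assumption, not the list ManyEyes used).
STOPWORDS = {"the", "a", "an", "and", "of", "to", "in", "is", "for", "on",
             "that", "it", "as", "with", "will", "be"}
PUNCT = {".", "!", "?"}


def tokenize(text):
    """Lowercase words, keeping sentence-final punctuation as its own token."""
    return re.findall(r"[a-z']+|[.!?]", text.lower())


def split_sentences(text):
    """Naive sentence splitter: breaks after ., ! or ? followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def matches(text, terms):
    """True if some single sentence contains every term, e.g. ("gay", "marriage")."""
    return any(set(terms) <= set(tokenize(s)) for s in split_sentences(text))


def word_counts(texts, exclude=("marriage",)):
    """Word-cloud style frequencies, dropping stopwords, punctuation and `exclude`."""
    counts = Counter()
    for text in texts:
        for w in tokenize(text):
            if w not in PUNCT and w not in STOPWORDS and w not in exclude:
                counts[w] += 1
    return counts


def next_word_counts(texts, target="marriage"):
    """Counts what immediately follows `target`: the first branch of a word tree."""
    counts = Counter()
    for text in texts:
        toks = tokenize(text)
        for current, following in zip(toks, toks[1:]):
            if current == target:
                counts[following] += 1
    return counts


if __name__ == "__main__":
    posts = [
        "Marriage equality is coming. The court will rule soon.",
        "The Supreme Court heard gay marriage arguments. Gay marriage.",
        "Nothing relevant here.",
    ]
    hits = [p for p in posts if matches(p, ("marriage", "equality"))]
    print(word_counts(hits).most_common(3))
    print(next_word_counts(posts))
```

Treating “.” as a token is what lets the word-tree branch record that “marriage” ended a sentence, which is the pattern we saw in center and right-leaning blogs.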
Media Cloud allowed us to zoom out a long way and look at rhetoric at a macro level. But to really understand how the arguments are being framed by both sides, we had to zoom in again and look at specific texts. Here, we found that many of the arguments involve same-sex marriage supporters appealing to ethical responsibility and civil rights discourse (note that some of the excerpts that follow don't fit into the timeframe of our search, but they were cited and linked to from articles within the timeframe):
“Same-sex marriage is an affirmation that people -- all people -- are made for better things, are capable of charity and concern for one another, are enhanced by living a life of virtue.” (David Horsey, LA Times, linked to from Cab Drollery blog post)
“Now why is it happening? One, because a lot of brave gay and lesbian people had the courage to come out and people got to see them....Once the color barrier was breached, things changed and sports directly led the way for racial equality in America. It will take another brave person to lead again. If and when they do, let's support him/her as best as we can.” (Crooks and Liars, March 24)
“In the meantime, I think we can all agree that we're in a place where marriage equality is coming, and probably coming pretty rapidly....Young people, including young conservatives (Republicans), favor marriage equality by pluralities, and the only demographic where a majority are opposed to it is the demographic of people transitioning off of government health care and onto Heavenly health care.” (Michigan Liberal, March 26)
“If you call something a 'civil right' -- no matter how wildly unpopular it is -- then any opposition to it is bigotry. And what do you do with bigots? You put them down, you ignore them, you discredit what they believe.” (Rush Limbaugh, March 25)
“Efforts to institutionalize gay marriage have followed this course, with 'equality' as the goal. But the civil rights paradigm never really fit: unlike most African-Americans, lesbians and gay men can render their minority status invisible....They tend to be better educated, have better jobs, and these days are not at all what one could call an oppressed minority.” (Justin Raimondo, The American Conservative)
“The point is, the marriage innovators assaulted the settled tradition -- and have just about won. But here's the thing: they won in part by framing their own assault on tradition as self-defense....It's brilliant propaganda, because it paints people who preferred the status quo into culture-war aggressors, rather than those who are actually aggressing against the settled tradition.” (Rod Dreher, The American Conservative)
The assignment let us experience the difficulties of analyzing and curating arguments at a macro level. We have a number of thoughts on how studies like this could be made easier. One thing that slowed us down was database performance. For example, we initially planned to study all of March but restricted ourselves to a shorter period because the database queries still hadn't completed after nearly a day. Improving query performance on a large dataset is a hard problem, and one that the Media Cloud backend team has been actively working on.
It's also really hard to scrape images from a large number of articles and to filter out the “noise” of ads and navigation. Had this been easier, we could have compared and contrasted the types of images different media sources used and how they described them.
Word clouds are essentially a fairly “thin” analysis of texts. Word trees are perhaps a bit better. But moving from the macro scale to the micro level of analysis required us to make a huge jump. Looking at word frequency in different political discourses gave us hints about questions we might ask, but understanding why different words were used requires a very different approach. Coding article paragraphs according to different categories might be a good “medium-level” analysis step, but this is extremely time-consuming. What other tools might we look at to move from the “macro” to the “micro” level of analysis?
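One cheap way to speed up the paragraph-coding step might be a keyword-based pre-sort that flags which paragraphs touch which themes, so human coders can prioritize. The sketch below is hypothetical throughout: the `CODEBOOK` categories and indicator words are our own illustration, loosely drawn from the themes in the excerpts above (civil rights, tradition, the courts), and are not a validated coding scheme. Keyword matching also can't tell a supporter invoking civil rights from an opponent rejecting the civil rights paradigm (as in the Limbaugh and Raimondo excerpts), so it can only route paragraphs to a human, not replace one.

```python
import re

# Hypothetical codebook: category -> indicator words. A real codebook would be
# developed and validated by human coders.
CODEBOOK = {
    "civil_rights": {"equality", "rights", "civil", "discrimination", "bigotry"},
    "tradition": {"tradition", "traditional", "religious", "church"},
    "courts": {"court", "supreme", "justices", "ruling"},
}


def code_paragraph(paragraph, codebook=CODEBOOK, min_hits=1):
    """Returns the categories whose indicator words appear at least min_hits times."""
    words = re.findall(r"[a-z']+", paragraph.lower())
    labels = []
    for category, indicators in codebook.items():
        hits = sum(1 for w in words if w in indicators)
        if hits >= min_hits:
            labels.append(category)
    return labels


def code_article(text, codebook=CODEBOOK):
    """Splits an article on blank lines and codes each paragraph."""
    paragraphs = [p.strip() for p in text.split("\n\n") if p.strip()]
    return [(p, code_paragraph(p, codebook)) for p in paragraphs]


if __name__ == "__main__":
    article = ("Marriage equality is a civil rights issue.\n\n"
               "The Supreme Court may overturn the settled tradition.")
    for paragraph, labels in code_article(article):
        print(labels, "|", paragraph)
```

Raising `min_hits` trades recall for precision; either way, the output is a queue for human coders rather than a finished coding.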