Ideology and Text: Classifying and Analyzing Discourse using Machine Learning
with Ali Hashmi
Tuesday, June 30, 2015 at 12:00 pm
Typically, text analysis tools uncover patterns in the data without uncovering the 'ideology' embedded in the text. In doing so, they obscure how 'what is being said' relates to its social and, more importantly, political context. As part of my research, I have developed a tool that uses data-driven approaches for classifying discourse in news media. My research combines critical discourse analysis (CDA) approaches with corpus linguistics, using machine learning and natural language processing (NLP) techniques. The objective of CDA approaches is to make the hidden aspects of discourse more visible by looking at the latent social ideologies that permeate social texts. Corpus linguistics, by contrast, is an agnostic way of studying language patterns in large amounts of text.
As an instance of this framework, I have developed a tool for analyzing discourse on Islam in the mainstream media. The tool is based on the hypothesis that coverage in several mainstream news sources disproportionately contextualizes Muslims as a group embroiled in conflict. This hypothesis rests on the assumption that discourse on Islam in mainstream global media tends to lean toward the dangerous "clash of civilizations" frame. To test it, I have developed a prototype tool, the "Said-Huntington Discourse Analyzer", that machine-classifies news articles on a normative scale measuring the "clash of civilizations" polarization in an article on the basis of conflict. The tool also extracts semantically meaningful conversations for a media source using topic modeling, allowing users to discover frames of conversation on the basis of the Said-Huntington index classification.
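To make the classification idea concrete, here is a minimal sketch of placing an article on a 0-to-1 conflict scale. It is not the Said-Huntington Discourse Analyzer itself: the abstract does not say which model the tool uses, so this assumes a multinomial naive Bayes classifier with add-one smoothing. The class name `NaiveBayesScale`, the labels `conflict` and `other`, and the four training sentences are all invented for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


class NaiveBayesScale:
    """Hypothetical two-label multinomial naive Bayes with add-one smoothing."""

    def __init__(self):
        self.word_counts = {"conflict": Counter(), "other": Counter()}
        self.doc_counts = Counter()
        self.vocab = set()

    def fit(self, labeled_articles):
        # Count words per label and documents per label.
        for text, label in labeled_articles:
            tokens = tokenize(text)
            self.word_counts[label].update(tokens)
            self.doc_counts[label] += 1
            self.vocab.update(tokens)
        return self

    def _log_joint(self, tokens, label):
        # log P(label) + sum of log P(token | label); unseen words are skipped.
        total_docs = sum(self.doc_counts.values())
        log_prob = math.log(self.doc_counts[label] / total_docs)
        denom = sum(self.word_counts[label].values()) + len(self.vocab)
        for tok in tokens:
            if tok in self.vocab:
                log_prob += math.log((self.word_counts[label][tok] + 1) / denom)
        return log_prob

    def score(self, text):
        """Return P(conflict | text) in [0, 1], used here as the scale value."""
        tokens = tokenize(text)
        log_c = self._log_joint(tokens, "conflict")
        log_o = self._log_joint(tokens, "other")
        top = max(log_c, log_o)  # subtract the max for numerical stability
        exp_c, exp_o = math.exp(log_c - top), math.exp(log_o - top)
        return exp_c / (exp_c + exp_o)


# Toy training data, invented for this sketch (not from any real corpus).
TRAIN = [
    ("troops clash with militants as violence and war escalate", "conflict"),
    ("attack and bombing deepen the conflict between rival forces", "conflict"),
    ("community festival celebrates food music and art", "other"),
    ("students open a new library and science fair", "other"),
]

model = NaiveBayesScale().fit(TRAIN)
high = model.score("violence and war follow the attack")
low = model.score("music and art at the science fair")
print(f"conflict-framed article: {high:.2f}")
print(f"non-conflict article:    {low:.2f}")
```

A probability is only one way to define such a scale. A real index would need a labeled corpus of news articles and a validated definition of conflict framing, neither of which this sketch supplies.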
About Ali
Ali Hashmi is a researcher at the MIT Center for Civic Media. At the center, he is developing software tools that machine-classify and analyze discourse in news articles to elucidate the relationships between language, social identities and power. Ali is interested in projects and ideas at the intersection of journalism and technology. In particular, Ali is interested in: 1) understanding the ontology of digital asymmetries on the Internet; and 2) developing relevant media technologies for leveling the inequalities produced by these asymmetries. Prior to MIT, Ali was a McCormick scholar at Medill (Northwestern) and a Knight fellow at the Globe Lab (Boston Globe, NYTCO). He has worked as a software architect and development manager for Bell Canada for nearly nine years, leading business intelligence and data integration teams in Toronto, Montreal, London (Ontario) and Bangalore; he has also worked as a journalist in Pakistan. He holds an MS degree from MIT Media Lab, an MSJ degree from Northwestern University, and a BS degree in Computer Science from the University of Western Ontario.
Links
- Twitter profile - twitter.com/alihashmi01