Ethics of Data
This is a blog post summarizing the "Teaching Ethics" session of the Berkman Center Teaching Data Storytelling for Civic Impact Study Group. This blog post is based on the collaborative live-notes we took together on Thursday, March 26th.
Working with data raises important ethical questions. There are dimensions of ethics in all aspects of the data pipeline, from collecting data to storing and retaining data to choosing how to represent data. Teaching ethics at the same time that we teach skills is crucial, particularly if the goal is to use data for civic impact and empowerment. This session of the study group focused on discussing these questions. We started off with a number of case studies, and then broke out into three groups to talk about issues related to data and ethics for various audiences.
Introduction
Catherine introduces the session with a brief presentation about the landscape of data and ethics (see above slides). We are generating more data, particularly personal data, on a daily basis than ever before. But though many of us generate data, analyzing, storing and making sense of the data is the realm of specialists and experts. Many data practices are closed, centralized and extractive. While the Open Data Movement provides some hope for opening up data for public benefit, there are numerous challenges before its aspirations become reality, and there are troubling cases of data being misused.
Catherine
Catherine starts her case study here. She is interested in how to prioritize teaching aspects of ethics and data for journalism students. The class is Data Visualization so the whole class can't be about ethics, so what is most essential for them to know? Should she teach tools for keeping sources protected and/or focus more on case studies and/or focus on the philosophical issues?
The group offers several concrete suggestions including using personal data as an exercise in class so that students see how much is exposed, Rahul's Gallery walk activity, and using powerful case studies as vivid stories that people will remember.
Danielle
Danielle shows the project GoBoston2030. This is a City of Boston project created by the Interaction Institute for Social Change. It was inspired by Ceasar McDowell, who says that big data alone isn’t enough to spark innovation: the missing element is the lived experience of people. They ran a "question campaign" where they had a truck driving around asking people “what’s your question about getting around Boston in the future?” Not just what’s my problem, but what’s my vision. They collected 5000 questions in 5 weeks. She sees this as related to an ethics of inclusion: how do we hear voices that might not often be represented in big data? Her question centers on what to do with the data, how to treat it ethically, and how to make sure that everyone feels like their voice was heard. Folks ask questions about the project and suggest that the team can position themselves as leaders in this space. Perhaps they can develop a methodology that communicates conclusions back to the public, which other organizations can then use as a blueprint.
[IMAGE]
Laura
Laura's master's thesis was creating "data experiences" for people. The "Greenhousing" study looked at chemicals found in renovated public housing. All people in the study have a child with asthma. The chemicals she studied were “emerging” contaminants: there is not a lot of research yet on what risks they might pose or what effect they might have on asthma.
She is interested in ways to report the data back to study participants. Laura designed a shirt that all study participants received that represented their data. It's not standard to go back and share results with the people who lived in the houses that were studied. Should you tell people if there isn't something they can do about it? Since you don't have full information about what to do about it, is it ethical to share it? They've decided in this instance the answer is yes. Then the question is HOW to share it so people feel empowered by this information. Sharing a number doesn't mean much, since there are no safety thresholds. Should they go back to participants and communicate uncertainty? Danielle mentions the Campaign for Safe Cosmetics and asks if the message really needs to be about uncertainty. Catherine asks whether a citizen science model, if it were possible, might be useful, so that the research is framed as shared discovery and mutual learning rather than experts and subjects.
Sarah Wolozin
Sarah works in the OpenDocLab at MIT. She shares the donottrack project, coming in April. It is an attempt to help people understand what is happening to their data. You enter a website and it shows you who gets your data from that single website visit (in the data economy). Another one is called In Limbo, which shows you what becomes of your memory of your data. They are trying to make you feel what it means to have your data tracked and not be private. Another one, digital me, tries to bring you face-to-face with your digital self. Yet another one, { and }, shows you videos based on asking questions of you and your partner.
She is seeing these filmmakers work with algorithms and data and wondering how informed filmmakers are when working with data. What are the protocols around transparency and literacy for these media makers? What level of data literacy do the subjects and audience need? She points to the upcoming Tow Center event (their report on Algorithmic Transparency is great). There is no space for filmmakers to think about this.
The group discusses various options and issues such as visualizing algorithms, debating the merits of transparency versus algorithmic literacy, and pinpointing the goals of understanding algorithms.
Sarah Williams
Sarah teaches a required GIS class in the urban planning group at MIT. She spends two weeks of the class on representing data in an ethical way.
“Power is the ability to do work which is what maps do: They work”
Denis Wood, The Power of Maps
Basically, we believe maps to be true, so they have power. She shows two maps: which map is the right map? (Motor vehicle deaths or motor vehicle death rate.) They tell very different stories. Answer: in this context, motor vehicle death rate, as opposed to deaths in total, because the rate is normalized to account for population. But the "right" answer is not always clear. Sarah walks the group through distinctions between the different ways of breaking up a distribution of data to make a choropleth map (unclassified, "natural" breaks, quantiles and so on). In the ethics classes they discuss how different breaks create different stories.
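To make Sarah's two points concrete, here is a minimal Python sketch. It is not from the session: the six regions and their numbers are invented for illustration, and the helper names (`rate_per_100k`, `equal_interval_classes`, `quantile_classes`) are hypothetical. It shows that the "worst" region changes when counts are normalized by population, and that equal-interval and quantile breaks sort the same rates into different map classes.

```python
# Hypothetical regions: name -> (motor vehicle deaths, population).
# These numbers are made up for illustration; they are not real data.
regions = {
    "A": (3000, 30_000_000),
    "B": (900, 3_000_000),
    "C": (1200, 10_000_000),
    "D": (400, 1_000_000),
    "E": (1500, 20_000_000),
    "F": (275, 2_500_000),
}

def rate_per_100k(deaths, population):
    """Deaths per 100,000 residents (normalizes for population size)."""
    return deaths * 100_000 / population

counts = {name: deaths for name, (deaths, _) in regions.items()}
rates = {name: rate_per_100k(d, p) for name, (d, p) in regions.items()}

def equal_interval_classes(values, k):
    """Split the value range into k equally wide bins (assumes values differ)."""
    lo, hi = min(values.values()), max(values.values())
    width = (hi - lo) / k
    # The maximum value would land in bin k, so clamp it into the top bin.
    return {name: min(int((v - lo) / width), k - 1) for name, v in values.items()}

def quantile_classes(values, k):
    """Put roughly the same number of regions into each of k bins."""
    ordered = sorted(values, key=values.get)
    n = len(ordered)
    return {name: rank * k // n for rank, name in enumerate(ordered)}

worst_by_count = max(counts, key=counts.get)  # region with most total deaths
worst_by_rate = max(rates, key=rates.get)     # region with highest death rate

equal_classes = equal_interval_classes(rates, 3)
quant_classes = quantile_classes(rates, 3)

print("Most deaths:", worst_by_count, "| Highest rate:", worst_by_rate)
print("Equal-interval classes:", equal_classes)
print("Quantile classes:     ", quant_classes)
```

With these made-up numbers, the largest region leads on total deaths while the smallest leads on rate. Equal-interval breaks leave the middle class empty, so the map shows only two shades, while quantile breaks spread the regions evenly across three. Neither map is false, but each suggests a different story.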
Breakouts
We broke out into three groups for discussion: Cases, Activities and Tools.
Cases
Catherine led the cases group. They discussed some of the high-profile cases that Catherine mentioned in her presentation and also mentioned others:
http://abena.crowdmap.com - This is Mohammed's project advocating for taking down the regime in North Sudan during the Arab Spring. The government shut down the Internet. His group designed the crowd map to get information out about this. The complicating factor was that they had a particular political position (against the regime) and international outlets started to use them as an unbiased source of information. So they found that they needed to move towards a more neutral position in order to be trusted and get information out that wasn't otherwise flowing.
Snowden - We discussed the Snowden case as a good example of data and ethics that highlighted misuse of personal and other data.
Disney Bands - Like a FitBit that tracks you around the theme park. It gives participants access to certain rides. It's convenient. Disney gets all the data about exactly where you go and when.
Bringing data & ethics to the university - We discussed how you might make these cases relevant to college students on campus by looking at how universities are using data, what they track about their students.
Sudan truck drivers - In Sudan, there is a company, like UPS in the States, that tracks all of its trucks with GPS and other sensors.
Activities
Dalia led the activities group which was charged with coming up with activities to do in a classroom or workshop setting that would teach ethics and data.
Tell different stories from the same set of data.
Commodification of data - there must be a way to visualize the market value of data to push for the "why should you care?"
Data literacy - learning how to understand data (Correlation vs causation) there are a number of visuals that have.
Ecosystem visualizations - shows aerial views of systems, showing the different layers and tradeoffs.
Creating a game to show how data is power and how the data is used.
It depends on the context and who the audience is, e.g. whether it’s a journalism article or aimed at other audiences.
WTF Visualizations - critiquing existing projects and finding what’s wrong with them.
Try to identify people in the other group with open data, e.g. me & my shadow. Fill in the blanks / discrepancies of personal data.
Tools
Sarah Williams led the tools group which discussed tools for teaching ethics and data. Their list was as follows:
Mapping or graphing tool (Excel, CartoDB) to do activity comparing
Best way to teach data ethics is through activities
Problems include templatization and out-of-the-box tools that are too easy to use
Importance of annotating tools and techniques to teach ethical issues
OpenPass tracks your phone for a week; you can download the XML file and put it on a map to have a conversation around data and privacy.
IFTTT can help you track a hashtag and explore data that you’ve captured yourself. It lets you explore biases and teaches you about noise.
Conclusion
This session surfaced multiple dimensions of data and ethics, including ethics in collection and storage of data, ethics in analyzing and presenting data, and ethics in communicating about data to the public. We heard about ethical issues faced by group members related to scientific research, journalism education, public policy-making and film. We shared numerous ideas around cases, activities and tools we can use to teach data ethics in a workshop or classroom setting.