Data Portraits

From Berkman Klein Google Summer of Code Wiki
This page is for an old project that is not currently part of Google Summer of Code. If you are a student looking for projects to get involved with, we suggest you check out the projects linked from the main page of this wiki.

The long-term goal of the project is to develop a series of visualizations of people based on their digital data. (See http://vivatropolis.com/judith/papers/DataPortraits.Siggraph.Leonardo.pdf )

This project will focus on portraying Twitter users. The goal of the project is to create a visualization that gives the viewer a more intuitive sense of the interests of a Twitter user and their role in the community.


The first stage of the project is data collection:

  • writing the code to download a given user's tweets
  • downloading the tweets of those they follow
  • summarizing the user's followers:
    • how many followers each one has
    • how many accounts each one follows

The second part of the project is visualization:

  • designing and coding an evocative, legible and visually appealing representation of this data
  • topic modeling and other NLP analysis of the subject's postings
  • Processing is recommended, but other suggestions are welcome


Key skills:

  • linguistic analysis
  • graphic design and animation
  • database management


UPDATE

Several of you have asked for more detail about how I envision the project and about what information I am seeking in the application.

Here is one scenario:

One of the things that is interesting about Twitter is the asymmetric social structure. Someone who follows all and only those people who follow them will have an entirely reciprocal network, but most people have a mix of reciprocal relationships and one-way ones (both followers whom they do not follow and followees who do not follow them).

The scale and balance of these relationships is revealing. Someone with far more followers than they follow uses Twitter more in a publishing mode or is a celebrity. Those with predominantly reciprocal relationships may use it more socially. So, the portraits should show this scale and balance.
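One way to make this scale and balance concrete is to compare the set of followers with the set of followees. The sketch below is only an illustration: the function name, the handles, and the "reciprocity" ratio (reciprocal links divided by all distinct connections) are assumptions, not a measure defined by the project.

```python
def relationship_balance(followers, following):
    """Split a user's connections into reciprocal and one-way relationships."""
    followers, following = set(followers), set(following)
    reciprocal = followers & following      # they follow each other
    followers_only = followers - following  # follow the subject, not followed back
    following_only = following - followers  # followed by the subject, do not follow back
    total = len(followers | following)
    return {
        "reciprocal": len(reciprocal),
        "followers_only": len(followers_only),
        "following_only": len(following_only),
        # Share of all distinct connections that are mutual (0.0 for an empty network).
        "reciprocity": len(reciprocal) / total if total else 0.0,
    }


# Made-up handles, for illustration only.
balance = relationship_balance(
    followers={"ann", "bob", "cy", "dee"},
    following={"bob", "cy", "eve"},
)
print(balance)
```

A reciprocity near 1 would suggest mostly social use, while a large followers_only count relative to the rest would suggest the publishing mode described above.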

What the subject says is also interesting. So another challenge is to represent compactly the gist of what they say. There are many possible approaches here - from simple representations of typical words to topic modeling or sentiment analysis. Are they someone who posts about politics? TV shows? What they ate for breakfast? The rhythm of their postings is also relevant.
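The simplest of these approaches, typical words plus posting rhythm, might be sketched as follows. This is a deliberately crude illustration: the stopword list, the tokenizing rule, and the sample tweets and timestamps are all made up, and a real portrait would more likely rely on topic modeling or a proper NLP toolkit.

```python
import re
from collections import Counter
from datetime import datetime

# Tiny illustrative stopword list; a real one would be much longer.
STOPWORDS = {"the", "and", "for", "is", "on", "of", "to", "in", "at", "my"}


def typical_words(tweets, top=3):
    """Return the most frequent non-stopword words across a list of tweet texts."""
    words = []
    for text in tweets:
        for word in re.findall(r"[a-z']+", text.lower()):
            if word not in STOPWORDS and len(word) > 2:
                words.append(word)
    return Counter(words).most_common(top)


def posts_per_hour(timestamps):
    """Count postings by hour of day, a rough view of posting rhythm."""
    return Counter(ts.hour for ts in timestamps)


# Made-up sample data, for illustration only.
tweets = [
    "Coffee and pancakes for breakfast",
    "Debate night: the election coverage is on",
    "More election news before breakfast",
    "Election results tonight",
]
times = [
    datetime(2012, 3, 1, 8, 5),
    datetime(2012, 3, 1, 21, 0),
    datetime(2012, 3, 2, 8, 30),
    datetime(2012, 3, 2, 21, 45),
]
top_words = typical_words(tweets, top=2)
rhythm = posts_per_hour(times)
print(top_words, dict(rhythm))
```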

Similarly, what the people the subject follows have to say is also interesting, for it shows what the subject sees when using Twitter. So, along with the number of followees, we want to show something of the stream that the subject sees.

The subject's followers are of interest especially in terms of what they say about the subject's reach. Do they themselves have many followers?

Is the subject one of a few people they follow, or is he or she mostly followed by people who follow lots of others? Are the subject's words retweeted?

Can we find patterns of interaction: retweets, @mentions? Does the subject use hashtags (#s)? Are they for, say, conferences, or are they the more social topic-of-the-moment kind?
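A first pass at these interaction patterns can be made from the tweet text alone. The sketch below assumes plain-text tweets and the informal "RT @user" prefix convention for retweets; the regular expressions, function name, and sample tweets are illustrative assumptions, not part of any Twitter library.

```python
import re
from collections import Counter

# (?<!\w) keeps e-mail addresses such as me@example.com from counting as mentions.
MENTION = re.compile(r"(?<!\w)@(\w+)")
HASHTAG = re.compile(r"(?<!\w)#(\w+)")
RETWEET = re.compile(r"^RT @(\w+)")  # assumes the text-style "RT @user" prefix


def interaction_patterns(tweets):
    """Count retweeted users, @mentions and hashtags in a list of tweet texts."""
    mentions, hashtags, retweeted = Counter(), Counter(), Counter()
    for text in tweets:
        rt = RETWEET.match(text)
        if rt:
            retweeted[rt.group(1).lower()] += 1
            text = text[rt.end():]  # do not count the retweeted user as a mention too
        mentions.update(m.lower() for m in MENTION.findall(text))
        hashtags.update(h.lower() for h in HASHTAG.findall(text))
    return {"mentions": mentions, "hashtags": hashtags, "retweeted": retweeted}


# Made-up sample tweets, for illustration only.
patterns = interaction_patterns([
    "RT @alice: Slides from #CHI2012 are up",
    "@bob see you at #CHI2012",
    "Lunch with @bob and @carol #yum",
    "email me at me@example.com",
])
print(patterns)
```

Telling conference hashtags apart from topic-of-the-moment ones would need more than counting, for example looking at how long each hashtag stays in use.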

Such a portrait could help people make sense of others they see on Twitter. Say someone retweets you, or a friend of yours is in a discussion with them: who is this person? This portrait could answer that at a glance.

It would make most sense as part of a group of portraits, where you could see how people differed from each other in this depiction.

It could be multi-layered and interactive: something with a compact initial version that you could explore more deeply. For example, it might go from a small picture portraying the highlights of the data to a big and detailed one that could then be explored to show the network of connections and their inter-relationships, starting with this user.

A version of this could be an exhibition piece - data portrait as art exhibit. I would certainly like it to be a publicly accessible web application that people could use to see themselves and others on Twitter in a new way.

It should be a model for people to realize the potential of a richly detailed visualization as an "avatar". In particular, given the rush to insist on "real names" in many discussion groups, I think such portraits can make the argument that a pseudonymous identity with an extensive and intuitively depicted data history can be for many purposes a better form of identification.


In your application, be sure to include your relevant background. What coding projects have you done, and what technologies and languages are you familiar with? Give me an idea of what you find interesting about this project. You might want to outline how you would approach implementing it.


Mentor: jdonath@cyber.law.harvard.edu

General Questions: berkmancenterharvard@gmail.com