Media Source Ideology from Twitter User Ideologies

Summary

The goal of this notebook is to estimate the ideology of various media sources, building on Pablo Barberá's 2015 paper, "Birds of the Same Feather Tweet Together: Bayesian Ideal Point Estimation Using Twitter Data". That paper estimates the political ideology ("ideal point") of Twitter users by looking at who follows them (for "elites"), and who they follow (for the rest of us).

Barberá has some code for estimating user ideologies around the 2016 U.S. Presidential election. I ran a slightly modified version of that code, which resulted in ideology estimates for around 54 million politically engaged Twitter users.

I picked a random sample of 10,000 users from that set and fetched their full Twitter histories. I filtered that set of tweets down to those containing URLs, then normalized each URL down to its registered domain so that tweets can be grouped by domain.
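That URL-to-domain step boils down to stripping the scheme, path, and subdomains. Here's a minimal sketch of the idea (a naive two-label approximation; the real pipeline uses a public-suffix-aware library like tldextract, which handles multi-part suffixes like bbc.co.uk correctly):

```python
from urllib.parse import urlparse

def to_registered_domain(url: str) -> str:
    """Naively reduce a URL to its last two host labels.

    Good enough for 'www.nytimes.com' -> 'nytimes.com'; a real pipeline
    should use a public-suffix-aware library (e.g. tldextract) so that
    'www.bbc.co.uk' doesn't collapse to 'co.uk'.
    """
    host = urlparse(url).hostname or ""
    labels = host.split(".")
    return ".".join(labels[-2:]) if len(labels) >= 2 else host

print(to_registered_domain("https://www.nytimes.com/2016/11/08/us/politics/"))  # nytimes.com
```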

For each domain, I gathered together the unique users that tweeted that domain. I excluded domains with fewer than 30 unique tweeters from further analysis. For the 647 domains remaining, I took the mean of the estimated political ideologies of the users that tweeted them out.

Here's a graph of the mean audience ideologies of a handful of known news media sites:

In [27]:
_ = domain_ideo_means.sort_values(ascending=False).plot.barh(figsize=(10, 18), legend=False)

It looks pretty good to me. Below, I do a bit more validation work, comparing it to other estimators we've played with as well as mediabiasfactcheck.com's bias scores.

Methods

There are a few ways we can use Barberá's method.

  • As a good replacement for the "follows Hillary/follows Trump" rule used in previous studies. We could find all the users that share "nytimes.com", estimate their ideologies, and aggregate those distributions in some way (mean, median, weighted versions of those using estimates of a user's political interest, which the Barberá model also gives us, etc.). Even the distributions themselves should be interesting.
  • We could also estimate the ideology of the Twitter accounts associated with media sources directly. Barberá does this to some degree with NYT, Fox, etc. This doesn't scale well (we'd need a map from domain to Twitter accounts, and it assumes every media source has a Twitter account), but it strays least from the published method.
  • A third option is adapting Barberá's method to estimate the ideology of shared domains directly. I haven't thought through all the specifics yet, but it should be fairly clean to substitute "following an 'elite' account" with "sharing an 'elite' domain" and run the same code. "Elite" is defined below.
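For what it's worth, the input to that adapted estimator would be a binary user-by-domain matrix in place of Barberá's user-by-elite follower matrix. A toy sketch of its construction (the data and names here are purely illustrative):

```python
import numpy as np

# Hypothetical toy data: which "elite" domains each user has shared.
shares = {
    "user_a": {"nytimes.com", "washingtonpost.com"},
    "user_b": {"foxnews.com", "breitbart.com"},
    "user_c": {"nytimes.com", "foxnews.com"},
}
domains = sorted({d for ds in shares.values() for d in ds})
users = sorted(shares)

# Binary incidence matrix: y[i, j] = 1 if user i shared domain j,
# the analogue of the follower matrix in the published model.
y = np.array([[int(d in shares[u]) for d in domains] for u in users])
print(users, domains, y, sep="\n")
```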

Ideology from Twitter Account Shares

Let's start off by using shares by accounts with estimated ideologies. To do that, we'll first sample 10,000 Twitter users from our set and collect their Twitter histories.

Twitter User Ideologies

In [2]:
import pandas as pd
est_ideo = pd.read_csv('ideology-estimates-20160101.csv', index_col=0)
est_ideo.describe()
/home/jclark/miniconda3/envs/mediacloud/lib/python3.6/site-packages/numpy/lib/arraysetops.py:472: FutureWarning: elementwise comparison failed; returning scalar instead, but in the future will perform elementwise comparison
  mask |= (ar1 == a)
Out[2]:
theta pol.follow
count 5.411329e+07 5.411329e+07
mean 1.829597e-15 6.512130e+00
std 1.000000e+00 7.456007e+00
min -2.440438e+00 2.000000e+00
25% -5.872451e-01 3.000000e+00
50% -1.698174e-01 5.000000e+00
75% 3.300349e-01 7.000000e+00
max 6.783002e+00 5.980000e+02

Let's see what the distributions look like.

In [3]:
%matplotlib inline
import seaborn as sb
sb.set()
_ = sb.distplot(est_ideo['theta'], kde=False)

Observations

  • We're looking at ideology estimates for a little over 54 million Twitter user accounts.
  • The mean is almost exactly zero and the standard deviation is 1. The algorithm is designed to standardize the estimates that way.
  • The most common ideology is a little left of center.
  • There's a long right tail and little left tail.
  • It's not a really smooth distribution. What's that ledge on the left of the peak? Is it just an artifact of the binning?
  • The range looks unbounded, but given a stddev of 1, it should keep things fairly close.
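One cheap way to probe the binning question is to recompute the histogram at several bin widths and see whether the ledge survives. A sketch on synthetic data (the real check would pass `est_ideo['theta']` instead of the stand-in):

```python
import numpy as np

rng = np.random.default_rng(0)
theta = rng.normal(-0.2, 1.0, 50_000)  # stand-in for est_ideo['theta']

# A feature that persists across bin widths is probably real;
# one that appears at only one width is likely a binning artifact.
for bins in (50, 100, 200):
    counts, edges = np.histogram(theta, bins=bins)
    peak = edges[np.argmax(counts)]
    print(f"bins={bins:4d}  mode near {peak:+.2f}")
```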
In [4]:
g = sb.distplot(est_ideo['pol.follow'], kde=False)
_ = g.set(yscale="log", xscale='log')

Observations

  • This is the number of political "elite" accounts followed by each account
  • I log-logged this because it follows a power law, as one might expect
  • There's a mean of 6 followed elites per account
  • The power law falls apart once you approach the total number of political elites. Makes sense.
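A rough check of the power-law claim is to fit a line to the log-log histogram and look at the slope. A sketch on synthetic Pareto data (a real check would substitute `est_ideo['pol.follow']` for the stand-in):

```python
import numpy as np

rng = np.random.default_rng(1)
# Stand-in for pol.follow: heavy-tailed counts starting at 2.
follows = (rng.pareto(1.5, 100_000) + 1) * 2

# Log-spaced bins, geometric bin centers, straight-line fit in log-log space.
edges = np.logspace(np.log10(2), np.log10(600), 30)
counts, _ = np.histogram(follows, bins=edges)
centers = np.sqrt(edges[:-1] * edges[1:])
mask = counts > 0
slope, intercept = np.polyfit(np.log(centers[mask]), np.log(counts[mask]), 1)
print(f"fitted log-log slope: {slope:.2f}")  # clearly negative for power-law-like decay
```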

Now we'll actually sample 10k accounts.

In [5]:
# I won't run this again to keep from overwriting my original sample.
#with open('sampled_account_ids_10k.txt', 'w') as f:
#    for aid in est_ideo.sample(10000)['id'].values:
#        f.write(str(aid) + "\n")

I've run a few scripts outside of this notebook:

  • Collected all 10k users' tweet histories
  • Filtered all collected tweets to just those that contain URLs with grep '"urls": \[{"'
  • Unshortened, normalized, and extracted domains for all the URLs in tweets
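The unshortening step just follows HTTP redirects until the final URL. A minimal sketch of the idea using the standard library (illustrative only; the actual scripts aren't shown here, and a production version needs rate limiting and retries):

```python
from urllib.error import URLError
from urllib.request import Request, urlopen

def unshorten(url: str, timeout: float = 5.0) -> str:
    """Follow redirects (t.co, bit.ly, etc.) and return the final URL.

    Uses a HEAD request to avoid downloading page bodies; falls back
    to the original URL on any network or parsing error.
    """
    try:
        with urlopen(Request(url, method="HEAD"), timeout=timeout) as resp:
            return resp.url
    except (URLError, ValueError):
        return url
```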

I'll read all of those tweets into a data structure that maps each domain to the ideologies of the unique users that shared that domain. I'll only consider domains that were shared by at least 30 unique users.

In [6]:
import json, collections

MAX_COUNT = None  # set to an int to cap the number of tweets processed while debugging
MIN_USERS = 30    # minimum unique sharers for a domain to be kept
count = 0
domain_to_users = collections.defaultdict(set)   # domain -> set of user ids that shared it
domain_to_ideo = collections.defaultdict(list)   # domain -> ideologies of those unique users
with open('10k_sampled_accounts_tweets_with_augmented_urls.txt') as f:
    for line in f:
        if MAX_COUNT and count >= MAX_COUNT: break
        tweet = json.loads(line)
        user_id = int(tweet['user']['id'])
        try:
            user_ideo = est_ideo.at[user_id, 'theta']
        except KeyError:
            continue  # skip tweets from users without an ideology estimate
        for url in tweet['urls']:
            domain = url['domain']
            if user_id in domain_to_users[domain]: continue  # count each user once per domain
            domain_to_users[domain].add(user_id)
            domain_to_ideo[domain].append(user_ideo)
        count += 1

# Columns are domains; each column holds that domain's audience ideologies, padded with NaN.
dom_ideo = pd.DataFrame.from_dict({k: v for k, v in domain_to_ideo.items() if len(v) >= MIN_USERS}, orient='index').T

Let's see what that data looks like.

In [7]:
display(dom_ideo.shape)
dom_ideo.count().sort_values(ascending=False).head(20)
(2520, 647)
Out[7]:
twitter.com           2520
youtube.com           2032
instagram.com         1500
facebook.com          1296
nytimes.com            863
cnn.com                851
vine.co                645
google.com             643
washingtonpost.com     636
huffingtonpost.com     615
blogspot.com           557
theguardian.com        549
wordpress.com          543
apple.com              527
bbc.co.uk              512
yahoo.com              492
wsj.com                490
tumblr.com             479
twimg.com              474
time.com               455
dtype: int64

Audience Ideology of Shared Domains

OK, so we've got 647 domains. Keep in mind that we can get ideology estimates for more domains by increasing the number of users we sample; we're sampling 10k of 54M right now.

Twitter has the most unique user shares at 2,520, which makes sense. It drops off pretty quickly, as we'd expect with a power law.

Let's plot all the distributions on top of each other to see what we're working with.

In [8]:
_ = dom_ideo.plot(kind='kde', figsize=(8,8), legend=False, alpha=0.1, color='#7777dd')

Observations

  • We see the biggest peak in roughly the same place as the overall distribution of user ideologies, but it's spread a bit more, perhaps even bimodal.
  • We've clearly got two primary types of distributions here: one with peaks near the typical user, and one with peaks on the far right.
  • Again, very little in the center-right.

We've got a distribution of audience ideologies for every domain, but now we need to reduce those distributions to single scores. The obvious choice is the mean, so let's start there.
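Before settling on the mean, it's cheap to check how much the choice of reducer matters. A sketch comparing mean and median per domain, using a toy frame shaped like `dom_ideo` (columns = domains, rows = audience ideologies; domain names are made up):

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(2)
# Toy stand-in for dom_ideo: each column is one domain's audience ideologies.
toy = pd.DataFrame({
    "leftish.example": rng.normal(-0.5, 0.8, 200),
    "rightish.example": rng.normal(1.5, 0.8, 200),
})
summary = pd.DataFrame({"mean": toy.mean(), "median": toy.median()})
print(summary)
# If mean and median diverge a lot for a domain, its audience
# distribution is skewed and a single score hides that.
```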

Audience Ideology of Known Media Sites

Let's pick a subset of sites that we know a little bit about and see what their mean scores look like.

In [9]:
news_media_domains = [
    'alarabiya.net', 'aljazeera.com', 'americanthinker.com', 'bbc.com',
    'bbc.co.uk', 'bloomberg.com', 'bostonglobe.com', 'breitbart.com',
    'buzzfeed.com', 'cbc.ca', 'cbsnews.com', 'chicagotribune.com', 'cnbc.com',
    'cnn.com', 'csmonitor.com', 'dailycaller.com', 'dailykos.com',
    'dailymail.co.uk', 'economist.com', 'forbes.com', 'foreignpolicy.com',
    'fortune.com', 'foxnews.com', 'haaretz.com', 'hindustantimes.com',
    'huffingtonpost.com', 'huffpost.com', 'independent.co.uk', 'infowars.com',
    'latimes.com', 'miamiherald.com', 'motherjones.com', 'msnbc.com',
    'nationalreview.com', 'nbcnews.com', 'newsweek.com', 'newyorker.com',
    'npr.org', 'nydailynews.com', 'nypost.com', 'nytimes.com', 'pbs.org',
    'politico.com', 'propublica.org', 'realclearpolitics.com','reuters.com',
    'rollcall.com', 'rt.com', 'salon.com', 'sky.com', 'slate.com',
    'sputniknews.com', 'theatlantic.com', 'theguardian.com', 'thehill.com',
    'time.com', 'usatoday.com', 'vox.com', 'washingtonpost.com',
    'washingtontimes.com', 'weeklystandard.com', 'westernjournal.com', 'wsj.com',
    'zerohedge.com',
]
non_news_domains = [
    'aclu.org', 'change.org', 'cosmopolitan.com', 'facebook.com', 'google.com',
    'harvard.edu', 'hbr.org', 'mit.edu', 'patreon.com', 'politifact.com',
    'reddit.com', 'reuters.com', 'twitter.com', 'wikileaks.org', 'youtube.com',
]
domains = news_media_domains + non_news_domains
domain_ideo_means = dom_ideo.loc[:,news_media_domains].mean()
_ = domain_ideo_means.sort_values(ascending=False).plot.barh(figsize=(10, 18), legend=False)

Observations

  • This looks pretty good to me. Where do others think this breaks down?
  • You can see the asymmetry. There's a slow movement from left to right, and then a bunch of sites that are just really far right.

The means look so convincing that it's worth reminding ourselves how wide the audience distributions are.

In [10]:
_ = dom_ideo.loc[:,news_media_domains].T.reindex(domain_ideo_means.sort_values(ascending=False).index) \
    .T.plot.box(figsize=(10, 18), vert=False)

I made a couple more graphs of the same stuff that are pretty and somewhat elucidating, so here they are.

In [12]:
import matplotlib.colors as mcol
pol_cm = mcol.LinearSegmentedColormap.from_list("Pol",["#3771f3", "#b147cc", "#d62222"])
_ = dom_ideo.T.loc[news_media_domains,].reindex(domain_ideo_means.sort_values().index)\
    .T.plot(subplots=True, kind='kde', layout=(10,8), figsize=(16,16), sharex=True, sharey=True, colormap=pol_cm)
In [13]:
_ = dom_ideo.T.loc[news_media_domains,].reindex(domain_ideo_means.sort_values().index)\
    .T.plot(kind='kde', figsize=(8,8), legend=False, colormap=pol_cm, alpha=0.5)
In [14]:
import joypy
_ = joypy.joyplot(dom_ideo.T.loc[news_media_domains,].reindex(domain_ideo_means.sort_values().index).T,
                 overlap=0.6, figsize=(8, 12), alpha=0.95, x_range=(-2.3, 5), linewidth=0.5, bw_method=0.15,
                  colormap=pol_cm)

Comparing to Other Ideology Estimators

Let's compare to the 2016 election Trump/HRC retweet ideology estimation, and the ideology estimator from Congressional tweets I was working on.

In [15]:
import tldextract
retweeter_ideo = pd.read_csv('election_retweeter_polarization_media_scores.csv')
retweeter_ideo['domain'] = retweeter_ideo['url'].apply(lambda u: tldextract.extract(u).registered_domain)
retweeter_ideo.set_index('domain', inplace=True)
# Remove duplicates. See discussion in "Media Source Partisanship as Measured by Congressional Tweets"
retweeter_ideo = retweeter_ideo[~retweeter_ideo.index.duplicated(keep=False)].dropna()
retweeter_ideo.head(10)
Out[15]:
media_id name url score partition
domain
nytimes.com 1 New York Times http://nytimes.com -0.471190 2
washingtonpost.com 2 Washington Post http://washingtonpost.com -0.452138 2
csmonitor.com 3 Christian Science Monitor http://csmonitor.com -0.482005 2
latimes.com 6 LA Times http://www.latimes.com/ -0.323638 2
nypost.com 7 NY Post http://www.nypost.com/ 0.805616 5
nydailynews.com 8 Daily News http://www.nydailynews.com/ -0.586536 2
chicagotribune.com 9 Chicago Tribune http://www.chicagotribune.com/ -0.274454 2
chron.com 10 houstonchronicle http://www.chron.com/ -0.202598 2
dallasnews.com 12 Dallas Morning News http://www.dallasnews.com/ 0.062767 3
newsday.com 13 Newsday http://www.newsday.com/ -0.061521 3
In [16]:
audience_ideo = dom_ideo.mean().T
audience_ideo.name = 'ideo_by_mean_audience_ideo'
congress_tweet_ideo = pd.read_csv('media_partisanship_from_congressional_tweets.csv', index_col=0)
joined_ideo_est = retweeter_ideo.join(congress_tweet_ideo).join(audience_ideo).dropna()
joined_ideo_est = joined_ideo_est.rename({
    'score': 'ideo_by_retweet',
    'congress_dwnom': 'ideo_by_congress_tweet',
    'score_by_followers': 'ideo_by_mean_audience_ideo'}, axis='columns')
joined_ideo_est.head()
Out[16]:
media_id name url ideo_by_retweet partition ideo_by_congress_tweet num_sharers num_shares congress_party_count ideo_by_mean_audience_ideo
domain
nytimes.com 1 New York Times http://nytimes.com -0.471190 2 -0.280897 461.0 13803.0 -0.036876 -0.123368
washingtonpost.com 2 Washington Post http://washingtonpost.com -0.452138 2 -0.084014 500.0 15994.0 0.036000 -0.057838
csmonitor.com 3 Christian Science Monitor http://csmonitor.com -0.482005 2 0.030621 66.0 121.0 -0.090909 0.162700
latimes.com 6 LA Times http://www.latimes.com/ -0.323638 2 -0.288183 294.0 2268.0 -0.176871 0.020318
nypost.com 7 NY Post http://www.nypost.com/ 0.805616 5 0.221191 103.0 290.0 0.436893 0.668709
In [17]:
sb.set(style="ticks", color_codes=True)
_ = sb.pairplot(joined_ideo_est, vars=['ideo_by_retweet', 'ideo_by_congress_tweet', 'ideo_by_mean_audience_ideo'])
In [18]:
print("Num sites:", joined_ideo_est.shape[0])
joined_ideo_est.loc[:,['ideo_by_retweet', 'ideo_by_congress_tweet', 'ideo_by_mean_audience_ideo']].corr()
Num sites: 285
Out[18]:
ideo_by_retweet ideo_by_congress_tweet ideo_by_mean_audience_ideo
ideo_by_retweet 1.000000 0.637661 0.708548
ideo_by_congress_tweet 0.637661 1.000000 0.744543
ideo_by_mean_audience_ideo 0.708548 0.744543 1.000000

Observations

  • The distributions look pretty different from one another, though they all have a right tail.
  • They all tend to agree more with each other on the left than on the right.
  • In all the plots you need to squint a little, but you can see a big center-left cluster and a smaller right cluster. They all seem to agree on that.

Comparing to Media Bias/Fact Check

I don't like that I'm not validating against external datasets. I've requested access to the Facebook estimations from 2015, but until that comes through, let's look at Media Bias/Fact Check data. I couldn't find an official source for it, but I did find someone who scraped the site and put up their ratings here. Let's add that into the mix. The data I have gives a text tag for each domain ("left", "right", etc.), so I turned those into points on a scale from -1 to 1.

In [19]:
mbfc = pd.read_csv('domain_information.csv', index_col=1)
mbfc_ideo = mbfc['mediabiasfactcheck'].dropna().map({
    'left': -1.0,
    'left_center': -0.5,
    'least_biased': 0.0,
    'right_center': 0.5,
    'right': 1.0}).dropna().sort_values()
In [20]:
with_mbfc_ideo_est = joined_ideo_est.join(mbfc_ideo)\
    .rename({'mediabiasfactcheck': 'ideo_by_mbfc'}, axis='columns').dropna()
_ = sb.pairplot(with_mbfc_ideo_est,
    vars=['ideo_by_retweet', 'ideo_by_congress_tweet', 'ideo_by_mean_audience_ideo', 'ideo_by_mbfc'])
sb.set()
In [21]:
print("Num sites:", with_mbfc_ideo_est.shape[0])
with_mbfc_ideo_est\
    .loc[:,['ideo_by_retweet', 'ideo_by_congress_tweet', 'ideo_by_mean_audience_ideo', 'ideo_by_mbfc']].corr()
Num sites: 167
Out[21]:
ideo_by_retweet ideo_by_congress_tweet ideo_by_mean_audience_ideo ideo_by_mbfc
ideo_by_retweet 1.000000 0.775512 0.806664 0.791554
ideo_by_congress_tweet 0.775512 1.000000 0.830251 0.801584
ideo_by_mean_audience_ideo 0.806664 0.830251 1.000000 0.856908
ideo_by_mbfc 0.791554 0.801584 0.856908 1.000000

Observations

  • Dropping the sites that weren't in MBFC (285 down to 167) upped the correlations between everything.
  • Everything shows some agreement with MBFC.
  • Mean audience ideology looks the best relative to MBFC, though not great for the far right. Retweet looks better there.
  • Looking at the distribution for MBFC, it follows a pretty similar pattern to the other distributions, but we can't trust their process for selecting domains to rate.

Comparing to Facebook Ideology Estimations

In [39]:
facebook_ideo = pd.read_csv('facebook_ideology_estimates.csv', index_col=0,
                           converters={'domain': lambda d: tldextract.extract(d).registered_domain})
with_facebook_ideo_est = joined_ideo_est.join(facebook_ideo)\
    .rename({'avg_align': 'ideo_by_facebook'}, axis='columns').dropna()
with_facebook_ideo_est = with_facebook_ideo_est[~with_facebook_ideo_est.index.duplicated(keep=False)].dropna()
print("Num sites:", with_facebook_ideo_est.shape[0])
with_facebook_ideo_est\
    .loc[:,['ideo_by_retweet', 'ideo_by_congress_tweet', 'ideo_by_mean_audience_ideo', 'ideo_by_facebook']].corr()
Num sites: 140
Out[39]:
ideo_by_retweet ideo_by_congress_tweet ideo_by_mean_audience_ideo ideo_by_facebook
ideo_by_retweet 1.000000 0.833411 0.854780 0.897768
ideo_by_congress_tweet 0.833411 1.000000 0.841798 0.908121
ideo_by_mean_audience_ideo 0.854780 0.841798 1.000000 0.939189
ideo_by_facebook 0.897768 0.908121 0.939189 1.000000
In [40]:
sb.set(style='ticks')
_ = sb.pairplot(with_facebook_ideo_est,
    vars=['ideo_by_retweet', 'ideo_by_congress_tweet', 'ideo_by_mean_audience_ideo', 'ideo_by_facebook'])
sb.set()

Observations

  • Facebook and Mean Audience Ideology have the highest correlation.

Let's dig into the outliers.

In [41]:
import plotly.offline as plotly
import plotly.graph_objs as go

plotly.init_notebook_mode()

scatter1 = go.Scattergl(
    y=joined_ideo_est['ideo_by_retweet'],
    x=joined_ideo_est['ideo_by_mean_audience_ideo'],
    mode='markers',
    text=joined_ideo_est.index,
    marker=dict(
        #color=joined_scores.index.isin(news_media_domains) * 1
    )
)

layout1 = go.Layout(
    title ='Comparison of Ideology Score Metrics',
    hovermode = 'closest',
    xaxis = dict(title = 'Ideology by Mean Audience Ideology'),
    yaxis = dict(title = 'Ideology by Trump/HRC Retweet'),
)

scatter2 = go.Scattergl(
    y=joined_ideo_est['ideo_by_congress_tweet'],
    x=joined_ideo_est['ideo_by_mean_audience_ideo'],
    mode='markers',
    text=joined_ideo_est.index,
    marker=dict(
        #color=joined_scores.index.isin(news_media_domains) * 1
    )
)

layout2 = go.Layout(
    title = 'Comparison of Ideology Score Metrics',
    hovermode = 'closest',
    xaxis = dict(title = 'Ideology by Mean Audience Ideology'),
    yaxis = dict(title = 'Ideology by Congressional Tweets'),
)

scatter3 = go.Scattergl(
    y=with_mbfc_ideo_est['ideo_by_mbfc'],
    x=with_mbfc_ideo_est['ideo_by_mean_audience_ideo'],
    mode='markers',
    text=with_mbfc_ideo_est.index,
    marker=dict(
        #color=joined_scores.index.isin(news_media_domains) * 1
    )
)

layout3 = go.Layout(
    title = 'Comparison of Ideology Score Metrics',
    hovermode = 'closest',
    xaxis = dict(title = 'Ideology by Mean Audience Ideology'),
    yaxis = dict(title = 'Ideology by Media Bias/Fact Check'),
)

scatter4 = go.Scattergl(
    y=with_facebook_ideo_est['ideo_by_facebook'],
    x=with_facebook_ideo_est['ideo_by_mean_audience_ideo'],
    mode='markers',
    text=with_facebook_ideo_est.index,
    marker=dict(
        #color=joined_scores.index.isin(news_media_domains) * 1
    )
)

layout4 = go.Layout(
    title = 'Comparison of Ideology Score Metrics',
    hovermode = 'closest',
    xaxis = dict(title = 'Ideology by Mean Audience Ideology'),
    yaxis = dict(title = 'Ideology by Facebook'),
)
In [23]:
plotly.iplot(go.Figure(data=[scatter1], layout=layout1))

Observations

  • There are different scales because ideology doesn't fall in a fixed range.
  • Correlation looks better on the left than it does on the right.
  • The retweet metric pins a bunch of sites to the far right.
  • The most disagreed-upon domains are often general platform sites (wixsite.com, rebelmouse.com, dropbox.com, gofundme.com, etc.) that the retweet metric pins to the far right.
  • sun-sentinel.com, orlandosentinel.com, sfchronicle.com, hindustantimes.com, and torontosun.com are media outlets whose scores disagree between methods.
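Rather than eyeballing the scatter, the biggest disagreements can be pulled out programmatically by putting both metrics on a common scale and ranking the absolute gap. A sketch on toy data (the real call would use columns of `joined_ideo_est`):

```python
import pandas as pd

# Toy stand-in for two ideology metrics over a few domains.
df = pd.DataFrame({
    "ideo_by_retweet":            [0.9, -0.4, 0.8, -0.5],
    "ideo_by_mean_audience_ideo": [0.2, -0.3, 0.7, -0.6],
}, index=["wixsite.com", "nytimes.com", "foxnews.com", "vox.com"])

z = (df - df.mean()) / df.std()            # z-score both metrics to a common scale
gap = (z.iloc[:, 0] - z.iloc[:, 1]).abs()  # disagreement per domain
print(gap.sort_values(ascending=False).head())
```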
In [24]:
plotly.iplot(go.Figure(data=[scatter2], layout=layout2))

Observations

  • liveleak.com, imgur.com, wikileaks.org, donaldjtrump.com, sun-sentinel.com, torontosun.com, thefreethoughtproject.com are outliers.
In [25]:
plotly.iplot(go.Figure(data=[scatter3], layout=layout3))

Observations

  • Overall, it looks pretty good
  • observer.com, torontosun.com, timesofisrael.com, mediaite.com are the big outliers.
In [42]:
plotly.iplot(go.Figure(data=[scatter4], layout=layout4))

Observations

  • This looks really clean.
  • Audience Ideology spreads out the far right a bit.
  • We see a lot of the same outliers across each pairwise scatter plot. Something special about those?

Future Work

  • I'm being really sloppy about temporal issues here. I've estimated user ideologies using follower lists from July 2018, but I've used full Twitter sharing histories dating back a number of years. Users and domains can shift ideologies over those timescales, so I should handle this better.
  • I need to look at the sensitivity of the ideology estimates to altering the set of "elite" accounts. My hunch is that it just makes it harder to find Twitter accounts that follow enough elites, but doesn't actually change estimates all that much.
  • I haven't tried either of the other two methods I listed up in the Methods section. This looks pretty good though.

Odds and Ends

This might be an interesting complement to the other work I was doing around Congressional sharing on Twitter and DW-NOMINATE to estimate source ideology. Say we bucket Congressional Twitter users and regular Twitter users by ideology: how does the sharing of given domains vary between Congress and regular folks within those buckets? Which domain-sharing patterns look the same between Congress and citizens, and which look different? Are those patterns growing more alike or more different over time? Across the whole ideology spectrum? Who's shifting?
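The bucketing idea could start from something as simple as `pd.cut` on the ideology scores. A hypothetical sketch on toy data (the column names and bucket edges are illustrative, not from the actual pipeline):

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(3)
# Toy stand-in: one row per (user, shared domain), with the user's theta.
shares = pd.DataFrame({
    "theta": rng.normal(0, 1, 1000),
    "domain": rng.choice(["nytimes.com", "foxnews.com"], 1000),
})
shares["bucket"] = pd.cut(shares["theta"], bins=[-3, -1, 0, 1, 3],
                          labels=["left", "center-left", "center-right", "right"])
# Share counts per domain within each ideology bucket; the same table
# built from Congressional accounts would be the comparison point.
table = shares.pivot_table(index="bucket", columns="domain",
                           aggfunc="size", fill_value=0, observed=False)
print(table)
```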

In [26]:
scatter3d = go.Scatter3d(
    x=joined_ideo_est['ideo_by_congress_tweet'],
    y=joined_ideo_est['ideo_by_mean_audience_ideo'],
    z=joined_ideo_est['ideo_by_retweet'],
    mode='markers',
    text=joined_ideo_est.index,
    marker=dict(
        size=6
    )
)

layout3d = go.Layout(
    title= 'Comparison of Ideology Score Metrics',
)

plotly.iplot(go.Figure(data=[scatter3d], layout=layout3d))

Output from Running Barberá's 2016 Election Scripts

> # who is on the extremes
> head(users[order(users$phi1),])
            twitter                  name gender    party       phi1       phi2        phi3
547 senkamalaharris      Kamala D. Harris      F Democrat -0.9964626 -0.4306486 -1.61173590
308   repjoekennedy Joseph P. Kennedy III      M Democrat -0.9884050  0.2418400 -0.21613413
363 repmaxinewaters         Maxine Waters      F Democrat -0.9550397 -0.4858503 -1.80167847
160   repadamschiff        Adam B. Schiff      M Democrat -0.9473548 -0.1116719 -1.42363724
315    repjohnlewis            John Lewis      M Democrat -0.9416533 -0.4777571 -1.64266921
535    senfeinstein      Dianne Feinstein      F Democrat -0.9304100  0.3827524  0.00955195

> tail(users[order(users$phi1),])
            twitter            name gender      party     phi1       phi2        phi3
601  warrendavidson Warren Davidson      M Republican 2.317007  0.6290465 -0.56455723
102     judgecarter  John R. Carter      M Republican 2.317354 -1.0996292  0.02720010
326  repkenmarchant  Kenny Marchant      M Republican 2.319707 -1.1509091  0.17487176
182   repbillflores     Bill Flores      M Republican 2.322245 -1.1054425  0.06811868
348 replouiegohmert   Louie Gohmert      M Republican 2.390454  2.6556838 -1.65378962
362    repmattgaetz      Matt Gaetz      M Republican 2.455506  3.3088650 -2.02016381

> # primary candidates
> users <- users[order(users$phi1),]
> users[users$type=="Primary Candidate",c("screen_name", "phi1")]
        screen_name       phi1
11    BernieSanders -0.6135061
79   HillaryClinton -0.5506683
121   MartinOMalley -0.4106683
157 realDonaldTrump  0.1744603
111   LincolnChafee  0.1777569
154        RandPaul  0.2044847
101      JohnKasich  0.3302630
115      marcorubio  0.3618459
96       JimWebbUSA  0.6485842
89          JebBush  0.6557951
67      GovChristie  0.8206196
71  GovMikeHuckabee  0.8545968
73       GrahamBlog  1.0938885
580         tedcruz  1.1070568
25     CarlyFiorina  1.1892254
68   GovernorPataki  1.2859452
156   RealBenCarson  1.3563719
70      gov_gilmore  1.4005577
479    RickSantorum  1.5283071
491     ScottWalker  1.6191919
69    GovernorPerry  1.6686204
17      BobbyJindal  1.7090644
>
> #         screen_name        phi1
> # 548      SenSanders -0.92013210
> # 129   MartinOMalley -0.90287394
> # 117   LincolnChafee -0.73712461
> # 83   HillaryClinton -0.60077731
> # 102      JimWebbUSA -0.04502052
> # 70      gov_gilmore  0.40719497
> # 71      GovChristie  0.59047439
> # 76       GrahamBlog  0.69157606
> # 108      JohnKasich  0.77414571
> # 95          JebBush  0.79967822
> # 72   GovernorPataki  0.82375241
> # 169 realDonaldTrump  0.88911528
> # 472    RickSantorum  1.16065941
> # 123      marcorubio  1.23982284
> # 25     CarlyFiorina  1.24576870
> # 74  GovMikeHuckabee  1.29905012
> # 17      BobbyJindal  1.32647736
> # 73    GovernorPerry  1.34255394
> # 484     ScottWalker  1.43519830
> # 164        RandPaul  1.44800182
> # 569         tedcruz  1.68322519
> # 168   RealBenCarson  1.70647329
>
> # others
> users[users$type=="Media Outlets",c("screen_name", "phi1")]
       screen_name        phi1
573  StephenAtHome -0.69375065
130    MotherJones -0.66313351
583   TheDailyShow -0.65338527
44        dailykos -0.59726026
585  thinkprogress -0.58887694
131          MSNBC -0.48343146
56          edshow -0.47690322
570          Slate -0.43370275
136      NewYorker -0.41769496
140        nprnews -0.40501285
143        nytimes -0.33360566
4             ajam -0.31918381
83     HuffPostPol -0.31757079
76      GuardianUS -0.30789882
134       NewsHour -0.30566222
602 washingtonpost -0.27665626
33             CNN -0.26960885
1              ABC -0.20768687
9         BBCWorld -0.20749533
133        NBCNews -0.20449353
151       politico -0.18948322
28         CBSNews -0.05804963
591       USATODAY  0.01182303
55          EconUS  0.07932249
23     BuzzFeedPol  0.07964434
604            WSJ  0.09065961
605      YahooNews  0.25306298
15       Bloomberg  0.38091176
58         FoxNews  0.83977682
54   DRUDGE_REPORT  1.49262192
21   BreitbartNews  1.60689951
582       theblaze  1.69644715
487   rushlimbaugh  2.01593693

> users[users$type=="Journalists",c("screen_name", "phi1")]
        screen_name       phi1
114          maddow -0.6821680
126    MHarrisPerry -0.6371607
6    andersoncooper -0.4633423
75  GStephanopoulos -0.2108086
125      megynkelly  0.8412498
492     seanhannity  0.8799218
7        AnnCoulter  1.1974140
64        glennbeck  1.4937796
145   oreillyfactor  1.7845476
110        limbaugh  2.0388439

> users[users$type=="Other Politicians",c("screen_name", "phi1")]
       screen_name       phi1
98        JoeBiden -0.7130537
5           algore -0.5471131
584   TheDemocrats -0.4957447
13     BillClinton -0.4820982
495     SenateDems -0.4268991
80  HouseDemocrats -0.3663539
48            dccc -0.3168939
152          POTUS  0.2617762
60    GeorgeHWBush  0.5469928
65             GOP  0.9547741
81        HouseGOP  1.0075366
135   newtgingrich  1.2309574
490  SarahPalinUSA  1.2785546
105       KarlRove  1.4705073