In [1]:
import pickle
import tldextract, joypy
import pandas as pd
import seaborn as sns
%matplotlib inline
sns.set()

NewsGuard Data

Let's take a look at what NewsGuard has for data. A CSV of NewsGuard data joined with Facebook ideology scores and Media Cloud election retweeter scores is available here.

Getting Data

They didn't have any policies I could find against scraping data, so I wrote a quick scraper that tried to be nice about it. I couldn't find a way to get them to list the sites they had rated, so I put together a list of ~22,700 unique domains we had seen in some of our research. I then queried their system for each of those domains and stored all the data.
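The scraper itself isn't shown in this notebook, so here is only a minimal sketch of what "being nice about it" could look like: query each unique domain once and pause between requests. The names `polite_scrape`, `fetch`, and `fake_fetch` are hypothetical, and `fetch` stands in for whatever call actually retrieves a rating; nothing here reflects NewsGuard's real endpoints.

```python
import time

def polite_scrape(domains, fetch, delay_seconds=1.0, sleep=time.sleep):
    """Query each unique domain once, pausing between requests."""
    results = {}
    for domain in sorted(set(domains)):
        try:
            results[domain] = fetch(domain)
        except Exception as exc:
            # Record the failure and keep going rather than aborting the run.
            results[domain] = {'id': None, 'error': str(exc)}
        sleep(delay_seconds)
    return results

def fake_fetch(domain):
    # Hypothetical stand-in for the real lookup; returns canned data.
    known = {'nytimes.com': {'id': 'abc', 'rank': 'T'}}
    return known.get(domain, {'id': None, 'rank': None})

demo = polite_scrape(['nytimes.com', 'example.org', 'nytimes.com'],
                     fake_fetch, delay_seconds=0)
print(sorted(demo))
```

Deduplicating before querying keeps the request count down, and storing a placeholder for failures means a single bad domain doesn't lose the rest of the run.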

Looking at the Data

Their data comes back as a fairly detailed, multi-level JSON document. Here's an example document:

In [2]:
# Use a context manager so the file handle is closed after loading.
with open('scrape_news_domains_20180827T14:08:00Z-0400.pkl', 'rb') as f:
    ratings = pickle.load(f)
ratings['nytimes.com']
Out[2]:
{'active': True,
 'byGordon': True,
 'bySteve': True,
 'byline': {'bio': 'https://www.newsguardtech.com/about/team?p=161',
  'id': '40d750d0-ef6b-40e3-87ac-b137c362390f',
  'name': 'Jim Warren',
  'type': 'AUTHOR'},
 'contributor': None,
 'country': 'US',
 'createdBy': None,
 'createdDate': None,
 'criteria': [{'body': 'Yes', 'title': 'falseContent'},
  {'body': 'Yes', 'title': 'basicStandards'},
  {'body': 'Yes', 'title': 'newsOpinion'},
  {'body': 'Yes', 'title': 'deceptiveHeadlines'},
  {'body': 'Yes', 'title': 'accountabilityPractices'},
  {'body': 'Yes', 'title': 'ownership'},
  {'body': 'Yes', 'title': 'labelsAdvertising'},
  {'body': 'Yes', 'title': 'management'},
  {'body': 'Yes', 'title': 'contentCreators'}],
 'editor': {'bio': 'https://www.newsguardtech.com/about/team?p=150',
  'id': '317d8da3-698d-41ff-b093-6a25d90a827b',
  'name': 'Eric Effron',
  'type': 'AUTHOR'},
 'editors': [{'bio': 'https://www.newsguardtech.com/about/team?p=99',
   'id': '8d37d711-1d6b-448d-bead-fb7dd3fd1af2',
   'name': 'Gordon Crovitz',
   'type': 'AUTHOR'},
  {'bio': 'https://www.newsguardtech.com/about/team?p=100',
   'id': '24ea2478-63fb-4325-8f96-1956d03e413a',
   'name': 'Steven Brill',
   'type': 'AUTHOR'},
  {'bio': 'https://www.newsguardtech.com/about/team?p=156',
   'id': '9b98b83e-95fb-49b8-9e82-a8366dae70c8',
   'name': 'Kendrick McDonald',
   'type': 'AUTHOR'},
  {'bio': 'https://www.newsguardtech.com/about/team?p=150',
   'id': '317d8da3-698d-41ff-b093-6a25d90a827b',
   'name': 'Eric Effron',
   'type': 'AUTHOR'}],
 'id': 'd110207e-069a-49cf-a2d9-ddf34121e4d4',
 'identifier': 'nytimes.com',
 'identifierAlt': None,
 'metadata': [{'body': 'Yes', 'title': 'wikipedia'},
  {'body': 'Original Reporting,Wire Services', 'title': 'type'},
  {'body': 'Print Publication', 'title': 'medium'},
  {'body': 'Newspaper', 'title': 'printType'},
  {'body': 'National,Regional,International,Local', 'title': 'coverage'},
  {'body': 'New York', 'title': 'dma'},
  {'body': 'Yes', 'title': 'paywall'},
  {'body': 'No', 'title': 'opinion'},
  {'body': 'N/A', 'title': 'orientation'},
  {'body': 'New York Times Company', 'title': 'owner'},
  {'body': 'Public Company', 'title': 'ownertype'},
  {'body': 'facebook.com/nytimes', 'title': 'facebook'},
  {'body': '@nytimes', 'title': 'twitter'},
  {'body': 'youtube.com/nytimes', 'title': 'youtube'},
  {'body': '@nytimes', 'title': 'instagram'},
  {'body': 'linkedin.com/company/the-new-york-times/', 'title': 'linkedin'},
  {'body': 'pinterest.com/nytimes/', 'title': 'pinterest'}],
 'network': 'WEB',
 'profileId': '60e528f5-284a-4883-85de-46af0feb215f',
 'rank': 'T',
 'reviewer': {'bio': 'https://www.newsguardtech.com/about/team?p=156',
  'id': '9b98b83e-95fb-49b8-9e82-a8366dae70c8',
  'name': 'Kendrick McDonald',
  'type': 'AUTHOR'},
 'score': 100.0,
 'sources': [{'key': 'sources',
   'sources': ['Profile of paper (written by Jim Warren), including digital expansion and history: <a href="https://www.vanityfair.com/news/2017/07/new-york-times-washington-post-donald-trum" target="_blank">https://www.vanityfair.com/news/2017/07/new-york-times-washington-post-donald-trum</a>',
    '2018 first quarter results and expansion into television production: <a href="https://www.poynter.org/news/new-york-times-co-dipping-toe-television-production" target="_blank">https://www.poynter.org/news/new-york-times-co-dipping-toe-television-production</a>',
    '“4.1 Miles” documentary wins Peabody Award <a href="https://www.nytco.com/the-new-york-times-op-doc-4-1-miles-wins-2016-peabody-award/\xa0" target="_blank">https://www.nytco.com/the-new-york-times-op-doc-4-1-miles-wins-2016-peabody-award/\xa0</a>',
    'Obituaries of women previously not run: <a href="https://www.nytimes.com/interactive/2018/obituaries/overlooked.html" target="_blank">https://www.nytimes.com/interactive/2018/obituaries/overlooked.html</a>',
    'Slaughter of citizens in the Philippines: <a href="https://www.nytimes.com/interactive/2016/12/07/world/asia/rodrigo-duterte-philippines-drugs-killings.html" target="_blank">https://www.nytimes.com/interactive/2016/12/07/world/asia/rodrigo-duterte-philippines-drugs-killings.html</a>',
    'WAlk through African American Museum of History and Culture: <a href="https://www.nytimes.com/interactive/2016/09/15/arts/design/national-museum-of-african-american-history-and-culture.html" target="_blank">https://www.nytimes.com/interactive/2016/09/15/arts/design/national-museum-of-african-american-history-and-culture.html</a>',
    'Memorial Day weekend of carnage in Chicago: <a href="https://www.nytimes.com/interactive/2016/06/04/us/chicago-shootings.html" target="_blank">https://www.nytimes.com/interactive/2016/06/04/us/chicago-shootings.html</a>',
    '52 Places to visit in the United States: <a href="https://www.nytimes.com/interactive/2016/01/07/travel/places-to-visit.html" target="_blank">https://www.nytimes.com/interactive/2016/01/07/travel/places-to-visit.html</a>',
    'A reporter’s trek through Syria: <a href="https://www.nytimes.com/interactive/2016/06/10/world/middleeast/syria-road-trip.html" target="_blank">https://www.nytimes.com/interactive/2016/06/10/world/middleeast/syria-road-trip.html</a>',
    'Spread of deserts in China due to climate change',
    'Graphic on contradictory statements on President Trump’s relationship with Stormy Daniels: <a href="https://www.nytimes.com/interactive/2018/05/03/us/politics/giuliani-stormy-trump-statements.html" target="_blank">https://www.nytimes.com/interactive/2018/05/03/us/politics/giuliani-stormy-trump-statements.html</a>',
    'Using world “lie” in Trump coverage: <a href="https://www.nytimes.com/2017/01/25/business/media/donald-trump-lie-media.html" target="_blank">https://www.nytimes.com/2017/01/25/business/media/donald-trump-lie-media.html</a>',
    'Decision to eliminate Public Editor position: <a href="https://www.politico.com/story/2017/05/31/new-york-times-public-editor-239000" target="_blank">https://www.politico.com/story/2017/05/31/new-york-times-public-editor-239000</a>',
    'Hillary Clinton email account and server: <a href="https://www.nytimes.com/2015/03/03/us/politics/hillary-clintons-use-of-private-email-at-state-department-raises-flags.html" target="_blank">https://www.nytimes.com/2015/03/03/us/politics/hillary-clintons-use-of-private-email-at-state-department-raises-flags.html</a>',
    'Baquet on paper’s liberal leaningstes',
    '<a href="https://www.politico.com/media/newsletters/morning-media/2018/03/23/bannon-meets-the-global-elite-zucker-rips-state-run-fox-bolton-joining-white-house-facebook-vs-google-001475" target="_blank">https://www.politico.com/media/newsletters/morning-media/2018/03/23/bannon-meets-the-global-elite-zucker-rips-state-run-fox-bolton-joining-white-house-facebook-vs-google-001475</a>',
    'On resolving controversy about reporter Ali Watkins: <a href="https://www.nytimes.com/2018/07/03/business/media/ali-watkins-times-reporter-memo.html" target="_blank">https://www.nytimes.com/2018/07/03/business/media/ali-watkins-times-reporter-memo.html</a>',
    'Story on the Watkins controversy: <a href="https://www.nytimes.com/2018/06/24/business/media/james-wolfe-ali-watkins-leaks-reporter.html" target="_blank">https://www.nytimes.com/2018/06/24/business/media/james-wolfe-ali-watkins-leaks-reporter.html</a>',
    'Interview with Joseph Kahn on alleged bias: <a href="https://www.vanityfair.com/news/2018/04/a-woke-civil-war-is-simmering-at-the-new-york-times" target="_blank">https://www.vanityfair.com/news/2018/04/a-woke-civil-war-is-simmering-at-the-new-york-times</a>',
    '<a href="https://www.nytco.com/introducing-the-reader-center/" target="_blank">https://www.nytco.com/introducing-the-reader-center/</a>',
    '<a href="http://s1.q4cdn.com/156149269/files/doc_financials/annual/2017/Final-2017-Annual-Report.pdf" target="_blank">http://s1.q4cdn.com/156149269/files/doc_financials/annual/2017/Final-2017-Annual-Report.pdf</a>',
    '<a href="https://www.nytimes.com/2018/05/24/arts/television/the-fourth-estate-review.html" target="_blank">https://www.nytimes.com/2018/05/24/arts/television/the-fourth-estate-review.html</a>',
    '<a href="https://www.nytimes.com/2018/06/07/reader-center/corrections-how-the-times-handles-errors.html?rref=collection%2Fseriescollection%2Funderstanding-the-times" target="_blank">https://www.nytimes.com/2018/06/07/reader-center/corrections-how-the-times-handles-errors.html?rref=collection%2Fseriescollection%2Funderstanding-the-times</a>',
    'Walter Duranty',
    '<a href="https://www.nytco.com/new-york-times-statement-about-1932-pulitzer-prize-awarded-to-walter-duranty/" target="_blank">https://www.nytco.com/new-york-times-statement-about-1932-pulitzer-prize-awarded-to-walter-duranty/</a>',
    'Editor explains corrections process',
    '<a href="https://www.nytimes.com/2018/06/07/reader-center/corrections-how-the-times-handles-errors.html" target="_blank">https://www.nytimes.com/2018/06/07/reader-center/corrections-how-the-times-handles-errors.html</a>',
    'Public Editor assesses Israeli-Palestinian conflict coverage',
    '<a href="https://www.nytimes.com/2014/11/23/opinion/sunday/the-conflict-and-the-coverage.html" target="_blank">https://www.nytimes.com/2014/11/23/opinion/sunday/the-conflict-and-the-coverage.html</a>',
    '“The Killing Fields” wins three Oscars',
    '<a href="https://www.google.com/search?q=did+%22the+killing+fields%22+win+an+oscar%3F&oq=did+%22the+killing+fields%22+win+an+oscar%3F&aqs=chrome..69i57.15815j1j4&sourceid=chrome&ie=UTF-8" target="_blank">https://www.google.com/search?q=did+%22the+killing+fields%22+win+an+oscar%3F&oq=did+%22the+killing+fields%22+win+an+oscar%3F&aqs=chrome..69i57.15815j1j4&sourceid=chrome&ie=UTF-8</a>']}],
 'topline': 'The website of a New York-based news organization with a network of journalists around the world whose coverage exerts significant influence on national and international news and public debate. The New York Times has evolved from its mid-19th century print origins to digital innovation as it enters its fifth generation of family leadership.',
 'updatedDate': None,
 'writeup': [{'body': ['The New York Times Company is publicly owned with a shareholder structure that places power in the hands of the Sulzberger family, descendants of patriarch Adolph S. Ochs, who moved from Chattanooga, Tennessee, and bought the struggling paper in 1896. Following financial travail in the wake of the 2008 world financial crisis, Carlos Slim Helú, Mexico’s wealthiest individual, loaned the company $250 million and later became its single largest shareholder. The Times also negotiated a $225 million sale and leaseback of its then-new Manhattan headquarters building. With the industry-wide decline in advertising and print circulation revenues, the company increasingly relies on revenue from its more than 2.7 million digital subscribers (and about one million print subscribers and newstand buyers). In addition, the company receives revenue from its NYTLicensing Service, which licenses content to other news organizations and brands; from its NYTLive business, which runs conferences and other events; and from its product recommendation website Wirecutter.',
    'Dean Baquet, a Pulitzer Prize-winning journalist who previously worked as editor of the Los Angeles Times, is executive editor of the paper. A.G. Sulzberger, himself a former reporter, succeeded his father, Arthur Ochs Sulzberger Jr., as publisher in 2018 to become the sixth generation of the family to head the paper. Baquet oversees a newsroom of over 1,450 journalists, including national and foreign bureaus.'],
   'title': 'Ownership and Financing'},
  {'body': ['The Times&nbsp;offers a comprehensive daily examination of news, politics, culture, science, sports, and other primary areas of human endeavor.&nbsp;While the news organization still produces local coverage of New York, it has declined over the years as the paper has expanded the national and international scale of its reporting and of its readership. In the Sunday edition, The Times publishes a substantial magazine with regular long features, columns, photos, and puzzles.&nbsp;The magazine’s ambitions are typified by its devoting an August 2016 issue to a single 42,000-word article on the Middle East, "Fractured Lands," that included a virtual-reality video on the retaking of the Iraqi city of Fallujah from ISIS by the Iraqi military and a U.S.-led coalition.',
    'In addition to its regular news coverage, The Times has a 100-person staff for editorials and opinion sections. It features&nbsp;special projects through multimedia storytelling and the creation of new sections outside the normal news cycle. “Overlooked,” for example, is a series of&nbsp;obituaries of famous women throughout history&nbsp;who had not received attention from the paper upon their passing. Many of these&nbsp;features are interactive&nbsp;and visual, taking a significant amount of public data or information and making it accessible to readers. For example, a Detroit&nbsp;story&nbsp;allowed readers to explore a database of all 43,000 properties on the brink of foreclosure in the city. Some sections of the news organization, like books and the magazine, have their own special projects editors. ',
    'Breaking news stories on the site are routinely supplemented by a mix of reporting, multimedia, graphics, and push notifications. The paper&nbsp;has expanded its digital operations aggressively in the 21st century with innovative graphics and photography, mobile apps, podcasts, and more. The Times has emphasized video content by pioneering 360-degree and virtual reality technology to distribute news and by featuring the documentary series, Op-Docs, which have earned Oscar nominations as well as Emmy and Peabody awards since the Opinion section began producing them in 2011. Starting later in 2018, The Times will produce a television show in partnership with FX and Hulu.'],
   'title': 'Content'},
  {'body': ["The Times' staff of over 1,450 journalists publish stories with original reporting by conducting first-hand interviews, referencing primary sources, and verifying stories broken by other outlets. Their work is scrutinized intensely due to the paper’s scale and influence. Its errors gain widespread notice. The paper has been involved in high-profile editorial scandals, including significant fabrications by former reporter Jayson Blair, which led to the 2003 departure of the paper’s executive and managing editors as well as an extensive review of the paper’s editorial practices. The Times published a report with its findings, which created standards and Public Editor positions at the paper.",
    'The paper has consistently endorsed Democratic candidates in presidential elections and has not backed a Republican since President Dwight Eisenhower in 1956. Its editorials are generally liberal on social, cultural, and economic issues. Its coverage of the Israeli-Palestinian conflict is a source of heated controversy, nearly always seen by one side as favoring the other, and prompted a 2014 Public Editor analysis that offered both criticisms and suggestions for improvement, including a need to “penetrate Palestinian society with understanding and solid news judgment.”',
    'The Times’ news coverage&nbsp;is seen as resolutely liberal by conservatives, an image that Baquet has discussed publicly. “I would be lying if I did not say that a newsroom that is largely built in Manhattan does not have liberal leanings in the lifestyles and attitudes of its employees,” he said at a 2018 “Future of News” event held by the Financial Times. “That would be nuts if I said that. What I will say is that we have in our culture, and in our institution, a set of safeguards&nbsp;in editing that force us, that allow us, to kind of achieve that balance despite that.” In a separate magazine interview, Managing Editor Joseph Kahn said the paper "has made it really clear that we consider it crucial to our future that we not become an opposition-news organization. We do not see ourselves, and we do not wish to be seen, as partisan media. That means that the news and opinion divide, and things like social media guidelines and some of our traditional restrictions on political activity by employees, may feel cumbersome to some people at this point in our evolution.”',
    'The editorial page reports to the newspaper’s publisher, not to executive editor Baquet, who oversees news coverage. That divide, typical at many large newspapers, can be lost on critics. Reporting done by The Times often drives news coverage done by other major American news outlets, regardless of those outlets’ own ideology. During the 2015-2016 presidential campaign, for example, The Times’s influence &nbsp;-- and its aggressive coverage of both sides of the political divide -- was manifested in exclusive stories on the use of a private email server by Hillary Clinton while serving as Secretary of State and on Russian interference in the election.',
    'The site includes separate emails or phone numbers for those seeking corrections on news stories and editorials. It runs corrections on a daily basis, with a senior editor and a news assistant assigned exclusively to corrections. High-profile corrections include its response to criticism of a July 23, 2015, story that indicated inspectors general in the government sought a Justice Department criminal investigation "into whether Hillary Rodham Clinton mishandled sensitive government information on a private email account she used as secretary of state." There were two separate corrections and a lengthier Editor\'s Note, which said that the corrections should have run sooner.',
    'The Times has won 125 Pulitzer Prizes—more than any newspaper—including 2018 wins for reporting on powerful sexual predators and on Russian influence in the 2016 presidential campaign. The Times’s sexual violence investigations were a catalyst for the #MeToo movement.'],
   'title': 'Credibility'},
  {'body': ['The New York Times website discloses information about many (but not all) reporters and editors, allowing readers to click on their names to find biographical information and other stories by them. It also links to the New York Times Company website, where readers can find financial and investor information, as well as the company’s history, the current leadership, and the company’s extensive standards and ethics policies. In late August 2018, The Times released a redesign of the site that eliminated most bylines from its home page, although they still appear on the articles. Some media watchers suggested the change could hurt the paper\'s credibility, Politico reported. "Wow, we\'ve gone back to 1942, it seems," tweeted George Washington University professor Nikki Usher, according to Politico. "The rise of bylines was part of the growth of accountability journalism."',
    'In 2017, The Times eliminated the position of Public Editor, a role intended to address reader concerns and editorial practices. Then-publisher Arthur Sulzberger Jr. said the move reflected changes in social media and the internet that combine “to collectively serve as a modern watchdog, more vigilant and forceful than one person could ever be. Our responsibility is to empower all of those watchdogs, and to listen to them, rather than to channel their voice through a single office.” Some criticized the move as overestimating the role of social media as a surrogate for a full-time Public Editor. The doubters include former Public Editor Margaret Sullivan, who by then had moved to The Washington Post to become a columnist.',
    'Concurrently, The Times established its Reader Center, an editorial team aimed at engaging readers, increasing the paper’s transparency, and maintaining loyal subscribers. The section solicits comment or ideas from readers, and it regularly publishes discussions of feedback to controversial stories or explanations of the reporting process. One series, titled “Understanding The Times,” seeks to explain to readers the news organization’s basic journalistic practices, such as its use of anonymous sources or corrections policy.',
    'The Times is also transparent about high-profile personnel controversies. For example, in 2018 &nbsp;the email and phone records of Washington reporter Ali Watkins were seized by federal prosecutors as part of a case against a Senate Intelligence Committee aide who had allegedly leaked information to her and with whom she had an affair. The communication in question took place before Watkins started working at The Times, but the company conducted an internal review after the incident came to light. Afterward, The Times published a long story about their relationship and the journalism ethics questions it raised. The Times disclosed in a separate news story that she would be reassigned to a beat in New York and given a mentor.',
    'On two occasions, The Times has allowed documentary filmmakers inside the news organization. The 2011 "Page One: Inside the New York Times" caught the paper amid a drastic industry downturn and layoffs. Showtime\'s 2018 four-part "The Fourth Estate" covered 16 months at the paper, including the first year of the Trump presidency.',
    'When it comes to corrections, a senior editor is assigned full-time to the process. The editor has a full-time assistant and they work with more than 30 editors in various subject areas who oversee investigating possible errors in those areas.'],
   'title': 'Transparency'},
  {'body': ['Founded in 1851, The New York Times grew significantly throughout the 20th century and looked beyond local coverage to become one of the nation’s most significant media institutions. The news organization has been involved in high-profile Supreme Court cases involving freedom of the press and freedom of speech. It won a major case concerning libel law in 1964 and another concerning prior restraint in 1971 over its publication of the Pentagon Papers, a secret government history of the Vietnam War.',
    'Its range has been broad. In 1970 it exposed how gamblers, drug dealers, and small businesses made “illicit payments of millions of dollars a year to the policemen of New York” (a key source, a loner and bohemian police officer named Frank Serpico, inspired the Hollywood film “Serpico,” starring Al Pacino). Reporter Sydney Schanberg’s accounts of &nbsp;the fall of Cambodia won a 1972 Pulitzer and inspired “The Killing Fields,” which won three Academy Awards. &nbsp;(Schanberg was portrayed by Sam Waterston.) &nbsp;A daily special section, “A Nation Challenged,” captured the national mood and fear in the aftermath of the 9/11 terrorist attacks. A 2005 series on secret domestic eavesdropping by the government predated by eight years disclosures of secret surveillance programs of the National Security Agency leaked by Edward Snowden, a former NSA contractor. Its more recent sexual violence stories may prove a cultural turning point.',
    'The Times history includes serious missteps, perhaps the most notable being the 1931 Pulitzer Prize-winning reporting from the Stalin-era Soviet Union by Walter Duranty, a celebrated foreign correspondent. His work has been thoroughly discredited for accepting Soviet propaganda and underplaying Stalin’s brutality, including the collectivization of agriculture that led to famine deaths of millions of people in the Ukraine. The paper thoroughly acknowledged the reporting as flawed, but the Pulitzer board has declined to revoke the 1932 award.',
    "Disclosure: Joyce Purnick, a deputy managing editor at NewsGuard, is a former reporter, editor, editorial board member and New York City political columnist at The New York Times. Her husband, Max Frankel, was The Times's executive editor, editorial page editor, and a Pulitzer Prize-winning foreign correspondent."],
   'title': 'History'}]}

Observations

  • Gordon and Steve get their own fields.
  • createdDate and updatedDate are empty.
  • Each rating lists its author (byline), editors, and reviewer.
  • The interesting bits look like criteria, metadata, rank, and score.
  • There is an extensive source list.
  • There is an orientation metadata field.
  • The metadata also contains links to social media pages for the media source.
  • The criteria field contains the attributes they judge on.
  • The rank field contains their judgement.

I looked at sample pages for each unique rank code. Here are the messages a user receives for each rank:

In [3]:
rank_info = {
    'TK': 'This website is still in the process of being rated by NewsGuard.',
    'T': 'This website generally maintains basic standards of accuracy and accountability.',
    'N': 'Proceed with caution: This website generally fails to maintain basic standards of accuracy and accountability.',
    'P': 'This website publishes content from its users that it does not vet. Information from this source may not be reliable.',
    'S': 'This is a satire or humor website. It is not an actual news source.'
}

NewsGuard didn't return ratings for every domain I queried. Let's see how many ratings I actually retrieved, not including ratings of rank TK.

In [4]:
have_ratings = [d for d,r in ratings.items() if r['id'] is not None and r['rank'] != 'TK']
len(have_ratings)
Out[4]:
568
In [5]:
site_to_id = {s:r['id'] for s,r in ratings.items() if r['id'] is not None}
ratings_by_id = {r['id']:r for r in ratings.values() if r['id'] is not None}
domain_to_rating = {tldextract.extract(s).registered_domain.lower():ratings_by_id[i] for s,i in site_to_id.items()}

def get_subkey(rating, key, title):
    if rating[key] is None:
        return None
    metadata = next((m for m in rating[key] if m['title'] == title), None)
    return metadata['body'] if metadata else None

def get_metadata(rating, title):
    return get_subkey(rating, 'metadata', title)
    
def get_criteria(rating, title):
    return get_subkey(rating, 'criteria', title)
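To make the helper's behavior concrete, here is a small sketch run against a toy rating. The `toy_rating` dict is made up for illustration; `get_subkey` is repeated from the cell above so the snippet runs on its own.

```python
def get_subkey(rating, key, title):
    # Same logic as the helper defined in the cell above.
    if rating[key] is None:
        return None
    entry = next((m for m in rating[key] if m['title'] == title), None)
    return entry['body'] if entry else None

# Hypothetical rating with one populated section and one empty section.
toy_rating = {
    'metadata': [{'body': 'N/A', 'title': 'orientation'},
                 {'body': 'Yes', 'title': 'paywall'}],
    'criteria': None,
}

print(get_subkey(toy_rating, 'metadata', 'orientation'))  # N/A
print(get_subkey(toy_rating, 'metadata', 'owner'))        # None
print(get_subkey(toy_rating, 'criteria', 'falseContent')) # None
```

Note that an orientation of 'N/A' is an ordinary string, while a missing title or an empty section comes back as None. That distinction matters later: None becomes a missing value in pandas and is removed by `dropna()`, whereas 'N/A' is kept as its own category.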

Orientation

The orientation data seems interesting. Let's see what that looks like.

In [6]:
orientations = {}
for domain, rating in domain_to_rating.items():
    orientations[domain] = get_metadata(rating, 'orientation')

ng_orientations = pd.Series(orientations)
ng_orientations.name = 'ng_orientation'
ng_orientations.value_counts()
Out[6]:
N/A               373
Slightly Left      48
Far Right          35
Slightly Right     30
Far Left           10
Name: ng_orientation, dtype: int64
In [7]:
ng_orientations.groupby(ng_orientations).apply(lambda g: g.sample(5))
Out[7]:
ng_orientation                           
Far Left        dailykos.com                       Far Left
                latest.com                         Far Left
                bipartisanreport.com               Far Left
                salon.com                          Far Left
                commondreams.org                   Far Left
Far Right       westernjournal.com                Far Right
                townhall.com                      Far Right
                gellerreport.com                  Far Right
                lifenews.com                      Far Right
                conservativedailypost.com         Far Right
N/A             fortune.com                             N/A
                brookings.edu                           N/A
                indiewire.com                           N/A
                wgno.com                                N/A
                nymag.com                               N/A
Slightly Left   mic.com                       Slightly Left
                chron.com                     Slightly Left
                huffingtonpost.com            Slightly Left
                essence.com                   Slightly Left
                refinery29.com                Slightly Left
Slightly Right  rightwingnews.com            Slightly Right
                fox5vegas.com                Slightly Right
                katu.com                     Slightly Right
                nypost.com                   Slightly Right
                christianpost.com            Slightly Right
Name: ng_orientation, dtype: object

Observations

  • Most of the orientations are N/A.
  • Only ten sites are labeled Far Left.
  • It looks like they've either been very cautious about assigning an orientation to anything, or they stopped assigning them at some point.
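To put numbers on "most", here is a quick sketch that converts the counts from Out[6] into shares. The counts are copied by hand from the output above; on the real series, `ng_orientations.value_counts(normalize=True)` should give the same proportions.

```python
import pandas as pd

# Counts copied from Out[6].
counts = pd.Series({
    'N/A': 373,
    'Slightly Left': 48,
    'Far Right': 35,
    'Slightly Right': 30,
    'Far Left': 10,
})

shares = (counts / counts.sum()).round(3)
labeled = counts.drop('N/A').sum()

print(shares)
print(f'{labeled} of {counts.sum()} sites have an orientation other than N/A')
```

So roughly three quarters of the sites with an orientation field carry no left/right label at all.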

How do these orientations correlate with election retweeters and Facebook ideologies?

In [8]:
mc_scores = pd.read_csv('election_retweeter_polarization_media_scores.csv')

mc_scores['domain'] = mc_scores['url'].apply(lambda u: tldextract.extract(u).registered_domain.lower())
mc_scores = mc_scores.set_index('domain')

orientation_order = ['Far Left', 'Slightly Left', 'N/A', 'Slightly Right', 'Far Right']
joined = mc_scores.join(ng_orientations).dropna()
_ = sns.catplot(x='score', y='ng_orientation', 
            order=orientation_order, data=joined)

Observations

  • There's definitely a relationship, so that's good.
  • N/A has a wide range of scores.
  • There's more disagreement in the middle, which is intuitive.
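One way to quantify "there's definitely a relationship" would be a rank correlation between the orientation label and the retweeter score. This is only a sketch on made-up values, not the real Media Cloud data; the ordinal `orientation_codes` mapping is my assumption, and N/A rows are excluded because they have no position on the left/right scale. Spearman's rho is computed here as the Pearson correlation of the ranks.

```python
import pandas as pd

# Illustrative values only; these are not real Media Cloud scores.
toy = pd.DataFrame({
    'score': [-0.9, -0.7, -0.4, 0.1, 0.3, 0.8, 0.9],
    'ng_orientation': ['Far Left', 'Far Left', 'Slightly Left', 'N/A',
                       'Slightly Right', 'Far Right', 'Far Right'],
})

# Assumed ordinal coding of the labels; 'N/A' is deliberately unmapped.
orientation_codes = {'Far Left': -2, 'Slightly Left': -1,
                     'Slightly Right': 1, 'Far Right': 2}

coded = toy.assign(code=toy['ng_orientation'].map(orientation_codes))
coded = coded.dropna(subset=['code'])  # drops the N/A row

# Spearman's rho: Pearson correlation of average ranks.
rho = coded['score'].rank().corr(coded['code'].rank())
print(round(rho, 3))
```

With the notebook's data, the same two lines could be applied to `joined` in place of `toy`.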

How about the Facebook data?

In [9]:
facebook = pd.read_csv('facebook_ideology_estimates.csv')
facebook['domain'] = facebook['domain'].apply(lambda d: tldextract.extract(d).registered_domain.lower())
facebook = facebook.set_index('domain')

_ = sns.catplot(x='avg_align', y='ng_orientation', 
            order=orientation_order,
            data=facebook.join(ng_orientations).dropna())

Observations

  • Looks like a little bit less disagreement in the middle relative to the election retweeter method.
  • N/A skews left.
  • Slightly Left and Slightly Right look to extend about as far to the edges as their "Far" counterparts, but just have more variance.
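The range and variance claims above are read off the plot. A per-category summary would let you check them directly. The sketch below uses made-up `avg_align` values, so only the shape of the computation carries over; on the real data it would run on `facebook.join(ng_orientations).dropna()`.

```python
import pandas as pd

orientation_order = ['Far Left', 'Slightly Left', 'N/A',
                     'Slightly Right', 'Far Right']

# Illustrative values only; these are not real Facebook alignment scores.
toy = pd.DataFrame({
    'avg_align': [-0.9, -0.8, -0.9, -0.3, -0.4, 0.2, 0.2, 0.9, 0.8, 0.9],
    'ng_orientation': ['Far Left', 'Far Left',
                       'Slightly Left', 'Slightly Left',
                       'N/A', 'N/A',
                       'Slightly Right', 'Slightly Right',
                       'Far Right', 'Far Right'],
})

# Min and max show how far each category extends; std shows its spread.
spread = (toy.groupby('ng_orientation')['avg_align']
             .agg(['min', 'max', 'std'])
             .reindex(orientation_order))
print(spread)
```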

Rank and Score

Let's build a flat dataset of interesting fields so things are a bit easier to look at. Then let's look at rank, score, and criteria.

The criteria are documented here.

In [10]:
domains = {}
for domain, rating in domain_to_rating.items():
    if rating['id'] is None or rating['rank'] == 'TK':
        continue
    domains[domain] = {
        'rank': rating['rank'],
        'orientation': get_metadata(rating, 'orientation'),
        'score': rating['score'],
        'falseContent': get_criteria(rating, 'falseContent'),
        'basicStandards': get_criteria(rating, 'basicStandards'),
        'newsOpinion': get_criteria(rating, 'newsOpinion'),
        'deceptiveHeadlines': get_criteria(rating, 'deceptiveHeadlines'),
        'accountabilityPractices': get_criteria(rating, 'accountabilityPractices'),
        'ownership': get_criteria(rating, 'ownership'),
        'labelsAdvertising': get_criteria(rating, 'labelsAdvertising'),
        'management': get_criteria(rating, 'management'),
        'contentCreators': get_criteria(rating, 'contentCreators'),
        'active': rating['active'],
    }

ng = pd.DataFrame.from_dict(domains, orient='index')
print(ng.shape)
ng['rank'].value_counts()
(533, 13)
Out[10]:
T    485
N     39
P      6
S      3
Name: rank, dtype: int64
In [11]:
ng[ng['rank'] == 'N']['score'].sort_values()
Out[11]:
zerohedge.com                 0.0
truthuncensored.net           7.5
thepoliticalinsider.com       7.5
sputniknews.com               7.5
redstatewatcher.com           7.5
madworldnews.com              7.5
downtrend.com                 7.5
100percentfedup.com           7.5
naturalnews.com              12.5
truepundit.com               12.5
dailymail.co.uk              15.0
bipartisanreport.com         17.5
thegatewaypundit.com         20.0
nationalenquirer.com         20.0
gellerreport.com             20.0
conservativedailypost.com    22.5
jihadwatch.org               25.0
yournewswire.com             25.0
infowars.com                 25.0
louderwithcrowder.com        29.5
westernjournalism.com        32.5
conservativetribune.com      32.5
rightwingnews.com            32.5
rt.com                       32.5
westernjournal.com           32.5
shareblue.com                34.5
latest.com                   39.5
theblaze.com                 40.0
thefederalistpapers.org      40.0
ilovemyfreedom.org           42.0
thoughtcatalog.com           42.0
hannity.com                  42.0
rushlimbaugh.com             47.0
judicialwatch.org            52.0
worldstarhiphop.com          52.0
hollywoodlife.com            54.5
dailykos.com                 54.5
breitbart.com                57.0
thefreethoughtproject.com    59.5
Name: score, dtype: float64
In [12]:
_ = sns.distplot(ng['score'], kde=False)

Observations

  • Most sites pass. Only 39 out of 533 fail.
  • I recognize most of the sites that fail, and most of them look to be on the right.
  • Most of the sites are very near 100.
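
To put the first observation in proportion, the rank counts from Out[10] can be turned into shares. This is only a minimal sketch: it re-enters those counts by hand rather than reading them from `ng`, and the names `rank_counts` and `rank_share` are introduced here purely for illustration.

```python
import pandas as pd

# Rank counts copied from Out[10] above (re-entered by hand, not recomputed).
rank_counts = pd.Series({'T': 485, 'N': 39, 'P': 6, 'S': 3}, name='rank')

# Share of rated sites in each rank, as a percentage.
rank_share = (rank_counts / rank_counts.sum() * 100).round(1)
print(rank_share)
```

By this arithmetic, roughly 91% of the rated sites pass and roughly 7% fail. Against the live DataFrame, `ng['rank'].value_counts(normalize=True)` should give the same shares as fractions.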
In [13]:
criteria = [
    "falseContent",
    "basicStandards",
    "newsOpinion",
    "accountabilityPractices",
    "deceptiveHeadlines",
    "ownership",
    "labelsAdvertising",
    "management",
    "contentCreators",
]
ng.loc[:,criteria].apply(lambda c: c.value_counts())
Out[13]:
     falseContent  basicStandards  newsOpinion  accountabilityPractices  deceptiveHeadlines  ownership  labelsAdvertising  management  contentCreators
NA            NaN             NaN          NaN                      NaN                 NaN        NaN                 13         NaN              NaN
No           24.0            41.0         78.0                    145.0                46.0       55.0                 23       134.0             85.0
Yes         500.0           483.0        446.0                    379.0               478.0      469.0                488       390.0            439.0
In [14]:
_ = ng.loc[:, criteria].apply(lambda c: c.value_counts()).loc['No'].sort_values().plot.barh(title='# Sites Failing Each Criterion')

Observations

  • Accountability practices (145 sites failing) and management (134) are where most sites get knocked down. Those criteria are defined as "Regularly corrects or clarifies errors" and "Reveals who’s in charge, including any possible conflicts of interest".
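
The raw counts can also be read as failure rates per criterion. The sketch below re-enters the "No" and "Yes" rows of Out[13] by hand (it does not touch `ng`), and it assumes that sites marked NA should be left out of the denominator; `no_counts`, `yes_counts`, and `fail_rate` are illustrative names, not part of the notebook.

```python
import pandas as pd

# "No" and "Yes" counts copied from Out[13] above (re-entered by hand).
no_counts = pd.Series({
    'falseContent': 24, 'basicStandards': 41, 'newsOpinion': 78,
    'accountabilityPractices': 145, 'deceptiveHeadlines': 46,
    'ownership': 55, 'labelsAdvertising': 23, 'management': 134,
    'contentCreators': 85,
})
yes_counts = pd.Series({
    'falseContent': 500, 'basicStandards': 483, 'newsOpinion': 446,
    'accountabilityPractices': 379, 'deceptiveHeadlines': 478,
    'ownership': 469, 'labelsAdvertising': 488, 'management': 390,
    'contentCreators': 439,
})

# Percentage of sites with a Yes/No answer that fail each criterion.
# Assumption: NA answers (only labelsAdvertising has any) are excluded.
fail_rate = (no_counts / (no_counts + yes_counts) * 100).round(1)
fail_rate = fail_rate.sort_values(ascending=False)
print(fail_rate)
```

On these numbers, a bit over a quarter of the sites fail each of the two leading criteria, while the false-content and advertising-label criteria are each failed by fewer than 5%.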

Scores by Orientation

I'll look at scores by NewsGuard assigned orientation first, and then switch over to the Facebook numbers to get something more fine-grained.

In [15]:
ng = ng[(ng['rank'] == 'T') | (ng['rank'] == 'N')]
ng.groupby('orientation')['rank'].value_counts()
Out[15]:
orientation     rank
Far Left        T         6
                N         4
Far Right       N        21
                T        14
N/A             T       361
                N         4
Slightly Left   T        46
                N         2
Slightly Right  T        23
                N         6
Name: rank, dtype: int64
In [16]:
_ = sns.stripplot(x='score', y='orientation', data=ng, order=orientation_order)
In [17]:
_ = sns.boxplot(x='score', y='orientation', data=ng, order=orientation_order)

Observations

  • All of the orientations' scores look pretty spread out.
  • Slightly Left has the fewest outliers with low scores.
  • The boxplots show that scores drop off as you approach the edges of ideology.
  • The Far Right has more of its mass in the lower scores. In fact, its median score is a failing one.
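
The pass/fail counts in Out[15] give a rough numeric check on these observations. This is a sketch that re-enters those counts by hand rather than recomputing them from `ng`; the `counts` frame and the `pct_failing` column are names made up for this example.

```python
import pandas as pd

# Pass (T) and fail (N) counts per orientation, copied from Out[15] above.
counts = pd.DataFrame(
    {'T': [6, 46, 361, 23, 14], 'N': [4, 2, 4, 6, 21]},
    index=['Far Left', 'Slightly Left', 'N/A', 'Slightly Right', 'Far Right'],
)

# Percentage of sites in each orientation that NewsGuard ranks as failing.
counts['pct_failing'] = (counts['N'] / (counts['T'] + counts['N']) * 100).round(1)
print(counts)
```

Far Right is the only group in which more than half of the sites fail (21 of 35), which is consistent with its median score falling in the failing range. The failing share is lowest for N/A and rises toward both ends of the spectrum.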

Let's look at Facebook numbers.

In [18]:
import plotly.offline as plotly
import plotly.graph_objs as go

plotly.init_notebook_mode()

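# Join once instead of repeating the join for every argument.
ng_fb = ng.join(facebook)

scatter1 = go.Scattergl(
    y=ng_fb['score'],
    x=ng_fb['avg_align'],
    mode='markers',
    text=ng_fb.index,
    marker=dict(
        # 0 = failing (N, red), 1 = passing (T, green). Mapping the ranks
        # explicitly keeps the colors from depending on which rank happens
        # to appear first, as they would with factorize().
        color=(ng_fb['rank'] == 'T').astype(int),
        colorscale=[[0, 'rgb(200,20,20)'], [1, '#398937']]
    )
)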

layout1 = go.Layout(
    title='NewsGuard Scores vs Source Ideology',
    hovermode='closest',
    xaxis=dict(title='Ideology by Facebook Average Alignment'),
    yaxis=dict(title='NewsGuard Score'),
)

plotly.iplot(go.Figure(data=[scatter1], layout=layout1))

Observations

  • Most of the center/center-left is up near 100, but a few sites drop below the failing threshold. RT is one of them.
  • Scores fall off a little bit approaching the far left.
  • The center-right is missing, like we've seen before.
  • The far right is its own thing, and the scores are a lot more variable.
In [21]:
ng.join(facebook).join(mc_scores, rsuffix='_mc').to_csv('newsguard_facebook-ideo_election-retweet_joined.csv')