Slashdot Mirror


Statisticians Investigate Political Bias On Wikipedia

Hugh Pickens writes "The Global Economic Intersection reports on a project to statistically measure political bias on Wikipedia. The team first identified 1,000 political phrases based on the number of times these phrases appeared in the text of the 2005 Congressional Record and applied statistical methods to identify the phrases that separated Democratic representatives from Republican representatives, under the model that each group speaks to its respective constituents with a distinct set of coded language. Then the team identified 111,000 Wikipedia articles that include 'republican' or 'democrat' as keywords, and analyzed them to determine whether a given Wikipedia article used phrases favored more by Republican members or by Democratic members of Congress. The results may surprise you. 'The average old political article in Wikipedia leans Democratic' but gradually, Wikipedia's articles have lost the disproportionate use of Democratic phrases and moved to nearly equivalent use of words from both parties (PDF), akin to an NPOV [neutral point of view] on average. Interestingly, some articles have the expected political slant (civil rights tends Democrat; trade tends Republican), but at the same time many seemingly controversial topics, such as foreign policy, war and peace, and abortion have no net slant. 'Most articles arrive with a slant, and most articles change only mildly from their initial slant. The overall slant changes due to the entry of articles with opposite slants, leading toward neutrality for many topics, not necessarily within specific articles.'"

2 of 221 comments

  1. Re:Hope they don't do just word frequency analysis by Anonymous Coward · Score: 5, Informative

    Would it kill you to read the paper?

    We obtain a list of 111,216 articles. We then eliminate these articles that cover countries other than the United States.
    [...]

    For each of these articles, we construct a slant index by applying the methods and estimates developed by Gentzkow and Shapiro (2010), hereafter G&S. G&S select 1,000 phrases based on the number of times these phrases appear in the text of the 2005 Congressional Record, applying statistical methods to identify phrases that separate Democratic representatives from Republican representatives, under the model that each group speaks to its respective constituents with a distinct set of coded language. In brief, we ask whether a given Wikipedia article uses phrases favored more by Republican members or by Democratic members of Congress.

    And the corresponding footnote:

    The words “republican” and “democrat” do not appear exclusively in entries about United States politics. If a country name shows up in the title or category names, we then check whether the phrase “United States” or “America” shows up in the title or category names. If yes, we keep this article. Otherwise, we search the text for “United States” or “America.” If these phrases do not show up more than 3 times in the text, this article is dropped. This process keeps articles such as “Iraq War” but drop articles related to political parties in foreign countries.
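
    The filter in that footnote can be sketched roughly as follows. This is only an illustration of the quoted rule, not the authors' code: the country list, the function names, the case-insensitive substring matching, and the choice to keep articles that mention no country at all are assumptions, since the quoted text does not specify them.

```python
# Hypothetical, abbreviated country list; the quoted footnote does not give the real one.
COUNTRY_NAMES = ["Ireland", "Iraq", "France", "Germany"]
US_PHRASES = ["United States", "America"]


def mentions_any(text, phrases):
    """True if any phrase occurs in text (case-insensitive substring match, an assumption)."""
    lowered = text.lower()
    return any(p.lower() in lowered for p in phrases)


def count_us_mentions(text):
    """Total substring occurrences of the US phrases in the article body."""
    lowered = text.lower()
    return sum(lowered.count(p.lower()) for p in US_PHRASES)


def keep_article(title, categories, body, threshold=3):
    """Apply the footnote's rule to one article."""
    header = " ".join([title, *categories])
    # No country name in title or categories: assumed to be kept without further checks.
    if not mentions_any(header, COUNTRY_NAMES):
        return True
    # A country is named, but the title or categories also name the US: keep.
    if mentions_any(header, US_PHRASES):
        return True
    # Otherwise keep only if the body mentions the US more than `threshold` times.
    return count_us_mentions(body) > threshold
```

    Under these assumptions, an article titled "Iraq War" with a category naming the United States is kept on the second check, while a foreign-party article whose body mentions America only once is dropped.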

    Researchers do think of this stuff, you know.

  2. Re:Hope they don't do just word frequency analysis by TheRaven64 · Score: 5, Informative

    'America' appears 7 times in the article 'Irish republicanism' (3 times as 'America', 4 times as part of 'American'), so by their metric (it must occur more than 3 times) the article would be kept, despite having nothing at all to do with the US political party of the same name.
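
    Whether 'American' counts as a hit depends on how the matching is done, which the quoted footnote does not say. A small sketch of the two readings, using a synthetic sentence (not the actual article text) and hypothetical helper names:

```python
import re


def substring_count(text, phrase):
    """Count every occurrence of phrase, including inside longer words."""
    return text.lower().count(phrase.lower())


def whole_word_count(text, phrase):
    """Count occurrences of phrase only where it stands as a whole word."""
    pattern = r"\b" + re.escape(phrase) + r"\b"
    return len(re.findall(pattern, text, flags=re.IGNORECASE))


# Synthetic sample: 3 uses of 'America' and 4 of 'American'.
sample = "America, America, America; American, American, American, American."

print(substring_count(sample, "America"))   # 7
print(whole_word_count(sample, "America"))  # 3
```

    With substring matching the sample clears a "more than 3" threshold; with whole-word matching it does not.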

    --
    I am TheRaven on Soylent News