Slashdot Mirror


How Journalists Data-Mined the Wikileaks Docs

meckdevil writes "Associated Press developer-journalist extraordinaire Jonathan Stray gives a brilliant explanation of the use of data-mining strategies to winnow and wring journalistic sense out of massive numbers of documents, using the Iraq and Afghanistan war logs released by Wikileaks as a case in point. The concepts for focusing on certain groups of documents and ignoring others are hardly new; they underlie the algorithms used by the major Web search engines. Their use in a journalistic context is on a cutting edge, though, and it raises a fascinating quandary: By choosing the parameters under which documents will be considered similar enough to pay attention to, journalist-programmers actually choose the frame in which a story will be told. This type of data mining holds great potential for investigative revelation — and great potential for journalistic abuse."

59 comments

  1. I just used grep -P by siddesu · · Score: 3, Interesting

    Worked miracles after I've gotten around the ugly HTML format they use to release all those INFORMATIONS. Still, there was very little new or worthwhile in the heap of those news clips and rumour aggregations. Frankly, the more I grep it, the less it looks like the "largest leak in history", and the more it seems like "the largest controlled release of information" in history.

    / takes off conspiracy theory hat // flame on

    1. Re:I just used grep -P by Anonymous Coward · · Score: 1

      / takes off conspiracy theory hat // flame on

      Never take off the hat, that is how they get you!

    2. Re:I just used grep -P by Anonymous Coward · · Score: 0

      Used grep -ril over here.

    3. Re:I just used grep -P by Anonymous Coward · · Score: 0

      Still, there was very little new or worthwhile in the heap of those news clips and rumour aggregations.

      I thought it was pretty uninteresting as well, but then I used grep -v.

    4. Re:I just used grep -P by buchner.johannes · · Score: 4, Insightful

      Worked miracles after I've gotten around the ugly HTML format they use to release all those INFORMATIONS. Still, there was very little new or worthwhile in the heap of those news clips and rumour aggregations. Frankly, the more I grep it, the less it looks like the "largest leak in history", and the more it seems like "the largest controlled release of information" in history.

      / takes off conspiracy theory hat // flame on

      When you use grep you have to know what you grep for. You cannot stumble upon a search keyword with grep.
      Clustering allows that, if you let it build the clusters itself. Perhaps you are missing out on the interesting bits.
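
      A minimal sketch of that contrast, assuming a handful of made-up report lines (REPORTS is hypothetical, not real war-log text). It is not clustering: counting how many documents each term appears in is a much simpler stand-in, used here only to show that some methods surface terms you never typed, while a grep-style search returns only what the pattern already names.

```python
import re
from collections import Counter

# Hypothetical toy reports, invented for illustration only.
REPORTS = [
    "tanker truck attacked on highway",
    "fuel tanker truck hit by explosive",
    "checkpoint patrol reported small arms fire",
    "tanker truck convoy ambushed near checkpoint",
]


def keyword_search(docs, pattern):
    """grep-style search: return the docs matching a pattern chosen in advance."""
    rx = re.compile(pattern, re.IGNORECASE)
    return [d for d in docs if rx.search(d)]


def recurring_terms(docs, min_docs=2):
    """Return (term, doc_count) pairs for terms found in at least min_docs docs."""
    doc_freq = Counter()
    for d in docs:
        # Use a set so each document counts a term at most once.
        doc_freq.update(set(re.findall(r"[a-z]+", d.lower())))
    return [(t, n) for t, n in doc_freq.most_common() if n >= min_docs]


if __name__ == "__main__":
    # A searcher who only thought to look for "sniper" finds nothing.
    print(keyword_search(REPORTS, "sniper"))
    # Counting terms surfaces "tanker" and "truck" without being asked for them.
    print(recurring_terms(REPORTS))
```

      On a real corpus the recurring terms would be dominated by common words, which is the problem that weighting schemes such as TF-IDF try to address.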

      --
      NB: The message above might reflect my opinion right now, but not necessarily tomorrow or next year.
    5. Re:I just used grep -P by siddesu · · Score: 0

      If you don't know what you're looking for in a pile of documents assembled by less than gifted embassy staff by retelling newspaper clips and general gossip, you have no business being a journalist in the first place. There was nothing else but local newspaper clips and gossip in the Wikileaks embassy leak.

    6. Re:I just used grep -P by wvmarle · · Score: 4, Insightful

      If you know what's in the documents, then life gets easy of course. The trouble is that usually you do not know what's in the documents without reading them. And if there's nothing new, that's a pity. But anyway the fact that one could say "there is nothing but local newspaper clips and gossip" in a set of documents indicates that they actually went through them all.

      And for sure the WikiLeaks documents contain a lot of noise. The same will be true of the Palin e-mail trove. Finding the interesting bits in that enormous noise is what journalists are for, and no journalist will know beforehand what those interesting bits are - which is exactly why they are interesting.

  2. Withered journalism by Anonymous Coward · · Score: 0

    Whilst - purely coincidentally - completely avoiding paying Wikileaks anything for any of these files, from Afghanistan to the US State Dept's opposition to the Haitian government daring to propose a raise in their minimum wage up to 59/hour. (Not by that much, but to that much)

  3. We're Not Limited to Only One Context by MimeticLie · · Score: 5, Insightful

    Isn't that one of the major reasons we have journalism? To synthesize and contextualize information? If the contextualized (or perhaps editorialized, depending on your point of view) information was the only kind available, then yes that is an issue. But with Wikileaks, the data is there for anyone who wants to parse it.

    This strikes me as being similar to when Anderson Cooper was criticized for calling Mubarak a liar. Or the behavior that Colbert mocked the White House press corps for at the correspondents' dinner. Pretending that journalists are free of bias doesn't make it so, and saying that they should just regurgitate facts and talking points verbatim is counter-productive. Reasoned analysis should be encouraged.

    1. Re:We're Not Limited to Only One Context by Anonymous Coward · · Score: 2, Interesting

      If the contextualized (or perhaps editorialized, depending on your point of view) information was the only kind available, then yes that is an issue. But with Wikileaks, the data is there for anyone who wants to parse it.

      If memory serves, and I'm not missing something in my quick re-read of the Wikipedia page, the leaked cables were not all made available to everyone. They were distributed to five major news organizations so more than one editorial staff could reasonably decide which material was newsworthy and which was too sensitive to publish (sarcastic example: the GPS coordinates of Obama's real long-form birth certificate). This is a reasonably good idea, but it does mean that there are only a handful of people who have access to all the documents.

      Have you ever heard that when you find something it's always in the last place you look? That's because you stop looking for it once you're satisfied. Similarly, an editor searching for terms that might confirm a previously-unsubstantiated rumor he's got tucked away in a story on the shelf may find what he's looking for, but he won't find the really juicy stuff he didn't know to look for.

      In a perfect world, the system would correct for this because some enterprising young journalists who are willing to "pound the pavement" and read the whole thing would uncover the stuff they missed. But because of the limited set of people who have access, that won't happen for a decade or two at the earliest. It's a necessary evil to prevent information like the locations of and personnel at sensitive sites from falling into the wrong hands.

    2. Re:We're Not Limited to Only One Context by MimeticLie · · Score: 4, Insightful

      That's part of the point of the video, using data mining techniques to broaden analytical tools beyond a simple keyword search and the preconceptions it can reinforce (the reporter mentions seeing a cluster of tanker truck incidents that was bigger than his organization was previously aware of). He ends by noting that the way one writes the algorithm can determine what trends pop out and thus how the story is framed, which seems like a perfectly reasonable statement. Then someone (either the submitter or the Slashdot editors) transforms that into a "great potential for journalistic abuse."

      I don't have an issue with the methodology portrayed in the video. But to then take the presenter's words and twist them to support a "just the facts, ma'am" style of journalism seems dishonest and unproductive.

    3. Re:We're Not Limited to Only One Context by Psychotria · · Score: 1

      From the video (I only watched 5 minutes before getting bored, though) I am not sure he even used his own algorithm/s. It does appear that he used Gephi and its built-in or 3rd party algorithms (plugins) to display the data in a way that made associations not immediately apparent... apparent. The tanker truck incident cluster is an example of this, and about when I stopped watching.

    4. Re:We're Not Limited to Only One Context by martin-boundary · · Score: 1
      The problem you're missing is that, by the nature of broadcasting and traditional media, not all journalists (and not all newspapers) have the same weight. This is a problem and the reason why bias should not be encouraged.

      In a world where one source of information has 10 times the weight of others, the impact of that source's bias is 10 times the impact of the others. In other words, for every NY Times article which claims we have to invade Iraq, you would have to read 10 local Springfield Shopper articles that argue the opposite, just to cancel the bias. And if all the newspapers of record have the same bias, then society's fucked.

      The ideal of presenting neutral information therefore matters more the more important the news source is: if local news is highly biased, that's not so bad, but national-level news should always be neutral, "just the facts".

    5. Re:We're Not Limited to Only One Context by MimeticLie · · Score: 1
      The Iraq war example doesn't exactly fit here, since the video is about parsing and analyzing large datasets. But since you brought it up (and I referenced it in my post), I'm going to quote Colbert:

      Over the last five years you people were so good, over tax cuts, WMD intelligence, the effect of global warming. We Americans didn't want to know, and you had the courtesy not to try to find out. Those were good times, as far as we knew.

      But, listen, let's review the rules. Here's how it works. The President makes decisions. He's the decider. The press secretary announces those decisions, and you people of the press type those decisions down. Make, announce, type. Just put 'em through a spell check and go home. Get to know your family again. Make love to your wife. Write that novel you got kicking around in your head. You know, the one about the intrepid Washington reporter with the courage to stand up to the administration? You know, fiction!

      I think there is far more danger to be had in a news media that passively accepts presented facts (Iraqi WMDs, Saddam's ties to al Qaeda, etc.) and narratives (invasion of Iraq is necessary) than one that editorializes a bit but doesn't simply act as a mouthpiece for whoever is currently in power. And I disagree with the notion that all bias needs to be balanced out by other bias. That smacks of "teaching the controversy" to me.

    6. Re:We're Not Limited to Only One Context by I3OI3 · · Score: 1

      And I disagree with the notion that all bias needs to be balanced out by other bias. That smacks of "teaching the controversy" to me.

      The parent wasn't saying the bias needs to be balanced out by other bias; he was saying that if the people at the top lie, it takes ten voices of truth at the bottom to reach the same audience.

    7. Re:We're Not Limited to Only One Context by Anonymous Coward · · Score: 0

      Interesting that you bring this up, given the historical actions of the NY Times, the AP (w.r.t. Western Union), etc.

    8. Re:We're Not Limited to Only One Context by biodata · · Score: 1

      He used a standard text mining approach (TF-IDF), followed by clustering of documents on pairwise distance. We did something similar here http://journal.imbio.de/article.php?aid=121 to text mine the biological literature, although we went further in terms of figuring out which metrics work best. He eventually ran up against the same thresholding problem we did - at some point you have to decide what you are going to call 'not related' and what 'related', and there doesn't seem to be an obvious principled way to do it, unless you have a 'truth' to test your precision/recall scores against.
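
      A rough sketch of that kind of pipeline, under stated assumptions: the weighting is plain tf * log(N / df), the pairwise measure is cosine similarity, and documents are grouped by linking every pair at or above a threshold and taking connected components. None of this is confirmed as the method used in the video or in the linked paper; the function names and the DOCS sample are made up for illustration.

```python
import math
import re
from collections import Counter

# Hypothetical toy documents, invented for illustration only.
DOCS = [
    "tanker truck attacked on highway",
    "tanker truck hit on highway",
    "patrol reported small arms fire",
    "patrol took small arms fire",
]


def tokenize(text):
    return re.findall(r"[a-z]+", text.lower())


def tfidf_vectors(docs):
    """Return one {term: weight} dict per document, weight = tf * log(N / df)."""
    token_lists = [tokenize(d) for d in docs]
    n = len(docs)
    df = Counter()
    for tokens in token_lists:
        df.update(set(tokens))  # document frequency: count each doc once
    return [
        {t: c * math.log(n / df[t]) for t, c in Counter(tokens).items()}
        for tokens in token_lists
    ]


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def cluster(docs, threshold):
    """Link docs with similarity >= threshold; return groups of doc indices."""
    vecs = tfidf_vectors(docs)
    parent = list(range(len(docs)))

    def find(i):
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(docs)):
        for j in range(i + 1, len(docs)):
            if cosine(vecs[i], vecs[j]) >= threshold:
                parent[find(i)] = find(j)

    groups = {}
    for i in range(len(docs)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


if __name__ == "__main__":
    for threshold in (0.0, 0.4, 0.6):
        print(threshold, cluster(DOCS, threshold))
```

      On this toy input the grouping depends entirely on the cutoff: a threshold of 0.0 links everything, 0.4 separates the two topics, and 0.6 leaves every document on its own. That is the 'related' versus 'not related' decision in miniature, and nothing in the data itself says which cutoff is the right one.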

      --
      Korma: Good
    9. Re:We're Not Limited to Only One Context by MimeticLie · · Score: 1

      Read the post again, he never mentions lying. Lying and bias are different things entirely (though they can often be found together).

    10. Re:We're Not Limited to Only One Context by martin-boundary · · Score: 1
      You are both right. I was just talking about bias (= the narrative or interpretation that is shrouded over the bare facts). But I wasn't suggesting that the top news sources should present controversies or talking points. I was suggesting that they should stay with the facts and reduce the interpretation the more important (=higher readership) they are. That's quite different from presenting a bag of contradictory biases hoping that would give a complete interpretation somewhere in the middle.

      In the former case, they report the news. In the latter case, they report the reactions and interpretations of the news instead. It causes people, whose first exposure to the event is by watching national news reports, to take the interpretation and narrative as an integral part of the facts, when normally the starting point for everyone ought to be the bare facts without interpretation.

      And because national news are highly influential, this initial bias cannot be displaced by a single alternative news source or commentary. It requires many such alternatives until the memory effect of the initial bias is overcome.

      People don't spend all day watching competing news sources. So for most people, their final viewpoint is close to the initial bias that the national news introduces them to.

  4. Seriously by cultiv8 · · Score: 1

    Great video, but can we at least get a better FA?

    --
    sysadmins and parents of newborns get the same amount of sleep.
  5. Control + F? by Anonymous Coward · · Score: 0

    Worked for me... *shrug*

  6. Fascinating by cshark · · Score: 1

    I wonder what this program would do to my extensive volume of email.
    I've got thousands of emails going back over a decade.

    Would love to see where the correlations are.

    --

    This signature has Super Cow Powers

  7. O Noes! by pandymen · · Score: 1

    How is this different than the current trend of deciding what facts to publish and what to ignore?

    I would see this as a threat to journalistic integrity only if there was such a thing anymore.

    1. Re:O Noes! by Anonymous Coward · · Score: 0

      Not much different than politicians doing what their lobbyists pay them for, then make up a story why it is good for the people.

  8. Not Newsworthy by Anonymous Coward · · Score: 0

    This is not cutting edge in the slightest: machine learning researchers have been clustering documents, let alone other objects, designing similarity measures, and constructing visualization schemes for years. The fact that cluster tendency assessment was used in a journalistic context isn't newsworthy.

    1. Re:Not Newsworthy by Anonymous Coward · · Score: 3, Insightful

      I think you miss the point - that it was used in a journalistic context most certainly *is* newsworthy: the AP guy was going to great lengths to stress evidence-based reporting, and uncovering associations, vice pre-supposing those things and backfitting the data.

      Data mining - like stats - allows bias to creep in quite readily, and once a study, a number, a story is out there, it's very difficult to pull it back, even when it's demonstrably wrong, biased or fabricated.

  9. Don't forget everyone else! by cold+fjord · · Score: 2, Informative

    Terrorists and foreign intelligence services will also be doing this to use against the United States and its allies, not just journalists. Wikileaks has provided the raw material for data mining to find things the US doesn't even realize about itself, or its allies. There is no surprise that Bradley Manning has been charged with aiding the enemy.

    The fallout continues, hopefully it won't be literally.

    Al-Qaeda Already Using Wikileaks Material Against Us
    Taliban Study WikiLeaks to Hunt Informants
    Wikileaks: US will have to reshuffle diplomats following revelations
    'They're informants... if they get killed, they deserve it': New book reveals shocking disregard of Julian Assange towards Afghans named in WikiLeaks cables

    Since I can anticipate the follow ups:
    No, Wikileaks didn't do an adequate job of scrubbing the documents of names at various points which is why they are useful to the Taliban and other groups building death lists.
    Yes, I have seen reports of people being killed due to Wikileaks publishing their name, you just have to dig a lot to find them. For some reason it doesn't seem to be a popular news item. Go figure.
    Oversight of US diplomacy, military, and intelligence activity is the role of the Congress elected by voters.

    Even if nobody was killed, Wikileaks has resulted in a significant disruption to US diplomacy and antiterrorism efforts. (You pull out informants due to their cover being blown and you lose valuable intelligence.)

    Poll finds that more Americans oppose WikiLeaks

    WASHINGTON — Americans overwhelmingly think that WikiLeaks is doing more harm than good by releasing classified U.S. diplomatic cables, and they want to see the people behind it prosecuted, according to a new McClatchy-Marist Poll.

    "Clearly people are very unhappy with it," said Lee Miringoff, the director of the Marist Institute for Public Opinion at Marist College in Poughkeepsie, N.Y., which conducted the national poll.

    The survey found that 70 percent of Americans think the leaks are doing more harm than good and want those who publish the secrets to be prosecuted.

    --
    much of left-wing thought is a kind of playing with fire by people who don't even know that fire is hot - George Orwell
    1. Re:Don't forget everyone else! by cold+fjord · · Score: 2

      The Colombian drug cartels have been doing this sort of thing for years.

      --
      much of left-wing thought is a kind of playing with fire by people who don't even know that fire is hot - George Orwell
    2. Re:Don't forget everyone else! by Anonymous Coward · · Score: 2, Insightful

      Life's a bitch. Perhaps the US shouldn't be doing things that it has to keep so secret. That's just a consequence of empire-building. Preach one value to the masses, do something else in practice.

      Is it more important to prop up the current system to keep a few agents of the empire safe from harm or is it more important to try to bring some sanity to the whole entire thing and do some longer-term good by shedding light on things people are afraid of showing to even our own public?

      Whether one agrees with the leaks or not, it's quite obvious from the cables that we're doing some rather unsettling things that I don't want to be associated with. I'm more concerned about the long term effects of that than the leaks themselves.

      It's a sorta philosophical debate... it's not a crime if you don't get caught, I guess. But now we're caught... what now? Pretend these things didn't happen?

    3. Re:Don't forget everyone else! by Dails · · Score: 3, Insightful

      we're doing some rather unsettling things that I don't want to be associated with

      And that's why you're not some sort of government agent doing those things. This attitude bothers me for the same reason the "No blood for oil" types bother me. You don't get how important that sort of thing is. No blood for oil? Then what will you shed blood for? Losing oil supplies will so vastly change your way of life that you would argue it impossible if someone accurately showed you. If you think shady goings-on are an endeavor unique to America, you need to wake up. Every country (EVERY country - if you're not an American, believe that your country does it, too) does that. Even if only to stay in power and not out of a desire to provide for the people, every government strives to provide a certain lifestyle or quality of life to the people, and this is the price. If you don't like it, stop doing anything that requires oil (drive a car, use electricity, buy processed foods, etc). Don't get upset at the government for doing what it has to to provide you with something you'd complain about losing (probably here).

    4. Re:Don't forget everyone else! by cold+fjord · · Score: 4, Insightful

      Perhaps the US shouldn't be doing things that it has to keep so secret. That's just a consequence of empire-building. Preach one value to the masses, do something else in practice.

      Is it the duty of the United States government to serve the interests of the United States, as opposed to say, Iran? Is it the duty of the United States government to care for and protect its people, as opposed to say, the people of Venezuela? If so, then it must differentiate between different sets of interests, American, and those of others.

      If American citizens have been taken prisoner unlawfully by pirates, the United States government could try to negotiate with the pirates. If the pirates want $1,000,000, but the US is willing to pay $20,000,000, should the government go in and up front announce the maximum amount they are willing to pay instead of try to pay the least amount? Wouldn't that be a fundamentally stupid bargaining tactic? But to do that, they would need to keep secrets from the pirates. Well, not just pirates, they would need to keep it secret from the media, since there are many media outlets that would gladly publish it, and force the US to pay $20,000,000 instead of $1,000,000. So, do you think the US should keep the maximum bid a secret and serve American interests, or announce it and serve pirate interests by undermining the government's own negotiating position?

      Let's say negotiations with the pirates are going badly: they heard in the media that the government is willing to pay $20,000,000 but they got greedy and now think they can get $50,000,000. The US Government isn't willing to pay that much and decides to use a commando raid to rescue the hostages while stalling in negotiations. Military actions are generally at least twice as effective over short periods of time when the attacking force attains surprise. Even if the pirates think it is possible, they don't really know if, when, how, who, or where they will come from. Should the US Government announce to the pirates that it has given up negotiations, and that it is going to use military force to free its citizens? If not, that would mean keeping a secret from the pirates - do you oppose that? Of course, it will also have to keep the rescue plan secret from the media as well or it will be published, the pirates will see it, and will be prepared to defeat it. Should the government tell the next of kin that it is going to try a military rescue? They might tell the media, or their kin being held by the pirates, and either the media or the prisoners might tell the pirates. So, it looks like we can't tell the pirates, the media, or the next of kin. What about other people in the United States? Same problem.

      As part of the planning for the rescue mission, it appears that it would be really helpful to refuel some aircraft in a country near where the pirates are holding the American captives. This third country has a government that is friendly to the United States, but much of the population is hostile as they are being influenced by religious extremists from outside their country. The government of this third country agrees to the refueling operation at one of their island military bases, but demands that it be kept secret to avoid agitating their citizens. Since it helps the mission of recovering Americans held hostage, shouldn't the US make use of the island for refueling? What about the request to keep it secret? Should the US stir up problems in the country by making it known, despite the request of the government? If the use of the island is revealed, it could hurt diplomatic relations, and perhaps even generate civil unrest, getting people killed. Shouldn't this be kept a secret? From the pirates? From the media?

      During the flight to the pirate locations, and on the ground, US forces will be using radios for command and control, and various flight operations. Should the US inform the pirates about the radio frequencies it uses? What about the media, who might listen in? Suppose a

      --
      much of left-wing thought is a kind of playing with fire by people who don't even know that fire is hot - George Orwell
    5. Re:Don't forget everyone else! by nedlohs · · Score: 0

      So are you claiming the US should have publicised that they were pretty sure where Osama bin Laden was instead of keeping it secret until they acted on it? Or that they shouldn't have been looking?

      Are you claiming the US should call the Chinese Government right now and tell them the names and locations of every Chinese dissident they know about? Or that they shouldn't know in the first place or provide any assistance?

      Are you claiming the US should just publicise, and maybe run some training seminars on, how to build a hydrogen bomb? Or that they shouldn't have done it themselves in the first place, and just let the USSR have them?

      Are you claiming the US shouldn't campaign and negotiate for its own interests?

      Where is the line on what is OK to keep secret? And on what is OK to do even though it won't make everyone else in the world happy?

    6. Re:Don't forget everyone else! by Hatta · · Score: 0

      It is exactly because the US needs to serve US interests that we cannot have government secrecy. Without accountability, we cannot ensure that our government is serving our interests.

      --
      Give me Classic Slashdot or give me death!
    7. Re:Don't forget everyone else! by kilfarsnar · · Score: 1

      Translation: 70% of Americans surveyed don't understand freedom of the press or the value of dissent.

      --
      "What the American public doesn't know is what makes them the American public." -Ray Zalinsky (Tommy Boy)
    8. Re:Don't forget everyone else! by Archtech · · Score: 1

      No blood for oil? Then what will you shed blood for?

      So you do actually believe it's fine to kill foreigners so you can keep your higher standard of living? That sounds an absurd question to ask anyone, but I can't see how your post doesn't imply that you do believe it's fine. It's not ethical, though.

      Another thing that bothers "foreigners" (i.e. the approximately 6.7 billion, or over 95% of the human race, who are not citizens of the USA) is the inconsistency of a government that loudly insists that all people are equal, while working as hard as it can to make Americans much more equal than others.

      --
      I am sure that there are many other solipsists out there.
    9. Re:Don't forget everyone else! by kilfarsnar · · Score: 1

      This is a Faustian bargain though. It seems that you are saying, "If you want to have nice things we have to kill people to get them." You sound a lot like Col. Jessep telling us we can't handle the truth.

      I don't deny that you are accurately describing the current dynamic. It is the way it is, and yet people wonder why the world is such a violent and chaotic place and why we can't have "peace". Well this dynamic is partly why the world is the way it is. We can accept the way it is, but not agree that it should be that way. And in my opinion, it should not be that way.

      --
      "What the American public doesn't know is what makes them the American public." -Ray Zalinsky (Tommy Boy)
    10. Re:Don't forget everyone else! by Dails · · Score: 1

      I'm not saying it's fine to kill foreigners to keep a higher standard of living, I'm just saying that it's silly and hypocritical for people to make the kind of arguments I mentioned. The response after yours puts it more accurately; it's very Faustian, but it is in fact the way the world works. See my response to that for more.

    11. Re:Don't forget everyone else! by Dails · · Score: 1

      Well this dynamic is partly why the world is the way it is.

      The word dynamic here is perfect to describe this; the equilibrium of the world is such that if any entity (country, state, politician, military force, etc) won't take unfair or aggressive advantage of something, there will be another equal entity to fill that void. That's why it's silly to point out any one country for doing this kind of thing, because even if a country isn't, given the chance or risk:reward ratio, it would.

      We can accept the way it is, but not agree that it should be that way.

      But that attitude means it will never change. Sure, no blood for oil! I'm against foreign wars! Oh, but I'll also blame the government or big business or whomever if gas prices rise any higher. If you're against something, be against it. I'm in the military, so if someone gives me an order I don't like, I can either deal with it and follow the order, or I can decide I can't follow the order legally or in good conscience and refuse, but like everyone else I'd have to pay the consequences.

    12. Re:Don't forget everyone else! by kilfarsnar · · Score: 1

      The word dynamic here is perfect to describe this; the equilibrium of the world is such that if any entity (country, state, politician, military force, etc) won't take unfair or aggressive advantage of something, there will be another equal entity to fill that void. That's why it's silly to point out any one country for doing this kind of thing, because even if a country isn't, given the chance or risk:reward ratio, it would.

      That still sounds to me like an excuse to take unfair advantage. If I don't do it someone else will. I would like to see my country's foreign and domestic policy be a bit more equitable than that. A pipe dream, I know.

      But that attitude means it will never change. Sure, no blood for oil! I'm against foreign wars! Oh, but I'll also blame the government or big business or whomever if gas prices rise any higher. If you're against something, be against it. I'm in the military, so if someone gives me an order I don't like, I can either deal with it and follow the order, or I can decide I can't follow the order legally or in good conscience and refuse, but like everyone else I'd have to pay the consequences.

      Yeah, but some things I just have to accept. I can't fight all the battles, and I can't make everything the way I think it should be (the rest of you should probably be glad about that). I can accept what I can't (or won't) change, but still not like that it goes on. I admit I am not willing to accept the personal consequences of trying to change what drives our foreign policy. But I will still speak out against it when I can.

      --
      "What the American public doesn't know is what makes them the American public." -Ray Zalinsky (Tommy Boy)
  10. It's called a narrative by DigiShaman · · Score: 2, Interesting

    The fact that there's a media narrative is hardly news. The purpose is to provide ratings. Anything that will lead to scandal, corruption, or supporting national politics is the name of the game. Fox does this to support Republicans, all the others support the Democrats. I suppose this is news to those that don't already know this however. And this "taking sides" of the national media is nothing new at all. Very old hat in American history.

    Ask any budding journalist as to why they want to be in this industry. Sometimes, you will hear a common theme of "To change the world for a better place". Generally that implies a motive with bias. No, their job is to REPORT the news in its purest form. I'll tell ya, that can both end wars and create them. But oh no, we can't have that now can we? They should report the good, the bad, and the ugly with impartiality. BBC is the closest as it comes to doing that. Perhaps I'm giving them too much credit however.

    --
    Life is not for the lazy.
    1. Re:It's called a narrative by cold+fjord · · Score: 2

      BBC is the closest as it comes to doing that. Perhaps I'm giving them too much credit however.

      Although it is a venerable institution, the BBC has struggled with bias over the years.

      BBC had "massive bias to left:" director general

      The director general of the BBC admitted Thursday that his organisation had been guilty of a "massive bias to the left" but said "a completely different generation" of journalists now works at the broadcaster.
      Mark Thompson told the right-of-centre Spectator magazine that there was an institutional bias when he joined the organisation, reinforcing the findings of a 2007 internal report which concluded that greater efforts were required to avoid liberal bias.

      "In the BBC I joined 30 years ago, there was, in much of current affairs, in terms of people's personal politics, which were quite vocal, a massive bias to the left," Thompson said.

      --
      much of left-wing thought is a kind of playing with fire by people who don't even know that fire is hot - George Orwell
    2. Re:It's called a narrative by Archtech · · Score: 1

      The BBC certainly is massively biased (in an institutional way). It's less a matter of overt censorship than of a pervasive worldview that makes censorship unnecessary. Presumably they don't even hire people who might be members of the "awkward squad", or who don't appear to share the standard politically correct establishment values.

      As a result, I think it's very inaccurate to describe the BBC's bias as "left wing". I would call it "establishment", which of course does beg the question of whether the British establishment itself nowadays is left wing. Certainly not in the old-fashioned sense of fighting to improve the lot of the working classes. Britain doesn't have a left wing or a right wing any more - certainly anyone who falls under those descriptions can only be on the fringes, without much influence. The Labour and Conservative parties are both very similar - middle-of-the-road, managerial, bien-pensant bourgeois. (Many people describe themselves as working class who obviously aren't, on cursory inspection of their financial means and chosen way of life).

      If I had to sum up the BBC's political orientation in a single word, it would have to be "smug". Turn on almost any program, and you can practically hear the journalists, newsreaders, and producers thinking "O God, I give thee thanks that I am not as the rest of men, extortioners, unjust, adulterers, as also is this publican". (Or, as it might be, "O God, I give thee thanks that we are enlightened Westerners, and not as the rest of men - nasty dictators like Assad or Qadafi, religious fanatics like the Muslims, repressive communists like the Chinese, bigoted sexists/racists/elitists...")

      --
      I am sure that there are many other solipsists out there.
  11. Gephi by Psychotria · · Score: 2

    The visualisations look like they were generated using Gephi. Interesting use. I wonder if the search for "search terms" was initially refined by graphing the raw data and continuing from there.

    1. Re:Gephi by biodata · · Score: 1

      The video has a reasonable explanation - they look at every word in each document and give it a relevance score - TF/IDF, term-frequency/inverse document frequency - i.e. how often the word comes up in the document compared to how often it comes up in the whole document set. This gives you a rating for every word on how 'document-specific' its use is. Then for every pairwise comparison between documents you can calculate the distance between the pair of documents by looking at the overlap between the terms in the documents, scaled by the TF/IDF of the terms. Once you know how far every document is away from every other document, you pick a threshold - ignore documents too far away from each other - and visualise links between the rest based on distance. Or something like that. People have been doing this with the scientific literature for a few years to save scientists from having to read everything ever published. I think it's safe to say the results are mixed - it is good at spotting the bleeding obvious, but also tends to highlight some interesting connections that people never noticed before. Choosing the thresholds is a work in progress I would say.
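
      For anyone who wants to poke at the idea, here is a rough sketch of the steps described above, not the code used in the presentation. Everything in it is an assumption on my part: whitespace tokenisation, a plain log IDF, cosine distance as the "overlap scaled by TF/IDF" measure, and the function names tfidf_vectors, cosine_distance and similarity_edges, which are made up for illustration.

```python
import math
from collections import Counter
from itertools import combinations


def tfidf_vectors(docs):
    """Return one {term: weight} dict per document (assumed weighting scheme)."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: in how many documents does each term appear?
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)
        # Terms that occur in every document get weight log(1) = 0.
        vectors.append({
            term: (count / len(tokens)) * math.log(n_docs / df[term])
            for term, count in tf.items()
        })
    return vectors


def cosine_distance(a, b):
    """1 - cosine similarity of two sparse vectors; 1.0 if either is all zeros."""
    dot = sum(weight * b[term] for term, weight in a.items() if term in b)
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 1.0
    return 1.0 - dot / (norm_a * norm_b)


def similarity_edges(docs, threshold):
    """Keep only the document pairs whose distance is at or below the threshold."""
    vectors = tfidf_vectors(docs)
    edges = []
    for i, j in combinations(range(len(docs)), 2):
        distance = cosine_distance(vectors[i], vectors[j])
        if distance <= threshold:
            edges.append((i, j, distance))
    return edges


if __name__ == "__main__":
    sample = [
        "ied explosion convoy",
        "ied explosion patrol",
        "detainee transfer report",
    ]
    for i, j, distance in similarity_edges(sample, threshold=0.9):
        print(i, j, round(distance, 3))
```

      The edge list is what you would hand to a graph tool for layout. Move the threshold up or down and the clusters change, which is exactly the framing problem the summary is on about.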

      --
      Korma: Good
  12. Re:ATTENTION ROB MALDA! by Anonymous Coward · · Score: 0, Offtopic

    But yer phone is up his ass, so how that work?

  13. But this isn't reasoned analysis by Anonymous Coward · · Score: 0

    Reasoned analysis would mean taking the entire corpus of documents, and coming up with stories based on what's in them.

    This process, on the other hand, is coming up with stories, then doing targeted searches of the documents to find material to back them up.

    1. Re:But this isn't reasoned analysis by MimeticLie · · Score: 4, Interesting

      Actually, if you watch the video, that's not what Stray is talking about. Rather than doing targeted searches, he's talking about processing the whole dataset and using algorithms to establish connections. The narrative that makes sense of those clusters is what would (hopefully) be the reasoned analysis.

  14. A microcosm of reporting as a whole by Archtech · · Score: 4, Insightful

    Mark Twain summed up the central problem of journalism with his epigram, "Get your facts first... then you can distort 'em as much as you please". But, amusing as it is, this completely misses the point! In the very process of "getting your facts" you have the opportunity - indeed, the obligation - of selecting them from among the infinite number of facts that you could choose. Having selected the facts that you think are most important, there is no longer the slightest need to distort them. The work is already done.

    Suppose you are the New York Times, and you are reporting on events in Afghanistan. You have a certain amount of space, so do you write up the IED explosion which killed a couple of NATO soldiers and put a few more in hospital - or do you describe the NATO helicopter raid that killed a dozen villagers and wounded another few dozen? Well, your readers are far more interested in the fate of NATO people (especially if they are from the USA); moreover, they don't particularly want to read about how their glorious forces have accidentally (or otherwise) killed a lot of civilians. So it's a no-brainer - you write up the IED event. After a few years of such a policy, consistently followed, readers get the idea that all that happens in Afghanistan is that NATO soldiers occasionally get blown up. Yes, the NYT has accurately reported the facts. It hasn't reported all of them, but its editors could argue that such an attempt would be physically impossible. The only practical way of getting a more balanced impression would be to read, as well as the NYT, a newspaper that takes an anti-NATO, pro-Afghan point of view. But no such newspaper can survive commercially in the US market, because it wouldn't sell enough copies (even if it were allowed to go on operating for long).

    Indeed, the Wikileaks documents currently under discussion are subject to such a filtering effect too. Remember, all those documents were written by American officials, for US government consumption. You won't find many mentions in there of atrocities by our forces - even if the US authorities in Afghanistan or Washington were aware of such atrocities, they wouldn't put them into messages with such a low level of security. What you can expect to find is a fairly high level of unguarded opinions - either honest or carefully angled to make a particular desired impression.

    --
    I am sure that there are many other solipsists out there.
  15. Exposing Classified by Anonymous Coward · · Score: 0

    Arrest him.

  16. Garbage in garbage out by Anonymous Coward · · Score: 0

    If it uses US military terms, then there will be a significant bias as they declare that all dead Iraqis are insurgents.

  17. Yet another non story by doperative · · Score: 0

    "This type of data mining holds great potential for investigative revelation — and great potential for journalistic abuse"

    I don't think so ...

  18. Why.... by Luthair · · Score: 1

    Is this a link to (presumably) the submitter's blog, rather than the actual presentation available here: http://curiositycounts.com/post/6455747293/jonathan-stray-of-the-associated-press-on

    1. Re:Why.... by Fnord666 · · Score: 1

      Is this a link to (presumably) the submitter's blog, rather than the actual presentation available here.

      Given that the submitter meckdevil's associated email address is john.mecklin@sbcglobal.net and the link to TFA is on johnmecklin.wordpress.com, I would say yes. The linked page contains no content and readers should just use the link in the parent post. The submitter is nothing more than a link whore and if the editors were doing their jobs this wouldn't happen.

      --
      'The tyrant will always find pretext for his tyranny.' - Aesop's Fables
    2. Re:Why.... by Fnord666 · · Score: 1

      Is this a link to (presumably) the submitter's blog, rather than the actual presentation available here: http://curiositycounts.com/post/6455747293/jonathan-stray-of-the-associated-press-on

      You can skip this site also. The presentation can be found here on vimeo.

      --
      'The tyrant will always find pretext for his tyranny.' - Aesop's Fables
    3. Re:Why.... by Fnord666 · · Score: 1

      Is this a link to (presumably) the submitter's blog, rather than the actual presentation available here: http://curiositycounts.com/post/6455747293/jonathan-stray-of-the-associated-press-on

      You can skip this site also. The presentation and the related discussion are in the original post.

      --
      'The tyrant will always find pretext for his tyranny.' - Aesop's Fables
  19. Not exactly news by Geoffrey.landis · · Score: 1

    "By choosing the parameters under which documents will be considered similar enough to pay attention to, journalist-programmers actually choose the frame in which a story will be told."

    Journalists already choose the frame in which a story will be told. They always have. That's not new.

    --
    http://www.geoffreylandis.com
  20. Ventura by danbuter · · Score: 0

    One thing we know, whatever the corps decide not to cover, if it's in the main body of documents, Jesse Ventura will find and make a book out of it. He was smart about that. And it probably got a bunch of people to read WikiLeaks info that otherwise would not have.