Slashdot Mirror


How Journalists Data-Mined the Wikileaks Docs

meckdevil writes "Associated Press developer-journalist extraordinaire Jonathan Stray gives a brilliant explanation of the use of data-mining strategies to winnow and wring journalistic sense out of massive numbers of documents, using the Iraq and Afghanistan war logs released by Wikileaks as a case in point. The concepts for focusing on certain groups of documents and ignoring others are hardly new; they underlie the algorithms used by the major Web search engines. Their use in a journalistic context is on the cutting edge, though, and it raises a fascinating quandary: By choosing the parameters under which documents will be considered similar enough to pay attention to, journalist-programmers actually choose the frame in which a story will be told. This type of data mining holds great potential for investigative revelation - and great potential for journalistic abuse."

7 of 59 comments

  1. We're Not Limited to Only One Context by MimeticLie · · Score: 5, Insightful

    Isn't that one of the major reasons we have journalism? To synthesize and contextualize information? If the contextualized (or perhaps editorialized, depending on your point of view) information was the only kind available, then yes that is an issue. But with Wikileaks, the data is there for anyone who wants to parse it.

    This strikes me as being similar to when Anderson Cooper was criticized for calling Mubarak a liar. Or the behavior that Colbert mocked the White House press corps for at the correspondents' dinner. Pretending that journalists are free of bias doesn't make it so, and saying that they should just regurgitate facts and talking points verbatim is counter-productive. Reasoned analysis should be encouraged.

    1. Re:We're Not Limited to Only One Context by MimeticLie · · Score: 4, Insightful

      That's part of the point of the video: using data mining techniques to broaden analytical tools beyond a simple keyword search and the preconceptions it can reinforce (the reporter mentions seeing a cluster of tanker truck incidents that was bigger than his organization had previously been aware of). He ends by noting that the way one writes the algorithm can determine what trends pop out and thus how the story is framed, which seems like a perfectly reasonable statement. Then someone (either the submitter or the Slashdot editors) transforms that into a "great potential for journalistic abuse."

      I don't have an issue with the methodology portrayed in the video. But to then take the presenter's words and twist them to support a "just the facts, ma'am" style of journalism seems dishonest and unproductive.

  2. Re:I just used grep -P by buchner.johannes · · Score: 4, Insightful

    Worked miracles after I've gotten around the ugly HTML format they use to release all those INFORMATIONS. Still, there was very little new or worthwhile in the heap of those news clips and rumour aggregations. Frankly, the more I grep it, the less it looks like the "largest leak in history", and the more it seems like "the largest controlled release of information" in history.

    / takes off conspiracy theory hat // flame on

    When you use grep you have to know what you are grepping for. You cannot stumble upon a search keyword with grep.
    Clustering allows that, if you let it build the clusters itself. Perhaps you are missing out on the interesting bits.
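
    To make the grep-versus-clustering difference concrete, here is a minimal sketch in Python. Everything in it is an assumption for illustration: the reports are invented (not from the war logs), the similarity measure is plain word overlap (Jaccard), and the grouping is simple single-link merging. It is not the method shown in the video.

```python
import re
from itertools import combinations

# Hypothetical toy reports, invented for illustration only.
REPORTS = [
    "fuel tanker truck attacked on highway",
    "tanker truck set on fire near highway checkpoint",
    "fuel truck convoy attacked near checkpoint",
    "detainee transferred to police custody",
    "police transferred detainee after questioning",
]


def tokens(text):
    """Return the set of lowercase words in one report."""
    return set(re.findall(r"[a-z]+", text.lower()))


def keyword_search(reports, pattern):
    """grep-style search: finds only what you already thought to ask for."""
    return [i for i, r in enumerate(reports) if re.search(pattern, r)]


def jaccard(a, b):
    """Word overlap between two token sets, from 0.0 to 1.0."""
    return len(a & b) / len(a | b)


def cluster(reports, threshold):
    """Single-link clustering: merge reports whose overlap >= threshold."""
    sets = [tokens(r) for r in reports]
    parent = list(range(len(reports)))

    def find(i):
        # Follow parent links to the representative of i's group.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, j in combinations(range(len(reports)), 2):
        if jaccard(sets[i], sets[j]) >= threshold:
            parent[find(i)] = find(j)

    groups = {}
    for i in range(len(reports)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


print(keyword_search(REPORTS, "tanker"))  # reports containing "tanker"
print(cluster(REPORTS, 0.3))              # groups found without a keyword
print(cluster(REPORTS, 0.35))             # a stricter threshold
```

    Grepping for "tanker" returns only the two reports that use that word. With a threshold of 0.3, the clustering also pulls the "fuel truck convoy" report into the same group, without anyone supplying a keyword. Raising the threshold to 0.35 splits it off again.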

    --
    NB: The message above might reflect my opinion right now, but not necessarily tomorrow or next year.
  3. Re:But this isn't reasoned analysis by MimeticLie · · Score: 4, Interesting

    Actually, if you watch the video, that's not what Stray is talking about. Rather than doing targeted searches, he's talking about processing the whole dataset and using algorithms to establish connections. The narrative that makes sense of those clusters is what would (hopefully) be the reasoned analysis.
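
    For anyone wondering what "processing the whole dataset and using algorithms to establish connections" might look like, here is a rough sketch. It assumes TF-IDF weighting and cosine similarity, which are a common choice for comparing documents but not necessarily what Stray used. The reports, function names, and thresholds are all made up for illustration.

```python
import math
import re
from collections import Counter

# Hypothetical toy reports, invented for illustration only.
REPORTS = [
    "tanker truck attacked",
    "tanker truck burned",
    "truck convoy attacked",
    "detainee released",
]


def tfidf_vectors(docs):
    """Return one {term: weight} dict per document, weight = tf * log(N / df)."""
    bags = [Counter(re.findall(r"[a-z]+", d.lower())) for d in docs]
    # Document frequency: number of documents each term appears in.
    df = Counter(term for bag in bags for term in bag)
    n = len(docs)
    return [{t: tf * math.log(n / df[t]) for t, tf in bag.items()} for bag in bags]


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def links(docs, threshold):
    """Return index pairs of documents whose similarity is >= threshold."""
    vecs = tfidf_vectors(docs)
    return [
        (i, j)
        for i in range(len(docs))
        for j in range(i + 1, len(docs))
        if cosine(vecs[i], vecs[j]) >= threshold
    ]


for threshold in (0.5, 0.3, 0.02):
    print(threshold, links(REPORTS, threshold))
```

    The same four reports yield no connections, two connections, or three, depending only on the threshold. That is the parameter choice the summary is talking about: the algorithm looks at everything, but whoever picks the cutoff still decides which clusters exist for the narrative to explain.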

  4. Re:I just used grep -P by wvmarle · · Score: 4, Insightful

    If you know what's in the documents, then life gets easy of course. The trouble is that usually you do not know what's in the documents without reading them. And if there's nothing new, that's a pity. But anyway the fact that one could say "there is nothing but local newspaper clips and gossip" in a set of documents indicates that they actually went through them all.

    And for sure there's a lot of noise in the WikiLeaks documents. The same will be true of the Palin e-mail trove. Finding the interesting bits in that enormous noise is what journalists are for, and no journalist will know beforehand what those interesting bits are - which is exactly why they are interesting.

  5. Re:Don't forget everyone else! by cold+fjord · · Score: 4, Insightful

    Perhaps the US shouldn't be doing things that it has to keep so secret. That's just a consequence of empire-building. Preach one value to the masses, do something else in practice.

    Is it the duty of the United States government to serve the interests of the United States, as opposed to say, Iran? Is it the duty of the United States government to care for and protect its people, as opposed to say, the people of Venezuela? If so, then it must differentiate between different sets of interests, American, and those of others.

    If American citizens have been taken prisoner unlawfully by pirates, the United States government could try to negotiate with the pirates. If the pirates want $1,000,000, but the US is willing to pay $20,000,000, should the government go in and announce up front the maximum amount it is willing to pay instead of trying to pay the least amount? Wouldn't that be a fundamentally stupid bargaining tactic? But to do that, they would need to keep secrets from the pirates. Well, not just pirates, they would need to keep it secret from the media, since there are many media outlets that would gladly publish it, and force the US to pay $20,000,000 instead of $1,000,000. So, do you think the US should keep the maximum bid a secret and serve American interests, or announce it and serve pirate interests by undermining the government's own negotiating position?

    Let's say negotiations with the pirates are going badly: they heard in the media that the government is willing to pay $20,000,000, but they got greedy and now think they can get $50,000,000. The US Government isn't willing to pay that much, and decides to use a commando raid to rescue the hostages while stalling in negotiations. Military actions are generally at least twice as effective over short periods of time when the attacking force attains surprise. Even if the pirates think it is possible, they don't really know if, when, how, who, or where they will come from. Should the US Government announce to the pirates that it has given up negotiations, and that it is going to use military force to free its citizens? If not, that would mean keeping a secret from the pirates - do you oppose that? Of course, it will also have to keep the rescue plan secret from the media as well or it will be published, the pirates will see it, and will be prepared to defeat it. Should the government tell the next of kin that it is going to try a military rescue? They might tell the media, or their kin being held by the pirates, and either the media or the prisoners might tell the pirates. So, it looks like we can't tell the pirates, the media, or the next of kin. What about other people in the United States? Same problem.

    As part of the planning for the rescue mission, it appears that it would be really helpful to refuel some aircraft in a country near where the pirates are holding the American captives. This third country has a government that is friendly to the United States, but much of the population is hostile as they are being influenced by religious extremists from outside their country. The government of this third country agrees to the refueling operation at one of their island military bases, but demands that it be kept secret to avoid agitating their citizens. Since it helps the mission of recovering Americans held hostage, shouldn't the US make use of the island for refueling? What about the request to keep it secret? Should the US stir up problems in the country by making it known, despite the request of the government? If the use of the island is revealed, it could hurt diplomatic relations, and perhaps even generate civil unrest, getting people killed. Shouldn't this be kept a secret? From the pirates? From the media?

    During the flight to the pirate locations, and on the ground, US forces will be using radios for command and control, and various flight operations. Should the US inform the pirates about the radio frequencies it uses? What about the media, who might listen in? Suppose a

    --
    much of left-wing thought is a kind of playing with fire by people who don't even know that fire is hot - George Orwell
  6. A microcosm of reporting as a whole by Archtech · · Score: 4, Insightful

    Mark Twain summed up the central problem of journalism with his epigram, "Get your facts first... then you can distort 'em as much as you please". But, amusing as it is, this completely misses the point! In the very process of "getting your facts" you have the opportunity - indeed, the obligation - of selecting them from among the infinite number of facts that you could choose. Having selected the facts that you think are most important, there is no longer the slightest need to distort them. The work is already done.

    Suppose you are the New York Times, and you are reporting on events in Afghanistan. You have a certain amount of space, so do you write up the IED explosion which killed a couple of NATO soldiers and put a few more in hospital - or do you describe the NATO helicopter raid that killed a dozen villagers and wounded another few dozen? Well, your readers are far more interested in the fate of NATO people (especially if they are from the USA); moreover, they don't particularly want to read about how their glorious forces have accidentally (or otherwise) killed a lot of civilians. So it's a no-brainer - you write up the IED event. After a few years of such a policy, consistently followed, readers get the idea that all that happens in Afghanistan is that NATO soldiers occasionally get blown up. Yes the NYT has accurately reported the facts. It hasn't reported all of them, but its editors could argue that such an attempt would be physically impossible. The only practical way of giving a more balanced impression would be to read, as well as the NYT, a newspaper that takes an anti-NATO, pro-Afghan point of view. But no such newspaper can survive commercially in the US market, because it wouldn't sell enough copies (even if it were allowed to go on operating for long).

    Indeed, the Wikileaks documents currently under discussion are subject to such a filtering effect too. Remember, all those documents were written by American officials, for US government consumption. You won't find many mentions in there of atrocities by our forces - even if the US authorities in Afghanistan or Washington were aware of such atrocities, they wouldn't put them into messages with such a low level of security. What you can expect to find is a fairly high level of unguarded opinions - either honest or carefully angled to make a particular desired impression.

    --
    I am sure that there are many other solipsists out there.