Slashdot Mirror


How Journalists Data-Mined the Wikileaks Docs

meckdevil writes "Associated Press developer-journalist extraordinaire Jonathan Stray gives a brilliant explanation of how data-mining strategies can winnow massive numbers of documents and wring journalistic sense out of them, using the Iraq and Afghanistan war logs released by Wikileaks as a case in point. The concepts for focusing on certain groups of documents and ignoring others are hardly new; they underlie the algorithms used by the major Web search engines. Their use in a journalistic context is on the cutting edge, though, and it raises a fascinating quandary: by choosing the parameters under which documents count as similar enough to pay attention to, journalist-programmers actually choose the frame in which a story will be told. This type of data mining holds great potential for investigative revelation, and great potential for journalistic abuse."
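One common way to make "similar enough" concrete, offered as a sketch and not as Stray's actual pipeline: represent each report as a TF-IDF vector and call two reports related when their cosine similarity clears a threshold. The tokenizer, the `threshold` values, and the four sample reports are hypothetical. Lowering the threshold pulls in more links, which is exactly the framing choice the summary describes.

```python
import math
import re
from collections import Counter
from itertools import combinations


def tokenize(text):
    """Lowercase and split on runs of letters (a deliberately crude tokenizer)."""
    return re.findall(r"[a-z]+", text.lower())


def tfidf_vectors(docs):
    """Return one sparse TF-IDF vector (dict: term -> weight) per document."""
    tokenized = [tokenize(d) for d in docs]
    n = len(docs)
    # Document frequency: in how many documents does each term appear?
    df = Counter(term for toks in tokenized for term in set(toks))
    vectors = []
    for toks in tokenized:
        tf = Counter(toks)
        # Terms found in every document get idf = log(1) = 0 and drop out.
        vectors.append({t: c * math.log(n / df[t]) for t, c in tf.items()})
    return vectors


def cosine(a, b):
    """Cosine similarity of two sparse vectors; 0.0 if either is all zeros."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def similar_pairs(docs, threshold):
    """Index pairs whose cosine similarity is at or above the threshold."""
    vecs = tfidf_vectors(docs)
    return [
        (i, j)
        for i, j in combinations(range(len(docs)), 2)
        if cosine(vecs[i], vecs[j]) >= threshold
    ]


# Hypothetical one-line "reports", not taken from the war logs.
reports = [
    "IED strike on convoy near Kandahar",
    "IED strike on patrol near Kandahar",
    "fuel tanker truck hijacked on highway",
    "tanker truck fuel convoy attacked",
]

for threshold in (0.5, 0.3, 0.1):
    print(threshold, similar_pairs(reports, threshold))
```

With these sample reports, a threshold of 0.5 links only the two IED reports; 0.3 also pairs the two tanker-truck reports; 0.1 additionally links an IED report to a tanker report through the single shared word "convoy". Same data, three different "stories".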

17 of 59 comments

  1. I just used grep -P by siddesu · · Score: 3, Interesting

    Worked miracles once I'd gotten around the ugly HTML format they use to release all those INFORMATIONS. Still, there was very little new or worthwhile in the heap of those news clips and rumour aggregations. Frankly, the more I grep it, the less it looks like the "largest leak in history", and the more it seems like "the largest controlled release of information" in history.

    / takes off conspiracy theory hat // flame on

    1. Re:I just used grep -P by buchner.johannes · · Score: 4, Insightful

      When you use grep, you have to know what you are grepping for. You cannot stumble upon a search keyword with grep.
      Clustering allows that, if you let it build the clusters itself. Perhaps you are missing out on the interesting bits.
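To make the grep-versus-clustering contrast concrete, here is a deliberately simple sketch; it is not what Stray described, just an illustration. It swaps in Jaccard overlap of word sets and single-link clustering (connected components via union-find), then lists each cluster's most frequent words. Those words are search keywords nobody had to know in advance. The stopword list, the 0.25 threshold, and the sample reports are all hypothetical.

```python
import re
from collections import Counter

# Hypothetical, tiny stopword list; a real one would be much longer.
STOPWORDS = {"a", "an", "and", "at", "in", "near", "of", "on", "the", "to"}


def words(text):
    """Set of lowercase words in the text, minus stopwords."""
    return {w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS}


def jaccard(a, b):
    """Size of the intersection over size of the union of two word sets."""
    return len(a & b) / len(a | b) if a | b else 0.0


def cluster(docs, threshold=0.25):
    """Single-link clustering: any pair of documents whose word sets overlap
    at or above `threshold` is merged (connected components via union-find)."""
    parent = list(range(len(docs)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    sets = [words(d) for d in docs]
    for i in range(len(docs)):
        for j in range(i + 1, len(docs)):
            if jaccard(sets[i], sets[j]) >= threshold:
                parent[find(i)] = find(j)

    groups = {}
    for i in range(len(docs)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


def cluster_keywords(docs, members, k=3):
    """Most common words across a cluster's documents (ties broken alphabetically)."""
    counts = Counter(w for i in members for w in words(docs[i]))
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [w for w, _ in ranked[:k]]


# Hypothetical reports; nobody typed "tanker" or "cache" as a query.
reports = [
    "Tanker truck fuel convoy ambushed",
    "Fuel tanker truck set on fire",
    "Tanker truck hijacked, fuel stolen",
    "Patrol found weapons cache in village",
    "Weapons cache found during village search",
]

for members in cluster(reports):
    print(members, cluster_keywords(reports, members))
```

Here the clusters come out as the three tanker-truck reports and the two weapons-cache reports, with "fuel, tanker, truck" and "cache, found, village" surfacing as keywords. Raising the threshold to 0.5 would split the tanker reports apart, which is the parameter-as-frame problem again.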

      --
      NB: The message above might reflect my opinion right now, but not necessarily tomorrow or next year.
    2. Re:I just used grep -P by wvmarle · · Score: 4, Insightful

      If you know what's in the documents, then life gets easy of course. The trouble is that usually you do not know what's in the documents without reading them. And if there's nothing new, that's a pity. But anyway the fact that one could say "there is nothing but local newspaper clips and gossip" in a set of documents indicates that they actually went through them all.

      And for sure there's a lot of noise in the WikiLeaks documents. The same will be true of the Palin e-mail trove. Finding the interesting bits in that enormous noise is what journalists are for, and no journalist will know beforehand what those bits are, which is exactly why they are interesting.

  2. We're Not Limited to Only One Context by MimeticLie · · Score: 5, Insightful

    Isn't that one of the major reasons we have journalism? To synthesize and contextualize information? If the contextualized (or perhaps editorialized, depending on your point of view) information was the only kind available, then yes that is an issue. But with Wikileaks, the data is there for anyone who wants to parse it.

    This strikes me as being similar to when Anderson Cooper was criticized for calling Mubarak a liar. Or the behavior that Colbert mocked the White House press corps for at the correspondents' dinner. Pretending that journalists are free of bias doesn't make it so, and saying that they should just regurgitate facts and talking points verbatim is counter-productive. Reasoned analysis should be encouraged.

    1. Re:We're Not Limited to Only One Context by Anonymous Coward · · Score: 2, Interesting

      If the contextualized (or perhaps editorialized, depending on your point of view) information was the only kind available, then yes that is an issue. But with Wikileaks, the data is there for anyone who wants to parse it.

      If memory serves, and I'm not missing something in my quick re-read of the Wikipedia page, the leaked cables were not all made available to everyone. They were distributed to five major news organizations so more than one editorial staff could reasonably decide which material was newsworthy and which was too sensitive to publish (sarcastic example: the GPS coordinates of Obama's real long-form birth certificate). This is a reasonably good idea, but it does mean that there are only a handful of people who have access to all the documents.

      Have you ever heard that when you find something it's always in the last place you look? That's because you stop looking for it once you're satisfied. Similarly, an editor searching for terms that might confirm a previously-unsubstantiated rumor he's got tucked away in a story on the shelf may find what he's looking for, but he won't find the really juicy stuff he didn't know to look for.

      In a perfect world, the system would correct for this because some enterprising young journalists who are willing to "pound the pavement" and read the whole thing would uncover the stuff they missed. But because of the limited set of people who have access, that won't happen for a decade or two at the earliest. It's a necessary evil to prevent information like the locations of and personnel at sensitive sites from falling into the wrong hands.

    2. Re:We're Not Limited to Only One Context by MimeticLie · · Score: 4, Insightful

      That's part of the point of the video: using data mining techniques to broaden analytical tools beyond a simple keyword search and the preconceptions it can reinforce (the reporter mentions seeing a cluster of tanker truck incidents that was bigger than his organization had previously been aware of). He ends by noting that the way one writes the algorithm can determine what trends pop out and thus how the story is framed, which seems like a perfectly reasonable statement. Then someone (either the submitter or the Slashdot editors) transforms that into a "great potential for journalistic abuse."

      I don't have an issue with the methodology portrayed in the video. But to then take the presenter's words and twist them to support a "just the facts, ma'am" style of journalism seems dishonest and unproductive.

  3. Don't forget everyone else! by cold+fjord · · Score: 2, Informative

    Terrorists and foreign intelligence services will also be doing this to use against the United States and its allies, not just journalists. Wikileaks has provided the raw material for data mining to find things the US doesn't even realize about itself, or its allies. There is no surprise that Bradley Manning has been charged with aiding the enemy.

    The fallout continues; hopefully it won't become literal.

    Al-Qaeda Already Using Wikileaks Material Against Us
    Taliban Study WikiLeaks to Hunt Informants
    Wikileaks: US will have to reshuffle diplomats following revelations
    'They're informants... if they get killed, they deserve it': New book reveals shocking disregard of Julian Assange towards Afghans named in WikiLeaks cables

    Since I can anticipate the follow ups:
    No, Wikileaks didn't do an adequate job of scrubbing the documents of names at various points, which is why they are useful to the Taliban and other groups building death lists.
    Yes, I have seen reports of people being killed due to Wikileaks publishing their name, you just have to dig a lot to find them. For some reason it doesn't seem to be a popular news item. Go figure.
    Oversight of US diplomacy, military, and intelligence activity is the role of the Congress elected by voters.

    Even if nobody was killed, Wikileaks has resulted in a significant disruption to US diplomacy and antiterrorism efforts. (You pull out informants due to their cover being blown and you lose valuable intelligence.)

    Poll finds that more Americans oppose WikiLeaks

    WASHINGTON — Americans overwhelmingly think that WikiLeaks is doing more harm than good by releasing classified U.S. diplomatic cables, and they want to see the people behind it prosecuted, according to a new McClatchy-Marist Poll.

    "Clearly people are very unhappy with it," said Lee Miringoff, the director of the Marist Institute for Public Opinion at Marist College in Poughkeepsie, N.Y., which conducted the national poll.

    The survey found that 70 percent of Americans think the leaks are doing more harm than good and want those who publish the secrets to be prosecuted.

    --
    much of left-wing thought is a kind of playing with fire by people who don't even know that fire is hot - George Orwell
    1. Re:Don't forget everyone else! by cold+fjord · · Score: 2

      The Colombian drug cartels have been doing this sort of thing for years.

      --
      much of left-wing thought is a kind of playing with fire by people who don't even know that fire is hot - George Orwell
    2. Re:Don't forget everyone else! by Anonymous Coward · · Score: 2, Insightful

      Life's a bitch. Perhaps the US shouldn't be doing things that it has to keep so secret. That's just a consequence of empire-building. Preach one value to the masses, do something else in practice.

      Is it more important to prop up the current system to keep a few agents of the empire safe from harm or is it more important to try to bring some sanity to the whole entire thing and do some longer-term good by shedding light on things people are afraid of showing to even our own public?

      Whether one agrees with the leaks or not, it's quite obvious from the cables that we're doing some rather unsettling things that I don't want to be associated with. I'm more concerned about the long term effects of that than the leaks themselves.

      It's a sorta philosophical debate... it's not a crime if you don't get caught, I guess. But now we're caught... what now? Pretend these things didn't happen?

    3. Re:Don't forget everyone else! by Dails · · Score: 3, Insightful

      we're doing some rather unsettling things that I don't want to be associated with

      And that's why you're not some sort of government agent doing those things. This attitude bothers me for the same reason the "No blood for oil" types bother me. You don't get how important that sort of thing is. No blood for oil? Then what will you shed blood for? Losing oil supplies will so vastly change your way of life that you would argue it impossible if someone accurately showed you. If you think shady goings-on are an endeavor unique to America, you need to wake up. Every country (EVERY country - if you're not an American, believe that your country does it, too) does that. Even if only to stay in power and not out of a desire to provide for the people, every government strives to provide a certain lifestyle or quality of life to the people, and this is the price. If you don't like it, stop doing anything that requires oil (drive a car, use electricity, buy processed foods, etc). Don't get upset at the government for doing what it has to to provide you with something you'd complain about losing (probably here).

    4. Re:Don't forget everyone else! by cold+fjord · · Score: 4, Insightful

      Perhaps the US shouldn't be doing things that it has to keep so secret. That's just a consequence of empire-building. Preach one value to the masses, do something else in practice.

      Is it the duty of the United States government to serve the interests of the United States, as opposed to say, Iran? Is it the duty of the United States government to care for and protect its people, as opposed to say, the people of Venezuela? If so, then it must differentiate between different sets of interests, American, and those of others.

      If American citizens have been taken prisoner unlawfully by pirates, the United States government could try to negotiate with the pirates. If the pirates want $1,000,000, but the US is willing to pay $20,000,000, should the government go in and announce up front the maximum amount it is willing to pay instead of trying to pay the least amount? Wouldn't that be a fundamentally stupid bargaining tactic? But to avoid that, it would need to keep secrets from the pirates. Well, not just pirates, it would need to keep it secret from the media, since there are many media outlets that would gladly publish it, and force the US to pay $20,000,000 instead of $1,000,000. So, do you think the US should keep the maximum bid a secret and serve American interests, or announce it and serve pirate interests by undermining the government's own negotiating position?

      Let's say negotiations with the pirates are going badly, they heard in media that the government is willing to pay $20,000,000 but they got greedy and now think they can get $50,000,000. The US Government isn't willing to pay that much, decides to use a commando raid to rescue the hostages while stalling in negotiations. Military actions are generally at least twice as effective over short periods of time when the attacking force attains surprise. Even if the pirates think it is possible, they don't really know if, when, how, who, or where they will come from. Should the US Government announce to the pirates that it has given up negotiations, and that it is going to use military force to free its citizens? If not, that would mean keeping a secret from the pirates - do you oppose that? Of course, it will also have to keep the rescue plan secret from the media as well or it will be published, the pirates will see it, and will be prepared to defeat it. Should the government tell the next of kin that it is going to try a military rescue? They might tell the media, or their kin being held by the pirates, and either the media or the prisoners might tell the pirates. So, it looks like we can't tell the pirates, the media, or the next of kin. What about other people in the United States? Same problem.

      As part of the planning for the rescue mission, it appears that it would be really helpful to refuel some aircraft in a country near where the pirates are holding the American captives. This third country has a government that is friendly to the United States, but much of the population is hostile as they are being influenced by religious extremists from outside their country. The government of this third country agrees to the refueling operation at one of their island military bases, but demands that it be kept secret to avoid agitating their citizens. Since it helps the mission of recovering Americans held hostage, shouldn't the US make use of the island for refueling? What about the request to keep it secret? Should the US stir up problems in the country by making it known, despite the request of the government? If the use of the island is revealed, it could hurt diplomatic relations, and perhaps even generate civil unrest, getting people killed. Shouldn't this be kept a secret? From the pirates? From the media?

      During the flight to the pirate locations, and on the ground, US forces will be using radios for command and control, and various flight operations. Should the US inform the pirates about the radio frequencies it uses? What about the media, who might listen in? Suppose a

      --
      much of left-wing thought is a kind of playing with fire by people who don't even know that fire is hot - George Orwell
  4. It's called a narrative by DigiShaman · · Score: 2, Interesting

    The fact that there's a media narrative is hardly news. The purpose is to provide ratings. Anything that will lead to scandal, corruption, or supporting national politics is the name of the game. Fox does this to support Republicans, all the others support the Democrats. I suppose this is news to those that don't already know this however. And this "taking sides" of the national media is nothing new at all. Very old hat in American history.

    Ask any budding journalist why they want to be in this industry. Sometimes you will hear a common theme of "To change the world for a better place". Generally that implies a motive with bias. No, their job is to REPORT the news in its purest form. I'll tell ya, that can both end wars and create them. But oh no, we can't have that now can we? They should report the good, the bad, and the ugly with impartiality. The BBC is as close as it comes to doing that. Perhaps I'm giving them too much credit however.

    --
    Life is not for the lazy.
    1. Re:It's called a narrative by cold+fjord · · Score: 2

      The BBC is as close as it comes to doing that. Perhaps I'm giving them too much credit however.

      Although it is a venerable institution, the BBC has struggled with bias over the years.

      BBC had "massive bias to left:" director general

      The director general of the BBC admitted Thursday that his organisation had been guilty of a "massive bias to the left" but said "a completely different generation" of journalists now works at the broadcaster.
      Mark Thompson told the right-of-centre Spectator magazine that there was an institutional bias when he joined the organisation, reinforcing the findings of a 2007 internal report which concluded that greater efforts were required to avoid liberal bias.

      "In the BBC I joined 30 years ago, there was, in much of current affairs, in terms of people's personal politics, which were quite vocal, a massive bias to the left," Thompson said.

      --
      much of left-wing thought is a kind of playing with fire by people who don't even know that fire is hot - George Orwell
  5. Re:Not Newsworthy by Anonymous Coward · · Score: 3, Insightful

    I think you miss the point: the fact that it was used in a journalistic context most certainly *is* newsworthy. The AP guy went to great lengths to stress evidence-based reporting and uncovering associations, rather than presupposing those things and back-fitting the data.

    Data mining - like stats - allows bias to creep in quite readily, and once a study, a number, a story is out there, it's very difficult to pull it back, even when it's demonstrably wrong, biased or fabricated.

  6. Gephi by Psychotria · · Score: 2

    The visualisations look like they were generated using Gephi. Interesting use. I wonder if the search for "search terms" was initially refined by graphing the raw data and continuing from there.
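If you wanted to test that hunch yourself, a document-similarity graph can be handed to Gephi as a plain CSV edge table. This sketch assumes Gephi's spreadsheet importer, which recognizes `Source`, `Target`, and `Weight` column headers for edges; the function name, report IDs, and weights below are made up for illustration.

```python
import csv
import io


def gephi_edge_csv(edges):
    """Serialize (source, target, weight) triples as a CSV edge table
    with Source/Target/Weight headers, weights rounded to 3 decimals."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["Source", "Target", "Weight"])
    for source, target, weight in edges:
        writer.writerow([source, target, f"{weight:.3f}"])
    return buf.getvalue()


# Hypothetical similarity edges between hypothetical report IDs.
edges = [("report-0", "report-1", 0.642), ("report-2", "report-3", 0.317)]
print(gephi_edge_csv(edges), end="")
```

Writing to a string keeps the sketch testable; in practice you would open a file with `newline=""` and pass that to `csv.writer` instead.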

  7. Re:But this isn't reasoned analysis by MimeticLie · · Score: 4, Interesting

    Actually, if you watch the video, that's not what Stray is talking about. Rather than doing targeted searches, he's talking about processing the whole dataset and using algorithms to establish connections. The narrative that makes sense of those clusters is what would (hopefully) be the reasoned analysis.

  8. A microcosm of reporting as a whole by Archtech · · Score: 4, Insightful

    Mark Twain summed up the central problem of journalism with his epigram, "Get your facts first... then you can distort 'em as much as you please". But, amusing as it is, this completely misses the point! In the very process of "getting your facts" you have the opportunity - indeed, the obligation - of selecting them from among the infinite number of facts that you could choose. Having selected the facts that you think are most important, there is no longer the slightest need to distort them. The work is already done.

    Suppose you are the New York Times, and you are reporting on events in Afghanistan. You have a certain amount of space, so do you write up the IED explosion which killed a couple of NATO soldiers and put a few more in hospital, or do you describe the NATO helicopter raid that killed a dozen villagers and wounded another few dozen? Well, your readers are far more interested in the fate of NATO people (especially if they are from the USA); moreover, they don't particularly want to read about how their glorious forces have accidentally (or otherwise) killed a lot of civilians. So it's a no-brainer: you write up the IED event. After a few years of such a policy, consistently followed, readers get the idea that all that happens in Afghanistan is that NATO soldiers occasionally get blown up. Yes, the NYT has accurately reported the facts. It hasn't reported all of them, but its editors could argue that such an attempt would be physically impossible. The only practical way of getting a more balanced impression would be to read, as well as the NYT, a newspaper that takes an anti-NATO, pro-Afghan point of view. But no such newspaper can survive commercially in the US market, because it wouldn't sell enough copies (even if it were allowed to go on operating for long).

    Indeed, the Wikileaks documents currently under discussion are subject to such a filtering effect too. Remember, all those documents were written by American officials, for US government consumption. You won't find many mentions in there of atrocities by our forces - even if the US authorities in Afghanistan or Washington were aware of such atrocities, they wouldn't put them into messages with such a low level of security. What you can expect to find is a fairly high level of unguarded opinions - either honest or carefully angled to make a particular desired impression.

    --
    I am sure that there are many other solipsists out there.