Slashdot Mirror


Data Mining Rescues Investigative Journalism

John Mecklin sends in word of initiatives through which the digital revolution that has been undermining in-depth reportage may be ready to give something back, through a new academic and professional discipline known as "computational journalism." "James Hamilton, director of the DeWitt Wallace Center for Media and Democracy at Duke University, is in the process of filling an endowed chair with a professor who will develop sophisticated computing tools that enhance the capabilities — and, perhaps more important in this economic climate, the efficiency — of journalists and other citizens who are trying to hold public officials and institutions accountable. The goal: Computer algorithms that can sort through the huge amounts of databased information available on the Internet, providing public-interest reporters with sets of potential story leads they otherwise might never have found. Or, in short, data mining in the public interest."

20 of 91 comments

  1. Tools won't help with journalistic integrity... by Lumenary7204 · · Score: 5, Insightful

    It doesn't matter how efficient journalistic gum-shoeing becomes, because the end product will still be subject to a certain amount of spin by the publisher.

  2. so does this mean.. by spiffmastercow · · Score: 3, Insightful

    so does this mean maybe reporters will stop pulling statistics out of their asses once they have a tool to provide reliable statistics with a minimum of effort?

    1. Re:so does this mean.. by mac1235 · · Score: 5, Insightful

      No, most reporters will continue to copy PR releases into articles.

    2. Re:so does this mean.. by Lumenary7204 · · Score: 2, Insightful

      No, it just means they will shove the statistics with which they don't agree back up their asses where the sun don't shine.

      Out of sight, out of mind...

  3. That's all well and good by MikeRT · · Score: 5, Insightful

    But as it is, we can't get local news media to perform their "watchdog" role in most cases. I can't even begin to count the number of times when I've seen a case that looked suspicious as hell based on the reporting of it, but the local media just parroted the police/prosecutor's story and moved on. Alternatively, when they do get involved, it's often in cases like the Jena 6 where you end up finding out that the media was spreading disinformation and building up a narrative to make more profit.

    Most news media have become a combination of an AP outlet and a source of editorials and classifieds. They're like a primitive RSS feed with some mashed up content thrown in there for local flair.

    1. Re:That's all well and good by smittyoneeach · · Score: 2, Insightful

      More to the point, I want to know how you preclude all these shiny-miney algorithms from being tweaked with misinformation.
      Sure, the really gross stuff is going to get dumped, but the real Machiavellis will engage in propaganda oh so subtly...

      --
      Get thee glass eyes, and, like a scurvy politician, seem to see things thou dost not.--King Lear
  4. In other news... by djupedal · · Score: 3, Insightful

    Investigative Journalism Rescues Data Mining

  5. Re:Dont get it by JustOK · · Score: 3, Funny

    Red means stop and read it, green means go and read it.

    --
    rewriting history since 2109
  6. sample top sekret mySQL code from the project by vlm · · Score: 4, Funny

    SELECT *
    FROM advertising_revenue_table, list_of_local_business_table
    WHERE advertising_revenue_table.business_name = list_of_local_business_table.business_name
      AND advertising_revenue_table.cost_of_ad_space_purchased = 100
      AND list_of_local_business_table.owners NOT IN (SELECT names FROM list_of_publishers_buddies)
    ORDER BY cost_of_ad_space_purchased ASC

    --
    "Science flies us to the moon. Religion flies us into buildings." - Victor Stenger
  7. Don't re-write history... by Anonymous Coward · · Score: 3, Insightful

    The digital revolution didn't do in journalism. That was Watergate. After that, and the Left's orgasm over the idea of reporters taking down presidents, propagandists are now all we have. Remember the 'fight' over which reporter would fly with Obama to Iraq, while no one was fighting to go with McCain all those times he went.

    Ask them: "Why be a journalist?"
                        "To make a difference." is the reply.

    By definition, journalists don't "make a difference", they tell a story. Propagandists "make a difference". Just ask Himmler.

    It's gotten so bad that, despite all the channels, and all the money-losing newsrooms on cable/satellite TV, the stories all use the same words. It's because the left owns almost all of them.

    Some might say this consensus makes them right, but it really doesn't. How many times is Fox News chided because they don't agree? Who's programmed, the TV, or us?

    What they leave OUT of a story is just as important as what gets IN.

    Until just the other day, Charlie Rose and (I think it was) Dan Rather were discussing Obama. "We don't know anything about him- who are his heroes?"

    Meanwhile so much was known about "Joe the plumber" that he could barely get work in his town.

    Meanwhile they sent 30+ reporters to scam information in Alaska about Palin, making up things when nothing was available.

    But no...two years of investigation on Obama turned up nothing. Not a word on broadcast TV about Bill Ayers (an unrepentant bomber of the Pentagon and murderer who got free on a technicality). Not a word about Obama's heroes like Saul Alinsky (sp?) who is so far Left he bumps elbows with Stalin.

    These people are not in the periphery; these are people with whom he's tightly tied. But that doesn't matter any more, he's elected. Just remember you asked for it. He'll make history, alright.

    But now, I suppose, we expect reporters to dig through computer data, and the digital revolution might do something for the industry. Well, after two decades of Limbaugh being the top radio show host, they still think he is racist (not hard to disprove) or fat (that was a decade ago). Yeah, those reporters are really hard-working investigators. All they need to do is *listen* to the show, and they won't do that.

    Journalism suffers from the same thing science does: loss of integrity. "Show me the money". And "vote for my guy". Truth no longer matters to these people, though it should to you.

    This 'digital revolution' will do nothing but help THEIR causes, not truth.

  8. Journalistic freedom is only theoretical by EmbeddedJanitor · · Score: 4, Insightful

    Journalism is not about reporting the truth; it is about contributing to and competing in an advertising and entertainment industry. Depth is not important; quickly generating good TV and print images to attract eyeballs, and thus newspaper/advertising sales, is everything. Getting access to the information and sources is an absolute must.

    The journalists groom their sources and need to stay in those sources' good books to keep up access. Play ball and you get embedded with a patrol so you can send back gripping combat footage. Piss off the brass and you get embedded with the guys washing trucks at the transport park.

    It is no wonder that editors and TV execs are quick to fire and distance themselves from any journalists that forget this and start snooping too deeply. Just look at http://en.wikipedia.org/wiki/Peter_Arnett

    --
    Engineering is the art of compromise.
    1. Re:Journalistic freedom is only theoretical by lysergic.acid · · Score: 2, Insightful

      to be fair, what you're describing is the media industry, not journalism itself. journalism is a trade/discipline that serves a crucial role in a free & democratic society. that it has been bastardized and corrupted by commercial interests does not preclude the existence of true journalism which is based on professional integrity and a civic duty to keep the public informed.

      what i'm confused about is why the poster accuses the "digital revolution" of undermining in-depth reportage. there's a huge difference between undermining the profit margins of mainstream media news outlets and undermining the quality of journalism. if anything, the "digital revolution" has only fueled investigative journalism by breaking the monopoly previously held by mainstream news outlets.

      the web has given independent journalists an easy means of reaching a global audience, and it has also given the public an easy means of sampling a much wider variety of diverse news sources. this means that any inherent biases (and there will always be some bias) a particular news outlet demonstrates can be more easily identified and compensated for by the reader.

      and unlike the past where errors in reporting were rarely corrected or even acknowledged beyond a minor footnote buried in the back of the paper, the blogosphere ensures that any misreported information is quickly identified and that corrections are quickly propagated through the web. there have always been millions of eyes reading the news, but now those millions of eyes can easily do their own online research & fact checking and call journalists out when they report incorrect information.

  9. Subject by z-j-y · · Score: 4, Insightful

    It's not what journalists don't know. It's what they don't report.

    And basically people just don't care. Have we decided who to blame for the economic collapse yet? But bathroom foot tapping, wow, that's the shit we have to get to the bottom of.

  10. Oh bull by Groo Wanderer · · Score: 4, Interesting

    As someone who does investigative journalism for a living, data mining won't get you squat. Having done it for a living for 5+ years, and being very familiar with data mining, the two so rarely cross paths that it rounds to zero.

    Why? Because if it is in minable form, it doesn't take any digging to find. If you can run a google search and get even a tidbit about what you need, you don't need investigative journalism.

    Of the stories I have gotten, little ones like the P4 going 64 bits, it never reaching 4GHz, Dell exploding laptops (an assist on that one), and more recently the Nvidia bump cracking problem(s), none of that would have been possible through data mining.

    If it is out there, it doesn't need an investigative journalist. If it isn't, then data mining won't help. The end.

                    -Charlie

    1. Re:Oh bull by binpajama · · Score: 2, Interesting

      I'm a grad student and have recently been asked to help out on a research grant proposal for the very same thing. I agree with the point made in the parent post - if it's already out there, there's not much investigation needed. Additionally:

      1) How will algorithms figure out if a story is relevant? There's no deus ex machina here. It will see if the article has the relevant buzzwords and if it has been released by a reputable source.

      2) The buzzword factor kills the algorithm's chances of finding something really new. It's just going to find something that is `current'. Thus, it's doing news aggregation, not investigative journalism.

      3) The `reputable' source issue will be decided by looking at factors like source authority (measured by incoming links etc) which means that the algorithm will be scraping sites that are already highly visible. Again, this is simply `Google News' by another name. I cannot think of a way by which algorithms can look into nooks and crannies of the internet by being agnostic about source reputation. If they tried, they would quickly start coming up with 9/11 conspiracy theories and other balderdash as news reports.

      Basically, data mining is going the way of fuzzy logic. It has reached saturation in terms of its utility and applications, and now people are trying to sell all kinds of possibilities to allow for the overshoot in academia (too many PhDs, too little to do).
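
      As a rough illustration of points 1) to 3) above, the kind of ranking being described could be sketched in Python as below. Everything in it is an assumption made for the example: the buzzword list, the sample headlines, the inbound-link counts, and the choice to multiply buzzword hits by log-scaled authority. It is not taken from any actual proposal or system.

```python
import math

# Hypothetical buzzword list; not from the thread or any real system.
BUZZWORDS = {"bailout", "subprime", "election"}


def relevance(text, inbound_links):
    """Score a story as (buzzword hits) x (log-scaled source authority)."""
    words = {w.strip(".,;:!?\"'").lower() for w in text.split()}
    hits = len(words & BUZZWORDS)
    authority = math.log1p(inbound_links)  # stand-in for link-based reputation
    return hits * authority


# Made-up (headline, inbound link count) pairs for illustration only.
articles = [
    ("Subprime bailout dominates election coverage", 50000),
    ("County clerk quietly reroutes road funds", 12),
]

# Highest score first: the already-visible story wins.
ranked = sorted(articles, key=lambda a: relevance(*a), reverse=True)
for headline, links in ranked:
    print(f"{relevance(headline, links):6.2f}  {headline}")
```

      Under these assumptions the obscure local story scores zero no matter how newsworthy it is, which is the aggregation-versus-investigation problem in miniature.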

  11. Just another use by emilienne · · Score: 2, Interesting

    The Cline Center for Democracy at UIUC has been running a data mining project, scanning archives and contents of newspapers around the world for reports of political disturbances such as riots etc. The project, a collaboration between the center and the UIUC CS department, is meant to facilitate research on domestic stability and the like. Currently it's focused primarily on English-language papers, but efficiency and completeness will dictate searches in other languages sooner or later.

    Information can be suppressed or 'spun', but at least this will ensure that the data's available for such evaluations instead of paying some graduate student peanuts for years and years to put it together.

    Of course it does mean that I'm sort of out of a job...
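
    For a sense of what the simplest version of such an archive scan might look like, here is a minimal Python sketch. It is not the Cline Center's method, which the comment does not describe. The keyword pattern, the record layout, and the sample headlines are all invented for illustration.

```python
import re
from collections import Counter

# Hypothetical keyword pattern; a real project would need far more than this.
DISTURBANCE = re.compile(r"\b(riots?|protests?|strikes?|coup)\b", re.IGNORECASE)

# Made-up sample archive records, for illustration only.
archive = [
    {"date": "2001-03-02", "country": "Country A", "headline": "Riots follow disputed vote"},
    {"date": "2001-03-03", "country": "Country A", "headline": "Protest leaders arrested"},
    {"date": "2001-03-03", "country": "Country B", "headline": "Harvest festival draws crowds"},
    {"date": "2001-03-05", "country": "Country B", "headline": "Dock strike enters second week"},
]


def flag_disturbances(records):
    """Return only the records whose headline matches a disturbance keyword."""
    return [r for r in records if DISTURBANCE.search(r["headline"])]


flagged = flag_disturbances(archive)

# Count flagged reports per country, the sort of tally a researcher might start from.
per_country = Counter(r["country"] for r in flagged)
print(per_country)
```

    Plain keyword matching like this will also flag things such as a baseball "strike", so anything serious would need context-aware filtering on top.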

  12. leave the data mining to bloggers by DrEasy · · Score: 2, Insightful

    To me a journalist is someone who provides the raw data. In the "Web 2.0" world (pardon the buzzword), anybody can do the data mining and editorializing, and it's great to be able to read different interpretations of the same data by different people.

    This is what happens in the sabermetrics world (i.e. baseball stats analysis). Some source provides the raw data, but people merrily discuss and disagree on its meaning on various blog sites. There is none of this confusing mix of data and biased interpretation that you get in most news reporting nowadays.

    If a blog is commercially successful, it will be an incentive to the blogger to dig out more raw data, or rather get a journalist to find him some, as it's not necessarily the same skill.

    --
    "In our tactical decisions, we are operating contrary to our strategic interest."
  13. Thomson Reuters Calais by InsurgentGeek · · Score: 2, Interesting

    If you're in the world of investigative journalism I'd encourage you to take a look at a new class of semantic data generation tools. New capabilities like Calais (www.opencalais.com) from Thomson Reuters allow you to ingest unstructured text (news articles, press releases, FOIA documents, whatever) and automatically extract semantic metadata like people, companies, management changes, natural disasters and hundreds of others. You can take the output of these tools and load them directly into databases to query. You could take news stories and build a social network of family relationships then play news events against the network. We're already seeing some initial uses in the area of investigative journalism and would love to see more. Jump in and give it a try.
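
    To make the "load them into databases to query" step concrete, here is a small Python sketch using the standard-library sqlite3 module. It does not call Calais or reproduce its output format. The `extracted` rows are a hand-written, hypothetical stand-in for whatever an extraction tool returns, and the table and column names are made up for the example.

```python
import sqlite3

# Hypothetical stand-in for extraction output: (document id, entity type, name).
# Hand-written for illustration; not real Calais output.
extracted = [
    (1, "Person", "Jane Doe"),
    (1, "Company", "Acme Paving"),
    (2, "Person", "Jane Doe"),
    (2, "Company", "Acme Paving"),
    (2, "Person", "John Roe"),
    (3, "Person", "John Roe"),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mentions (doc_id INTEGER, kind TEXT, name TEXT)")
conn.executemany("INSERT INTO mentions VALUES (?, ?, ?)", extracted)

# Entity pairs that appear in the same document, and in how many documents.
# The a.name < b.name condition keeps each pair once and drops self-pairs.
pairs = conn.execute(
    """
    SELECT a.name, b.name, COUNT(*) AS shared_docs
    FROM mentions a
    JOIN mentions b ON a.doc_id = b.doc_id AND a.name < b.name
    GROUP BY a.name, b.name
    ORDER BY shared_docs DESC, a.name, b.name
    """
).fetchall()

for left, right, shared in pairs:
    print(f"{left} <-> {right}: {shared} shared document(s)")
```

    The co-occurrence pairs are the edges of a crude network; a reporter could then look at which names keep turning up together across otherwise unrelated documents.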

  14. Journalism Still In Decline by DynaSoar · · Score: 2, Insightful

    > a new academic and professional discipline
    > known as "computational journalism."

    Differing only in complexity but not principle from the same sort of search engine journalism that's resulted in decline of both accountability and accuracy of news over the past decade. Perhaps some investigative journalism into the lack of actual investigation into investigation is in order. "Hits" != veracity.

    --
    "I may be synthetic, but I'm not stupid." -- Bishop 341-B
  15. Crackpottery goes mainstream by tjstork · · Score: 2, Insightful

    You may as well rename this, "Crackpottery goes mainstream". Instead of calling a few people, doing a couple of interviews, and writing up their impressions as a story, journalists will now have automation to help them do what nuts do. Just like so-called UFO, alien and JFK assassination researchers do manually, journalists will be able to arrange players, dates and events to fit any tale imaginable. Government, UN, corporate, environmental conspiracy stories will abound, and the sky is the limit.

    --
    This is my sig.