Data Mining Rescues Investigative Journalism
John Mecklin sends in word of initiatives through which the digital revolution that has been undermining in-depth reportage may be ready to give something back, through a new academic and professional discipline known as "computational journalism." "James Hamilton, director of the DeWitt Wallace Center for Media and Democracy at Duke University, is in the process of filling an endowed chair with a professor who will develop sophisticated computing tools that enhance the capabilities — and, perhaps more important in this economic climate, the efficiency — of journalists and other citizens who are trying to hold public officials and institutions accountable. The goal: Computer algorithms that can sort through the huge amounts of databased information available on the Internet, providing public-interest reporters with sets of potential story leads they otherwise might never have found. Or, in short, data mining in the public interest."
What does it mean when a Slashdot article is red?
It doesn't matter how efficient journalistic gum-shoeing becomes, because the end product will still be subject to a certain amount of spin by the publisher.
So does this mean reporters might stop pulling statistics out of their asses once they have a tool that provides reliable statistics with a minimum of effort?
But as it is, we can't get local news media to perform their "watchdog" role in most cases. I can't even begin to count the number of times when I've seen a case that looked suspicious as hell based on the reporting of it, but the local media just parroted the police/prosecutor's story and moved on. Alternatively, when they do get involved, it's often in cases like the Jena 6 where you end up finding out that the media was spreading disinformation and building up a narrative to make more profit.
Most news media have become a combination of an AP outlet and a source of editorials and classifieds. They're like a primitive RSS feed with some mashed up content thrown in there for local flair.
Investigative Journalism Rescues Data Mining
-- Local businesses that bought 100 worth of ad space and whose owners are not the publisher's buddies
SELECT *
FROM advertising_revenue_table AS a
JOIN list_of_local_business_table AS b
  ON a.business_name = b.business_name
WHERE a.cost_of_ad_space_purchased = 100
  AND b.owners NOT IN (SELECT names FROM list_of_publishers_buddies)
ORDER BY a.cost_of_ad_space_purchased ASC;
"Science flies us to the moon. Religion flies us into buildings." - Victor Stenger
The digital revolution didn't do in journalism. That was Watergate. After that, and the Left's orgasm over the idea of reporters taking down presidents, propagandists are now all we have. Remember the 'fight' over which reporter would fly with Obama to Iraq, while no one was fighting to go with McCain all those times he went?
Ask them: "Why be a journalist?"
"To make a difference." is the reply.
By definition, journalists don't "make a difference", they tell a story. Propagandists "make a difference". Just ask Himmler.
It's gotten so bad that, despite all the channels, and all the money-losing newsrooms on cable/satellite TV, the stories all use the same words. It's because the left owns almost all of them.
Some might say this consensus makes them right, but it really doesn't. How many times is Fox News chided because they don't agree? Who's programmed, the TV, or us?
What they leave OUT of a story is just as important as what gets IN.
Until just the other day, Charlie Rose and (I think it was) Dan Rather were discussing Obama. "We don't know anything about him- who are his heroes?"
Meanwhile so much was known about "Joe the plumber" that he could barely get work in his town.
Meanwhile they sent 30+ reporters to scam information in Alaska about Palin, making up things when nothing was available.
But no...two years of investigation on Obama turned up nothing. Not a word on broadcast TV about Bill Ayers (an unrepentant bomber of the Pentagon and murderer who got free on a technicality). Not a word about Obama's heroes like Saul Alinsky (sp?) who is so far Left he bumps elbows with Stalin.
These people are not in the periphery; these are people with whom he's tightly tied. But that doesn't matter any more, he's elected. Just remember you asked for it. He'll make history, alright.
But now I suppose, we expect reporters to dig through computer data, and the digital revolution might do something for the industry. Well after being the top radio show host for two decades, they still think Limbaugh is racist. (Not hard to disprove) or fat (that was a decade ago). Yeah, those reporters are really hard working investigators. All they need do is *listen* to the show, and they won't do that.
Journalism suffers from the same thing science does: loss of integrity. "Show me the money". And "vote for my guy". Truth no longer matters to these people, though it should to you.
This 'digital revolution' will do nothing but help THEIR causes, not truth.
A lot of (most?) TV and print media have well-publicised portals for eyewitnesses to call in, or send their photos. It's certainly cheaper than having to employ one of your own (or, god forbid, having to pay out for agency or newswire product) to get the pictures for the evening news. Plus, of course, eyewitnesses give the impression of "real people" - so it's got to be genuine, hasn't it?
In fact, this sounds more like another step in dumbing down our media. Cutting costs and corners. If the quality of the service drops, well it doesn't matter since they're all doing it, so things keep in balance.
politicians are like babies' nappies: they should both be changed regularly and for the same reasons
The journalists groom their sources and need to stay in those sources' good books to keep up access. Play ball and you get embedded with a patrol so you can send back gripping combat footage. Piss off the brass and you get embedded with the guys washing trucks at the transport park.
It is no wonder that editors and TV execs are quick to fire and distance themselves from any journalists that forget this and start snooping too deeply. Just look at http://en.wikipedia.org/wiki/Peter_Arnett
Engineering is the art of compromise.
It's not what journalists don't know. It's what they don't report.
And basically people just don't care. Have we decided who to blame for the economic collapse yet? But bathroom foot tapping, wow, that's the shit we have to get to the bottom of.
Oh I'm sure some OSS will come out to put an anti-spin to their spin using some of the same data mining they're using.
Shai Schticks:"You don't make peace with friends, you make peace with enemies"
"Journalism is not about reporting the truth, it is about contributing to and competing in an advertising and entertainment industry."
By that definition, bloggers will never be journalists.
Shai Schticks:"You don't make peace with friends, you make peace with enemies"
> It's not what journalists don't know. It's what they don't report.
> And basically people just don't care. Have we decided who to blame for the economy collapse yet?
We're working on it.
Shai Schticks:"You don't make peace with friends, you make peace with enemies"
As someone who does investigative journalism for a living, data mining won't get you squat. Having done it for a living for 5+ years, and being very familiar with data mining, the two so rarely cross paths that it rounds to zero.
Why? Because if it is in minable form, it doesn't take any digging to find. If you can run a google search and get even a tidbit about what you need, you don't need investigative journalism.
Of the stories I have gotten, little ones like the P4 going 64 bits, it never reaching 4GHz, Dell exploding laptops (an assist on that one), and more recently the Nvidia bump cracking problem(s), none of that would have been possible through data mining.
If it is out there, it doesn't need an investigative journalist. If it isn't, then data mining won't help. The end.
-Charlie
This is basically content analysis, invented by British linguistics professors during World War II at the request of the secret service, in order to find out whether the newly invented and deployed RADAR was effective against German submarines.
Since then all secret services have been using it as an analytical research method.
The Cline Center for Democracy at UIUC has been running a data mining project, scanning archives and contents of newspapers around the world for reports of political disturbances such as riots, etc. The project, a collaboration between the center and the UIUC CS department, is meant to facilitate research on domestic stability and the like. Currently it's focused primarily on English papers, but efficiency and completeness will dictate searches in other languages sooner or later.
Information can be suppressed or 'spun', but at least this will ensure that the data's available for such evaluations instead of paying some graduate student peanuts for years and years to put it together.
Of course it does mean that I'm sort of out of a job...
To me a journalist is someone who provides the raw data. In the "Web 2.0" world (pardon the buzzword), anybody can do the data mining and editorializing, and it's great to be able to read different interpretations of the same data by different people.
This is what happens in the sabermetrics world (i.e. baseball stats analysis). Some source provides the raw data, but people merrily discuss and disagree on its meaning on various blog sites. There is none of this confusing mix of data and biased interpretation that you get in most news reporting nowadays.
If a blog is commercially successful, it will be an incentive to the blogger to dig out more raw data, or rather get a journalist to find him some, as it's not necessarily the same skill.
"In our tactical decisions, we are operating contrary to our strategic interest."
Why bother? "Journalists" already have access to Facebook and MySpace and they can even hit Wikipedia for a quote now and again. What more do they need to write a sensationalist op-ed?
They don't even bother harassing family members for photos anymore - they rip them straight from Facebook. All the pics, family links, likes and dislikes...
Bobby Young (pictured left) died tragically yesterday when...blah..blah..blah. The 8 year old university student, a deeply religious man and devout Jedi, was said to be in a complicated open relationship with his best friend David, and will be sadly missed by fiancee Kelly, and friends Garry, Fords4EVA and KnowWhenToHoldem...blah blah blah... Bobby leaves behind two daughters, Maddy, 4, and Kera,6, (pictured here holidaying in Hawaii in 2006)...
And there is already some reporting by bloggers which is not reported by reporters.
Most of the intelligent /. contributors have families to be with and work to prepare for. The weekend always brings out the crazies howling at the moon and the moderators who consider it music.
The question is more along the lines of "what is a journalist".
Right now, it seems that a transcription machine meets the criteria. The current "journalists" simply do not ask (and follow up on) meaningful questions. They ask crap questions and focus on non-issues. And then they accept non-answers to those questions.
I'd be very surprised if the majority (51%+) of "political" "journalists" could even name their own Congress Critters.
And tech "journalism" is even worse.
About the only fields where they get it right are "sports" and "fashion".
From the context, it sounds like you are phrasing that as a negative.
So, make a statement that can be tested as to what, specifically, you believe he will do.
Otherwise you're the same as the people you denigrate.
Truth is a difficult thing. I'll stick to facts. They're easier to validate.
and has been used in at least some news organizations, because more than ten years ago I wrote a data mining program for Crain Communications (publisher of "Crain's New York Business," "Advertising Age," "Pensions & Investments," and "Crain's Chicago Business"). They used it to identify trends, which is a crude use of data mining but something used to fill space nonetheless.
So I don't think data mining per se will help citizens and bloggers do more investigative journalism; the increasing availability of information in electronic form will. With the many channels available for publishing information online nowadays, the only thing citizen journalists need to break stories is the ability to write and the will to ask questions. That gives them a huge advantage over traditional media, because for-profit news has a financial incentive to NOT have the will to ask questions.
Do what you can, with what you have, where you are.
I WIN!
--
A Jew
If you're in the world of investigative journalism I'd encourage you to take a look at a new class of semantic data generation tools. New capabilities like Calais (www.opencalais.com) from Thomson Reuters allow you to ingest unstructured text (news articles, press releases, FOIA documents, whatever) and automatically extract semantic metadata like people, companies, management changes, natural disasters and hundreds of others. You can take the output of these tools and load them directly into databases to query. You could take news stories and build a social network of family relationships then play news events against the network. We're already seeing some initial uses in the area of investigative journalism and would love to see more. Jump in and give it a try.
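To make the "load the output into a database and query it" step concrete, here is a minimal sketch. It assumes the extraction has already happened: the `EXTRACTED` rows, the `entity` table, and the names in it are hypothetical stand-ins written by hand, not the actual output format or API of Calais or any other tool.

```python
import sqlite3

# Hypothetical extractor output: (article_id, entity_type, entity_name).
# Hand-written stand-in data; NOT the real response format of any service.
EXTRACTED = [
    (1, "Person", "Jane Doe"),
    (1, "Company", "Acme Paving"),
    (2, "Person", "Jane Doe"),
    (2, "Person", "John Roe"),
    (2, "Company", "Acme Paving"),
    (3, "Person", "John Roe"),
    (3, "Company", "Bolt Freight"),
]


def build_db(rows):
    """Load extracted entities into an in-memory SQLite table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE entity (article_id INTEGER, kind TEXT, name TEXT)")
    conn.executemany("INSERT INTO entity VALUES (?, ?, ?)", rows)
    return conn


def people_linked_to(conn, company):
    """Count, per person, the articles that mention them alongside the company."""
    sql = """
        SELECT p.name, COUNT(DISTINCT p.article_id) AS articles
        FROM entity AS p
        JOIN entity AS c ON c.article_id = p.article_id
        WHERE p.kind = 'Person' AND c.kind = 'Company' AND c.name = ?
        GROUP BY p.name
        ORDER BY articles DESC, p.name ASC
    """
    return conn.execute(sql, (company,)).fetchall()


conn = build_db(EXTRACTED)
print(people_linked_to(conn, "Acme Paving"))
```

Co-occurrence in the same article is only a lead, not a finding: it tells a reporter whom to call, nothing more.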
Sensationalism and hype dominate the media, the direct result being that I don't listen to them anymore unless I'm looking for something to laugh at.
> a new academic and professional discipline
> known as "computational journalism."
Differing only in complexity but not principle from the same sort of search engine journalism that's resulted in decline of both accountability and accuracy of news over the past decade. Perhaps some investigative journalism into the lack of actual investigation into investigation is in order. "Hits" != veracity.
"I may be synthetic, but I'm not stupid." -- Bishop 341-B
Nice to see an interest in computer assisted reporting (CAR), although I'm a little baffled at the article linked calling this an "emerging" practice. I've been at this for about a decade, and there were plenty here when I showed up.
A few observations:
1) Regarding other commenters: anyone who talks about "journalism" as if the field were one homogeneous, cohesive group is maybe not thinking too deeply about media. Kind of like how "Americans" or "humans" covers a lot of folks.
2) All journalism is data mining, in service of storytelling. Talking to people who know things is data mining. Googling is data mining. Crunching public databases, or building your own -- same stuff, but (sometimes) faster and more rigorous.
3) Minus buzzwords, this is just tech-savvy reporters trying to pull out interesting facts cheaper, faster, and more accurately than the next girl. Given that much investigative reporting happens in non-traditional media these days, a lot of this is coming from nonprofits.
4) Some projects to check out if you're interested:
Center for Public Integrity (http://www.publicintegrity.org) - disclosure, I worked there back in the day
Sunlight Foundation: http://www.sunlightfoundation.com/
Two very smart people in the field - Aron Pilhofer ( http://www.oldmedianewtricks.com/old-media-interview-aron-pilhofer-interactive-guru-editor-at-the-new-york-times/ )
and Derek Willis ( http://www.thescoop.org/ ), currently both at the New York Times.
You can also check out the National Institute for Computer-Assisted Reporting (NICAR).
Closing thought: you can make a smarter database, search algorithm, etc, but ultimately it comes down to a reporter who can interpret the information available to her, understand what stories matter and present that to the public in a form that is interesting and accurate. Technology is a helpful tool, but that's still a very human enterprise.
> It doesn't matter how efficient journalistic gum-shoeing becomes, because the end product will still be subject to a certain amount of spin by the publisher.
The only thing that will save "journalistic integrity" is the journalism field adhering to openly stated ethical principles and practices. No amount of technology is going to fix that problem.
Life is hard, and the world is cruel
You may as well rename this, "Crackpottery goes mainstream". Instead of calling a few people, doing a couple of interviews, and writing up their impressions as a story, journalists will now have automation to help them do what nuts do. Just as so-called UFO, alien and JFK assassination researchers do manually, journalists will be able to arrange players, dates and events to fit any tale imaginable. Government, UN, corporate, environmental conspiracy stories will abound, and the sky is the limit.
This is my sig.
I think we on the right need to stop crying about the "left wing" media, when, we now have our own media outlets too. We dominate radio, we have a good and growing presence on TV, and our print is expanding while theirs is shrinking.
The fact is, we lost this election because the Republican Party has tried to fuse libertarian economic policies with social conservatism and that plan could not work at a time when libertarian economics is in considerable doubt. The conventional wisdom is that Republicans should focus on free trade and low taxes and drop the religious stuff, but that's political suicide given that the religious stuff has kept Bush in the oval office despite two wars and the economy in the tank. Tossing out the entire south and the midwest to placate a few libertarians arguing the proliferation of Walmarts, Toyotas and plenty of imported Chinese stuff is a good thing for America seems hardly a winning message.
If the GOP wants to win, it needs to get more country, more parochial, wrap itself in the flag even more, ditch the free trade, and be as much in favor of Made-In-The-USA on economic issues as it is on social issues.
1) Be nice to unions
2) Tout American stuff
3) Support the bailout of Detroit
Right there, with that, the GOP picks up PA and OH and probably even Michigan, and that wins the Oval Office. Support the repeal of the Voting Rights Act, stay anti-abortion and pro-gun, and that locks up the south.
Instead, we had John McCain running around arguing about saving America and putting country first, while at the same time defending free trade. Charges of isolationism are silly. Remember, America re-elected George W Bush in 2004 because he told the United Nations to pound sand, not in spite of it. Remember, in America, when Jesus wants to go hunting, he does so in his Chevy.
This is my sig.
If you're standing next to Bush, all the media (and even Berlusconi and Sarkozy) will look leftist. (http://www.politicalcompass.org/analysis2)
> Just ask Himmler.
Godwin's Law... no one else was going to say it.
Journalists will continue to use the NULL search technique.
To support your claim of <insert cause here>, do a search that REQUIRES a lot of words that are descriptive of your opposition. Then, when NO RESULTS are found, you can write that no one opposes your claim of <insert cause here>.
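A toy illustration of why the trick works, assuming a plain AND search over a handful of made-up headlines (`search_all` and `DOCS` are hypothetical, not any real search engine's behavior): every extra required term can only shrink the result set, so piling them on drives the hit count to zero whether or not opposition exists.

```python
def search_all(docs, required):
    """Return the docs containing every required term (case-insensitive AND search)."""
    terms = [t.lower() for t in required]
    return [d for d in docs if all(t in d.lower() for t in terms)]


# Made-up headlines; two of the three clearly describe opposition.
DOCS = [
    "Residents oppose the new stadium tax",
    "Council critics call the stadium plan wasteful",
    "Stadium opening draws record crowd",
]

# One term: every headline about the stadium matches.
broad = search_all(DOCS, ["stadium"])

# Many required terms: no single headline uses all of them, so nothing matches.
loaded = search_all(DOCS, ["stadium", "oppose", "critics", "wasteful", "boondoggle"])

print(len(broad), len(loaded))
```

An empty result here says something about the query, not about the world.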
- I live the greatest adventure anyone could possibly desire. - Tosk the Hunted
Computational Journalism is much broader than just data-mining. At Georgia Tech I taught courses in the area in 2007 and 2008 which covered everything from mobile newsgathering, to information visualization, automatic content analysis, social computing, storytelling and authorship, aggregation, summarization, information mashups, and consumption interfaces. The bigger question is: how can computation help in every aspect of journalism: gathering, sensemaking, authoring, and dissemination, while still maintaining the values and ethics of good journalism. Anyone interested in delving deeper into this should watch the videos from the Symposium on Computation and Journalism that we organized at Georgia Tech in 2008.
Talking about computing & journalism is a tough conversation to have without specific examples. As with non-journalism software, results depend on the problem that's being solved. Good examples of state-of-the-art Computational Journalism are USAToday.com's airport capacity monitor and NYTimes.com's Represent ... they're automated apps that aren't so much data mining as they are coherent presentations of useful, interesting data in flux, a distinction that the source article author failed to grasp.
McCain was everything that you said the GOP needed and he got destroyed
I was wrong about McCain, but a look at the demographics of this election is illuminating. Free trade cost McCain dearly. Every state that McCain lost is a state that has lost big in free trade, and that includes Virginia and North Carolina. The conventional wisdom is that values don't matter and Republicans should stick to their economic guns, but it's just political suicide.
This is my sig.