The Rise of Machine-Written Journalism
Hugh Pickens writes "Peter Kirwan has an interesting article in Wired UK on the emergence of software that automates the collection, evaluation, and even reporting of news events. Thomson Reuters, the world's largest news agency, has started moving down this path, courtesy of an intriguing product with the nondescript name NewsScope, a machine-readable news service designed for financial institutions that make their money from automated, event-driven trading. The latest iteration of NewsScope 'scans and automatically extracts critical pieces of information' from US corporate press releases, eliminating the 'manual processes' that have traditionally kept so many financial journalists in gainful employment. At Northwestern University, a group of computer science and journalism students have developed a program called Stats Monkey that uses statistical data to generate news reports on baseball games. Stats Monkey identifies the players who change the course of games, alongside specific turning points in the action. The rest of the process involves on-the-fly assembly of templated 'narrative arcs' to describe the action in a format recognizable as a news story. 'No doubt Kurt Cagle, editor of XMLToday.org, was engaging in a bit of provocation when he recently suggested that an intelligent agent might win a Pulitzer Prize by 2030,' writes Kirwan. 'Of course, it won't be the software that takes home the prize: it'll be the programmers who wrote the code in the first place, something that Joseph Pulitzer could never have anticipated.'"
Oh great, we're starting ANOTHER arms race. As if SEO isn't bad enough already, now we'll have NEO.
FATMOUSE + YOU = FATMOUSE
Another "machines will take my job" story. This is as old as technology itself.
As with all other technologies, the future will be vastly different than what we envision.
If that's a demonstration I feel sorry for the people who think we'll get anywhere by 2030.
Well-written prose is far from formulaic. While financial institutions and baseball enthusiasts may happily forgo a penetrating understanding of a situation's meaning and emotions, the literate will not.
...look forward to our robojournalist overlords.
There's no -1 for "I don't get it."
A great fear of mine is that a machine will decide what I should or should not know about. Another is that a machine like this could be tampered with by any human being to make the same decision.
Big Brother SkyNet is watching you, and telling you all you need to know.
"Be prepared, son. That's my motto. Be prepared." --Joe Hallenbeck
News agencies have already been turned into commodities; they just don't realize it yet. Now the reporter is being sent down that same drain. With the news agencies set to make original reporting a 'premium' product, their market is only shrinking.
Where were the reporters when millions of jobs were outsourced by H1B's or sent overseas? At best most stories were brief, with no follow-up, and no outrage at the loss of middle class America. The same thing has happened in Europe and elsewhere as well.
Now the reporter faces the inevitable market forces that they previously ignored, and they expect anyone left to care? The programs will only get better, the markets and stories they apply to will only grow, and for the vast majority of stories the difference in quality will be imperceptible to the average person.
For a demo check out blog spam, and anything else 'internet marketing' related. A lot of that stuff is written by automated software or a guy in India.
Both look the same really...
On the Oregon Coast born and raised, On the beach is where I spent most of my days
People are going to start designing corporate press releases (or ultimately, all news if it starts going this direction) in such a way that it gets them attention, just like when people try to game Google.
"The latest iteration of NewsScope 'scans and automatically extracts critical pieces of information' from US corporate press releases"
Extracting useful info from press releases? This must be absolutely amazing software.
News flash: Robotic reports indicate that all humans have died.
Oops, sorry, that was a programming error. The robots haven't figured out verb tenses yet.
Update: Ten, nine, eight...
"This post contains words, known to the State of California to cause thought. Wash brain thoroughly after reading."
Every odd once in a while I'll be visiting some forum or news site such as this one. Then, unexpectedly, someone named "Weatherbot blah blah blah" spews off some hurricane or tornado warning for some US Region or another, with a bunch of interesting numbers to go with it. Barometric pressure, chance of precipitation, current heading, time of arrival, all that nice junk.
Now, when I look at the news today, anything political/entertainment wise is as predictable as the weather. Israel is declaring Nuclear Ambiguity? Britney Lohan got another DUI?
I wouldn't mind a concise, point form, robot-like news post.
And I, for one, Welcome our new robotic news reporting overlords.
These brainless news AI bots couldn't possibly do worse than the /. editors!
What we need next is a news story motivation analyzer program. It reads gazillions of news stories; has general models of human motivations, human loyalty groupings, etc.; has a model of situation logic, which models the likely or perceived gains and losses that different people or groups would experience depending on how situations evolve; matches that with what is being reported about the situation; and annotates the news stories or statements within them with credibility colour markings (with supporting notes).
(So don't try to patent that by the way. It's now public domain.)
Where are we going and why are we in a handbasket?
This is nothing more than extracting stats and then placing them in pre-generated sentences.
In sports, this is okay. Except when something interesting happens like someone head-butting another player.
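For anyone curious what "stats into pre-generated sentences" looks like in practice, here is a minimal sketch in Python. To be clear, the template wording, the field names, and the two-run cutoff for a "close" game are all made up for illustration; none of it is taken from Stats Monkey.

```python
# Hypothetical illustration only: templates, field names and the
# close-game threshold are invented, not taken from Stats Monkey.
TEMPLATES = {
    "blowout": "{winner} cruised past {loser} {w_score}-{l_score}, "
               "led by {star}, who {star_line}.",
    "close": "{winner} edged {loser} {w_score}-{l_score} "
             "after {star} {star_line}.",
}


def pick_arc(w_score, l_score, close_margin=2):
    """Choose a canned 'narrative arc' from the final margin."""
    return "close" if w_score - l_score <= close_margin else "blowout"


def write_recap(game):
    """Fill the chosen template with the game's stats."""
    arc = pick_arc(game["w_score"], game["l_score"])
    return TEMPLATES[arc].format(**game)


game = {
    "winner": "Springfield",
    "loser": "Shelbyville",
    "w_score": 5,
    "l_score": 4,
    "star": "Smith",
    "star_line": "doubled home two runs in the ninth",
}
print(write_recap(game))
```

Anything that is not in the stats (the head-butt, say) never makes it into `game`, so it never makes it into the story.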
Anyone want to place a bet on how long before companies are accused of "gaming" the financial reporting system with their press releases?
I say bring it on. Maybe this will be a wake-up call to journalists who have been more and more in the habit of parroting hearsay in their stories rather than bringing some real intelligence and analysis to their stories. If all they are going to be is puppets, well, I've got a Perl script for that!
Spam is where it's at. Spam is where we are going to see strong artificial intelligence emerge, both defensively and offensively. Spam already represents some of the most cutting-edge algorithms in machine learning today. Think about it. In the undefined when of the future: you will have AI that stops spam. Spam will be AI that attempts to get through your filters. The only spam your AI will let through is spam you are genuinely interested in or that befriends you: it provides something of value. At the base level however it does have its purpose: get you to buy something. This is the motivation of why machine intelligence will emerge in spam first: somebody, somewhere will be making money. Would you like to buy this new computer, it is well built and will enhance the effectiveness of your communication with your network of contacts? Also, if you do I will cover the shipping myself.
Shh.
Machine-written journalism can work because even a computer is as smart as many, if not most, journalists.
can't wait until this meme meets its end.
I don't like Linux. This doesn't make me a troll.
"[...] The latest iteration of NewsScope 'scans and automatically extracts critical pieces of information' from US corporate press releases [...]" The interesting thing about it is that it could actually raise (again) the text quality of articles (regarding grammatical correctness), since the press releases are usually carefully reviewed, and the automated part would be just a copy-and-paste process. I don't know how it goes in the US, but here in Brazil we used to have the best writing guides published by our newspapers' editors - something like "The NY Times Manual of Style and Usage". They're still published, actually, but apparently not used.
Probably due to the advent of web-based latest news, the article authors are not necessarily journalists or professional writers in any way - which means the grammar is usually bad (often really bad), with errors *way* beyond the common typos. It means the articles are not even spell-checked (typos wouldn't survive here - come on, you have spell checking on Slashdot commenting!), and there's no way to get them revised or something. I've already tried to click on those please-let-us-know-what-you-thought-about-it links, and found out that they have a binary filter: you're either praising the author or being rude/disrespectful/offensive, in which case the comment will be ignored. As an example, the last comment I made was: "Please, review your article. It's full of typos and grammar errors". Obviously, evil-flagged.
AT &F1DT0,T0800665544 - Real men, real help desk support.
We've completed the circle - various "automated systems" have been blamed for various market failures in recent years, as companies and small traders have used algorithms on computers to "keep up with the speed of the market". Of course, the actual failure was almost always in the design, such as allowing a computer to make blind decisions with large amounts of money faster than you could keep track of.
But here, we have a stronger case for a machine-driven market failure - automated news algorithms. Misunderstanding generated at the speed of the market. I've worked on AI professionally in games, studied it in the contexts of linguistics, nervous system simulation, and such - AI even in its most exaggerated modern state is not going to know how to extract a good quote even with human guidance, much less report on a news release. If you thought computer generated music was entertainingly bad - wait until you see some of the awful things produced by automated news misunderstandings... random context switches mixed with "neutral language" bits, it'll be like Fox News switched its agenda to Cthulhu-level madness of confusion rather than the usual rage agenda.
And since the market makes its decisions on the basis of news, rumors, and insider trading - and people get the three confused as they hear them, mixing this into the information stream seems a virtual guarantee of another market crash.
That's what I call another serious negative externality for the news business taking the cheaper road to reporting business news.
Ryan Fenton
I'm trying to figure it out. Is it a typo that wonderfully illustrates the benefit of welcoming automated editors? Is steakthskynet what our meatspace reporters should be called? Or is it simply an insightful tag tragically misspelled?
... what does this mean for the famous "liberal media bias"? Will these systems have a variable that can be used to "adjust" this so-called bias? If so, who gets to set it?
Maybe the horrible quality of journalism we've seen lately has been due to a prevalence of software written articles...
Then again, maybe the current crop of journalists can't write their way out of a wet paper bag, even if you give them a chainsaw.
Considering the competition, the idea of software winning the Pulitzer seems almost inevitable...
...look forward to our meme-ending overlords.
You do know that the opposite of "liberal"
i.e. "allowing responsible citizens a measure of liberty and pursuit of their own judgement about how to conduct their lives"
is tribalist/authoritarian/totalitarian, don't you?
Where are we going and why are we in a handbasket?
Today, Slashdot.org, a technological news website published a story claiming that news stories could be automatically generated from computers. The absurdity of the story was not lost on the human rea...Attribute already declared at line 15, position 6 !.. Invalid I/O file..
----
Insert Amusing Human-like Pun Here
That stampeded the US public into a frenzy of redneck bloodlust against
"some random Arab country" (happened to be Iraq) that had zip
to do with any terrorist actions against the US?
Where are we going and why are we in a handbasket?
Wired occasionally carries good stories, but this ain't one of them. It sounds portentous and should play well to all the anti-journalism reactionaries and self-styled media pundits, but really this is just flying cars and robot butlers.
It's important to note here that NewsScope isn't a news service like Reuters; rather, it's a targeted data stream for the finance industry. Its output is not meant to replace the work of human journalists. Its output is not even meant to be read by humans.
But leave it to Wired to come up with an angle like "NewsScope has started carrying stories written by machines." A writer less enamored with breathless futurism might instead say that NewsScope parses corporate financial statements and extracts relevant data points, which it then summarizes in a machine-readable format, stripping out all the excess verbiage and historical statements that aren't useful to automated trading software. It's somewhat analogous to a search spider, one that builds an index of finance news as it crosses the wire, making the data easier for third-party software to query.
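To make that concrete, here is a rough sketch in Python of "extract the data points, drop the verbiage, emit something machine-readable." The sample release, the regex patterns, and the output field names are all invented for illustration; this is not NewsScope's actual method or format, just the general shape of the idea.

```python
import json
import re

# Hypothetical patterns and field names; a real feed would need far more
# robust parsing than two regular expressions.
PATTERNS = {
    "revenue_musd": re.compile(r"revenue of \$(\d+(?:\.\d+)?) million", re.I),
    "eps_usd": re.compile(r"earnings per share of \$(\d+(?:\.\d+)?)", re.I),
}


def extract_data_points(release_text):
    """Return only the numeric fields found, ignoring everything else."""
    record = {}
    for field, pattern in PATTERNS.items():
        match = pattern.search(release_text)
        if match:
            record[field] = float(match.group(1))
    return record


# Invented sample text, not a real press release.
release = (
    "Example Corp, a leading provider of synergy, today reported "
    "quarterly revenue of $12.5 million and earnings per share of $0.31. "
    '"We are thrilled," said the CEO.'
)
print(json.dumps(extract_data_points(release), sort_keys=True))
```

The CEO quote and the "leading provider" fluff simply never match anything, which is the whole point: trading software gets two numbers and nothing to read.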
This isn't the Master Control AI writing news stories, people. It's a product -- and probably a pretty valuable one if you're in that industry.
Similarly, TFA says the program that generates news stories based on stats was "rigged up" by some college students. Is it useful? Potentially. Is its output capable of replacing human sports journalists? Is it even publishable? There's no evidence that anybody even suggested that. How many of your college projects changed the world?
TFA goes on to talk about how reporters have been forced to pick through information by hand -- for example, reading volumes of PDFs -- and how much nicer it would be to have machine-readable data to query. Well, no kidding! You're not alone there, brother; I like Google, too.
And then, like so many breathless Wired articles, this one evaporates:
Further out toward the horizon lies the prospect of intelligent systems that filter vast quantities of unstructured content, drawing inferences that can be formatted according to journalistic norms.
Uh-huh. Where can we find that horizon, precisely? And "formatted according to journalistic norms" -- what does that even mean? And then:
Along the way, of course, intelligent systems will need to start coping with the complexities of human language that have so far confounded them, including idiom, metaphor and sarcasm.
"Of course," indeed. As Han Solo once said, "Well that's the real trick, isn't it?"
Breakfast served all day!
Anyone want to place a bet on how long before companies are accused of "gaming" the financial reporting system with their press releases?
As opposed to 'gaming' the media with their press releases? Isn't that what a PR person is supposed to do: create press releases that cast the company in a favorable light?
HA! I just wasted some of your bandwidth with a frivolous sig!
What perfect timing! I just finished my newspaper reading robot!
Did Forbin put an ad in an obscure paper stating that he had died.
The computer read the obit and let its guard down.
Forbin comes back to the project under an assumed name and offs the computer.
'scans and automatically extracts critical pieces of information' from US corporate press releases, eliminating the 'manual processes' that have traditionally kept so many financial journalists in gainful employment.
That, I must admit, is an excruciatingly lame definition of 'journalism'.
Beta is broken and the link to classic doesn't work. Stop wasting our time or there won't be anybody left here.
some text to satisfy lameness filter
http://slashdot.org/~GuyFawkes/journal
So now they finally admit they've not been doing Journalism for a long time now, just turning press releases into articles and marketing them.
What a surprise.
An important distinction here is between real investigative journalism and prompt event reporting. Losing this distinction will result in lame AI news by automated article generators, and slow information gathering by humans. Building on this distinction will result in faster and larger data input streams, automated and always on, feeding real journalists and helping them build bigger pictures and recognize what really matters. Jon Stewart can then filter it all and give us the real news.
It used to be we needed journalists to be our eyes and ears, but now with bloggers and phone cameras and tweets, that is not so much the case. Only a machine could gather all this information in real-time. It used to be that journalists would read deep in between the lines and provide us with insight, but now with Fox and MSNBC and even CNN all driven by politics, that is not so much the case. Only comics enjoy true journalistic freedom and can write their material with any honesty.
This sort of scenario is beginning to pervade society. With algorithmically generated data delivered to algorithmically centric channels, and decisions being made by some programmer's handiwork or some suit's business "logic", society's ability to rationalize, analyze, and pontificate is being systematically eroded. How much longer until droves of professors are wandering rusted train-tracks, remembering the once visceral world of fine-grained literature? The more we eliminate our own 'humanity' from the processes of life, the faster we eliminate life from humanity.
'We are trying to prove ourselves wrong as quickly as possible, because only in that way can we find progress.' RPF
There are really two different capabilities being discussed here. One (the Northwestern example) is the actual generation of prose from an underlying data asset. There are certain well-structured domains of information (baseball games, earnings announcements, etc.) where this will most likely work quite well. The second capability is automating the analysis of new content. NewsScope falls into that category. It takes raw news (written by humans) and extracts key terms, entities and events to make that content more easily consumable by machines. If you're interested you can use the same Thomson Reuters tools that are under NewsScope on your own content. My site uses them to analyze news from feeds, throw most of it away and put the rest in the right places. Thomson makes this capability available to anybody for free at a project called OpenCalais (see http://viewer.opencalais.com/ to play with it). Another group has built it into a complete publishing platform called OpenPublish.
[We] can only hope and pray these otherwise award-winning programmers
have the communication skills necessary to once and for all preclude the
possibility of common spelling, grammar and punctuation errors.
"Dog Rescues it's Master" and
"Join the News Team and I" instead of
...[its] Master, and ...Team and [me]
--might be good enough for a precocious third-grader, but it is uneducated drivel.
There is nothing to FEAR but NOTHING itself; and I fear there is a whole lot of nothing going on. --scorpivs
... unless it can also replicate the bad spelling and poor grammar that I see in everything from corporate Web pages to newspapers to magazines to national advertising to Slashot ;-). Was it always so in ages past, or is it simply that published words have become more democratic? Even people who are unqualified for the task are now able to write words for the whole world to see, but perhaps a century or more ago the process was so much more difficult and expensive that only a more restricted group was allowed the privilege?
A lot of print news is ridiculously formulaic; eg. the red-tops in the UK. I can certainly envisage a near-future where a sub-editor feeds in the answers to the Five Ws and out pops a story indistinguishable from a lot of the crap churned out today. There'll still be a market for human-written journalism for some time to come. But there's a hell of a lot of stuff in some papers that it's hard to imagine was written by a sentient being. If it already looks like it was written by an unattended typewriter, why bother employing someone to sit at the keyboard?
http://ihatehate.wordpress.com
I know that people like to be dramatic, but I think this is taking people's jobs only in the sense that individual people working at the job can be more productive and so one would need fewer of them, something that's been happening since the dawn of the industrial revolution. In the New York Times Book Review of November 15, 2009, the cover review is of a book by Malcolm Gladwell, reviewed by Steven Pinker. According to the NYT book review editors, Gladwell said that if he were trying to break into journalism today, he would start by getting a master's degree in statistics. AI to help process the statistics could be a big help to reporters, and that's just one example of how machines could help. As others have pointed out, there would be attempts to 'game' the machines, and, as still others have pointed out, wetware intelligence gets gamed already. In some sense, it just takes same old same old to a new level, but what can you do? The world changes and pretty much everything has to change with it.
In theory, theory and practice are the same; in practice they're different. (Yogi Berra & A. Einstein)
We've already got a program that automatically generates research papers for you, called SCIgen.
Wednesday December 31 2009
Breaking News..........
THIS IS FUN TO MAKE AN ARTICLE IN THE COMPUTER NEWS WEB
The BARRACK OBAMA was looking in the windows because SUPER BIN-LADEN was playing the drums! ULTRA AHMEDINADZAD was playing the trumpets!
In other news...........
A man today was standing on the gigantic shoe balloons that were nine hundred thousand feet tall. The shrimps were also in the spaceships. Candace20 from Oklahoma has video footage here! [spurious youtube link]
You're welcome.
On the other hand, Slashdot has been using it for years!
[FUCK BETA]
More on this topic at this Blog that I came across:
http://netcomber.blogspot.com/2009/12/automated-blogging-and-journalism.html