Ask Slashdot: How Do News Organizations Keep Track of So Much Information?
dryriver writes: Major news organizations from CNN, BBC, ABC to TIME magazine, the New York Times and the Economist publish a tremendous amount of information, especially now that almost everybody runs a 24/7 updated website alongside their TV channel, magazine or newspaper. Question: How do news organizations actually keep track of what must be 1000s of pieces of incoming information that are processed into news stories every day? If they are using software to manage all this info -- which makes a lot of sense -- is it off-the-shelf software that anybody can buy, or do major news organizations typically commission IT/software contractors to build them a custom "Information Management System" or similar? If there is good off-the-shelf software for managing a lot of information, who makes it and what is it called?
If it follows the narrative, they keep and publish it.
If it doesn't, they purge it.
Twitter supports and protects racists - by smearing their critics with the "Hate Speech" label.
They just push out their antiwhite nonsense 24/7.
Clearly they just make it all up! Now back to twitter... *adjusts hair*
Grep it for Russia. If it has something to say about Russia, put it on the front page. Otherwise, forget about it!
We're not doing your legwork for you.
Dear Slashdot: next time you want to mess with the site, add a rich-text editor for comments.
Excel spreadsheets tens of thousands of lines long.
http://store.steampowered.com/app/570090/articydraft_3/
Articy:draft 3.
There is a widely used piece of industry software called iNews. There's a reddit thread with comments from people who work at news orgs. Vox Media (The Verge, SBNation, Curbed, Polygon) built its own CMS called Chorus. The NYTimes uses WordPress for some of its blogs. And I assume the Washington Post built their own since, well, Bezos.
Have you tried contacting and asking such an organization this very question?
Anons need not reply. Questions end with a question mark.
The Python-based Django framework was originally designed for this purpose. No doubt others use different systems, but a few do use it.
Guns don't kill people; Physics kills people! - John Lithgow as Dick Solomon on Third Rock From The Sun
Newsrooms depend on well-informed editors and reporters who are often notoriously paper-based. I've worked in six newspaper and magazine newsrooms and it's generally a central CMS/publishing/workflow system, plus a person-by-person armory of solutions (reporter's notebooks, things like Evernote, spreadsheets, etc.) There are systems used in intelligence that could find use, but journalists are kind of sensitive about doing things their own way -- in my experience. The real lifeblood of a newsroom is the channels of incoming info: wires, cable tv, Google News, etc.
They just wait until their popular opinion and propaganda come back to them again and then they just recycle the stories with a few different words and pictures.
Smart people exist. Maybe not in your profession, but they do.
Big, big manila folder.
At least, not a one-stop centralized system in most/many newsrooms in the US. This generally falls under the category of a reporter's responsibility to maintain this information in the way that works best for his/her/the team (of course, situations vary). It is not unusual to see a reporter storing all of their data exclusively on a personal Dropbox/OneDrive/etc account.
As for collaboration, organizations may use a product like SharePoint, but I'd be willing to wager that 90%+ of organizations in the US are still primarily relying on an SMB share running on a (probably outdated) version of Windows Server that holds this stuff.
The real ability to recall previous articles, etc, is based on cataloging, keywords and search. But typically, these only apply to the finalized versions of articles - and rarely, if ever, to the versions that were worked on before being sent for publishing approval.
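The "cataloging, keywords and search" approach can be pictured as a small inverted index. This is a minimal sketch under stated assumptions: articles are plain strings keyed by made-up IDs, only finalized versions are indexed (as the comment says is typical), and search is a simple AND over lowercase words. Real archive systems use far richer tokenizing, metadata and ranking.

```python
from collections import defaultdict

PUNCTUATION = ".,;:!?\"'()"

def build_index(articles):
    """Map each lowercase keyword to the set of article IDs containing it.

    `articles` is a hypothetical {article_id: final_text} dict; drafts are
    left out, mirroring how only finalized versions tend to be cataloged.
    """
    index = defaultdict(set)
    for article_id, text in articles.items():
        for word in text.lower().split():
            keyword = word.strip(PUNCTUATION)
            if keyword:
                index[keyword].add(article_id)
    return index

def search(index, query):
    """Return the IDs of articles that contain every keyword in `query`."""
    terms = [t.strip(PUNCTUATION) for t in query.lower().split()]
    terms = [t for t in terms if t]
    if not terms:
        return set()
    results = set(index.get(terms[0], set()))
    for term in terms[1:]:
        results &= index.get(term, set())
    return results

# Made-up archive for illustration only.
archive = {
    "a1": "Flooding hits Dhaka.",
    "a2": "Dhaka election results announced",
    "a3": "Election night in Ohio",
}
index = build_index(archive)
print(sorted(search(index, "Dhaka election")))  # ['a2']
```

An AND search keeps result sets small as more keywords are added, which is roughly how a reporter narrows down past coverage.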
When news organizations have needed to see what coverage existed on a subject in past decades at least, they'd find the guy who had access to LexisNexis and get some results from that.
At least that's what always comes up in inside-baseball discussions on news gathering stuff I've seen.
Ryan Fenton
Maybe there is someone else. These are wire services that publish a lot of the raw stories that other news organizations pay to pick up and republish. If it's a really big story, CNN and others will send their own reporters out.
The less obvious solutions may be the AP ENPS and Avid iNews programs. These are news-oriented programs that aid in scripting, production and publishing, as well as archiving. Those tools allow organizations to subscribe to alerts from AP, affiliates, weather and other news providers.
Then there are the more traditional methods: databases and good ol' notebooks.
The only news I watch is CNN, and I can pretty much say for sure they have that many news items a day. Mostly it's just about the Russia investigation or some comment made by Trump.
Major news organizations from CNN ..... nope.
The big organizations have armies of news robots that do their bidding. "Find me stories about inflation, robot." "Yes, master".
The question seems weird to me. The media organizations I've been involved with have all gathered, filtered, and kept track of information using a loosely networked system of devices known as trained human brains. Much of information-gathering is subjective; there are many "pieces of information" that cross your desk each day which ultimately can and should be discarded, often because the "information" is simply inaccurate. I imagine it would be very difficult to train any kind of computer to make value judgments on something as vague and indeterminate as "information."
That said, one system that may resemble what the poster is talking about is the Bloomberg Terminal. It gathers information and news -- chiefly about financial markets -- and allows users to slice and dice it in various ways. I'm not sure any news-gathering organizations outside Bloomberg itself use it (or are allowed to use it), but Bloomberg makes a lot of money making the Terminals available to Wall Street traders and the like. (Subscriptions are VERY expensive.)
Breakfast served all day!
I know this space well. My consulting/integration company works with many, many media companies including the majors on this exact area. AMA? I've been doing this for 13 years, and literally work with many of the largest media companies on the planet.
There are two layers to the answer to this question. The first is storage and networking infrastructure, which is evolving very quickly for many reasons. Object storage, cloud (public/private/hybrid) -- all of these trends are having a massive impact on how the industry does things, but media is 5-10 years behind many other industries in adopting IT to solve particular challenges (our data needs are very, very high). So the move to object and cloud storage, taking advantage of 10GigE, let alone 40 and 100, seeing where Fibre Channel goes (SANs are used very extensively), the changing cost environment for all this stuff -- all these things are hitting the media space big time.
The next layer is the software management layer. We call this "MAM" for Media Asset Management. It's a bit of a catch-all term, and sort of folds up to DAM, or Digital Asset Management, and contains within it PAM, or Production Asset Management. It is sort of a shorthand term that refers to:
Getting your media and other data behind a database
Using software automation and integration technologies to orchestrate all sorts of interesting workflows
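Those two ideas (media behind a database, software-driven workflows) can be illustrated with a toy catalog. This is a sketch only: it uses Python's built-in `sqlite3` with an in-memory database, and the table layout and the ingested/transcoded/reviewed/published states are hypothetical, not taken from Avid Interplay or any other MAM product.

```python
import sqlite3

# Hypothetical workflow: each state maps to the next one; "published" is final.
TRANSITIONS = {
    "ingested": "transcoded",
    "transcoded": "reviewed",
    "reviewed": "published",
}

def open_catalog():
    """Create an in-memory catalog: the media files live elsewhere, metadata lives here."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE assets (id INTEGER PRIMARY KEY, path TEXT NOT NULL, "
        "kind TEXT NOT NULL, status TEXT NOT NULL DEFAULT 'ingested')"
    )
    return conn

def register_asset(conn, path, kind):
    """Put a media file 'behind the database' by recording where it is and what it is."""
    cur = conn.execute("INSERT INTO assets (path, kind) VALUES (?, ?)", (path, kind))
    return cur.lastrowid

def advance(conn, asset_id):
    """Move an asset one step along the workflow and return its new status."""
    row = conn.execute("SELECT status FROM assets WHERE id = ?", (asset_id,)).fetchone()
    if row is None:
        raise KeyError(asset_id)
    status = row[0]
    if status not in TRANSITIONS:
        raise ValueError(f"asset {asset_id} is already in final state {status!r}")
    new_status = TRANSITIONS[status]
    conn.execute("UPDATE assets SET status = ? WHERE id = ?", (new_status, asset_id))
    return new_status

def pending(conn, status):
    """List asset paths waiting at a given workflow step, oldest first."""
    rows = conn.execute("SELECT path FROM assets WHERE status = ? ORDER BY id", (status,))
    return [path for (path,) in rows]
```

In a real deployment each transition would typically trigger automated work (transcoding, QC, delivery) rather than just updating a status column.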
MAM too is taking more and more advantage of the cloud and hybrid deployments. There are dozens of MAM vendors, with a handful of leaders. For instance, Avid has PAM and MAM platforms they both brand as "Interplay" (they are two different products). There are dozens of others, and I know many of them quite well. Again, my company does major MAM and workflow deployments for top-tier global M&E companies (among others). If I can answer questions, shoot 'em over.
Spew hate, false accusations and never admit you are wrong.
If a fact proves you wrong, call it fake news and build a conspiracy theory with no basis to distract your base with nonsense long enough for their tiny minds to forget the fact that would have changed their world view.
They use Sitecore
it's the only thing they can use on their Macs.
I don't know what the news organizations use, but governments have some pretty big data sets, and they use platforms like CKAN and OGPL.
Nullius in verba
Once approved by the govt of the day, it becomes 'news'; if not, it's 'fake news'.
This perpetual motion machine Lisa made is a joke, it just keeps getting faster and faster. - Homer
Like the "Making of a Muslim Protest" video shows.
Join SMPTE. Get articles from back issues of their "Motion Imaging Journal" that deal with IT in the production workplace. MOS, Media Object Server, is one of the key acronyms. SDI, Serial Digital Interface, is the specification for the video pipeline hardware in many installations.
MOS leads you to ENPS. Follow that down the rabbit hole to as much knowledge as most people would want if the motivation is only curiosity. The whole system is quite flexible and complex. (MOS is a relatively modest part of the generic "Electronic News Production System.") It all fits together, and it all works surprisingly well.
(I worked on some of this sort of software in the 2000-2010 time frame.)
{^_^}
Just thought the OP might be interested in an actual answer rather than endless ill-informed snark.
Obviously approaches vary between different organisations. When I worked at the BBC they used a system called ENPS - Electronic News Production System. It was developed by the Associated Press, and it was geared largely towards broadcast operations. It collated information from journalists on the ground, agency reports, broadcast scripts and contact information for sources and subjects. Over time, more and more of those contacts were appended: "DO NOT CONTACT - DEAD."
It was labyrinthine, clunky and looked like something from the '80s, but folks understood it. It handled everything from raw material from reporters to the text that appeared on a presenter's autocue, and it seemed to get the job done in quite a robust and reliable fashion. But it was largely based around the idea of broadcast running orders, and I think that might be part of the reason it's being phased out and replaced with a software suite called OpenMedia - https://www.annova.tv/en/products/index.php
These days I work at a UK-based newspaper which has become one of the world's biggest news sites. We used to run largely off Adobe software, but in recent years, as we've become a squarely digital-first outlet, we've been using more software developed in-house. We have something that looks a bit like the WordPress editor to compose and edit pieces, and it's made the process of filing, subbing and publishing content much more streamlined and efficient. We have an image system which reporters and photojournalists can upload pictures to directly, and which also pulls in the latest pics from agencies. They're all tagged by the uploaders, so if there's, say, a bombing in Bangladesh, we can filter for "Bangladesh" and see images as they come in. Similarly, we can see wire stories from the AP, AFP etc. as part of our integrated system. We see it all, and we pay for the stuff we run.
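The tag-and-filter behaviour of that image system can be sketched in a few lines. This is a hypothetical illustration, not the paper's in-house code: the `Image` record, its fields and the sample pictures and timestamps are all made up, and matching is a simple case-insensitive tag comparison with the newest arrivals first.

```python
from dataclasses import dataclass, field
from datetime import datetime

@dataclass
class Image:
    """One picture in a hypothetical image system, from staff or an agency."""
    image_id: str
    source: str
    received_at: datetime
    tags: set[str] = field(default_factory=set)

def filter_by_tag(images, tag):
    """Return images carrying `tag` (case-insensitive), newest arrivals first."""
    wanted = tag.lower()
    matches = [img for img in images if wanted in {t.lower() for t in img.tags}]
    return sorted(matches, key=lambda img: img.received_at, reverse=True)

# Made-up sample feed mixing staff uploads and agency pictures.
feed = [
    Image("p1", "staff", datetime(2017, 6, 1, 9, 0), {"Bangladesh", "bombing"}),
    Image("p2", "AFP", datetime(2017, 6, 1, 9, 5), {"bangladesh"}),
    Image("p3", "AP", datetime(2017, 6, 1, 9, 7), {"football"}),
]
print([img.image_id for img in filter_by_tag(feed, "Bangladesh")])  # ['p2', 'p1']
```

Sorting newest first matches the "see images as they come in" use case during a breaking story.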
We also have an in-house system for handling user-generated content. As recently as two or three years ago, people assumed that this would be one of the most important developments in the industry. But it hasn't taken off to anything like the extent people thought it would. People caught up in big events are far more likely to share pics and video on social networks than to contact a newspaper or broadcaster, and whenever a big story breaks you'll see a host of media people on Twitter saying: "Hey, I hope you're ok. Can we use this picture on our site and credit you?"
But one thing I'd say applies across all news outlets is that a surprising amount of information management is done on a purely human level. Individual journalists have their contact books, and often when they leave an organisation, they take those contacts with them. That's because it's one thing being able to get in touch with a source, but another thing entirely to build a mutually trusting relationship.
There's also a load of information contained in reporters' notebooks, and it's common practice to hang onto them for years in case anyone queries or disputes something you've written, or the way you've carried out your job (or in case you want to retire and write a best-selling memoir. A man can dream, right...?). The industry might like to portray itself as digital and connected and high-tech, but a huge amount of information is recorded in diaries, notepads and sticky notes affixed to monitors. It's also difficult for the Russians to hack your notebook.
...Rolodex. Learn it, love it, live it.
They don't; it is very obvious these days that they are not as fantastic as one once thought.
It is easy. They outsource it all. No, not to India. They just outsource it to the companies who then send them press releases. That is about 90% of the work done.
The other 10% they copy and paste from Reuters.
Don't fight for your country, if your country does not fight for you.
That's a job for systemd, right?
I could post an article on one of those sites that says "Trump supporter wearing MAGA hat stabs African refugee with baseball bat as he comes off his sinking boat in Austria. Refugee was a 14 year old mother of two who has been grossly underweight her entire life." and they would believe it despite there being four major contradictions in two sentences.
And the commenter who points out that Austria is landlocked gets branded a racist.
UK tabloids have a far simpler method: they just make shit up.
I'm no fan of Labour's Diane Abbott - she's a liability and there are far more qualified people willing and able to take her position - but the abuse levelled at her during this recent election campaign has been vicious. She's stepped down from her role now and I wouldn't be too terribly surprised to learn that she just can't take the abuse any longer. She's been accused of being anti-British for suggesting that there was institutional racism in the UK in the 1980s, something the Daily Heil categorically denies; as a white man in his early 50s, I can only ask if they were living in the same country as the rest of us during that period. But what do you expect from the rag that supported Hitler and until very recently employed Katie 'Too Shit for Amstrad' Hopkins.
Oh, and don't vote Tory or we're all boned.
I mean like OMGWTFBBQ
You fuckin weirdos turned this into a Republicans vs. Democrats in the arena of "fake news" pissing contest.
Haha, WEIRDOS!
Or I'm in an echo chamber full of bots.
How lonely I feel.
Catatonic
Their editors scour the news agencies, like the Associated Press, for what they deem newsworthy. These are standardized gateways (web APIs for importing purchased articles), whose content gets pushed into the local CMS and then laid out manually or semi-automatically. Duplicates are avoided by marking all purchases. If anything newsworthy is announced ahead of time and the "higher-ups" want something exclusive, reporters are sent to get their own scoop, but the great majority of the data comes from the agencies.
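The "mark every purchase so it is never imported twice" step might look roughly like this. It is a sketch under assumptions: items are dicts with hypothetical `agency`, `id` and `headline` keys, a purchase is identified by the (agency, item ID) pair, and a plain list stands in for the local CMS; no real agency API is being modelled.

```python
def ingest(feed_items, purchased, cms):
    """Push newly purchased agency items into the local CMS, skipping duplicates.

    feed_items: iterable of dicts with hypothetical keys 'agency', 'id', 'headline'.
    purchased:  set of (agency, id) pairs already bought; updated in place.
    cms:        list standing in for the local CMS store.
    Returns the number of items actually added.
    """
    added = 0
    for item in feed_items:
        key = (item["agency"], item["id"])
        if key in purchased:
            continue  # already marked as purchased, so never import it again
        purchased.add(key)
        # New items wait for manual or semi-automatic layout.
        cms.append({"headline": item["headline"], "layout": "pending"})
        added += 1
    return added

# Made-up feed in which the same AP item arrives twice.
feed = [
    {"agency": "AP", "id": "123", "headline": "Storm nears coast"},
    {"agency": "Reuters", "id": "987", "headline": "Markets open higher"},
    {"agency": "AP", "id": "123", "headline": "Storm nears coast"},
]
bought, local_cms = set(), []
print(ingest(feed, bought, local_cms))  # 2
```

Keying on the agency as well as the item ID avoids collisions if two agencies happen to use the same numbering.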
Generally, a reporter working directly for a newspaper or media outlet is a much rarer sight than a reporter working for a news agency; news is aggregated in the agencies and then distributed to news outlets.
Source: worked at a news portal. The token reporter team existed only so that the portal would be still protected by press law, as mere "news aggregation" media can't get that around here.
45 5F E1 04 22 CA 29 C4 93 3F 95 05 2B 79 2A B2
I do this for a living, so my answer is somewhat detailed.
Newspapers were using content management systems for this purpose beginning around 1970, before PCs. Before that, stories were transmitted electronically and stored on punch tape in a 6-bit format, but edited on paper and re-keyboarded as necessary.
If you wanted to use a story as-is, without editing, you could have a copyboy go find the right punch tape and hand-carry it to the typesetting department.
Computerizing the editing process/approval process allowed written material to be stored, edited on screens, and output directly to electronic typesetters (which were already computerized; a major use of the PDP-8 was automated hyphenation and justification). The story "files" were typically organized in "queues" or "baskets."
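The "queues" or "baskets" idea can be pictured as a set of first-in, first-out queues that stories move between. This is a minimal modern sketch, not how those early systems were actually built; the `BasketSystem` class, the basket names and the story slugs are all hypothetical.

```python
from collections import deque

class BasketSystem:
    """Hypothetical model of story baskets: each basket holds slugs in arrival order."""

    def __init__(self, basket_names):
        self.baskets = {name: deque() for name in basket_names}

    def file(self, basket, slug):
        """Drop a story into a basket."""
        self.baskets[basket].append(slug)

    def move_next(self, src, dst):
        """Move the oldest story waiting in `src` to the end of `dst` and return its slug."""
        slug = self.baskets[src].popleft()
        self.baskets[dst].append(slug)
        return slug

    def contents(self, basket):
        """List the slugs currently in a basket, oldest first."""
        return list(self.baskets[basket])

desk = BasketSystem(["reporters", "copy_desk", "typesetting"])
desk.file("reporters", "city-council")
desk.file("reporters", "storm-damage")
print(desk.move_next("reporters", "copy_desk"))  # city-council
```

Each hand-off (reporter to copy desk, copy desk to typesetting) then becomes a move between queues.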
The earliest CMS were bespoke, but they quickly became standardized -- "off the shelf" with potentially a great deal of customization, produced by about a dozen companies around the world that often designed and built their own hardware components.
Electronic page layout was pioneered on these systems. One of the first was at the Minneapolis Star and Tribune; the project leader later founded Aldus and created PageMaker, and the desktop publishing revolution followed.
As desktop publishing emerged, it displaced bespoke layout systems; networked PCs displaced proprietary terminals, and SQL databases displaced proprietary storage, but putting them together into a usable workflow system remained a specialty. In general, the CMS companies got out of the hardware business entirely and focused on software and services.
Photos came later. Keep in mind that the JPEG standard didn't even exist until the 1990s. The first wirephoto storage-and-editing systems were big bespoke monsters that looked like something from a 1950s sci-fi serial, but they were quickly replaced by Mac-based tools, and then the core CMS systems embraced photo management.
Broadcasting trailed all of this in many ways. TV stations actually produce fairly little information in the common sense of the word, and have lighter requirements for handling text, but huge amounts of data in the form of video. When I first worked in TV, video was shot on film, then videotape. As video became digitized and companies like Avid created digital video editors, managing the data became a requirement there as well, and a specialty.
It's now possible to put together a text/image/video workflow system with open source tools. For a single publication, I could do it in a few days with Drupal, and if the Web is the target, it's all pretty straightforward. But the news CMS field is still dominated by specialty vendors.
Print is still a huge driver of revenue, and that means interfacing with advertising workflow and print page layout tools. Adobe InDesign is pretty much the standard there, although I know of one or two systems that have proprietary layout. As a result, a small (and shrinking) number of specialty vendors dominate. They integrate off-the-shelf components, including open source tools and commercial software.
Where I work, writers are using CKEditor, but it's implemented in a proprietary Web-based workflow system that publishes to multiple Drupal sites on the Web and integrates with InDesign for print. Wire service information, agency photos, etc., all come into the CMS.
Because most of the older legacy systems are utterly print-focused, they can be extremely frustrating in a digital world. Some news companies have assembled parallel production systems for the Web, stitching together any number of off-the-shelf components, or writing proprietary code. If you use Django, you should know that it was created at a newspaper company. The Washington Post has created its own system called Arc that it is peddling to other news companies.
They don't.
Document management systems
https://en.wikipedia.org/wiki/Document_management_system
Don't remember which one the newspaper I used to work for had.
Instead of saying who you are, try making an actual point?
If it follows the narrative, they keep and publish it.
If it doesn't, they purge it to keep the narrative.
The news is so partisan that they only need to keep track of half of the facts. I find that reality doesn't fit neatly into the right / left paradigm, or Dem / Rep if you prefer, but many like to act as if this binary representation doesn't have substantial quantization error.
I worked at the AP as a software engineer for seven years at HQ in Manhattan. One system I worked on for a few years was the "Desk" system, which is a set of 3 OpenVMS Alpha clusters (NY, London, Tokyo). This was the primary news collection and dissemination system known to outsiders as "The AP Wire". It accepted thousands of stories per hour from contributors, and transmitted thousands to paying clients. Clients were typically newspapers that received various "feeds" from the AP such as Business, Sports, World, etc.
New stories are tagged (sports, business, world, etc.) upon ingestion, both by editors and an automated system that can infer what a story is about. Additionally, company names are detected and linked to their stock ticker.
Distribution to clients is based on the tags accumulated by the story after these steps. It's pretty much automated, and has to be, given the volume of news moving through the system. The editorial user interface permitted searching and filtering, which is how folks managed the news of the day.
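The tag-on-ingest, ticker-linking and tag-based distribution steps can be sketched as follows. The real Desk system ran on OpenVMS and its classifier is not described here, so everything below is hypothetical: a tiny keyword table stands in for the automated classifier, a hard-coded name-to-ticker dict stands in for a securities database, and client subscriptions are sets of feed names.

```python
# Hypothetical lookup table; a real system would use a maintained securities database.
TICKERS = {"Apple": "AAPL", "Microsoft": "MSFT"}

# Hypothetical keyword rules standing in for the automated classifier.
KEYWORD_TAGS = {"earnings": "business", "touchdown": "sports", "election": "world"}

def auto_tag(story):
    """Return a copy of `story` with inferred tags and company-to-ticker links added.

    Editor-supplied tags are kept; inferred tags are merged in. Matching is naive
    substring search on the lowercased body, purely for illustration.
    """
    text = story["body"].lower()
    tags = set(story.get("editor_tags", set()))
    for keyword, tag in KEYWORD_TAGS.items():
        if keyword in text:
            tags.add(tag)
    tickers = sorted(ticker for name, ticker in TICKERS.items() if name.lower() in text)
    return {**story, "tags": tags, "tickers": tickers}

def distribute(story, subscriptions):
    """Return the clients whose subscribed feeds overlap the story's tags."""
    return sorted(client for client, feeds in subscriptions.items() if feeds & story["tags"])

story = auto_tag({"body": "Apple earnings beat forecasts", "editor_tags": {"world"}})
clients = {"Daily Gazette": {"business"}, "Sports Weekly": {"sports"}, "Globe": {"world", "sports"}}
print(story["tickers"], distribute(story, clients))  # ['AAPL'] ['Daily Gazette', 'Globe']
```

Because routing depends only on accumulated tags, adding a client or a feed is a data change rather than a code change, which fits the high-volume, mostly automated distribution described above.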
- The Kessel run is for nerf herders. I can circumnavigate the entire Central Finite Curve in a lot less than 12 parse
a large news organization will always have specialists, used to be they were called "something" desks in the inky cigar-littered past. the crime desk reporters covered the cops. the business desk reporters covered the business wires, ticker-tape, and wires. sports desk, you had reporters assigned to each team. and so on. Desk Editors rode herd.
reporter's desks were a mess of folders and papers, and older information was filed in the news morgue, a wall of file cabinets. Facts On File, an annual compendium of important stories and personalities, added filler and color.
nowadays, it's all in computers, on Nexis/Lexis, and the organization's own servers.
at your Eyewitness Local Team Leader News Source Station (various trademarks licensed, but not to me), the reporters who usually work their beats save a copy on their own computers.
Sloppy Sam who keeps no notes works off the cuff, and doesn't last long.
if this is supposed to be a new economy, how come they still want my old fashioned money?
The Bible, which was probably the most influential source of decision-making up to some time in the 20th century, was not exactly a monument to deductive decision-making. It's sort of the most successful fake news of all time.
When it's fake news it's easy
They make most of it up as they go.
This accounts for only about 40% of the news. They don't like paying AP at all.
They get the remaining 60% from trolling Twitter, Reddit, and Facebook all day.
Up next on Anderson Cooper 360; @ILikeEatingDicks Tweets that they don't like Donald Trump. We'll be covering this in depth, when we return. Back to you, Wolf.
It's one thing to say who you are, but a "virtue signal" doesn't answer the original question.
Try again.