Ask Slashdot: What Happened To Semantic Publishing?
An anonymous reader writes There has always been a demand for semantically enriched content, even long before the digital era. Take a look at the New York Times Index, which has been continuously published since 1913. Nowadays, technology can meet the high demands for "clever" content, and big publishers like the BBC and the NY Times are opening their data and also making good use of it.
In this post, the author argues that Semantic Publishing is the future and talks about articles enriched with relevant facts and infoboxes with related content. Yet his example dates back to 2010, and today arguably every news website suggests related articles and provides links to external sources. This raises several questions: Why is there not much noise on this topic lately? Does this mean that we are already in the future of Online (Semantic) Publishing? Do we have all the tools now (e.g. Linked Data, fast NoSQL/Graph/RDF datastores, etc.) and what remains to be done is simply refinement and evolution? What is the difference in "cleverness" of content from different providers?
What you publish should probably contain some 'semantic content' here or there, I guess, although that's not strictly necessary for modern media companies.
we have these newspaper boxes in NYC as well and they hold a lot of local and foreign language newspapers where people advertise local contractor services as well as rooms in their not so legally modified homes that were meant for one family.
stuff that people usually don't advertise through your internet ad agencies
I don't want Symantec publishing. Costs too much to renew every year while hogging all my available CPU and RAM
The publishers are (slowly) moving from simply copying plain-text, which they used to print (on dead trees), to web-sites, where hyper-linking is possible.
That's all you need — usually there is no reason to corral the links into a separate "info-box".
As the print-magazines wane and digital ones rise, this realization will come to the (still) technically-illiterate journalists and even their editors.
Meanwhile here on Slashdot (and other forums, where links are allowed), there is simply no excuse for making a claim without a clickable citation behind it... See the paragraph above for an example.
In Soviet Washington the swamp drains you.
There is a fine line between "clever" and "annoying". Very often, what gets considered "related" content is only tangentially related, and sometimes the way it is displayed makes it indistinguishable from the content of the current article. Add to that all of the surrounding clickbait, and it just becomes a confusing mess.
Proverbs 21:19
I hate, hate, hate, hate web pages that have hot-linked words with popups. It is even worse when it is an advertisement. And those "recommended articles" at the end are just as bad. Click-bait links to content that is of no value.
People don't want "clever". They want "shiny".
And if it means web pages where every other word is a hyperlink of dubious value, then I'm afraid "semantic publishing" is a buzzword for "annoying and intrusive".
Some of us still prefer to read a single, coherent article by someone who can write in English. You want to put foot notes at the bottom, go ahead.
But, please, don't give me the blinking and whirling semantic web whereby every move of the mouse updates your ADHD-laden site.
Lost at C:>. Found at C.
Westlaw Next?
I think they're just anti-semantic. Publishers probably think they have a superior knowledge base.
I'm afraid if I don't agree with this I'll be labeled "anti-semantic".
Are the Rothschilds actually Jews?
I think there are two reasons why the whole rdf(s)/owl annotated web pages never really gained traction. First of all it's hard work if you have to do it manually, but most content management systems now offer some kind of key word adding feature though. The second reason, IMO, is that the current Big Data and Machine Learning techniques (and more computing power / persistence media / bandwidth than 15 years ago when the whole rdf/owl thing took off) trump the whole categorization and knowledge extraction / data mining process anyway.
I did not know KDE semantic crap has been in development since 1913
If someone produced an uber-simple semantic language - just plain text - that could be tossed into a page or link and utilised with some popular js library, then maybe it might gain traction, particularly if it was a micro DSL for highly specific jobs (e.g. stock quotes). Or if an organisation maintained an enormous repository of documents that had to be categorized and linked in a way for people to find them. But beyond that, forget it. And you might as well be pissing into the wind to think anyone would willingly use RDF.
2) Hotlinks for things you don't want to read about are annoying and make it harder to read.
3) People and computers can, however, easily link dictionary definitions, which a) the intended target of an article finds extremely annoying (see point 2 above) but b) do allow non-specialists to read specialized works (such as scientific papers and legal documents). But the specialists/intended targets are the major market, so this is rare.
4) You can always Google/wikipedia search in a separate window, without annoying the knowledgeable people.
That is, when writing something that casually mentions casu marzu, it takes a lot of effort for the writer to hotlink to an article about casu marzu, and most people do not want to read about it. So they don't hotlink to it. The few people that do want to know what casu marzu is can quite easily google/Wikipedia it. (Note: I warn you, do NOT search for it unless you absolutely HAVE to know; it is just a disgusting type of food.)
excitingthingstodo.blogspot.com
Spam, SEO, etc. People lie in meta data. Semantic publishing was clearly doomed when the meta keywords tag turned into a big spam pit.
The trouble is that this is both boring (for a person) and hard (for a computer.)
So nobody wants to do it manually, and while everybody's got an algorithm to mark up text, they're all terrible and prone to being gamed by unscrupulous advertisers.
How many websites have you gone to and seen some random word in the middle of the text that's bolded, double-underlined, larger font and a completely different color to really draw your eye to it (and away from what you're actually there to read.. ie: be as annoying as fucking possible) and then you hover over it and discover it's a Wikipedia link to a house or something equally pointless?
This has been the problem with "the semantic X" ever since link farms were invented. They usually don't provide a whole lot of additional information (if any) and they distract from what you're trying to see.
If you really want a semantic experience, go to basically any popular wiki. They're explicitly curated and therefore the links you find are (usually) actually both informative and relevant. Of course they do this by going the boring (manual) route and compensating for it by having a million people doing the job instead of just a handful.
Go back and read that "mundane" Wikipedia article about the house and, if you have even the slightest amount of curiosity about anything, you can probably spend several hours link chaining.. there are links to construction, history, archaeology, anthropology, etc -- and they're all placed in such a way that they're relevant to the article and yet kept subtle enough that you can read over the ones you aren't interested in without a significant drain on attention.
I would be so happy if everyone just put the date on the article so I know how old it is. Especially technology-related stuff: if it's more than 2 years old, it's worthless. That is one of the biggest worries I have about semantic content. How do I know if it's the latest?
The Redshields? Look at the name for your answer, but here.
Because doing it right is not-automatable and therefore expensive. Really, really expensive. I worked for a company that effectively did nothing but take FDA data from package inserts and recoded it into machine form using industry-standard codes, taxonomies, etc. Even with the slow pace of FDA approvals and insert updates, it took a team of about a dozen clinicians, another dozen bio-informaticists, another couple dozen (relatively specialized - do you know what an ALP test is and what it's used for?) coders to keep up the data system to support this.
And why does it take all these people? Because you're trying to imbue more than information - you're trying to imbue structure and meaning so people can understand and find and code to stuff. And, even though a good Google search can always help, some of this shit is tricky.
For the example in the article, if you index news correctly, it's more than reporters typing a couple links. It's managing subject taxonomies, figuring out valid references, keeping external references updated, etc., etc., etc. It costs. A lot.
That is all.
From AIPAC on one side to the bankers on the other side. Interesting world that we live in.
But, please, don't give me a blinking and whirling semantic web whereby every move of the mouse updates your ADHD-laden site.
FTFY. The semantic web is a vision that has little to do with what you described:
According to the W3C, "The Semantic Web provides a common framework that allows data to be shared and reused across application, enterprise, and community boundaries".[2] The term was coined by Tim Berners-Lee for a web of data that can be processed by machines.[3] While its critics have questioned its feasibility, proponents argue that applications in industry, biology and human sciences research have already proven the validity of the original concept.[4]
(From the related Wikipedia article.)
If the Semantic Web is so wonderful, then why did it fizzle out?
for the cost of doing it right; and to whatever degree you backed off doing it right you'd end up missing the point.
The big win of text based matching is that nobody has to prepare to be indexed in a search engine, search engine optimization notwithstanding. The big loss is that you get false matches due to polysemy (words that have more than one meaning) and false misses due to synonymous words whose equivalence the search engine doesn't know about.
If you go to something like RDF in which concepts have unique identifiers (URIs), the marginal win is that you get precise and accurate matches where a concept is used in two places. I can write an app which searches the Internet for articles on John Williams the classical guitarist and does not accidentally lead the user to articles on John Williams the movie composer. The big downside is that content providers have to think carefully about how to index their content.
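A rough sketch of that trade-off, using plain Python rather than a real RDF store. The example.org URIs and the three articles are made up for illustration; the only point is that a phrase match both over-matches (polysemy) and under-matches (the name is never spelled out), while a match on a publisher-supplied identifier does neither.

```python
# Hypothetical concept identifiers, invented for this sketch.
GUITARIST = "http://example.org/person/john-williams-guitarist"
COMPOSER = "http://example.org/person/john-williams-composer"

# Invented articles. "about" stands in for the annotation a publisher
# would have to add by hand (or by tooling) for concept search to work.
ARTICLES = [
    {"title": "John Williams plays Bach",
     "text": "John Williams performed a lute suite.",
     "about": {GUITARIST}},
    {"title": "New film score announced",
     "text": "John Williams will score the sequel.",
     "about": {COMPOSER}},
    {"title": "A classical guitar recital reviewed",
     "text": "The guitarist gave a memorable recital.",
     "about": {GUITARIST}},
]


def text_search(articles, phrase):
    """Plain substring matching: needs no preparation, but is ambiguous."""
    phrase = phrase.lower()
    return [a["title"] for a in articles
            if phrase in (a["title"] + " " + a["text"]).lower()]


def concept_search(articles, uri):
    """Match on the concept identifier attached to each article."""
    return [a["title"] for a in articles if uri in a["about"]]


text_hits = text_search(ARTICLES, "John Williams")
concept_hits = concept_search(ARTICLES, GUITARIST)
print(text_hits)
print(concept_hits)
```

The text search returns the film-score article (a false match) and misses the recital review (a false miss); the concept search gets both right, but only because somebody filled in "about" for every article.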
So the problem with the semantic web is that what is realistically achievable with semantic technologies is only a marginal (though real) improvement over what we have now, but that improvement requires content providers to make some effort. I have no expectation that everybody will do this, so the semantic web isn't likely to revolutionize everyone's web experience anytime soon. But I think it can serve many useful niche purposes.
Post may contain irony: discontinue use if experiencing mood swings, nausea or elevated blood pressure.
I'm beginning to see why Slashdot is famous for not reading articles. The articles are often poor. This article isn't the clickbait regularly posted by certain submitters. Instead it reads like a writing assignment.
"The Dynamic Semantic Publishing (DSP) architecture of the BBC curates and publishes content (e.g. articles or images) based on embedded Linked Data identifiers, ontologies and associated inference." This is one of those sentences that makes sense only to those who already know everything about it. It doesn't tell a new person what it is. This style of writing is a form of encryption.
The parts I could decrypt sound like things in existence for years:
"Think of an article that not only tells the new facts, but refers back to previous events and is complemented by an info-box of relevant facts." This is already done by hyperlinks. People who want to research a topic further know they can use Google.
"Another example would be a news feed that delivers good coverage of information relevant to a narrow subject." Isn't this RSS?
"Finally, if we use an example in life sciences, the ability to quickly find scientific articles discussing asthma and x-rays, while searching for respiration disorders and radiation." I don't know what to say. Google, Google Scholar, or Wikipedia. I don't think this writer knows about Google.
Whatever the writer is getting at, it's either already out or a bad idea. Already out: hyperlinks, search engines, Wikipedia. A bad idea: automatic hyperlinking (which also happens to be already out).
why did it fizzle out?
I think it's too early to say that it did. Scholar has 10.5k hits for articles from this year alone...
I think part of the problem is defining what the "Semantic Web" or "Semantic Publishing" is. For me, it is being able to navigate information based on semantic content. For example, applied to web search, I'd expect the search engine to be able to present me with the topics present in my search results and allow me to re-rank/refine those results based on the presence of topics. If I search for cancer, I would expect the search engine to identify the topics within my search results (let's say: diagnosis, treatment, survival, research etc.) and let me rank results based on their relevance to topics. If I was interested in research, for example, then I'd indicate that via the interface and the results would update. This would seem obvious when it's a search for a term that you already know something about, like cancer, but it becomes powerful when you're searching for something you don't know anything about. It allows you to learn and navigate much quicker. It's also easy to see its value when you extend this to being able to navigate your social media and news stories.
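To make the interaction concrete, here is a minimal sketch of the re-ranking step only. It assumes each result already comes with per-topic relevance scores; the titles and scores below are invented, and producing such scores is the hard part that this sketch does not attempt.

```python
from collections import Counter

# Invented search results for "cancer", each with assumed topic scores.
RESULTS = [
    {"title": "Chemotherapy options explained",
     "topics": {"treatment": 0.9, "survival": 0.3}},
    {"title": "New immunotherapy trial results",
     "topics": {"research": 0.8, "treatment": 0.4}},
    {"title": "Screening guidelines updated",
     "topics": {"diagnosis": 0.9}},
    {"title": "Five-year survival statistics",
     "topics": {"survival": 0.9, "research": 0.3}},
]


def topics_in(results):
    """Count how many results mention each topic (the facets to offer)."""
    counts = Counter()
    for result in results:
        counts.update(result["topics"].keys())
    return counts


def rerank(results, topic):
    """Order results by their score for the chosen topic, highest first."""
    return sorted(results,
                  key=lambda r: r["topics"].get(topic, 0.0),
                  reverse=True)


facets = topics_in(RESULTS)
by_research = [r["title"] for r in rerank(RESULTS, "research")]
print(sorted(facets))
print(by_research)
```

Picking "research" in the interface would correspond to calling rerank(RESULTS, "research"), which moves the trial results and the survival statistics to the top.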
There are startups today that have the technology to do this. Some rely on machine learning (including deep learning) to pre-build models that are then used to classify text. Others build topic/concept models "realtime" based on the content you pass them. And others still rely on a linguistics approach (admittedly, not many recent companies take this approach, for various reasons). One even has a tool that will let you search Google in a somewhat similar fashion to what I described. However, most are still looking for funding and some have been pulled away to focus on customer feedback/BI rather than the semantic web. Also, it can be difficult to start such a company given the amount of resources it needs to get going. Very difficult to bootstrap.
I think it mainly didn't catch on because it meant that you had to manually add a lot of markup to make your site (machine-readably) "semantic". Nobody was willing to make that effort.
Things have changed now that web sites are usually generated, having a separation of HTML templates and database/structured content. This makes it easier to make the structure you have in your backend available to the browser, e.g. using schema.org annotations or others. IMDB has metadata using the Open Graph Protocol (http://ogp.me/), Github also (plus some others).
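For anyone who hasn't looked at it, Open Graph metadata is just meta tags with a "property" attribute starting with "og:" and a "content" attribute, so reading it takes very little code. A minimal sketch using only the Python standard library; the sample page and its values are invented, not copied from IMDB or GitHub.

```python
from html.parser import HTMLParser


class OpenGraphParser(HTMLParser):
    """Collect <meta property="og:..." content="..."> pairs from a page."""

    def __init__(self):
        super().__init__()
        self.properties = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        prop = attributes.get("property") or ""
        content = attributes.get("content")
        if prop.startswith("og:") and content is not None:
            self.properties[prop] = content


def extract_open_graph(html):
    """Return a dict of Open Graph properties found in the HTML text."""
    parser = OpenGraphParser()
    parser.feed(html)
    return parser.properties


# Invented sample page, standing in for output generated from a template.
SAMPLE_PAGE = """
<html><head>
<title>Example Film</title>
<meta property="og:title" content="Example Film">
<meta property="og:type" content="video.movie">
<meta property="og:url" content="http://example.org/films/example">
<meta name="description" content="Not an Open Graph tag.">
</head><body>...</body></html>
"""

metadata = extract_open_graph(SAMPLE_PAGE)
print(metadata)
```

On the publishing side the same three lines of markup would come straight out of the template, filled from whatever the backend already knows about the page.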
Unfortunately, many sites don't do it that way yet, possibly because they are more interested in people looking at their ads than in making their content useful.
In any case, I don't think the semantic web is dead, even though many years ago I did think it was born dead (due to the manual effort needed).