Slashdot Mirror


Ask Slashdot: What Happened To Semantic Publishing?

An anonymous reader writes: There has always been a demand for semantically enriched content, even long before the digital era. Take a look at the New York Times Index, which has been continuously published since 1913. Nowadays, technology can meet the high demands for "clever" content, and big publishers like the BBC and the NY Times are opening their data and also making good use of it.

In this post, the author argues that Semantic Publishing is the future and talks about articles enriched with relevant facts and infoboxes with related content. Yet his example dates back to 2010, and today arguably every news website suggests related articles and provides links to external sources. This raises several questions: Why is there not much noise on this topic lately? Does this mean that we are already in the future of Online (Semantic) Publishing? Do we have all the tools now (e.g. Linked Data, fast NoSQL/Graph/RDF datastores, etc.) and what remains to be done is simply refinement and evolution? What is the difference in "cleverness" of content from different providers?

9 of 68 comments

  1. No by bhcompy · · Score: 4, Funny

    I don't want Symantec publishing. Costs too much to renew every year while hogging all my available CPU and RAM

  2. Not always clever. by wcrowe · · Score: 4, Insightful

    There is a fine line between "clever" and "annoying". Very often, what gets considered "related" content is only tangentially related, and sometimes the way it is displayed makes it indistinguishable from the content of the current article. Add to that all of the surrounding clickbait, and it just becomes a confusing mess.

    --
    Proverbs 21:19
    1. Re:Not always clever. by Austerity+Empowers · · Score: 2

      Or it's "related" in some obscure way, but entirely unhelpful. When a journalist writes a science/tech related article, the "infobox" should contain the references consulted. When the journalist is writing about an incident that occurred, I'd like to see transcripts, reports from investigators, etc. that the journalist drew from to write the story.

      More often than not it seems like they make stuff up or attempt to assemble things they don't understand into a narrative that "seems" plausible but may not be supported by the facts. What I never want then is a link to another article with another faulty narrative that is even more confusing.

  3. I hope "semantic" != "annoying popups" by TuballoyThunder · · Score: 3, Insightful

    I hate, hate, hate, hate web pages that have hot-linked words with popups. It is even worse when it is an advertisement. And those "recommended articles" at the end are just as bad. Click-bait links to content that is of no value.

    1. Re:I hope "semantic" != "annoying popups" by gstoddart · · Score: 5, Insightful

      Sadly, almost all new "innovations" on the web are almost immediately co-opted by advertising, which more or less renders the technology as crap to be blocked.

      It's all about monetizing, and nothing to do with an improved experience.

      The internet has more or less been ruined by marketing.

      --
      Lost at C:>. Found at C.
    2. Re:I hope "semantic" != "annoying popups" by phantomfive · · Score: 2

      As far as I can tell from the article linked to, it means "auto-generated content." For example, a page that shows all the scores in the college orange-hoop-ball finals might be auto-updated when a team gets a score.

      It should be obvious that auto-generated content can't replace human-generated content (unless we invent AI), because humans want to see new things that lead to deeper understanding. It should be obvious, but "you won't believe what happens next when Selena auto-generated this tweet!" kind of leads me to despair for humanity.....

      --
      "First they came for the slanderers and i said nothing."
    3. Re:I hope "semantic" != "annoying popups" by qpqp · · Score: 3

      It's our fault.

      It's Eternal September all the way down.

      Where people are in the habit of paying for things, the providers of those things worry about quality.

      Bullshit. The Internet was a fine place before YouTube and Google and continues to be so now. It just became more convenient, for everyone. Including the parasites.
      Go look at other segments of the Internet: email, ftp, irc, jabber, torrents... dominated by quality-oriented mentality!
      Look at Linux (the systemd debacle notwithstanding ;) ), BSD, the open source community in general... Sure, a lot is paid for, but even more is driven by enthusiasm first and foremost.

  4. Ugh. by Altrag · · Score: 2

    The trouble is that this is both boring (for a person) and hard (for a computer.)

    So nobody wants to do it manually, and while everybody's got an algorithm to mark up text, they're all terrible and prone to being gamed by unscrupulous advertisers.

    How many websites have you gone to and seen some random word in the middle of the text that's bolded, double-underlined, in a larger font and a completely different color to really draw your eye to it (and away from what you're actually there to read.. ie: be as annoying as fucking possible), and then you hover over it and discover it's a Wikipedia link to a house or something equally pointless?

    This has been the problem with "the semantic X" ever since link farms were invented. They usually don't provide a whole lot of additional information (if any) and they distract from what you're trying to see.

    If you really want a semantic experience, go to basically any popular wiki. They're explicitly curated and therefore the links you find are (usually) actually both informative and relevant. Of course they do this by going the boring (manual) route and compensating for it by having a million people doing the job instead of just a handful.

    Go back and read that "mundane" Wikipedia article about the house and, if you have even the slightest amount of curiosity about anything, you can probably spend several hours link chaining.. there are links to construction, history, archaeology, anthropology, etc -- and they're all placed in such a way that they're relevant to the article and yet kept subtle enough that you can read over the ones you aren't interested in without a significant drain on attention.

  5. Re:It's just hard work and machine learning by rockmuelle · · Score: 2

    I don't think it's that computers and machine learning really trump an exact model. It's more that manually curated semantic information is difficult to do well and, even when done well, is simply the curator's interpretation of the key points. Ontologies and controlled vocabularies (necessary to make semantic solutions work) are always biased towards their creators' view of the world. Orthogonal interpretations rarely fit with the ontologies and require mapping between knowledge systems. Rather than simplifying things, this just creates another layer of abstraction and meta-data that now must be managed.*

    Machine learning, on some level, basically admits this flaw in structured knowledge representation and punts. Instead, it provides tools for querying knowledge bases and finding patterns in them. I think the latter part is just as flawed as manual curation, but the query tools combined with a human are incredibly powerful.

    A simple example: Yahoo originally indexed and categorized the Web. When I interviewed there in '96 (and, silly me, turned down the offer), they had a room full of people that did just that. Google, on the other hand, used a graph algorithm combined with standard text search methods to leverage the structure of the web to give good search results. Yahoo eventually bailed on manual curation and we learned how to leverage Google's approach to search to mine knowledge.

    tl;dr: manual and automated curation will never properly capture humans' representation of knowledge. Instead, better tools plus the human brain will improve our ability to leverage knowledge.

    -Chris

    *and there's that old saying: every software problem can be solved with another layer of abstraction.
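
A note on the Yahoo-versus-Google example in the comment above: the comment only says "a graph algorithm" and does not name it. The sketch below assumes it means something like PageRank, the link-analysis method commonly associated with Google. It is a minimal illustration of ranking pages by link structure, not Google's actual implementation; the function name `pagerank`, the damping value 0.85, and the four-page `toy_web` are all hypothetical choices made for this example.

```python
from typing import Dict, List


def pagerank(links: Dict[str, List[str]], damping: float = 0.85,
             iterations: int = 50) -> Dict[str, float]:
    """Rank pages by link structure using simple power iteration."""
    # Collect every page that appears as a link source or a link target.
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}

    for _ in range(iterations):
        # Every page gets a baseline share from the "random jump" term.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page splits its damped rank evenly among its outlinks.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # Dangling page (no outlinks): spread its rank over all pages.
                share = damping * rank[page] / n
                for other in pages:
                    new_rank[other] += share
        rank = new_rank
    return rank


# Hypothetical four-page web (assumption for illustration only):
# "a", "b", and "c" all link to "hub"; nothing links to "b" or "c".
toy_web = {
    "a": ["hub"],
    "b": ["hub"],
    "c": ["hub", "a"],
    "hub": ["a"],
}
ranks = pagerank(toy_web)
best = max(ranks, key=ranks.get)
```

Nobody labels "hub" as important here; its rank comes entirely from other pages linking to it. That is the contrast the comment draws with a room full of people categorizing pages by hand.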