Slashdot Mirror


Ask Slashdot: What Happened To Semantic Publishing?

An anonymous reader writes: There has always been a demand for semantically enriched content, even long before the digital era. Take a look at the New York Times Index, which has been published continuously since 1913. Nowadays, technology can meet the high demand for "clever" content, and big publishers like the BBC and the NY Times are opening up their data and also making good use of it.

In this post, the author argues that Semantic Publishing is the future and talks about articles enriched with relevant facts and infoboxes with related content. Yet his example dates back to 2010, and today arguably every news website suggests related articles and provides links to external sources. This raises several questions: Why has there been so little noise about this topic lately? Does this mean that we are already in the future of Online (Semantic) Publishing? Do we have all the tools now (e.g. Linked Data, fast NoSQL/Graph/RDF datastores, etc.), and what remains to be done is simply refinement and evolution? How does the "cleverness" of content differ between providers?

68 comments

  1. Yeah by Anonymous Coward · · Score: 0

    What you publish should probably contain some 'semantic content' here or there, I guess, although that's not strictly necessary for modern media companies.

  2. it is who pays the bills by alen · · Score: 1

    We have these newspaper boxes in NYC as well, and they hold a lot of local and foreign-language newspapers where people advertise local contractor services, as well as rooms in their not-so-legally modified homes that were meant for one family.

    Stuff that people usually don't advertise through your internet ad agencies.

  3. No by bhcompy · · Score: 4, Funny

    I don't want Symantec publishing. Costs too much to renew every year while hogging all my available CPU and RAM

    1. Re:No by Anonymous Coward · · Score: 0

      But if you get enterprise platinum plus support, you can speak to a person in a noisy call center at 3am who will search the support knowledgebase for you!

  4. Hypertext is all you need -- /. included by mi · · Score: 1, Troll

    The publishers are (slowly) moving from simply copying plain-text, which they used to print (on dead trees), to web-sites, where hyper-linking is possible.

    That's all you need — usually there is no reason to corral the links into a separate "info-box".

    As print magazines wane and digital ones rise, this realization will come to the (still) technically illiterate journalists and even their editors.

    Meanwhile here on Slashdot (and other forums, where links are allowed), there is simply no excuse for making a claim without a clickable citation behind it... See the paragraph above for an example.

    --
    In Soviet Washington the swamp drains you.
    1. Re:Hypertext is all you need -- /. included by lgw · · Score: 1

      Are you doing your part? Click here to learn more.

      I see "semantic content" all over the place - but the words are all linked to advertisements, not explanations. Sad, really.

      --
      Socialism: a lie told by totalitarians and believed by fools.
  5. Not always clever. by wcrowe · · Score: 4, Insightful

    There is a fine line between "clever" and "annoying". Very often, what gets considered "related" content is only tangentially related, and sometimes the way it is displayed makes it indistinguishable from the content of the current article. Add to that all of the surrounding clickbait, and it just becomes a confusing mess.

    --
    Proverbs 21:19
    1. Re:Not always clever. by Austerity+Empowers · · Score: 2

      Or it's "related" in some obscure way, but entirely unhelpful. When a journalist writes a science/tech related article, the "infobox" should contain the references consulted. When the journalist is writing about an incident that occurred, I'd like to see transcripts, reports from investigators, etc. that the journalist drew from to write the story.

      More often than not it seems like they make stuff up or attempt to assemble things they don't understand into a narrative that "seems" plausible but may not be supported by the facts. What I never want then is a link to another article with another faulty narrative that is even more confusing.

    2. Re:Not always clever. by Aighearach · · Score: 1

      Well, if it was actual semantic content provided as such to an aware browser, then it would decrease annoyance by giving the user more control.

      Unfortunately for the summary, links are not in fact semantic content. You can have more or fewer links, and you haven't done anything with regard to semantic content. What you need is computer-understood meta-data, including links, that is separate from the main content, follows standard conventions, and can be used by the client software to give semantic information to the user.

      Links at the end of a story... that is just links at the end of a story. I know it is amazing, but the jargon in this case is actual jargon, not just a meaningless fluff word.
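      A minimal sketch of what "computer-understood meta-data, separate from the main content" can look like, assuming the page embeds schema.org metadata in a JSON-LD `<script type="application/ld+json">` block (one common convention among several). The sample page and its values are made up; the client reads the metadata without touching the human-readable body.

```python
import json
from html.parser import HTMLParser


class JsonLdExtractor(HTMLParser):
    """Collects the parsed contents of <script type="application/ld+json"> blocks."""

    def __init__(self):
        super().__init__()
        self._in_jsonld = False
        self._buffer = []
        self.blocks = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._in_jsonld = True
            self._buffer = []

    def handle_data(self, data):
        # Script content may arrive in several chunks; collect them all.
        if self._in_jsonld:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_jsonld:
            self.blocks.append(json.loads("".join(self._buffer)))
            self._in_jsonld = False


# Hypothetical article page: human-readable body plus machine-readable metadata.
PAGE = """
<html><head>
<script type="application/ld+json">
{"@context": "https://schema.org", "@type": "NewsArticle",
 "headline": "Example headline", "datePublished": "2015-01-20",
 "about": {"@type": "Thing", "name": "Semantic publishing"}}
</script>
</head><body><p>Article text for humans...</p></body></html>
"""

parser = JsonLdExtractor()
parser.feed(PAGE)
parser.close()
metadata = parser.blocks[0]
print(metadata["@type"], metadata["about"]["name"])
```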

    3. Re:Not always clever. by wcrowe · · Score: 1

      Bingo. That is also a problem. Too often the article raises more questions than it answers.

      --
      Proverbs 21:19
    4. Re:Not always clever. by FilmedInNoir · · Score: 1

      Wasn't that a big problem with semantic ads when they first came out?
      People would go look at a news article about someone who had been burned to death and get ads for BBQs.
      If it's the same thing, that is.

      --
      Sig. Sig. Sputnik
    5. Re:Not always clever. by Anonymous Coward · · Score: 1

      That's just to keep you on their site, so they get more ad impressions. It has nothing to do with the linked material being related, although I assume that would help.

      Although more clickbait is also effective, and in that case it doesn't even matter if you read the article, just as long as you keep clicking on links and generate ad impressions.

      On-line journalism isn't even about content. Just about getting people to load a page on your site, and ideally keep them loading more pages on your site.

    6. Re:Not always clever. by Anonymous Coward · · Score: 0

      test

  6. I hope "semantic" != "annoying popups" by TuballoyThunder · · Score: 3, Insightful

    I hate, hate, hate, hate web pages that have hot-linked words with popups. It is even worse when it is an advertisement. And those "recommended articles" at the end are just as bad. Click-bait links to content that is of no value.

    1. Re:I hope "semantic" != "annoying popups" by gstoddart · · Score: 5, Insightful

      Sadly, almost all new "innovations" on the web are almost immediately co-opted by advertising, which more or less renders the technology crap to be blocked.

      It's all about monetizing, and nothing to do with an improved experience.

      The internet has more or less been ruined by marketing.

      --
      Lost at C:>. Found at C.
    2. Re:I hope "semantic" != "annoying popups" by Aighearach · · Score: 1

      If it was really semantic content, then your client (browser) could walk the graph of related (advertised) documents from those links and provide all sorts of information. For the advertising to be semantic, it would need to be wrapped in some sort of standard API or descriptive (semantic) access method that flagged it as advertising. You could then, in a good client, turn off all the advertising links, and even substitute dictionary entries with the same keyword.

      Semantic access is exactly that; providing the data for the client to make decisions based on, so that you can choose between different things that have the same keywords, depending on their meaning. If it isn't associated with a new browser feature, it probably isn't a semantic document at all, unless it is just a REST-based catalog that is easily client-walkable. Then it might be primitive "semantic web."
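      A minimal sketch of the client-side control described above, assuming each link arrives with a machine-readable flag saying what it is. The `kind` field and its values are hypothetical (no particular standard is implied), as are the URLs. Once ads declare themselves as ads, the client can hide them or swap in a dictionary entry for the same keyword.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Link:
    keyword: str  # the anchor text in the article
    href: str
    kind: str     # hypothetical semantic flag: "advertisement", "reference", ...


def client_view(links: List[Link], dictionary: Dict[str, str],
                hide_ads: bool = True) -> List[Link]:
    """Return the links a user-configured client would actually show."""
    shown = []
    for link in links:
        if hide_ads and link.kind == "advertisement":
            # Replace the ad with a dictionary entry for the same keyword, if any.
            if link.keyword in dictionary:
                shown.append(Link(link.keyword, dictionary[link.keyword], "definition"))
            continue
        shown.append(link)
    return shown


links = [
    Link("grill", "https://ads.example/bbq-sale", "advertisement"),
    Link("ontology", "https://example.org/refs/ontology", "reference"),
]
dictionary = {"grill": "https://dictionary.example/grill"}
for link in client_view(links, dictionary):
    print(link.kind, link.keyword, link.href)
```

      With `hide_ads=False` the client passes every link through unchanged; the choice stays with the user rather than the publisher.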

    3. Re:I hope "semantic" != "annoying popups" by phantomfive · · Score: 2

      As far as I can tell from the article linked to, it means "auto-generated content." For example, a page that shows all the scores in the college orange-hoop-ball finals might be auto-updated when a team gets a score.

      It should be obvious that auto-generated content can't replace human-generated content (unless we invent AI), because humans want to see new things that lead to deeper understanding. It should be obvious, but "you won't believe what happens next when Selena auto-generated this tweet!" kind of leads me to despair for humanity.....

      --
      "First they came for the slanderers and i said nothing."
    4. Re:I hope "semantic" != "annoying popups" by ceoyoyo · · Score: 1

      It's our fault. We abhor anything on the Internet that's not free. Where people are in the habit of paying for things, the providers of those things worry about quality.

    5. Re:I hope "semantic" != "annoying popups" by Megane · · Score: 1

      Usually those ad links are added after page load by a script. If you can find out which script is doing that, it's not hard to tell Adblock Plus to block it. Stuff like that gets a whole-domain block from me, because the domain usually belongs to a company that does nothing other than web ads, so nothing of value is lost.
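      The whole-domain approach amounts to a host-suffix check. In Adblock Plus itself, a rule of the form `||adnetwork.example^` blocks a domain together with its subdomains; the Python below only imitates that matching for illustration, and the domain names are placeholders.

```python
from urllib.parse import urlsplit

# Hypothetical ad-only domains blocked wholesale.
BLOCKED_DOMAINS = {"adnetwork.example", "popup-links.example"}


def is_blocked(url: str) -> bool:
    """True if the URL's host is a blocked domain or any subdomain of one."""
    host = (urlsplit(url).hostname or "").lower()
    parts = host.split(".")
    # Check every suffix: cdn.adnetwork.example -> adnetwork.example -> example
    return any(".".join(parts[i:]) in BLOCKED_DOMAINS for i in range(len(parts)))


scripts = [
    "https://cdn.adnetwork.example/inline-links.js",
    "https://news.example/static/app.js",
]
print([s for s in scripts if not is_blocked(s)])
```

      Matching on whole labels (rather than a plain `endswith`) keeps an unrelated host such as `notadnetwork.example` from being caught by accident.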

      --
      #naabhaprzrag, #sverubfr-000, #agi-fcbafberq, negvpyr[pynff*=' negvpyr-ary-'] { qvfcynl: abar !vzcbegnag; }
    6. Re:I hope "semantic" != "annoying popups" by Anonymous Coward · · Score: 0

      It's our fault for falling for it. They keep doing it (using clickbait crap) because it works for them.

    7. Re:I hope "semantic" != "annoying popups" by qpqp · · Score: 3

      It's our fault.

      It's Eternal September all the way down.

      Where people are in the habit of paying for things, the providers of those things worry about quality.

      Bullshit. The Internet was a fine place before youtube and google and continues to be so now. It just became more convenient, for everyone. Including the parasites.
      Go look at other segments of the Internet: email, ftp, irc, jabber, torrents... dominated by quality-oriented mentality!
      Look at linux (the systemd debacle notwithstanding;) ), BSD, the open source community in general... Sure, a lot is paid for, but even more is driven by enthusiasm first and foremost.

    8. Re:I hope "semantic" != "annoying popups" by phantomfive · · Score: 1

      It's our fault. We abhor anything on the Internet that's not free.

      Think about how much of the free internet you would be unwilling to pay for. Now imagine how much your life would be improved if all that were gone.

      Most of the internet is now just click bait, and would only be improved by removal.

      --
      "First they came for the slanderers and i said nothing."
    9. Re:I hope "semantic" != "annoying popups" by phantomfive · · Score: 1

      Go look at other segments of the Internet: email, ftp, irc, jabber, torrents... dominated by quality-oriented mentality!

      Technically email has become dominated by spam, but other than that......

      --
      "First they came for the slanderers and i said nothing."
    10. Re:I hope "semantic" != "annoying popups" by ceoyoyo · · Score: 1

      I'm not sure I really follow your argument, but the open source community seems like a reasonable example. Linux is paid for - big companies sink billions of actual dollars into it, and contributors put in even more value in time. Quality, in the things that are important to the people contributing to it, is high. Quality in the things that are not important to contributors, but are important to many of the people who do not contribute? Not so high.

      Quality is also high in ad encrusted click bait sites - in the eyes of the people contributing to them. But that's not you.

    11. Re:I hope "semantic" != "annoying popups" by ceoyoyo · · Score: 1

      I agree. Now turn it around. Think of all the things on the Internet you WOULD miss if they were gone. Now think of how many of them you would be willing to pay for. Think of the number of times you've seen the term "paywall" used on Slashdot.

    12. Re:I hope "semantic" != "annoying popups" by phantomfive · · Score: 1

      Think of all the things on the Internet you WOULD miss if they were gone. Now think of how many of them you would be willing to pay for.

      Most of them, actually (and I have, from time to time). I think a lot of people would be willing to, when you consider that the average family pays $90 for cable (not including internet).

      The primary difficulty would be finding out about new interesting things that you might be willing to pay for if you knew about them.

      --
      "First they came for the slanderers and i said nothing."
    13. Re:I hope "semantic" != "annoying popups" by qpqp · · Score: 1

      I'm not sure I really follow your argument

      Well the other services (except for email, obviously) are largely run by volunteers and don't even have ads (spam notwithstanding).

      Quality in the things that are not important to contributors, but are important to many of the people who do not contribute? Not so high.

      Now I'm not sure that I follow. Sure, there's lots of stuff that lacks the polish of countless missing man-hours, but we've all come a really long way since the 80s/90s. I'm sure we'll get there if we don't fuck up before that.
      I've also seen lots of examples of features that were unimportant to the contributors, but since there was an itch to scratch e.g. in getting recognition from their users, a similar level of rigor was applied to satisfy them.
      (Certainly, there's lots of negative examples too, but the point stands, that there was little "physical" value that some devs received for their work and yet still the projects thrive(d). I was, of course, assuming that you meant money when you said "paying for things" in your original post.)

  7. Clever? Yeah, right. by gstoddart · · Score: 1

    People don't want "clever". They want "shiny".

    And if it means web pages where every other word is a hyperlink of dubious value, then I'm afraid "semantic publishing" is a buzzword for "annoying and intrusive".

    Some of us still prefer to read a single, coherent article by someone who can write in English. You want to put footnotes at the bottom, go ahead.

    But, please, don't give me the blinking and whirling semantic web whereby every move of the mouse updates your ADHD-laden site.

    --
    Lost at C:>. Found at C.
  8. Thomson Reuters by Anonymous Coward · · Score: 0

    Westlaw Next?

  9. It's a Cultural Problem by 31415926535897 · · Score: 1

    I think they're just anti-semantic. Publishers probably think they have a superior knowledge base.

  10. Just my luck... by Anonymous Coward · · Score: 0

    I'm afraid if I don't agree with this I'll be labeled "anti-semantic".

  11. Re:What else was started in 1913? by Anonymous Coward · · Score: 0

    Are the Rothschilds actually Jews?

  12. It's just hard work and machine learning by tommeke100 · · Score: 1

    I think there are two reasons why the whole idea of RDF(S)/OWL-annotated web pages never really gained traction. First, it's hard work if you have to do it manually, although most content management systems now offer some kind of keyword-tagging feature. The second reason, IMO, is that the current Big Data and Machine Learning techniques (and more computing power / storage / bandwidth than 15 years ago, when the whole RDF/OWL thing took off) trump the whole categorization and knowledge extraction / data mining process anyway.

    1. Re:It's just hard work and machine learning by qpqp · · Score: 1

      [...] the current Big Data and Machine Learning techniques [...] trump the whole categorization and knowledge extraction / data mining process [...]

      Could you please explain how a statistical approximation can trump an exact model? I think that big data & co. is a step in the right direction with the means that we currently have available, and that we'll get there eventually. There are too many benefits that would result from doing it properly to neglect the required effort.

    2. Re:It's just hard work and machine learning by rockmuelle · · Score: 2

      I don't think it's that computers and machine learning really trump an exact model. It's more that manually curated semantic information is difficult to do well, and even when done well it is simply the curator's interpretation of the key points. Ontologies and controlled vocabularies (necessary to make semantic solutions work) are always biased towards their creators' view of the world. Orthogonal interpretations rarely fit with the ontologies and require mapping between knowledge systems. Rather than simplifying things, this just creates another layer of abstraction and meta-data that now must be managed.*

      Machine learning, on some level, basically admits this flaw in structured knowledge representation and punts. Instead, it provides tools for querying knowledge bases and finding patterns in them. I think the latter part is just as flawed as manual curation, but the query tools combined with a human are incredibly powerful.

      A simple example: Yahoo originally indexed and categorized the Web. When I interviewed there in '96 (and, silly me, turned down the offer), they had a room full of people that did just that. Google, on the other hand, used a graph algorithm combined with standard text search methods to leverage the structure of the web to give good search results. Yahoo eventually bailed on manual curation and we learned how to leverage Google's approach to search to mine knowledge.

      tl;dr: manual and automated curation will never properly capture humans' representation of knowledge. Instead, better tools plus the human brain will improve our ability to leverage knowledge.

      -Chris

      *and there's that old saying: every software problem can be solved with another layer of abstraction.
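      Google's side of this Yahoo comparison rested on link analysis, and its best-known piece is PageRank. Below is a textbook power-iteration sketch on a made-up four-page graph. The damping factor 0.85 is the value usually quoted from the original PageRank paper; everything else a real search engine layers on top (text relevance and much more) is left out.

```python
from typing import Dict, List


def pagerank(links: Dict[str, List[str]], damping: float = 0.85,
             iterations: int = 50) -> Dict[str, float]:
    """Textbook PageRank by power iteration; links maps page -> outgoing pages."""
    pages = list(links)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page, outgoing in links.items():
            if outgoing:
                share = damping * rank[page] / len(outgoing)
                for target in outgoing:
                    new_rank[target] += share
            else:
                # Dangling page: spread its rank evenly over all pages.
                for target in pages:
                    new_rank[target] += damping * rank[page] / n
        rank = new_rank
    return rank


# Hypothetical tiny web: nobody curates it, the link structure does the work.
web = {
    "home": ["news", "about"],
    "news": ["home", "story"],
    "story": ["news"],
    "about": [],
}
ranks = pagerank(web)
for page, score in sorted(ranks.items(), key=lambda kv: -kv[1]):
    print(f"{page:6s} {score:.3f}")
```

      No one labels "news" as important here; it ranks highest simply because more of the graph points at it.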

    3. Re:It's just hard work and machine learning by qpqp · · Score: 1

      I agree that the tools are currently insufficient (though quite powerful, e.g. Protege), but I also believe that it's quite possible to achieve a high level of accuracy by combining better tools, dividing the problem space and working on killer features that require this higher level of abstraction.
      Ideally, people (at first for industrial applications) would recognize the need for a proper machine-readable representation of the different states of a specific environment, so that eventually the different ontologies could be mapped to each other.
      An exhaustive (i.e. universal) categorization of all possible states (of everything) is largely unnecessary, since even now, when we communicate with each other, we use the vocabulary of the specific topic/area/system and only (comparatively) rarely need to "interface" or intersect with other areas/vocabularies, e.g. when we want to draw parallels to a similar concept in a different system. With time, I'm sure we could even get to a meta-ontology and evolve our language and understanding accordingly.
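      A toy picture of mapping one ontology onto another: a hypothetical newsroom tag set bridged to an equally hypothetical clinical vocabulary. The relation names are borrowed from the SKOS mapping properties (exactMatch, broadMatch), but the tags and concept IDs are invented, and anything without a mapping is reported for a human to decide rather than guessed. The mapping table itself is the extra layer that, as noted earlier in the thread, now has to be maintained.

```python
from typing import Dict, List, Tuple

# Hypothetical newsroom tags.
NEWSROOM_TAGS = ["health/outbreaks", "health/drugs", "sport/football"]

# Hand-maintained bridge to a made-up clinical vocabulary.
# Relation names follow SKOS mapping properties; concept IDs are invented.
MAPPING: Dict[str, Tuple[str, str]] = {
    "health/outbreaks": ("exactMatch", "clin:InfectiousDiseaseOutbreak"),
    "health/drugs": ("broadMatch", "clin:PharmacologicalSubstance"),
}


def translate(tags: List[str]) -> Tuple[List[Tuple[str, str, str]], List[str]]:
    """Map newsroom tags to clinical concepts; return (mapped, unmapped)."""
    mapped, unmapped = [], []
    for tag in tags:
        if tag in MAPPING:
            relation, concept = MAPPING[tag]
            mapped.append((tag, relation, concept))
        else:
            unmapped.append(tag)  # no counterpart: needs a human decision
    return mapped, unmapped


mapped, unmapped = translate(NEWSROOM_TAGS)
print(mapped)
print("needs curation:", unmapped)
```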

    4. Re:It's just hard work and machine learning by tommeke100 · · Score: 1

      To clarify, I don't think statistics (ML) would give a better model than an 'exact' - manual - model. I was speaking more in the sense of a 'good enough' system which is also scalable.

    5. Re:It's just hard work and machine learning by mcswell · · Score: 1

      I am basically in agreement with rockmuelle. But to put (what I think is) his argument slightly differently, there is no such thing as an exact model, because the categories that you would want to mark in a model are inherently fuzzy. Library catalogers knew this decades (a century?) ago; they were trying to create a model, embodied in their card catalogs, of the information in books. But the inter-cataloger agreement was (from my observations) far from exact. A century later, and it's no different--and I suspect it never will be. Categorization, whether done by man or machine, is still fuzzy.

      And if I've misrepresented rockmuelle, or misunderstood your question, qpqp, it's because I don't have an exact model of what you're saying.

    6. Re:It's just hard work and machine learning by qpqp · · Score: 1

      And if I've misrepresented rockmuelle, or misunderstood your question, qpqp, it's because I don't have an exact model of what you're saying.

      Come now, don't blame everything on me!

      What I meant by exact model is of course a predictable, and in a sense deterministic process; inasmuch as that is possible for the given case.
      Even with machine learning you create a representation of the surveyed system, but this model will (currently, and in most cases) always be an approximation.
      By mapping concepts, their (often ambiguous) meanings, usage scenarios and other relations from different areas to each other, supported by these approximations, it should in time be possible to avoid the issues related to the fuzziness and create a truly smart and adaptive system.

      Of course, our universe (as far as we know) is (inherently?) non-deterministic. And obviously, if that is so, you'd have to somehow cheat (e.g. be able to observe our universe from more than the 4 dimensions we can perceive) to get a truly exact model, assuming that some (reachable) abstraction point is deterministic.
      What I'm suggesting is that with some effort it should be possible for us to come up with something with the ability to understand something (like you did with my question, despite lacking an exact model;) ). And while ML is quite crude and more like a sledgehammer, an accurate definition is more like a chisel. At least with respect to the model(s).
      Assuming such a system is created, it will have limitations similar to those of humans with regard to the ability to understand something, as we do not know everything, as far as I am aware.

      But anyway, the librarians didn't have the technical capability to create such a multi-dimensional mess like we currently can, so maybe these things we're talking about just have their own math that we just need to understand the proper rules for. It's all metadata anyway, but currently, I guess the closest we have to an exact model is in the hands of the NSA...

    7. Re:It's just hard work and machine learning by qpqp · · Score: 1

      I admit that I took that quote a bit out of context. I apologize.
      But as mentioned above, I think we just lack a killer feature. And people do use semantically enriched data (also in addition to ML), mostly research, but some do actual work.

    8. Re:It's just hard work and machine learning by tommeke100 · · Score: 1

      No you didn't :) It was a valid argument.
      However, this semantic enhancement requires a couple of things. First, the model (ontology) must be defined by consensus. A model is by definition an incorrect representation of reality, so even with a manually crafted ontology, it still won't be 'exact'. If you apply this to big medical ontologies, you're really in trouble, as they may have hundreds of thousands of concepts. So that is the ontology part. Next you have the actual semantic annotation of the document, where you put your trust in the annotator: that his knowledge of the ontology is perfect and that he's doing a good job of annotating the document. This requires plenty of training.

    9. Re:It's just hard work and machine learning by tommeke100 · · Score: 1

      Well, Machine Learning doesn't exclude the use of Semantic Tools like Ontologies. You can still use them as a gazetteer for your ML indexing process, run inference over the Ontology hierarchy, etc. The two aren't mutually exclusive. However, I do think the idea of everyone annotating their webpages semantically is never going to take off. The closest thing we have successfully achieved on the interwebz in that sense is Wikipedia.
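      A small sketch of using an ontology both as a gazetteer and for inference over its hierarchy, as suggested above. The mini-ontology and surface forms are invented, and plain substring matching stands in for a real entity recognizer in an ML pipeline.

```python
from typing import Dict, Set

# Hypothetical mini-ontology: concept -> parent concept (subclass-of, a tree).
PARENT: Dict[str, str] = {
    "Aspirin": "NSAID",
    "Ibuprofen": "NSAID",
    "NSAID": "Drug",
    "Drug": "ChemicalEntity",
}

# Gazetteer: surface forms that signal a concept in raw text.
GAZETTEER: Dict[str, str] = {
    "aspirin": "Aspirin",
    "acetylsalicylic acid": "Aspirin",
    "ibuprofen": "Ibuprofen",
}


def ancestors(concept: str) -> Set[str]:
    """Walk subclass-of links upward (transitive closure for a tree)."""
    found = set()
    while concept in PARENT:
        concept = PARENT[concept]
        found.add(concept)
    return found


def annotate(text: str) -> Set[str]:
    """Concepts mentioned in the text, plus everything they are a kind of."""
    lowered = text.lower()
    concepts = {c for form, c in GAZETTEER.items() if form in lowered}
    for concept in list(concepts):
        concepts |= ancestors(concept)
    return concepts


tags = annotate("Study compares acetylsalicylic acid with placebo.")
print(sorted(tags))
```

      The inference step is what plain keyword tagging lacks: a search for "Drug" would find this text even though the word never appears in it.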

    10. Re:It's just hard work and machine learning by qpqp · · Score: 1

      There will always be some outliers/exceptions, but it should be possible to define the rules and vocabulary of a given system specifically enough, possibly by breaking it down further into facets/perspectives and then mapping the relations and constraints.
      So then you could have many ontologies, which will gradually converge over time. I'm talking long-term, of course. The annotation part could also require consensus, or vetting, by multiple recognized entities. All in all, the result would still be more or less a fluid body, but then so is everything around us, as the only constant in our world is that everything is changing.

      And I agree with you that ML and annotation/classification & co. are complementary tools. And it will take a lot of work to get end users to semantically enrich their output.

      Where I disagree is in your definition of a model, which is not necessarily an incorrect representation. It's just a representation, the level of detail varies from use-case to use-case.

      So anyway, the big question is how to get there...

  13. Slow development by Anonymous Coward · · Score: 0

    I did not know KDE semantic crap has been in development since 1913

  14. Doomed to fail by DrXym · · Score: 1
    I remember a few years back attending a conference presentation from some university types trying to convince my company the future of the web was semantic and RDF. I found it hard to take seriously because a) RDF really sucks to read or write, b) it's a pain in the ass to imbue content with semantic information, c) it's largely irrelevant since web engines do a better job anyway.

    If someone produced an uber-simple semantic language (just plain text) that could be tossed into a page or link and utilised with some popular JS library, then maybe it might gain traction, particularly if it was a micro DSL for highly specific jobs (e.g. stock quotes). Or if an organisation maintained an enormous repository of documents that had to be categorized and linked in a way that lets people find them. But beyond that, forget it. And you might as well be pissing into the wind to think anyone would willingly use RDF.
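    To illustrate the kind of plain-text micro DSL for stock quotes described above: the `{{quote:SYMBOL}}` syntax below is entirely hypothetical (not an existing standard or library), and the prices come from a stubbed lookup rather than a real market-data feed. The markup stays readable in the page source while remaining trivially machine-parseable.

```python
import re
from typing import Callable, Dict

# Hypothetical micro-DSL: {{quote:SYMBOL}} anywhere in plain text.
QUOTE_TAG = re.compile(r"\{\{quote:([A-Z.]{1,10})\}\}")


def render_quotes(text: str, lookup: Callable[[str], float]) -> str:
    """Replace each quote tag with the symbol and a formatted price."""
    def _replace(match: re.Match) -> str:
        symbol = match.group(1)
        return f"{symbol} ({lookup(symbol):.2f})"
    return QUOTE_TAG.sub(_replace, text)


# Stub price source standing in for a real market-data feed.
FAKE_PRICES: Dict[str, float] = {"ACME": 12.5, "INIT": 101.0}

text_fragment = "Shares of {{quote:ACME}} rose while {{quote:INIT}} fell."
print(render_quotes(text_fragment, lambda s: FAKE_PRICES[s]))
```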

    1. Re:Doomed to fail by mugnyte · · Score: 1

      Better yet, if a semantic derivative of any web page is built by these powerful web crawlers, building a channel for pushing a link to it back to the original web site would mean each crawler wouldn't need to start from scratch. Instead, they could annotate and extend the semantic information and serve it from multiple locations, while the original site stayed largely out of the process, apart from serving the link(s) or being amenable to a filtering proxy that decorates pages with the links.

      Reduced down, there would be a machine-friendly semantic version of the web that browser plugins could tap to annotate the existing human web, while the crawlers constantly polished this semantic version behind the scenes (with curated fixups). The infrastructure of the current web wouldn't need to change, but the experience of the browsing user would be greatly enhanced, largely raising the signal-to-noise ratio on "related" links.

    2. Re:Doomed to fail by Anonymous Coward · · Score: 0

      Have you looked at schema.org recently?

      People seem happy with annotating their web pages so that Google et al. can suck in their data. The format is close enough to RDF.
      WebDataCommons (http://webdatacommons.org/) has GBs of RDF triples from schema.org annotations.
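      To make "close enough to RDF" concrete: a schema.org annotation in JSON-LD is a nested object, and flattening it into subject-predicate-object triples is roughly what large-scale extraction efforts produce. The sketch below is a deliberately simplified flattener, not a conforming JSON-LD processor: it assumes a single `@type` string, expands terms by plain string concatenation with the schema.org vocabulary, gives blank nodes made-up `_:bN` labels, and does not distinguish literals from IRIs.

```python
from itertools import count
from typing import Iterator, List, Tuple

Triple = Tuple[str, str, str]


def to_triples(node: dict, vocab: str = "https://schema.org/") -> List[Triple]:
    """Flatten a simple schema.org JSON-LD object into (s, p, o) triples."""
    triples: List[Triple] = []
    blank_ids: Iterator[int] = count()

    def visit(obj: dict) -> str:
        subject = obj.get("@id") or f"_:b{next(blank_ids)}"
        for key, value in obj.items():
            if key in ("@context", "@id"):
                continue
            if key == "@type":
                triples.append((subject, "rdf:type", vocab + value))
                continue
            values = value if isinstance(value, list) else [value]
            for v in values:
                # Nested objects become their own subjects (blank nodes here).
                obj_term = visit(v) if isinstance(v, dict) else str(v)
                triples.append((subject, vocab + key, obj_term))
        return subject

    visit(node)
    return triples


# Hypothetical annotation as it might appear on a news page.
article = {
    "@context": "https://schema.org",
    "@type": "NewsArticle",
    "headline": "What happened to semantic publishing?",
    "author": {"@type": "Person", "name": "Anonymous Reader"},
}
for triple in to_triples(article):
    print(triple)
```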

    3. Re:Doomed to fail by Anonymous Coward · · Score: 0

      microdata, microformats, RDFa ... all semantic technologies and all widely supported by major search engines.

    4. Re:Doomed to fail by qpqp · · Score: 1

      Thanks for the WDC link. This is awesome!

  15. The hot linked stuff turned out to be worthless by gurps_npc · · Score: 1
    1) Computer software cannot create clever hotlinks; it takes a very clever human to do it (not just a good writer). This is expensive to pay someone to do, but a computer CAN put a picture of side-boob and a clickbait headline on anything. Guess what we end up having...

    2) Hotlinks for things you don't want to read about are annoying and make it harder to read.

    3) People and computers can, however, easily link dictionary definitions, which a) the intended target of an article finds extremely annoying (see point 2 above) but b) do allow non-specialists to read specialized works (such as scientific papers and legal documents). But the specialist/intended target are the major market so this is rare.

    4) You can always Google/wikipedia search in a separate window, without annoying the knowledgeable people.

    That is, when writing something that casually mentions casu marzu, it takes a lot of effort for the writer to hotlink to an article about casu marzu, and most people do not want to read about it. So they don't hotlink to it. The few people who do want to know what casu marzu is can quite easily Google/Wikipedia it. (Note: I warn you, do NOT search for it unless you absolutely HAVE to know; it is just a disgusting type of food.)
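    The cheap, automatable end of point 3, linking dictionary definitions, can be kept from becoming annoying by linking only the first mention of each term. A minimal sketch, assuming whole-word, case-insensitive matching is good enough; the glossary URLs are placeholders.

```python
import html
import re
from typing import Dict


def link_first_mentions(text: str, glossary: Dict[str, str]) -> str:
    """Return HTML in which the first mention of each glossary term is a link."""
    body = html.escape(text, quote=False)
    if not glossary:
        return body
    urls = {term.lower(): url for term, url in glossary.items()}
    # Longest terms first so a multi-word term wins over a shorter overlap.
    terms = sorted(glossary, key=len, reverse=True)
    pattern = re.compile(
        r"\b(" + "|".join(re.escape(t) for t in terms) + r")\b", re.IGNORECASE
    )
    seen = set()

    def _link(match: re.Match) -> str:
        key = match.group(0).lower()
        if key in seen:
            return match.group(0)  # later mentions stay plain text
        seen.add(key)
        return f'<a href="{html.escape(urls[key])}">{match.group(0)}</a>'

    return pattern.sub(_link, body)


# Placeholder glossary for a legal text.
GLOSSARY = {
    "certiorari": "https://glossary.example/certiorari",
    "amicus brief": "https://glossary.example/amicus-brief",
}
sample = ("The court denied certiorari; a second amicus brief "
          "argued certiorari was warranted.")
print(link_first_mentions(sample, GLOSSARY))
```

    A single combined regex pass is used so that a term can never match inside a link that was already inserted.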

    --
    excitingthingstodo.blogspot.com
    1. Re:The hot linked stuff turned out to be worthless by mcswell · · Score: 1

      "allow non-specialists to read specialized works (such as scientific papers and legal documents). But the specialist/intended target are the major market so this is rare." Being part of that specialist target myself, I'm afraid you're right. Google has a special search database for people like me, scholar.google.com; but they're constantly making it harder to find. It used to appear as a link at the top of a google search page, then it was relegated to a drop-down, now it's not even there any more. Guess they didn't make enough $ off of it.

  16. People lie in meta data, that's the problem by TheNarrator · · Score: 1

    Spam, SEO, etc. People lie in meta data. Semantic publishing was clearly doomed when the meta keywords tag turned into a big spam pit.

  17. Ugh. by Altrag · · Score: 2

    The trouble is that this is both boring (for a person) and hard (for a computer.)

    So nobody wants to do it manually, and while everybody's got an algorithm to mark up text, they're all terrible and prone to being gamed by unscrupulous advertisers.

    How many websites have you gone to and seen some random word in the middle of the text that's bolded, double-underlined, larger font and a completely different color to really draw your eye to it (and away from what you're actually there to read.. ie: be as annoying as fucking possible) and then you hover over it and discover it's a Wikipedia link to a house or something equally as pointless?

    This has been the problem with "the semantic X" ever since link farms were invented. They usually don't provide a whole lot of additional information (if any) and they distract from what you're trying to see.

    If you really want a semantic experience, go to basically any popular wiki. They're explicitly curated and therefore the links you find are (usually) actually both informative and relevant. Of course they do this by going the boring (manual) route and compensating for it by having a million people doing the job instead of just a handful.

    Go back and read that "mundane" Wikipedia article about the house and, if you have even the slightest amount of curiosity about anything, you can probably spend several hours link chaining.. there are links to construction, history, archaeology, anthropology, etc. -- and they're all placed in such a way that they're relevant to the article and yet kept subtle enough that you can read over the ones you aren't interested in without a significant drain on attention.

  18. Just put the date on the webpage by Anonymous Coward · · Score: 0

    I would be so happy if everyone just put the date on the article so I know how old it is, especially for technology-related stuff: if it's more than 2 years old, it's worthless. That is one of the biggest worries I have about semantic content. How do I know if it's the latest?

    1. Re:Just put the date on the webpage by stridebird · · Score: 1

      This. They want a semantic web and so far we haven't even got a reliable DatePublished. Technical search is slowly going to shit at the moment on account of this issue. And each lost forum post by bewildered users unable to parse search results for relevance adds further to the problem. Google has date search filters - they should be much more prominent.
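      For what it's worth, schema.org does define a datePublished property that publishers can embed in a page as JSON-LD. The Python sketch below shows how a crawler might pull it out, assuming the page uses a <script type="application/ld+json"> block; the function and class names and the sample page are made up for illustration, and pages that use microdata or RDFa instead are simply ignored here.

```python
import json
from html.parser import HTMLParser


class JsonLdCollector(HTMLParser):
    """Collects the text of every <script type="application/ld+json"> element."""

    def __init__(self):
        super().__init__()
        self._in_jsonld = False
        self.blocks = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._in_jsonld = True
            self.blocks.append("")

    def handle_endtag(self, tag):
        if tag == "script":
            self._in_jsonld = False

    def handle_data(self, data):
        if self._in_jsonld:
            self.blocks[-1] += data


def find_date_published(html):
    """Return the first schema.org datePublished found in JSON-LD, or None."""
    collector = JsonLdCollector()
    collector.feed(html)
    for block in collector.blocks:
        try:
            data = json.loads(block)
        except json.JSONDecodeError:
            continue  # malformed markup is common in the wild; skip it
        if isinstance(data, dict):
            items = data.get("@graph", [data])  # some sites wrap items in @graph
        elif isinstance(data, list):
            items = data
        else:
            continue
        for item in items:
            if isinstance(item, dict) and "datePublished" in item:
                return item["datePublished"]
    return None
```

      Of course this only helps when the publisher bothered to emit the property, which is exactly the complaint above.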

  19. Re:What else was started in 1913? by Anonymous Coward · · Score: 0

    The Redshields? Look at the name for your answer, but here.

  20. Why you don't hear about it much? by frank_adrian314159 · · Score: 1

    Because doing it right is not automatable and therefore expensive. Really, really expensive. I worked for a company that did effectively nothing but take FDA data from package inserts and recode it into machine-readable form using industry-standard codes, taxonomies, etc. Even with the slow pace of FDA approvals and insert updates, it took a team of about a dozen clinicians, another dozen bioinformaticists, and another couple dozen coders (relatively specialized - do you know what an ALP test is and what it's used for?) to keep up the data system that supported this.

    And why does it take all these people? Because you're trying to imbue more than information - you're trying to imbue structure and meaning so people can understand and find and code to stuff. And, even though a good Google search can always help, some of this shit is tricky.

    For the example in the article, indexing news correctly takes more than reporters typing a couple of links. It's managing subject taxonomies, figuring out valid references, keeping external references updated, etc., etc., etc. It costs. A lot.
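    To make "managing subject taxonomies" and "keeping references updated" a little more concrete, here is a minimal Python sketch of one small piece of that work: checking an article's subject codes against a controlled vocabulary, remapping retired codes to their successors, and flagging unknown ones for a human. The codes, the "replaced_by" field and the function name are all hypothetical; this is not the commenter's system or any real taxonomy.

```python
# Hypothetical controlled vocabulary: code -> entry. A non-None "replaced_by"
# marks a retired code and names its successor. All codes are made up.
VOCABULARY = {
    "SUBJ:ECON": {"label": "Economy", "replaced_by": None},
    "SUBJ:ELEC": {"label": "Elections", "replaced_by": None},
    "SUBJ:VOTE": {"label": "Voting (retired)", "replaced_by": "SUBJ:ELEC"},
}


def normalize_subjects(codes):
    """Resolve retired codes to current ones and collect unknown codes.

    Returns (valid, unknown). "valid" keeps first-seen order and drops
    duplicates that the remapping can create.
    """
    valid, unknown = [], []
    for code in codes:
        entry = VOCABULARY.get(code)
        if entry is None:
            unknown.append(code)  # not in the vocabulary: flag for an editor
            continue
        seen = {code}
        # Follow the replacement chain until we reach a current code.
        while entry["replaced_by"] is not None:
            code = entry["replaced_by"]
            if code in seen or code not in VOCABULARY:
                raise ValueError(f"broken replacement chain at {code}")
            seen.add(code)
            entry = VOCABULARY[code]
        if code not in valid:
            valid.append(code)
    return valid, unknown
```

    The code is the easy part; the expensive part is the people deciding what goes into VOCABULARY and reviewing everything that lands in "unknown".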

    --
    That is all.
  21. Re:What else was started in 1913? by Anonymous Coward · · Score: 0

    From AIPAC on one side to the bankers on the other side. Interesting world that we live in.

  22. Re:Clever? Yeah, right. by qpqp · · Score: 1

    But, please, don't give me a blinking and whirling semantic web whereby every move of the mouse updates your ADHD-laden site.

    FTFY. The semantic web is a vision that has little to do with what you described:

    According to the W3C, "The Semantic Web provides a common framework that allows data to be shared and reused across application, enterprise, and community boundaries".[2] The term was coined by Tim Berners-Lee for a web of data that can be processed by machines.[3] While its critics have questioned its feasibility, proponents argue that applications in industry, biology and human sciences research have already proven the validity of the original concept.[4]

    (From the related Wikipedia article.)

  23. Re:Clever? Yeah, right. by Citizen+of+Earth · · Score: 1

    If the Semantic Web is so wonderful, then why did it fizzle out?

  24. I think it didn't offer enough marginal value by hey! · · Score: 1

    for the cost of doing it right; and to whatever degree you backed off doing it right you'd end up missing the point.

    The big win of text-based matching is that nobody has to prepare to be indexed by a search engine, search engine optimization notwithstanding. The big loss is that you get false matches due to polysemy (words that have more than one meaning) and false misses due to synonyms whose equivalence the search engine doesn't know about.

    If you go to something like RDF, in which concepts have unique identifiers (URIs), the marginal win is that you get precise and accurate matches wherever a concept is used in two places. I can write an app that searches the Internet for articles on John Williams the classical guitarist without accidentally leading its user to articles on John Williams the movie composer. The big downside is that content providers have to think carefully about how to index their content.
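    A toy Python sketch of the difference described above, assuming each article carries a set of concept URIs next to its text. The example.org URIs, titles and article texts are invented for illustration, and real RDF tooling would store these as triples rather than Python dicts. Plain string matching returns both John Williamses (polysemy) and misses an article that only says "Williams" (synonymy); matching on the URI gets both cases right.

```python
# Invented concept identifiers; real ones would come from a shared vocabulary.
GUITARIST = "http://example.org/id/john-williams-guitarist"
COMPOSER = "http://example.org/id/john-williams-composer"

# Hypothetical articles: free text plus the concept URIs they are "about".
ARTICLES = [
    {"title": "Bach on six strings",
     "text": "John Williams plays the lute suites.",
     "about": {GUITARIST}},
    {"title": "Scoring the saga",
     "text": "John Williams conducts the orchestra.",
     "about": {COMPOSER}},
    {"title": "Recital review",
     "text": "Williams returned to the stage with a solo programme.",
     "about": {GUITARIST}},
]


def text_search(phrase):
    """Plain substring matching: confuses namesakes, misses other wordings."""
    return [a["title"] for a in ARTICLES if phrase.lower() in a["text"].lower()]


def concept_search(uri):
    """Identifier matching: only articles annotated with that exact URI."""
    return [a["title"] for a in ARTICLES if uri in a["about"]]
```

    The catch is visible in the data itself: someone had to fill in every "about" set by hand or by a tool they trust.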

    So the problem with the semantic web is that what is realistically achievable with semantic technologies is only a marginal (though real) improvement over what we have now, but that improvement requires content providers to make some effort. I have no expectation that everybody will do this, so the semantic web isn't likely to revolutionize everyone's web experience anytime soon. But I think it can serve many useful niche purposes.

    --
    Post may contain irony: discontinue use if experiencing mood swings, nausea or elevated blood pressure.
  25. Worthless Article by Art3x · · Score: 1

    I'm beginning to see why Slashdot readers are famous for not reading the articles. The articles are often poor. This article isn't the clickbait regularly posted by certain submitters. Instead, it reads like a writing assignment.

    "The Dynamic Semantic Publishing (DSP) architecture of the BBC curates and publishes content (e.g. articles or images) based on embedded Linked Data identifiers, ontologies and associated inference." This is one of those sentences that makes sense only to those who already know everything about it. It doesn't tell a new person what it is. This style of writing is a form of encryption.

    The parts I could decrypt sound like things in existence for years:

    "Think of an article that not only tells the new facts, but refers back to previous events and is complemented by an info-box of relevant facts." This is already done by hyperlinks. People who want to research a topic further know they can use Google.

    "Another example would be a news feed that delivers good coverage of information relevant to a narrow subject." Isn't this RSS?

    "Finally, if we use an example in life sciences, the ability to quickly find scientific articles discussing asthma and x-rays, while searching for respiration disorders and radiation." I don't know what to say. Google, Google Scholar, or Wikipedia. I don't think this writer knows about Google.

    Whatever the writer is getting at, it's either already out or a bad idea. Already out: hyperlinks, search engines, Wikipedia. A bad idea: automatic hyperlinking (which also happens to be already out too).
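    The life-sciences example quoted above is really about the "associated inference" part: an article tagged with asthma and x-rays should match a query for respiration disorders and radiation because the ontology says the former are narrower than the latter. A minimal Python sketch of that idea, using a made-up four-entry hierarchy rather than any real medical ontology, and hypothetical function names:

```python
# Toy "narrower concept -> broader concept" links (hypothetical, not a real ontology).
BROADER = {
    "asthma": "respiration disorders",
    "bronchitis": "respiration disorders",
    "x-rays": "radiation",
    "gamma rays": "radiation",
}


def ancestors(concept):
    """Return the concept plus every broader concept above it."""
    result = {concept}
    # Stop if a link points back to something already seen (guards against cycles).
    while concept in BROADER and BROADER[concept] not in result:
        concept = BROADER[concept]
        result.add(concept)
    return result


def search(articles, *query_concepts):
    """Return titles of articles matching ALL query concepts, where an article
    tagged with a narrower concept also counts as matching its broader ones."""
    hits = []
    for title, tags in articles.items():
        inferred = set().union(*(ancestors(t) for t in tags))
        if all(q in inferred for q in query_concepts):
            hits.append(title)
    return hits
```

    A keyword engine would need the words "respiration" and "radiation" to appear in the text; here they only appear in the hierarchy.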

  26. Re:Clever? Yeah, right. by qpqp · · Score: 1

    why did it fizzle out?

    I think it's too early to say that it did. Scholar has 10.5k hits for articles from this year alone...

  27. Semantic content navigation isn't far off by rstuart · · Score: 1

    I think part of the problem is defining what the "Semantic Web" or "Semantic Publishing" is. For me, it is being able to navigate information based on semantic content. For example, applied to web search, I'd expect the search engine to present me with the topics present in my search results and allow me to re-rank/refine those results based on the presence of those topics. If I search for cancer, I would expect the search engine to identify the topics within my search results (let's say: diagnosis, treatment, survival, research, etc.) and let me rank results by their relevance to those topics. If I was interested in research, for example, I'd indicate that via the interface and the results would update. This seems obvious when you search for a term you already know something about, like cancer, but it becomes powerful when you're searching for something you don't know anything about. It allows you to learn and navigate much more quickly. It's also easy to see its value when you extend this to navigating your social media and news stories.
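    The interface described above can be sketched in a few lines of Python, assuming each result already carries topic weights from some upstream classifier (which is the genuinely hard part and is not shown). The URLs, weights, function names and the 50/50 blend are all hypothetical choices for illustration.

```python
from collections import Counter

# Hypothetical results for "cancer", with engine scores and topic weights
# that an upstream classifier (not shown) is assumed to have assigned.
RESULTS = [
    {"url": "a.example/overview", "score": 0.9,
     "topics": {"diagnosis": 0.6, "treatment": 0.3}},
    {"url": "b.example/trials", "score": 0.7,
     "topics": {"research": 0.8, "treatment": 0.2}},
    {"url": "c.example/outlook", "score": 0.8,
     "topics": {"survival": 0.7, "research": 0.3}},
]


def topics_present(results):
    """Total weight of each topic across the results, heaviest first (for facets)."""
    totals = Counter()
    for r in results:
        totals.update(r["topics"])  # Counter.update adds the weights
    return totals.most_common()


def rerank(results, topic, weight=0.5):
    """Blend the engine's score with the chosen topic's weight and re-sort."""
    def blended(r):
        return (1 - weight) * r["score"] + weight * r["topics"].get(topic, 0.0)
    return sorted(results, key=blended, reverse=True)
```

    With the sample data, the engine's own order is a, c, b; choosing "research" moves the trials page to the top. The "weight" knob is one simple way to keep the original relevance in play rather than filtering on the topic outright.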

    There are startups today that have the technology to do this. Some rely on machine learning (including deep learning) to pre-build models that are then used to classify text. Others build topic/concept models in real time from the content you pass them. And others still rely on a linguistics approach (admittedly, not many recent companies take this approach, for various reasons). One even has a tool that will let you search Google in a fashion somewhat similar to what I described. However, most are still looking for funding, and some have been pulled away to focus on customer feedback/BI rather than the semantic web. Also, it can be difficult to start such a company given the amount of resources it needs to get going. Very difficult to bootstrap.

  28. Re:Clever? Yeah, right. by dataminator · · Score: 1

    I think it mainly didn't catch on because it meant that you had to manually add a lot of markup to make your site (machine-readably) "semantic". Nobody was willing to make that effort.

    Things have changed now that web sites are usually generated, with a separation between HTML templates and database/structured content. This makes it easier to expose the structure you have in your backend to the browser, e.g. using schema.org annotations or others. IMDb publishes metadata using the Open Graph Protocol (http://ogp.me/), as does GitHub (plus some others).
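    As a rough illustration of how cheap this becomes once pages are generated from structured data, here is a Python sketch that renders Open Graph meta tags from a backend record. og:title, og:type, og:image and og:url are the basic properties listed at ogp.me, and article:published_time comes from its "article" type; the record's field names and the function name are hypothetical.

```python
from html import escape


def open_graph_tags(record):
    """Render Open Graph <meta> tags from a backend record (field names assumed)."""
    properties = {
        "og:title": record["title"],
        "og:type": record.get("type", "article"),
        "og:image": record["image_url"],
        "og:url": record["canonical_url"],
    }
    # Optional: article:published_time, if the backend knows the date.
    if record.get("published"):
        properties["article:published_time"] = record["published"]
    # escape() also escapes quotes, so values are safe inside attributes.
    return "\n".join(
        f'<meta property="{escape(prop)}" content="{escape(str(value))}" />'
        for prop, value in properties.items()
    )
```

    A template would drop the returned string into the page's <head>; the data was already in the database, so the only effort is deciding to emit it.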

    Unfortunately, many sites don't do it that way yet, possibly because they are more interested in people looking at their ads than in making their content useful.

    In any case, I don't think the semantic web is dead, even though many years ago I did think it was born dead (due to the manual effort needed).