Slashdot Mirror


Going from a 'Web of links' to a 'Web of meaning'

neutron_p writes "Computer scientists from Lehigh University are building the Semantic Web, which will handle more data, resolve contradictions and draw inferences from users' queries. The new improved Web will also combine pieces of information from multiple sites in order to find answers to questions."

37 of 142 comments

  1. When by cbrocious · · Score: 2, Insightful

    When will we be dropping HTTP and HTML in favor of more metadata-friendly protocols and file formats? I can see huge potential in a system built specifically for getting data out there and linking it all together.

    --
    Disconnect and self-destruct, one bullet at a time.
    1. Re: When by vasubhat · · Score: 2, Funny

      ... And when the computer does indeed possess understanding, thought, consciousness and the like, and goes about doing something with it, the Vogons go and destroy it before the job is done.

  2. Ummm by bo0ork · · Score: 3, Insightful
    The new improved Web will also combine pieces of information from multiple sites in order to find answers to questions

    Sounds like a recipe for disaster to me.

    --
    Does everything include nothing?
    1. Re:Ummm by amalcon · · Score: 2, Funny

      Yeah, hopefully there aren't many easily-offended cat enthusiasts out there. They might not appreciate some of the more, er, "exotic" sites they find...

      --
      -Amalcon
    2. Re:Ummm by bobbis.u · · Score: 4, Insightful
      Yeah, I would tend to agree.


      One of the reasons the internet has become so popular is that everyone can have their say. Unfortunately, this has the side effect that there is a lot of incorrect and misleading information out there. Everything is also self-reinforcing, because one person often copies their "facts" from another website without first checking the veracity. Even major news outlets and scientific publications have been caught out by this in the past.

    3. Re:Ummm by ezzzD55J · · Score: 4, Insightful
      'Everything is also self-reinforcing, because one person often copies their "facts" from another website without first checking the veracity'

      There is another way in which it's self-reinforcing. People look for sites and pages and people that reflect their own opinions.

    4. Re:Ummm by LionKimbro · · Score: 2, Interesting
      Two things:

      1. "Webs of trust." People will make pages listing which pages they believe have a good reputation and generally tell the truth. If someone fills the web with a ton of random statements, they will have a low reputation.
      2. Computers will have "beliefs" reflecting their owner's own. You will tell the computer, "I believe this is true," and the computer will absorb the package of information. You can say, "I believe this is false," and the computer will absorb the package of information, and put it into the "bogus" bin.
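      LionKimbro's second point, a computer that files packages of statements according to what its owner believes, can be sketched in a few lines. This is a minimal illustration under assumed names (`BeliefStore`, statements as plain string triples), not any real Semantic Web API:

```python
from typing import List, Set, Tuple

Triple = Tuple[str, str, str]


class BeliefStore:
    """Hypothetical store that files packages of statements by the owner's verdict."""

    def __init__(self) -> None:
        self.believed: Set[Triple] = set()
        self.bogus: Set[Triple] = set()

    def believe(self, package: List[Triple]) -> None:
        # "I believe this is true": move every statement into the trusted bin.
        for t in package:
            self.bogus.discard(t)
            self.believed.add(t)

    def reject(self, package: List[Triple]) -> None:
        # "I believe this is false": keep the statements, but in the "bogus" bin.
        for t in package:
            self.believed.discard(t)
            self.bogus.add(t)

    def holds(self, triple: Triple) -> bool:
        # Only statements the owner has accepted count as true.
        return triple in self.believed


store = BeliefStore()
store.believe([("Milk", "hasPrice", "$5")])
store.reject([("Moon", "madeOf", "cheese")])
```

      Keeping rejected statements instead of deleting them lets the owner change their mind later without refetching anything.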
    5. Re:Ummm by jsebrech · · Score: 2, Insightful

      "Webs of trust." People will make pages telling what pages they believe have a good reputation, and generally tells the truth.

      That won't work for stuff that's politically sensitive, since people will mod sites down just because they dislike what the site says, even if it is accurate. It also gets really complicated with sites that are accurate on one subject but don't know jack about another.

      Computers will have "beliefs" reflecting their owner's own.

      In that case, what's the point? If your computer only accepts data that fits in with your predetermined conclusions, it will provide valueless results.

  3. Something similar. by modifried · · Score: 4, Informative

    Covered not long ago - an interview with Berners-Lee regarding the Semantic Web.

  4. Why is this news? by multipart · · Score: 4, Informative

    People at DERI in Galway, Ireland, are also working on the Semantic Web (see http://www.deri.ie/). I thought lots of people are...

    1. Re:Why is this news? by BarryNorton · · Score: 2, Informative

      They are - there are several major European consortia, many involving the University of Sheffield where I work on Semantic Web Services, as well as lots of US work especially deriving from DARPA and CMU work on agents...

  5. Resolve Contradictions? by NoTheory · · Score: 3, Interesting

    I'll have to rtfa to see what they propose, but the very principle of resolving contradictions is a really difficult one. Most theories of knowledge (which are essentially networks of facts) aren't terribly robust, and contradiction repair, which involves running through the entire network to find invalid assumptions and then propagating the changes, is NP-complete :| I'm not positive that contradiction resolution is a reasonable thing to expect from a massive distributed network.

    --
    There are lives at stake here!
    1. Re:Resolve Contradictions? by fugu13 · · Score: 3, Informative

      Resolving contradictions is probably the least applied use of the Semantic Web, at least in these early stages. Also, it is not meant to be a global information store (in which all contradictions may be resolved). It is meant to be a large number of globally connected information stores, and contradictions may be resolved between small numbers of these.
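      Finding contradictions between a handful of stores is tractable when it is limited to simple cases. A minimal sketch, assuming some properties are declared single-valued (in the spirit of OWL's FunctionalProperty) and using hypothetical store names; it only reports the conflicts and who asserted what, it does not decide which side is right:

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

Triple = Tuple[str, str, str]
Conflicts = Dict[Tuple[str, str], Dict[str, Set[str]]]


def find_conflicts(stores: Dict[str, List[Triple]], functional: Set[str]) -> Conflicts:
    """Report (subject, property) pairs that were given more than one value.

    Only properties listed in `functional` are checked, since for other
    properties several values can all be true at once.
    """
    seen: Dict[Tuple[str, str], Dict[str, Set[str]]] = defaultdict(lambda: defaultdict(set))
    for store_name, triples in stores.items():
        for s, p, o in triples:
            if p in functional:
                seen[(s, p)][o].add(store_name)
    # Keep only the pairs where the stores disagree, and record who said what.
    return {key: dict(values) for key, values in seen.items() if len(values) > 1}


stores = {
    "storeA": [("Milk", "hasPrice", "$5"), ("Milk", "hasColour", "white")],
    "storeB": [("Milk", "hasPrice", "$4")],
    "storeC": [("Milk", "hasPrice", "$5")],
}
conflicts = find_conflicts(stores, functional={"hasPrice"})
# conflicts == {("Milk", "hasPrice"): {"$5": {"storeA", "storeC"}, "$4": {"storeB"}}}
```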

      Also, the Semantic Web's ontology language, OWL, comes in three flavors: OWL Lite, OWL DL, and OWL Full. The first two are restricted enough to be decidable (this is by design; OWL Full is not decidable). OWL Lite in particular is lightweight enough that data stores can realistically process it, yet powerful enough that far more information can be inferred than what is directly stated in the RDF.
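      To give a flavour of inferring more than what is directly stated, here is a minimal forward-chaining sketch. It implements only two RDFS-style rules (transitive subclassing and type inheritance), a small subset of what an OWL Lite reasoner handles, and the facts are made up for illustration:

```python
from typing import Set, Tuple

Triple = Tuple[str, str, str]
SUBCLASS, TYPE = "rdfs:subClassOf", "rdf:type"


def infer(triples: Set[Triple]) -> Set[Triple]:
    """Apply two RDFS-style rules until nothing new can be derived (a fixpoint)."""
    closure = set(triples)
    while True:
        new: Set[Triple] = set()
        for s, p, o in closure:
            if p != SUBCLASS:
                continue
            for s2, p2, o2 in closure:
                # Rule 1: A subClassOf B, B subClassOf C  =>  A subClassOf C
                if p2 == SUBCLASS and s2 == o:
                    new.add((s, SUBCLASS, o2))
                # Rule 2: x type A, A subClassOf B  =>  x type B
                if p2 == TYPE and o2 == s:
                    new.add((s2, TYPE, o))
        if new <= closure:
            return closure
        closure |= new


facts = {
    ("Milk", SUBCLASS, "DairyProduct"),
    ("DairyProduct", SUBCLASS, "Food"),
    ("item54728", TYPE, "Milk"),
}
inferred = infer(facts)
```

      Three stated facts yield three more: Milk is Food, and item54728 is both a DairyProduct and Food.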

      --
      For to end yet again.
    2. Re:Resolve Contradictions? by NoTheory · · Score: 2, Insightful

      alright, having read the friggin' article, all i have to say is that they have their work cut out for them.

      the problem with searching currently is that only librarians, who've had at least a year or two of graduate study, really know the ontology that libraries use. Common users bring their own concepts and ontologies to bear when they're searching for information. But if you move away from the monolithic single ontologies that libraries use, you have to accept that ontologies change, not just between individuals but over time: as cultures change, the ontologies need to change as well. I guess the concept must be that there is a set of invariant descriptors, which different ontologies can then interpret based on the features of the objects being described.

      The crazy part about trying something like that is that you have to make people define their own ontologies. Furthermore, you need to make sure that people are describing their data in an ontologically neutral manner.

      And that's the hidden third problem (the technology review article posted above has the dude citing 2 problems): getting people to behave in a sensible way when organizing information. Unfortunately, as far as we know now, it's really difficult to get computers to create metadata automatically (that doesn't mean we're not trying), so humans have to be included in the decision process if you want to define what things are.

      the ironic thought that pops to mind is that if you've got a set of universal descriptors, then don't you already have an ontology? And if you don't have a set of universal descriptors, how would you ever create a coherent ontology?

      anyway, enough rambling for now

      -notheory

      --
      There are lives at stake here!
  6. Snake oil... by Alomex · · Score: 2, Insightful



    with their favourite mode of publication being the press release.

  7. Unless by Taco+Cowboy · · Score: 3, Interesting

    You gotta understand that "meaning" has no meaning at all to machines, at least not yet.

    And even for humans, the "meaning" of a certain thing can be different things to different people!

    Although I applaud the job they are doing for Semantic Web, I wonder how they can inject "meaning" into the whole thing.

    My biggest fear is the 1984-like "my meaning is THE meaning and you canna have any other meaning" thing.

    --
    Muchas Gracias, Señor Edward Snowden !
  8. and draw inferences.. by murderlegendre · · Score: 5, Funny

    ..from user's queries.

    Clippy..? Is that you?

    --
    There's a Starman, waiting in the sky / He'd like to come and meet us, but he hasn't got the time.
  9. It's the authoring tools, stupid by Eloquence · · Score: 4, Insightful
    Who is "building the semantic web"? Academics or web authors? The only semantic web technology that has actually gained wide usage in the sphere of user-generated content is RSS, a syndication format (or rather, a bunch of competing syndication formats). The reason for this is that weblog engines like Slash and Movable Type support syndication. This then allowed programmers to create news aggregators and filters.

    The same can be said about any semantic web technology - whether it's FOAF (an RDF vocab for describing people and their interests) or a vocabulary for reviews. As soon as major authoring tools (i.e. both web editors and content management systems) start integrating these technologies, people will use them if they are useful. Do not expect web designers or bloggers to have a clue about all the great things that the semantic web can do - give them one useful thing which they understand, package it in a pretty UI, and they will start using it.
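    As an illustration of the kind of "one useful thing" an authoring tool could emit, here is a minimal sketch that writes a FOAF profile as N-Triples using only the standard library. The FOAF namespace and the terms `Person`, `name`, `knows` and `topic_interest` come from the FOAF vocabulary; the person and topic URIs are hypothetical, and the URIs are assumed not to need escaping:

```python
from typing import Iterable

FOAF = "http://xmlns.com/foaf/0.1/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def iri(value: str) -> str:
    return f"<{value}>"


def literal(value: str) -> str:
    # N-Triples string literal: escape backslashes, double quotes and newlines.
    escaped = value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
    return f'"{escaped}"'


def describe_person(person: str, name: str, knows: Iterable[str] = (),
                    interests: Iterable[str] = ()) -> str:
    """Return N-Triples describing one person with FOAF terms."""
    me = iri(person)
    lines = [
        f"{me} {iri(RDF_TYPE)} {iri(FOAF + 'Person')} .",
        f"{me} {iri(FOAF + 'name')} {literal(name)} .",
    ]
    lines += [f"{me} {iri(FOAF + 'knows')} {iri(friend)} ." for friend in knows]
    lines += [f"{me} {iri(FOAF + 'topic_interest')} {iri(topic)} ." for topic in interests]
    return "\n".join(lines)


doc = describe_person(
    "http://example.org/people/alice#me",
    "Alice",
    knows=["http://example.org/people/bob#me"],
    interests=["http://example.org/topics/unicycling"],
)
```

    A blogging tool could generate this from an ordinary "about me" form, so the author never sees the RDF at all.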

  10. The semantic Web and valid HTML by ToreTS · · Score: 2, Insightful

    I guess that the Semantic Web would need HTML documents to meet strict requirements when it comes to validation, use of logical instead of physical markup and so on. This could be an incentive for people to use HTML the way it was intended, instead of producing the crapload of pages that don't close tags, use hundreds of redundant FONT tags, use the H1..H6 elements to control font size instead of using them to indicate headings, and so on. Strangely enough, all "beginner's" HTML books still teach people to code this way.
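    Checking for two of the habits listed above is mechanical. A minimal sketch using Python's `html.parser`; `MarkupLinter` is a hypothetical name, and it treats every non-void element as needing an explicit end tag, which is stricter than HTML 4 (where `</p>` and `</li>` are optional) and closer to XHTML rules:

```python
from html.parser import HTMLParser
from typing import List

# Elements that never take an end tag.
VOID = {"area", "base", "br", "col", "embed", "hr", "img", "input",
        "link", "meta", "param", "source", "track", "wbr"}


class MarkupLinter(HTMLParser):
    """Rough check for unclosed tags and presentational FONT elements."""

    def __init__(self) -> None:
        super().__init__()
        self.stack: List[str] = []
        self.problems: List[str] = []

    def _check_font(self, tag: str) -> None:
        if tag == "font":
            self.problems.append(f"line {self.getpos()[0]}: presentational <font>")

    def handle_starttag(self, tag, attrs):
        self._check_font(tag)
        if tag not in VOID:
            self.stack.append(tag)

    def handle_startendtag(self, tag, attrs):
        # Self-closing syntax such as <br/> opens and closes in one step.
        self._check_font(tag)

    def handle_endtag(self, tag):
        if tag not in self.stack:
            self.problems.append(f"stray </{tag}>")
            return
        # Anything opened after this tag and never closed is reported.
        while self.stack[-1] != tag:
            self.problems.append(f"unclosed <{self.stack.pop()}>")
        self.stack.pop()

    def report(self, html: str) -> List[str]:
        self.feed(html)
        self.close()
        self.problems += [f"unclosed <{t}>" for t in reversed(self.stack)]
        return self.problems


problems = MarkupLinter().report("<p><font size=7>Big</font><b>bold</p>")
# problems == ["line 1: presentational <font>", "unclosed <b>"]
```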

  11. Being built by Lehigh university eh? by The_reformant · · Score: 5, Informative

    The semantic web is a pretty popular area of research right now, and it's far from being "built by computer scientists at Lehigh University". In fact, I could have done an undergrad dissertation on the semantic web, and there were numerous PhD positions on the semantic web being advertised at unis around the world.
    Whichever Lehigh professor submitted this is stooping pretty low trying to raise publicity (and hence funding), I would think!

    --
    I have discovered a truly remarkable sig which this post is too small to contain.
  12. it's the Gibson! by SuperBanana · · Score: 2, Funny

    Am I the only one who recognized the main graphic for the story as a lifted screencap from the movie Hackers? That movie's SOLE redeeming quality was Angelina Jolie...

    Well, ok, that and the laugh factor. Not quite as much fun as MST3K'ing The Mummy with about a half dozen friends though.

  13. A lot of work to be done by Mazzaroth · · Score: 2, Insightful
    The Semantic Web is an amazing idea that will profoundly transform the way we interact with information. But I can see a huge amount of work remaining to be done:
    • We need an ontology that will cover many if not all aspects of human experience. That experience has been evolving dramatically and will continue to evolve, so this ontology is probably a moving target. Creating the ontology alone has been, and still is, the holy grail of AI and Knowledge Management.
    • The amount of time we will have to invest in adding metadata to the data will increase dramatically over time. We will need a way to automate the filling of the metadata layer. This is where automatic image recognition and classification, speech-to-text, text summarizers and meaning extractors kick in (here, Copernic is heading in the right direction). Maybe the librarian profession will be the next hot job...
    • Almost every application will have to adapt and inter-communicate. No big deal, RDF will probably become the new data bus anyway.
    That will be interesting!!!
  14. Re:too little RDF by Tony+Hoyle · · Score: 2, Informative

    There's absolutely loads of it around... especially as people are starting to use more generated websites (like slashdot for example).

    If you search for *.rdf maybe you won't find as much... a lot of it is *.rss, *.xml and other things.

    Also, google doesn't index them.
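    The point that much RDF hides in *.rss files holds for RSS 1.0 ("RDF Site Summary"), which is RDF/XML; the RSS 0.91/2.0 branch is plain XML, not RDF. A minimal sketch that pulls items out of an RSS 1.0 document with the standard library's ElementTree; the sample feed is made up and trimmed (a real channel also carries link, description and an items sequence):

```python
import xml.etree.ElementTree as ET

NS = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "rss": "http://purl.org/rss/1.0/",
}

SAMPLE = """<?xml version="1.0"?>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns="http://purl.org/rss/1.0/">
  <channel rdf:about="http://example.org/">
    <title>Example feed</title>
  </channel>
  <item rdf:about="http://example.org/story/1">
    <title>Going from a web of links to a web of meaning</title>
    <link>http://example.org/story/1</link>
  </item>
</rdf:RDF>"""


def rss1_items(xml_text: str):
    """Yield (item URI, title, link) for each item in an RSS 1.0 document."""
    root = ET.fromstring(xml_text)
    for item in root.findall("rss:item", NS):
        about = item.get(f"{{{NS['rdf']}}}about")
        title = item.findtext("rss:title", namespaces=NS)
        link = item.findtext("rss:link", namespaces=NS)
        yield about, title, link


items = list(rss1_items(SAMPLE))
```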

  15. Welcome to 2001 by the_demiurge · · Score: 2, Insightful

    Hasn't everyone heard of this already?
    W3C semantic web activity from 2001.
    Heflin's Thesis from 2001.

    I'm rather skeptical of the whole thing; it seems to me to be like "Wouldn't it be nice if people documented their web page content better? Then we could do all these neat things." The second statement is right, but I fear the first statement is intractable.

  16. If for no other reason than IP law by ShatteredDream · · Score: 2, Interesting

    This could make it very hard for people to stay on the right side of copyright law. A medium that pulls information from several different sources could make copyright infringement much harder to avoid. For example, say you pull from a Wikipedia entry, a NY Times article and a Reason editorial. You had better keep track of where you got each part if you use them in any of your own research, commentary, etc.

    How does it combine information from different sources in a way that keeps the user aware of where the data came from? How do you know whom to cite, or whether something you're excerpting can be used in the context you want, when your "semantic web browser" pulled the data and combined it, coherently or incoherently, into a mishmash of data sources?
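    One common answer to this question is to store a fourth field, the source, alongside every statement, which is the idea behind RDF "quads" and named graphs. A minimal sketch with a hypothetical `ProvenanceStore` and made-up source URLs:

```python
from collections import defaultdict
from typing import DefaultDict, Iterable, List, Set, Tuple

Triple = Tuple[str, str, str]


class ProvenanceStore:
    """Merge statements from several sources without forgetting where each came from."""

    def __init__(self) -> None:
        # Each statement maps to the set of sources that asserted it.
        self.sources: DefaultDict[Triple, Set[str]] = defaultdict(set)

    def load(self, source: str, triples: Iterable[Triple]) -> None:
        for triple in triples:
            self.sources[triple].add(source)

    def query(self, subject: str, predicate: str) -> List[Tuple[str, List[str]]]:
        """Return each matching value together with the sources to cite for it."""
        return sorted(
            (o, sorted(cited))
            for (s, p, o), cited in self.sources.items()
            if s == subject and p == predicate
        )


store = ProvenanceStore()
store.load("http://example.org/encyclopedia", [("Milk", "hasPrice", "$5")])
store.load("http://example.org/newspaper", [("Milk", "hasPrice", "$5"), ("Milk", "hasPrice", "$4")])
# store.query("Milk", "hasPrice") ==
#   [("$4", ["http://example.org/newspaper"]),
#    ("$5", ["http://example.org/encyclopedia", "http://example.org/newspaper"])]
```

    This only answers "who said it"; whether a given excerpt may be reused is still a legal question the software cannot settle.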

    Am I the only one who thinks that this could be an IP trial lawyer's wet dream?

  17. I have my doubts... by ngunton · · Score: 4, Insightful

    It seems to be a common mistake for computer scientists to think that it's possible to make systems that "understand" the world (both real and abstract knowledge), with all its complexity and ambiguity, in the same way that humans do. I feel that there is a fundamental difference between using computers to enable humans to organize stuff and having computers do it automatically. Every single attempt at getting computers to be "smart" about inferring human intentions has ended up as an irritating impediment to using the system - look at Clippy, Bob, and "intelligent" voice systems that try to "help" you by stopping you from talking to a real person... What computers are very, very good at is amplifying and enabling human intelligence. Computers are not themselves intelligent, and (my personal opinion) I don't think they ever will be - unless we manage to "grow" them using processes that we probably won't fully understand. You can't construct something as complex as the human mind through deterministic (i.e. consciously designed architectural) means - all you'll end up with, at best, is a very complex rule inference engine that is limited by the rules you gave it. Every "holy grail" of intelligent programming that has come along - neural nets, genetic programming etc. - has turned out to be very limited (though very useful in special situations).

    I also feel that talking about automatically organizing the world's knowledge in a semantic web is just more of the same hot air that we've been hearing from AI departments for the last few decades. You can't automatically allocate meaning to something unless you have the capability for "common sense" reasoning, and the world knowledge at your fingertips to be able to interpret the data intelligently, like a human would. And even then, different humans would interpret it differently... so there are multiple meanings, and anyway, how would you allocate "meaning" to something abstract such as a poem or a piece of art?

    And if we require real people to add metadata to everything... well, it just ain't going to happen, in my humble opinion. Adding metadata is a pain in the ass, since you have to define the categories of objects and agree on meanings for all the different taxonomies that will have to be used to describe the world... then there's the potential for abuse, as spammers will inevitably seed their documents with inappropriate metadata. So the "honest" people can't be bothered, and the dishonest people will wreck anything that does get built. So, it ain't gonna happen.

    The beauty of google (not that I love google, but they did hit a nail on the head) is that it requires no effort or "machine intelligence", beyond a very simple algorithm that depends not on AI but rather real, tangible relationships between words and documents (proximity and links). This is something that computers can be really good at.
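    The link half of that "very simple algorithm" is usually identified with PageRank as originally published. A minimal power-iteration sketch over a toy link graph, with hypothetical page names and the conventional damping factor of 0.85; Google's production ranking uses far more signals than this, including the proximity information mentioned above:

```python
from typing import Dict, List


def pagerank(links: Dict[str, List[str]], damping: float = 0.85,
             iterations: int = 50) -> Dict[str, float]:
    """Power iteration over a link graph; returns a score per page that sums to 1."""
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Every page gets a small baseline share (the "random jump").
        new = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page splits its score evenly among the pages it links to.
                share = damping * rank[page] / len(targets)
                for t in targets:
                    new[t] += share
            else:
                # A page with no outgoing links spreads its score over all pages.
                for p in pages:
                    new[p] += damping * rank[page] / n
        rank = new
    return rank


ranks = pagerank({"a": ["b"], "b": ["c"], "c": ["a", "b"]})
```

    Page "b" ends up ahead of "a" purely because more of the graph points at it; no understanding of the pages is involved.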

    Just my opinion... obviously there will be others out there who will vehemently disagree, and that's fine! Go ahead and try, you'll learn a lot in the process and you will probably come out with some tangential technology that you never thought of initially but is useful nonetheless.

    1. Re:I have my doubts... by DrEasy · · Score: 2, Insightful
      The beauty of google (not that I love google, but they did hit a nail on the head) is that it requires no effort or "machine intelligence", beyond a very simple algorithm that depends not on AI but rather real, tangible relationships between words and documents (proximity and links). This is something that computers can be really good at.

      And that's the curse of AI right there. Because you happen to know the algorithm underneath Google, you don't think of it as "intelligent". But to the average Joe it can certainly seem that way.

      We used to say that the day a chess program could beat a human, it would be proof that machines can be intelligent. But now that we know how to build such a system, it has lost its magic, and therefore it shouldn't count as AI?

      --
      "In our tactical decisions, we are operating contrary to our strategic interest."
  18. obscured by the cloud by Doc+Ruby · · Score: 2, Insightful

    Meaning is always "in context". Human communication always requires a "transmitter -> medium -> receiver" structure. Some say the universe is fundamentally structured on that model. When these semantic systems are overlaid on content, there are always slippery, unresolvable mismatches of "intent" and "understanding", those "semantic arguments" that drive like-minded people crazy. Content searching is extremely powerful without creating the "cracks" into which meanings can irretrievably fall. As long as the content itself is still available "raw" alongside alternative semantic indices, semantics will just help. When we move to wrap all content entirely in semantics, we'll live with the "map is not the territory" problem forever. Ask CORBA programmers and EU language translators about the death of meaning by means of the dictionary. If we add semantics as a tool, we still need to get under the hood to the actual content.

    --
    make install -not war

  19. Meaning = ability to Intelligently Handle by LionKimbro · · Score: 4, Informative

    A message has "meaning" if you can make special use of it.

    Normal web pages have meaning for browsers, it's just that that meaning is limited to "how to draw words for the user."

    What we're doing is making it so that your computer can make special use of messages on the web, to do smarter things.

    It would be scary if the Semantic Web were about "my meaning is THE meaning." But it is explicitly not like that. In fact, one of the main things about it is that anyone can make up their own languages, their own ways of modelling the world.

    There are tools that make it so you can say, "My word X is sort of like their word Y," but it's acknowledged that such translations will be imperfect. Likely, fuzzy logic and systems that are able to ask for clarification (and remember responses) will be used to mediate that sort of thing.
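    A minimal sketch of "my word X is sort of like their word Y" with a degree of match attached. The mapping table, its weights and the threshold are hypothetical (standard OWL mappings such as equivalentProperty are all-or-nothing); statements that map only weakly are set aside so the user can be asked about them rather than guessed at:

```python
from typing import Dict, List, Tuple

Triple = Tuple[str, str, str]
# Hypothetical mapping: their predicate -> (our predicate, degree of match in 0..1).
Mapping = Dict[str, Tuple[str, float]]


def translate(statements: List[Triple], mapping: Mapping,
              threshold: float = 0.5) -> Tuple[List[Tuple[Triple, float]], List[Triple]]:
    """Rewrite foreign predicates into our vocabulary, carrying a confidence score.

    Statements with no mapping, or only a weak one, are returned separately so
    they can be put to the user for clarification instead of being guessed at.
    """
    translated: List[Tuple[Triple, float]] = []
    unclear: List[Triple] = []
    for s, p, o in statements:
        ours, degree = mapping.get(p, ("", 0.0))
        if ours and degree >= threshold:
            translated.append(((s, ours, o), degree))
        else:
            unclear.append((s, p, o))
    return translated, unclear


mapping = {"their:cost": ("our:hasPrice", 0.9), "their:value": ("our:hasPrice", 0.3)}
statements = [
    ("Milk", "their:cost", "$5"),
    ("Milk", "their:value", "$5"),
    ("Milk", "their:origin", "Farm 7"),
]
translated, unclear = translate(statements, mapping)
# translated == [(("Milk", "our:hasPrice", "$5"), 0.9)]
```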

    You may also be interested in my favorite page on AI by Open Mind. The Semantic Web isn't explicitly about AI, but it opens the door for a lot of AI work.

  20. Like when I type "Unicycle Jousting" by briancnorton · · Score: 2, Insightful
    And I get 200 ads for herbal viagra, 300 Nigerians that have inherited 15 MILLON USDOLLARS, and deviant pornography.

    A semantic web is only as useful as the metadata, and people go to great lengths to mislead and disguise.

    --

    People who think they know everything really piss off those of us that actually do.

  21. Representation of meaning is not the problem by kubalaa · · Score: 3, Insightful

    The Semantic Web is the most ridiculous idea I've ever heard. The problem with meaning isn't representation -- English represents meaning just fine. The problem is meaning itself -- it doesn't matter if you figure out a way to encode it in some XML language; for every bit that it's easier for computers to use, it will carry that much less meaning.

    Another way of putting it is that any program capable of extracting the same meaning from XML that humans can should be able to understand English without much trouble. It's the whole "intelligence-complete" thing. As with NP-complete problems, there seems to be a class of problems which can only be solved by real intelligence, and they're all pretty much equivalent in that with real intelligence, you can solve them all.

    --

    "If you look 'round the table and can't tell who the sucker is, it's you." -- Quiz Show

    1. Re:Representation of meaning is not the problem by fugu13 · · Score: 2, Insightful

      This got insightful?!

      Let's take a look at English, shall we?

      "Milk costs five dollars."

      "Milk always costs five dollars."

      "Milk's price is five dollars."

      "Isn't it cool that milk costs that low, low price of five dollars?"

      "I am so gosh-darn happy that I can obtain the glorious bounty of milk for a mere five (count 'em, one-two-three-four-five) bills featuring our esteemed former president, George Washington."

      Now, let's take a look at some possible semantic web statements.

      Milk hasPrice $5

      anonymousItem hasType Milk
      anonymousItem hasPrice $5

      KrogersItem54728 hasType Milk
      KrogersItem54728 hasPrice $5

      Now, the above are slight simplifications for the purposes of conveying the essential ideas (we're not getting into common vocabularies, though using them makes relating information far simpler; it's a bit too much to explain), but it is amazing that anyone could think that programs which can parse the latter sets of information can parse the former!
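      To show why the triple form is so much easier for a program, here is a minimal sketch that answers "what does milk cost?" by matching patterns against triples, in the spirit of a SPARQL basic graph pattern. `match`, `query` and the extra bread item are hypothetical:

```python
from typing import Dict, List, Optional, Tuple

Triple = Tuple[str, str, str]
Binding = Dict[str, str]


def match(pattern: Triple, triple: Triple, binding: Binding) -> Optional[Binding]:
    """Unify one pattern with one triple; terms starting with '?' are variables."""
    result = dict(binding)
    for term, value in zip(pattern, triple):
        if term.startswith("?"):
            # A variable must keep the same value everywhere it appears.
            if result.setdefault(term, value) != value:
                return None
        elif term != value:
            return None
    return result


def query(patterns: List[Triple], triples: List[Triple]) -> List[Binding]:
    """Return every variable binding that satisfies all patterns at once."""
    bindings: List[Binding] = [{}]
    for pattern in patterns:
        extended = []
        for binding in bindings:
            for triple in triples:
                candidate = match(pattern, triple, binding)
                if candidate is not None:
                    extended.append(candidate)
        bindings = extended
    return bindings


data = [
    ("KrogersItem54728", "hasType", "Milk"),
    ("KrogersItem54728", "hasPrice", "$5"),
    ("KrogersItem11", "hasType", "Bread"),
    ("KrogersItem11", "hasPrice", "$2"),
]
answers = query([("?item", "hasType", "Milk"), ("?item", "hasPrice", "?price")], data)
# answers == [{"?item": "KrogersItem54728", "?price": "$5"}]
```

      No parsing of English is involved: the five milk sentences above would each need their own natural-language handling, while every triple-based variant reduces to this one lookup.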

      --
      For to end yet again.
  22. Another Clippy by Odd+John · · Score: 2, Funny

    Great. An Expert System to do your google searches based on what it thinks you meant. The giant Semantic 'Clippy' knows what's best when it pops up to say:

    "Here are the results to the question you should have asked."

    Maybe next they'll have the Semantic Web manage the way electronic voting is counted. Semantic Clippy will count your 'intent' instead of your actual vote.

  23. why this will fail by ndunn · · Score: 3, Informative


    Google works because it is largely a statistical tool that uses some meta-information.

    I could see frameworks being used for very specific purposes, like searching a homogeneous website where all content is controlled (e.g., Slashdot, PubMed, the NY Times). But extending these ideas to a heterogeneous web, whose authors would no doubt take advantage of such a volunteer system, is ludicrous.

    I also take issue with the top-down mindset that they will be able to predict what is useful to the user. This is why statistical measures of importance and quantity are the only realistic approach for such a massive undertaking (which Google is still actively researching).

    I think that the only useful research to come out of such an endeavor would be to have news sites, as mentioned above, implement these frameworks and be scanned using an ontological browser. Of course, I am not sure how this would be different from LexisNexis.

  24. Dependency: web of trust by tunabomber · · Score: 2, Insightful

    Anybody remember the demise of META keywords?

    I think we could run into the same problem with the Semantic Web, as it too allows web developers to attach arbitrary metadata to their pages. The only way to prevent unscrupulous web developers from embedding inaccurate RDF in their pages in hopes of attracting more hits is by establishing a web-of-trust framework.
    Google implements a very crude version of web-of-trust that assumes "incoming hyperlinks==trust". I think that in order for the Semantic Web to be something that is usable by web-wide search engines like Google, we will need a much more robust and fine-grained system of trust. The user should be able to specify some of the entities that they trust and the search engine will deduce the rest.
    However, without an adequate trust framework, the Semantic Web will just be new fertile ground for keyword spam and search engine "optimization".
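    The idea that "the user should be able to specify some of the entities that they trust and the search engine will deduce the rest" can be sketched as trust spreading outward from a few chosen seeds. This is a minimal illustration with hypothetical site names, a made-up decay factor and cutoff, and no claim to match any deployed trust metric:

```python
from typing import Dict, Iterable


def propagate_trust(seeds: Iterable[str], endorsements: Dict[str, Iterable[str]],
                    decay: float = 0.5, floor: float = 0.1) -> Dict[str, float]:
    """Spread trust outward from sources the user has chosen to trust.

    `endorsements[a]` lists the sites that site `a` vouches for. Each hop multiplies
    trust by `decay` (assumed < 1), a site keeps the best score that reaches it,
    and nothing weaker than `floor` is passed on.
    """
    trust = {site: 1.0 for site in seeds}
    frontier = list(trust)
    while frontier:
        next_frontier = []
        for site in frontier:
            passed_on = trust[site] * decay
            if passed_on < floor:
                continue
            for target in endorsements.get(site, ()):
                if passed_on > trust.get(target, 0.0):
                    trust[target] = passed_on
                    next_frontier.append(target)
        frontier = next_frontier
    return trust


endorsements = {
    "friend.example": ["news.example"],
    "news.example": ["blog.example"],
    # A spam ring vouching for itself gains nothing unless a trusted site links in.
    "spam.example": ["spam.example", "seo.example"],
}
scores = propagate_trust(["friend.example"], endorsements)
# scores == {"friend.example": 1.0, "news.example": 0.5, "blog.example": 0.25}
```

    Unlike "incoming hyperlinks==trust", links only count when they start from someone the user already trusts.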

    --

    pi = 3.141592653589793helpimtrappedinauniversefactory71 ...
  25. BETTER FORMATTED THAN PARENT by fugu13 · · Score: 2, Insightful
    Silly me, not previewing.

    The World Wide Web cannot "at its core handle inconsistent information" yet it seems to lurch along okay.

    The Semantic Web is not some attempt at global knowledge, perfect knowledge, perfect reasoning, or anything of the sort, regardless of what many posters, including yourself, seem to have construed it as.

    It is intended to be an analogue of the World Wide Web, but one that is primarily consumed by computers rather than by humans.

    Can it know everything? Of course not! But it can make it so computers "understand" a heck of a lot more than they do today.

    For instance: right now an everyday computer (or more accurately, the web browser) "understands" that (absent styling) a

    tag is presented in a certain way. The Semantic Web wants to make it so the triple GallonOfMilk hasPrice $1.25 (this would actually be expressed in several triples about a product with a certain id, probably, but you get the idea) can be "understood" by a program in the same way across multiple sources.

    Just as a person does not automatically assume a site is an absolute authority on the price of milk, semantic-web-enabled programs would not assume that this information was absolute (nor would it likely be presented as such). However, imagine how powerful it would be if you could give your browser the address of the RDF interfaces for local grocery stores (or it might autodiscover them, at least in part) and it would then find out the price of milk (and other groceries) at each one of them.

    That sort of thing is already possible today without the Semantic Web (or other semantic frameworks), but only with methods that either require heavy lifting on the part of the client system (such as web scraping every grocery store site, killing extensibility and easy implementation) or aren't cross-domain (perhaps I want to chart the price of milk (from some milk-price-archive) vs real dollar value -- now my client has to understand two possibly very different ways of presenting information, not just one integrated way).

    The Semantic Web (and associated technologies) is an enabling framework that frees programmers from doing a lot of the heavy lifting involved in discovering meaning and relating meaning, just as SQL is an enabling framework that frees programmers from doing a lot of the heavy lifting involved in storing data and relating data.
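    The SQL comparison can be made concrete: once statements from several sources share one triple shape, a cross-source question becomes an ordinary self-join. A minimal sketch using Python's built-in sqlite3 module; the store names, item IDs and prices are hypothetical, and a real triple store would use IRIs and typed literals rather than bare strings:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE triples (source TEXT, subject TEXT, predicate TEXT, object TEXT)")
conn.executemany(
    "INSERT INTO triples VALUES (?, ?, ?, ?)",
    [
        # Hypothetical data from two grocery stores' RDF interfaces.
        ("storeA", "itemA1", "hasType", "GallonOfMilk"),
        ("storeA", "itemA1", "hasPrice", "1.25"),
        ("storeB", "itemB9", "hasType", "GallonOfMilk"),
        ("storeB", "itemB9", "hasPrice", "1.19"),
        ("storeB", "itemB3", "hasType", "Bread"),
        ("storeB", "itemB3", "hasPrice", "2.00"),
    ],
)

# "What does a gallon of milk cost at each store?" as a self-join on the subject.
MILK_PRICES = """
    SELECT t.source, CAST(p.object AS REAL) AS price
    FROM triples AS t
    JOIN triples AS p ON p.source = t.source AND p.subject = t.subject
    WHERE t.predicate = 'hasType' AND t.object = 'GallonOfMilk'
      AND p.predicate = 'hasPrice'
    ORDER BY price
"""
rows = conn.execute(MILK_PRICES).fetchall()
# rows == [("storeB", 1.19), ("storeA", 1.25)]
```

    The client never scrapes either store's HTML; adding a third store is just more rows in the same shape.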

    --
    For to end yet again.
  26. Re:no formal theory? get real. by ngibbins · · Score: 2, Informative

    There has been a considerable amount of work on ontology mapping within the knowledge engineering community, but the evolutionary aspects of ontologies have been largely overlooked. Ontology mapping is a harder problem than graph isomorphism, since classes from different ontologies may have extensions that overlap rather than cover each other. It's a difficult problem, certainly, but it's worth noting that game theory isn't applied here.
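    The point that classes from different ontologies "may have extensions that overlap rather than cover each other" can be illustrated by comparing the known instances of two classes. A minimal, purely extensional sketch with hypothetical instance names; real ontology mapping also has to reason about class definitions, which is what makes it hard:

```python
from typing import Set


def compare_extensions(a: Set[str], b: Set[str]) -> str:
    """Classify how two classes relate, judged only by their known instances."""
    if not a & b:
        return "disjoint"
    if a == b:
        return "equivalent"
    if a < b:
        return "narrower"  # every known instance of a is also in b
    if a > b:
        return "broader"
    return "overlapping"  # shares instances, but neither covers the other


ours_vehicle = {"car1", "bike1", "truck1"}
their_motor_vehicle = {"car1", "truck1", "boat1"}
relation = compare_extensions(ours_vehicle, their_motor_vehicle)
# relation == "overlapping"
```

    In the "overlapping" case neither a subclass nor an equivalence mapping is correct, which is exactly where simple one-to-one mappings break down.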

    Game theory tends to appear more within the multi-agent systems community than the semantic web community; they've been looking at the social models for trust for some years now.