Slashdot Mirror


Using the Semantic Web to Enhance Search

RobMcCool writes "At Stanford KSL, we really like the Semantic Web. So we've taken many of our favorite web sites, scraped them, and put together a huge pile of RDF, which we'll let you download. We've used that RDF to create a search application, in the spirit of Google Q & A or Microsoft's recently announced MSN Search extensions. Our search can answer simple factual queries like the previously discussed population of Portugal but can also answer some more complex ones. We also have a smart autocomplete system, type "tom hanks birth" slowly to see it in action (best with Firefox). We're looking for people to be a part of this search system by running their own search sites, and by putting their data on the Semantic Web. Come check it out!"

150 comments

  1. Google watch out... by jason718 · · Score: 5, Insightful

    Semantic-driven search engines have awesome potential. However, it does place a lot of demand on the content provider to provide metadata-rich content - or to be able to provide intelligent mining tools to create metadata from existing sites.

    This is definitely one to watch...

    1. Re:Google watch out... by Anonymous Coward · · Score: 1, Interesting

      Note to self. Dreaming about the world tagging all their data isn't going to happen. It takes too much damn time. Semantic driven search using google's technique works. Producing a RDF graph is crap. Nothing to watch here.

    2. Re:Google watch out... by ShinmaWa · · Score: 2, Interesting
      However, it does place a lot of demand on the content provider to provide metadata-rich content

      This statement is why I was wondering why this was considered such a wonderful thing. For a while now, there's been a research project at IBM called WebFountain that not only does everything that Semantic Web attempts to do, but doesn't require any special mark up either. Its goal is to work with completely unstructured data of any type, including web pages, powerpoint documents, word docs, PDFs, etc etc. Based on the article I linked above (which is 18 months old), it seems Semantic Web is actually much more primitive.

      More to the point, in this blog there was an article on WebFountain. In the comments section there was this mention of WebFountain in an RDF/OWL environment:
      if everyone were to agree on a tag set and apply it consistently, and tag everything of possible business interest, then yes, WebFountain would not be so relevant...and people would also need to tag for things that they don't even know will be businesses in 50 years [...] We'll see if that pans out!
      To me, that hit the nail on the head, and it is why a markup-based semantic engine is doomed to failure. While the remark was made in a business context, I think it's just as valid in any context.
      --
      The /. Effect: Thousands of users simultaneously accessing a site to not read its content.
    3. Re:Google watch out... by Azghoul · · Score: 1

      Geographers have been waiting for over a decade for metadata to catch on. Everyone hates building metadata, even when they know it makes their data infinitely easier for other geographers to use.

      In the context of GIS data, where metadata can be incredibly useful, creation of metadata is like pulling teeth.

      Unfortunately, until and unless there are automated tools (your "intelligent mining tools"), this whole thing will never be more than a curiosity...

    4. Re:Google watch out... by Metasquares · · Score: 2, Informative

      As one who has written semantic web pages, it's also rather difficult. OWL is a real pain to write, and most interpreters don't support "OWL Full", which means I'm stuck writing for either "OWL Lite" (now with only half the calories!) or "OWL DL". Forget (X)HTML, too - you need to use XML+RDF to use OWL, which means that if you want content you either need a parser or you need to code two documents for each one: One for human readability, and one that contains the metadata. There used to be a language called SHOE that embedded metadata into HTML via meta tags, but that seemed to have been supplanted by DAML+OIL and OWL.

      If it's made easier to write (like SHOE was, actually), I can see widespread adoption, because the idea of adding machine-searchable metadata to a document is very good; the implementation is just very poor. Otherwise, expect to be paying your web developers a lot more, both rate and timewise, in the future!
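The "two documents for each one" burden described above can be illustrated with a minimal sketch that keeps a single record and generates both the machine-readable and the human-readable view from it. This is only an illustration, not how any real site or OWL tool works: the record, the example.org URIs, and the property names are all hypothetical, and the output is simple N-Triples-style lines rather than the RDF/XML that OWL tooling of the time expected.

```python
from html import escape

# Hypothetical record; the URI and property names are illustrative only.
record = {
    "uri": "http://example.org/people/alice",
    "properties": {
        "http://example.org/terms/name": "Alice Example",
        "http://example.org/terms/occupation": "Geographer",
    },
}


def to_ntriples(rec):
    """Emit one N-Triples-style line per property (string literals only)."""
    lines = []
    for predicate, value in rec["properties"].items():
        # Escape backslashes and double quotes inside the literal.
        literal = value.replace("\\", "\\\\").replace('"', '\\"')
        lines.append(f'<{rec["uri"]}> <{predicate}> "{literal}" .')
    return "\n".join(lines)


def to_html(rec):
    """Render the same properties as a human-readable definition list."""
    rows = []
    for predicate, value in rec["properties"].items():
        label = predicate.rsplit("/", 1)[-1]  # last path segment as a label
        rows.append(f"<dt>{escape(label)}</dt><dd>{escape(value)}</dd>")
    return "<dl>" + "".join(rows) + "</dl>"


if __name__ == "__main__":
    print(to_ntriples(record))
    print(to_html(record))
```

The point of the sketch is only that the metadata and the page need not be written twice by hand if both are derived from one source.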

    5. Re:Google watch out... by ngibbins · · Score: 1

      It's not that most interpreters don't support OWL Full, but that there are no tractable, sound and complete algorithms for subsumption reasoning in the logic that underpins OWL Full. If you write OWL DL there are restrictions on what you can express, but you do then have tractable algorithms. It's the tradeoff between expressivity and complexity, in short.

      SHOE was primarily the result of Jeff Heflin's PhD research, and he used his experiences of writing SHOE to good effect on the W3C's Web Ontology Working Group (which produced OWL, based on the DAML+OIL language).

      re: the embedding of SW data into web pages, there's a specification currently called RDF/A in the works at W3C that describes an XHTML-based serialisation for RDF data that will address the long-standing issue of embedding RDF metadata into web pages.

      (I must declare an interest here, since I was also a member of WebOnt)

    6. Re:Google watch out... by ngibbins · · Score: 1

      From the IEEE Spectrum article:

      WebFountain works by converting the myriad ways information is presented online into a uniform, structured format that can then be analyzed. The goal is to provide a general-purpose platform that can allow any number of so-called analytic tools to sift the structured data for patterns and trends. Creating the needed structure automatically is WebFountain's big advance, because it requires at least some understanding of what the information actually means.

      WebFountain complements the Semantic Web, rather than competes with it; it's primarily about information extraction or knowledge acquisition, which is something that SW researchers like myself recognise as an issue with some of the SW rhetoric.

      The examples of the "uniform structured format" given in the article are arbitrary pieces of XML markup, but this could as easily be RDF or OWL. Adopting SW technologies would benefit WebFountain by providing a foundation for defining the meaning of the common structured format (using the model-theoretic semantics for RDF or OWL) and expressing domain- and task-specific vocabularies or ontologies that can be used in the semantic annotation of the unstructured data.

      If there's one thing that the past two decades of knowledge engineering research have taught us, it's that the one-size-fits-all ontology is a myth, despite what Doug Lenat may claim. The characterisation of the SW environment as one in which everyone agrees on a single common tag set is not a vision of the SW that I recognise!

  2. Loading... by DrinkingIllini · · Score: 1

    As soon as you even begin to type it is loading something, it keeps loading with each character, guessing it is the autocomplete "feature" but it loads too slowly for me to tell. Anyone else have any luck?

    1. Re:Loading... by fa2k · · Score: 1

      It's /.'d

  3. From the check it out link... by Anonymous Coward · · Score: 2, Funny
    "Search on TAP was built to answer the following types of queries: There are also two actors named Harrison Ford: the one who played Han Solo, and a silent film star from the 1920's."

    That's nice and all but who shot first and is there a mash up of both scenes with crazy alien bar music mixed with 20's sinister piano.

  4. best with firefox by cloudreader · · Score: 0, Offtopic

    We also have a smart autocomplete system, type "tom hanks birth" slowly to see it in action (best with Firefox).
    In the early days one can see lots of "Best with internet explorer". 'It is nice to see best with firefox for a change'

    --
    sigbldr is currently in pre-alpha.
    1. Re:best with firefox by Timesprout · · Score: 4, Insightful

      No, 'works best with Firefox' is just as bad as 'works best with IE'. What would be nice would be to see 'works best with any standards compliant browser'.

      --
      Do not try to read the dupe, thats impossible. Instead, only try to realize the truth
      What truth?
      There is no dupe
    2. Re:best with firefox by cloudreader · · Score: 1

      You have a good point here. But in my experience "works best with firefox" is equivalent to it follows standards.

      --
      sigbldr is currently in pre-alpha.
    3. Re:best with firefox by FidelCatsro · · Score: 1

      "works worst on MSIE" that would do

      --
      The only things certain in war are Propaganda and Death. You can never be sure which is which though
    4. Re:best with firefox by Anonymous Coward · · Score: 0

      No, firefox is just as bad as IE was in its early days when it comes to standards conformance peculiarities.

      Try browsing with Opera for a while to see the problems with pages that claim to be standards compliant when they are in fact IE and mozilla compliant.

    5. Re:best with firefox by bcmm · · Score: 1

      Well, maybe Fx can do whatever they need fastest? Maybe they use pipelining or something?

      --
      # cat /dev/mem | strings | grep -i llama
      Damn, my RAM is full of llamas.
    6. Re:best with firefox by RobMcCool · · Score: 1

      Phrasing it that way, that it works best with any standards compliant browser, doesn't get the point across to those who think IE is a standards compliant browser.

      Search on TAP has been tested with Firefox on Linux, Windows, and OS/X, and with IE on Windows. I think Andy might have tried it with Safari. I haven't tested it with Opera. With IE, I had to redo how the dynamic HTML was being generated twice to get around its limitations, and it's still ignoring my alignment tags.

      Saying it works with standards compliant browsers assumes the reader knows that IE sucks, which isn't always the case.

      Besides, I'm ex-Netscape, we're supposed to cheese people off with our browser rah-rahs.

    7. Re:best with firefox by FlynnMP3 · · Score: 1

      It would be great if people would say their website works with any compliant browser. But much of the world doesn't care. In my opinion that's because standards doesn't carry connotations with anybody besides web/standards geeks.

      Now the cute little firefox plushtoy (have you seen it?) - that's what people will remember. As long as you keep the FF designers on the straight and narrow with regard to implementing web standards, then everybody gets what they want.

      Course, some will argue that Firefox isn't very compliant, or that it could be more compliant, or whatever predilection that their brain dreams up.

  5. 600MB?!?!?!? by phantasma6 · · Score: 1

    You're linking a 600MB file from slashdot?

    (oh, and I'm getting 503's for the searches)

  6. How 'bout... by Himring · · Score: 1
    --
    "All great things are simple & expressed in a single word: freedom, justice, honor, duty, mercy, hope." --Churchill
  7. autocomplete by cryptoz · · Score: 5, Insightful

    Autocomplete is a useless feature that nobody wants to see when they type "a"...and see it load everything that begins with "a". The user is not interested in items starting with "a". Perhaps they're interested in terms beginning with "anon" or something, which has many fewer items to load, therefore making the load time much faster and not annoying the user in the process.

    Or, even better, never have any autocomplete turned on automatically. Do a VB-like idea, where if you want to see possibilities at a certain point, hit a specific key that will register for the list to pop down.

    1. Re:autocomplete by coolcold · · Score: 1

      or maybe limit the auto complete list to like 20 results?

      auto complete list will not show up until only 20 results are returned

      of cos, this is just another example

      --
      I am harvesting funny/good quotes. Please help by putting them in your sigs :)
    2. Re:autocomplete by davegust · · Score: 2, Insightful

      Have you tried Google Suggest? Auto complete is very useful when it doesn't slow down the typing, and when the results are in a useful order.

    3. Re:autocomplete by Anonymous Coward · · Score: 0

      No, its not.

    4. Re:autocomplete by no+soup+for+you · · Score: 1
      Autocomplete is a useless feature that nobody wants to see when they type "a"...and see it load everything that begins with "a".

      That's just usability; the concept is sound. Instead of filling in results with "a", fill them in on three letters like "ast", which could have asterisk, astronaut, etc. The idea is to 1) save time by not making them type in an extra 6 letters and 2) cut down on misspellings.

      --
      If you blog it...
    5. Re:autocomplete by RobMcCool · · Score: 1

      Normally I would agree with you, but we added autocomplete for a very real reason.

      As a prior post pointed out, the most important problem with the Semantic Web is getting people to generate data. Until that happens on a widespread basis, the data coverage will always be spotty compared to a keyword engine.

      We added the Autocomplete dropdown in response to user feedback that they had no idea what was in the system until they hit "enter", and by then it was too late. The dropdown gives immediate feedback about whether or not the system has a clue.

      The current autocomplete implementation is clumsy and harasses the user with some garbage that it shouldn't, and it's also missing an important feature (suggesting properties to add to class requests), but since this is research we get to play around with ideas that might not pan out.
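The suggestions made in this thread (wait for about three characters, and stay quiet until the candidate list is down to about 20 entries) can be sketched as follows. This is a hypothetical illustration, not the Search on TAP implementation: the function name, the defaults, and the assumption that the terms are held in a lowercase, sorted in-memory list are all made up for the example.

```python
from bisect import bisect_left


def suggest(terms_sorted, prefix, min_chars=3, max_results=20):
    """Return completions for prefix, or [] when a dropdown would be noise.

    terms_sorted must be a sorted list of lowercase strings.
    """
    prefix = prefix.lower()
    if len(prefix) < min_chars:
        return []  # too little typed: do not query at all

    # Binary search for the first term that is >= prefix.
    start = bisect_left(terms_sorted, prefix)
    matches = []
    for i in range(start, len(terms_sorted)):
        term = terms_sorted[i]
        if not term.startswith(prefix):
            break  # sorted order: no later term can match
        matches.append(term)
        if len(matches) > max_results:
            return []  # too many candidates: show nothing yet
    return matches


if __name__ == "__main__":
    terms = sorted(["apple", "asterisk", "astronaut", "astronomy", "banana"])
    print(suggest(terms, "a"))
    print(suggest(terms, "ast"))
```

Returning an empty list for short or overly broad prefixes trades away the "does the system have a clue?" feedback described above, so a real design would have to balance the two goals.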

  8. Semantic Web? by DoctoRoR · · Score: 4, Informative

    The Stanford research is interesting, but I'm still trying to make up my mind about the Semantic Web, learning about RDF, and whether I need to bake in ways of handling these kinds of assertions in my web app. The Stanford group writes, "Our hope is that our search application spurs development of the Semantic Web, and leads to sites publishing their data in this format so that we don't have to." It obviously takes more work to encode such information and getting user contributions auto-marked for the semantic web. For a counter viewpoint, take a look at some of Clay Shirky's work -- in particular:

    Will the semantic web be supported by future versions of Drupal, phpBB, and other grass-roots content management web apps? Not sure. Since a lot of the content is visitor generated, you would have to build in ways of providing easy markup. Would be interested to hear /. thoughts on the matter.

    1. Re:Semantic Web? by maharg · · Score: 1

      Until truly intelligent semantic classifying engines are available, the semantic web is best suited to things like wikipedia where the information is (generally) of a higher quality than what you find on a more general purpose site, like the one you are viewing now !!

      For example, a slashdot story about a newly discovered type of <crab type="crustacean"/> would soon degenerate into postings about <crab type="venereal disease"/>. Marking quickie (pun intended) posts up semantically would detract from the whole slashdot experience.

      Thinking about it though, slashdot already has options for various mark-up schemes (plain old text et al), perhaps a semantic markup option could be useful in some cases where the poster could be bothered...

      --

      $ strings FTP.EXE | grep Copyright
      @(#) Copyright (c) 1983 The Regents of the University of California.
    2. Re:Semantic Web? by daviddennis · · Score: 1

      I have never been fond of articles like this. Slashdot points us to something new (at least to me), and links to horribly long-winded and incomprehensible explanations of what it is. Sure, I could understand them ... if I had an extra hour or two.

      Since it's obvious that you do understand, would it be possible for you to come up with a 1-2 paragraph explanation of what the Semantic Web is and does?

      I've spent some time on the linked to web site, and read Clay Shirky's essay, and I'm still not sure what it is and how it works.

      Many thanks.

      D

    3. Re:Semantic Web? by smartdreamer · · Score: 1
      take a look at the official w3c reference there. Read the header (first paragraphs). That's a very basic introduction.

      In short, the goal of the semantic web is to make the meaning (semantics) of the web understandable to computers, by any means possible, in order to open up new possibilities and automation. For this to be possible, we need to state things explicitly in a formal manner.

    4. Re:Semantic Web? by miro2 · · Score: 1

      Clay Shirky's objections don't hold water. His examples of faulty logic assume that RDF statements should be reasoned on in isolation. In fact, many systems which pair truth-values with statements are quite capable of avoiding the faulty logic he claims is an inherent consequence of using RDF statements. Look at http://www.cogsci.indiana.edu/farg/peiwang/papers.html NARS or probabilistic term logic for example.

    5. Re:Semantic Web? by daviddennis · · Score: 1

      I guess what I'd like to see, instead of a vague initial paragraph and pages of formal specifications, is a concrete example of how you would code this, and then how it would be used.

      Many thanks.

      D

    6. Re:Semantic Web? by smartdreamer · · Score: 1
      Okay, you want more than words... I guess you ask too much. ;)

      Semantic web is not something you can think of as a concrete application, nor can we consider it mature. As you surely read, semantic web is an extension of the current web. So I can link you to firefox or some HTML editor. Joke aside, it is more complicated than that and if you want to embrace semantic web you should get to know XML, RDF and OWL (in this order). In fact, if you are not working to build sw, you should consider another approach. I suggest you look at RSS there and foaf which are IMHO concrete, but limited, working examples of the semantic web.

      As a web developer... try to generate web pages from RDF (mindswap has some tools) or XML (ala gentoo) source.

    7. Re:Semantic Web? by TrappedByMyself · · Score: 1

      I wouldn't worry about it right now. The Semantic Web and RDF and OWL and Ontologies and all that jazz are still mostly in the academic circles. I could go one step further and argue that "metadata" is still an academic subject. It's big at universities and in the $$ government contracting world, but the average joe application of it all just isn't here yet.

      The smarties of the world know that metadata can be used to do all sorts of great things, but it just hasn't happened yet. The technology and the understanding of it still needs to mature

      --

      Help me take back Slashdot. When did 'News for Nerds' become 'FUD and Conspiracy Theories for Extremist Nutjobs'?
    8. Re:Semantic Web? by Anonymous Coward · · Score: 0
      I'm not in a position to tell tales out of school, coming, as I do from an Economics/Commerce background, but I've been reading Clay Shirky since his days at FEED and find him, then as now, not strong in logic.

      From his bio : "Mr. Shirky graduated from Yale College with a degree in art, and prior to falling in love with the internet, he worked as a theater director and designer in New York."

      I think a strong sense of drama still imbues his writing.

    9. Re:Semantic Web? by Albinofrenchy · · Score: 1

      Alright, so let us check out a sample from OWL (Web Ontology Language):

      Wine Rdf

      Look through that RDF with emacs/notepad. You will probably not understand all of it, but you can get the gist of it. It attempts to classify things categorically almost, so finding out context for a word is simple. For instance, the owl:Class of "Wine" is a subClass of "PotableLiquid" with a couple restrictions and properties that wine could have in real life.

      Why is this useful? It dramatically increases the level at which computers can understand information. In theory, if you tell a computer with this RDF file, "This here is red wine", it will know by inference that the object is a wine that you drink, not that annoying people do, and it will be able to guess at other properties (such as maker, year made, etc).

      I am interested in AI and this applies to my job, so this is fantastic news to me.
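The kind of inference described above (from "this is red wine" to "this is a potable liquid") can be sketched as a walk up a subclass hierarchy. This is a toy illustration, not how an OWL reasoner works: only the Wine-under-PotableLiquid relationship comes from the comment above, the RedWine entry is an assumption added for the example, and real OWL classes also carry restrictions and properties that this sketch ignores.

```python
# Hypothetical mini-ontology: each class maps to its direct superclasses.
SUBCLASS_OF = {
    "RedWine": ["Wine"],        # assumed for illustration
    "Wine": ["PotableLiquid"],  # the relationship mentioned above
    "PotableLiquid": [],
}


def superclasses(cls, subclass_of):
    """Return every class reachable by following subclass links upward."""
    seen = []
    stack = list(subclass_of.get(cls, []))
    while stack:
        parent = stack.pop()
        if parent not in seen:
            seen.append(parent)
            stack.extend(subclass_of.get(parent, []))
    return seen


def is_a(cls, ancestor, subclass_of):
    """True if cls is the ancestor itself or one of its transitive subclasses."""
    return cls == ancestor or ancestor in superclasses(cls, subclass_of)


if __name__ == "__main__":
    print(superclasses("RedWine", SUBCLASS_OF))
    print(is_a("RedWine", "PotableLiquid", SUBCLASS_OF))
```

Nothing in the data says directly that red wine is drinkable; that fact falls out of chaining the two subclass statements.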

      --
      "A man is but the product of his thoughts what he thinks, he becomes." -Mahatma Gandhi
    10. Re:Semantic Web? by Anonymous Coward · · Score: 0
      I agree. After reading Shirky's article it is quite apparent that logic is not his strong suit. There are several conclusions that he draws from syllogisms that just don't hold true.
      For instance he states:
      - US citizens are people
      - The First Amendment covers the rights of US citizens
      - Nike is protected by the First Amendment

      You could conclude from this that Nike is a person, and of course you would be right. In the context of in First Amendment law, corporations are treated as people. If, however, you linked this conclusion with a medical database, you could go on to reason that Nike's kidneys move poisons from Nike's bloodstream into Nike's urine."

      That is totally false! The conclusions that he states just don't follow. If statement 2 was "The First Amendment covers the rights of ONLY US citizens" then his conclusion would follow. In fact, the first amendment covers the rights of other entities as well as US Citizens! Which is implied in the paragraph following the syllogism.
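The argument above can be checked mechanically with a tiny forward-chaining sketch. The predicate names and the encoding of each statement as a one-way "if premise then conclusion" rule are assumptions made for illustration; this is not how any RDF reasoner represents the example.

```python
def forward_chain(facts, rules):
    """Apply (premise, conclusion) rules until no new fact is derived."""
    derived = set(facts)
    changed = True
    while changed:
        changed = False
        for premise, conclusion in rules:
            for predicate, subject in list(derived):
                if predicate == premise and (conclusion, subject) not in derived:
                    derived.add((conclusion, subject))
                    changed = True
    return derived


# Statement 3: Nike is protected by the First Amendment.
FACTS = {("first_amendment_protected", "Nike")}

# Statements 1 and 2 as one-way implications.
RULES_AS_STATED = [
    ("us_citizen", "person"),                     # US citizens are people
    ("us_citizen", "first_amendment_protected"),  # citizens are covered
]

# The hypothetical "ONLY US citizens" reading adds the converse of statement 2.
RULES_WITH_ONLY = RULES_AS_STATED + [
    ("first_amendment_protected", "us_citizen"),
]

if __name__ == "__main__":
    print(("person", "Nike") in forward_chain(FACTS, RULES_AS_STATED))
    print(("person", "Nike") in forward_chain(FACTS, RULES_WITH_ONLY))
```

With the rules as stated, nothing new can be derived about Nike; "Nike is a person" only appears once the converse rule is added, which matches the "ONLY" reading described above.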

      Furthermore, despite his bad logic, Shirky is missing the larger point. The semantic web or the web in general was never meant to prove by deductive logic things that are subjective. It is incumbent on the user to interpret the results based on context. If I find a website that says "John Loves Mary" I don't accept that as fact.

      Regardless, this technology does have the potential to simplify data mining amongst different sources of data. Which is what it's intended to do in the first place. The accuracy of the result is only as good as the data that it is derived from. Which is where trusted sources come into play. Should I reject using the Encyclopedia Britannica just because someone, somewhere, will post false information?

      In fact why don't we just all individually resort to raw empiricism, because if somebody can't be trusted it logically follows that nobody can be trusted!
  9. Can someone explain this in non-geek please? by Anonymous Coward · · Score: 0

    I'm tech-illiterate but interested none-the-less.

  10. Note to self by DrinkingIllini · · Score: 0, Troll

    Don't post ill-prepared, university hosted laboratory website on slashdot.

  11. slashdotted by maharg · · Score: 2, Funny
    --

    $ strings FTP.EXE | grep Copyright
    @(#) Copyright (c) 1983 The Regents of the University of California.
    1. Re:slashdotted by RobMcCool · · Score: 1

      I think I'll lock the door so the IS department can't find me.

      There's a coral cache of the static content, including screenshots, if you can't get through to my melted pile of servers.

    2. Re:slashdotted by Anonymous Coward · · Score: 0
  12. Semantic Web Pitfalls by aftk2 · · Score: 3, Insightful

    While the idea of the semantic web has been legitimately lambasted, I think it's a bit far from DOA. While I agree that it's not exactly practical, I think that if you get enough sites displaying their content in such a manner, you'll eventually reach a point at which others will do the same.

    I mean, think about it this way - while laziness or inertia might initially win out, once someone's competitors start to explore the idea of the semantic web, interest will start to be shown in it, especially once it becomes profitable to do so.

    --
    concrete5: a cms made for marketing, but strong enough for geeks.
    1. Re:Semantic Web Pitfalls by Omnifarious · · Score: 1

      Well, part of Shirky's point is that it is so lacking in usefulness that there will be no advantage to anybody to display their content that way. I think he's right. I've watched AI based on this kind of logical rules and semantics stumble along for years without producing anything useful, and then along comes some program that takes little pieces of what other people said and 'mindlessly' strings them together in new ways and it wins a Turing contest.

      Logical reasoning of this kind, despite all the hype, is extremely overrated.

    2. Re:Semantic Web Pitfalls by Eternally+optimistic · · Score: 1

      It gets worse: the method relies on the web site content author to know the semantic content, and to honestly report it. How would you check these things? Voting to determine if the earth revolves around the sun?

      --
      What keeps me going is my inertia.
    3. Re:Semantic Web Pitfalls by RobMcCool · · Score: 1

      Logical reasoning is currently primitive and definitely overrated. We don't use OWL. The reasoning we do is very primitive, and is not of the sort that Clay Shirky is talking about. I actually agree with the thrust of his essay, despite the flaws that others have pointed out.

      TimBL has talked about the Semantic Web as less a thing of logic and more like a giant database. I think that characterization has some problems also, but it's closer to what Search on TAP is doing.
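The "giant database" view mentioned above can be sketched as a list of subject/predicate/object triples that is filtered rather than reasoned over, which is enough for a query of the form "countries with population greater than X". Everything here is hypothetical: the entity names, the figures, and the function names are made up for illustration and do not come from TAP's data or code.

```python
# Toy triples with made-up figures; none of these values come from TAP's data.
TRIPLES = [
    ("CountryA", "type", "Country"),
    ("CountryA", "population", 50),
    ("CountryB", "type", "Country"),
    ("CountryB", "population", 200),
    ("CountryC", "type", "Country"),
    ("CountryC", "population", 300),
    ("CityX", "type", "City"),
    ("CityX", "population", 250),
]


def objects(triples, subject, predicate):
    """Return all objects stored for a (subject, predicate) pair."""
    return [o for s, p, o in triples if s == subject and p == predicate]


def instances_with_greater(triples, cls, prop, reference):
    """Instances of cls whose prop value exceeds that of the reference entity."""
    ref_values = objects(triples, reference, prop)
    if not ref_values:
        return []  # the store knows nothing about the reference
    threshold = ref_values[0]
    result = []
    for s, p, o in triples:
        if p == "type" and o == cls:
            values = objects(triples, s, prop)
            if values and values[0] > threshold:
                result.append(s)
    return sorted(result)


if __name__ == "__main__":
    print(instances_with_greater(TRIPLES, "Country", "population", "CountryB"))
```

Note that CityX is excluded even though its figure is larger than CountryB's: the type triple supplies the context that a plain keyword match would lack.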

    4. Re:Semantic Web Pitfalls by RobMcCool · · Score: 1

      We haven't really dealt with the spam problem because it's a problem we'd love to have. Right now there's so little content that we can afford to only pick the highest quality sites.

      The automated techniques like those WebFountain uses are susceptible to the same problems, as is Wikipedia, so I'm not convinced that this is necessarily a Semantic Web problem as much as an Internet problem.

    5. Re:Semantic Web Pitfalls by Eternally+optimistic · · Score: 1

      As far as spam goes, and mistaking popularity for correctness, yes you are right, and both of these are a big problem already.
      But there remains the problem that this technique does not find semantic connections that the authors don't know about.

      --
      What keeps me going is my inertia.
  13. This won't work by holyshitholyshit · · Score: 2, Interesting
    Firstly scraping is the same as what google does, which is fine but only a fool would trust the scraper not to censor their output.

    Secondly, scraping doesn't always work and you will surely have low-grade porno and get rich quick schemes/scams littering your semantic data.

    But let us suppose that the main benefits of a semantic web are (A) access to reference data [which may be falsified, oops], and (B) access to product availability data [which may be falsified, oops, like mail order companies that pretend they have something in stock but don't and yet still charge your credit card].

    It just won't work.

    It will always be a rough approximation of reality.

    It's just a bad way of caching the results of scraping.

    1. Re:This won't work by johnnyray · · Score: 1

      I don't think that scraping html pages is the point of this project. They scraped web documents in order to construct an RDF encoded knowledge base that is searchable as a semantic web.

      A semantic web relies on document authors to encode machine readable meaning into their documents. This codifies meaning and removes a burden of audience (read search tool) inference. It allows authors to apply another layer of clarification. This can be considered a burden or an opportunity by the author.

      The issue of trust can be addressed by the semantic web adding more meaning to the original document. This is how the shared experience of web users can be leveraged to handle untrustworthy sources.

      Nothing can replace human, tacit knowledge based filters as to what is useful and what is BS, but this is a new methodology that is better than what we've got.

  14. A similar project that I worked on at school by matt_king · · Score: 1

    Check out QuASM (Question Answering using Semi-Structured Meta-data)...we used similar processes and approaches to getting "answers" out of a large (40 TB) collection of .gov, .edu, .org web pages. The demo page is no longer available (we completed work on this in 2002) but you can check out the paper at ACM:

    http://portal.acm.org/citation.cfm?id=544220.544228

    It was a really interesting project to be a part of!

    Go UMass!

  15. A tale of two technologies.... by Crimson+Dragon · · Score: 3, Interesting

    The Semantic Web appears to be a budding server-side solution to the paradigm of information glut online. Social bookmarking appears to be a client-side solution to the paradigm of information glut online.

    It is refreshing to see exciting new solutions to the problems we have at present of targeted information retrieval on the internet. I can remember years of stagnation in this field (read: early 90's), and any change from today's google-and-pray searching mentality among the majority of end-users will be welcome.

    --
    The Crimson Dragon
    1. Re:A tale of two technologies.... by Anonymous Coward · · Score: 0

      Years of stagnation in this field ... before the Internet?

    2. Re:A tale of two technologies.... by Anonymous Coward · · Score: 0
      any change from today's google-and-pray searching mentality among the majority of end-users will be welcome

      I have already changed my mentality to wikipedia-and-find searching. ;-)

  16. One more step... by LegendOfLink · · Score: 1

    ...towards the future.

  17. And the big deal is??? by Anonymous Coward · · Score: 0

    What does this give you over HTML + Search Engine, except it's 10x harder to code (and crumbles under Slashdot effect). Using Google with the first keywords that popped into my head:

    Population of Portugal: portugal population [first match]

    Buildings Taller than Sears Tower: tallest buildings world [first match]

    Roller coasters faster than 80mph: fastest roller coasters [fourth match contains link to table of top-10 fastest roller coasters]

    Countries with population greater than Indonesia: coutries population indonesia [first match]

    Color me unimpressed with the Semantic Web...

    1. Re:And the big deal is??? by Jane_Dozey · · Score: 1

      One word: Context.

      Currently keywords are used to search for relevant matches and yes, this seems to work ok for lots of things but imagine if you could add context:

      Imagine searching for the title of a piece of music that you heard in a certain film.
      Currently this could involve some digging but a semantic search engine could very quickly narrow this search. Have a look at this (there's a demo somewhere on the site). It's a research project run by Southampton Uni. It's pretty basic but hopefully you'll get the idea.

      --
      Silly rabbit
  18. standards-compliant means by jbellis · · Score: 1

    "everything but IE"

    not entirely, but pretty close -- if you write compliant html/js, it has an excellent chance of working in all of {firefox, opera, safari}

  19. awesome! by Anonymous Coward · · Score: 3, Funny

    ...now I can finally search for "images of women with breasts larger than 36D"!

    1. Re:awesome! by Anonymous Coward · · Score: 0

      Once again, this can only work if the content providers can be convinced to add the semantic web markup. Specifically, for your query to work, each image has to be tagged by the exact breast size. By comparison, making the logic that knows which size is larger and what breast size means, is simple. Of course, the latter part is what the semantic web researchers are concentrating on. They are leaving it for 'someone else' to do the boring work of writing down the exact breast sizes for each woman in every image. Once again the question is: who is going to pay for that work?

    2. Re:awesome! by Anonymous Coward · · Score: 0
      They are leaving it for 'someone else' to do the boring work of writing down the exact breast sizes for each woman in every image. Once again the question is: who is going to pay for that work?
      You're gay, right?
    3. Re:awesome! by RobMcCool · · Score: 1

      I'm not sure our sponsors in the military and intelligence agencies would fund research in breast sizes. They're kind of a sensitive bunch. But maybe I'll stick it in a proposal and see what happens.

      I think the comment that semantic web research has focused on logic such as query analysis, comparisons, and groupings is fair for the Semantic Web in general.

      For Search on TAP we don't have a lot of people or resources. Despite that, I spend an awful lot of time generating data. The compressed RDF, which we've made available for people to play with, is over a hundred megabytes.

      If the Semantic Web is going to happen, there needs to be a lot of data, so we're doing everything we can to make that data available for people to use.

    4. Re:awesome! by mikefe · · Score: 1

      Idiot.

      Just go to any hardcore site for that.

      Maybe that's why sites showing non-professionals are becoming more popular. Not everyone likes the big fake ones...

      --
      There: Something at a specific location.
      Their: Owned by someone.
      Please make sure your english compiles.
  20. Tom Hanks Birth??? by pulse2600 · · Score: 0

    God I hope they don't include an image search...that could be worse than goatse...

  21. RDF, RDFS, DAML, OWL?? by Anonymous Coward · · Score: 0

    It is interesting to note that they are only using RDF. What about RDF Schema, DAML, or OWL (the successor to all of these)? It has been my experience with all of these that they are too complicated for the everyday person to use and thus only a select few will be able to perform the markup. Only some excellent ubiquitous markup tools could alleviate this. My guess is that these will end up being academic solutions at best. Not to knock academia, but sometimes they overcomplicate things in the spirit of completeness or correctness and thus make it too impractical for everyday use.

    1. Re:RDF, RDFS, DAML, OWL?? by Reverend528 · · Score: 1

      RDFS and OWL are both RDF formats.

  22. Full disclosure by Anonymous Coward · · Score: 0

    One should also point out, "At Stanford KSL, we are paid by DARPA to like the Semantic Web."

    1. Re:Full disclosure by wan-fu · · Score: 1

      Actually, I saw a preso for this project a while ago. It was pretty neat, showed a lot of promise, and I see that it's been progressing nicely. Stanford KSL actually DOES like the Semantic Web. Sure, they receive DARPA funding, but that's not why they like the Semantic Web. Also, some of the features/scrapers have been built as requested by the gov't, but it's not like the entire project is for the gov't.

    2. Re:Full disclosure by Anonymous Coward · · Score: 0

      Darpa funding go bye-bye. I thought darpa was cutting back on research, which means the W3C projects currently funded by Darpa are out of luck. no money for semantic web.

  23. RDF? by Thnikkaman · · Score: 1

    We've used that RDF to create a search application...

    Steve Jobs is the only one using any RDF to get applications made. Has he finally gotten a distortion field so big that others think they have them? hmmmm....

  24. Might actually help by Artifakt · · Score: 3, Insightful

    This looks like it will broaden the volume of useful searches. Right now, there are at least two limits that show up when searching:

    1. For really popular subjects, the useful links are swamped in the noise of sites trying to make a buck off of getting you to look at their ads before directing you to somewhere else, that might have the actual content or might not.

    2. For many less popular subjects, there is some oddity, like an unusual term being borrowed by some other field, so that it is something most people have never heard of, but people in two or more specialties use it frequently, in very different ways, resulting in strangeness. (e.g. the search engine throws up 23,003 links for a search on "Sartor Resartus". 30% are esoteric literary criticism, 20% relate to apoptosis (cell biology), 20% relate to building moral inhibitions into A.I., 10% to Keith Laumer novels, and the rest are probably noise).

    (I'm sure there are more than these two limits. Someone else may want to comment on some others).

    This is likely to help with the second case, oddities in the data set grouping. (it could sort links into the larger sub-categories, query the user which one(s) seemed most applicable, and maybe even sort out a small set of links that explain, for the previous example, how a high brow literary term got borrowed by the other fields).
    It's not as likely it would help with the first case, though, as sites that don't have actual content are actively duplicitous. Something that is actively trying to fool humans is still likely to be very successful at fooling our tools.

    --
    Who is John Cabal?
  25. Semantic Horse shit by Anonymous Coward · · Score: 1, Interesting
    I hate to say it, but Semantic Web blows chunks. No business is ever going to tag all their data so that anyone can use it. Businesses prefer to build specific webservices to integrate and charge customers. Get a clue W3C, RDF is fertilizer. So far, all the RDF rule engines out there suck from a scalability and performance perspective. There are two RDF rule engines that claim to implement RETE, but several people have analyzed them and shown that neither Jena2.2 nor pychinko implements RETE.

    The best part is the W3C looks down on the business rules world and openly snubs them. For a long time, the W3C camp snubbed the RETE algorithm, claiming RDF graphs are better. Once people saw how horribly RDF engines perform as rule count and data increase, they rushed to hack together junk and label it RETE. Sorry, but you have to first understand RETE to implement it. A clueless bunch of impractical day dreamers.

    1. Re:Semantic Horse shit by Anonymous Coward · · Score: 0

      The thing about this is that it appears to be a solution in search of a problem.

      While one might say that the "problem" is unorganized data, this isn't actually the solution, because it doesn't work until the data is organized.

      It's like the solution to the problem is "the web is unorganized", these "semantic web" people say "well organize it! There, problem solved!"

    2. Re:Semantic Horse shit by matt_king · · Score: 1

      The Internet is used for things other than businesses, or have you forgotten that? The concept of a Semantic Web has huge implications for many research projects, as a way to get better "information" out of all the "data" that is available on the internet today.

      Although it would be nice, no one is mandating or asking every website out there to mark up all their pages semantically. But if you want your information to be shared, a good way to start is to mark it up semantically so that more and better information can be gleaned from it.

    3. Re:Semantic Horse shit by f00zbll · · Score: 1
      The thing about this is that it appears to be a solution in search of a problem. While one might say that the "problem" is unorganized data, this isn't actually the solution, because it doesn't work until the data is organized. It's like the solution to the problem is "the web is unorganized", these "semantic web" people say "well organize it! There, problem solved!"

      The semantic web definitely is a solution in search of a problem. It's probably naive to think organizing data is easy. The original post was a bit harsh, but that doesn't negate the real problem of unorganized data. Demanding the world organize the data is never going to make the dream materialize. People are lazy and that isn't something technology can change without a frontal lobotomy.

    4. Re:Semantic Horse shit by Anonymous Coward · · Score: 2, Insightful

      Nice straw man argument. How many people making their own personal site are going to dedicate 2/3 of their time to tagging their content? The only people that are going to tag their content are those looking to abuse the system. No sane individual is going to spend 3 months going back and editing all their pages with tags. Even then, you still have the problem of conflicting categories (aka ontologies). There will never be a globally accepted set of ontologies. It's all a pipe dream. Why should users spend hours and hours tagging their site when Google is already doing a good job of indexing pages?

    5. Re:Semantic Horse shit by smartdreamer · · Score: 1
      In fact the semantic web is already here in some forms: foaf, mindsap site, or think of every RSS feed.

      People who don't have a clue about the semantic web tend to refer to it as semantic horse shit. It's a pity that those who don't believe in things try to demolish them rather than let them go... or let them perish, if they are so sure about their doom.

    6. Re:Semantic Horse shit by matt_king · · Score: 1

      It's going forward that this is important...if you are designing a new site today, it might be worth your while to try and represent the data semantically. Just as real web designers no longer design with nested tables *shudder* and use CSS to separate presentation logic from content, so too will people start going even deeper, making their "web data" into "web information".

      This is not about arguing over a set of standards over the ontology of how the data should be represented; this is about thinking forward as to how to better design web sites to get their information across in more ways than just a human reading the text on the screen.

    7. Re:Semantic Horse shit by Anonymous Coward · · Score: 0
      It's going forward that this is important...if you are designing a new site today, it might be worth your while to try and represent the data semantically. Just as real web designers no longer design with nested tables *shudder* and use CSS to separate presentation logic from content, so too will people start going even deeper, making their "web data" into "web information". This is not about arguing over a set of standards over the ontology of how the data should be represented; this is about thinking forward as to how to better design web sites to get their information across in more ways than just a human reading the text on the screen.

      Are you referring to personal websites or corporate websites? Look at how awful most personal websites are. HTML is popular because it is quick and dirty. The data on how people build websites does not back up that opinion. After 10 years, most websites still look like crap. Corporate websites have moved away from bad practices, but not personal sites.

    8. Re:Semantic Horse shit by ultranova · · Score: 1

      I hate to say it, but Semantic Web blows chunks. No business is ever going to tag all their data so that anyone can use it. Businesses prefer to build specific webservices to integrate and charge customers.

      Fine with me. I don't want their information. In fact I'd like to get rid of their information (banner ads and spam).

      If I want to deal with businesses, I go to my local shop. If I can't find what I want there, I look up the yellow pages of my local phonebook. If I can't find what I want there, I look up a specialized website. In no case will I start googling for businesses (except for the home page of some particular company).

      The Internet's main use is not as a marketplace. Its main use is information exchange between individual people and non-profit organizations. Elfwood, Wikipedia, kernel.org, Debian... Those are what make the Internet hot, not businesses.

      --

      Forget magic. Any technology distinguishable from divine power is insufficiently advanced.

    9. Re:Semantic Horse shit by pfafrich · · Score: 1
      There are quite a few things where people might want to use some semantic mark-up:
      • Creative Commons: use RDF to specify copyright and licence info about a page; you can now search on this using special pages on Google and Yahoo.
      • Anyone who wants to sell something will be interested in making their content easy to find. A little bit of semantic mark-up could help them shift units.
      • Anything pulled out from a database. Here it's relatively easy to modify the code to add some extra mark-up.
      • Tagging: this seems to be all the rage, with sites like del.icio.us et al.
      • Specialised content: stuff like dates, contact info.

      My guess is we'll see more and more semantic mark-up creep in through the back door. In a few years' time we'll be griping about how MS has invented its own tag format. p.s. Why is this in the Hardware section?

      --
      There are four sorts of people in the world: fools, lunatics, idiots and morons. - Umberto Eco, Foucault's Pendulum.
    10. Re:Semantic Horse shit by handslikesnakes · · Score: 1
      corporate websites have moved away from bad practices, but not personal sites.
      Depends on whose sites you're looking at. I'd bet that most off-the-shelf blog software spits out better markup than Google (or Amazon or eBay or Slashdot or...), for example.
    11. Re:Semantic Horse shit by MarkWatson · · Score: 1

      You have a valid point of view, but just one quick clarification:

      Rete scales really well as you add rules but scales really poorly with the number of items in working memory.

      I believe that rete would be a bad choice for the SW where you would have a very large data set in working memory.

      (I used to do a lot of rete hacking: commercial expert system tools for Xerox Lisp Machines and the Mac, and hacking OPS5 to support 'multiple data worlds' for in house use.)

  26. My question by News+for+nerds · · Score: 4, Interesting

    Does it have a countermeasure against 'semantic spam'?

    1. Re:My question by smartdreamer · · Score: 2, Interesting

      There is no such thing as semantic spam. What you refer to is disinformation or information junk. Like the actual web, the semantic web is about freedom, openness and accessibility. So everybody can publish (I'm not referring to government laws, repression, etc.). But the semantic web has a solution to this wave of information in a thing called the web of trust, which proposes giving trust rankings to information and introducing inference engines to compute which links/sites may interest you and why. But this is not for today. ;)

    2. Re:My question by RobMcCool · · Score: 1

      I replied to a lower-scored post that asked this same question: we haven't had this problem yet, but it's a problem that exists with any technique, whether it's Wikipedia, an automated technique like WebFountain, or the Semantic Web. It's an Internet problem.

      A followup to this post mentioned using a web of trust to counteract spam. That's something that Guha has done a lot of work with, and Paulo is working in the lab here on some prototypes based on movie data.

      Spam is a problem I would love to have because it would mean that people are serious enough about the Semantic Web to find something to gain in spamming it.

  27. In Related News by MrAnnoyanceToYou · · Score: 1

    The average starting salary offer for Stanford graduate students has risen 30% in the last hour, as Microsoft, Google, and Yahoo each vied tooth and nail for their services.

    (starts filling in application)

    1. Re:In Related News by Anonymous Coward · · Score: 0

      The average starting salary offer for Stanford graduate students has risen 30% in the last hour, as Microsoft, Google, and Yahoo each vied tooth and nail for their services.

      Consider that two of those companies were started by Stanford graduate students and that both consist primarily of Stanford grad students. Microsoft is just late, as usual.

  28. auto-complete by derubergeek · · Score: 1
    One wouldn't think this would be particularly newsworthy here in supposed geek-haven, but Google has an auto-complete feature as well.

    Of course, it's a beta feature at Google Labs. FYI...

    --
    Trust me. This is an inactive account. Regardless of what the /. bean counters might report.
    1. Re:auto-complete by Anonymous Coward · · Score: 0

      WHOA!!! NO SHIT!!! And it actually WORKS!!! Unlike any of the stanford links.

      Google pwns the w3b

  29. Check Owt Piggy Bank - Semantic Web Firefox Gmonky by Anonymous Coward · · Score: 0

    Piggy Bank is an eleet RDF creating, greasemonkey web scraping, meta plugin.

    http://simile.mit.edu/piggy-bank/

    props to waxy 4 the link

    http://waxy.org/links

    check out Sir Tim Berners Lee the Knight that goes nee rap on the semantic future of the web at the Royal Society London - total futurosity.

    http://www.royalsoc.ac.uk/page.asp?id=3110

    Do you think theres a porn site somewhere using sign ups to process secretly referred slashdot catchems?

  30. Slashdotting Google bomb? by bcmm · · Score: 2, Interesting

    That second link goes to http://www.google.com/url?sa=U&start=1&q=http://www.w3.org/2001/sw/&e=9707
    How is that different from linking to http://www.w3.org/2001/sw/?

    Is Slashdot trying to improve someone's Google ranking?

    (Also, did Slashdot always linkify URLs entered as plaintext? I didn't write any "a href" for those two.)

    --
    # cat /dev/mem | strings | grep -i llama
    Damn, my RAM is full of llamas.
    1. Re:Slashdotting Google bomb? by RobMcCool · · Score: 1

      No, I just did a search for "semantic web" and copied and pasted the first result. I didn't realize they were sending people through google.com/url now; it used to just go straight there. When did they start doing that?

    2. Re:Slashdotting Google bomb? by Stauf · · Score: 1

      They always did it, for a random number of links every few queries or so. It's so they can collect data on which sites people thought were relevant to their query. These links seem to have become more and more common though.
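
      For anyone who wants the direct link back, here is a minimal sketch of unwrapping such a redirect. It assumes, based only on the URL quoted earlier in this thread, that the real destination is carried in the q query parameter; the function name unwrap_redirect is made up for illustration.

```python
from urllib.parse import urlsplit, parse_qs

def unwrap_redirect(url: str) -> str:
    """Return the target of a google.com/url-style redirect, or the URL unchanged.

    Assumption: the destination sits in the `q` query parameter, as in the
    link quoted in this thread. Other redirect formats are not handled.
    """
    parts = urlsplit(url)
    if parts.netloc in ("google.com", "www.google.com") and parts.path == "/url":
        targets = parse_qs(parts.query).get("q")
        if targets:
            return targets[0]
    return url

wrapped = "http://www.google.com/url?sa=U&start=1&q=http://www.w3.org/2001/sw/&e=9707"
print(unwrap_redirect(wrapped))
```

      A link that is not a redirect is returned untouched, so the function can be run over a mixed list of direct and monitored links.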

    3. Re:Slashdotting Google bomb? by bcmm · · Score: 1

      That's right and proper and everything, because that's part of how they rank pages. Your explanation was nice, because I had been noticing both direct and monitored links and wondering what was going on.

      --
      # cat /dev/mem | strings | grep -i llama
      Damn, my RAM is full of llamas.
  31. I'm still trying to figure out... by rah1420 · · Score: 2, Funny

    ...not only what the Semantic Web is about, but more pragmatically why this is in "Hardware." :)

    --
    Mit der Dummheit kämpfen Götter selbst vergebens.
    1. Re:I'm still trying to figure out... by Anonymous Coward · · Score: 0

      Easy, that was semantic web in action!

      Explicitly, there was an RSS feed coded in OWL-XML streams that were improperly translated into WDL-OIDL tokens that resulted in a misconfigured relation of term 'sw' as being part of a 'bicycle', the rest was just the normal WODL-RDFS-XTHML processing.

  32. The semantic data is already there by saddino · · Score: 1

    Although I find the Semantic Web project intriguing, the idea of tagging data to define it is somewhat of a cop-out. The "meaning" of any given page is already there: in the page. Instead of spending so much time tagging pages, how about working on algorithms to derive meaning from the content. Surely those in the field of Computational Linguistics can make a real push at this: "artificial" corpora aren't needed anymore: the web offers more data than you'll ever need.

    Shameless promotion: for OS X users, theConcept offers an example of mining key words and phrases, and contextual elements automatically from pages returned by Google queries.

    1. Re:The semantic data is already there by RobMcCool · · Score: 1

      An automated technique that could do better than a human tagger would have an additional feature of being able to pass the Turing Test.

      I admire your faith in automated techniques, since the ones I've seen have a catastrophic error rate and can't provide particularly rich data. The state of the art there is constantly improving, though, and there's no reason why such algorithms can't generate RDF anyway. The Semantic Web is about file formats and conventions, it doesn't necessarily mean human tagging.

      For instance, at the lab here we work with IBM researchers who created the UIMA framework, and with some of the people who did WebFountain. The UIMA framework people that we work with dump us their data in two forms, a big OWL file, and a database that contains information from the extractors about where in the text each piece of information came from.

      This theConcept tool you link to, at a casual glance, looks similar to Yahoo's recent Y!Q beta. I haven't put the two next to each other to see how they compare, though, so I could be off in the weeds.

  33. RSS is not Semantic Web by Anonymous Coward · · Score: 0
    RSS is a very specific markup for syndication. Look at the RDF specification and the RSS specification. Which one is easier to understand and implement? yes, I have looked at both. RSS is dead simple, RDF is not.

    foaf and mindsap do not represent mainstream usage. If you consider that Semantic Web being reality today, then sure. But that doesn't mean it's useful for those not in the semantic web research world. It's fringe technology looking for a problem.

    1. Re:RSS is not Semantic Web by smartdreamer · · Score: 1
      I agree with you, the semantic web is not a reality outside certain circles. But my point is that it won't come the way many think it will: in a big Google-like demo. It will come from many little implementations like we see with foaf. We can imagine a big mainstream ISP encouraging users in such a community.

      As for RSS, it is limited, but it took off rapidly. RSS v1.0 introduced RDF. That is another step in the right direction.
      BTW RDF isn't that complicated. Think of it as a triplet: Subject Verb Object.

      So semantic web is coming, little step at a time.

    2. Re:RSS is not Semantic Web by Anonymous Coward · · Score: 1, Interesting
      As for RSS, it is limited, but it took off rapidly. RSS v1.0 introduced RDF. That is another step in the right direction. BTW RDF isn't that complicated. Think of it as a triplet: Subject Verb Object.

      I don't think the evidence on RDF mailing list supports that opinion. Look at the literature in the bookstores about semantic web. If anything, it is full of confusion and the specification is poorly written compared to the HTML and XML specification.

      Triplet does not equal (Subject verb object). What the RDF spec describes is closer to Natural Language parsing concepts. There are many similarities between what the RDF describes as RDF Model graph and dependency grammar techniques http://w3.msi.vxu.se/~nivre/research/sdg.html.

      Anyone remotely interested in NLP knows the problem is very hard to solve using dependency grammar techniques. Statistical approaches have been shown to perform much better.

      Semantic Web is essentially repeating the same mistakes already made in the AI world with NLP. the W3C seems blind to these facts and that's why semantic web is doomed to fail.

    3. Re:RSS is not Semantic Web by smartdreamer · · Score: 1
      I don't think the evidence on RDF mailing list supports that opinion. Look at the literature in the bookstores about semantic web. If anything, it is full of confusion and the specification is poorly written compared to the HTML and XML specification.
      I don't know which mailing list you refer to, nor which books but the web is an excellent source of information for that matter. Take a look at links returned by google for RDF : here, RDF homepage full spec, RDF primer for some graphs and there or this excellent online book, not to mention tutorials, etc. And BTW there are many good books to buy.
      Triplet does not equal (Subject verb object). What the RDF spec describes is closer to Natural Language parsing concepts. There are many similarities between what the RDF describes as RDF Model graph and dependency grammar techniques http://w3.msi.vxu.se/~nivre/research/sdg.html.
      I said think of it as a triplet: Subject Verb Object. That is a little inaccurate, let me correct this to Subject Predicate Object. Now, RDF is little more than that: a Resource Description Framework (I'm not talking RDFS). Maybe my popularization led you to think RDF has something to do with NLP, but that is completely false.

      The fact is RDF is really just triplet. Not surprising that it can be represented in N3 (where 3 stands for triplet). Take a look at this example taken from wikipedia :

      <http://en.wikipedia.org/Tony_Benn> <http://purl.org/dc/elements/1.1/title> "Tony Benn" .
      <http://en.wikipedia.org/Tony_Benn> <http://purl.org/dc/elements/1.1/publisher> "Wikipedia" .
      which can also be represented in XML/RDF like this
      <rdf:RDF
      xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
      xmlns:dc="http://purl.org/dc/elements/1.1/">
      <rdf:Description rdf:about="http://en.wikipedia.org/Tony_Benn">
      <dc:title>Tony Benn</dc:title>
      <dc:publisher>Wikipedia</dc:publisher>
      </rdf:Description>
      </rdf:RDF>
      (the output isn't pretty, see wikipedia link)

      So take another look at RDF, you'll be surprised.
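
      To make the "it's just triples" point concrete, here is a rough sketch in plain Python that holds the two Tony Benn statements above as (subject, predicate, object) tuples and writes them back out one statement per line (the syntax usually called N-Triples). No RDF library is used; the helper names to_ntriples_line and objects are invented for this example, and it assumes every object is a plain string literal rather than another resource.

```python
# Each RDF statement is modelled as a (subject, predicate, object) tuple.
DC = "http://purl.org/dc/elements/1.1/"          # Dublin Core namespace from the example
SUBJECT = "http://en.wikipedia.org/Tony_Benn"    # subject URI as written in the example

triples = [
    (SUBJECT, DC + "title", "Tony Benn"),
    (SUBJECT, DC + "publisher", "Wikipedia"),
]

def to_ntriples_line(s, p, o):
    """Serialize one triple; assumes `o` is a plain string literal."""
    escaped = o.replace("\\", "\\\\").replace('"', '\\"')
    return f'<{s}> <{p}> "{escaped}" .'

def objects(store, s, p):
    """Answer 'what is the <p> of <s>?' by scanning the triples."""
    return [o for (s2, p2, o) in store if s2 == s and p2 == p]

for triple in triples:
    print(to_ntriples_line(*triple))

print(objects(triples, SUBJECT, DC + "publisher"))
```

      A real store would index the triples instead of scanning a list, but the data model is no bigger than this.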

  34. Gathering Metadata from Apple's Filesystem? by Oori · · Score: 1

    Seems to me there's a useful metadata resource available now due to the way that OSX-Tiger is now allowing metadata to be attached to a file (either as xattribs, or via the Spotlight keyword field). See here.
    Does anyone know if web crawlers/gatherers (Google, Harvest, Combine, etc.) have the ability to access that information and associate it with the file?
    I would love an automatic gatherer extracting my metadata from the filesystem and allowing searches on it, in combination with the full text option.

  35. Bashers watch out... by Anonymous Coward · · Score: 0

    "Note to self. Dreaming about the world tagging all their data isn't going to happen. It takes too much damn time."

    Note to self. Dreaming about the world writing their web sites in notepad isn't going to happen.

    "Semantic driven search using google's technique works."

    It's spotty.

    "Producing a RDF graph is crap."

    I guess you do everything by hand.

    "Nothing to watch here."

    Nothing to understand here.

    1. Re:Bashers watch out... by Anonymous Coward · · Score: 0

      What exactly are you saying? As far as I can see, the Semantic Web can only be useful insofar as web designers begin to insert additional semantic information into their sites. It isn't important how they do it, but it is clear that it will require real human beings being paid to do this additional work.

      If computers could do the RDF tagging automatically, then Google could do the exact same thing without any additional data. The Semantic Web is only useful if it contains information that cannot be derived by a computer.

      So, where do you find the business case that justifies web designers all over the world spending even 10 % extra time to specify the information needed by the Semantic Web???

    2. Re:Bashers watch out... by Dasch · · Score: 2, Insightful
      So, where do you find the business case that justifies web designers all over the world spending even 10 % extra time to specify the information needed by the Semantic Web???

      if it would mean that their sites would rank higher in the search results, I'd say that they all would...
  36. Backwards by Anonymous Coward · · Score: 0

    The semantic web is a backward step. We should be working on being able to use even more unstructured data on the web.

    And the "confirm you are not a script" on the Slashdot anonymous submissions page is an underhanded way of coercing "membership," viz. tracking and sales opportunities.

    1. Re:Backwards by smartdreamer · · Score: 1
      We should be working on being able to use even more unstructured data on the web.
      I don't see how it can be more unstructured than it already is! You want to get rid of HTML? You can! Maybe we could forget about protocols and standards? We should all have our own language, now that would be unstructured!

      I really don't know what you mean by this nor how can it be good.


      P.S.: Everyone has to "confirm you're not a script", even logged-in users.

  37. You missed the point! by holygoat · · Score: 2, Insightful

    The Semantic Web is about describing resources, not tagging pages.

    Indeed, you might output RDF from your processing of Web pages.

    Extracting information from semi-structured text is very different to making logical assertions about resources.

    1. Re:You missed the point! by Anonymous Coward · · Score: 0
      The Semantic Web is about describing resources, not tagging pages. Indeed, you might output RDF from your processing of Web pages

      The most prevalent form of "resource" is a page. Webservices have no need for semantic web. Describing resources is tagging the data.

    2. Re:You missed the point! by holygoat · · Score: 1

      "Webservices have no need for semantic web"

      Ignoring your grammar, I would reply: tell that to the people trying to develop Web Services standards! Specifically, I'd point you to OWL-S, and its simpler, ad-hoc cousins.

      One of the most common uses of the Semantic Web at present is describing PEOPLE (FOAF, as used by LiveJournal and countless others). Do you not see that the Semantic Web goes beyond a Web of human-readable documents into a machine-understandable Web of data? You don't find pages on the Semantic Web, you ask questions. Focusing on tagging HTML pages is hopelessly naive.

      *sigh*

    3. Re:You missed the point! by Anonymous Coward · · Score: 0
      Do you not see that the Semantic Web goes beyond a Web of human-readable documents into a machine-understandable Web of data? You don't find pages on the Semantic Web, you ask questions. Focusing on tagging HTML pages is hopelessly naive.

      tagging data with meta-data for human or computers is still tagging data. Get over it. A much simpler, less bloated format for describing data like RSS is much more appropriate than the bloated ugly spec called RDF.

    4. Re:You missed the point! by holygoat · · Score: 1

      What do you think RSS 1.0 is?

    5. Re:You missed the point! by saddino · · Score: 1

      Yes, that is a valid point. However, considering the (IMHO) substantial barriers to widespread adoption (getting authors to provide semantic descriptions, dealing with SPAM or purposefully misleading descriptions, etc.), I still would like to see more effort in context analysis research. The AI field has been floundering for so long, a catchy phrase such as "Semantic Web" (which, has been quite a successful meme) applied towards AI applications in contextual derivation could be helpful in moving things along in that direction as well.

    6. Re:You missed the point! by smartdreamer · · Score: 1
      The most prevalent form of "resource" is a page.
      You live in the past. ;)
      You are referring to URLs, but the semantic web stands on URIs, which are a superset of URLs. If you limit yourself to URLs we'd better not talk about the semantic web, because it transcends the current view of resources. This concept is very significant.

      Just to be a little clearer, let's take a simple example. You want to refer to yourself. How can you do this? You can't download yourself on the net (can you? ;). What you can have is a homepage: your page. Nonetheless, it isn't you. That's the limit of a URL: it must describe a resource physically (should I say virtually) on the web. URIs don't have this restriction, so you can refer to yourself as http://my.home.page/me/ ... or anything else for that matter, as long as you always refer to this exact same URI and it is unique in your view of the world.

      See the perspective brought by the semantic web?
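
      A minimal sketch of that distinction, assuming made-up predicate names ("name", "homepage", "format") and a made-up homepage URI. One URI names the person and another names the document about the person, and statements attach to each separately:

```python
# Illustrative only: the predicate names and the HOMEPAGE URI are assumptions.
ME = "http://my.home.page/me/"       # names the person; nothing to download
HOMEPAGE = "http://my.home.page/"    # names a document about the person

# Each statement is a (subject, predicate, object) triple.
triples = {
    (ME, "name", "Some Person"),
    (ME, "homepage", HOMEPAGE),
    (HOMEPAGE, "format", "text/html"),
}


def describe(subject, triples):
    """Return every (predicate, object) pair stated about a subject."""
    return sorted((p, o) for s, p, o in triples if s == subject)


print(describe(ME, triples))        # statements about the person
print(describe(HOMEPAGE, triples))  # statements about the page
```

      The person has a name and the page has a format; neither statement leaks onto the other because the two URIs are different identifiers.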

    7. Re:You missed the point! by Anonymous Coward · · Score: 0
      You are referring to URLs, but the semantic web stands on URIs, which are a superset of URLs.

      No, I'm not confusing URI with URL. The problem is that the world at large uses them interchangeably. URI is abused all over the place. That's human nature. Telling the world "you misunderstand how to use URIs" doesn't solve the problem that the Semantic Web defines the URI as the authoritative source. Talk about building a house on a beach with a hurricane coming.

    8. Re:You missed the point! by smartdreamer · · Score: 1
      No, I'm not confusing URI with URL. The problem is that the world at large uses them interchangeably. URI is abused all over the place. That's human nature. Telling the world "you misunderstand how to use URIs" doesn't solve the problem that the Semantic Web defines the URI as the authoritative source. Talk about building a house on a beach with a hurricane coming.
      It is normal that people use them interchangeably: the current web uses URLs exclusively, so the URI is a new concept for most. As a matter of fact, they could not confuse them because they were not aware of URIs.

      Even if people are confused, there is nothing to worry about. The worst that can happen is that two people referring to the same thing use two different URIs (in any event, that is already assumed in open worlds). So, I really don't see the problem here.
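
      One way to picture the "two different URIs for the same thing" case is an alias table, roughly what OWL expresses with owl:sameAs. The sketch below is only an illustration: the example.org/example.com URIs, the predicate names, and the `canonical` helper are all assumptions.

```python
def canonical(uri, same_as):
    """Follow alias links until a URI with no further alias is reached."""
    seen = set()
    while uri in same_as and uri not in seen:  # 'seen' guards against cycles
        seen.add(uri)
        uri = same_as[uri]
    return uri


# Hypothetical alias statement: both URIs name the same person.
same_as = {"http://example.org/bob#me": "http://example.com/people/bob"}

# Two people describing the same thing under different URIs.
triples = [
    ("http://example.org/bob#me", "name", "Bob"),
    ("http://example.com/people/bob", "city", "Lisbon"),
]

# Merge the statements under one canonical subject.
merged = {}
for s, p, o in triples:
    merged.setdefault(canonical(s, same_as), {})[p] = o

print(merged)
```

      Until someone states the alias, the two descriptions simply stay separate, which is the open-world assumption at work.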

  38. Useless? I don't think so. by jxyama · · Score: 1
    I don't agree that it's completely useless. Don't we all tend to type the most important query word first?

    In any case, for Japanese/Chinese/Korean - autocomplete is almost a natural part of using a web search engine, so it's not a "useless feature that nobody wants to see."

    Those languages use alphabet-based inputs which are then converted into native text. Why bother converting if you can take the direct alphabetical input and start showing native text autocompletes?
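
    A rough sketch of that idea, assuming a tiny hand-made table of simplified romanizations (a real input method works from a much larger dictionary and different spellings). The alphabetical prefix is matched directly and native-text completions are shown without a separate conversion step:

```python
# Illustrative table only: simplified romanization -> native text.
CANDIDATES = {
    "tokyo": "東京",
    "toyota": "豊田",
    "kyoto": "京都",
    "osaka": "大阪",
}


def autocomplete(typed, candidates=CANDIDATES):
    """Return (romanization, native text) pairs whose romanization starts with the input."""
    typed = typed.lower()
    return sorted((r, n) for r, n in candidates.items() if r.startswith(typed))


print(autocomplete("to"))
print(autocomplete("k"))
```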

  39. What!? by HishamMuhammad · · Score: 0, Offtopic

    In the examples page, PLO and Al Fatah are listed under "Terror Organizations". This is a horrible misrepresentation.

    The PLO is the organization representing the Palestinian people that eventually evolved into the Palestinian Authority. It had observer status in the UN General Assembly and even special permission to participate in Security Council debates (sans voting rights). Al Fatah is a political party which was involved in guerilla activities in the 70s, but that has, since the Oslo Accords, accepted the statehood of Israel.

    1. Re:What!? by Anonymous Coward · · Score: 0

      You probably didn't read this entry from the Council on Foreign Relations:
      http://cfrterrorism.org/groups/alaqsa.html
      I guess the horrible misrepresentation can be evaluated by asking yourself: "are the martyr brigades more LIKE a terrorist organization or more UNLIKE it?" Categories, as we know, are not conceptualized by necessity and sufficiency relations, but by family resemblance.

    2. Re:What!? by smartdreamer · · Score: 1
      Categories, as we know, are not conceptualized by necessity and sufficiency relations, but by family resemblance.
      Hey, that sounds like logical (ontology-building) reasoning. This thread is not offtopic. :)
    3. Re:What!? by HishamMuhammad · · Score: 1

      Apparently, you missed the point that I was not talking about Al Aqsa Martyr Brigades.

  40. Re: poor explanation by Anonymous Coward · · Score: 0
    I said think of it as a triplet: Subject Verb Object. That is a little inaccurate, let me correct this to Subject Predicate Object. Now, RDF is little more than that: a Resource Description Framework (I'm not talking RDFS). Maybe my popularization confused you into thinking of RDF as something to do with NLP, but that is completely false.

    My explanation was poor. According to the RDF spec, RDF consists of two parts: model and graph. The model represents an object, like car, cat, boat, house, etc. The graph represents the relationship between the objects, like honda->car->vehicle. The graph is supposed to allow the system to "infer" facts which are not explicitly stated. In other words, an RDF engine would be able to infer that a Honda is a type of vehicle.
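
    The honda->car->vehicle inference can be sketched in a few lines. This is only an illustration, not how any particular RDF engine works: the `is_a` predicate is a hypothetical stand-in for what RDFS calls rdfs:subClassOf, and the function just computes a transitive closure.

```python
def infer_is_a(triples):
    """Compute the transitive closure of the (hypothetical) 'is_a' predicate."""
    facts = {(s, o) for s, p, o in triples if p == "is_a"}
    changed = True
    while changed:  # repeat until no new fact can be derived
        changed = False
        for a, b in list(facts):
            for c, d in list(facts):
                if b == c and (a, d) not in facts:
                    facts.add((a, d))
                    changed = True
    return facts


# Explicitly stated facts, written as subject-predicate-object triples.
stated = {("honda", "is_a", "car"), ("car", "is_a", "vehicle")}

inferred = infer_is_a(stated)
print(("honda", "vehicle") in inferred)  # True: never stated, but derivable
```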

    If you look at what the spec describes in terms of building the Graph, it is very similar to dependency grammar techniques. After all, both attempt to interpret data.

    I read the spec plenty of times, but it is still a horrible specification. RDF engines (reasoners, as RDF people like to call them) are attempting to do the same thing AI researchers have been working on for 3 decades. The only difference is that the W3C RDF people have a huge chip on their shoulders and refuse to see that reality is dirty and messy. Trying to infer anything from dirty data is an unbounded problem. There's no getting around that.

  41. Piggy Bank (by MIT) by panck · · Score: 1

    I haven't RTFA yet, but I wanted to link to Piggy Bank, which is a Firefox plugin by the Simile MIT group, which seems to be making a large step forward in bringing the usefulness of the semantic web to the users.

    It contains an RDF engine, and allows you to install "screen scrapers" for different sites, plus it knows automatically how to read FOAF and some other ontologies that have spread on the net a little bit. When you see the "Semantic web coin" icon in your status bar, you can click on it and it will extract what semantic information it can about the given page. Using JavaScript or XSL based screen scrapers makes this a bit like developing for Greasemonkey.

    As examples, they have screen scrapers for Craigslist jobs, and they can merge the location (lat/long) information pulled from that along with other info pulled from other sites and display it all on a Google Map.

    It's just getting started, but it seems very cool.

    --
    "What thou shalt not, I shalt did!" -Bart Simpson
  42. Re: poor explanation by smartdreamer · · Score: 1
    My explanation was poor. According to the RDF spec, RDF consists of two parts: model and graph. The model represents an object, like car, cat, boat, house, etc. The graph represents the relationship between the objects, like honda->car->vehicle. The graph is supposed to allow the system to "infer" facts which are not explicitly stated. In other words, an RDF engine would be able to infer that a Honda is a type of vehicle.
    I see what you mean. Nonetheless, reasoning over hierarchies uses RDFS, since hierarchy isn't truly possible in plain RDF.
    If you look at what the spec describes in terms of building the Graph, it is very similar to dependency grammar techniques. After all, both attempt to interpret data.
    Sure, there are similarities between dependency grammar techniques and RDF. But, RDF does not interpret, it describes. RDF could be used to represent such dependencies. Over RDF you can put RDFS and/or OWL to infer.
    I read the spec plenty of times, but it is still a horrible specification. RDF engines (reasoners, as RDF people like to call them) are attempting to do the same thing AI researchers have been working on for 3 decades. The only difference is that the W3C RDF people have a huge chip on their shoulders and refuse to see that reality is dirty and messy. Trying to infer anything from dirty data is an unbounded problem. There's no getting around that.
    IMHO, the semantic web is not about bringing back old AI concepts; it is more about building a framework for representing knowledge (which is a field of AI). When this is done, you can do what you want: plug in AI or anything else.
    I think we agree that we are really not there yet, and only the future will tell.
  43. Google watch out...Protege. by Anonymous Coward · · Score: 0

    http://www.co-ode.org/resources/tutorials/ProtegeOWLTutorial.pdf

    http://protege.stanford.edu/plugins/owl/

    BTW:
    "Slow Down Cowboy!

    Slashdot requires you to wait 2 minutes between each successful posting of a comment to allow everyone a fair chance at posting a comment.

    It's been 15 minutes since you last successfully posted a comment

    Chances are, you're behind a firewall or proxy, or clicked the Back button to accidentally reuse a form. Please try again. If the problem persists, and all other options have been tried, contact the site administrator."

  44. Re: poor explanation by Anonymous Coward · · Score: 0
    Sure, there are similarities between dependency grammar techniques and RDF. But, RDF does not interpret, it describes.

    I don't agree. Data (aka assertion) that is not explicit fact is an interpretation of explicitly stated facts. NLP tries to label each part of a sentence as a noun, verb, adjective, pronoun, adverb and so on. Its goal is to form a graph that describes what the sentence means. The NLP term for the relationship is valence. Here is a random paper from Google on the topic: http://nats-www.informatik.uni-hamburg.de/~ingo/papers/tal2000.pdf.gz

    RDF Graph (not RDF schema) attempts to describe data in a way that computers don't have to "figure out" (i.e., parse) the subject predicate object. This is achieved through writing rules about a given topic. Dependency grammar "can be considered" techniques and frameworks for representing and generating knowledge. NLP also uses rules to process/parse sentences. In the case of NLP, it's grammatical and contextual rules.

    The two topics are much closer and have much more in common than superficial similarities. One difference is that RDF does not address the issue of building knowledge through an automated process. Humans have to do it. People are lazy. Ignoring the problem doesn't make the problem go away.

    I agree a lot more is needed to achieve the goal of the semantic web, but the question is, will the W3C listen to others and change? So far it hasn't.

  45. Qualify as Semantic Web ? by copdk4 · · Score: 1

    The most basic requirement for any application to qualify as a "Semantic Web" app (from the SW Challenge, http://www-agki.tzi.de/swc/swapplication.html) is that the application should use "some formal description of the meaning of the data"! RDF by itself doesn't give any *meaning* or *semantics* to the data. You need to associate your RDF data with RDFS/OWL for that purpose (TAP doesn't have a published OWL ontology: http://tap.stanford.edu/tap/tapkb.html).

    Also, given that you don't have any 'meaning' attached to nodes and links in your RDF, I presume your searching again boils down to 'keyword' based searching! People find it cool to term their search "Semantic Search", but I find it difficult to see any 'semantics' in the current application.

    1. Re:Qualify as Semantic Web ? by ngibbins · · Score: 1

      Tap doesn't appear to have an ontology (OWL or RDFS) that's published separately to the RDF data, but the RDF data files do appear to contain class definitions. In my book, that's sufficient meaning to qualify as a SW application under the rules laid down by the SW Challenge. It's certainly about as much meaning as we had in CS AKTive Space when we won the first SW Challenge in 2003.
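
      To make the keyword-versus-class distinction concrete, here is a small sketch. The `ex:` names and the data are invented for illustration and are not taken from the TAP knowledge base; only the idea of an rdf:type statement comes from RDF itself.

```python
TYPE = "rdf:type"

# Invented example data: two resources whose labels share a keyword.
triples = [
    ("ex:TomHanks", TYPE, "ex:Actor"),
    ("ex:TomHanks", "ex:label", "Tom Hanks"),
    ("ex:HanksStreet", TYPE, "ex:Street"),
    ("ex:HanksStreet", "ex:label", "Hanks Street"),
]


def keyword_search(word):
    """Plain keyword match against labels; knows nothing about classes."""
    return sorted({s for s, p, o in triples
                   if p == "ex:label" and word.lower() in o.lower()})


def typed_search(word, cls):
    """Keyword match restricted to resources declared to be of a given class."""
    typed = {s for s, p, o in triples if p == TYPE and o == cls}
    return [s for s in keyword_search(word) if s in typed]


print(keyword_search("hanks"))            # both resources match
print(typed_search("hanks", "ex:Actor"))  # the class statement narrows it down
```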

    2. Re:Qualify as Semantic Web ? by copdk4 · · Score: 1

      The SW 2003 Challenge was in October, and the W3C OWL standard wasn't yet finalized (It was a Recommendation Standard in Aug 2003, http://www.w3.org/2001/sw/WebOnt/#L151), and to my knowledge no reasoner (Fact/Racer) supported full OWL/DAML+OIL reasoning. So I guess the 'semantics' aspect was not a big concern then.

      Today OWL is formalized. Several OWL based APIs/reasoners are in place. Using such 'RDF only' applications misguides people and the community. My only request to all you Semantic Web Gurus is to preach the right message and best practices :)

    3. Re:Qualify as Semantic Web ? by ngibbins · · Score: 1

      I disagree in part. The semantics aspect was as big a concern then as now, but it's important to recognise when the representational requirements of an application require the use of OWL (or DAML+OIL), or when you can get away with the use of RDFS only. It's all about choosing the most appropriate tool for the job.

      Regarding the timing of SWC2003 and the publication of the OWL Recommendation, OWL made Proposed Recommendation in December 2003 and Recommendation in 2004 (this being largely a rubber-stamping step). However, the major building blocks of the language had been in place since the publication of the Last Call Working Draft in March 2003, and the similarity of OWL to its lineal predecessor DAML+OIL meant that most people with DAML+OIL software and ontologies were able to adapt with less effort than if they were to have started from scratch.

      For OWL to have passed Candidate Recommendation, the working group needed to demonstrate that there were sufficient implementations of the language in the form of reasoners, etc, one of which was FaCT. At that point, FaCT was a mature DL reasoner that was developed for OIL and had already been adapted for DAML+OIL.

  46. How is this different from HTML? by klatty · · Score: 2, Insightful
    The Semantic Web provides a common framework that allows data to be shared and reused across application, enterprise, and community boundaries.


    Isn't this basically what HTML is supposed to do, kind of?
    1. Re:How is this different from HTML? by The+MESMERIC · · Score: 1
      Not really.
      • RDF does not display pictures
      • RDF does not contain self-styling
      • RDF does not contain scripting
      • RDF conforms to XML, HTML doesn't (XHTML does)
      • The semantics of RDF is highly different from the (much ignored) semantics of HTML.

      HTML is a "hyper-textual" document with images, objects and links.
      RDF's prime purpose is organizing resources and creating catalogues.
  47. metacrap by braddock · · Score: 1

    Maybe the Semantic Web can work someday, maybe not.

    However, anyone who thinks this is a utopia in the making should read the infamous Metacrap essay by Cory Doctorow:

    Metacrap: Putting the torch to seven straw-men of the meta-utopia.

    After you are done reading, go to eBay and pick yourself up a cheap Palm Pilot. :)

    1. Introduction
    2. The problems
    2.1 People lie
    2.2 People are lazy
    2.3 People are stupid
    2.4 Mission: Impossible -- know thyself
    2.5 Schemas aren't neutral
    2.6 Metrics influence results
    2.7 There's more than one way to describe something
    3. Reliable metadata

    -braddock gaskill

  48. Who'd want to see that? by imthesponge · · Score: 1

    'type "tom hanks birth" slowly to see it in action' hmm

  49. Re: Smokin crack by Anonymous Coward · · Score: 0

    Have you ever tried to type Chinese in Windows? It is a pain in the arse compared to typing English. The Chinese writing system is very different from English, and autocomplete is much harder. In English you have A-Z. In Chinese the number of radicals ranges in the hundreds for the most common ones. Then when you get into the esoteric radicals, no autocomplete is going to be able to handle it today.

  50. Semantic Web?-Trust Me...and Wikipedia. by Anonymous Coward · · Score: 0

    "Should I reject using the Encyclopedia Britannica just because, someone somewhere will post false information."

    Your point would be even stronger if you had used Wikipedia instead of Britannica. There's a great deal of fannism surrounding it, even though trust issues involve both, and some of the same "It's OK" arguments likewise apply to both.

  51. meaning by tute666 · · Score: 1

    I wouldn't like anybody, especially a machine, telling me what I mean. The possibility of censorship is enormous.

  52. Re: poor explanation by Anonymous Coward · · Score: 0

    The two topics are much closer and have much more in common than superficial similarities.

    Not only close, but identical from a formal point of view.

    Metadata is data; the 'meta' part vanishes the moment you try to describe it.
    Plain old philosophical knowledge, always forgotten.

  53. Link in story goes to Google by nokilli · · Score: 1

    Check out the link in the story to the Semantic Web... it's a redirect from Google to w3c.org.

    I'm sure this isn't what the submitter meant to do.

    For those who haven't been paying attention, Google has recently begun giving redirects from their own site as search results, so, in effect, they get to record every site you end up visiting.

    I was on the fence as to whether this was a good or bad thing, but now I see that clearly it's the latter, simply because when the link is copied-and-pasted, as it is here, the person who visits the link won't have any idea that Google is recording that visit.
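
    For anyone who wants to check a pasted link before clicking it, here is a small sketch that pulls the destination out of a redirect-style URL. The exact shape of the redirect links is an assumption on my part: the example link and the `q` parameter name are illustrative, so adjust `param` to whatever the real link carries.

```python
from urllib.parse import urlparse, parse_qs


def redirect_target(link, param="q"):
    """Return the destination embedded in a redirect link, or the link itself."""
    query = parse_qs(urlparse(link).query)
    values = query.get(param)
    return values[0] if values else link


# Hypothetical redirect link of the kind described above.
link = "http://www.google.com/url?sa=U&q=http://www.w3.org/2001/sw/"

print(redirect_target(link))
print(redirect_target("http://www.w3.org/2001/sw/"))  # plain links pass through
```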