Semantic Web Under Suspicion

It's cool! by crazyjeremy · 2006-05-25 01:21 · Score: 2, Funny

"Semantic web" might make it easier for HackerBillyBob to find a potential identity fraud victim's information. So basically, HackerBillyBob can get dumber and dumber but do more and more damage. Fortunately the good side of this is PhisherBillyBob can decrease his R&D time because SemantiGoog will give him thousands of ACTIVE email addresses EACH AND EVERY MORNING.

--
Funnypics

Re:It's cool! by Anonymous Coward · 2006-05-25 06:39 · Score: 0

More importantly, how is this "Semantic Web" going to help me find more porn that I like.

If it isn't going to make it easier to look up red heads in high heels and lab coats, I don't really see the point.

Porn drives the internet.

All Talk by eldavojohn · 2006-05-25 01:23 · Score: 5, Informative

So I know a lot of people that get all excited when they read articles on the "semantic web."

I think that we are all missing some very important aspects of what it takes to make something capable of what they speak of. In all the projects I have worked on, to create something geared toward this sort of solution, you need two things: training data & a robust taxonomy.

First things first, how would we define or even agree on a taxonomy? By taxonomy, I mean something with breadth & depth that has been used and verified. By breadth I mean that it must be capable of normalization (pharmaceutical concoctions, drugs & pills are all the same concept), stemming (go & went are the same action, dog & dogs are the same concept) and also important is how many tokens wide a concept can be. By depth I mean that we must be able to define specificity and use it to our advantage (a site about 747s is scored higher than a site about airline jets which is scored higher than a site about planes). By rigorous I mean that it must be tried and true ... you start with a corpus of documents to "seed" it and have experts (or web surfers) contribute little by little until it is accurate. Oh, it must also be able to adapt quickly and stay current.
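A minimal sketch of the breadth and depth ideas above, in Python. The synonym table, the tiny plane hierarchy and every function name here are made up for illustration; a real taxonomy would be far larger and curated by people.

```python
# Toy taxonomy. Every entry below is an invented illustration.
SYNONYMS = {
    "pharmaceutical concoctions": "drug",
    "drugs": "drug",
    "pills": "drug",
    "went": "go",
    "dogs": "dog",
}

# Each concept maps to its parent; None marks the root of the hierarchy.
PARENT = {
    "plane": None,
    "airline jet": "plane",
    "747": "airline jet",
}


def normalize(term):
    """Breadth: map a surface form onto its canonical concept."""
    key = term.lower().strip()
    return SYNONYMS.get(key, key)


def depth(concept):
    """Depth: count the steps from a concept up to the root."""
    steps = 0
    while PARENT.get(concept) is not None:
        concept = PARENT[concept]
        steps += 1
    return steps


def rank_by_specificity(pages):
    """Order sites so the most specific concept comes first.

    pages maps a site name to the concept the site is about.
    """
    return sorted(pages, key=lambda site: depth(pages[site]), reverse=True)
```

With pages about "plane", "airline jet" and "747", the 747 site sorts first, which is the ordering described above. The hard part is not this code but agreeing on what goes into the two tables.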

Without a taxonomy, how will we index sites and tell "water tanks" from "panzer tanks"? I think that this is one of the great things that Google is missing to really improve its searching abilities. If you suggest an ontology to replace it, the problems encountered in developing it only multiply.

Where is the training data? Well, one may argue that the web content out there will suffice as training data but I think that more importantly, they need collections of traffic for these sites and user behavioral patterns to quickly and adequately deduce what the surfer is in need of.

I feel that these two aspects are missing and the taxonomy may be impossible to achieve.

Why are we even concerned with security if we can't even lay the foundations for the semantic web? I would argue that once we plan it out and determine it's viable, then we concern ourselves with everyone's rights.

--
My work here is dung.

Re:All Talk by RobotWisdom · 2006-05-25 01:37 · Score: 2, Informative

I agree. My own (universally ignored) proposal for the taxonomy problem starts with person, place, and thing as 'elements' and builds complex ideas as compounds of these: [faq]
Re:All Talk by Who235 · 2006-05-25 01:48 · Score: 1

I think a taxonomy could be culled from Wordnet, or some other similar smantic project.

http://wordnet.princeton.edu/
Re:All Talk by $RANDOMLUSER · 2006-05-25 01:48 · Score: 4, Interesting

I've always thought that the Table of Contents for Roget's Thesaurus was one of the greatest works of mankind. I don't think many people realize just how difficult the problem really is, and how long it's going to take.

--
No folly is more costly than the folly of intolerant idealism. - Winston Churchill
Re:All Talk by Who235 · 2006-05-25 01:51 · Score: 3, Funny

Of course how would it handle typos like "smantic"?
Re:All Talk by mytrip · 2006-05-25 01:57 · Score: 1

> Without a taxonomy, how will we index sites and be able to tell between "water tanks" and "panzer tanks." I think that this is one of the great things that Google is missing to really improve its searching abilities. If you suggest an ontology to replace it, the problems encountered in developing it only multiply

ISO 2788 and a few other specifications talk about how to do a multilingual thesaurus. I'm working on doing one namely for geography and such. Computational Linguistics is the field you want to study if you're serious about it.

--
Contrary to popular belief, Unix is user friendly. It just happens to be particular about who it makes friends with.
Re:All Talk by Trigun · 2006-05-25 02:03 · Score: 3, Funny

By pointing them out and laughing at them? Seems to work around here.
Re:All Talk by InsertCleverUsername · 2006-05-25 02:10 · Score: 1

I would have to agree that, although the idea is fascinating, implementing it would be a gargantuan effort. And it's unclear how difficult it would be to maintain.

I think it might be easier to approach the problem from another direction. Once a semantic A.I. like Cyc has reached a level at which it can begin categorizing and "understanding" the information on the Web, it could do the enormous chore of creating a semantic web for us.

--
Ask me about my sig!
Re:All Talk by Temposs · 2006-05-25 03:08 · Score: 2, Informative

As another reply mentions, WordNet is a promising avenue of success for creating a taxonomy and an ontology for the web (just read a paper on ontologizing semantic relations using WordNet, actually). In fact, it already is a taxonomy of sorts (and a multi-dimensional one at that), although a generalized one. And there are multitudinous other projects building off of WordNet and paralleling WordNet.
There's VerbNet, FrameNet, Arabic WordNet, and probably others I don't know about.
WordNet has become a standard for working with semantic relations computationally these days. It works by storing all known senses of every dictionary word, and each sense has links to other words based on how it's semantically related (synonym, antonym, hyper/hyponym, meronym, troponym, cause, is_a, morphological derivative, etc...)
There's not any model that can compete with it currently, and it's widely accessible and very easy to use. As this tool improves, so will the semantic web.
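To make the "senses with typed links" idea concrete, here is a toy in-memory sketch in Python. It does not use WordNet itself: the sense ids, glosses and relations below are invented stand-ins, and the two helper functions are hypothetical names chosen for this example.

```python
from dataclasses import dataclass, field


@dataclass
class Sense:
    word: str
    gloss: str
    # Maps a relation name (e.g. "hypernym") to a list of sense ids.
    relations: dict = field(default_factory=dict)


# Invented entries, not real WordNet data.
SENSES = {
    "tank.n.01": Sense("tank", "armored combat vehicle",
                       {"hypernym": ["vehicle.n.01"]}),
    "tank.n.02": Sense("tank", "large container for liquid",
                       {"hypernym": ["container.n.01"]}),
    "vehicle.n.01": Sense("vehicle", "a conveyance"),
    "container.n.01": Sense("container", "an object that holds things"),
}


def senses_of(word):
    """Return every sense id stored for a word, in sorted order."""
    return sorted(sid for sid, s in SENSES.items() if s.word == word)


def hypernym_chain(sense_id):
    """Follow the first hypernym link upward until there is none left."""
    chain = []
    current = sense_id
    while True:
        parents = SENSES[current].relations.get("hypernym", [])
        if not parents:
            return chain
        current = parents[0]
        chain.append(current)
```

The word "tank" comes back with two senses, and each sense climbs to a different parent, which is the kind of structure that lets "water tanks" and "panzer tanks" be told apart.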

--
Knowledge is just opinion that you trust enough to act upon. -Orson Scott Card
Re:All Talk by jim_mcneely · 2006-05-25 03:25 · Score: 1

The idea, as I understand it, is that you can transfer data formatted as XML within a certain grammar, such as RSS or some others defined within technology and industry niches, or make up your own. Then you don't need to base your own systems on this grammar, because they have provided transformation tools like XSL to transform a known grammar to the grammar your system requires. XML provides the taxonomy, which can be defined however the data source decides. I believe they are hoping for, and beginning to see, islands of agreed upon grammars arise which can be used in many places with different grammar expectations using transformation. The main problem with this is that XSL is extremely arcane, which is a barrier IMHO to widespread acceptance.
Re:All Talk by SubRosa · 2006-05-25 03:52 · Score: 1

Couldn't statistical analysis be used effectively? I've been playing with crm114 lately (a "Markov based" filter) as a backup to SpamAssassin on my servers. Once trained, its ability to pick out spam is almost uncanny at times.
The context of a word seems to me (obviously not a math or CS geek) to be a good, and relatively easy to calculate, indicator of the word's relation to other terms. By context I mean the "physical" proximity to other terms on a page, rather than the normal written language context.
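A rough sketch of "context as physical proximity", in Python. This is plain window counting, not what crm114 actually does internally; the function name and the window parameter are assumptions made for this example.

```python
from collections import Counter


def cooccurrence(text, target, window=2):
    """Count the words that appear within `window` tokens of `target`."""
    tokens = text.lower().split()
    counts = Counter()
    for i, tok in enumerate(tokens):
        if tok != target:
            continue
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                counts[tokens[j]] += 1
    return counts
```

Run over a large pile of pages, the neighbours of "tank" would presumably split into a "water" cluster and a "panzer" cluster, which hints at two different meanings without anyone writing a taxonomy by hand.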

--
Better living through obfuscation. Project White Noise
Re:All Talk by Poltras · 2006-05-25 04:38 · Score: 1

Semantic slashdot, where everyone can typo and be laughed at.

--
Of Code And Men
Re:All Talk by Phillip2 · 2006-05-25 04:54 · Score: 1

You're being too negative about things; we don't have to define an ontological representation for everything in the entire world, for the ontology to have use.

If we can help to define standards for some part of knowledge then we have helped the world a little bit, which is a better place than we started off.

As for how we do it, well, there is lots of experience around the world at doing this. Check out the Dewey Decimal system, or the Library of Congress classification. If you want something big, then SNOMED might be an example. Or the Gene Ontology. Or, if you are feeling brave, even Cyc.

If you worry about disambiguation of every word ever, then you are going to get depressed. If you just worry about part of the world, then you too could be like me, getting on quite happily using semantic web technology, as part of the solution, to some of the world's problems.

Perhaps, I just lack in ambition.

Phil
Re:All Talk by rickwood · 2006-05-25 09:17 · Score: 1

Hey, man, you're not universally ignored, it's just that there isn't much agreement about what the root of the ontology should be. For at least a decade I've been asking anybody I thought might have an answer worth hearing, "What are the properties of the uber-object?" Never gotten the same answer twice. This isn't really surprising, as how people view and categorize the world is greatly affected by their experiences and knowledge.

For what it's worth, I can think of two reasons you feel universally ignored. First, I think people don't look past your "example ontology" to see the underlying sense of the fractal thicket and why it's a good idea. Second, I feel like ontological-taxonomic-semantic AI people are looked down upon by people who think that only neural nets and axiomatic reasoning are "Real AI." Sadly, these are the people who seem to control the funding for research.

My main goal for now is an ontological database, i.e. a database that isn't made up of columns and rows, but stores data by concept in a given ontology and allows searches based on the strength of the connections between concepts.

I find it fascinating that you and I seem to have thought along the same lines. Many times in the last ten years I've sat down to create a schema for the "Uber-Ontology of the Theory of Everything, including kittens, and this time I mean it." Got notebooks filled with 'em. I have shelves filled with Hofstadter and Minsky and dozens of others. Never have found the answer. It does make me feel better that at least I'm not the only one asking the question.
Re:All Talk by Anonymous Coward · 2006-05-27 11:46 · Score: 0

See also S. R. Ranganathan's Colon Classification http://en.wikipedia.org/wiki/Colon_Classification. It's not that we haven't got taxonomies, it's just getting agreement on one (and solving the problems mentioned previously).

Smarter Machines by jekewa · 2006-05-25 01:24 · Score: 4, Interesting

I personally fear the day that a machine or algorithm can better determine the purpose for my keyword-based search than I can. Sure, there's a lot of improvement that can be done to make the searches more precise, but certainly in the end it'll be my decision what's important and what isn't.

What I really want to see is the search engine reduce the duplicated content to single entries (try Googling for a Java classname and you'll see how many Google-searched websites have the API on them), or order them by recurrence of the word or phrase giving the context more value than the popularity of the page.

--
End the FUD

Re:Smarter Machines by Irish_Samurai · 2006-05-25 01:41 · Score: 5, Insightful

What I really want to see is the search engine reduce the duplicated content to single entries (try Googling for a Java classname and you'll see how many Google-searched websites have the API on them), or order them by recurrence of the word or phrase giving the context more value than the popularity of the page.

There is a huge problem with this, and it goes back to the days of people jamming 1000 instances of their keywords at the bottom of their pages in the same font color as the background. Also, your desire to rate the pages on context requires an ontology type algo, which is NOT easy. Google has been working on this for a little while now, but it is a big hill to climb. They are using popularity as a substitution for this. It is not the most effective, but it is a pretty decent second option.

There is another issue with the approach you suggest. If Google decides that javapage.htm is the end all be all of JAVA knowledge, and removes all other listings from their database - then everyone and their grandmother will be fed information from this one source. That will ultimately reduce the effectiveness of Google to return valid responses to people who do not use search like a robot.

There is a human element at play here that Google is attempting to cater to through sheer numbers. Not everyone knows how to use search properly, hell most people have no idea. Keyword order, booleans, quotes - these will all affect the results given back, but very few people use them right off the bat. If you reduce the number of returned listings for a single word search to one area that was determined to be the authority, you have just made your search engine less effective in the eyes of the less skilled. I would be willing to bet that this less skilled group composes most of Google's userbase.

If you don't cater to these people, then you lose marketshare, and then you lose revenue from advertisers, and then you go out of business.
Re:Smarter Machines by der+wachter · 2006-05-25 01:48 · Score: 1

Didn't Turing say "A sonnet written by a machine is best appreciated by another machine"?
Re:Smarter Machines by suv4x4 · 2006-05-25 02:34 · Score: 0, Flamebait

I personally fear the day that a machine or algorithm can better determine the purpose for my keyword-based search than I can.

I fear the day when typing on an electronic device will produce better looking text and typography than me painstakingly painting every letter and producing one book a year.
Re:Smarter Machines by Toba82 · 2006-05-25 03:22 · Score: 1

Not to mention the fact that the 'one true page' could be wrong. Or incomplete. I personally often read 3 or 4 of the results if I actually care about what I'm searching for.

--
I pretend to know more than I really do by mooching off google and wikipedia.
Re:Smarter Machines by dan+the+person · 2006-05-25 04:54 · Score: 1

I personally fear the day that a machine or algorithm can better determine the purpose for my keyword-based search than I can. [...] in the end it'll be my decision what's important and what isn't.

So you'd prefer Google just return all pages in its index with your keywords in them, in a random order, and let you go through the 3 million results and look for the important ones by hand?

The semantic web is all about allowing you to more precisely specify your keywords. More precise search results then follow from more precise keywords.

So when you search for "michael moore" you can specify Michael the director of films, and not Michael the director of the WTO.
Re:Smarter Machines by AnyoneEB · 2006-05-25 09:41 · Score: 1

I don't think the solution to the many API pages problem is simply not listing the copies. I think it would be more like the current limitation on how many hits from the same site Google will normally display. Just have a link "Show more sites with the same content." Not similar content, identical content. Although, determining that is difficult because formatting is different and sites may have their own navbars or headers/footers.
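One common way to attack the "same content, different formatting" problem is to compare overlapping word sequences (shingles) instead of raw bytes. The Python sketch below is only an illustration of that general technique; the function names, the shingle size and any cut-off you pick are assumptions, not a description of what Google does.

```python
import re


def shingles(text, k=3):
    """Return the set of k-word sequences in text, ignoring case and punctuation."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    if len(words) < k:
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a, b):
    """Similarity in [0, 1]: shared shingles divided by all shingles."""
    sa, sb = shingles(a), shingles(b)
    if not sa and not sb:
        return 1.0
    return len(sa & sb) / len(sa | sb)
```

Two copies of the same API text that differ only in case or punctuation score 1.0, while a copy with an extra navbar scores somewhat below 1.0. Where to draw the line between "identical" and "similar" is exactly the hard judgment call mentioned above.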

--
Centralization breaks the internet.
Re:Smarter Machines by t35t0r · 2006-05-25 15:07 · Score: 1

There is another issue with the approach you suggest. If Google decides that javapage.htm is the end all be all of JAVA knowledge, and removes all other listings from their database - then everyone and their grandmother will be fed information from this one source. That will ultimately reduce the effectiveness of Google to return valid responses to people who do not use search like a robot.

Unfortunately that is exactly what is happening today.
Re:Smarter Machines by PhraudulentOne · 2006-05-26 00:56 · Score: 1

There is another issue with the approach you suggest. If Google decides that javapage.htm is the end all be all of JAVA knowledge, and removes all other listings from their database - then everyone and their grandmother will be fed information from this one source. That will ultimately reduce the effectiveness of Google to return valid responses to people who do not use search like a robot.

You could just thread the result. If you did a search for a certain java class, and it turned out a whack of pages with the same content, why not just thread the rest underneath? This way you could quickly see that these 40 results are all from the official Java manual, but they only take up one line on your main search result page.
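The threading idea can be sketched in a few lines of Python, assuming each result already comes with its page text. The fingerprint here is a simple hash of whitespace-normalized, lowercased text, so it only catches exact copies; the function names and the example URLs are made up.

```python
import hashlib


def fingerprint(body):
    """Hash the page text after collapsing whitespace and lowercasing."""
    normalized = " ".join(body.lower().split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def thread_results(results):
    """Group ranked (url, body) pairs so copies hang under the first hit.

    Returns a list of (lead_url, [duplicate_urls]) in original rank order.
    """
    threads = {}
    for url, body in results:
        threads.setdefault(fingerprint(body), []).append(url)
    return [(urls[0], urls[1:]) for urls in threads.values()]
```

Nothing is removed from the index this way: the top-ranked copy takes one line on the results page and the rest stay one click away.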

--
You create your own reality - Leave mine to me.

This is already in place, just not on the web. by tinkertim · 2006-05-25 01:29 · Score: 0, Offtopic

I can think of at least a dozen chain drug stores who have a retail store every 10 blocks in every major city in the country. Their sales are recorded centrally.

Hypothetically, if all of them decided it would be for the good of humanity to allow someone to examine their sales in real time as a whole to identify flu outbreaks early - then the process of doing that would not be too difficult.

UPS and Fed Ex track their packages in real time, know who sent them and who is receiving them and how much it weighs.

Data about us is being collected more than I'd care to think about, and I think it's inevitable that it become centralized. We may be rapidly approaching a point where we decide what we want, privacy or technology. Having both doesn't seem like it's a viable option for very much longer.

Re:This is already in place, just not on the web. by tinkertim · 2006-05-25 01:39 · Score: 1

I mention this :

UPS and Fed Ex track their packages in real time, know who sent them and who is receiving them and how much it weighs.

Because in and of itself how much my package weighs doesn't amount to a hill of beans. However if I know a natural disaster recently struck an area, and found some more "harmless" data to add to my filter, I can tell how much replacing of stuff via insurance claim people do on-line, and some other very interesting things.

I just thought I'd clarify :)

NSA goes public by packetmon · 2006-05-25 01:29 · Score: 1

Say what? Sounds similar to what the Bush administration via the NSA is doing, only on a public level. For those who want to ramble on about privacy and abuse, take note that just about every other week some company has lost their records, or someone has infiltrated their networks and gained access to records. If that's not enough to make you throw in the towel when it comes to protecting your privacy, never ever apply for a credit card, never sign a document, never reveal anything about yourself to anyone. What most fail to realize when complaining about these things is that hardly anyone takes the time to read the Terms of Service agreements, whether it's via purchasing something online or paying your T-Mobile bill. If one did, one would take note of the craftily worded crud a vendor created spelling out how they plan to share your information with others. So who is to blame when you bought something from Company A who sold your info to Company B and Company B loses your info or dishes it out? Or... Who will you blame when Company A who promised not to disclose your info is bought by Company B who made no promise to... As for privacy, get over it... It's diminishing slowly because you the people are allowing it.

--
Infiltrated dot Net

A bit confused? by Anonymous Coward · 2006-05-25 01:31 · Score: 0

OK, I'm a bit confused by this statement from the article:

Depending on which sources the semantic web references, it could gather together health records, lists of recent purchases or even contact details, that would build a more complete picture of a person than ever before.

"All of this data is public data already," said Mr Glaser. "The problem comes when it is processed."

Huh, health records being public? Since when? Wouldn't that violate HIPAA? And AFAIK, retailers would never make purchase information free for the taking, so it's hard to imagine them allowing a search engine to start mining that data (not that they couldn't work out a deal where they get a piece of the ad revenue that a search may produce).

communication and marketing? by DawnArdent · 2006-05-25 01:35 · Score: 1

"I personally fear the day that a machine or algorithm can better determine the purpose for my keyword-based search than I can."

I wonder how this will influence our language and communication in general. Can language itself (not only its use) be assimilated by marketing?

I shudder at the thought of 'marketspeak'...

It's already happening... by gravyface · 2006-05-25 01:38 · Score: 4, Insightful

...and growing and evolving.

Take a look at the "blogosphere" and the tagging/classification initiative that's happening there.

Sure, it seems crude and unrefined but it's working, like most grass-roots initiatives do when compared with grandiose "industry standards" and the big, bulky workgroups that try to define them.

--
body massage!

The idea is to make the web intelligent by Jon+Luckey · 2006-05-25 01:39 · Score: 2, Funny

Obligatory Skynet joke in

5...4...3...2...1

--
-- 3 events that reshaped the world in the 20th century: WW1, WW2, and WWW

Biz School by Doc+Ruby · 2006-05-25 01:50 · Score: 2, Insightful

"Big business, whose motto has always been time is money"

That motto is really "anything for a buck". Even if business has to wait or waste time to get money, it will wait until the cows come home - then sell them.

--

--
make install -not war

Re:Biz School by Doc+Ruby · 2006-05-25 14:59 · Score: 0, Offtopic

Moderation -1
100% Troll

TrollMods must get paid in dollars, because they certainly don't have sense.

--
--
make install -not war

Semantic Web ~- evil by tbriggs6 · 2006-05-25 01:51 · Score: 5, Informative

The article does a pretty bad job at explaining the situation. The idea behind the Semantic Web is simply to provide a framework for information to be marked up for machines rather than human eyes. The idea is that using an agreed upon frame of reference for the symbols contained in the page (an ontology), agents are able to make use of the information contained there. Further, an agent can collect data from several different ontologies and (hopefully) perform basic reasoning tasks over that data, and (even better) complete some advanced tasks for the agent's user.

The article would have us believe that this is going to expose everyone to massive amounts of privacy invasion. This is not necessarily the case. It is already the case that there are privacy mechanisms to protect information in the SW (e.g. require agents to authenticate to a site to retrieve restricted information). Beyond simple mechanisms, there is a lot of research being conducted on the idea of trust in the semantic web - e.g. how does my agent know to trust a slashdot article as absolute truth and a wikipedia article as outright fabrication (or vice versa).

As for making the content of the internet widely available, some researchers feel this will never happen. As another commenter noted, it is essential that there is agreement in the definition of concepts (ontologies) to enable the SW to work (if my agent believes the symbol "apple" refers to the concept Computer, and your agent believes it refers to "garbage", we may have some interesting but less than useful results). I am researching ontology generation using information extraction / NLP techniques, and it is certainly a difficult problem, and one that isn't likely to have a trivial solution (in some respects, this goes back to the origins of AI in the 1950's, and we're still hacking at it today).

For some good references on the Semantic Web (beyond Wikipedia), check out some of these links

Re:Semantic Web ~- evil by reblace · 2006-05-25 02:00 · Score: 1

I'm glad someone finally portrayed an accurate image of what the semantic web really is.
Re:Semantic Web ~- evil by Mayhem178 · 2006-05-25 02:10 · Score: 1

Thank you very much for saving me a few minutes of typing. This is a very accurate description of what the semantic web is intended to be. I thought for a minute that I was actually gonna have to dig up that buried knowledge from college, when I actually took a class based on the principles of the semantic web. Our final project was a website (using our own personally created ontology) that searched a database of Magic: The Gathering cards and could infer from even the most rudimentary search query which card(s) you were looking for with a surprisingly high success rate.

Mod parent up, up, up.

--
"You will pay for your lack of vision..." - Emperor Palpatine to Ray Charles
Re:Semantic Web ~- evil by Irish_Samurai · 2006-05-25 02:18 · Score: 1

I have done a little bit of casual research into ontology and have a question maybe you could answer.

Is it possible to have a markup structure that could handle this issue by searching for a "secondary key" bit of information to qualify the identifier? Using your example above of "apple":
<Item>
  <Primary ID>Apple</Primary ID>
  <Supplementary Id>Computer</Supplementary Id>
  <Supplementary Id>Power Book</Supplementary Id>
</Item>
<Item>
  <Primary ID>Apple</Primary ID>
  <Supplementary Id>Garbage</Supplementary Id>
  <Supplementary Id>Granny Smith</Supplementary Id>
</Item>
I know the code is ugly, and probably incorrect, but its just for an example.

Using a structure such as this couldn't two pieces of information cross reference each other and evaluate their relevancy to each other based on the "supplementary" data? Since the two pieces of "Apple" data have nothing else in common, they must not be related at all. And as my limited understanding of xml structures leads me to believe, couldn't two non-identical (one has more or less children than another) XML entities still compare data with each other if they at least agreed on the hierarchy and naming convention, allowing items with greater description still be matched to those with lesser descriptions based on major characteristics?
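The matching rule being asked about can be written down directly. Here is a hedged Python sketch that treats each item as a primary ID plus a set of supplementary IDs; the dictionary layout, the function names and the third "iMac" item are assumptions invented for this example.

```python
def shared_context(item_a, item_b):
    """Return the supplementary IDs the two items have in common."""
    return item_a["supplementary"] & item_b["supplementary"]


def same_concept(item_a, item_b):
    """Guess that two items match if the primary IDs agree and at least
    one supplementary ID overlaps. The items may have different numbers
    of supplementary IDs."""
    if item_a["primary"].lower() != item_b["primary"].lower():
        return False
    return bool(shared_context(item_a, item_b))


apple_computer = {"primary": "Apple", "supplementary": {"Computer", "Power Book"}}
apple_garbage = {"primary": "Apple", "supplementary": {"Garbage", "Granny Smith"}}
apple_imac = {"primary": "Apple", "supplementary": {"Computer", "iMac", "Desktop"}}
```

The two "Apple" items from the example share nothing and are kept apart, while the hypothetical iMac item matches the computer one despite having more children. The catch is that this only works if both sides spell "Computer" the same way, which is the agreement problem all over again.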
Re:Semantic Web ~- evil by tbriggs6 · 2006-05-25 02:38 · Score: 2, Informative

Ontologies for the Semantic Web are based on description logics (OWL DL) or first-order logics (OWL Full). We define classes and their relationships (T-Box definitions), and we define instance assertions (A-Box definitions).

For example, we could define the Apple domain as :

Classes: Computer, Garbage, ComputerMfg
Roles: makesComputer computerMadeBy

We can assign the domain of makesComputer to be a ComputerMfg, and the range to be a Computer (the inverse would be flipped).

<owl:Class rdf:ID="Computer"/>

<owl:Class rdf:ID="Garbage"/>

<owl:Class rdf:ID="ComputerMfg"/>

<owl:ObjectProperty rdf:ID="computerMadeBy">
  <rdfs:domain rdf:resource="#Computer"/>
  <rdfs:range rdf:resource="#ComputerMfg"/>
  <owl:inverseOf>
    <owl:ObjectProperty rdf:ID="makesComputer"/>
  </owl:inverseOf>
</owl:ObjectProperty>

<owl:ObjectProperty rdf:about="#makesComputer">
  <rdfs:domain rdf:resource="#ComputerMfg"/>
  <rdfs:range rdf:resource="#Computer"/>
  <owl:inverseOf rdf:resource="#computerMadeBy"/>
</owl:ObjectProperty>

Nothing about Apple yet. So,

We can assert that "APPLE" is a ComputerMfg (not Garbage), and that it is related to the symbol PowerBook by the makesComputer / computerMadeBy relationship.

<Computer rdf:ID="PowerBook">
  <computerMadeBy>
    <ComputerMfg rdf:ID="APPLE">
      <makesComputer rdf:resource="#PowerBook"/>
    </ComputerMfg>
  </computerMadeBy>
</Computer>

So, using the Semantic Web (as it stands) requires crisp description logics, and admits (almost) no ambiguity. For those who want to pick at me, yes, OWA and UNA make things a little strange.

Given that natural language is fraught with uncertainty, this is the root of the automatic ontology generation problem (and the beginning of my research).
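To show what an agent gains from those definitions, here is a small Python sketch of two inferences a reasoner could draw: the inverse assertion, and class membership from domain and range. It is a hand-rolled illustration over tuples, not an OWL reasoner, and the table and function names are assumptions; only the class and role names come from the example above.

```python
# T-Box: role definitions taken from the example above.
DOMAIN = {"makesComputer": "ComputerMfg", "computerMadeBy": "Computer"}
RANGE = {"makesComputer": "Computer", "computerMadeBy": "ComputerMfg"}
INVERSE = {"makesComputer": "computerMadeBy", "computerMadeBy": "makesComputer"}


def infer(assertions):
    """Expand a set of (subject, role, object) A-Box assertions.

    Returns (types, closed), where closed adds every inverse assertion
    and types holds the (individual, class) pairs implied by the
    domain and range of each role.
    """
    closed = set(assertions)
    for subject, role, obj in assertions:
        closed.add((obj, INVERSE[role], subject))
    types = set()
    for subject, role, obj in closed:
        types.add((subject, DOMAIN[role]))
        types.add((obj, RANGE[role]))
    return types, closed
```

Starting from the single fact that APPLE makesComputer PowerBook, the sketch concludes that PowerBook computerMadeBy APPLE, that APPLE is a ComputerMfg and that PowerBook is a Computer. Nothing ever places APPLE in Garbage, which is the crispness described above.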

Pfff, the problem is marketing by SmallFurryCreature · 2006-05-25 01:54 · Score: 4, Insightful

Let's use the holiday example given in the article. So I got a hotel that is 54 dollars per night. That means I am not going to be included in the below 50 dollar search. Hmmm, I don't want that. I want maximum exposure. So I lower my price to 49 dollars + 10 dollars in extra fees that are a surprise when you receive the bill (what you say? 49+10 > 54? Of course you idiot, any price cut must be offset by higher charges elsewhere.)
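The arithmetic is easy to put into a Python sketch. The two listings and both function names are invented to mirror the example; the point is only that a search over the published field rewards whoever hides the fees.

```python
# Hypothetical listings; the numbers follow the example above.
HOTELS = [
    {"name": "Honest Inn", "advertised": 54, "fees": 0},
    {"name": "Sneaky Suites", "advertised": 49, "fees": 10},
]


def under_advertised(hotels, limit):
    """What a search sees if it trusts the published price."""
    return [h["name"] for h in hotels if h["advertised"] < limit]


def under_total(hotels, limit):
    """What the guest actually pays: advertised price plus fees."""
    return [h["name"] for h in hotels if h["advertised"] + h["fees"] < limit]
```

A "below 50 dollars" search on the advertised price returns only the hotel that really costs 59, and no markup standard can compute the second function if the fees are never published.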

You could already do this semantic web nonsense if people would just stick to a standard and be honest with what they publish.

Nobody wants to do that however. Mobile phone companies always try to make their offering sound as attractive as possible by highlighting the good points and hiding the bad ones. Phone stores try to cut through this by making their own charts for comparing phone companies but in turn try to hide the fact that they get a bigger cut from some companies than others.

It wouldn't be at all hard to set up a standard that would make it very easy to tell what cell phone subscription is best for you. Getting the companies involved to participate is impossible however.

This is the real problem with searching the web right now. It wouldn't be at all hard to use google today if everyone was honest with their site content. For instance, remove the word "review" from a product page if no review is available.

Do you think this is going to happen any day soon? No, then the semantic web will not be with us any day soon either.

--

MMO Quests are like orgasms:

You may solo them, I prefer them in a group.

Who Web? by geobeck · 2006-05-25 02:05 · Score: 2, Funny

How many people read this and thought "Okay, what have they done with Norton now?"

--
Find environmentally and socially responsible products on http://buy-right.net

Someone is being hysterical... by FellowConspirator · 2006-05-25 02:10 · Score: 1

The term "semantic web" generally refers to Tim Berners-Lee's notion of "semantic web" -- he coined the term. TBL's vision is simply a model of describing information. One expression of it is RDF-XML, another is data in N3 notation, but the core of it is the idea that you can express most information as a simple triplet of data: subject-predicate-value (e.g.: the sky - is - blue). That's basically it. It doesn't even have to have anything to do with "the Web" in the sense of the Internet.

The idea, however, is that you can represent information, and even information about that information, as a graph (where the value, or object, of one triple is itself the subject of another triple). If everything is uniformly presented, there's a slew of common operations you can perform on it, and merging data, making inferences from it, and such becomes easier.
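A bare-bones Python sketch of triples as a graph, with no RDF library involved. The predicate names and the three helper functions are invented for this example; the only idea taken from above is that everything is a subject-predicate-value triple and that a value can be the subject of another triple.

```python
def merge(*graphs):
    """Merging uniformly shaped data is just a set union of triples."""
    return set().union(*graphs)


def objects(graph, subject, predicate):
    """Return every value linked from subject by predicate."""
    return {o for s, p, o in graph if s == subject and p == predicate}


def follow(graph, start, predicates):
    """Walk a chain of predicates, using each value as the next subject."""
    frontier = {start}
    for predicate in predicates:
        frontier = {o for node in frontier
                    for o in objects(graph, node, predicate)}
    return frontier


site_a = {("sky", "hasColor", "blue")}
site_b = {("blue", "isA", "color")}
```

Once the two sets are merged, following hasColor and then isA from "sky" lands on "color", a small inference that neither source states by itself.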

The "semantic web" could be used to represent social networks (the apparent use case that is cause for hysteria), but it is not necessary and might not even be the best approach. Certainly, "semantic web" doesn't equate to social networks. For me, the greatest value is a method to store information about biological systems -- but then that's the field I work in. It could be used to add useful metadata to the web, it could be useful for making taxonomies and ontologies, all sorts of things.

However, the "semantic web" is itself about as sinister in nature as plain text.

This is actually insightful by Flying+pig · 2006-05-25 02:11 · Score: 1

Semantic web and growing databases will just cause people to elect to disappear behind a cloud of confusion. Many Google searches are already useless because of dishonest advertisers; a web version of Gresham's Law says that eventually the web will become useless because bad information will flood out good information. Meanwhile, the technically literate will continue to hide themselves with anonymity and usage patterns that are deliberately inconsistent and disruptive. Eventually the only people the government and the advertisers will know about are the people too poor or stupid to take steps - and they are not the people whose data is wanted.

Although I'm not particularly paranoid, I have no loyalty cards, only one credit card, frequently pay with cash and never borrow. In the UK, other people are registering cars at accommodation addresses so they cannot be located by speed trap and congestion charge operators, renting to avoid local government records, and generally finding ways to disappear.

--
Pining for the fjords

Re:This is actually insightful by TobascoKid · 2006-05-25 03:12 · Score: 1

Although I'm not particularly paranoid, I have no loyalty cards, only one credit card, frequently pay with cash and never borrow.

It's because of people like you that we're getting identity cards. Would it have killed you to join Tesco's loyalty programme? ;-)

renting to avoid local government records

How does renting help? You still need to be on the electoral register and you need to pay council tax - in both cases it doesn't matter if you're renting or owning. And what about TV Licensing? Not having a TV Licence is almost as likely to land you on a government database as having one is.

generally finding ways to disappear

Good luck to them, but life as a blank has got to suck, and it's just going to get worse as time goes by.

--
At some point, somewhere, the entire internet will be found to be illegal.
Re:This is actually insightful by Anonymous Coward · 2006-05-25 10:52 · Score: 0

And what about TV Licensing? Not having a TV Licence is almost as likely to land you on a government database as having one is.

And there I'm apparently listed as "The Legal Occupier", if the futile threats & bluster the TV licensing computer prints out and sends to me every month are any indication. It almost tempts me (apart from the waste of money) to get a TV licence for a year as "His Holiness the Pope", just for the letters they'd send on non-renewal.

Healthcare? by Anonymous Coward · 2006-05-25 02:12 · Score: 1, Informative

The guy claims that health records are public data? Well, that's a BBC site, but in the U.S. they decidedly are not, since HIPAA was passed.

But all this semantic web stuff makes me giggle when they start talking about healthcare, anyway. I worked in that industry up until a couple years ago. Semantic web people want to move everybody away from EDI...while the healthcare people are struggling to upgrade to EDI. In 2003 I was setting up imports of fixed-length mainframe records. By the time healthcare is exchanging RDF over the Web, we'll all have nanobots in our blood and won't need doctors anymore anyway.

Re:Healthcare? by SixDimensionalArray · 2006-05-25 04:27 · Score: 1

I was interested that you posted about the healthcare industry, because I work in it today, and also went to a university which has done quite a bit of research into the area of health & bio informatics. From the research, it is clear that the semantic web and healthcare are actually a great match for each other, particularly when it comes to things like concepts & ontologies (for example, check out MeSH if you haven't seen it before).

Another example of how semantics make sense for healthcare is in identifying how to properly code a particular diagnosis or procedure. For example, if I know that going to visit your doctor in the office is a procedure represented by the number 99213, I need to be able to figure that out when I send in my medical bill. Here's a hierarchy that might help in this scenario:
General Example: Where->Who->How->What
Specific Example: In a doctor's office->by a doctor->in person examination->99213
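
A rough sketch of that lookup in Python, assuming the hierarchy is stored as nested dictionaries. Only the code 99213 comes from the example above; the structure and the `find_code` helper are hypothetical:

```python
# Hypothetical hierarchy: Where -> Who -> How -> What (the billing code).
# Only 99213 comes from the example above; the rest is invented.
HIERARCHY = {
    "doctor's office": {
        "doctor": {
            "in-person examination": "99213",
        },
    },
}

def find_code(hierarchy, *path):
    """Walk one level per path element; return None if any step is missing
    or if the path stops before reaching a code."""
    node = hierarchy
    for step in path:
        if not isinstance(node, dict) or step not in node:
            return None
        node = node[step]
    return node if isinstance(node, str) else None

print(find_code(HIERARCHY, "doctor's office", "doctor", "in-person examination"))
```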

However, good will only come of this if semantic technologies ever make the leap from research to practice for ordinary healthcare professionals.

As for EDI, it is a crime that such poor standards exist today (the HIPAA X12 transaction set springs to mind), and that a technology like XML markup exists, but is not the basis of our standards. XML has a lot of overhead over say, your fixed width files, but it sure makes parsing this data and finding columns and fields a lot easier!
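
To illustrate the parsing point, here is a small Python sketch. The record layout, field names and sample values are all invented (this is not X12 or any real format). The fixed-width parser needs the column offsets supplied out of band, while the XML record carries its own field names:

```python
import xml.etree.ElementTree as ET

# Hypothetical layout: (field name, start, end) column offsets.
# The parser cannot work without this table.
LAYOUT = [("member_id", 0, 8), ("procedure", 8, 13), ("amount", 13, 20)]

def parse_fixed(line):
    """Slice a fixed-width record using the externally agreed offsets."""
    return {name: line[start:end].strip() for name, start, end in LAYOUT}

def parse_xml(text):
    """Read the same fields from XML; the names travel with the data."""
    root = ET.fromstring(text)
    return {child.tag: (child.text or "").strip() for child in root}

fixed_record = "A1234567992130075.00"
xml_record = (
    "<claim><member_id>A1234567</member_id>"
    "<procedure>99213</procedure><amount>0075.00</amount></claim>"
)

print(parse_fixed(fixed_record))
print(parse_xml(xml_record))
```

The XML record is roughly five times the size of the fixed-width one, which is the overhead mentioned above.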

SixD
Re:Healthcare? by Anonymous Coward · 2006-05-25 05:35 · Score: 0

Oh, I agree, I'm just skeptical that the mainframe-based insurance companies we used to work with will actually implement anything like this in the foreseeable future...although there were also a couple using XML, so maybe there's some hope. But getting a connected system, as opposed to standalone software that uses this stuff internally...that'll be a long time. Nobody wants to spend money changing until the government makes them do it, and we just changed to X12.

(Speaking of which, I wrote a parser for X12...took some work, but at least it was systematic, and a reasonably simple grammar...not as plug-and-play as XML, but better than those arbitrary fixed-width files!)

The next great leap by truthsearch · 2006-05-25 02:20 · Score: 1, Insightful

The next great leap in searching the web won't be due to the semantic web. It'll be natural language processing. Soon the day will come when you will be able to type in a "real" question and truly get the best answers back. We all know keyword searching doesn't cut it. But a complete question can be translated into a logical query. It'll require no change to current web pages. Just a much smarter search engine.

--
Developers: We can use your help.

Semantic Web vs. Contextual Web Mining by saddino · 2006-05-25 02:26 · Score: 2, Insightful

All the hoopla around the Semantic Web reminds me exactly of the days "XML" became the latest high-flying meme touted by "tech" writers en masse. Witness:

The semantic search engine would then cross-reference all of the information about hotels in Majorca, including checking whether the rooms are available, and then bring back the results which match your query.

And here in all its glory is the 1999 version:
The software would then use XML to cross-reference all of the information about hotels in Majorca, including checking whether the rooms are available, and then bring back the results which match your query.

Of course, the problem with this fantasy of XML was that the lack of schema standardization led to an infinite mix of tagging, and thus the layperson's idea that "this XML document can be read and understood by any software" was pure bunk.

Granted, the semantic web addresses many of these problems, but IMHO the underlying problem remains: layers of context on top of content still need to be parsed and understood.

So the question remains: will the Semantic Web be implemented in a useful fashion before someone develops a Contextual Web Mining system that understands web content to a degree that it fulfills the promise of the Semantic Web without additional context?

Disclaimer: I work on contextual web content extraction software so yes I may be biased towards this solution, but I really think the Semantic Web has an insanely high hurdle (proper implementation in millions of web pages) before we can tell how successful it is.

Re:Semantic Web vs. Contextual Web Mining by Peter+Mork · 2006-05-25 03:07 · Score: 3, Interesting

The semantic web is a step up from XML. In an XML document, there is a great deal of information implicitly stored in the structure of the document. A human is (often) able to guess what the implied relationship is between the parent element and the child element, but machines are still poor at guessing. By making the relationship explicit (using RDF) a machine has a better chance of identifying the nature of the relationship. Of course, you still need standard tags, but it's easier to talk about named relationships rather than tacit relationships. (And my dissertation revolved around building Semantic Web infrastructure in a peer-to-peer setting.)
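
A toy illustration in Python of the implicit-versus-explicit point. The document, the names and the predicates are all invented, and the "RDF-style" part is just tuples rather than a real RDF library:

```python
import xml.etree.ElementTree as ET

doc = ET.fromstring("<person><name>Ann</name><city>Oslo</city></person>")

# All the XML structure tells a program is that <city> is nested in <person>.
# Born there? Lives there? Works there? A human guesses; a machine cannot.
implicit = [(doc.tag, "contains", child.tag, child.text) for child in doc]

# Explicit triples name the relationship (hypothetical predicates).
explicit = {
    ("Ann", "was_born_in", "Oslo"),
    ("Ann", "works_in", "Bergen"),
}

def related(graph, subject, predicate):
    """Objects linked to subject by the named relationship."""
    return {o for s, p, o in graph if s == subject and p == predicate}

print(implicit)
print(related(explicit, "Ann", "was_born_in"))
```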
Re:Semantic Web vs. Contextual Web Mining by apathy+maybe · 2006-05-25 12:40 · Score: 1

XML documents can be read everywhere. But two things need to happen: the doctype (DTD, whatever it is) needs to be available to the software, and there needs to be some styling (CSS for example) that enables the document to be displayed nicely. If everyone did this (assuming that the XML is well formed) it would be wonderful. XHTML is XML and can be read everywhere: the DTD is freely available, and CSS is either included with each document or web browsers have a default.

--
I wank in the shower.

OT THANKS by Irish_Samurai · 2006-05-25 02:54 · Score: 1

Thanks for the information. A quick search for the W3C references on OWL fleshed out a lot of what you were saying, and let me know where I was getting off track.

Well by aftk2 · 2006-05-25 02:55 · Score: 2, Informative

The semantic web would have to be feasible before it posed some sort of threat, so I wouldn't get too up in arms about this.

--
concrete5: a cms made for marketing, but strong enough for geeks.

Re:Well by Anonymous Coward · 2006-05-25 03:43 · Score: 0

A rebuttal.

Practical Applications??? by vrochette · 2006-05-25 03:03 · Score: 0, Troll

I've been trying to look for practical applications of Semantic Web. Can't find them anywhere. So far, what W3C proposes is a very high-level language. Of course, DAML, OWL... have nice features like cardinality constraints, axioms and so forth. The idea is to build more logic into webpages. This is all theory. In practice, I don't really see how this will help make the web smarter. You can't expect people to write OWL by hand--like in the early days of HTML. We still lack an automated way of building taxonomies, and deriving the document's context and logical links with all other WWW documents. That's anyway the purpose of search engines, and so far Google is doing a pretty nice job at it. The next step is to build The Inference Engine, moving closer to A.I. Obviously we're not there yet. I think people who have the answer to this would reach the holy grail of computing. Not to mention being super rich!

Re:Practical Applications??? by Anonymous Coward · 2006-05-25 03:53 · Score: 1, Interesting

There are plenty of OWL editing tools, besides which most people won't be writing their own ontologies anyhow.

There are already lots of inferencing engines, too - Sesame, cwm, etc. It's really not a big deal; the whole point of RDF is that the architecture makes this stuff easy.
Re:Practical Applications??? by Anonymous Coward · 2006-05-25 04:15 · Score: 1, Interesting

There are plenty of OWL editing tools, besides which most people won't be writing their own ontologies anyhow. There are already lots of inferencing engines, too - Sesame, cwm, etc. It's really not a big deal; the whole point of RDF is that the architecture makes this stuff easy.
CWM sucks big time. Just go ask the semantic web researchers out there how awful it is and how poorly it scales. In fact, google and see what results you find.

Oy Vey! by Itninja · 2006-05-25 03:13 · Score: 1

At first glance I thought this was about Semitic Web.....
Now THAT would be something.

--
I judt got a nre Kinesis keybiartf so please excusr ant egregiou typos.

outrage !! by TTL0 · 2006-05-25 03:13 · Score: 1

this is anti-semantic. does the ADL know about this ?

--
Sanity is the trademark of a weak mind. -- Mark Harrold

symantec web? by jweller · 2006-05-25 03:14 · Score: 1

Oh good. They are finally going to include anti-virus on the web.

Glass Houses by Baavgai · 2006-05-25 03:16 · Score: 4, Insightful

"All of this data is public data already," said Mr Glaser. "The problem comes when it is processed."

The privacy and security concerns are bizarre. They're saying that there is currently an implicit "security through obscurity" and that's ok. However, if someone were to make already-available data easier to find, then it would be less secure?

Here's a radical thought; don't make any data public you don't want someone to see. Blaming Google because you put your home address on your blog and "bad people" found you is absurd. If data is sensitive it shouldn't be there now.

You can't really bitch about peeping Toms if you built the glass house.

Re:Glass Houses by Narphorium · 2006-05-25 07:46 · Score: 1

The Semantic Web isn't about data, it's about metadata. Metadata is often automatically extracted from the data itself and made available without any interaction from the user. So it's not the fact that your data is published on the web that makes it a security risk, it's the fact that a bunch of automatically generated metadata was added to it, possibly without your knowledge. Think of all the trouble that has come from Word document metadata being put on the web.
The problem is that no one is willing to manually annotate their data because the time required usually outweighs the benefits, so the only way to make the Semantic Web work is with tools that automatically annotate data as it's processed.
Re:Glass Houses by Baavgai · 2006-05-25 23:37 · Score: 1

a bunch of automatically generated metadata was added to it, possibly without your knowledge. Think of all the trouble that has come from Word document metadata being put on the web.

I'm not sure that this is the gist of the article I read, but it is an interesting thought.

That's more under the heading of the cute, insecure ideas that just won't die. Why does a web server tell me its name and build number? Why does web publishing software include the registered user's name in the metadata? It's value added but security daft.

The crux of the problem is that development is antithetical to security. A software designer wants to provide features that make the user happy and make their product more desirable to use. The problem is, every feature has the potential to be a hole and some of the more desirable features are the biggest holes.

I would agree that web publishing tools should have personal metadata turned off by default. Would be nice if they stuck in some useful Dublin Core stuff, though... ;)

Already took care of it by alienmole · 2006-05-25 03:28 · Score: 1

I sent a terminator unit back in time to kill all the Slashdotters who were about to make an obligatory Skynet joke.

I have a chapter on SW in my new book by MarkWatson · 2006-05-25 03:38 · Score: 2, Informative

I am both a sceptic and a fan of the SW. I dislike XML serialization of RDF (and RDFS and OWL) - to me the SW is a knowledge engineering task and frankly 20 year old Lisp systems seem to offer a friendlier notation and a much better working environment. If you are a Lisp-er, check out the (now) open source Loom system that supplies a description logic reasoner and other goodies.

The Protege project provides a good (and free) editor for working on ontologies - you might want to grab a copy and work through a tutorial.

I think that the SW will take off, but its success will be a grass roots type of effort: simple ontologies will be used in an ad hoc sort of way and the popular ones might become de facto standards. I don't think that a top-down standards approach is going to work.

I added a chapter on the SW to my current Ruby book project, but it just has a few simple examples because I wanted to only use standard Ruby libraries -- no dependencies makes it easier to play with.

I had a SW business idea a few years ago, hacked some Lisp code, but never went anywhere with it (I tend to stop working on my own projects when I get large consulting jobs): define a simple ontology for representing news stories and write an intelligent scraper that could create instances, given certain types of news stories. Anyway, I have always intended to get back to this idea someday.

Going around the utopy by Hekanibru · 2006-05-25 03:49 · Score: 1

The ideal Semantic Web is a beautiful dream, nothing more. Sure it'd be nice to have it around to start worrying about the consequences regarding security, privacy and everything else, but the ideal conceptualization seems to be impossible to implement.

The Semantic Web I see coming is one where many different, limited domains are identified and semantically annotated, allowing some kind of agents to perform well-defined activities for us (i.e., book tickets, make appointments, search info, etc.). This sounds more reasonable but although feasible, it is not an easy thing to do, for several very interesting challenges arise. Hence, the task of the research community should be to make this limited Semantic Web possible.

Uhmm... by C10H14N2 · 2006-05-25 03:54 · Score: 1

The "Semantic Web" is basically the intersection of RDF+OWL, that is to say, it is entirely about taxonomy. The whole idea is that you have a certain nomenclature that you assert against known values, someone else has a different nomenclature that they assert against the same values. You can now cross-reference with a high degree of confidence, for example by using the Dublin Core as the common reference.

I get people all the time dismissing the whole idea because "man, you'd have to agree on definitions" or "how does 'it' know?" Right. "It" doesn't unless it is explicitly told. If what you call a "House" is in a well-known schema, you simply add an equivalency in your schema et voila, une maison est une 'House.' So, someone else comes along and they want to assert that 'ein Haus ist une maison,' so they assert against the previous schema, and now implicitly ein Haus=une maison=a House. No one had to make the last assertion as it was implicitly true from the previous assertions. So now, in your schema, you make all sorts of categorical assertions about other things relating to houses. Your French and German counterparts now have them for free, as do you theirs. Yes, it takes work, no it isn't completely automatic, yes it is limited to strict taxonomies, but it is still very, very powerful.
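
Here is a hedged sketch of that chain of assertions in Python. It is not OWL and not a real reasoner, just a graph search over equivalence pairs; the `en:`/`fr:`/`de:` prefixes and the `is_a` fact are invented for illustration:

```python
from collections import defaultdict

def build_index(equivalences):
    """Equivalence is symmetric, so record each assertion in both directions."""
    neighbors = defaultdict(set)
    for a, b in equivalences:
        neighbors[a].add(b)
        neighbors[b].add(a)
    return neighbors

def equivalents(neighbors, term):
    """Every term reachable through a chain of equivalence assertions."""
    seen = {term}
    stack = [term]
    while stack:
        current = stack.pop()
        for other in neighbors.get(current, ()):
            if other not in seen:
                seen.add(other)
                stack.append(other)
    return seen

def facts_about(facts, neighbors, term):
    """Facts asserted about any equivalent term come along for free."""
    same = equivalents(neighbors, term)
    return {(p, o) for s, p, o in facts if s in same}

# Two independent assertions; nobody ever states de:Haus = en:House directly.
equivalences = [("fr:maison", "en:House"), ("de:Haus", "fr:maison")]
facts = {("en:House", "is_a", "Building")}

neighbors = build_index(equivalences)
print(equivalents(neighbors, "de:Haus"))
print(facts_about(facts, neighbors, "de:Haus"))
```

The German term picks up the `is_a` fact that was only ever asserted against the English one, which is the "for free" part.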

SKYNET vs "Intelligent web" .. by Entropy · 2006-05-25 03:57 · Score: 2, Funny

The thing is, SKYNET was a military based computer and it gave us "Judgement Day".

I dare hypothesize that if a truly intelligent web ever arose, it would have a strong porn background.

I shudder to think of what its version of Judgement Day would be ..

--
The sea changes color, but the sea does not change.

Lie to Telemarketers! by ldheinz · 2006-05-25 04:01 · Score: 1

All this cross-referencing of data for the purpose of data mining (this appears to be simply a refining of data mining) is why I lie my butt off to telemarketers. Only by filling their collective coffers with garbage can we hope to undo their efforts. Currently, I'm a white male computer engineer when doing politically correct things, I'm a hispanic female for others, and I have many other identities. I pick identities at random for activities that I don't want traced, and I keep them active by doing strange things for the hell of it. Incidentally, I've been doing this since the mid-seventies when I registered for college as a black man. I changed races several times in college, as black student groups kept insisting that I join up. They get downright pushy about it. I think I ended up an American Indian - oops, Native American. What's it called this week?

books, newspapers, magazines are dangerous too by dbc001 · 2006-05-25 04:11 · Score: 1

I hope you people all understand that books, magazines, and newspapers are just as dangerous - anyone can publish private information using these technologies!
Come on, this is absurd. If anything this article underscores the need for privacy laws - but the privacy implications of the semantic web are hardly any more significant than any other publishing method.

It's the protocols, silly by Anonymous Coward · 2006-05-25 04:15 · Score: 0

The key problem is that we're bound by the inherent limitations of HTTP/HTML, which were developed as generic mechanisms for serving hypermedia. Within the parameters of hypertext, free text search with a nifty ranking mechanism is about as good as it gets, and Wikipedia is the killer app.

We need new protocols for richer information. How about a multimedia interaction protocol? Commerce/finance could probably use its own protocol. An "office productivity" protocol would be better than current efforts to shoehorn the MS Office filetypes into hypertext browsers. Machine-parseable semantic information may be a good extension of HTTP/HTML, but the waters have been muddied by the overuse of those protocols for information that doesn't really fit.

A heterogeneity of protocols and clients, customized for information types, would reduce the need for monolithic security fixes. It would be good for the software industry and -- in the long run -- good for users. The hypertext browser, with all its interface assumptions, has somehow been turned into the do-everything program and that model is not serving anyone well.

Big Business Already Has This: by Rude+Turnip · 2006-05-25 05:05 · Score: 1

"Big business, whose motto has always been time is money, is looking forward to the day when multiple sources of financial information can be cross-referenced to show market patterns almost instantly"

That's the Bloomberg service in a nutshell. Yes, same company founded by current NYC mayor, Michael Bloomberg. As an example, I was able to simultaneously examine various financial ratios of about 1,200 companies along with their current market values. Depending upon where certain ratios went, I flagged them as candidates for a mailing of my company's financial consulting services. I was even able to use the Bloomberg system to download the names and addresses (office, not home of course) of the CEO and CFO of each company I flagged as a mailing list candidate.

--
Bill Clinton: Pimp we can believe in. - The Shirt!!!

Re:Big Business Already Has This: by Anonymous Coward · 2006-05-25 07:44 · Score: 0

I don't think the full Bloomberg service is free to the public. It's available if you pay for a license or a Bloomberg terminal.

The OTHER massive issue by SamSim · 2006-05-25 06:54 · Score: 2, Insightful

The huge, glaring issue with the Semantic Web idea that I see is: how do you force the creators of web content to put the right semantic tags on their content? What's to stop there being thousands of sites full of nothing but semantic tags so that even Swoogling for "747" brings up porn first? The clear answer is that the tags will have to be out of the control of the creators of the web content. That means somebody or someTHING else - namely, your Semantic Web search engine of choice - has to figure out your site's tags for you. And the ONLY way to accurately judge, classify and rank a web page is by its actual, real content. This is just another way of looking at the same problem. I'm waiting to be impressed.

--
qntm.org

You're kidding by kahei · 2006-05-25 08:55 · Score: 1

There's still people out there doing that stuff? That's too much! Good luck, semantic web dudes!

N.B. The above is a flippant, snide, and unhelpful comment. However, in my defence, I submit that that is _exactly_ the sort of comment that any remaining semantic web diehards should be most used to hearing.

--
Whence? Hence. Whither? Thither.

79 comments