No, I just did a search for "semantic web" and copied and pasted the first result. I didn't realize they were sending people through google.com/url now; it used to just go straight there. When did they start doing that?
I'm not sure our sponsors in the military and intelligence agencies would fund research in breast sizes. They're kind of a sensitive bunch. But maybe I'll stick it in a proposal and see what happens.
I think the comment that semantic web research has focused on logic such as query analysis, comparisons, and groupings is fair for the Semantic Web in general.
For Search on TAP we don't have a lot of people or resources. Despite that, I spend an awful lot of time generating data. The compressed RDF, which we've made available for people to play with, is over a hundred megabytes.
If the Semantic Web is going to happen, there needs to be a lot of data, so we're doing everything we can to make that data available for people to use.
An automated technique that could do better than a human tagger would have an additional feature of being able to pass the Turing Test.
I admire your faith in automated techniques, since the ones I've seen have a catastrophic error rate and can't provide particularly rich data. The state of the art there is constantly improving, though, and there's no reason why such algorithms can't generate RDF anyway. The Semantic Web is about file formats and conventions; it doesn't necessarily mean human tagging.
For instance, at the lab here we work with IBM researchers who created the UIMA framework, and with some of the people who did WebFountain. The UIMA framework people that we work with dump us their data in two forms, a big OWL file, and a database that contains information from the extractors about where in the text each piece of information came from.
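To make the "algorithms can generate RDF too" point concrete, here's a rough sketch of how an extractor's output could be split into those two forms: a statement serialized as an N-Triples line, plus a provenance row recording where in the text it came from. This is not UIMA's actual output format. The `Extraction` class, the function names, and the example.org URIs are all made up for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Extraction:
    """One fact pulled out of text by a (hypothetical) automated extractor."""
    subject: str    # URI of the entity the fact is about
    predicate: str  # URI of the property
    value: str      # literal value found in the text
    doc: str        # URI of the source document
    start: int      # character offset where the evidence begins
    end: int        # character offset where the evidence ends


def escape_literal(text: str) -> str:
    """Escape backslashes, double quotes, and newlines for an N-Triples literal."""
    return text.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def to_ntriple(e: Extraction) -> str:
    """Serialize the statement itself as one N-Triples line."""
    return f'<{e.subject}> <{e.predicate}> "{escape_literal(e.value)}" .'


def provenance_row(e: Extraction) -> tuple:
    """Row for a side table recording where in the text the statement came from."""
    return (e.subject, e.predicate, e.doc, e.start, e.end)


if __name__ == "__main__":
    fact = Extraction(
        subject="http://example.org/person/YoYoMa",   # assumed URI
        predicate="http://example.org/prop/plays",    # assumed URI
        value="cello",
        doc="http://example.org/doc/1",               # assumed URI
        start=10,
        end=15,
    )
    print(to_ntriple(fact))
    print(provenance_row(fact))
```

Nothing in the RDF line says whether a person or a program produced it, which is the point: the format doesn't care who did the tagging.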
This theConcept tool you link to, at a casual glance, looks similar to Yahoo's recent Y!Q beta. I haven't put the two next to each other to see how they compare, though, so I could be off in the weeds.
I replied to a lower-scored post that asked this same question: we haven't had this problem yet, but it's a problem that exists with any technique, whether it's Wikipedia, an automated technique like WebFountain, or the Semantic Web. It's an Internet problem.
A followup to this post mentioned using a web of trust to counteract spam. That's something that Guha has done a lot of work with, and Paulo is working in the lab here on some prototypes based on movie data.
Spam is a problem I would love to have because it would mean that people are serious enough about the Semantic Web to find something to gain in spamming it.
Phrasing it that way, that it works best with any standards-compliant browser, doesn't get the point across to those who think IE is a standards-compliant browser.
Search on TAP has been tested with Firefox on Linux, Windows, and OS X, and with IE on Windows. I think Andy might have tried it with Safari. I haven't tested it with Opera. With IE, I had to redo how the dynamic HTML was being generated twice to get around its limitations, and it's still ignoring my alignment tags.
Saying it works with standards-compliant browsers assumes the reader knows that IE sucks, which isn't always the case.
Besides, I'm ex-Netscape; we're supposed to cheese people off with our browser rah-rahs.
We haven't really dealt with the spam problem because it's a problem we'd love to have. Right now there's so little content that we can afford to only pick the highest quality sites.
The automated techniques like those WebFountain uses are susceptible to the same problems, as is Wikipedia, so I'm not convinced that this is necessarily a Semantic Web problem as much as an Internet problem.
Logical reasoning is currently primitive and definitely overrated. We don't use OWL. The reasoning we do is very primitive, and is not of the sort that Clay Shirky is talking about. I actually agree with the thrust of his essay, despite the flaws that others have pointed out.
TimBL has talked about the Semantic Web as less a thing of logic and more like a giant database. I think that characterization has some problems also, but it's closer to what Search on TAP is doing.
Normally I would agree with you, but we added autocomplete for a very real reason.
As a prior post pointed out, the most important problem with the Semantic Web is getting people to generate data. Until that happens on a widespread basis, the data coverage will always be spotty compared to a keyword engine.
We added the Autocomplete dropdown in response to user feedback that they had no idea what was in the system until they hit "enter", and by then it was too late. The dropdown gives immediate feedback about whether or not the system has a clue.
The current autocomplete implementation is clumsy and harasses the user with some garbage that it shouldn't, and it's also missing an important feature (suggesting properties to add to class requests), but since this is research we get to play around with ideas that might not pan out.
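For anyone curious what "immediate feedback" amounts to, here's a minimal sketch of a prefix lookup over the labels a system knows about. It is not the Search on TAP implementation. The `PrefixIndex` class and the sample labels are made up, and a sorted list with binary search is just one simple way to do it.

```python
import bisect


class PrefixIndex:
    """Case-insensitive prefix lookup over a fixed set of known labels."""

    def __init__(self, labels):
        # Store (lowercased key, original label) pairs in sorted order so that
        # all labels sharing a prefix sit next to each other.
        self._keys = sorted((label.lower(), label) for label in set(labels))

    def suggest(self, prefix, limit=10):
        """Return up to `limit` labels starting with `prefix`, or [] if none."""
        p = prefix.lower()
        if not p:
            return []
        # (p,) sorts before every (p..., label) pair, so this finds the first match.
        i = bisect.bisect_left(self._keys, (p,))
        matches = []
        while i < len(self._keys) and len(matches) < limit:
            key, label = self._keys[i]
            if not key.startswith(p):
                break
            matches.append(label)
            i += 1
        return matches


if __name__ == "__main__":
    index = PrefixIndex(["Yo-Yo Ma", "Yosemite", "Paris", "Paris Hilton"])
    print(index.suggest("yo"))   # labels the system knows that start with "yo"
    print(index.suggest("zz"))   # empty list: the system has no clue
```

An empty result while the user is still typing is the useful part: it tells them the data isn't there before they hit "enter".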
I think I'll lock the door so the IS department can't find me.
There's a coral cache of the static content, including screenshots, if you can't get through to my melted pile of servers.