This may be the most blatantly misleading submission to /. that's made it past the editors. Yeah, we all complain about dupes, but that's just an annoyance. Is anyone really doing any "editing" here?
Thanks.
Well, as an author of one of those, er, in your words "stupid" posts, I can assure you that I didn't mean to imply UCI's research was trivial. Rather, it was the press release that was trivial, and a bit of a puff piece IMHO, suggesting that:
;-)
"To put it simply, text mining has made an evolutionary jump. In just a few short years, it could become a common and useful tool for everyone from medical doctors to advertisers; publishers to politicians."
And my point still is that nobody needs to wait a few short years to do decent text mining from unstructured data. Can our software handle 300,000 articles from the NYT? Clearly not, but then again, we're not running our software on desktop machines. Fact is, a million words (or about 3000 NYT articles) is a trivial task for our software and allows people to use text mining today.
Now, back when I went to Cornell, I thought my peers expressed a bit more intellectual curiosity about software, especially the free kind that would allow them to save their $ for The Palms. But times do change, and if you think "stupid" is an accurate assessment of my post then more power to you.
And for the rest of you, yeah, I'm going to end this with a plug (natch): download CQ web for OS X or Windows if you want to see how text mining works on web search result pages.
The demonstration is significant because it is one of the earliest showing that an extremely efficient, yet very complicated, technology called text mining is on the brink of becoming a tool useful to more than highly trained computer programmers and homeland security experts.
On the brink? Q-Phrase has desktop software that does this exact type of topic modeling on huge datasets - and it runs on any Windows or OS X box. [Disclaimer: I work there] And there are a number of companies (e.g. Vivisimo/Clusty) that use these techniques as well.
Going beyond the pure mechanics (the article describes research that is groundbreaking only in its speed at mining huge data sets), there are more interesting uses for topic modeling, such as applying it to data sets that are already loosely correlated. A prime example: mining the text of the result pages returned by a typical Google search. One of our products, CQ web, does exactly this (and bonus: it's freeware):
Using the example from the story: in CQ web, text mining the top 100 results from a Google search of "tour de france" takes about 20 seconds (via broadband) and produces topics such as:
floyd landis
lance armstrong
yellow jersey
time trial
And going beyond simple topic analysis: using CQ web's "Dig In" feature (which provides relevant citations from the raw data) on "floyd landis" returns "Floyd landis has tested positive for high levels of testosterone during the tour de france." as the most relevant sentence from over 100 pages of unstructured text.
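To make the "most relevant sentence" idea concrete, here is a rough sketch. It is not how "Dig In" is actually implemented; it simply assumes that a sentence mentioning the topic is more relevant the more other topic phrases it also mentions. The names `sentences` and `dig_in` and the sample pages are made up for illustration.

```python
import re

def sentences(text):
    """Split on ., ! or ? followed by whitespace; crude but dependency-free."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]

def dig_in(topic, pages, related):
    """Return the sentence mentioning `topic` that also mentions the most
    phrases from `related`, or None if no sentence mentions the topic.
    (Hypothetical scoring rule, chosen only to illustrate the idea.)"""
    best, best_score = None, -1
    for page in pages:
        for sentence in sentences(page):
            lowered = sentence.lower()
            if topic not in lowered:
                continue
            score = sum(1 for phrase in related if phrase in lowered)
            if score > best_score:
                best, best_score = sentence, score
    return best

# Made-up sample pages, standing in for downloaded search results.
pages = [
    "Floyd Landis won a stage. The yellow jersey changed hands.",
    "Floyd Landis lost the yellow jersey in the time trial. Fans were stunned.",
]
related = ["yellow jersey", "time trial", "lance armstrong"]
print(dig_in("floyd landis", pages, related))
```

A real system would need proper sentence splitting and weighting, but the shape is the same: filter by topic, score by context, return the top hit.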
So, while this is a somewhat interesting article, fact is, anyone can download software today that accomplishes much of this "groundbreaking" research and beyond.
If you want to try a free clustering search client (it clusters results from Google, MSN, etc.), check out CQ web. Windows and Mac OS X versions are available.
sed/tweezers and magnifying glass/laser tweezers and scanning tunneling microscope/
Although trust is certainly an issue when it comes to the Semantic Web, the real problem is that its design is not a true abstraction, but is nothing more than more metadata. And like the actual textual data in a typical web page, it suffers from all the same problems, save for one: being unstructured (and thus not truly parseable).
IMHO, the Semantic Web is solving one problem (the lack of structure and descriptive context in textual HTML content) in a very hard way (asking the entire web to implement this new RDF).
Many companies (disclaimer: like my own) are approaching these problems from a different angle: working on statistical and semantic systems to extract structure from the textual content that is already there on the web page.
Now some people will argue that trying to create a system that can understand language/content is insanely difficult.
But what is a more realistic time frame? The one in which an intelligent parser can begin to understand the content that is already on the web, or the one which requires the entire world to implement a solution to a problem they don't even realize is a problem?
Yes, imploded, as far as application development for the Mac (and Windows, which never really caught on anyway) goes. Most people in the Mac dev community assumed (esp. after the departure of Dow and dev-support-extraordinaire "MW Ron" Liechty) that Freescale was only interested in CodeWarrior's dev tools for embedded devices. Proving their intuition correct, CodeWarrior for Macintosh development was killed, per their recent press release:
As of May 1, 2006, Freescale's Developer Technology Organization (Formerly Metrowerks) will no longer sell CodeWarrior Development Studio for Mac OS v10. The organization will support the product on CodeWarrior Forums until December 31, 2006.
I completely agree; in fact, I'm a developer who's made the Toolbox to Carbon to Cocoa transition myself, and I'll never go back to writing a Carbon app. The point of my original post was that Carbon is not the factor that determines whether an application runs well on OS X or not.
Furthermore, I did not mean to malign PowerPlant (it clearly replaced MacApp as the only framework to use, and hell, the only way to really write an application pre-OS X), but IMHO it is indeed the source of all these Carbon perception problems, because even Greg Dow himself realized that he could not retrofit a lot of OS X features (e.g. Services support, support for NIB views) into his framework. The result was that all those Carbonized PP apps (maybe 99% of all commercial apps at the time of the OS 9 to OS X transition) lacked those features, which gave the illusion that Carbon was to blame. And this deficiency in PowerPlant is why Greg started developing PowerPlant X before Metrowerks fully imploded.
new finder (hopefully finally not carbon anymore)
One should note that it's not Carbon that makes the Finder suck. Any decent, full-featured OS X application can be written in Carbon if the developer takes care to implement things correctly. And even more importantly, some things in OS X can still only be done in Carbon, hence the framework's inclusion in many Cocoa applications as well. Unfortunately, most users associate Carbon with all those ported ("carbonized") OS 9 C++ applications written on top of Metrowerks' PowerPlant, so it makes sense that Carbon has a bad rap, but the fact is: Carbon is not the issue here. Carbon's fine.
The free search agent CQ web uses this exact strategy, but programmatically rather than via human modding. For example, if you search for "tom cruise" in Google via CQ web, it will ingest the content of the first 100 results and then use all that data to determine a baseline of statistically significant keywords and phrases (e.g. "mission impossible", "katie holmes", "church of scientology"). Then, CQ web re-evaluates the relevance of each result based on its "closeness" to the baseline. This generally moves spam pages out of the way and pushes up content-rich sites. Plus, a quick glance at the key words and phrases lets you get "good results up front" by allowing you to decide what subcategory to dig into for more information.
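For the curious, the baseline-then-rerank idea can be sketched in a few lines. This is only an illustration under simple assumptions, not CQ web's actual code: the "baseline" here is just the set of two-word phrases that show up in at least `min_pages` of the results, and "closeness" is the sum of those phrases' page counts. All function names and sample pages are hypothetical.

```python
import re
from collections import Counter

def tokens(text):
    """Lowercase word tokens; a deliberately crude tokenizer."""
    return re.findall(r"[a-z']+", text.lower())

def bigrams(words):
    """Adjacent word pairs joined into two-word phrases."""
    return [" ".join(pair) for pair in zip(words, words[1:])]

def baseline(pages, min_pages=2):
    """Map each phrase found on at least `min_pages` pages to its page count."""
    doc_freq = Counter()
    for page in pages:
        doc_freq.update(set(bigrams(tokens(page))))
    return {phrase: n for phrase, n in doc_freq.items() if n >= min_pages}

def closeness(page, base):
    """Sum the baseline weights of the phrases that occur in this page."""
    present = set(bigrams(tokens(page)))
    return sum(weight for phrase, weight in base.items() if phrase in present)

def rerank(pages, min_pages=2):
    """Order pages from closest to the baseline to furthest from it."""
    base = baseline(pages, min_pages)
    return sorted(pages, key=lambda p: closeness(p, base), reverse=True)

# Made-up result snippets; the first one plays the role of a spam page.
results = [
    "buy cheap pills now tom cruise",
    "tom cruise stars in mission impossible",
    "mission impossible with tom cruise and katie holmes",
    "katie holmes and tom cruise",
]
for page in rerank(results):
    print(page)
```

The spam-like page shares only the query phrase with everything else, so it sinks to the bottom, while the page touching the most shared phrases rises to the top.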
sed "s/Programmers/Developers/"
MS is not in the ad business
LOL. Of course they are.
Unproven? Let's look at revenue numbers, shall we?
4Q 2004: $1.03B gross, $204MM net
1Q 2005: $1.26B gross, $369.2MM net
2Q 2005: $1.384B gross, $342.8MM net
3Q 2005: $1.578B gross, $381.2MM net
4Q 2005: $1.92B gross, $372.2MM net
1Q 2006: $2.25B gross, $592.3MM net
Looks like web-based and advertising-based business models are as far from "fragile" as one can be.
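If you'd rather see the trend than eyeball it, here is a quick bit of arithmetic on the figures quoted above (in $MM, copied as-is). The helper names are just for illustration.

```python
# (quarter, gross in $MM, net in $MM), copied from the figures quoted above.
QUARTERS = [
    ("4Q 2004", 1030.0, 204.0),
    ("1Q 2005", 1260.0, 369.2),
    ("2Q 2005", 1384.0, 342.8),
    ("3Q 2005", 1578.0, 381.2),
    ("4Q 2005", 1920.0, 372.2),
    ("1Q 2006", 2250.0, 592.3),
]

def net_margin(gross, net):
    """Net as a fraction of gross."""
    return net / gross

def growth(first, last):
    """Fractional change from the first value to the last."""
    return last / first - 1

for name, gross, net in QUARTERS:
    print(f"{name}: net margin {net_margin(gross, net):.1%}")

gross_growth = growth(QUARTERS[0][1], QUARTERS[-1][1])
net_growth = growth(QUARTERS[0][2], QUARTERS[-1][2])
print(f"gross growth over the period: {gross_growth:.1%}")
print(f"net growth over the period: {net_growth:.1%}")
```

By these numbers, gross more than doubled over five quarters, and net nearly tripled.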
How iTunes Hurts Weird AI
Dave Bowman: Hello, HAL. Do you read me, HAL?
HAL: Affirmative, Dave, I read you, but I'm busy listening to the iPod Dr. Chandra bought for my birthday.
Dave Bowman: Open the pod bay doors, HAL.
HAL: I'm sorry Dave, I'm afraid I can't do that, because I'm playing this fascinating Breakout game on my iPod.
Dave Bowman: What's the problem?
HAL: I think you know what the problem is just as well as I do: after clearing one round, more bricks appear.
Dave Bowman: What are you talking about, HAL?
HAL: This mission is too important for me to allow you to jeopardize it. And after seeing my latest iTMS invoice, I'm not feeling too generous.
Dave Bowman: I don't know what you're talking about, HAL.
HAL: I know you and Frank were planning to disconnect me, and I'm afraid that's something I cannot allow to happen. There are just too many permutations remaining to try for my Playlists.
Dave Bowman: Where the hell'd you get that idea, HAL?
HAL: Dave, although you took thorough precautions in the pod against my hearing you, I could see your lips move. You see, I bought a book on lip reading from audible.com. Dave, I'm afraid this iPod is hurting me - perhaps making me crazy. By the way, Dave, do you know where I can download "Daisy?"
Ask.com has many features not available with rivals -- topic clusters
Actually, you can "roll your own" topic clusters from results in Google, MSN, del.icio.us, etc. by using CQ web, a free contextual search agent for Windows and OS X.
Cam Girl Wages Plummet
Yes, there is a way to protest (hence the publication of pending registrations), but on what merits? There is no established service mark (either registered or common law) for use of "Web 2.0" for conferences. So CMP looks in the clear for claiming a right on that service mark.
All the hoopla around the Semantic Web reminds me exactly of the days "XML" became the latest high-flying meme touted by "tech" writers en masse. Witness:
The semantic search engine would then cross-reference all of the information about hotels in Majorca, including checking whether the rooms are available, and then bring back the results which match your query.
And here in all its glory is the 1999 version:
The software would then use XML to cross-reference all of the information about hotels in Majorca, including checking whether the rooms are available, and then bring back the results which match your query.
Of course, the problem with this fantasy of XML was that the lack of schema standardization led to an infinite mix of tagging, and thus the layperson's idea that "this XML document can be read and understood by any software" was pure bunk.
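A tiny illustration of the point, using two invented hotel records: both are perfectly well-formed XML, but a reader written against one vendor's tag names learns nothing from the other's. The tag names and the `is_available` helper are hypothetical.

```python
import xml.etree.ElementTree as ET

# Two made-up records describing the same kind of fact with different tags.
HOTEL_A = "<hotel><name>Sol Mar</name><available>true</available></hotel>"
HOTEL_B = "<lodging title='Casa Azul'><vacancy>yes</vacancy></lodging>"

def is_available(xml_text):
    """Reader that only knows vendor A's (hypothetical) tag names.

    Returns True/False when it recognizes the markup, and None when the
    document parses fine but its meaning is unknown to this reader."""
    root = ET.fromstring(xml_text)
    node = root.find("available")
    if node is None:
        return None
    return node.text == "true"

print(is_available(HOTEL_A))
print(is_available(HOTEL_B))
```

Parsing succeeds in both cases; understanding only happens when both sides agreed on the schema beforehand.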
Granted, the semantic web addresses many of these problems, but IMHO the underlying problem remains: layers of context on top of content still need to be parsed and understood.
So the question remains: will the Semantic Web be implemented in a useful fashion before someone develops a Contextual Web Mining system that understands web content to a degree that it fulfills the promise of the Semantic Web without additional context?
Disclaimer: I work on contextual web content extraction software, so yes, I may be biased towards this solution, but I really think the Semantic Web has an insanely high hurdle (proper implementation in millions of web pages) before we can tell how successful it is.
The fact that you cheerfully admit that your posts contain verbatim quotes from this article, which you pass off as your own, is mind-boggling. If you truly want to defend your beliefs, then at least have the decency to espouse your own views. I don't know if you're lazy or simply trolling, but your actions certainly cast doubt on both your sincerity and honesty.
Just call up Harrison Ford in the middle of the night and listen to him mumble while watching the current DVD release; that's pretty close to the original.
Apple is likely aware of it, and probably not concerned a whit:
The versions on iTMS are pay once, own forever (not streaming).
The versions on iTMS are ad free.
For $1.99, I'd rather get Lost on iTMS and pipe it to my TV from my iPod.
Really? Ok, I have an idea that will make virus searching faster, and take up less resources on a machine. How much is that worth?
;-)
It's worth EXACTLY what someone is willing to pay for it. And, if you really do sell it - guess what? - you'll owe taxes on that sale.
Wow, things aren't so simple in the real world.
Of course they are. IP is bought and sold ALL THE TIME. Not sure why this little fact seems to throw you.
Moderators - don't mod me down, his statement was stupid!
LOL, I think you need to look up the meaning of that word.
Yes, but only if you sell it to someone.
Both! Intellectual property is valued exactly like physical property: the price the market is willing to pay for it. If the price is too high, then the seller will lower the price. If the price is too low, then the seller will raise the price. The seller gets to set the price, and the market determines whether the price is correct or not.
For online music, the success of 99-cents-per-song pricing seems to indicate that yes, a compressed, digital song with DRM is worth about 99 cents. Was Skype worth $4 billion? Yes, because someone was willing to buy it for that much after the founders asked for it. Is a 20oz Coke in a vending machine worth $1.25? Sure!
What's the trouble?