When you say, "no data should..." does that mean you think viruses shouldn't work?
Don't get me wrong, I think most OSs are far too open to damage, and while everyone focuses on networks, the real danger is drivers. But still, why blame the OS when some company probably spent millions of dollars figuring out how to screw with properly written device drivers so you couldn't play a CD on your computer? The parallels really are far closer to virus writers.
If it were physically compatible it would be screwing up the firmware now would it?
The better analogy is Sony knowingly placing a virus on their CDs that would attack iMacs.
Yeah - next thing you'll know people will be reading articles about them and then spending their time ripping on people pointing out mistakes! They'll actually visit some web site several times a day to read up on this, how much money the new Star Wars will cost the economy and then dumb crap like whining about something called "Open Source" that only a geek would understand or worse yet give a shit about.
Second amendment vs. First amendment. Why is one fine to limit while the other isn't?
The fact is that *all* our rights can be regulated somewhat, so long as our basic rights aren't eliminated. Thus it is legal to restrict sawed-off shotguns and various extreme forms of pornography.
Now with the videogames, I think that some limits are fine (just as with any right) so long as law-abiding citizens are able to exercise their basic right.
The idea that our rights ought to have no limits at all seems difficult to sustain. And if you do take that position (say with the 1st amendment), for consistency you really have to apply it to all.
The company I work with has a summarization library that does this. Pricing depends upon how you use it. I know that they've made fairly good deals for educational uses. It was more designed for writing automated abstracts, but it does an amazingly good job on news sources as well.
Obvious caveats apply - i.e. I work for them and helped write the thing. However if you need that sort of thing or something more particular, contact Lextek.
Often you can mix bits of formal systems with bits of statistical systems. Depending upon what you need, it can get you quite a ways. Of course a full formal structure (besides being problematic philosophically) is pretty much beyond anything we could conceive of writing. However you can do things like write a statistical part-of-speech tagger and then use those structures to find direct objects. Tricks like that are often very helpful in mining data.
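To make the tagger trick concrete, here is a toy sketch. It is not how any real product does it: the hand-tagged sentences, the tag names and the "first noun after a verb" rule are all made up for illustration, and a real statistical tagger would use far more data and context.

```python
from collections import Counter, defaultdict

# Tiny hand-tagged corpus, invented for illustration only.
TAGGED = [
    [("the", "DET"), ("dog", "NOUN"), ("chased", "VERB"), ("the", "DET"), ("cat", "NOUN")],
    [("a", "DET"), ("user", "NOUN"), ("reads", "VERB"), ("the", "DET"), ("article", "NOUN")],
    [("the", "DET"), ("company", "NOUN"), ("sells", "VERB"), ("a", "DET"), ("library", "NOUN")],
]

def train_tagger(tagged_sentences):
    """Statistical part: keep the most frequent tag seen for each word."""
    counts = defaultdict(Counter)
    for sentence in tagged_sentences:
        for word, tag in sentence:
            counts[word.lower()][tag] += 1
    return {word: tags.most_common(1)[0][0] for word, tags in counts.items()}

def tag(model, words, default="NOUN"):
    """Tag each word; unknown words fall back to `default`."""
    return [(w, model.get(w.lower(), default)) for w in words]

def direct_objects(tagged):
    """Formal part: a crude rule that pairs each VERB with the first NOUN after it."""
    pairs = []
    for i, (word, word_tag) in enumerate(tagged):
        if word_tag != "VERB":
            continue
        for later_word, later_tag in tagged[i + 1:]:
            if later_tag == "VERB":
                break  # ran into the next verb without finding an object
            if later_tag == "NOUN":
                pairs.append((word, later_word))
                break
    return pairs

model = train_tagger(TAGGED)
print(direct_objects(tag(model, "the company chased a cat".split())))
```

The statistical half guesses the tags and the rule-based half reads structure off them, which is the kind of mixing described above.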
This works because resumes have a structure. The structure varies a fair bit and is somewhat vague in implementation, but it is there. Consider the problem akin to finding word breaks in text if you weren't given such things. Obviously a slightly different problem, but the reason we can solve it is because there is structure to what you are looking for. (I bring it up just because that's the problem I'm working on at work)
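For the word break analogy, a minimal sketch of one common approach (not necessarily the one I use at work): score candidate words by frequency and pick the cheapest split with dynamic programming. The word counts below are invented, and anything not in the table is simply disallowed.

```python
import math

# Hypothetical unigram counts; a real system would estimate these from a corpus.
COUNTS = {"the": 50, "them": 5, "me": 10, "men": 8, "eat": 6,
          "at": 20, "a": 30, "tea": 4, "meat": 3}
TOTAL = sum(COUNTS.values())

def word_cost(word):
    """Negative log probability of a known word, or None if the word is unknown."""
    if word not in COUNTS:
        return None
    return -math.log(COUNTS[word] / TOTAL)

def segment(text, max_len=10):
    """Return the lowest-cost split of `text` into known words, or None if there is none."""
    # best[i] holds (cost, words) for the best segmentation of text[:i]
    best = [None] * (len(text) + 1)
    best[0] = (0.0, [])
    for end in range(1, len(text) + 1):
        for start in range(max(0, end - max_len), end):
            if best[start] is None:
                continue
            cost = word_cost(text[start:end])
            if cost is None:
                continue
            candidate = (best[start][0] + cost, best[start][1] + [text[start:end]])
            if best[end] is None or candidate[0] < best[end][0]:
                best[end] = candidate
    return best[-1][1] if best[-1] else None

print(segment("themeneatmeat"))
```

The structure being exploited is just "the text is a sequence of known words", which is enough to recover the breaks here.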
Concepts and so forth are far more unstructured. Consider the problem of finding all references to Apple executives. Now you can get part way there with complex queries. But somehow you have to take some information (say executive names gleaned from connection to terms about executives near terms related to the company name) and then use that info to define spaces in a text or information in text to get you further information. That is a much more complex problem than simply tagging text with XML or so forth. The final output might possibly be taggable. However generating that final output involves many intermediate steps that require complex views of both terms and space.
You end up requiring a way of querying documents so that you can use complex boolean and ranked queries and complex notions about position and space ranges. Thus you might have a complex boolean query that finds all terms with a certain rank (to do fuzzy match or more complex notions of belonging to a set). Then with those results you create a region and then use those regions for further calculations.
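A rough sketch of that query-then-region idea, using token positions. Everything here is an assumption made for illustration: the sample text, the person's name, the window widths, and the helper names. It is not Lextek's API, and it leaves out ranking entirely.

```python
import re

def tokenize(text):
    """Split into word tokens; a token's list index is its position."""
    return re.findall(r"[A-Za-z]+", text)

def positions(tokens, terms):
    """Positions of every token matching one of `terms` (case-insensitive)."""
    wanted = {t.lower() for t in terms}
    return [i for i, tok in enumerate(tokens) if tok.lower() in wanted]

def regions_around(hits, width, length):
    """Turn hit positions into merged (start, end) regions, end exclusive."""
    regions = []
    for pos in sorted(hits):
        start, end = max(0, pos - width), min(length, pos + width + 1)
        if regions and start <= regions[-1][1]:
            regions[-1] = (regions[-1][0], max(regions[-1][1], end))
        else:
            regions.append((start, end))
    return regions

def within(hits, regions):
    """Keep only the hits that fall inside one of the regions."""
    return [p for p in hits if any(s <= p < e for s, e in regions)]

# Invented sample text; "Jane Doe" is a placeholder name.
text = ("Apple said today that chief executive Jane Doe will speak. "
        "In other news the chief of police in a distant town retired.")
tokens = tokenize(text)

# Step 1: regions near the company name.
company_regions = regions_around(positions(tokens, ["Apple"]), 8, len(tokens))
# Step 2: executive terms, but only inside those regions.
title_hits = within(positions(tokens, ["chief", "executive", "CEO"]), company_regions)
# Step 3: new, tighter regions around the titles; capitalized tokens there are name candidates.
name_regions = regions_around(title_hits, 2, len(tokens))
names = [tokens[i] for s, e in name_regions for i in range(s, e) if tokens[i][0].isupper()]
print(names)
```

Each step's output becomes the space the next query runs in, which is the part that plain tagging doesn't give you.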
My caveat for all this is that I did work on a project for Lextek International (Lextek.com) that did all this. So I'm somewhat biased. Probably no one here (given the Open Source nature of things here) would be a client. So hopefully I can say all this without anyone thinking I'm just tooting my own horn. Besides - I hardly ever see anything on Slashdot I can actually say anything about.
As I mentioned elsewhere in this discussion, the problem with XML is that it must be fully nested. This is, for many types of unstructured data, a horrible situation. The problem is that when mining for data you often don't have the structure but are creating the structure. This relates various contexts in ways that don't fit the requirements of an XML topology. An example of this is relating pages to paragraphs. Paragraphs aren't always nested within pages. One structure can cross the borders of the other structure.
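A small sketch of the page/paragraph case, keeping each structure as an independent span over character offsets instead of as nested elements. The offsets and the `Span` type are made up for the example.

```python
from typing import NamedTuple

class Span(NamedTuple):
    kind: str
    start: int  # inclusive offset
    end: int    # exclusive offset

def overlaps(a, b):
    """True if the two spans share at least one offset."""
    return a.start < b.end and b.start < a.end

def contains(a, b):
    """True if span b lies entirely inside span a."""
    return a.start <= b.start and b.end <= a.end

def crossing_pairs(spans):
    """Pairs that overlap without one containing the other, so they cannot be nested."""
    return [(a, b)
            for i, a in enumerate(spans)
            for b in spans[i + 1:]
            if overlaps(a, b) and not contains(a, b) and not contains(b, a)]

# Hypothetical layout: the middle paragraph starts on page one and ends on page two.
spans = [
    Span("page", 0, 100),
    Span("page", 100, 200),
    Span("paragraph", 0, 60),
    Span("paragraph", 60, 130),
    Span("paragraph", 130, 200),
]

for a, b in crossing_pairs(spans):
    print(a, "crosses", b)
```

Stored this way, the paragraph that straddles the page break is just another span, whereas a single nested tree would force you to split either the page or the paragraph.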
However once you have some structures (say basic linguistic units like sentences, words, paragraphs, pages, speakers, etc.) you can then create other ones. From those structures you can then use various techniques to develop more information.
Once again, great in theory, complex in practice. However many of the issues used in NLP to understand words can then be expanded for larger units of meaning. Further you can then start to relate various types of contexts. Of course how helpful all this is relates to the type of analysis you are making. Some practical problems are very solvable now. Other problems are more complex.
But consider some future "Google" which indexes pages based not on words but on concept spaces. It then uses other methods, such as the links to a page and so forth, to rank not just pages but concept *spaces* within a page. Finding information would be much, much more helpful.
Actually, depending upon the kind of data you are mining, XML is very poor for this. Consider a simple structure that exists in every book. You have pages, paragraphs, authors, quotes, and so forth. The problem is that different blocks are not always within other blocks (i.e. nested, the way inner loops are always nested in a programming language). Instead a paragraph block can be half in one page block and half in another.
That doesn't sound like a big problem, but it can be when you are using regions to map out new concepts (e.g. analyzing a class of words in all sentences that contain the concept of Apple Computer). In practice writing "concepts" to analyze (data mine) texts of this sort is very hard. Further, using tools like Perl can be a pain. Yeah you can do it, but you probably won't do it well.
I know that the company Sageware which I have dealt with does what this article describes. However it supplies various "objects" for mining for concepts. It ends up being tricky stuff which is why mainly large portals use the technology.
The basic notions can apply to Perl or simple C code. Go very complex though and things get messy very quickly.
While a lot of the added value of Napster was browsing (trying out songs and genres you didn't know much about) most songs are ones you already know about. So for this device you'd simply have a list of key words to search for and download.
I think that is overstating things, but yeah, that's the basic idea. Why get caught in a quagmire with our people dying when it doesn't even affect us? There are hundreds of places around the world that are in bad shape. Most likely even if we went in it wouldn't do any good. (Does anyone really think the Afghan government will be a pillar of democracy and human rights 5 years from now?)
So unless there is some compelling interest along with the humanitarian issues we shouldn't get involved. It isn't that self-interest is the sole reason but neither should humanitarianism be the sole reason.
Actually Bush already is championing fuel cells. He killed one of Clinton's plans for more fuel-efficient cars and is promoting more fuel cell research. Fuel cell research really is quite a ways along and is a very exciting solution. Everyone has been quoting large costs, but that is because it is still largely R&D and there are no economies of scale. The big issue is getting hydrogen to fuel stations.
Babylon 5 I can understand. Twin Peaks I can understand. (Great DVD by the way) However I don't understand people buying DVDs of shows still in syndication that are on EVERY NIGHT, e.g. MASH, Simpsons, etc. Aren't you sick of those? I mean Twin Peaks was enjoyable because it hadn't been on in 10 years and other than a brief run on Bravo (which I don't have) hasn't been shown. What's the point of Star Trek DVDs when it seems like you can't turn on the TV at night without one of the 5 different Trek shows on somewhere?
Nah. It just means it is ok to clone porn stars. . .
What's wrong with sock puppets and herding cats? Funny stuff if you ask me.