Markov models have the same problem as any other metric: they don't take context into account. And that includes competence and performance of the author: a French coder will make different mistakes than an English theatre journalist, and there are no corpora on which to train the model. There's no money for that. This is just going to be another cheap hack with very little benefit and potentially huge costs.
I worked on spelling and grammar checking, and I can assure you it's far from easy. The errors most grammar checkers can find reliably will not interfere with general understanding. Other errors are very hard to reliably detect and correct; illiterate language is almost impossible to correct.
E.g., let's take an example from that page (whose link mysteriously disappeared when I clicked on Reply): "There is an antique store on Camden Avenue." Suppose I made a mistake and wrote: "*Their is an antique store on Camden Avenue." If you took a simple metric such as Levenshtein distance (the minimal number of single-character changes needed to turn one word into another), the optimal correction would be: "Theirs is an antique store on Camden Avenue.", because "Theirs" is one change away from "Their" while "There" is two.
And this example: "*They're are many documents that are used in investigations." could be seen as a duplication of "are" and corrected to: "They are many documents that are used in investigations." You would really need to understand the context to know which correction is the right one.
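To make the "Their" example concrete, here is a minimal Python sketch of plain Levenshtein distance. The function names and the two-word candidate list are made up for illustration; a real checker would rank against a full dictionary and, as argued above, would still need context.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimal number of single-character insertions, deletions,
    or substitutions needed to turn a into b."""
    previous = list(range(len(b) + 1))  # distances from "" to prefixes of b
    for i, ca in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(
                previous[j] + 1,         # delete ca
                current[j - 1] + 1,      # insert cb
                previous[j - 1] + cost,  # substitute ca by cb (or keep it)
            ))
        previous = current
    return previous[-1]


def closest(word, candidates):
    """Pick the candidate with the smallest edit distance (hypothetical helper)."""
    return min(candidates, key=lambda c: levenshtein(word, c))


candidates = ["There", "Theirs"]  # illustrative only, not a real dictionary
for c in candidates:
    print(c, levenshtein("Their", c))  # There 2, Theirs 1
print(closest("Their", candidates))    # Theirs
```

The metric happily prefers the wrong word, because nothing in it knows that "Theirs is an antique store" is not what the author meant.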
No, this is a Bad Idea. Just because you can is not good enough.
I mean, how much did this thing cost? $9 billion? And it was way over budget. They couldn't add one or two supercomputers?
Plus, it's an ugly experiment to begin with. Just throwing more and more energy at it to see if there might be a glimpse of a particle that might or might not exist. And if we don't find it, we won't even know for sure it doesn't exist. There's no beauty in it.
So police are ok, but education and health are not? And libertarians are also known for their dislike of too many laws. Which of the current laws would you abolish?
Anyway, in a normal democracy you are perfectly free to choose whatever you want. You just have to pay a bit of taxes for that service. Is that really so bad? Do you really want to send the poorest back to dying in the slums because you want all your money for yourself?
And, mind you, you probably would be poor yourself. Unless you happen to come from a powerful and wealthy family, you would be relegated to the underclass. You wouldn't have money to worry about, nor any freedom, since the police wouldn't care for it. Go read some Charles Dickens.
That's even worse, and it's not even essays (as the OP states). So the teacher has to provide the program with the "correct" answers? That can be a pretty long list. It's probably going to be good at rating mediocrity...
I'm pretty sure no program is capable of this (and I've got a PhD in natural language processing). They might be able to check for a couple of easily scored factors, such as number of words and consistency between paragraphs, but I'm pretty sure that there is no program that could distinguish between an essay and the same essay altered so that its reasoning rests on false assumptions. I think someone left out a pretty important assumption: such programs might be able to score more fairly (meaning: with less bias!), provided the students did their best.
Except no-one would pay for the scientists getting their education in the first place. The world would very, very quickly become a dumb and hard place, run by tribal warlords. What? You think people are good and are interested in what you care about? Forget it. They want to have your car, your house, and your wife. And without police, they'll have it.
Wanna see how absence of government works? Go to Sudan.
Welcome to libertarian heaven, a.k.a. humanitarian hell.
I pity you for the number of replies that apparently only read up until the point where they get excited, and conveniently overlook the last part, the Anonymous Cowards...
Why is there no way to mod things both insightful *and* funny?
I was afraid of such a response. XML documents can be stored in a relational database. Either in a neat structure, with linked tables and such, or just as a BLOB, in case you don't care. And then there is Oracle's XML datatype, which allows you to write queries using XPath.
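A rough sketch of the "just store it" option, in Python. Everything here is an assumption for illustration: SQLite stands in for the relational database, the `docs` table and the sample document are invented, and the path query runs client-side through ElementTree's limited XPath subset rather than inside the database the way Oracle's XML datatype allows.

```python
import sqlite3
import xml.etree.ElementTree as ET

# In-memory SQLite database standing in for "a relational database".
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE docs (id INTEGER PRIMARY KEY, body TEXT NOT NULL)")

# Invented sample document, stored whole in a single column.
catalog = """
<catalog>
  <book year="1984"><title>Neuromancer</title></book>
  <book year="1992"><title>Snow Crash</title></book>
</catalog>
""".strip()
conn.execute("INSERT INTO docs (body) VALUES (?)", (catalog,))


def titles_published_in(connection, year):
    """Fetch every stored document and apply a path query to each one."""
    titles = []
    for (body,) in connection.execute("SELECT body FROM docs"):
        root = ET.fromstring(body)
        # ElementTree supports attribute predicates such as [@year='1992'].
        for title in root.findall(f"./book[@year='{year}']/title"):
            titles.append(title.text)
    return titles


print(titles_published_in(conn, "1992"))  # ['Snow Crash']
```

The "neat structure" alternative would split books into their own table with a year column, at which point the lookup becomes an ordinary WHERE clause.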
As I wrote in another comment, I love the performance and scalability of NoSQL, but this is really not an argument in favor of it.
It's not that the data cannot be stored; we've got billions and billions of rows. I just cannot get 30M rows returned from the server in a couple of seconds; that takes ages under Oracle, whereas Hypertable returns them very, very quickly. E.g. fast enough to generate reports on the fly.
And if your data doesn't fit in a relational database, how are you going to get it into a NoSQL one? Writing serialized data in the value field?
If you're willing to spend a lot of money, perhaps. But if you want to run on the cheap, where you've got more time than money, NoSQL databases provide excellent performance. I'm working with an Oracle Enterprise installation, and it just cannot compete with the 30M rows that the Hypertable db on my laptop can serve up in a couple of seconds. Perhaps if the DB guys threw a load of money at it, Oracle could do that too. However, using Hypertable means that you have to write a lot of extra code. That's not good for business in the long term. But for a start? Or for throwaway data (I'm thinking Twitter)?
Java can also be used in Oracle (the db), but PL/SQL works better. It still is very messy though. At this moment, I prefer to do the data massaging in Java and keep the sprocs to a very bare skeleton. A bit of aggregation is ok. But where I work now, complex reporting stuff is done in PL/SQL and that is not a nice sight.
I've got about 20 years' experience in C++. I installed the first preprocessor on a Sun workstation (there's that Oracle connection again) if memory serves, and it didn't work well. Stroustrup certainly made a mess out of it, multiple inheritance and all. And then he scorned the most elegant thing in the language, templates.
Objective-C is mainly a Mac thing, but it does compile on other platforms. It's a pretty interesting language. And rather efficient. However, there is nothing comparable to Tomcat/JBoss/... or ActiveMQ or any of that stuff that makes Java so versatile. On the other hand, perhaps there is, but it might be expensive. Objective-C totally rules for GUI development though.
And I mock you for your love of Python! A language where your editor's tab settings can change the meaning!! A language without semicolons!!! Everyone knows that the only language without semicolons was (Visual) Basic and that is a bad thing. By association, Python must be bad.
Java 1.1 I/O... there's something to be improved. Or Calendar, my goodness. Thank The Force for JodaTime.
The topic was of course relational databases.
With respect to frameworks: I've found that Objective-C(++) scales pretty well, and is very, very friendly in terms of multithreading. But I agree that Java has an edge in ease of deployment. And I quite like the braces and semicolons...
Anyway, Java is not Oracle's product, just as MySQL really isn't. These were conceived and grown elsewhere, just to end up in the portfolio of a company that took 8 versions just to improve SQL*Plus a bit...
The bar for sarcasm is considerably lower when Oracle is involved.
I can't believe you didn't point out that going with Oracle could also have meant doing everything in Java or using AutoVue or Oracle Linux (http://www.oracle.com/us/technologies/linux/index.html)
They should have gone with Oracle? Why? I work with that expensive cr*p, and it can't perform its way out of an open box. They can't have that much db-dependent software anyway. Just plug in a compatibility layer and use something fast under it. I guess this is only news because it's Facebook and Facebook is worth so much money.
You're right: how the hell do they define unconsciousness without defining consciousness? And then they define it as the process that happens when you get anaesthetized? That's all about adding drugs to suppress brain functions. No wonder they find something like this.
Apart from the methodological errors that plague such studies. 3D imaging from EEG is at least 10 years old now, and relies on assumptions about the electrical structure of the places where the signal originates, and can only see part of the brain. Then the question becomes: how do you interpret all these different 3D high speed measurements? Apparently, they just found a qualitative difference while looking at the images. That's limited.
Anyway, that doesn't mean this technique/technology doesn't have value (e.g. in anaesthetizing), just that the press blurb claims way too much...
Yeah, you're right. There is a necessary exploratory phase, where you look at data to form an idea. I wanted to stress the importance of having a theory, a model, a framework, whatever, preferably computational, to interpret the data. Not having that, a lot of the data becomes "duh"...
Ok, but real science needs a good model first. I can measure anything, and come up with e.g. a correlation between age and some aspect of driving. Does that mean anything? No, it doesn't. Suppose we didn't have a model of gravity and upward lift, and then started measuring the relation between weight and falling acceleration. Then we would find out that really light things don't fall. Well, we knew that already, of course, but then it would be science? No, it would be "duh". Once we have a model that incorporates air pressure, resistance, lift, and gravity, we can attach a deeper meaning to these measurements. So: model first, then measurements. That's science.
A lot of these duh studies will also claim to use models, but they usually are so-called generalized linear models, where each independent factor you can measure becomes a linear factor in the end result. That's bad science.
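For reference, the general shape of such a model in standard textbook notation (the symbols are generic and not taken from any particular study): the expected outcome, passed through a link function g, is a weighted sum of the measured factors.

```latex
g\bigl(\mathbb{E}[y]\bigr) = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_n x_n
```

Each factor x_i only ever contributes its own term β_i x_i, so unless interaction terms are added by hand, the model has no way to express how the factors act on each other.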
Someone mod the parent up. That paper is really worth reading.
In the small list of non-language learning software, there are must-have apps such as PowerDirector 9 Deluxe and Sony Vegas, Acid and Sound Forge. Why should we bother when a shop cannot even filter for OS X?
Read it, didn't comment on it, as you didn't comment on my post, but went off in another direction altogether. If you want a comment: I don't see either Snow Crash or Neuromancer as SF in this context. These books cross boundaries, and Neuromancer introduced a new style of writing and storytelling. Snow Crash is less arty, more action, and a poignant social commentary, well written. These are books that cannot be simply judged by their technological novelty.
Besides, some of the concepts in Snow Crash will still take decades to become reality. The description of the VR environment itself may be too simple, but massive, global immersion with full interaction (from a mobile station) is very far away, and the facial rendering technology described is more complex than that used in the latest game, L.A. Noire.
I'm still not entrenched enough in the realms of post-modernism to think that the difference between Real Art and shoddy entertainment can be dealt with by a single word, even one with as deep and rich a semantic structure as "meh".