Your comments are well received. Of course, Perl hashes are in-memory data structures, and in-memory structures are infinitely more flexible than on-disk structures (not just in Perl). The topic is actually about comparing methods of on-disk storage.
Of course, don't expect any Java weenies to understand the beauty and flexibility of the inside-out objects technique.
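To make the in-memory versus on-disk point concrete, here is a minimal sketch in Python rather than Perl. It assumes the standard-library `dbm.dumb` module as a stand-in for an on-disk hash; the file location and the sample record are made up for illustration.

```python
import dbm.dumb
import json
import os
import tempfile

# In memory: a dict can hold arbitrarily nested values directly.
in_memory = {"alice": {"langs": ["perl", "c"], "karma": 42}}
in_memory["alice"]["langs"].append("python")  # mutate in place, no ceremony

# On disk: a dbm file only maps byte strings to byte strings, so nested
# values have to be serialized on the way in and parsed on the way out.
path = os.path.join(tempfile.mkdtemp(), "users")  # hypothetical throwaway location
with dbm.dumb.open(path, "c") as db:
    db["alice"] = json.dumps(in_memory["alice"])

with dbm.dumb.open(path, "r") as db:
    on_disk = json.loads(db["alice"])

print(on_disk["langs"])
```

The round trip works, but every update to the nested value now means read, parse, modify, serialize and write, which is the flexibility gap being described.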
Yes, that's a big problem with statistical methods. The point is that we don't just use words with specific meanings like "man" or "tall". We also:
use abstract words that take on different meanings in different contexts (i.e. they're polymorphic);
use words metaphorically (the "pissed" example above). Metaphor requires the reader to make the connection on the fly between two concepts, hence it requires intelligence. ("On the fly" is a good example. A computer can be given a list of such metaphorical expressions, but recognizing new ones is a much harder problem.)
use words incorrectly, or misspell them, or use imperfect grammar, but that's OK because our human reader is able to infer the meaning;
sometimes find it funny to use words in the wrong context, i.e. where the metaphorical meaning is really outlandish, or there is a conflict between the idea and the way it is expressed. I think we like this because it requires intelligence to work out the meaning in these cases.
For example, the English word pattern can be translated into French as any of (please excuse the lack of accents, they were stripped when I submitted): modele, exemple, type, schema, dessin, motif, maquette, patron, plan, disposition, groupement, repartition, combinaison, diagramme, gabarit, echantillon, tendance, figure, circuit (and probably others as well) depending on the context -- and not just the lexical context, but the meaning.
Previous attempts to automate translation focused on giving computers grammatical and semantic knowledge, in the hope that they could infer some meaning from this and so choose the right equivalents. Despite some success, this approach failed in general, putting machine translation (MT) firmly in the realm of AI. I believe this statistical approach is a step in the wrong direction (back to purely lexical means of analyzing texts with a view to translation). Further progress in MT will come from AI.
This doesn't detract from the ways in which computers have been useful to translators -- in the area of computer-assisted translation (translation memory, localization, terminology databases, etc.)
The other point is it's a lot harder to get a good-quality parallel corpus than you'd think (even in the Internet age -- most of the stuff on the Internet is crap anyway).
It's not the idea of using computers in translation that I think is limited, just this approach.
And where are you going to find gigabytes of parallel Klingon-English texts?
No, seriously, this is the fallacy behind any statistical approach to automated translation. The news release gives the telling comment:
"Different human translators' versions of the same text will often vary considerably. Another key improvement has been the use of multiple English human translations to allow the computer to more freely and widely check its rendering by a scoring system. This not coincidentally allows researchers to quantitatively measure improvement in translation on a sensitive and useful scale."
This paragraph just doesn't make any sense to me. Either it's badly explained, or the entire approach is flawed:
You have to start with correctly human-translated and aligned texts. How many versions of the same text are you willing to pay for?
Most likely, you will have some texts well translated, and some badly translated. How do you rate the relative quality of each version? How many translators does it take to revise gigabytes of text? (One to screw in the lightbulb...)
A large percentage of existing translations are mediocre. So you are going to get mostly bad translations out, since the researchers don't even attempt to build any linguistic knowledge into the system. GIGO rules!
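For what it's worth, here is a minimal sketch of what scoring a machine rendering against multiple human translations could look like. The news release doesn't say which metric is used, so this assumes simple clipped word overlap; the function name and the sample sentences are invented for illustration.

```python
from collections import Counter


def clipped_unigram_precision(candidate, references):
    """Fraction of candidate words found in the references, with each word's
    count clipped to the most times it occurs in any single reference."""
    cand_counts = Counter(candidate.lower().split())
    if not cand_counts:
        return 0.0
    max_ref_counts = Counter()
    for ref in references:
        for word, n in Counter(ref.lower().split()).items():
            max_ref_counts[word] = max(max_ref_counts[word], n)
    matched = sum(min(n, max_ref_counts[word]) for word, n in cand_counts.items())
    return matched / sum(cand_counts.values())


# Hypothetical data: two human translations of the same sentence.
references = ["the cat is on the mat", "there is a cat on the mat"]

good = clipped_unigram_precision("the cat is on the mat", references)
bad = clipped_unigram_precision("the the the the the the", references)
shuffled = clipped_unigram_precision("mat the on is cat the", references)
```

Note that the shuffled candidate scores exactly as well as the correct one here, because single-word overlap ignores word order. Scoring longer word sequences would penalize that, but it is still a purely lexical comparison against whatever reference translations you happened to pay for.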
Statistical methods just cannot deal with the subtlety of meaning to be found in natural language texts. It's a little like believing that you can always win at chess if you can just look ahead far enough. I believe that this approach is inherently limited and any apparent success is illusory. This news release hasn't changed my opinion.
Sorry to be a party-pooper, but that's how I feel.
It's not up to the Linux community to do anything. SCO is making the allegations, let them provide the evidence.
These "analysts" are not even programmers. They have no idea of how source code control works. Therefore their viewpoint is completely worthless.
I think they got the idea from the associative nature of human memory.
Unfortunately, this would really only be of value in a semantic web of such memento objects, with an inference engine that could automatically create useful associations between them.
Not just a traditional filing system like the article seems to describe. We already have a system that works for that: it's called... surprise!... a filesystem.
I don't know, what is this fascination with applying computer hardware to the kitchen?
Ever since the fifties, the "wired kitchen" has been the holy grail for futurists and technology gurus. They never seem to ask whether people really 'want' these products or not.
It's as if they want to justify the usefulness of computers. "See, it can cook too!" "And you can use it to index your recipes!"
I think these goals are misguided, because:
People who cook like to cook;
People who don't like to cook already have a million other easier options, like restaurants and take-out;
Inserting a computer (or any kind of automation) into the cooking and food preparation process can only make it more complicated, unless the computer does everything from bringing it home from the grocery to setting the table.
The point is: don't develop and completely implement a Web site, and only afterwards think about accessibility (when it might be too late because too much has to be re-worked, etc.)
Design your Web site to be accessible from the get-go. Then it won't be twice the cost; it might not even cost any more than usual, once the principles of accessibility become second nature. It does mean that certain common (bad) design habits will have to be unlearned, and also that clients have to understand that they will not have sub-micron control over presentation. (Although this will get better as we move to more and more CSS-compliant browsers.)
The difference here is that the users are not (presumably) tech-savvy MIT students, but average citizens.
I work for the Ontario government, which has already implemented a digital certificate system for identifying its employees (up to 60,000 people), and believe me, there are a lot of misconceptions about such systems and a lot of (exaggerated, IMHO) distrust of them.
People hear all kinds of horror stories in the media about security vulnerabilities, theft of information, etc. Then, they are being forced prematurely to use on-line systems which they don't understand, and therefore fear.
About privacy and Big Brother issues: remember, the problem is with the database itself, not with the authentication method used to access it. The question we should be asking is: should the government be keeping the information at all? And who has access to it right now? Whether you access the information that is rightfully yours in person at a government office, by mail or by computer through a secure authentication method is a much smaller issue.
However, the Canadian governments (federal and provincial) have been moving extremely quickly towards e-Government, much faster than is comfortable for many citizens. It's fine to make government services accessible through new technologies, but people should not be forced, or even feel pressured, to adopt technologies they don't trust or understand.
The other issue is controlling the issuing of digital certificates. The federal government needs to do much better than it has in the past at controlling these processes if it wants to gain the public's trust.
Re:Perl's had it's day - It's become like COBOL (on "Apocalypse 5 Released", Score: 1)
I can share my personal experience with you. Though I know programming, right now I am not officially working as a programmer. Nevertheless, my office needs to manage lots of data, which is inevitably stored in a variety of formats, databases, spreadsheets, etc.
I have found Perl to be an invaluable tool to act as "glue" between all these formats. I can honestly say that programming in Perl is a genuine pleasure. I usually find that the program "pops" into place faster than I expect it to. I don't know why this is so; I'm not a language designer or "expert". But I think comparing Perl, with all its expressive power, to COBOL is ridiculous.
I think Perl excels with smaller programs; I can see how it could be difficult to organize a very large Perl program. But just because you personally don't like Perl doesn't mean nobody finds it useful. And to say that Perl will die out just because you don't use it -- well, I'm sorry, but that's just pure hubris!
At the risk of repeating it too many times, let's remember: You don't have to be the most popular browser to matter! All you need is 10% of the market (if that) to be significant.
The real contribution of Mozilla (et al.) is to make sure that no one is forced to use one single browser product. This will keep a certain Big Software Company (who shall remain nameless) from using its tremendous economic power unfairly.
Isn't it fascinating that the "Recording Industry" and the "Motion Picture Industry" think that they are "content creators"?
When will people realize that these "industries" are middle-men for the actual content creators, the artists and craftsmen who actually create works of art? Their only function is to get "content" from the artists to the consumers.
Their efforts to eliminate all distribution channels other than their own are an aggressive move, not only against consumers, but against the very artists they claim to represent!
Direct distribution from artists to consumers using digital technology is in its infancy, but it's not going away! Eventually they will have to join the majority or become a footnote in history.
So that would be... Billows? or Blows for short?
Repeat after me class...
... Dar - l ...
Darl
not Daryl... not Darryl... DARL!
I know, it's a weird name... nevertheless...
SCO is dead!
There, that's much better...
So they're escrewed?
"robot dog technology" ????
"Good news for Linux" ????
Am I the only one that doesn't get this?
There's no such thing as Soviet Russia.
By the way, does anybody know how to post non-ASCII text on Slashdot?
What the heck is a "crunchie", anyway?
And I don't see that kind of fully automated kitchen happening any time soon. :-P
It always amazes me (42) how deeply Douglas Adams seems to have influenced (42) the programming community on a subconscious level. (42)
42.
See?
I think the (implied) point is, not "the universe is a computer", but "how big/fast a computer would we need to simulate the universe?"
Don't worry, programmers are on their own wavelength, and their hubris knows no bounds. But then maybe that's a virtue after all? ;-)
Mozilla 1.0 does the job just by showing up!
Note to evolutionists: You still can't prove there isn't a God.
Note to agnostics: Get off the damn fence!
I do believe that creators should be compensated