I wasn't saying that Babelfish is particularly bad as translation software goes, merely that the mistranslation was particularly unfortunate in this case because the word "an" was key to decipherment.
The use of a word from another language is of course a particularly difficult thing for translation software to deal with. The fact that it is quoted is a hint that it might be something that shouldn't be translated, but it is by no means a certain indicator. This is probably an example of translation that cannot be done correctly by a system that does not actually "understand" the text and have the ability to reason about it and make use of knowledge of the world.
Other provisions of the bill would set a dangerous precedent by making copyright owners who use anti-copying technology on music discs subject to regulation and fines from the Federal Trade Commission unless they meet extensive labeling and regulatory requirements.
The PPA's notion of extensive is ridiculous. All they need to do is put on a label like copy-protected music or encrypted music - not a CD. The regulatory requirements are almost fully contained in the text of the bill and basically just require these labels. You'd think that they were being asked to provide the kind of detailed labelling required on food, or regulated the way prescription drugs are.
The requirements are very modest and entirely justifiable - they just prevent consumer fraud.
Twenty years ago at Bell Labs one of the speech machines (an SEL with homebrew audio i/o) had output to loudspeakers that went through unshielded speaker wires that ran past the CPU, so if you weren't playing anything back the speakers played back CPU noise. We could tell what stage a compilation was at by the noise that came over the speakers.
To fill in a detail, they made use of the fact that the blacked-out word is preceded by "an", from which they deduced that it had to begin with a vowel. The Babelfish mistranslation of this as "year" is therefore particularly misleading.
Maybe I'm missing something, but I don't see what the non-obvious innovations are in these patents. The first one, for example, seems to describe a perfectly ordinary object system, no different from what has been in languages such as Smalltalk, C++, and CLOS for twenty years or more. The fact that the object system appears to be intended specifically for management of certain types of data doesn't make it any more innovative. Not that Sun's recent behavior makes me like them, but I wonder if Kodak's patents are valid.
SCO was doing poorly when they filed the suit. Their software was not all that good and wasn't doing very well even in the proprietary market. Furthermore, their market was being undercut by the various FLOSS versions of UNIX, particularly GNU/Linux. So, no, even if they hadn't filed the suit, it was a company whose business was in decline and that could be expected to continue to go downhill.
I don't have a problem with law enforcement looking into someone who put in this type of request. It's true that it might involve something fishy, and there's no harm in checking it out. It would be legitimate to see if the guy had a criminal record, had made threats, had been buying explosives, etc. The problem I see is with the questions they asked him. It would be reasonable to ask him why he wanted to know about the tunnels, but it was totally inappropriate to ask him whether he was working with the ACLU or whether he belonged to UT Watch. That's political harassment. It isn't relevant to checking out potential terrorists and it isn't any of their business.
The term "assignation" is used where we would normally use "assignment" in C. H. Lindsey and S. G. van der Meulen's classic Informal Introduction to Algol 68.
One thing that few if any reviews, and few if any pieces of information supplied by the distribution, tell you is exactly what software is installed in a non-customized installation. They'll tell you what office suite they provide and that sort of thing, but I would like to know what you get in, say, a default workstation installation. My own gripe with Mandrake, for instance, is that the default workstation installation omitted all sorts of things that I wanted, including a lot of things whose omission surprised me. Granted, in some cases a custom installation may be necessary, but it would be nice to know when, and to be able to choose distributions in part on the basis of how much work it is to get the kind of software you need.
Any content-based restriction on what sites people can visit is improper. Not only does the government have no business playing censor, but it sends the wrong message to people elsewhere, namely that censorship is okay, as long as it is the right kind.
If they really didn't want to waste resources on anything other than pro-democracy web sites, they could provide access just to specific sites, or they could provide open access but limit bandwidth. The images from porn sites will generally use much more bandwidth than the text of a political discussion. As it stands, the keyword list the contractor used is really hopeless. It just goes to show that there aren't very many words that are likely only to be associated with porn sites. I bet that any number of Catholic sites, for example, are blocked by the "virgin" keyword. In any case, where foreign countries are concerned, keyword blocking should be easy to get around. Instead of putting the sexual terms in your domain name, you put them in meta tags and site text, and you put them there in Chinese and Persian and so forth. How halfway intelligent people with the serious mission of spreading freedom and democracy can waste their time on such a thing is beyond me.
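To make the over-blocking and under-blocking concrete, here is a minimal sketch of a naive keyword filter. It assumes the filter does plain substring matching on hostnames, which is a guess about how such a filter works and not a description of the contractor's software. The function name, the one-word keyword list, and all of the hostnames are made up for illustration.

```python
# Hypothetical keyword list; "virgin" is the only keyword mentioned in the comment above.
BLOCKED_KEYWORDS = ["virgin"]


def is_blocked(hostname, keywords=BLOCKED_KEYWORDS):
    """Naive filter: block if any keyword occurs anywhere in the hostname."""
    host = hostname.lower()
    return any(keyword in host for keyword in keywords)


# All hostnames are invented (.example is reserved for documentation).
samples = [
    "blessed-virgin-parish.example",  # religious site, blocked anyway
    "virginia-tourism.example",       # place name that contains the keyword
    "harmless-looking-name.example",  # never blocked, whatever the site contains
]

for host in samples:
    print(host, "->", "blocked" if is_blocked(host) else "allowed")
```

The last sample is the evasion case: a filter of this kind never looks at meta tags or page text, so a site only has to keep the keywords out of its name.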
RMS is not wrong. He is well aware that lots of people use the term "intellectual property". He says so explicitly here. His point is that it is a cover term for a number of different things (copyright, patent, trademark, etc.) that in his opinion it is wrong to generalize about, and that it is misleading to think of these things in the same terms as other things that we call "property". You can agree with him or not, but the parent is a misleading presentation of RMS's position.
I'm generally sympathetic to attempts like this to get rid of spyware, but it seems to me that "computer usage" needs to be defined carefully in order to avoid criminalizing the collection of innocuous usage information. For instance, I once wrote a time series editor that was basically an interpreter for a specialized programming language, kind of like emacs. For a while, I collected statistics on memory usage and how many times the language primitives were executed and had the program email them to me on exit. The program printed a brief message about this on startup but didn't ask the user's permission. That didn't seem necessary since the resources used were trivial and no personal information was obtained. I've heard of other people doing the same kind of thing. This could fall under information about "computer usage", which presumably is intended to be restricted to information that the user might want to keep confidential, such as web sites visited.
Strictly speaking, a WAV file can contain any of dozens of representations of the audio data, many of which are lossy compressions. The WAV format just defines the file structure. There is an entry in the header that specifies the audio data representation. Much of the time WAV files contain linear PCM data, which is uncompressed, but they don't have to. In fact, there's no reason in principle that a WAV file couldn't contain FLAC-compressed data, though I believe that no FLAC identifier code has been registered.
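As a rough illustration of that header entry, here is a sketch that reads the format tag (the 16-bit field at the start of the "fmt " chunk) from a RIFF/WAVE stream. The function names and the tiny in-memory test file are my own inventions for the example, and the table lists only a handful of the registered codes.

```python
import io
import struct

# A few registered format tags; the real registry is much longer.
FORMAT_NAMES = {
    0x0001: "linear PCM (uncompressed)",
    0x0002: "Microsoft ADPCM",
    0x0003: "IEEE float",
    0x0006: "A-law",
    0x0007: "mu-law",
    0x0055: "MPEG Layer 3",
}


def wav_format_tag(stream):
    """Return the format tag from the 'fmt ' chunk of a RIFF/WAVE stream."""
    riff, _size, wave = struct.unpack("<4sI4s", stream.read(12))
    if riff != b"RIFF" or wave != b"WAVE":
        raise ValueError("not a RIFF/WAVE file")
    while True:
        header = stream.read(8)
        if len(header) < 8:
            raise ValueError("no 'fmt ' chunk found")
        chunk_id, chunk_size = struct.unpack("<4sI", header)
        if chunk_id == b"fmt ":
            (tag,) = struct.unpack("<H", stream.read(2))
            return tag
        # Skip other chunks; chunk data is padded to an even number of bytes.
        stream.seek(chunk_size + (chunk_size & 1), io.SEEK_CUR)


def describe(tag):
    """Human-readable name for a format tag, if it is in the short table above."""
    return FORMAT_NAMES.get(tag, "unrecognized tag 0x%04X" % tag)


def make_header(tag):
    """Build a minimal in-memory WAVE file with an empty data chunk (demo only)."""
    fmt = struct.pack("<HHIIHH", tag, 1, 8000, 16000, 2, 16)
    body = (b"WAVE" + b"fmt " + struct.pack("<I", len(fmt)) + fmt
            + b"data" + struct.pack("<I", 0))
    return b"RIFF" + struct.pack("<I", len(body)) + body


for tag in (0x0001, 0x0055):
    print(describe(wav_format_tag(io.BytesIO(make_header(tag)))))
```

The point is that the container is the same in both cases; only the tag, and therefore the meaning of the bytes in the data chunk, differs.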
Actually, a fair case can be made that in at least some languages other than English the word "window" is used as a generic computer term. That is, "window" may not mean the kind of window you find in the wall of a house, but it does mean "window on a computer display". I made exactly this argument in this post on Language Log. It took me only a few minutes to turn up examples of the word "window" used as a generic computer term in Dutch.
It isn't clear what the real impact of this decision is. If you read the article, it quotes lawyers as saying that the music industry prepared a sloppy case and that it can always try again. It may only be a temporary victory. But at least it sounds like the Canadian courts are requiring a higher standard of evidence of infringement than the US courts are.
If you read the actual article, it says that the study concluded that file sharing INCREASES CD sales. On their "most pessimistic model", which is not the one they think is most likely correct, they compute a decrease in sales of 2 million CDs in 2002, which they say is statistically insignificant in comparison to the decrease of 139 million CDs sold between 2000 and 2002.
The fact that software doesn't always come out on time isn't the point. The point is that for a lot of people the main reason to pay thousands of dollars for software maintenance is to get upgrades without having to pay extra for them. If they don't get them, they have reason to think twice about shelling out for software maintenance. They can forgo maintenance and purchase upgrades when they come out if they decide that they are worthwhile, or they can use FLOSS products, where with luck somebody else will improve them, and where if necessary they can make improvements themselves or hire somebody else to do it. Microsoft's delays are reducing the reason not to go with these alternatives.
The strangest thing reported in this paper is that some arbitration panels have awarded critique domain names to the companies criticized on the grounds that non-native speakers of English may not understand American slang like "sucks" and so may not realize that a site is a critique site. Aside from ignoring the fact that the site content will soon make it clear that it is not the company's own site, these panels seem not to understand that slang in general use is probably the first thing to spread to other parts of the world.
The other striking fact reported is that some companies have attempted to pre-empt critics by registering the domain names that they are likely to use. Chase Manhattan Bank is mentioned as having registered a bunch of domain names, such as chasebanksucks.com. What this says to me is that these companies are overly concerned about criticism and therefore that they probably offer poor service and are unresponsive to complaints. I would avoid companies like that. This doesn't sound like a good way to enhance your company's reputation.
The generally poor English conversation skills of Japanese people do not support the position that learning two languages at once works poorly. The fact is that English is poorly taught in Japan. Very few Japanese teachers of English actually speak English themselves. Furthermore, the curriculum and exams, especially the all-important University entrance exams, emphasize the ability to read English, not to speak it.
An example of a country with successful, high-quality language teaching is the Netherlands. A Dutch high-school graduate will generally be fluent in English and capable of getting by in French and German. In Dutch universities, classes in the languages taught in high school are conducted in the language. That is, if you major in French at the university level, your classes, including classes in subjects like literature and linguistics, will be conducted in French.
There are also many societies in which children grow up fluent in two or more languages as a result of using different languages in different contexts, e.g. one at home and another at school. Millions of immigrants to the US, for example, have grown up speaking both fluent English, learned outside the home, and their heritage language: Italian, Yiddish, Chinese, Polish, etc. Swedish Finns, such as Linus Torvalds, grow up bilingual in Finnish and Swedish, and like other Finns, most acquire a good command of English by the end of high school.
Multilingualism is common in much of Africa. People often speak their local language, a regional African language, such as Swahili, and the language of the former colonial power, which often serves as a national language, such as French or English. To take an admittedly somewhat extreme case, I have a friend from Eritrea who speaks Tigrinya, Tigre, Amharic, Beja, Nara, Sudanese Colloquial Arabic, and Modern Standard Arabic. He's been in the US for a couple of years and his English is imperfect but quite serviceable. In all probability, most people who have ever lived have spoken at least two languages. Monolingualism is pathological.
It seems to me that at least one major locus of the problem is being missed here. ESR says:
I'm reading the manual, and I find a reference to "BrowseAddress" and /etc/cups/cupsd.conf which begins to unfold for me the mystery of how the autoconfiguration is supposed to work. It seems that CUPS instances periodically send broadcast packets advertising their status and available printers to a broadcast address to be picked up by other CUPS instances. Smart design! But...bugger me with a chainsaw, the broadcast facility is turned off by default and the documentation doesn't tell you that!
One of the autoconfiguration features that CUPS provides to make life easier for the user was disabled! Now, maybe off should be the default, as a security measure, but from the point of view of ease of use, either the default should be on, or the user should be provided the opportunity to enable it during installation. I don't know whether the default was set by the CUPS people or the people who put together the distribution, but it seems to me that handling this kind of thing is exactly the role of the people who create distributions.
It's clear that Verisign is irresponsible and can be expected to keep trying to abuse its position running the GTLD servers for .com and .net.
As I understand it, ICANN delegated this role to Verisign, so ICANN ought to be able to take it away.
Can anyone explain the terms of the current delegation? Is there a contract that will expire in a few years? Did Verisign somehow acquire permanent rights?
There's no reason that Kerry should change his mind or disassociate himself from Jane Fonda. He came back from service in Vietnam convinced that the war was wrong and became prominent in the anti-war movement. There's nothing wrong with that. I too opposed the war then, as did, eventually, a majority of Americans. Nothing has happened to change my mind, and I see no reason that Kerry should change his. But whatever one's take on the Vietnam war, Kerry never did anything in any way improper. Even if you don't approve of Jane Fonda's trip to Hanoi, the fact that she and Kerry participated in the same rally does not reflect on Kerry. The anti-war movement, like any large movement, involved all sorts of people united only by their position on that issue. The fact that some may hold even more extreme views or distasteful views on other issues or be criminals doesn't say anything about the others.
I know that "an" means "year" in French. But in context it is a mistake because it isn't the French word for year; it is a quoted English word.
You'd think that /. would be the last place you'd find stereotyping of geeks. I have spoken with RMS face-to-face and he didn't smell.
I picked a bad example. chasebanksucks.com is the one domain name of this type that the critics got before the bank tried to pre-empt them.