No. A random sequence does not have to be
incompressible. The sequence "1 1 1" could very
well be the output of random selection and is
just as likely as "5 3 9" or any other combination
of three digits.
You cannot, however, write a generic compression algorithm that shrinks any and all inputs without further information. If you could, you could pass the output back into that algorithm and repeat until everything had been compressed to nothing, which is a clear contradiction.
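The counting argument behind this can be checked directly. What follows is only a minimal sketch, not anything taken from the article: it counts how many bit strings are shorter than a given length, and then runs Python's standard `zlib` module (my choice of compressor, purely as an example) on repetitive versus pseudo-random bytes. The helper names are made up for the illustration.

```python
import random
import zlib


def strings_of_length(n: int) -> int:
    """Number of distinct bit strings with exactly n bits."""
    return 2 ** n


def strings_shorter_than(n: int) -> int:
    """Number of distinct bit strings with fewer than n bits (lengths 0..n-1)."""
    return sum(2 ** k for k in range(n))


def compression_ratio(data: bytes) -> float:
    """Compressed size divided by original size, with zlib as a sample compressor."""
    return len(zlib.compress(data)) / len(data)


# Pigeonhole: there are always fewer shorter strings than n-bit inputs,
# so no lossless scheme can map every n-bit input to a shorter output.
n = 8
assert strings_shorter_than(n) == strings_of_length(n) - 1

size = 4096
repetitive = b"1" * size
rng = random.Random(0)  # fixed seed so the run is repeatable
noisy = bytes(rng.getrandbits(8) for _ in range(size))

print("repetitive:", compression_ratio(repetitive))
print("pseudo-random:", compression_ratio(noisy))
```

The repetitive input should shrink to a small fraction of its size, while the pseudo-random input should come out at about the same size or slightly larger. That is one compressor on one input, of course, not a proof; the proof is the counting line.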
"Random" in the context of the article should rather be read as "unpredictable based on previous digits". What that means is that it should be impossible to predict the next digit of Pi based on the previous digits without resorting to other information about Pi (such as how to recognize Pi, and find a formula to calculate it) - the number itself doesn't appear to include any discernible information that allows you to accurately predict the next digit.
For an example, consider the sequence "1,2,3,4". Most people would predict the next number to be "5". And most likely you'd be correct, because you'd assume you were looking at part of a sequence increasing by one at each step - there's a relationship between the numbers that can be easily calculated.
No one has found that with Pi, and this article is
about trying to prove that no such relationship
can be found.
Or for the benefit of non-mathematicians: It's
"random".
(disclaimer: I'm certainly no mathematician, and I'm sure someone can find some silly flaw in the above:-)
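To make the "1,2,3,4" example concrete, here is a small sketch of the kind of guess most people make: assume a constant step and extend it. The function name `predict_next` is invented for this example, and a constant step is only one of many relationships you could test for.

```python
from typing import Optional, Sequence


def predict_next(values: Sequence[int]) -> Optional[int]:
    """Guess the next value if all consecutive differences are equal, else None."""
    if len(values) < 2:
        return None
    step = values[1] - values[0]
    for prev, cur in zip(values, values[1:]):
        if cur - prev != step:
            return None  # no constant step, so this simple rule gives up
    return values[-1] + step


print(predict_next([1, 2, 3, 4]))           # constant step of 1
print(predict_next([3, 1, 4, 1, 5, 9, 2]))  # leading digits of Pi
```

The first call finds a rule and extends it; the second finds none. Failing this one test obviously says nothing about whether some cleverer rule exists, which is exactly the open question.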
Why the hell would you want to calculate Pi to
a lot of digits? It's been done to an accuracy
that nobody needs already. Again and again and again. We don't really need yet another waste of
resources....:)
I suggest you go sign up on the mailing list
on www.boycottadobe.org. A lot of the traffic is
concerned with getting the word to journalists,
politicians and other people that may affect this
directly - the demonstrations are only one of a
wide range of methods used.
These aren't amateurs - a lot of people from organizations with lots of experience in activism
like this are involved.
For my part I've written a few e-mail messages to
Adobe officials. Every little bit done to turn this into a PR nightmare for Adobe is a good thing.
In my case I'd want the authentication provider,
because I move between lots of machines. I don't want to have the data stored locally. However, I would want to be able to choose the authentication provider myself, based on trust. If I felt the security of my data warranted paying extra for a provider that uses multiple external auditors to verify security and integrity, then so be it, and if I don't value my data, I could leave it with Microsoft.
But keep in mind that if the data is encrypted properly, you could have a system where you tell
the authentication provider to provide the data to
site X, and then tell site X your passphrase - no need to ever send that phrase to your authentication provider.
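A rough sketch of that flow, under loud assumptions: the `Provider` class and its methods are hypothetical, and the XOR keystream is a toy stand-in for a real cipher so that the example runs with only the standard library. The point is just who sees what: the provider holds a salt and an opaque blob, and only site X ever gets the passphrase.

```python
import hashlib
import os


def derive_key(passphrase: str, salt: bytes) -> bytes:
    """Stretch the passphrase into a 32-byte key with PBKDF2-HMAC-SHA256."""
    return hashlib.pbkdf2_hmac("sha256", passphrase.encode("utf-8"), salt, 100_000)


def xor_with_keystream(key: bytes, data: bytes) -> bytes:
    """Toy stream cipher: XOR data with SHA-256(key || counter) blocks.

    Applying it twice with the same key restores the input.
    Illustration only; use a vetted cipher in practice.
    """
    out = bytearray()
    for offset in range(0, len(data), 32):
        counter = (offset // 32).to_bytes(8, "big")
        pad = hashlib.sha256(key + counter).digest()
        out.extend(b ^ p for b, p in zip(data[offset:offset + 32], pad))
    return bytes(out)


class Provider:
    """Hypothetical provider: stores only a salt and ciphertext per user."""

    def __init__(self) -> None:
        self._blobs = {}

    def store(self, user: str, salt: bytes, ciphertext: bytes) -> None:
        self._blobs[user] = (salt, ciphertext)

    def release_to_site(self, user: str):
        return self._blobs[user]


# User side: encrypt locally, hand the provider an opaque blob.
passphrase = "correct horse battery staple"
salt = os.urandom(16)
profile = b"name=Example User;card=0000"
provider = Provider()
provider.store("me", salt, xor_with_keystream(derive_key(passphrase, salt), profile))

# Site X side: gets the blob from the provider and the passphrase from the user.
blob_salt, blob = provider.release_to_site("me")
recovered = xor_with_keystream(derive_key(passphrase, blob_salt), blob)
print(recovered == profile)
```

Note that site X still learns the passphrase in this arrangement, so you are trusting site X instead of the provider.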
Apparently you are clueless. There are lots of
powerline modems available on the market, for
speeds up to a couple of megabits per second at
least. The problem with widespread deployment is passing the data past transformers. In effect most schemes are based around adding equipment at every transformer.
In Europe that isn't too bad, since the number of households per transformer is high. In the US on the other hand, the number of households per transformer is very low, hence increasing the cost of using most current powerline modems quite a lot.
FYI: If all you want is to send data over powerlines within your own house, there are cheap,
working boxes to be had from lots of places, and I know at least a couple of people who've built powerline modems like that themselves.
Actually, apparently the demand for positions in
MBA courses has dropped in recent years - many people are instead looking at more specialized courses, so I doubt you'll see any explosion in the availability of people with MBAs.
Of course, if you switch careers totally you're
going to have to start getting experience in your
new field. However, for many, combining an MBA with an engineering or computer science background is more about being able to take on technical positions that border on business development, project management or upper management.
And that's a completely different value proposition.
Also, keep in mind that there are lots of schools (good schools), that offer part time programs that are tightly integrated with your current work (for instance by focusing written work towards doing real projects in your company). For those who are considering an MBA, it might be worth talking to your boss or your HR department and asking them whether they would be willing to sponsor you for a school like that.
See my other post below. ISO/IEC 10646 and the Unicode standards define the character sets. UCS-2 and UCS-4 are encodings of those character sets. UTF-7/UTF-8/UTF-16 are transformation formats that allow variable length encodings of the UCS-2 and UCS-4 encodings.
No. See the glossary at www.unicode.org - UCS-2
and UCS-4 are encoding forms of the unified character set defined by the ISO/IEC 10646
standards, which now include at least 10646-1 and 10646-2. Unicode is mostly a different name for
the ISO/IEC standards, but also includes additional information about the use of the characters.
You mean that can't satisfy the bureaucrats. Most ordinary people won't see many restrictions from the current standard, as it does contain about 65,000 CJK codepoints, which should be more than enough for ordinary use. Those do at this point also include "compatibility" characters - duplicates that are there to satisfy worries about compatibility with pre-existing encoding systems.
Did you actually read my post? I explained why there is a legitimate request for different versions of similar characters among the CJK glyphs. I also suggested that people who need the missing characters work to add them.
Finally, however, I did suggest that to most people using Chinese, Japanese and Korean, the current set of 94,140 characters, of which about
65,000 are there for the benefit of Chinese, Japanese and Korean, would be sufficient.
I did not write anything to imply that no one would run into limits. I did not write anything to imply that people who do run into limits should accept that (hence my suggestion that they work to have the characters they need accepted in forthcoming revisions of the standard).
However I do stand by my claim that 94,140 characters will be enough for most people most of the time, including people using Chinese, Japanese and Korean.
Now go learn something about how to parse basic English sentences.
Wrong. The worst case for Unicode is 4 times larger than normal. If you only use non-ASCII text sporadically, you can use UTF-8 and will get by with much less than that (as UTF-8 encodes all ASCII text in one byte per character).
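A quick way to see the size difference is to encode the same text several ways and count bytes. This is only a sketch: `encoded_sizes` is a name I made up, and I'm using Python's "utf-32-be" codec as the fixed 32-bit form and the "-be" codecs throughout so that no byte order mark is counted.

```python
def encoded_sizes(text: str) -> dict:
    """Byte length of text in three Unicode encodings (no byte order mark)."""
    return {
        "utf-8": len(text.encode("utf-8")),
        "utf-16": len(text.encode("utf-16-be")),
        "utf-32": len(text.encode("utf-32-be")),
    }


ascii_only = "plain ASCII text"
mostly_ascii = "na\u00efve caf\u00e9"  # two non-ASCII letters among ASCII ones

print(encoded_sizes(ascii_only))
print(encoded_sizes(mostly_ascii))
```

For the pure ASCII string the 32-bit form is exactly four times the UTF-8 size, which is the worst case mentioned above; the second string costs only two extra bytes in UTF-8.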
Do you use Linux? Try starting "kterm" or similar.
If you're using Redhat and Gnome you'll likely
find it under "System" in the program menu as "Kanji terminal". Try holding down alt and pressing a couple of character combinations.
AFAIK, each plane is only 16 bits wide. For Unicode 3.1, for instance, the new characters are placed in planes 1, 2 and 14. But you're right that Unicode as a whole encodes over a million codepoints.
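The arithmetic is easy to check. A minimal sketch, assuming the codespace runs up to U+10FFFF (the constant and function names are mine, and the sample codepoints are just ones I picked from planes 0, 1, 2 and 14):

```python
PLANE_SIZE = 0x10000       # 65,536 codepoints per plane (16 bits)
LAST_CODEPOINT = 0x10FFFF  # highest codepoint in the Unicode codespace


def plane_of(codepoint: int) -> int:
    """Plane number: the bits above the low 16."""
    if not 0 <= codepoint <= LAST_CODEPOINT:
        raise ValueError("outside the Unicode codespace")
    return codepoint >> 16


total_planes = plane_of(LAST_CODEPOINT) + 1
print(total_planes, total_planes * PLANE_SIZE)
print(plane_of(0x4E2D), plane_of(0x1D11E), plane_of(0x20000), plane_of(0xE0001))
```

That gives 17 planes of 65,536 codepoints each, so a bit over 1.1 million in total.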
The "surrogate pair" method only applies to the UTF-16 encoding, AFAIK. UCS-4 should cover the same codepoints as 16-bit text with surrogate pairs, except that each codepoint is always encoded as a single 32 bit value, whether the 16-bit form would use a single 16-bit character or a pair of two 16-bit surrogates.
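To show the difference, here is a small sketch that splits a codepoint into UTF-16 code units by hand and prints the 32-bit value next to it. The function name is invented; the result is cross-checked against Python's own "utf-16-be" codec. It does not validate input that falls in the surrogate range itself.

```python
def to_utf16_units(codepoint: int) -> list:
    """Return the 16-bit code units UTF-16 uses for one codepoint."""
    if codepoint < 0x10000:
        return [codepoint]            # fits in a single 16-bit unit
    offset = codepoint - 0x10000      # 20 bits remain
    high = 0xD800 + (offset >> 10)    # top 10 bits go in the high surrogate
    low = 0xDC00 + (offset & 0x3FF)   # bottom 10 bits go in the low surrogate
    return [high, low]


for cp in (0x4E2D, 0x1D11E):
    units = to_utf16_units(cp)
    print(hex(cp), [hex(u) for u in units], "32-bit form:", f"{cp:08x}")
    # Cross-check against Python's own UTF-16 encoder.
    expected = chr(cp).encode("utf-16-be")
    assert b"".join(u.to_bytes(2, "big") for u in units) == expected
```

A codepoint in plane 0 stays a single unit, while one in plane 1 becomes a pair; the 32-bit form is just the codepoint number padded to four bytes in both cases.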
UCS-4 is not a character set. It is an encoding of Unicode, similar to UCS-2 (UCS-2 is 16 bit, UCS-4 is 32 bit), and UTF-7, UTF-8 and UTF-16 (variable length encodings).
Except for UCS-2 (and perhaps UTF-7? I don't remember), all of them can encode about a million glyphs (the reason it's not more is due to the way the codespace is laid out, separating things in "planes", and reserving a lot of space for private use etc.)
Uhm. Unicode already has at least three representations that allow for about a million characters each: UTF-8 (8 bit for US-ASCII, 2 to 4 bytes for everything else), UTF-16 (usually 16 bit, 32 bit for alternate "planes") and UCS-4 (32 bit).
In other words, the limitation currently isn't lack of space in the Unicode encodings (unless you use UCS-2), but the fact that they simply haven't gotten around to specifying any more characters yet - Unicode is still a work in progress.
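For the UTF-8 widths specifically, you can simply ask an encoder. A minimal sketch (the helper name is mine) that prints the byte count at the last codepoint of each width class:

```python
def utf8_length(codepoint: int) -> int:
    """Number of bytes UTF-8 needs for a single codepoint."""
    return len(chr(codepoint).encode("utf-8"))


# Last codepoint of each UTF-8 width class.
for cp in (0x7F, 0x7FF, 0xFFFF, 0x10FFFF):
    print(f"U+{cp:04X}: {utf8_length(cp)} byte(s)")
```

So everything up to the top of the codespace fits in at most four bytes, and only the US-ASCII range gets the one-byte form.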
One of the reasons they want different glyphs is that the characters actually look different in
present day use.
As for including all possible historical versions of Western characters, there are very few that are sufficiently different from present day renderings to be easy to confuse.
But I agree that his criticism is mostly whining. Most of all because Unicode 3.1 has shown that Unicode absolutely is not a static standard, but one that is evolving to encompass more characters on a regular basis. Perhaps some people will have problems using it today. In that case those people should interact with the standards committee instead of whining, and get their characters into the next version.
But for most people (including most Chinese and Japanese people) the current Unicode standard will be comprehensive enough for most use.
Get your facts straight. Unicode isn't written in stone. It is an evolving standard. And one of the reasons it is taking so long is precisely because everyone affected can get involved - there's been a lot of infighting about which glyphs should make it and how to organize them. The result, however, is that most commonly used scripts can be handled
by the current version of Unicode. More will most likely be handled in the future.
Input methods for Chinese, Japanese and Korean exist, and can efficiently handle the number of characters required. Some do it by having you type out the romanized sound, and mapping it to the characters.
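The romanization approach boils down to a lookup from typed sounds to candidate characters, with the user picking one. A toy sketch, assuming a tiny hand-picked table (a real input method ships a large dictionary and ranks candidates by context; all names here are invented):

```python
from typing import Dict, List

# Tiny hand-picked sample table; a real input method ships a large dictionary.
CANDIDATES: Dict[str, List[str]] = {
    "zhong": ["中", "重", "种"],
    "guo": ["国", "过", "果"],
    "ma": ["吗", "妈", "马"],
}


def lookup(romanized: str) -> List[str]:
    """Return candidate characters for a romanized syllable, in table order."""
    return CANDIDATES.get(romanized.lower(), [])


def compose(syllables: List[str], picks: List[int]) -> str:
    """Build text by choosing one candidate per typed syllable."""
    return "".join(lookup(s)[i] for s, i in zip(syllables, picks))


print(lookup("zhong"))
print(compose(["zhong", "guo"], [0, 0]))
```

The keyboard only ever needs to produce Latin letters plus a way to pick from the candidate list.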
And actually, the "Unicode standard we have now" does not fit in UCS-2 (16 bit). It requires one of the UTF-* encodings (which are variable length encodings), or UCS-4 (32 bit).
As for his gripes about Unicode 3.1, sure, there are things you can't write with it. But it's a good step forward. And it doesn't fill the entire
glyph-space, by far. The 32 bit encodings, because of the way they are arranged, can "only" handle about a million characters if I remember correctly, but that is still way more than is needed.
I've been using Unicode in various incarnations
for a long time. And UCS-2 is not the only way
to encode Unicode. UTF-8 is perhaps a lot more widespread, as it is the de facto standard encoding for exchange of XML documents over the web.
UCS-4 is also quite common, and allows for the new extensions.
UTF-16 is used by some who need to extend their UCS-2 applications, or who mostly need text that works with UCS-2 but want to be prepared for more.
Yes, a lot of things are difficult with Unicode. But if you look at most recent internationalization efforts, Unicode is what people use.
Considering the size of LG (it's one of the largest companies in the world), I'd say that is a big loss. And considering how controversial SDMI is, if even a single electronics giant stands outside, they've lost, because the remaining one(s) will surely exploit their status for what it's worth to take market share by appealing to consumer rights.
--
Remove Trash+ to reach my actual inbox
Sure, but Bill Gates isn't exactly a good example of the value of education, considering he dropped out of college without any degrees.
I don't know about the status, but I believe it was proposed by someone a while back... :-)
You don't need a special keyboard.