The Evolution of Python 3

Posted by ScuttleMonkey on Monday January 12, 2009 @09:02AM from the all-growd-up dept.

chromatic writes to tell us that O'Reilly has an interview with Guido van Rossum on the evolutionary process that gave us Python 3.0 and what is in store for the future. "I'd like to reiterate that at this point, it's a very personal choice to decide whether to use 3.0 or 2.6. You don't run the risk of being left behind by taking a conservative stance at this point. 2.6 will be just as well supported by the same group of core Python developers as 3.0. At the same time, we're also not sort of deemphasizing the importance and quality of 3.0. So if you are not held back by external requirements like dependencies on packages or third party software that hasn't been ported to 3.0 yet or working in an environment where everyone else is using another version. If you're learning Python for the first time, 3.0 is a great way to learn the language. There's a couple of things that trip over beginners have been removed."

Re:In all seriousness by AuMatar · 2009-01-12 11:39 · Score: 0, Troll

I think that it's ridiculous that a language's readability depends on the tool used to read it. It's a sign the language is broken. A proper language would use an easily distinguishable delimiter: anything other than whitespace. It could be { [ ( or (@$^(*#*@(&#*(@&#^& for all I care. If you need a special tool to read it, it's flawed. I should be able to write my code in emacs, vi, nano, pico, ed, or notepad for that matter without having to spend any time messing with the setup. And my coworkers should be able to use whatever tool they want, even if they are heretical vi users. Nor should I have to know the 1 billion features of emacs. I have better things to waste braincells on.
By the way, what would you do if you were in an environment that didn't have emacs- say editing on an embedded/mobile device? Or were working off of printouts? Or just didn't have it on the machine and couldn't install it (no network, network outage, improper permissions)?
I also love that they should behave "mostly sane" thereafter. So even with tools it isn't promised to work right? No thanks.
And I am speaking from experience here- I worked on a Python project in a team environment. It was a disaster, the whitespace thing caused daily bugs. There's no excuse for the amount of time and productivity it caused us to lose when the solution exists and is 4 or 5 decades old.

--
I still have more fans than freaks. WTF is wrong with you people?

Directed Evolution by CustomDesigned · 2009-01-12 13:38 · Score: 0, Troll

Can't evolution be controlled?
Of course it can. But then it isn't "evolution" in the religious sense that hard core atheists insist on. The official Dogma explicitly requires *undirected* chance plus natural selection as the ultimate origin of anything that appears to be designed. (Notice I said, "ultimate", nitpickers.)
I mean really, philosophical materialism is just as silly as the "the universe must have been created in 7 revolutions of a certain planet as measured 14 billion (or 6000) years into its evolution" camp. ("Evolution" in the continuous change according to a set of rules sense). Did they ever consider that our physical time was itself one of the things being (allegedly) created? (Many Church Fathers did, e.g. Augustine.)
There are many meanings of "evolution" in common use, so discussions always end up in equivocation with straw and torn blue jeans all over the place.

Unicode by spitzak · 2009-01-12 17:16 · Score: 1, Troll

I posted about this before in a previous Python 3.0 article and a lot of people attacked me. However, I very much feel that Python's treatment of Unicode as UTF-16 is a HUGE problem that will cause no end of pain. I think a far cleaner solution to Unicode is to do the following:
- Make unmarked plain quoted strings produce byte strings just like they do now. Unless there are backslashes, the contents are precisely the bytes that are in the input file. Keep the automatic casting of byte strings to unicode strings.
- Force the encoding to be UTF-8 by default, or at least make it trivial to turn this mode on (in Python 2.x the default init deletes the API to do this!)
- The sequence \uXXXX in a byte string constant should turn into the correct UTF-8 sequence. And the sequence \xXX in a Unicode string should be interpreted as bytes and converted from UTF-8 to unicode. This is necessary so that a string constant can easily be changed between bytes and Unicode.
- We must have lossless conversion of UTF-8 to UTF-16. The most popular method I have seen is to turn invalid bytes into 0xd8xx (which is invalid UTF-16 as it is lower-half surrogate pairs). Oddly enough, this makes the UTF-16 API useless because the reverse conversion is not lossless. I have looked into this and it may be fixable but is complex: the to-UTF-8 converter must not translate a sequence of these to a legal UTF-8 sequence and must instead convert that sequence to the typical 3-byte encoding of that number, and the from-UTF-8 converter must treat these typical 3-byte encodings as invalid byte sequences except when they are arranged such that the back converter would make them! This is messy, but I see no other way to be able to use backends that insist on UTF-16 (in particular Windows filenames and its clipboard).
The reason for this is that real Python programs need to handle arbitrary data that is *PROBABLY* UTF-8. Note that by "PROBABLY" I mean that the programmer really really wants to think of it as a sequence of unicode characters, not as a "byte sequence", but it must NOT compare any two different byte sequences as being equal.
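For illustration, here is a minimal sketch of the two requirements above: a lossless round trip, and distinct byte sequences that never compare equal. It assumes Python 3.1 or later, where the `surrogateescape` error handler (added after this comment was posted) maps each undecodable byte to a lone surrogate in the range U+DC80..U+DCFF. The helper names `to_text` and `to_bytes` are hypothetical, not part of any library.

```python
# Sketch only: relies on the "surrogateescape" error handler (Python 3.1+).
# to_text / to_bytes are hypothetical helper names, not a standard API.

def to_text(raw: bytes) -> str:
    """Decode probably-UTF-8 bytes; each invalid byte becomes a lone surrogate."""
    return raw.decode("utf-8", errors="surrogateescape")


def to_bytes(text: str) -> bytes:
    """Reverse to_text exactly, restoring the original invalid bytes."""
    return text.encode("utf-8", errors="surrogateescape")


valid = "caf\u00e9".encode("utf-8")  # well-formed UTF-8
bad_a = b"caf\xff"                   # 0xFF can never appear in UTF-8
bad_b = b"caf\xfe"                   # neither can 0xFE

# Well-formed input decodes to the expected characters.
assert to_text(valid) == "caf\u00e9"

# The round trip is lossless for valid and invalid input alike.
for raw in (valid, bad_a, bad_b):
    assert to_bytes(to_text(raw)) == raw

# Different byte sequences stay different after decoding.
assert to_text(bad_a) != to_text(bad_b)

# By contrast, errors="replace" collapses both to the same U+FFFD text.
assert bad_a.decode("utf-8", "replace") == bad_b.decode("utf-8", "replace")
```

The last assertion is the failure mode the paragraph warns about: a lossy decoder makes two different inputs compare equal.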
I'm very afraid that Python3.0 as designed will encourage byte sequences to be treated as ISO-8859-1 rather than UTF-8 (because when you set the translation to that it is lossless and no errors are thrown, and \xXX does the same thing in both constants). IMHO this would be very, very bad for internationalization efforts. Believing the programmers will not take this easy solution, and instead rewrite their interfaces to the new byte/unicode naming and correctly handle exceptions thrown by converters is, I think, quite ignorant.
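A small sketch of why the ISO-8859-1 shortcut is tempting, assuming Python 3 `bytes` and `str` semantics. Every byte value is a valid ISO-8859-1 character, so the decode never raises and round-trips exactly, but UTF-8 input silently turns into the wrong characters. The sample word is an arbitrary choice for the example.

```python
# Sketch: decoding as ISO-8859-1 never raises, so it hides encoding mistakes.
raw = "na\u00efve".encode("utf-8")      # six bytes: b'na\xc3\xafve'

as_latin1 = raw.decode("iso-8859-1")    # always succeeds
as_utf8 = raw.decode("utf-8")           # the text the author intended

# The shortcut is lossless at the byte level, which is why it is attractive...
assert as_latin1.encode("iso-8859-1") == raw

# ...but the resulting string is wrong: two characters where one was meant.
assert as_latin1 == "na\u00c3\u00afve"
assert as_latin1 != as_utf8
assert len(as_latin1) == 6 and len(as_utf8) == 5

# Strict UTF-8 decoding reports malformed input instead of guessing.
try:
    b"na\xefve".decode("utf-8")         # ISO-8859-1 bytes, not valid UTF-8
    strict_failed = False
except UnicodeDecodeError:
    strict_failed = True
assert strict_failed
```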
I am not joking or trolling about this. This has bitten me already and forced us to change all our use of Python from Unicode to byte strings. And we are just reading metadata from image files. Searching for comments on Python 3.0 on the web, it is apparent that web programmers are encountering this far more often and are very worried about this, and they certainly are trying to handle many orders of magnitude more data from sources that may be actively trying to exploit security holes.