Slashdot Mirror


Python 2.6 to Smooth the Way for 3.0, Coming Next Month

darthcamaro writes "Some programming languages just move on to major version numbers, leaving older legacy versions (and users) behind, but that's not the plan for Python. Python 2.6 has the key goal of trying to ensure compatibility between Python 2.x and Python 3.0, which is due out in a month's time. From the article: 'Once you have your code running on 2.6, you can start getting ready for 3.0 in a number of ways,' Guido Van Rossum said. 'In particular, you can turn on "Py3k warnings," which will warn you about obsolete usage patterns for which alternatives already exist in 2.6. You can then change your code to use the modern alternative, and this will make you more ready for 3.0.'"

27 of 184 comments

  1. Re:More ready? by Onaga · · Score: 4, Funny

    But which one is correcter?

  2. tough transitions by AceJohnny · · Score: 4, Interesting

    These kind of compatibility switches are make-or-break. I'm glad there's Python 2.6 to try to ease the problem, but Py3k means that everybody who publishes python software will all of a sudden have to maintain 2 branches, for Python 2.X line and Python 3.X line.

    This isn't the same as one software package having "legacy" and "bleeding edge" branches, because that's their own choice. In this case the underlying language is forcing them to choose.

    Honestly, I'm not confident in the economics of such transitions, and believe Py3k will die out.

    --
    Misleading titles? Inflammatory blurbs? Keep in mind that Slashdot is a tabloid.
    1. Re:tough transitions by Anonymous Coward · · Score: 3, Insightful

      Honestly, I'm not confident in the economics of such transitions, and believe Py3k will die out.

      Why would Python 3.0 'die out'? Even if you don't believe existing projects will make the switch there's no reason why new projects won't want to have the considerable benefits of using Python 3.0.

    2. Re:tough transitions by DragonWriter · · Score: 3, Insightful

      These kind of compatibility switches are make-or-break. I'm glad there's Python 2.6 to try to ease the problem, but Py3k means that everybody who publishes python software will all of a sudden have to maintain 2 branches, for Python 2.X line and Python 3.X line.

      No, they don't "have to" maintain two branches. They can choose to, or they can maintain one (which depends on their particular circumstance); if necessary (if it is an app and not a library) they can just distribute the right interpreter with the app.

      This isn't the same as one software package having "legacy" and "bleeding edge" branches, because that's their own choice.

      Yeah, actually, it is exactly the same as that, at least as long as bug-fixes and maintenance continues on Python 2.x: the "one software package" being the Python interpreter.

      And, yeah, if those maintaining python-based projects choose to maintain Python-2.x and Python-3.x based versions, that will also be an instance of exactly what you say it wouldn't be, as it will still be their own choice.

    3. Re:tough transitions by GooberToo · · Score: 5, Insightful

      For whatever reason, people fail to understand python natively supports parallel installs. Furthermore, since python's preferred script magic is "#!/bin/env python", rather than, "#!/bin/python", the executing script will use the python that it finds in your path. Additionally, you can also tie python to a specific version as "python2.5". Want a different python? Change your path. A script requires a specific version of python? Change the script to require it. It's one line and trivial. It's at the top of the file, so there's no hunting even.

      New python releases only pose problems for the uninitiated, the ignorant, or the dumb.
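A script can also check at run time which interpreter its shebang line resolved to. The following is a minimal sketch, not something from the thread; the helper name `meets_minimum` and the (2, 6) threshold are illustrative assumptions.

```python
import sys


def meets_minimum(version_info, minimum=(2, 6)):
    """Return True if the (major, minor) pair is at least `minimum`."""
    return tuple(version_info[:2]) >= tuple(minimum)


if __name__ == "__main__":
    # sys.executable is the interpreter that was actually picked up,
    # for example the first "python" that env found on PATH.
    print("running under %s (%d.%d)" % (
        sys.executable, sys.version_info[0], sys.version_info[1]))
    if not meets_minimum(sys.version_info):
        sys.exit("this script needs Python 2.6 or later")
```

Comparing tuples rather than version strings avoids the trap where "2.10" sorts before "2.6" as text.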

    4. Re:tough transitions by jgrahn · · Score: 3, Insightful

      For whatever reason, people fail to understand python natively supports parallel installs. Furthermore, since python's preferred script magic is "#!/bin/env python", rather than, "#!/bin/python", the executing script will use the python that it finds in your path. Additionally, you can also tie python to a specific version as "python2.5". Want a different python? Change your path. A script requires a specific version of python? Change the script to require it. It's one line and trivial. It's at the top of the file, so there's no hunting even.

      Changing my path is not practical. It's too broad. I'd have to write a shell script wrapper for the application which did 'env PATH=new_python:$PATH the_real_application "$@"' or something. And it's not just me; I'd have to communicate this to all other users of the system somehow. And changing one line of a script is not trivial, if I'm not root.

      All this may seem like minor things, but it adds up. And no other good language puts me in situations like that.

      New python releases only pose problems for the uninitiated, the ignorant, or the dumb.

      Or those of us who have been around for a while, and seen innocent backwards-incompatible changes become maintenance nightmares ... Ok, maybe not a nightmare in this case, but an inconvenience and annoyance which will keep being inconvenient and annoying for years, until the last Python 2.x dependency goes away.

      The best way to judge this would probably be to look at what Linux distributions like Debian want to do about Python 3.0. They ship one Python as the default (2.4 currently, for Debian) but provide others too. I bet even a change from 2.4 to 2.5 is a major migration for them.

  3. Re:Not sure about this one by jeremiahstanley · · Score: 5, Insightful

    Because the development cycle is longer than that for derivative projects. Imagine if you could have a cycled and tested app that was ready from day 0...

  4. What's new by ChienAndalu · · Score: 5, Informative

    Here are the changes.
    I really have to check out the multiprocessing package. Too bad that I have to wait for the print function and the new division handling.
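For what it's worth, both the print function and the new division behaviour can be switched on per module through `__future__` imports, so there may be less waiting involved than the comment suggests. A small sketch, assuming Python 2.6 or later (it also runs unchanged on 3.x); the function name `halves` is only for illustration.

```python
from __future__ import division, print_function


def halves(n):
    """Return (true quotient, floor quotient) of n divided by 2."""
    return n / 2, n // 2


true_q, floor_q = halves(7)
# With the division import, / is true division even for two integers,
# and // is the explicit floor division operator.
print("7 / 2 =", true_q, "and 7 // 2 =", floor_q)
```

Code that really wants integer division can use `//` today, which behaves the same way in 2.x and 3.0.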

  5. Cut the crap. by Anonymous Coward · · Score: 5, Interesting

    These changes are NOT earth-shattering. 2.6 is mostly just going to add a few new features, most important being the with statement. Most code written using Python idioms will be fine under 2.6 and 3.0. Now, if you tried to write Java-esque or C-esque code under Python, you might run into issues. Even then, I doubt it. They've been deprecating features for awhile, and 3.0 is probably the point at which they'll be yanked...you've only had a year or two of DeprecationWarnings.

    I'm not sure why people whine about a language evolving. Retain backwards compatibility to a fault and you end up with C++, which is crippled by C-isms. You either know your code well enough that you could make the small incremental changes along the way, or you simply don't upgrade.

    Python most needs sane standard libraries. It is far too much of a "let's throw this in there" with three different naming conventions and no package organization. It is a shame, because the language itself is pretty powerful in the right hands.
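As a rough illustration of the with statement mentioned in this comment, here is a minimal sketch using `contextlib.contextmanager`. The names `tracked` and `events` are made up for the example; the point is only that the cleanup step runs whether or not the body raises.

```python
from contextlib import contextmanager

events = []


@contextmanager
def tracked(name):
    """Record entry and exit around a block, even if the block raises."""
    events.append("enter " + name)
    try:
        yield name
    finally:
        events.append("exit " + name)


with tracked("job") as label:
    events.append("working on " + label)
```

In Python 2.5 the same code needs `from __future__ import with_statement` at the top of the module; from 2.6 on the statement is always available.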

    1. Re:Cut the crap. by slimjim8094 · · Score: 3, Insightful

      So don't use Python 3.0. If it's critical, you're not upgrading from a known working base anyways, right? And if it's not, this will hold your hand.

      --
      I have developed a truly marvelous proof of this comment, which this signature is too narrow to contain.
  6. Really? by Peaker · · Score: 5, Insightful

    What Python features broke for you between minor releases?

    I find it pretty hard to believe any Python user would actually switch to Perl, and stick to it.

    You sir, are probably making this story up :-)

  7. String f**k up by spitzak · · Score: 3, Interesting

    Reading the release, they have decided to really push 16-bit strings (they call this "Unicode" but it really is what is called UTF-16). I think this is a serious mistake.

    The proper solution is to use 8-bit strings, but any functions that care (such as I/O) should treat them as being UTF-8. Most functions do not care and thus the treatment of "Unicode" and "bytes" are the same.

    The problem with UTF-16 is you cannot losslessly convert a string that *might* be UTF-8 to UTF-16 and then back again. This is because any illegal UTF-8 byte sequences will be lost or altered. This is a MAJOR problem for code that wants to process data that is likely to be text but must not be altered under any circumstances; in effect, such programs are forced to be ASCII-only, even though UTF-8 is purposely designed so that such programs could display all the Unicode characters. Note that bad UTF-16 (i.e. with mismatched surrogate pairs) can be losslessly converted to UTF-8 and back.

    This has been a real pain so far in our use of Python, and I am quite alarmed to see that they are changing the meaning of plain quotes in 3.0 to "Unicode". This is really a serious step backwards, as we will be forced to tell anybody using our system to put 'b' before all their string constants and I suspect there will be a lot less automatic conversion of these strings to unicode when we want to display them. Note that Qt is also causing a lot of trouble here too.

    1. Re:String f**k up by Animats · · Score: 4, Informative

      The problem is that there are three kinds of string-like objects in Python: UTF-16 strings, ASCII strings, and uninterpreted arrays of 8-bit bytes. Python 2.5 sort of supports all 3, with "array of bytes" the least well supported. Since this is a language without declarations, the semantics of this gets messy.

      The most common problem was that functions like ".read()" yielded strings, not arrays of bytes. This follows C standard library semantics, but is a bad fit to Python. In 3.0, ".read()" yields an array of bytes, not a string. If the data read is to be converted to a string, "decode" is required. That's the right answer.

      This is consistent with modern thinking about data representation. Consider SQL, which makes a similar distinction between "TEXT" and "BLOB".
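The read-then-decode pattern described here might look like the sketch below. It assumes Python 3 semantics and a stream opened in binary mode (a text-mode stream decodes for you and returns str); `io.BytesIO` stands in for a real file so the example is self-contained.

```python
import io

# Stand-in for open(path, "rb"): a binary stream holding UTF-8 encoded text.
raw = io.BytesIO("na\u00efve caf\u00e9".encode("utf-8"))

data = raw.read()            # bytes: no encoding has been assumed yet
text = data.decode("utf-8")  # str: the caller names the encoding explicitly

print(type(data).__name__, len(data))
print(type(text).__name__, len(text))
```

The two lengths differ because each accented letter takes two bytes in UTF-8 but counts as one character in the decoded string.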

    2. Re:String f**k up by John Millikin · · Score: 4, Informative

      Spoken like somebody that's never had to deal with encoding issues. Using UTF-8 internally is fine, but exposing it to the programmer is insane and error-prone. And if the programmer then proceeds to manipulate that raw byte buffer as a string, he's an idiot.

      The proper solution is to use 8-bit strings, but any functions that care (such as I/O) should treat them as being UTF-8. Most functions do not care and thus the treatment of "Unicode" and "bytes" are the same.

      You might not be aware of this, but computers are used for more than just transmitting text. I don't want my binary streams being rewritten to gibberish because some I/O routine was written to be too clever. Furthermore, not every system uses UTF-8. Some may even need to send data over a *gasp* network! Good luck getting every other computer in the world to start using UTF-8 immediately.

      The problem with UTF-16 is you cannot losslessly convert a string that *might* be UTF-8 to UTF-16 and then back again. This is because any illegal UTF-8 byte sequences will be lost or altered.

      If you try to convert bytes that aren't in UTF-8 using a UTF-8 codec, an error will be raised. This behavior is proper -- if you don't know what format your input is in, there's no way to perform text-based operations on it.
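The strict behaviour described here can be sketched as follows, assuming Python 3. The second half uses the `surrogateescape` error handler, which is not part of this discussion: it was added later (PEP 383, Python 3.1) and is shown only because it addresses the lossless round-trip concern raised in the parent comment.

```python
raw = b"caf\xe9 \xff"  # Latin-1 style bytes, not valid UTF-8

# Strict decoding refuses input that is not well-formed UTF-8.
try:
    raw.decode("utf-8")
    strict_failed = False
except UnicodeDecodeError:
    strict_failed = True

# surrogateescape maps each undecodable byte to a lone surrogate code point,
# so encoding with the same handler restores the original bytes exactly.
text = raw.decode("utf-8", errors="surrogateescape")
restored = text.encode("utf-8", errors="surrogateescape")

print(strict_failed, restored == raw)
```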

      This has been a real pain so far in our use of Python, and I am quite alarmed to see that they are changing the meaning of plain quotes in 3.0 to "Unicode".

      Every developer I know uses Unicode strings already. The new behavior is just one less character to type in front of literals.

      This is really a serious step backwards, as we will be forced to tell anybody using our system to put 'b' before all their string constants

      Otherwise said as: "We're too stupid to fix the glaring encoding errors in our product, so we'll just use bytes everywhere and pretend it's all working". Also, Unicode strings in Python are implemented with either UTF-16 or UCS-4 depending on platform.

    3. Re:String f**k up by belmolis · · Score: 4, Informative

      Python does not use UTF-16 strings; it uses UCS-2 strings. The difference is that in UCS-2, every character is represented by exactly two bytes, while in UTF-16, some characters, those outside Plane 0, are represented by a "surrogate" pair of two 16-bit code units, totaling four bytes. UCS-2 does not provide any representation for characters outside the BMP. In other words, UCS-2 is a straightforward fixed length encoding, while UTF-16 is a more complex variable-length encoding.

      Python can in fact use either of two internal representations for text: UCS-2 or UTF-32 = UCS-4. If you give the option --enable-unicode=ucs4 to configure when building Python, you will get a Python that supports all of Unicode rather than just the BMP.

    4. Re:String f**k up by spitzak · · Score: 3, Insightful

      I think the lesson is that there are ONLY byte sequences.

      The fact that some code can interpret that byte sequence and draw something on the screen that the user thinks of as "text" is completely irrelevant and should not be a fundamental datatype of a programming language. This should be part of the code that draws the text. Imagine if every other type of data, such as image pixels, or sound samples, had a different IO routine and you could never read a file with the wrong routine because the conversion was lossy.

      The real problem is that everybody's mind has been polluted by decades of ASCII where there was no difference between characters and bytes. All I can suggest is to try to think of text as words or sentences. Nobody would suggest that it would be good to make all words use the same amount of storage, or that it is important that you be unable to split a string except at word boundaries. But there has been so much use of ASCII that people think this is important for "characters".

      I also believe there is a serious political-correctness problem. Otherwise logical programmers are consumed with guilt because Americans get the "better" short encodings, and therefore feel they have to punish themselves by making the conversion to i18n as painful as possible so that Americans have just as much trouble as anybody else. The fact that they have actually made I18N far harder for everybody and thus actually discouraged it is the ironic result of this guilt.

    5. Re:String f**k up by belmolis · · Score: 4, Informative

      In fact I am better informed than you are. When not compiled to use UCS-4, Python uses what is properly called UCS-2, with half-baked extensions for treating it as UTF-16. Certain functions know about surrogate pairs, such as those that convert between UTF-8 and the internal representation. However, such basic functions as len do not know about surrogate pairs. Try giving a character outside the BMP as the argument to len. It will return 2, not 1.
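The code-unit arithmetic behind this can be checked with a short sketch. It assumes a modern Python 3 (3.3 or later), where the narrow/wide build distinction no longer exists and `len` counts code points; on the narrow builds discussed in this comment the last value would be 2.

```python
ch = "\U0001D11E"  # MUSICAL SYMBOL G CLEF, a character outside the BMP

# UTF-16 needs a surrogate pair (two 16-bit code units) for this character.
utf16_units = len(ch.encode("utf-16-le")) // 2
# UTF-8 needs four bytes for any code point above U+FFFF.
utf8_bytes = len(ch.encode("utf-8"))
# Number of code points as seen by len() on the running interpreter.
code_points = len(ch)

print(utf16_units, utf8_bytes, code_points)
```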

    6. Re:String f**k up by tazzzzz · · Score: 4, Informative

      Reading the release, they have decided to really push 16-bit strings (they call this "Unicode" but it really is what is called UTF-16). I think this is a serious mistake.

      The proper solution is to use 8-bit strings, but any functions that care (such as I/O) should treat them as being UTF-8. Most functions do not care and thus the treatment of "Unicode" and "bytes" are the same.

      I'm going to try once more, slightly differently. Two other people apparently have tried and failed.

      Python 3.0's handling of strings is basically the same as Java's, because it has proven to work quite well there.

      For webapps, and the rules may be a little different on the desktop, "best practices" in Python for some time have been that you use unicode objects everywhere internally when you are representing text. When you hit a boundary (a file on disk, the net), you encode that unicode string into whatever encoding makes sense (often UTF-8). So far, so good, I hope?

      Python's internal representation of unicode objects is only relevant in that you need it to support whatever code points you care about. I don't think there are any code points that you can represent in UTF-8 that Python will screw up after decoding/encoding. I'm sure there are many people who would be interested to see such a test case.

      If you have a bunch of bytes that *might* be UTF-8, you're screwed. "process data that is likely to be text but must not be altered"? What do you mean by text? 7-bit ASCII? UTF-8? And where is the text coming from? Unless you tell Python the encoding of the file, you're going to get bytes out, not unicode objects.

      The whole point is that Python unicode objects know how to represent code points. If you get a set of bytes from somewhere you *have* to know what encoding it is in order to be able to treat it as a bunch of text characters. Python unicode objects will not be "bad UTF-16". How they're stored is not generally important. What's important is that Python internally keeps track of the code points and will either successfully convert to whatever encoded sequence of bytes you want or it will raise an exception because the encoding you've chosen doesn't have one of the characters in your string.

      Python 3.0 makes this all clearer. When you talk about a "string", you're talking about a bunch of unicode characters. Anything else is a collection of bytes.
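A small sketch of the encode-at-the-boundary behaviour described above, assuming Python 3 naming (str for text, bytes for encoded data). The sample string is arbitrary.

```python
text = "caf\u00e9"  # four code points; the last is e with an acute accent

# The same text yields different byte sequences in different encodings.
as_utf8 = text.encode("utf-8")
as_latin1 = text.encode("latin-1")

# An encoding that lacks one of the characters raises instead of guessing.
try:
    text.encode("ascii")
    ascii_ok = True
except UnicodeEncodeError:
    ascii_ok = False

print(as_utf8, as_latin1, ascii_ok)
```

Decoding `as_utf8` with "utf-8" gives back the original text, which is the round trip the comment relies on.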

      By the way, you can specify what encoding a Python source file is in so that your string literals are all properly decoded.

      For further reading...
      http://www.joelonsoftware.com/articles/Unicode.html

    7. Re:String f**k up by tazzzzz · · Score: 3, Informative

      Actually, this has been explicit in Python for some time. In Python 2.x, "string" objects are byte sequences and "unicode" objects are character sequences.

      What changes in Python 3.0 is that "unicode" objects have been renamed "string" and "string" objects have been renamed "bytes". So, not only is it explicit, but the naming makes more sense.

      The other related change is that string literals in your code are interpreted as Python 3.0 "string" objects ("unicode" in Python 2.x terminology), whereas previously you had to stick a 'u' in front of the string to get that behavior. And you can indeed specify the encoding of your source files, which is nothing new.

      All of this to say, you're right on the money and Python is already in the spot you describe as "better off".

  8. Re:Not sure about this one by arevos · · Score: 4, Informative

    And if it's like some other languages you might have a long time to wait before 3.0.

    Given that the first release candidate of Python 3.0 is already out, I doubt we'll be in for a very long wait.

  9. Re:Not sure about this one by AM088 · · Score: 3, Informative

    I think the point is that with 2.6, your old code will work but will tell you what to change. If you move to 3.0, unless you have those changes already, it just won't work.

  10. Not really by widman · · Score: 4, Interesting

    You can keep your code compatible with both at the same time. Deprecated features are trivial to rewrite in most cases. There are even tools for this.

  11. Module support for 3.0 is a long way off by Animats · · Score: 3, Insightful

    Many essential third party libraries need to be converted for Python 3.0. I need M2Crypto (SSL support) and MySQLdb (MySQL support), neither of which is ready for Python 3.0, and neither of which has been updated in the last year or so.

    My guess is that it will be three years before stock mainstream Linux distros come with Python 3.0 and a set of libraries that work with it.

  12. Anthony Baxter on Python 2.6 and 3.0 by xixax · · Score: 3, Informative

    Anthony Baxter gave a pretty good talk on the implications at LCA 2008 earlier this year.

    http://video.google.com/videoplay?docid=4264641260805367198&hl=en

    --
    "Everything is adjustable, provided you have the right tools"
  13. Old news... by pdxp · · Score: 4, Interesting

    3.0rc1 (beta) is already available and has been for some time now. The advantage of 2.6 is not as much its backward-compatibility but its ability to tell you exactly what needs to change (via runtime warnings) for 3.0 without actually breaking your code. I've been using both for months now, so this article isn't exactly hot news.

  14. Re:Not sure about this one by tazzzzz · · Score: 5, Informative

    ...which is why some heavy python users, myself included, aren't going to use 2.6 or 3.0. I have huge amounts of python in operation, and the very last thing I'm going to do is break any of it with an incompatible language that happens to slightly resemble python (no matter who wrote it, and no matter what they call it, it isn't python if it can't run mundane python code.)

    "slightly resemble python"? Python 3.0 code looks just like the Python that's been around for years. Maybe there's some handy new syntax (with), but it's still Python.

    This is not about fundamentally changing Python. This is about cleaning up warts, some of which have been around since Python 1.x.

    If you're going to modify a language, you *must* do it in a compatible manner, otherwise what you're doing is making a new language that will require an entirely new community. Names notwithstanding, and resemblance beyond incompatibilities notwithstanding.

    From what I've seen, the Python devs have put together about the best possible migration path while still actually making the changes that need to be made.

    Here's the picture, in case it's not clear: Python 2.6 is just as backwards compatible as the other 2.x releases. Which is to say that porting from 2.5 to 2.6 is pretty trivial. I'd expect any actively used and maintained library to be 2.6 compatible within weeks (and a great many probably didn't break at all).

    2.6 lets you use many of 3.0's features that don't break compatibility (and there are many). It also has a warnings mode to help you spot 3.0 incompatible code. And it lets you selectively turn on 3.0 features within a module.

    Want to start using the new print function?

    from __future__ import print_function

    Voila! The print keyword goes away and you have the new print function. Certainly bits of new Python 3.0 syntax work now as well:

    try:
        1/0
    except ZeroDivisionError as e:
        pass

    The "as e" bit is new.

    Finally, there's actually a "2to3" tool that makes many of the changes in an automated fashion.

    The single biggest change from a compatibility standpoint is that "foo" is a unicode object in 3.0 and a string (set of bytes) in 2.x. You can even prepare for that switch:

    from __future__ import unicode_literals

    foo = "foo" # this will be unicode
    bar = b"bar" # this is a set of bytes
    unibar = bar.decode("utf-8") # get a unicode from the bytes

    They have put *a lot* of thought into how to make this transition. People will gradually shift to 2.6, just as they did with 2.5. And, over time, they will change to using the new features. They'll probably upgrade to 2.7 (yes, there will be one), and use the new features even more. And eventually their code will just be 3.0 code and the switch will be a no brainer.

  15. Re:i like python by AlXtreme · · Score: 4, Informative

    (Mind you, their online documentation could be better - PHP's site for example, is so much friendlier).

    They're actually hard at work on that problem too. In addition to Python 2.6 being released, the Python documentation is now generated using Sphinx. See for example the new tutorial output. Big WTF the first time I saw it, but it's a decent improvement with more in the pipeline.

    --
    This sig is intentionally left blank