Slashdot Mirror


Python Family Gets a Triplet Of Updates

The Python developers have been busy this weekend, releasing three new versions at different points on the Python continuum: 2.7.4 (a 2.7 series bugfix release), 3.2.4 (what's new), and the production release 3.3.1. Here's what's new in 3.3.1.

196 comments

  1. Re:Yay! by martica · · Score: 1, Interesting

    Are you proposing that there are languages that could be considered not to be shitty?

  2. The Python Launcher by Anonymous Coward · · Score: 2, Informative

    So, say i want to make myself a python 3 program in the form of an exe file.

    How could i use the "PEP 397: Python Launcher for Windows" in an easy way then?
    i would need to make sure python+ the python launcher were installed as well as my program files.

    or would it be easier to go with the py2exe solution?

    1. Re: The Python Launcher by Anonymous Coward · · Score: 1

      except that py2exe doesnt support py3k.

    2. Re: The Python Launcher by Anonymous Coward · · Score: 0

      hm, didn't know that. hm, well the question is still valid though.

    3. Re:The Python Launcher by Anonymous Coward · · Score: 5, Informative

      PEP 397 doesn't do what you think it does. It's a program that gets installed when you install Python. It parses a .py file's shebang line and uses that to determine which installed version of Python gets called. It's not designed to bundle Python programs into standalone executables. For that, you'll want to download something like http://cx-freeze.sourceforge.net/ (which works on Python 3, unlike Py2EXE)
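A minimal illustration of the shebang handling described above, assuming the launcher is installed and the script is started as `py hello.py` (the file name and the helper function are made up for this sketch). PEP 397 treats `python3` on the first line as a virtual command that selects an installed 3.x interpreter; to Python itself the line is an ordinary comment.

```python
#! python3
# The line above is what the launcher inspects; Python treats it as a comment.
import sys


def interpreter_summary():
    """Return the major.minor version of the interpreter running this file."""
    return "%d.%d" % (sys.version_info[0], sys.version_info[1])


print("Running under Python", interpreter_summary())
```

This only picks an interpreter that is already installed; it does not bundle anything into an exe.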

    4. Re: The Python Launcher by lattyware · · Score: 2

      Yeah, for 3.x, there is cx_Freeze.

      --
      -- Lattyware (www.lattyware.co.uk)
    5. Re: The Python Launcher by Inzkeeper · · Score: 1

      cx_Freeze works for 2.x as well.

    6. Re: The Python Launcher by lattyware · · Score: 1

      Indeed, and for many platforms - I worded my post poorly - I meant to say that unlike py2exe, cx_Freeze works with 3.x as well as 2.x.

      --
      -- Lattyware (www.lattyware.co.uk)
    7. Re:The Python Launcher by Foresto · · Score: 1

      Here's a list of tools that will help with that:

              Py2exe
              PyInstaller
              cx_Freeze
              bbfreeze
              py2app

    8. Re:The Python Launcher by Anonymous Coward · · Score: 0

      well does it make .py file executable then? or do you have to run this program with your .py file as a variable?

    9. Re:The Python Launcher by shutdown+-p+now · · Score: 1

      It doesn't make .py files executable (there's no way to make a random file executable in Windows unless it's an actual executable). However, the Python installer associates .py files with the launcher, so that they're run when double-clicked in Explorer or from anything else that hooks into the OS program association settings. From the command line, though, you still need to type the usual "python foo.py".

  3. Re:Add curly braces and you have C by Anonymous Coward · · Score: 0

    The only real difference between Python and C is the curly braces, and a different library, and a whole new set of bugs. And such stupendous stupidities such as isoweekday() returning a range of 1..7 and sendmail() not being able to automatically construct one's 'From' address.

    What's wrong with isoweekday and what the fuck are you on about with sendmail?

  4. Re:Yay! by Anonymous Coward · · Score: 0

    It's called C, foo.

  5. Where is the 64 bit dongle support by PhamNguyen · · Score: 5, Funny

    Python needs to support larger dongles.

    1. Re:Where is the 64 bit dongle support by Anonymous Coward · · Score: 0

      I know, with a name like Python, you're expecting at least 9 inches.

    2. Re:Where is the 64 bit dongle support by Anonymous Coward · · Score: 1

      Seconded. Anything to keep Joan of Arc wannabes out.

  6. 2.7.4 by JanneM · · Score: 5, Interesting

    Happy to see another bugfix release for 2.7. Like it or not, 2.7 is going to remain the main or only version of Python for years to come at many installations. Which means tools that depend on Python at such places also only or mostly support the 2.7 series.

    The developers for the tool I use have only just begun discussing the possibility of perhaps beginning support for Python 3 in addition to the 2.5-2.7 versions for unspecified later versions; but only if it is possible to do without too much code duplication and maintenance effort.

    --
    Trust the Computer. The Computer is your friend.
    1. Re:2.7.4 by guacamole · · Score: 2, Interesting

      The slow speed of Python 3 adoption is surprising. I just started learning python last year, and it seems like some porting effort between Python 2 and 3 may be necessary but the changes between 2 and 3 are pretty small.

    2. Re:2.7.4 by mrvan · · Score: 3, Informative

      I think the current trend in the community is to write a single codebase that supports both 2.7 and 3.x. In python 2.x you can "from __future__ import" a lot of the 3.x syntax changes, making it possible to have a shared codebase. For example, this is how django (a major python project) is handling 3.x compatibility in its latest version.

      (I guess this could be used as an argument that breaking backwards compatibility was not really needed and the transition could have been more gradual, but I don't know enough of the specifics on this case...)
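A small sketch of the shared-codebase approach mentioned above. The `describe` function is a made-up example; the `__future__` imports are the real mechanism, and they require 2.6 or later on the 2.x side.

```python
# Runs unchanged on Python 2.6/2.7 and on 3.x.
from __future__ import absolute_import, division, print_function, unicode_literals

import sys


def describe(numerator, denominator):
    """Format a ratio; '/' is true division on both 2.x and 3.x here."""
    ratio = numerator / denominator
    return "ratio=%s" % ratio


# print is a function on both versions because of the __future__ import.
print(describe(1, 2), file=sys.stdout)
```

On 3.x the imports are no-ops, so the same file needs no version checks for these features.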

    3. Re:2.7.4 by Anonymous Coward · · Score: 0

      The slow speed of Python 3 adoption is surprising. I just started learning python last year, and it seems like some porting effort between Python 2 and 3 may be necessary but the changes between 2 and 3 are pretty small.

      The changes may be small but they break virtually every existing program. So you have to maintain 2.7 for existing customer applications, or persuade them to pay you for an upgrade that brings them no benefits (yeah right). And if you have to maintain 2.7 why bother upgrading your own operation to 3? It just means you'll have to do everything twice.

      Breaking existing code is the stupidest thing one can do, and history shows that the languages that do this do not recover from the blow (as an example: VB6 to VB.NET, which still hasn't recovered even though VB.NET has been free for 8 years now, while VB6 generated lots of cash for Microsoft).

    4. Re:2.7.4 by Anonymous Coward · · Score: 0

      Happy to see another bugfix release for 2.7. Like it or not, 2.7 is going to remain the main or only version of Python for years to come at many installations. Which means tools that depend on Python at such places also only or mostly support the 2.7 series.

      The developers for the tool I use have only just begun discussing the possibility of perhaps beginning support for Python 3 in addition to the 2.5-2.7 versions for unspecified later versions; but only if it is possible to do without too much code duplication and maintenance effort.

      Laziness, laziness... I ported a few of my projects to py3k and it was a pretty straightforward process. You need to pay a bit of attention to IO related calls, but in general you can be done within hours even with some larger projects.

    5. Re:2.7.4 by baijum81 · · Score: 1

      118 of the 200 most downloaded third-party packages on PyPI have Python 3 support. Ref. http://python3wos.appspot.com/

    6. Re:2.7.4 by mrvan · · Score: 3, Insightful

      I think it is 'dependencies dependencies' more than laziness. Few real-world projects depend only on the stdlib, and for these projects it is necessary to wait for at least the majority of dependencies to adopt 3.x before porting becomes feasible, even if the porting itself is relatively straightforward. Of course, you can fork any dependencies and port them yourself, but the whole point of not reinventing a wheel is avoiding the maintenance on said wheel...

    7. Re:2.7.4 by abe+ferlman · · Score: 1, Interesting

      Wake me up when they bring 2.x style print back. Taking away convenience features is not the way to encourage adoption, if anything they should add more.

      --
      microsoftword.mp3 - it doesn't care that they're not words...
    8. Re:2.7.4 by Anonymous Coward · · Score: 0

      It doesn't seem surprising to me at all. Python is currently in dependency hell.

    9. Re:2.7.4 by znrt · · Score: 1

      So you have to maintain 2.7 for existing customer applications, or persuade them to pay you for an upgrade that brings them no benefits

      if you are using python to build "customer applications" you're doing it wrong. your customers are doomed anyway.

      python is a beautiful language and a superb tool for dishing out quick tools or prototypes. nowadays it simply isn't stable enough for much else.

      as a quick sampling test, try upgrading python on a standard RHEL box. you'll totally screw the effing package system. this is simply not serious.

    10. Re:2.7.4 by siride · · Score: 1

      There's no value in VB.NET, though. It's not just that it's different from VB6, it's that it's not really that different from C#. You might as well just use the flagship language for CLR. Also, the CLR environment isn't quite the same as the VB6 environment and is intended to be a full platform for writing real programs, rather than a platform for RAD and one-off crap. Basically, VB.NET doesn't fill a niche that makes sense in the way that VB6 did. Personally, I'd rather they take the few good features that VB.NET has that C# doesn't, put them in C#, and then kill VB.NET. That's assuming they don't kill .NET first.

    11. Re:2.7.4 by gsnedders · · Score: 1

      On the other hand, some of us supported Python 2.5 pretty much as long as Debian Lenny was supported (until a year ago), and hence __future__ didn't contain the majority of what is needed. Quite a few major projects are only just now moving to requiring 2.6/2.7, and hence only just now making this plausible.

    12. Re:2.7.4 by Anonymous Coward · · Score: 0

      Does Google still use 2.x internally? Why did Guido leave?

    13. Re:2.7.4 by gd2shoe · · Score: 1

      as a quick sampling test, try upgrading python on a standard RHEL box. you'll totally screw the effing package system. this is simply not serious.

      And that's python's fault, and not Red Hat's?

      --
      I won't join Slashcott. OTOH, If Beta goes live, I just won't be back until it's fixed. Sorry Dice.
    14. Re:2.7.4 by gd2shoe · · Score: 1

      Seconded.

      I can understand taking away statements in favor of built-in functions, but that was just too darned handy to yank out from beneath us.

      (Now making parentheses optional on some function calls might be cool... if it doesn't cause Python to become as unreadable/ambiguous as Perl! Great care would need to be taken, and I doubt they'd ever consider it.)

      --
      I won't join Slashcott. OTOH, If Beta goes live, I just won't be back until it's fixed. Sorry Dice.
    15. Re:2.7.4 by fnj · · Score: 1

      Happy to see another bugfix release for 2.7. Like it or not, 2.7 is going to remain the main or only version of Python for years to come at many installations.

      Python 2.7! Python 2.7? Users of all those RHEL 6 installations out there only WISH. They have Python 2.6, and they were damn glad to get that, too, since they were saddled with 2.4 for all those dreary RHEL 5 years.

    16. Re:2.7.4 by fnj · · Score: 1

      as a quick sampling test, try upgrading python on a standard RHEL box. you'll totally screw the effing package system. this is simply not serious.

      Duh. Yum uses python (it's 2.6 in the latest RHEL 6, BTW; not 2.7). It's got to use SOMETHING. If an upgrade to python changes old behavior, why is that not brain-dead language design? Not anything wrong with yum.

      Here's a hint, though. It's linux. You can do almost anything. Just download the source tarball, extract it, ./configure --prefix=/opt/mypython, make, sudo make install. All system scripts that use #!/usr/bin/python will continue to use the original, still-installed python, and you can use #!/opt/mypython/bin/python in your own scripts to use the spiffy version you want.

      You can do this even if you have one of the many sites that still use RHEL 5 (python 2.4).

    17. Re:2.7.4 by fnj · · Score: 1

      Yeah, IMHO it's brain dead language design if a script expecting python 2.6 doesn't continue to work fine with python 2.7, which is only a point upgrade.

      It's nothing you can't easily work around, though. You can leave the original python installed, and install a new version in another location. So I'm not overly exercised by what I perceive to be silly language design decisions.

      It's CERTAINLY not Redhat's fault, because python 2.6 is a mandatory package whose removal and replacement with some third party package or build-from-source is not supported.

    18. Re:2.7.4 by spitzak · · Score: 1

      It seems to me that a simple handling of a space after the first word is all that is needed. The following statement:

            a b

      is treated exactly the same as

          a(b)

      Then old print would be supported since it would turn into the new print() function. It would also let "help foo" work, which is something I keep typing wrong for some reason.

      I'm sure there are some odd interactions with the parsing of some obscure syntax that need to be figured out, but since all the obvious tests I can do produce syntax errors right now, I suspect rules can be made so this works for all the uses people want.
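The "produces syntax errors right now" observation can be checked with a short script like this one, assuming a Python 3 interpreter. The helper name `is_syntax_error` is made up for the sketch; `compile` is the built-in.

```python
def is_syntax_error(source):
    """Return True if Python 3 refuses to compile the given source text."""
    try:
        compile(source, "<example>", "exec")
    except SyntaxError:
        return True
    return False


# Two bare words in a row are currently rejected; the call form compiles.
for snippet in ("a b", "help foo", "print 'hello'", "a(b)"):
    status = "SyntaxError" if is_syntax_error(snippet) else "compiles"
    print(repr(snippet), "->", status)
```

Only compilation is tested here, so undefined names such as `a` or `foo` do not matter.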

    19. Re:2.7.4 by spitzak · · Score: 1

      A killer with Python 3 is the stupid handling of strings.

      A string should be a sequence of byte values. If they *happen* to be UTF-8 then it "is unicode" but I should not be prevented from using my string just because it does not match a complex pattern called "valid UTF-8". Nobody in the history of computer science would ever have made a scheme where the simple act of *looking* at data that has been successfully read from an outside source can throw an exception, but for some reason Unicode and UTF-8 turns otherwise intelligent people, such as Guido, into the most incredible morons or idiot savants.

      At least Python 2 does not mangle the strings when you do this, though you have to be careful when you actually want to print them. Python 3 actually makes it impossible to store invalid UTF-8 in a "string". This is counter-productive, as we are forced to use byte arrays for all strings, including ones that are valid UTF-8. That is NOT encouraging use of Unicode, it is going backwards!

    20. Re:2.7.4 by znrt · · Score: 1

      of course, instead of upgrading you can have a separate python install. the point of the example was just to show how python's immaturity makes it a bad choice to build upon when long term compatibility is of any concern. it's no surprise as long term compatibility isn't part of python's philosophy at all. python is still a great swiss army knife if it isn't.

      one could argue it is a bad example because it would be RH/Yum's fault to rely on such a tool for this purpose, and then again RHEL is a stable distro, not concerned with being able to play well with whatever the future may bring, but with providing a very stable and concrete environment for given specs, so they just expect you to not upgrade and that's it. well, it's the example that came to mind, it turns out I don't know of many relevant software stacks that choose to rely on python. :)

    21. Re:2.7.4 by Anonymous Coward · · Score: 0

      A killer with Python 3 is the stupid handling of strings.

      A string should be a sequence of byte values. If they *happen* to be UTF-8 then it "is unicode" but I should not be prevented from using my string just because it does not match a complex pattern called "valid UTF-8".

      How do you 'use' a string if you do not know the encoding? If your idea of UTF-8 is "I'll do the same as for ASCII, but now I can claim to support internationalization", then you end up with programs that only work in the US. You can't chop UTF-8 up randomly and still reliably get UTF-8 output. A UTF-8 character is not identically the same as a byte.

      A data type is defined by a set of valid values and a set of valid operations on those values. A data type is not a particular underlying implementation such as 'a sequence of byte values', and anyway Python 3 strings are not always UTF-8 for all versions and platforms. The fact that ASCII happens to use 1 byte per character means that some languages - such as C - treated an ASCII character and a byte of unknown datatype as equivalent. However the semantically valid operations on those datatypes are not the same and this only works 'by accident'. With Unicode encoded as UTF-8 this accidental equivalence cannot be maintained, and pretending that it can by ignoring the difference between 'a single byte' and 'a variable number of bytes representing a character encoded as UTF-8' is stupid.

      At least Python 2 does not mangle the strings when you do this, though you have to be careful when you actually want to print them. Python 3 actually makes it impossible to store invalid UTF-8 in a "string". This is counter-productive, as we are forced to use byte arrays for all strings, including ones that are valid UTF-8. That is NOT encouraging use of Unicode, it is going backwards!

      If your data doesn't conform to any known encoding scheme then you don't know what it is supposed to be anyway (eg you cannot actually tell what characters might be being represented by some of the bytes unless you are simply assuming that everything is ASCII), and the best you can do is treat it as a blob of unknown bytes. This is a problem with your data.
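The "you can't chop UTF-8 up randomly" point can be seen in a few lines of Python 3. This is a sketch with a made-up sample string and helper name, not anything from the release notes.

```python
def can_decode(raw):
    """Return True if raw (bytes) is valid UTF-8."""
    try:
        raw.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


text = "na\u00efve caf\u00e9"      # 10 characters, two of them non-ASCII
encoded = text.encode("utf-8")     # 12 bytes: each accented letter takes 2 bytes

print(len(text), len(encoded))
print(text[:3])                    # slicing characters is always safe
print(can_decode(encoded[:3]))     # slicing bytes can cut a character in half
print(can_decode(encoded[:4]))     # one byte further and it is valid again
```

Slicing by character and slicing by byte only agree while the data is pure ASCII.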

    22. Re:2.7.4 by spitzak · · Score: 1

      If you have a string of bytes that MAY be UTF-8, it can be used in an incredible number of ways. For instance it can be the name of a file. It can be data in that file. It can be copied from one file to another. It can be passed to some code that checks to see if it is UTF-8.

      You seem to be under the impression that somehow I want ASCII. That is absolutely false, and the problem is that this refusal to handle invalid sequences is by far the biggest thing FORCING the use of ASCII (or other 8-bit codes where there are no "errors" so you can be pretty safe using the strings for filenames). Your belief that somehow the string cannot be used unless it "conforms to an encoding scheme" is one of the biggest impediments to I18N. In no other area would somebody say that blocks of data must conform to a regexp JUST TO BE COPIED!!!! That is insane, but that seems to be what we get when morons get ahold of Unicode.

    23. Re:2.7.4 by spitzak · · Score: 1

      Concrete example: You have a filesystem that stores filenames as strings of bytes and any byte sequence is supported. You want to encourage use of UTF-8 for these filenames. Figure out how to write something in Python that can rename a file with a "bad" filename (ie invalid UTF-8) to have a "good" filename that is valid UTF-8. You must use the Unicode api.

      You will quickly find that this is impossible. Instead you have to use some other API, one THAT IS NOT UNICODE. That is absolutely the wrong direction. Now you need two versions of your filename renamer. Perhaps you need four versions if you want to be able to pass Unicode to any name that is valid Unicode, rather than being forced to use raw bytes.

      If Python just did not complain and passed invalid UTF-8 around, we could stop using other encodings. With the 3.3 strings with a UTF-8 option this is entirely possible and cleanly done.

    24. Re:2.7.4 by JanneM · · Score: 1

      At least one installation I know about still has only 2.5 and will likely never get updated to anything newer. This is not an uncommon situation, so the tools I use have to support 2.5 upwards too (2.4 support just disappeared last year). Which means any solution for supporting Python 3 would need to allow for that.

      --
      Trust the Computer. The Computer is your friend.
    25. Re:2.7.4 by gd2shoe · · Score: 1

      That's exactly what I was driving at. It's simple. It's clear, and it's far easier to type.

      The problem is avoiding garbage like this:
      dir asdf qwer, wert poiu, gfds bvcx

      Do you parse this as:
      dir( asdf(qwer) ), wert(poiu), gfds(bvcx)

      Or as:
      dir(asdf( qwer, wert( poiu,gfds(bvcx) ) ))

      It's not immediately obvious or clear. Thus, great care would need to be taken in designing the change. First word only (first alnum token) would avoid problems like this, but wouldn't be real elegant.

      --
      I won't join Slashcott. OTOH, If Beta goes live, I just won't be back until it's fixed. Sorry Dice.
    26. Re:2.7.4 by Anonymous Coward · · Score: 0

      Concrete example: You have a filesystem that stores filenames as strings of bytes and any byte sequence is supported. You want to encourage use of UTF-8 for these filenames. Figure out how to write something in Python that can rename a file with a "bad" filename (ie invalid UTF-8) to have a "good" filename that is valid UTF-8. You must use the Unicode api.

      You will quickly find that this is impossible. Instead you have to use some other API, one THAT IS NOT UNICODE. That is absolutely the wrong direction. Now you need two versions of your filename renamer. Perhaps you need four versions if you want to be able to pass Unicode to any name that is valid Unicode, rather than being forced to use raw bytes.

      This makes no sense. If you have filenames that are not necessarily valid UTF-8, then you can't expect to reliably interpret their semantics in terms of Unicode characters. The best you can do is treat them as an opaque sequence of bytes with no inherent meaning - a 'byte string' - which is what Python will encourage you to do. This is not a Python limitation, this is just logic. You seem to want a string type that will transparently switch between a 'just a bunch of bytes' and 'a well-defined sequence of Unicode characters' depending on whether the content happens to conform to the UTF-8 rules, but this can't be made to work reliably because there is no way to transparently and systematically interpret any invalid sequence as valid Unicode. Sure, it's easy to come up with possible protocols such as skipping unreadable sections, but this is not a transparent conversion. In some cases (such as BIG5 strings) you may want other conversions.

      You don't need four versions of your file renamer, you need one version that reads the names as opaque byte structures without assuming they meet UTF-8 rules, then decoding them in some way to valid Unicode strings with explicit handling of 'broken' filenames, then doing whatever renaming you need in Unicode and then encoding the Unicode strings into byte structures using valid UTF-8. Wanting to skip the decoding and encoding steps assumes that they are automatable processes, which long experience has shown they are not.

      If Python just did not complain and passed invalid UTF-8 around, we could stop using other encodings. With the 3.3 strings with a UTF-8 option this is entirely possible and cleanly done.

      All versions of Python 3 DO pass invalid UTF-8 around provided you treat it as an opaque byte string. Python 3 makes a clear distinction between strings which are collections of abstract Unicode characters (not in any particular representation such as UTF-8 or UTF-16) and string representations which are bytes. UTF-8 by definition is about bytes, so UTF-8 cannot be the abstraction used for strings, which is about characters. The internal representation changes in 3.3 are separate from the language datatype abstraction.
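A rough sketch of the single-version renamer described above, assuming Python 3 on a filesystem that accepts arbitrary byte names. The function names are invented here, and replacing bad bytes with U+FFFD is just one possible policy for "broken" names; name collisions are not handled.

```python
import os


def utf8_safe_name(raw_name):
    """Return a valid UTF-8 byte string for raw_name (bytes).

    Bytes that are not valid UTF-8 become U+FFFD. That policy is an
    assumption of this sketch, not something Python chooses for you.
    """
    text = raw_name.decode("utf-8", errors="replace")  # explicit decode step
    return text.encode("utf-8")                        # explicit encode step


def rename_invalid_names(directory):
    """Rename entries of `directory` whose names are not valid UTF-8."""
    directory = os.fsencode(directory)  # bytes path, so os.listdir returns bytes
    renamed = []
    for raw_name in os.listdir(directory):
        new_name = utf8_safe_name(raw_name)
        if new_name != raw_name:
            os.rename(os.path.join(directory, raw_name),
                      os.path.join(directory, new_name))
            renamed.append((raw_name, new_name))
    return renamed
```

Calling `rename_invalid_names("/some/dir")` (a placeholder path) would return the list of (old, new) byte names it changed.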

    27. Re:2.7.4 by spitzak · · Score: 1

      A UTF-8 string with an "error" in it is NOT a "not a UTF-8 string object". It absolutely is a UTF-8 string but it has an error in it. This should be plenty obvious with simple transforms: appending an error to a UTF-8 string can't possibly change its type to "not UTF-8" because if you take a substring near the start it suddenly IS UTF-8 again, which is not true of a byte array which would remain a byte array!

      More to the point, if there is a mistake it is MUCH more useful to display as much of the string as possible as UTF-8, with error indicators for the bad bytes. Suddenly turning ALL your text into garbage because of a single byte error is incredibly stupid!

      And there is a very clear and unambiguous interpretation of invalid UTF-8. The Unicode standard even defines some explicit rules, in particular that an error must not consume any of the next sequence that forms a valid code point. Python even defines an error much more explicitly in some of the converters: an error is one byte, with the parsing continuing by trying the next byte. It even has a somewhat-data-preserving conversion to UTF-16 where it turns these 128 possible errors into 128 different "errors" in UTF-16 (unpaired surrogates in the range 0xDC80..0xDCFF). This was done because at least a few people working on Python realized that the strings are USELESS if we cannot defer checking for errors until after our data manipulations are done.

      What Python should do in 3.3 is support arbitrary byte arrays as UTF-8. This can be done in the 3.3 version by allowing arbitrary bytes to be in the UTF-8 setting, and adding an iterator for the string so you can look at each code point without copying it to a UTF-32 array, and the iterator can also return more information than a code point number, such as the fact that it is currently at a UTF-8 or UTF-16 error. Iterators would also allow you to visit the code points in various normalization forms which would help a lot in avoiding string copying.
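For reference, showing as much of the text as possible with an error marker for the bad byte is already available when decoding explicitly in Python 3. A minimal sketch with made-up sample bytes:

```python
raw = b"caf\xc3\xa9 \xff menu"   # valid UTF-8 except for the single 0xFF byte

# Strict decoding refuses the whole thing...
try:
    raw.decode("utf-8")
    strict_ok = True
except UnicodeDecodeError:
    strict_ok = False

# ...while errors="replace" keeps the good text and marks the bad byte
# with U+FFFD (the replacement character).
shown = raw.decode("utf-8", errors="replace")
print(strict_ok, shown)
```

The replacement is lossy: the original 0xFF byte cannot be recovered from `shown`.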

    28. Re:2.7.4 by JanneM · · Score: 1

      So, out of curiosity, how do you handle strings in other encodings, such as Shift-JIS or EUC-JP?

      --
      Trust the Computer. The Computer is your friend.
    29. Re:2.7.4 by spitzak · · Score: 1

      You need to know that strings in other encodings are in those encodings, however allowing the string to contain an arbitrary sequence of bytes, rather than limiting it to valid UTF-8, will allow you to defer the decision about handling the alternative encoding until usage of the string, rather than having to do it during storage and copying. This is usually a lot easier and failures are more predictable and understandable.

      It is also extremely easy to distinguish UTF-8 from another encoding, because a huge majority of patterns with the high bit set are not valid UTF-8 (96% for 2 byte sequences, and higher for higher numbers of bytes). So if there is only one alternative encoding it is probably very possible to detect this at run time without an encoding id, as it will have a greater number of errors than correctly-encoded non-ASCII UTF-8. This does not help if there is more than one non-UTF-8 as they are difficult to distinguish from each other.
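A sketch of the "UTF-8 or exactly one known alternative" detection described above, assuming Python 3. The function name is invented, and Shift-JIS is only an example fallback; as noted, this does not help when several legacy encodings are possible.

```python
def decode_utf8_or(raw, fallback_encoding):
    """Decode raw bytes as UTF-8 if valid, otherwise as the single fallback.

    Returns (text, encoding_used).
    """
    try:
        return raw.decode("utf-8"), "utf-8"
    except UnicodeDecodeError:
        return raw.decode(fallback_encoding), fallback_encoding


sample = "\u65e5\u672c\u8a9e"   # three kanji, written as escapes for clarity
print(decode_utf8_or(sample.encode("utf-8"), "shift_jis")[1])
print(decode_utf8_or(sample.encode("shift_jis"), "shift_jis")[1])
```

This is a strict all-or-nothing check; a tolerant version would count errors instead of stopping at the first one.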

    30. Re:2.7.4 by shutdown+-p+now · · Score: 1

      If you want to work with byte arrays, then use byte arrays. Python API gives you that ability.

      Strings are more than byte arrays. By definition, they are meaningful sequences of human characters, not bytes, and they provide operations that are only sensible on such sequences (like, say, length in characters). If you're using them to store random data which may or may not be a valid string, you're doing it wrong. It's like complaining that you can't store random bit patterns in a double - well duh, that's not what it is for!

      So, yes, if you want to handle UTF-8 in a pass-through manner, you have to use bytes. And if you want to deal with it as a string (for which it has to be valid UTF-8), you have to convert it to a string. Doing so forces the programmer to actually think about what kind of data he is dealing with, and write code appropriately. Languages that mix up byte arrays and strings tend to have a horrible localization story for real-world apps, because developers from the US who are clueless about character sets beyond ASCII do all kinds of silly things, as any person whose native language isn't English (and whose encoding is not Latin-1 or a subset) can testify.

    31. Re:2.7.4 by shutdown+-p+now · · Score: 1

      If you have a string of bytes that MAY be UTF-8, it can be used in an incredible number of ways. For instance it can be the name of a file.

      How are you going to encode invalid UTF-8 into an UTF-16 string to store on NTFS and UDF (and, I believe, also HFS+)?

    32. Re:2.7.4 by shutdown+-p+now · · Score: 1

      A UTF-8 string with an "error" in it is NOT a "not a UTF-8 string object". It absolutely is a UTF-8 string but it has an error in it. This should be plenty obvious with simple transforms: appending an error to a UTF-8 string can't possibly change its type to "not UTF-8" because if you take a substring near the start it suddenly IS UTF-8 again, which is not true of a byte array which would remain a byte array!

      By this argument, a sequence of bytes representing a 64-bit double will remain a double "with errors" no matter how many bytes you append (or prepend) to it - after all, if you truncate enough bytes, it's a valid double again!

      Your logic is broken. Appending "an error" to an UTF-8 string should not be legal in the first place, because the types are not compatible. If you want the types to be compatible, you represent both the string and the "error" as byte arrays, and then concatenating them yields another byte array (which, of course, is no longer a string).

      Even if you do allow to append values of two different types like that, there's no requirement that the resulting value is of the type of either operand in particular - to give an obvious example, adding a real number and an imaginary number produces a complex number. Complex numbers are a superset - any real number is also a complex number, and any imaginary number is a complex number, but not all complex numbers are real or imaginary. Similarly, here, any string is a byte array, but not any byte array is a string.
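How Python 3 treats the "append an error to a string" case can be shown directly. A minimal sketch with made-up values:

```python
text = "abc"        # str: a sequence of Unicode characters
stray = b"\xff"     # bytes: not valid UTF-8 on its own

# Mixing the two types is rejected outright.
try:
    text + stray
    mixing_allowed = True
except TypeError:
    mixing_allowed = False

# Dropping to bytes on both sides works, and the result is bytes, not str.
combined = text.encode("utf-8") + stray
print(mixing_allowed, type(combined).__name__, combined)
```

The explicit `encode` is the point where the programmer states which representation is meant.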

    33. Re:2.7.4 by spitzak · · Score: 1

      How to encode invalid UTF-8 into UTF-16 for NTFS filenames

      Python itself is the originator of the most popular mapping that preserves errors. In this case the error bytes (which are only going to have the high bit set and thus in the range 0x80-0xFF) are mapped to the UTF-16 code units 0xDC80..0xDCFF. These are unpaired surrogates and thus technically the UTF-16 string is invalid as well, which is a nice property.

      These are not unique mappings, as most UTF-8 to UTF-16 encoders will convert the 3-byte UTF-8 encoding of these code points to the same code. The Unicode standard suggests that these 3-byte sequences be considered errors, but I am not convinced this is a good idea, as it will break data stored by CESU (broken UTF-8 encoders that did not decode UTF-16 to code points before encoding UTF-8). Since NTFS already maps more than one string to the same result (for instance '\' and '/' map to the same file, and trailing spaces are ignored), I'm not sure if the multiple mapping is a real problem. You are correct that the only 100% proper solution is to preserve the original byte values, but there are a zillion other problems with Windows file systems and this seems pretty minor.

      Somewhat more dangerous but perhaps more user-friendly is to convert the error bytes to the equivalent Unicode character from the CP1252 code page. This would preserve compatibility with the existing 1-byte api in Windows except in the incredibly rare occurrence that a CP1252 sequence of characters matches a valid UTF-8 sequence.

      HFS+ does not use UTF-16, it uses UTF-8. It has problems with enforcing normalization, however. This is similar to all the bugs caused by case-folding on older Windows file systems (they should instead match files using normalization, but preserve the normalization, similar to how modern Windows file systems preserve case).
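The error-preserving mapping to 0xDC80..0xDCFF mentioned above is exposed in Python 3 as the `surrogateescape` error handler. A small sketch with made-up bytes; encoding lone surrogates to UTF-16 via `surrogatepass` assumes Python 3.4 or later.

```python
raw = b"name-\xff\x80"   # two bytes at the end that are not valid UTF-8

# Each undecodable byte 0xNN becomes the lone surrogate U+DCNN.
text = raw.decode("utf-8", errors="surrogateescape")
escaped = [hex(ord(ch)) for ch in text[-2:]]

# The mapping is reversible: encoding with the same handler restores the bytes.
round_trip = text.encode("utf-8", errors="surrogateescape")

# The surrogates can be written out as (technically invalid) UTF-16 code units.
utf16 = text.encode("utf-16-le", errors="surrogatepass")
print(escaped, round_trip == raw, utf16[-4:])
```

Whether a given filesystem accepts such unpaired surrogates in a name is a separate question that this sketch does not test.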

    34. Re:2.7.4 by spitzak · · Score: 1

      You seem to have forgotten that a string is an ARRAY. If you add a multiple of 64 bits of garbage to an array of 64-bit doubles, you will have a larger array of 64-bit doubles (in case you can't figure out how to do this in your favorite language, imagine binary i/o of a data structure containing such an array). I am unaware of any language where it turns into a "not a 64-bit double array" because one of the doubles has an invalid pattern. In fact all I/O in modern systems prints "NaN" or even the entire bit pattern in hex for invalid numbers, because the programmers realize that this "bad" data is important and has to be preserved!

      So your attempt to find an example is completely wrong and further proves my point.

    35. Re:2.7.4 by spitzak · · Score: 1

      Yes, you can use byte arrays everywhere in every api. But that kind of defeats the purpose of Python strings, since you are no longer using them!

      And a 1000-byte UTF-8 string with a single error in it is 99.9% UTF-8 characters. In fact I may not be able to figure out it is this magical "not UTF-8" without a very expensive scan! That is incredibly stupid.

      You know your precious UTF-16 can have errors, too? Did you notice that nothing throws exceptions on them? Think hard about why this is, rather than spouting idiocy. UTF-8 is not different than UTF-16 and is in no way less of "Unicode" despite the absolute hatred spouted by some people and the attempts to sabotage it by this insistence that "errors" are somehow not part of it.

    36. Re:2.7.4 by shutdown+-p+now · · Score: 1

      A string is text. Its underlying representation may be an array of bytes (or codepoints), but that's not what a string is.

      A 64-bit float is similarly backed by an array of bits, but we do not treat it identically to an array of bits, because it is a higher level of abstraction. So the example still stands. Your reference to array of floats is rather irrelevant, because "only contains valid floats" is not part of the semantics of an array.

    37. Re:2.7.4 by spitzak · · Score: 1

      Look up IEEE NaN and then say something intelligent about arrays of floats. They can contain invalid sequences of bits and yet all systems I am aware of do not suddenly make the arrays "not arrays of floats" and do not make them unprintable and do not make them impossible to use as a source for math operations.
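
      A small sketch of that claim using Python's array and struct modules (the particular NaN bit pattern is just an example): the raw bits go into an array of doubles, the array stays an array of doubles, it prints, and arithmetic on it still works.

```python
import array
import math
import struct

# A quiet NaN with a non-zero payload, as a raw 64-bit pattern.
nan_bits = 0x7FF8000000000001

values = array.array("d", [1.5, 2.5])
# Append the raw 8 bytes; no validation happens and the type does not change.
values.frombytes(struct.pack("=Q", nan_bits))

print(values)                      # still printable
total = sum(values)                # still usable in math; the NaN propagates
tail = values.tobytes()[-8:]       # the odd bit pattern is preserved as stored
```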

    38. Re:2.7.4 by shutdown+-p+now · · Score: 1

      Yes, you can use byte arrays everywhere in every api. But that kind of defeats the purpose of Python strings, since you are no longer using them!

      You are using them wherever you actually have to treat them as strings. Say, if you want to display one to the user.

      You know your precious UTF-16 can have errors, too? Did you notice that nothing throws exceptions on them? Think hard about why this is, rather than spouting idiocy. UTF-8 is not different than UTF-16 and is in no way less of "Unicode" despite the absolute hatred spouted by some people and the attempts to sabotage it by this insistence that "errors" are somehow not part of it.

      You miss the point. This is not at all an encoding issue, it is a semantics issue. Even if we internally encode strings as UTF-8, it should still be a distinct type from a byte array, because it is a different thing. One could argue that there should be an easy way to obtain the underlying byte array representation from any string (though that leaks abstraction by exposing the encoding), but that is a different question. I do agree that UTF-8 generally makes more sense for both on-disk and in-memory representation.

      The other point is that a string should really be a view on top of a byte array - i.e. you should be able to take any random byte array, and say "treat this as a string in encoding X" without any copying of the data, and get a projected sequence of code points. At which point you get validation etc. But, again, you shouldn't just be able to pass byte arrays to APIs that expect a string because they need to treat it as a string. It is a potentially faulty conversion so it should be explicit.

    39. Re:2.7.4 by shutdown+-p+now · · Score: 1

      I was not talking about arrays of floats at all. I was talking about adding bits to an individual float.

      Furthermore, float itself is just a random example. If you really insist, let's take any other type for which a random bit pattern is not a valid representation. Which would be pretty much any C++ STL class.

    40. Re:2.7.4 by spitzak · · Score: 1

      I think you may be arguing the same thing I am.

      I want to put an arbitrary array of bytes into a "string" and DEFER the exceptions or other errors that are thrown until something actually needs to look at the Unicode code units. In particular I should be able to copy the string to another string and to a byte array without any errors and do raw I/O. I would further argue that any operation that does not have to look at the code units, such as concatenation, also does not produce errors, but that is somewhat less important.
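
      To make the idea concrete, here is a rough sketch of such a deferred-validation string. LazyText and its methods are hypothetical names invented for this illustration; nothing like it exists in Python itself.

```python
class LazyText:
    """Hypothetical wrapper: holds raw bytes and checks UTF-8 only on demand."""

    def __init__(self, raw):
        self.raw = bytes(raw)

    def __add__(self, other):
        # Concatenation never looks at the contents, so it cannot fail on bad data.
        return LazyText(self.raw + other.raw)

    def __bytes__(self):
        # Raw I/O gets the original bytes back untouched.
        return self.raw

    def code_points(self):
        # Only here is the data interpreted; invalid UTF-8 raises UnicodeDecodeError.
        return [ord(ch) for ch in self.raw.decode("utf-8")]

    def display(self):
        # For showing to a user: bad bytes become U+FFFD, the rest stays readable.
        return self.raw.decode("utf-8", "replace")


name = LazyText(b"caf\xc3\xa9-") + LazyText(b"\xff.txt")   # no error yet
print(ascii(name.display()))

try:
    name.code_points()
    deferred_error = None
except UnicodeDecodeError as exc:
    deferred_error = exc        # the error surfaces only when code points are needed
```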

      When displaying to the user it would be a lot better if error blocks were shown for the errors and anything that does parse as UTF-8 shows the resulting Unicode glyphs. Readable Unicode for the majority of the string helps considerably for the user figuring out what is wrong.

      However conversion to error blocks can be a security problem in other parts of the software, meaning that throwing exceptions, or another form of the data where errors are an object that is distinct from any possible code point, may be useful there. This is why it is really important that nothing be done to the UTF-8 data until as late as possible, since only the final step knows what type of error handling is correct.

      Current handling of UTF-8 by Python is extremely error prone and strongly discourages use of Unicode. As soon as the stupid programmer decides to strip out the bytes with the high bit set because they threw an exception, we have regressed to ASCII-only, which is worse than what we had in about 1983...

      The Python3.3 "hybrid" string is perhaps a solution but there need to be some changes. It can store "ASCII" or "UCS-2" or "UTF-32". I would add the ability to store "UTF-8", reusing the ASCII pointer, but the UTF-8 is allowed to be an arbitrary byte array. It can also store "UTF-16" including errors in the UCS-2 pointer. It is also capable of storing fixed-sized characters in one of the 3 arrays (ASCII if nothing is greater than 127, UCS-2 if there is only BMP and no paired surrogates). It would figure out and convert to these fixed-sized arrays only when len() or indexing is used. I would add the ability to get "len" and "index" for UTF-8 and UTF-16 so you can work with code units. In addition there should be iterators that let you visit the code points without conversion, which is where there would be huge savings over current Python handling of Unicode. Iterators that also do different normalizations would be really useful too.

    41. Re:2.7.4 by spitzak · · Score: 1

      A UTF-8 string is an ARRAY of bytes, therefore comparing it to an ARRAY of fixed-size things like floats is correct. I have no idea why you are insisting it is somehow like adding more bits to a fixed-size object.

    42. Re:2.7.4 by steveha · · Score: 1

      There are two cases for calling print(): in your program, and interactively at the command line.

      In my programs, I am okay with print() being a function, and always putting the parens. In fact, I code my Python 2.x programs that way as well to make them more portable. (This works for printing any single value, and strings using % escapes or the {} format stuff are single values. If you want to use the comma syntax for printing multiple values in Python 2.x, you can't wrap everything in parens, because then everything becomes a tuple.)
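
      A short illustration of that style (variable names are made up). Because each print() call receives one pre-formatted string, the first two print lines do the same thing on Python 2.7 and Python 3:

```python
count = 3
name = "spam"

# A string built with % escapes or str.format() is a single value.
line_percent = "%d cans of %s" % (count, name)
line_format = "{} cans of {}".format(count, name)

print(line_percent)
print(line_format)

# Comma-separated arguments are where the versions differ: Python 3 prints
# "a b", while Python 2's print statement would show the tuple ('a', 'b').
print("a", "b")
```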

      For interactive use, I itch to avoid typing parens for trivial stuff. And luckily, other people have already solved the problem. Instead of just running python to get an interpreter, run ipython, the "interactive Python" shell. Among its many cool features, it will add the parens for you on simple function calls. "Simple" means 1 argument by default; but again, strings made with % escapes or the {} format stuff are single values.

      So, if you are really lazy, define a function called p() that prints its argument, and in ipython you can print values like so: p 2+3 # prints 5

      And "help foo" works as you wish. By the way, here's a tip: if you want to get help on a Python keyword, you can, as long as you put it into quotes.

      help(int) # gets help on the int class
      help(def) # error because "def" is a keyword
      help("def") # gets help on the "def" keyword

      I'm sure there is some odd interactions with the parsing of some obscure syntax that need to be figured out

      I personally like how straightforward Python is:

      foo # you get some sort of an object
      foo(3) # call to that object with arg 3
      foo 3 # legal in Ruby, calls foo(3); not legal Python

      There are no tricky cases where you are calling the object without parens. Ruby, on the other hand, has what you want, as shown in the example above... and that means that Ruby has to have special syntax you use to get the object reference instead of calling the object.

      --
      lf(1): it's like ls(1) but sorts filenames by extension, tersely
    43. Re:2.7.4 by steveha · · Score: 1

      If Python just did not complain and passed invalid UTF-8 around, we could stop using other encodings.

      You are flaming pretty hard on this issue... but I don't understand why, because as far as I can understand, what you are proposing here is pretty close to what Python actually does.

      Python has an encoding error handler called surrogateescape, and uses it by default in all contexts that involve filenames. Thus if you have a filename that contains an illegal character for UTF-8, Python doesn't complain but just uses a surrogate escape for that illegal character; then when the Python string is converted back to a UTF-8 bytes sequence, the surrogate is converted back to the illegal byte.

      A nice discussion with a code sample in the PEP: http://www.python.org/dev/peps/pep-0383/

      Figure out how to write something in Python that can rename a file with a "bad" filename (ie invalid UTF-8) to have a "good" filename that is valid UTF-8.

      import os
      os.rename(illegal_name, legal_name)

      And that is not just hand-waving; I just did it. I created a file with a name that is illegal UTF-8, and used os.listdir() to get a list of filenames... result: no error and I got the list. I pulled the filename out of that list and passed it to os.rename()... result: no error and it renamed the file.
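
      Roughly what that experiment looks like as a script. This is a sketch, assuming a POSIX file system that accepts arbitrary bytes in names; the function name and file names are made up, and on platforms that reject such names the function just returns None.

```python
import os
import tempfile

def rename_undecodable(directory):
    """Create a file whose name is invalid UTF-8, then rename it using str APIs."""
    bad_name = b"caf\xe9.txt"   # a lone 0xE9 is not valid UTF-8
    try:
        with open(os.path.join(os.fsencode(directory), bad_name), "wb"):
            pass
    except (OSError, UnicodeError):
        return None             # this platform or file system refuses the name

    (listed,) = os.listdir(directory)      # a str; the bad byte arrives escaped
    os.rename(os.path.join(directory, listed),
              os.path.join(directory, "cafe.txt"))
    return listed, os.listdir(directory)

with tempfile.TemporaryDirectory() as tmp:
    outcome = rename_undecodable(tmp)

print(ascii(outcome))
```

      ascii() is used for printing because a string containing surrogate escapes generally cannot be written to a UTF-8 terminal directly.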

      I'm really confused because you seem to know about this already and yet in this comment you are making these strange claims that Python can't cope with invalid characters in filenames.

      --
      lf(1): it's like ls(1) but sorts filenames by extension, tersely
    44. Re:2.7.4 by spitzak · · Score: 1

      The PEP is interesting and seems to be one of the sources of the 0xDC80..0xDCFF. I saw it earlier as a method of fixing the handling of argv without throwing exceptions, though I don't know where the documentation is.

      This is still wrong in that you have to pass the special "surrogateescape" and use encode/decode. I want to be able to store a "Unicode" string that contains a UTF-8 error without an exception being thrown. The exception would be thrown if you attempt to translate the string to UTF-16 or look at the code points (though I also recommend there be ways to avoid the exception).

      Their problem is that they seem to think that UTF-16 (or perhaps UTF-32) is somehow "decoded" and UTF-8 is "encoded", while in fact it is the opposite, and they seem to be thrashing around trying to hide the fact they got it wrong with this filesystem stuff. On Unix at least a filename is a stream of bytes, and changing the "locale" should not change what file it identifies. Even on Windows with NTFS a filename is UTF-16, which is not "decoded" in their terminology, despite the ridiculous claim in the PEP that "Windows got it right". Windows with the use of 16-bit characters has caused an extraordinary amount of pain and has set I18N back decades.

    45. Re:2.7.4 by spitzak · · Score: 1

      They seem to have screwed up badly, I tested this a bit.

      I wanted to check if the translation to/from UTF-16 was lossy due to the surrogate replacements. It is not, but only because it appears that the UTF-8 encoding of 0xDC80..0xDCFF is considered "invalid" and thus turns into 3 of these codes when converted to UTF-16 rather than one. Thus it is impossible to write some valid Windows NTFS filenames in UTF-8 (actually I think it is impossible to put any of the codes 0xD800..0xDFFF into a UTF-16 filename by using "decode" as they like to call it). They also seem to have missed the problem that a malicious UTF-16 string can contain a sequence of these characters that "encodes" into a valid UTF-8 string and thus "decodes" back into an unexpected UTF-16 character.

      They really, really need to think about this again and get it right. We now have 3 encodings (UTF-8, UTF-16, and UTF-32) and NONE of them can be losslessly translated to another one. They need to realize that UTF-16, including errors, can be losslessly translated to/from UTF-8 by obvious means. The opposite is NOT true!!! UTF-32 can also be losslessly translated to UTF-8 except for code points greater than 0x10FFFF (or 0x7FFFFFFF if you use 6-byte UTF-8). The opposite could be true by selecting values with the sign bit set to represent errors, though I have never seen this done.

      If a "Unicode" string is built from UTF-8, it needs to keep the original UTF-8 around. There is no way they are going to get the bugs out of their mess with the filesystem if they don't figure this out!

    46. Re:2.7.4 by steveha · · Score: 1

      This is still wrong in that you have to pass the special "surrogateescape" and use encode/decode.

      In the context of handling filenames, you get this by default. As I said, I used os.listdir() and my file whose name contained a character invalid for UTF-8 was in the results, with a surrogate escape code for the illegal character; I was able to open it, rename it, or delete it (I tested all three).

      In short, filenames Just Work in Python 3, despite your claims.

      I want to be able to store a "Unicode" string that contains a UTF-8 error without an exception being thrown. The exception would be thrown if you attempt to translate the string to UTF-16 or look at the code points (though I also recommend there be ways to avoid the exception).

      If you are reading UTF-8 characters from a file, you don't get the surrogate encoding by default; by default it raises an exception, which you could handle. But it is a simple matter to request the surrogate encoding, and then you can easily filter the resulting string and look for the surrogate encoding characters. You may disagree with the default behavior in Python 3.x but I don't think you can claim that it is broken or insane.
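
      A sketch of both behaviors, using a temporary file with made-up contents: the default strict read raises, while errors="surrogateescape" lets the read succeed and leaves the bad bytes findable afterwards.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "sample.txt")
    with open(path, "wb") as f:
        f.write(b"good \xc3\xa9, bad \xff\n")

    # Default (strict) decoding raises on the invalid byte.
    try:
        with open(path, encoding="utf-8") as f:
            f.read()
        strict_failed = False
    except UnicodeDecodeError:
        strict_failed = True

    # Requesting the surrogate escapes lets the read succeed.
    with open(path, encoding="utf-8", errors="surrogateescape") as f:
        text = f.read()

# Escaped bytes show up as code points U+DC80..U+DCFF and are easy to filter.
escaped = [ch for ch in text if "\udc80" <= ch <= "\udcff"]
original_bytes = [ord(ch) - 0xDC00 for ch in escaped]

print(strict_failed, ascii(text), original_bytes)
```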

      And! I didn't realize this until now, but Python 3 also allows you to use a "bytes" object to store raw UTF-8. You can convert a Unicode string representing a directory name to bytes (using the str.encode() method function) and then pass the bytes object to os.listdir(), and the resulting list of filenames will be bytes objects with the raw UTF-8. I believe this is exactly what you said you wanted. (So are the Python guys still "incredible morons"?)
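
      A minimal sketch of the bytes variant. It uses os.fsencode() to turn the directory name into bytes, which is an alternative to calling str.encode() by hand; the file name is made up.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    with open(os.path.join(tmp, "notes.txt"), "w"):
        pass

    as_text = os.listdir(tmp)                 # str in, list of str out
    as_bytes = os.listdir(os.fsencode(tmp))   # bytes in, list of raw bytes out

print(as_text, as_bytes)
```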

      http://docs.python.org/3/howto/unicode.html#unicode-filenames

      Their problem is that they seem to think that UTF-16 (or perhaps UTF-32) is somehow "decoded" and UTF-8 is "encoded", while in fact it is the opposite, and they seem to be thrashing around trying to hide the fact they got it wrong with this filesystem stuff.

      And your problem is that you haven't studied what Python does or why it does it, yet you write long rants about how wrong it is. (See, I can be all judgmental too.)

      In Python 3.x, the concept is "all strings are Unicode". This means that from a Python user's point of view, a string is a sequence of Unicode code points, with an associated set of method functions. All else is implementation details. So, if you are reading a file that contains UTF-8, Python must decode the UTF-8 encoded bytes into Unicode and make the string. If you are writing a file that should be encoded as UTF-8, Python must encode the Unicode characters into UTF-8. Despite your claims, Python is completely consistent: converting from any encoding (UTF-8, UTF-16, UTF-32, Latin-1, etc.) to Unicode string is called "decoding" and converting from a string to any encoding is "encoding". See the above-linked Unicode HOWTO document.
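
      The terminology in a few lines (the sample word is arbitrary): str.encode() always goes from a string to bytes, and bytes.decode() always goes from bytes to a string, whichever codec is involved.

```python
text = "na\u00efve"   # a str: a sequence of Unicode code points

codecs_to_try = ("utf-8", "utf-16-le", "utf-32-le", "latin-1")

# str -> bytes is "encoding", for every codec.
encoded = {codec: text.encode(codec) for codec in codecs_to_try}

# bytes -> str is "decoding", for every codec.
decoded = {codec: data.decode(codec) for codec, data in encoded.items()}

# Decoding with a different codec than was used for encoding gives mojibake.
wrong = encoded["utf-8"].decode("latin-1")

for codec in codecs_to_try:
    print(codec, encoded[codec])
print(ascii(wrong))
```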

      You keep saying they "got it wrong" but I actually tested it and it Just Worked for me, so it doesn't look wrong to me.

      On Unix at least a filename is a stream of bytes, and changing the "locale" should not change what file it identifies.

      If you just use the Python tools for managing files, they will Just Work. If you override the Python tools and tell them to decode with the wrong codec, you will get a bad result. This is a problem because... why, exactly? Would you also say that Python "got it wrong" because if you read a UTF-8 file but tell Python to use the Latin-1 codec it won't work right?

      Even on Windows with NTFS a filename is UTF-16, which is not "decoded" in their terminology,

      No, really, it is "decoded" in their terminology.

      http://farmdev.com/talks/unicode/

      --
      lf(1): it's like ls(1) but sorts filenames by extension, tersely
    47. Re:2.7.4 by steveha · · Score: 1

      I wanted to check if the translation to/from UTF-16 was lossy due to the surrogate replacements. It is not, but only because it appears that the UTF-8 encoding of 0xDC80..0xDCFF is considered "invalid" and thus turns into 3 of these codes when converted to UTF-16 rather than one.

      This is exactly what I expected: the translation isn't lossy, and the only way it can be non-lossy is if the surrogate characters are escaped. I'm not seeing the problem here.

      Thus it is impossible to write some valid Windows NTFS filenames in UTF-8 (actually I think it is impossible to put any of the codes 0xD800..0xDFFF into a UTF-16 filename by using "decode" as they like to call it).

      You still don't seem to understand the situation. The surrogateescape feature is used when Python reads a UTF-8 filename into a Python string (and then only used for characters illegal for UTF-8). When Python writes the string back out, the translation is reversed and the original string comes back out. But if you hand-craft a "bytes" object containing valid UTF-8 that includes those surrogate characters, then pass it to the Python file system stuff, it will be used as-is.

      If you hand-craft a UTF-8 string that includes characters that are illegal for UTF-8 and then try to use that on a Windows file system where the file names are stored in UTF-16, I don't know what will happen and frankly I don't care. On Windows you use valid Unicode filenames, and Python lets you do that. On *NIX you use string-of-bytes filenames, and Python lets you do that.

      There is no way they are going to get the bugs out of their mess

      As far as I can see, you still haven't identified any actual bugs.

      --
      lf(1): it's like ls(1) but sorts filenames by extension, tersely
    48. Re:2.7.4 by spitzak · · Score: 1

      The slide for UTF-16 clearly says that UTF-16 is the result of "encoding", not "decoding": http://farmdev.com/talks/unicode/. Also I did experiments and the new encoding cannot produce unpaired surrogates, therefore it cannot produce all possible NTFS filenames. Thus they have managed to reproduce the UTF-8 problem on Unix on Windows with UTF-16 as well.

      I find it odd you seem to think that the only source of filenames is os.listdir(). I know a lot of people like to put filenames in text files. It is kind of useful, in fact this is supported directly by Python when you use a filename in quotes in a .py file! Yet they have made it impossible to place all possible UTF-8 filenames in a Python script unless a bytes api is used and the programmer has to write the UTF-8 code units individually as \xNN sequences, making it unreadable.

      Your suggested solutions are just like all the other ones: basically never use Unicode at all in your Python program and use byte arrays everywhere. Technically this works but it does raise questions as to why they bothered to do all this work to support a "Unicode" object. I also find it really painful to write quoted strings if the string contains any non-ASCII because I have to use \xNN sequences for the UTF-8.

      Also we are back to the stupid programmer problem: when the programmer says "print bytes" and it throws an exception (because there was an invalid UTF-8 encoding in there), in 95% or so of the cases (based on my own experience with fixing stupid programmers' mistakes) the "solution" is to somehow throw all bytes with the high bit set away so that in effect the strings are ASCII only (sometimes the bytes are literally thrown away, sometimes they think they are being smart by replacing them with "\xNN" or \NNN octal, and sometimes they double-UTF-8-encode them). I have NEVER seen a programmer fix the exception in any way that preserves correct UTF-8, and I have also never seen anybody who says "oh it is better that I got that exception rather than seeing my string printed out".

      This is the underlying problem: the current behavior encourages ASCII-only use and is effectively destroying attempts to migrate to Unicode. They need to make it easy to write a reliable program that uses UTF-8 and UTF-16, which means it must not do something unexpected if a physically possible pattern is encountered in the data.

    49. Re:2.7.4 by spitzak · · Score: 1

      It is impossible to write some valid Windows NTFS filenames in UTF-8!!!! That is a serious defect.

      It has nothing to do with the fact that they made UTF-8 -> Unicode -> UTF-8 lossless.

      It is IMPOSSIBLE for UTF-8 -> Unicode -> **UTF-16** to produce all possible UTF-16 strings!

      Therefore it is IMPOSSIBLE to store Windows filenames losslessly in a UTF-8 text file.

      In addition I worry that there are security problems in that a sequence of unpaired surrogates in UTF-16, when converted to UTF-8 and back again, may turn into another unexpected character (because they were arranged so that the resulting UTF-8 instead of having N error bytes had a valid encoding). This could be prevented but it makes the UTF-8 "encoder" much much more difficult and bug prone.

      They need to fix this, and the only fix that I think will work:

      1. Conversion from UTF-8 to UTF-16 must translate unpaired surrogates so that all possible UTF-16 strings can be created this way. Errors can be translated as the 0xDCxx codes.

      2. Conversion from UTF-16 to UTF-8 must translate unpaired surrogates to the correct 3-byte UTF-8 sequence. If you want it could carefully check that undoing the 0xDCxx errors results in an invalid UTF-8 sequence and undo it back to error bytes. I'm not sure if there is any point because it is still going to be lossy, though "less lossy" I guess if you assume the main source is UTF-8 errors.

      3. Unicode strings must save the UTF-8 as long as there is no modification to the string so that if the UTF-8 is asked for the result is the identity, with no reliance on lossy conversion.

    50. Re:2.7.4 by steveha · · Score: 1

      It is impossible to write some valid Windows NTFS filenames in UTF-8!!!! That is a serious defect.

      I believe you are completely mistaken here.

      In a *NIX context, filenames are sequences of bytes, and it is possible for those bytes to include values that are not legal UTF-8; Python 3.x has a scheme that escapes these illegal characters making a perfect lossless filename -> Python Unicode string -> filename conversion sequence.

      In a Windows context, filenames are UTF-16, and Python 3.x will not use the surrogateescape translation. It doesn't need to. My understanding, backed up by Wikipedia, is that UTF-8 can encode every code point in all of Unicode, so there shouldn't be any valid UTF-16 file name that UTF-8 cannot encode.

      So, in your UTF-8 file example, Python does need to know whether the filenames are valid *NIX filenames or valid Windows filenames. But either or both cases are possible.

      Note also that *NIX names and Windows names are not perfectly convertible. *NIX names are allowed to contain a colon, an asterisk, a double-quote, and other characters that are errors on Windows. (I can't think of any Windows filenames that cannot be converted to *NIX but maybe there are some?)

      It is IMPOSSIBLE for UTF-8 -> Unicode -> **UTF-16** to produce all possible UTF-16 strings!

      Therefore it is IMPOSSIBLE to store Windows filenames losslessly in a UTF-8 text file.

      This is not my understanding, and Wikipedia seems to contradict this as well. If you really want to convince me, please show me a specific example.

      --
      lf(1): it's like ls(1) but sorts filenames by extension, tersely
    51. Re:2.7.4 by steveha · · Score: 1

      The slide for UTF-16 clearly says that UTF-16 is the result of "encoding", not "decoding":

      Actually, you are correct. Sorry, you confused me a bit. Go back to what I said earlier: a string is conceptually a sequence of Unicode codepoints, and to turn it into a filename you must encode it in a particular encoding. My greater point stands, namely that the Python terminology is completely consistent: you always encode from a string to an encoding, and you always decode from an encoding to a string. I apologize for the mistake.

      Also I did experiments and the new encoding cannot produce unpaired surrogates, therefore it cannot produce all possible NTFS filenames.

      Show me the code. If you really have found a problem, show me how to reproduce it.

      Let me remind you that Python allows you to write literals in UTF-16, which would avoid this issue; let me also remind you that Python allows you to specify the options on encoding, so if you somehow got a Unicode string that contains surrogate characters that you need to have left alone, you can just specify the encode to not use surrogateescape.

      I know a lot of people like to put filenames in text files. It is kind of useful, in fact this is supported directly by Python when you use a filename in quotes in a .py file! Yet they have made it impossible to place all possible UTF-8 filenames in a Python script unless a bytes api is used and the programmer has to write the UTF-8 code units individually as \xNN sequences, making it unreadable.

      Frankly I don't care. First you said Python is broken because it's possible to make a filename with illegal characters in it; I pointed out that with os.listdir() this case Just Works. Now you say that Python is broken because when you have a filename with illegal characters in it, the only way to make a literal in a program is to use hex escapes. If I have a filename with illegal characters in it, I'm just going to write the hex escapes; I don't have a problem with this. In other news, Python programs are slower than hand-crafted C programs. Python is really good at a lot of stuff, but not perfectly optimal at everything. You have identified a corner case where you must use ugly hex escapes in a filename literal. Okay, if that's a deal-breaker for you, don't use Python.

      Your suggested solutions are just like all the other ones: basically never use Unicode at all in your Python program and use byte arrays everywhere.

      Actually, no. You said that you wanted the ability to hold onto raw UTF-8 and pass it around, and I pointed out that Python lets you do that. I just want to use the provided Python API functions, which Just Work as far as I can see.

      And for my own programs, so far all the filename literals have been boring, with no illegal characters in them. I've never had to write a Python program where UTF-8 wasn't adequate for all my filename literals.

      Also we are back to the stupid programmer problem:

      This is the underlying problem: the current behavior encourages ASCII-only use and is effectively destroying attempts to migrate to Unicode. They need to make it easy to write a reliable program that uses UTF-8 and UTF-16, which means it must not do something unexpected if a physically possible pattern is encountered in the data.

      You keep saying these things, but I haven't seen any evidence.

      And if you really are smarter than all the "incredible morons" working on Python, please contribute your insights on a Python mailing list, rather than just flaming here.

      I don't think I'm convincing you of anything, and frankly I'm not the world's greatest expert on Python stuff, so I think I'm done with this thread. If you really care about convincing me that Python is broken, please show me the code. Thank you and have a nice day.

      --
      lf(1): it's like ls(1) but sorts filenames by extension, tersely
    52. Re:2.7.4 by spitzak · · Score: 1

      I'm not trying to convert Unix filenames to Windows filenames.

      I want to write a Windows filename into a file that is UTF-8, and read it back. The new Python 3.3 behavior makes this impossible (unless I use non-standard "decoders"). Or are you suggesting that if I want to embed Windows filenames into a text file, I have to use UTF-16 for the text file?

      It appears that the normal "decoder" for UTF-8 and also this new "surrogateescape decoder" both consider the UTF-8 encoding of an unpaired surrogate as an "error". The normal "decoder" will throw an error, the other one will replace it with 3 unpaired surrogates. In both cases I have failed to produce all possible Windows filenames, because they can contain unpaired surrogates. This is very similar to the problem on Unix where UTF-8 filenames can contain UTF-8 errors, and thus cannot use the results of the normal "encoder" into UTF-8.

      I also believe that the "surrogateescape encoder" for turning UTF-32 back into UTF-8 can be fooled into producing a UTF-8 string that will decode into a different UTF-32 string, this is done by placing the surrogates in a pattern such that the resulting "errors" are actually a valid UTF-8 encoding. I have not proven this yet but it seems likely.
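
      A small sketch suggesting this can indeed happen in Python 3. The crafted string is made up for illustration: two escape code units whose byte values together form a valid UTF-8 sequence.

```python
# Two escape code units standing for the bytes 0xC3 and 0xA9.
crafted = "\udcc3\udca9"

# Encoding with "surrogateescape" emits those two bytes...
as_bytes = crafted.encode("utf-8", "surrogateescape")

# ...which are the valid UTF-8 encoding of U+00E9, so decoding
# does not give the original two-unit string back.
round_tripped = as_bytes.decode("utf-8", "surrogateescape")

print(as_bytes, ascii(round_tripped), round_tripped == crafted)
```

      So bytes -> str -> bytes is lossless with this handler, but str -> bytes -> str is not guaranteed to be when the string already contains U+DC80..U+DCFF.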

    53. Re:2.7.4 by spitzak · · Score: 1

      I don't have Python3.3 here but this is pretty much the failing code:

      # make p be a Unicode string with an unpaired surrogate:
      p = u"\uDC00"
      # "encode" it into UTF-8:
      q = p.encode("utf8")
      # "decode" it using the surrogate escape:
      r = q.decode("utf8", "surrogateescape")
      # what is r?
      r
      u'\udced\udcb0\udc80'
      # did we preserve data?
      r == p
      False
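
      For comparison, a sketch of how these steps behave when actually run on Python 3, as far as I can tell: the strict encode raises rather than producing bytes, so the "surrogatepass" error handler (a real handler, but not one mentioned in the thread) is used to get the 3-byte form.

```python
p = "\udc00"   # a str holding one unpaired surrogate

# Strict UTF-8 encoding refuses lone surrogates.
try:
    p.encode("utf-8")
    strict_error = None
except UnicodeEncodeError as exc:
    strict_error = exc

# "surrogatepass" emits the 3-byte sequence instead.
q = p.encode("utf-8", "surrogatepass")

# Decoding those bytes with "surrogateescape" yields three escape code units...
r = q.decode("utf-8", "surrogateescape")

# ...whereas decoding with "surrogatepass" restores the original string.
s = q.decode("utf-8", "surrogatepass")

print(q, ascii(r), r == p, s == p)
```

      In other words, an unpaired surrogate can be stored in UTF-8-style bytes and read back, but only if both sides agree to use "surrogatepass"; the default codec and "surrogateescape" both treat the sequence as an error.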

  7. Re:Yay! by Anonymous Coward · · Score: 1

    Those who can, do. Those who can not, whine.

    What's remarkable is that whiners don't whine about their own inabilities. No. They whine about tools or circumstances.

  8. I'd fork that codebase! and make it go to 11 !!! by Anonymous Coward · · Score: 0
    re: Python needs to support larger dongles.

    I'd fork his codebase!
    .
    Why I'd fork his codebase into a whitesnake version, and a blacksnake version, and then the special "SpinalTap" version.
    ;>)
    One's well suited to pompous metal-head music, and the other one's bigger, and that third one, well, it goes to 11. 11 ought to be big enough for everybody -- billy gehtes. l'il billy wie gehtes-ihnen. Help, I'm drowning in a stream of consciousness!

  9. Re:Yay! by Anonymous Coward · · Score: 1, Insightful

    hair-raisingly complex

    You're an idiot.

    extremely bloated

    Nonsense.

    poor libraries

    Maybe.

    quite limited functionality

    Ridiculous.

  10. Not that surprising by Viol8 · · Score: 1

    The changes may be small but they're significant, potentially breaking a LOT of old code. This was a foolish decision on the part of Guido IMO. Sure, deprecate some features but don't remove them or change the syntax so old code won't run! Even Larry Wall understood that much and you'd never accuse Perl of being a well designed language.

    1. Re:Not that surprising by lattyware · · Score: 4, Insightful

      That's how you end up being PHP. Python 3 fixes core mistakes made in earlier versions of the language, and makes it harder to write bad code. That's a good thing, and the last thing you want is a language full of 20 ways to do something, 18 of which are deprecated. Removing backwards compatibility for the 3.x line was a good idea.

      --
      -- Lattyware (www.lattyware.co.uk)
    2. Re:Not that surprising by Anonymous Coward · · Score: 0

      I'll just add Lua: a language where the team wasn't afraid to cull and rebuild major parts of it for a better and cleaner architecture, even though it's quite widely used. As a result, it's arguably one of the most elegantly engineered languages (and with LuaJIT - damn fast as well).

    3. Re:Not that surprising by Anonymous Coward · · Score: 0

      I'm still looking forward to Python 4, with support for braces, and insignificant whitespace.

    4. Re:Not that surprising by lattyware · · Score: 1

      If you hate it that much, it'd be trivial to write an application that compiles braces into indentation. The reality is it is more readable, and nicer to write. If you are having problems with whitespace being significant, maybe you should find software that isn't crazy - everything I use handles it fine.
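
      A rough sketch of what such a converter might look like. Everything here is hypothetical, including the name braces_to_indent and the toy rule that a line ending in "{" opens a block and a line consisting only of "}" closes one. It ignores braces inside strings, dict literals split across lines, and empty blocks, so treat it as an illustration rather than a real tool:

```python
def braces_to_indent(source, indent="    "):
    """Toy converter: '{' at end of line opens a block, a bare '}' closes it."""
    depth = 0
    out = []
    for raw in source.splitlines():
        line = raw.strip()  # incoming indentation is ignored entirely
        if not line:
            continue
        if line == "}":
            depth -= 1
            if depth < 0:
                raise ValueError("unbalanced closing brace")
            continue
        if line.endswith("{"):
            header = line[:-1].rstrip()
            if not header.endswith(":"):
                header += ":"
            out.append(indent * depth + header)
            depth += 1
        else:
            out.append(indent * depth + line)
    if depth != 0:
        raise ValueError("unbalanced opening brace")
    return "\n".join(out) + "\n"


# Brace-delimited pseudo-Python with no indentation at all.
braced = """
def sign(n) {
if n < 0 {
return -1
}
return 1
}
"""

converted = braces_to_indent(braced)

# The converted text is ordinary Python and can be executed.
namespace = {}
exec(converted, namespace)
```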

      --
      -- Lattyware (www.lattyware.co.uk)
    5. Re:Not that surprising by dbrueck · · Score: 1

      Agreed! The curly braces exist /primarily/ for the computer, not the humans reading/writing the code.

    6. Re:Not that surprising by Paul+Carver · · Score: 1

      Does the percent key in vim work?

      The most important thing about braces to me is that in vim a single keystroke (%) allows me to bounce back and forth between the start and end of a block.

      I haven't tried python yet but it would definitely be a big minus to me if the percent key no longer lets me bounce to block start/end points.

    7. Re:Not that surprising by Anonymous Coward · · Score: 0

      Disagreed! Curly braces exist so puny humans with their imperfect brains get fewer chances to make mistakes when telling computers what they want.

      You can still use whitespace for readability with curly braces - even more freely so. And even if you misindent something it'll be just a bit harder to read, not a syntactic/semantic error (the latter being worse).

    8. Re:Not that surprising by dbrueck · · Score: 4, Insightful

      Try this: take a well-formatted C or Java program and remove all the curly braces, and try to objectively quantify how much this affects your ability to determine the program's structure. Now, take the same program and leave the curly braces but remove all the indentation and again make your best guess about how much this affects your ability to determine the program's structure.

      Now ask yourself two questions:
      1) Which of the two (indentation or curly braces) is the much stronger indicator of program structure to a human?
      2) Which of the two is the much stronger indicator of program structure to the computer?

      (hint: if you're completely honest, you'll almost certainly come up with different answers for #1 and #2 :) )

      Doesn't it seem just a little weird that the primary indicator of program structure to the human isn't the one that actually matters from the computer's perspective? I'm not saying it's this massive problem, but at the same time it seems odd to fault a language like Python for taking the main block structure indicator from modern languages and have both the human and the computer rely on it. No redundancy, and no chance for two competing block structure indicators to ever be out of sync.

      Again, if you want a language where curlies are required, that's fine, but hopefully you can at least see that what Python does is both sensical and pragmatic.

    9. Re:Not that surprising by dbrueck · · Score: 2
    10. Re:Not that surprising by Anonymous Coward · · Score: 0

      Here's the thing - I can restore destroyed indentation from braces in a single keypress/click in any editor. I can't restore program structure from destroyed indentation without careful reading through and manual reindenting.

      There's not many ways you can accidentally destroy braces (and yeah, I've cursed at websites swallowing < many times), but there's a lot of ways you accidentally destroy indentation - starting with same websites that swallow your less-thans and going through other Python-insensitive software to careless copypasting and reindenting.

      Thank you, but I'll take unambiguous and less prone to accidental failure (though taking some more time in some cases) option every time.

      PS: By the way, there is significant whitespace in C. Inside of quotes. In Python you get both significant whitespace (in quotes and at starts of lines) and insignificant (in comments and as separator). Overloading semantics much?

    11. Re:Not that surprising by jgrahn · · Score: 1

      Again, if you want a language where curlies are required, that's fine, but hopefully you can at least see that what Python does is both sensical and pragmatic.

      Sorry, but pragmatic is not the right word. The indentation-is-block-structure assumes people make sane editing choices, don't redefine the size of a TAB, and don't disagree about indentation level. That's just not how the average programmer works, unfortunately. I myself have lost a lot of time on this. When you're familiar with other languages, it's an odd feeling when you realize a small indentation fsckup forces you to start over from the beginning.

      On the other hand, it's just one blemish on an otherwise good language, and not a reason to reject it.

    12. Re:Not that surprising by jgrahn · · Score: 1

      That's how you end up being PHP. Python 3 fixes core mistakes made in earlier versions of the language, and makes it harder to write bad code.

      I know nothing about PHP, but almost *all* languages try to stay backwards-compatible, because their designers know how surprisingly hard it is in practice to migrate everyone.

    13. Re:Not that surprising by dbrueck · · Score: 1

      Here's the thing - I can restore destroyed indentation from braces in a single keypress/click in any editor. I can't restore program structure from destroyed indentation without careful reading through and manual reindenting.

      There's not many ways you can accidentally destroy braces (and yeah, I've cursed at websites swallowing < many times), but there's a lot of ways you accidentally destroy indentation - starting with same websites that swallow your less-thans and going through other Python-insensitive software to careless copypasting and reindenting.

      Definitely could happen in theory, but it just doesn't tend to happen in practice very much, for the same reasons that the whitespace is used in the first place: it conveys useful information, so people take steps to preserve that information, regardless of the language being used.

      Thank you, but I'll take unambiguous and less prone to accidental failure (though taking some more time in some cases) option every time.

      More power to you: I'm not in any way trying to argue that people with your perspective should give up what they have selected as their preference. I do, though, object to the earlier comments (from you or some other AC) that imply that languages with significant whitespace are somehow lacking or broken or wrong.

      My assertion is that whitespace is significant in pretty much /all/ modern programming languages (otherwise it wouldn't be so universally used and often required by e.g. a company's coding standards). The difference is that in most languages a schism exists because the whitespace is significant to the human but not the computer.

      PS: By the way, there is significant whitespace in C. Inside of quotes. In Python you get both significant whitespace (in quotes and at starts of lines) and insignificant (in comments and as separator). Overloading semantics much?

      Haha.

    14. Re:Not that surprising by Anonymous Coward · · Score: 0

      "Doesn't happen very much" isn't quite comforting for something that introduces a new possible way to break your code.

      Even though I don't cross paths with Python very often, I have seen both syntax errors of this kind and semantic errors - copy-paste programming isn't a good idea, but when a validation function is all copy-pasted blocks to the effect of "if not validate_stuff():\n\t#maybe do some stuff\n\treturn False", it's just waiting for some of those returns to lose a tab. Which happened. Good thing it wasn't return True, which would be worse.
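
      To make the failure mode concrete, here is a small sketch. The function names and the invoice dictionary are made up for illustration, not taken from the code being described:

```python
def validate_ok(invoice, errors):
    """Intended version: reject only invoices without a customer."""
    if not invoice.get("customer"):
        errors.append("missing customer")
        return False
    return True


def validate_broken(invoice, errors):
    """Same code after one 'return False' lost a level of indentation."""
    if not invoice.get("customer"):
        errors.append("missing customer")
    return False  # now runs for every invoice, valid or not
    return True   # unreachable


good_invoice = {"customer": "ACME"}

ok_errors = []
ok_result = validate_ok(good_invoice, ok_errors)

broken_errors = []
broken_result = validate_broken(good_invoice, broken_errors)
```

      Both versions are syntactically valid, so nothing complains. The broken one simply rejects a good invoice while reporting no error message at all.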

      Don't think I saw many of these errors with braces-based code. And don't think it was less readable, thanks to indentation still being permitted, just not being relied upon.

      (otherwise it wouldn't be so universally used and often required by e.g. a company's coding standards)

      Coding standards also usually require comments with concise and correct explanation of every class, method and function in understandable English, without slang and profanities.

      Don't see anyone declaring it "significant" and building it into a language.

      The difference is that in most languages a schism exists because the whitespace is significant to the human but not the computer.

      You're programming a computer. Of course things that are significant to computer should be of priority. If you're trying to explain something to someone, do you prioritize "it sounds better to me", or "there's less probability of misunderstanding"?

    15. Re:Not that surprising by lattyware · · Score: 1

      That doesn't make it a good idea. Go and look at PHP, and you'll see why. PHP is a horrific mess of deprecated stuff, and it's insanely hard to find the right way to do something in the cruft of hacked in features and old ways of doing things. As with all things, we make mistakes when we design languages. Sometimes, we can fix those without breaking backwards compatibility, sometimes we can't. It's worth making the break to make the language better - it's just not wise to do it more than very rarely.

      --
      -- Lattyware (www.lattyware.co.uk)
    16. Re:Not that surprising by dbrueck · · Score: 1

      Again, if you want a language where curlies are required, that's fine, but hopefully you can at least see that what Python does is both sensical and pragmatic.

      Sorry, but pragmatic is not the right word.

      Oh, it most definitely is! Look at all the counterarguments in this thread, including yours below: these are all very realistic problems that could occur /in theory/. In practice they rarely, if ever, occur. Seriously, I've been using Python for well over a decade with many different teams of developers in several different companies, and the problems that curly braces help protect against just don't really happen all that often. Big companies, small companies, big projects, small projects, new Python users, Python experts, large teams, small teams, etc., etc.: every significant variation in situation that I can think of, and not only have these types of problems never been regularly occurring, they have in fact been so rare that I have trouble coming up with specific instances in which they /did/ occur.

      So, going back to my earlier suggestion that whitespace is significant to humans in pretty much all modern computer languages, if you use a language like Python for awhile, most people I've worked with (including those who have come to Python with pretty strong bias against it) end up realizing it's actually just fine. You don't run into problems, you aren't constantly at risk of a big mess or anything. It works great. Do that for awhile and then go use e.g. C or Java, and it seems odd to need to use the curly braces. Why? Because you recognize that they aren't the most significant indicator of block structure (to you, the person writing the code or to all the people reading the code). Net result is that the curly braces don't feel all that pragmatic: they add little or no value so why use them?

      The indentation-is-block-structure assumes people make sane editing choices, don't redefine the size of a TAB, and don't disagree about indentation level.
      That's just not how the average programmer works, unfortunately.

      Hehe, I'm tempted to make some quip about what constitutes an average programmer, but again all I can do is observe that these things just don't happen all that often. You hire a new guy and he asks, "tabs or spaces?" and we say "spaces. Most people use 4" and that's it. It's never ever ever an issue. And honestly, it seems like 4-spaces-instead-of-tabs is a pretty common default anyway (regardless of language), such that a lot of times this conversation doesn't even happen.

      I myself have lost a lot of time on this. When you're familiar with other languages, it's an odd feeling when you realize a small indentation fsckup forces you to start over from the beginning.

      I /am/ familiar with other languages. I've used both types we're discussing for many years, which is why I can from experience observe that these things just don't really happen that often. At my company, for example, we each regularly develop in C#, Java, C, C++, Objective-C, Scala, Javascript, and Python. We use four-spaces-no-tabs in all languages not because of some standard that is enforced, but just because it looks nice and nobody used tabs anyway. We have the full gamut of experience levels of people using Python, and at this company we have experienced these theoretical problems with Python exactly zero times. I'm not saying they couldn't happen. Maybe we're just phenomenally lucky, or maybe these problems don't happen as frequently as detractors of significant whitespace imply. :)

      On the other hand, it's just one blemish on an otherwise good language, and not a reason to reject it.

      You're entitled to your opinion of course, but I hope you can recognize that many people feel that not only is it not a blemish that you just have to deal with, but a compelling feature and one of

    17. Re:Not that surprising by dbrueck · · Score: 1

      "Doesn't happen very much" isn't quite comforting for something that introduces a new possible way to break your code.

      That's the strongest I can say it without somebody coming up with an obscure counter-example and declaring checkmate. :) In my experience, these problems just don't happen, but asserting an extreme is an invitation for somebody to come along and gleefully pounce on it.

      There are plenty of ways to break code via sloppiness. Your experience may differ, but for me significant whitespace causes problems about as often as code breaking because somebody changed it all to upper case or ran it through Google Translator.

      Even when copying and pasting code (from e.g. an email or a web page to my editor), I can't remember the last time I had a whitespace problem, but more often I've had problems with e.g. the quote character becoming the quote character a word processor uses.

      Even though I don't cross paths with Python very often, I have seen both syntax errors of this kind and semantic errors - copy-paste programming isn't a good idea, but when a validation function is all copy-pasted blocks to the effect of "if not validate_stuff():\n\t#maybe do some stuff\n\treturn False", it's just waiting for some of those returns to lose a tab. Which happened. Good thing it wasn't return True, which would be worse.

      Don't think I saw many of these errors with braces-based code. And don't think it was less readable, thanks to indentation still being permitted, just not being relied upon.

      But whitespace /is/ relied upon - by humans who read, write, debug, and refactor the code. That's the rub. I'm sure you've seen the classic errors in C like:

      while (condition);
          doSomething();

      I'm not suggesting that these types of bugs happen all the time, but do you understand /why/ they can be easy to miss?

      (otherwise it wouldn't be so universally used and often required by e.g. a company's coding standards)

      Coding standards also usually require comments with concise and correct explanation of every class, method and function in understandable English, without slang and profanities.

      Don't see anyone declaring it "significant" and building it into a language.

      That's missing the point.

      It's telling that consistent whitespace is such an important factor in code readability that many companies/projects formalize it as part of their coding standards, that's all.

      The difference is that in most languages a schism exists because the whitespace is significant to the human but not the computer.

      You're programming a computer. Of course things that are significant to computer should be of priority. If you're trying to explain something to someone, do you prioritize "it sounds better to me", or "there's less probability of misunderstanding"?

      First of all, the above two aren't opposites. In fact, the more the two converge, the better.

      But the underlying principle is that everything you require the programmer to do has a cost, and if the cost doesn't provide sufficient benefit, then it might be good to change it. If you look at the evolution of programming languages, real progress has been made in making languages more expressive and requiring less arbitrary effort on the programmer's part.

      The fact of the matter is that programming languages always require something of the programmer that is outside of the domain of whatever problem the programmer is trying to solve. To the extent that you can recognize and eliminate those effectively, that's a good thing. For example, in older C programs you had to declare variables at the start of your function, even if you didn't use them until later. Was that to help the programmer? Some people argued that it helped them think thro

    18. Re:Not that surprising by Anonymous Coward · · Score: 0

      That's the strongest I can say it without somebody coming up with an obscure counter-example and declaring checkmate. :) In my experience, these problems just don't happen, but asserting an extreme is an invitation for somebody to come along and gleefully pounce on it.

      Err, same to you. You're just brushing me off with "But it didn't happen to me!"

      [snipped lots of things to same effect] The curly braces can help indicate block structure, but they are secondary to the whitespace to the people writing the code, reading the code, troubleshooting the code, and modifying the code. [snipped]

      Ahem. Your definitions of primary and secondary are quite strange.

      Braces do not "help indicate block structure", they define block structure. That's it. Unless you're writing your code on paper, you can ask your editor to present that defined block structure in any way you're comfortable with, or you can do it yourself in any way you're comfortable with. You can freely indent (or don't indent) it in any way, and it won't change the meaning of your code at all! Adding "/some/ readability"'s not the point of braces. Letting readability not affect semantics is. Didn't your mama teach you about importance of decoupling model from presentation?

      For Python's blocks presentation IS model. Yes, you can only write readable code - for arbitrary definition of "readable". You can also irreparably lose _both_ readability and structure just because someone thought "Hey, we'll be always living in floppies and modems age, let's call TRIM on all input lines!". With explicit block markers you don't lose latter and can easily restore former.

    19. Re:Not that surprising by darkfeline · · Score: 1

      What are you saying? Python has ALWAYS supported braces: http://www.python.org/doc/humor/#python-block-delimited-notation-parsing-explained

    20. Re:Not that surprising by phantomfive · · Score: 0

      Except it keeps people from ever wanting to use your language, because they don't want to rewrite all their code again when a new incompatible version comes out.

      The lack of commitment to backwards compatibility by Guido van Rossum is reason enough to avoid Python for other good languages.

      --
      "First they came for the slanderers and i said nothing."
    21. Re:Not that surprising by pongo000 · · Score: 0

      That's how you end up being PHP. Python 3 fixes core mistakes made in earlier versions of the language, and makes it harder to write bad code. That's a good thing, and the last thing you want is a language full of 20 ways to do something, 18 of which are deprecated. Removing backwards compatibility for the 3.x line was a good idea.

      And this is the very reason why I abhor Python: Every update seems to break legacy code, to the point where it becomes rather painful to even bother upgrading.

    22. Re:Not that surprising by dbrueck · · Score: 1

      That's the strongest I can say it without somebody coming up with an obscure counter-example and declaring checkmate. :) In my experience, these problems just don't happen, but asserting an extreme is an invitation for somebody to come along and gleefully pounce on it.

      Err, same to you. You're just brushing me off with "But it didn't happen to me!"

      To an extent perhaps, but I also tried to articulate that it /wasn't/ just me: not only are there gobs and gobs of people using these types of languages without constantly falling into these pitfalls, I have also witnessed firsthand the same thing across several companies, many projects (of all levels of scale and complexity), and many developers (of all ranges of experience with the language). So while I do readily understand how these problems could happen in theory, I think it's significant that after all this time I've rarely if ever seen them. That combined with the perception that those who most often cite them are those who haven't actually used a language like Python for any large project or significant amount of time does make me question how realistic these problems are in practice.

      [snipped lots of things to same effect] The curly braces can help indicate block structure, but they are secondary to the whitespace to the people writing the code, reading the code, troubleshooting the code, and modifying the code. [snipped]

      Ahem. Your definitions of primary and secondary are quite strange.

      Braces do not "help indicate block structure", they define block structure. That's it.

      Ah, but we're arguing two different things. I think you're saying, "curly braces define block structure because the language is defined that way". I'm not disagreeing with that - languages like C, Java, etc. *are* defined that way of course. This discussion is around whether or not it's actually /needed/ and/or helpful - this whole thread is due to people dismissing Python because of significant whitespace, either implicitly or explicitly stating that that is a flaw in the language's design.

      So, I'm taking a step back from any particular language and looking at what value the curly braces actually add, and observing that in terms of code readability, they are less significant than the whitespace to all parties involved /except for/ the tool chain. The code is quite readable without the braces as long as indentation is maintained, while the opposite is definitely not the case. Further, it is not unheard of for braces to accidentally be omitted while indentation is maintained, resulting in a bug (not arguing that it is a common bug, just citing it as another example of what actually matters to the humans).

      All of this is not some attempt to abolish braces in those languages that have defined them as necessary, but to merely dispute the suggestion that significant whitespace is some crazy idea. You don't have to like significant whitespace or use it, but there's pretty compelling evidence that it's at least sensical. And to many of those that have used it a lot, it's quite preferable.

      Unless you're writing your code on paper, you can ask your editor to present that defined block structure in any way you're comfortable with, or you can do it yourself in any way you're comfortable with. You can freely indent (or don't indent) it in any way, and it won't change the meaning of your code at all!

      To whom? While it's true that the compiler will still generate the same output from the whitespace-less input, the maintainability of the code is drastically impacted, and "incorrect" whitespace can /directly/ change the *implied* meaning of the code to someone who is reading it or changing it. That may not matter to you, but IMO that's a pretty big deal as most code is almost certainly read (by humans) a lot more than it is written.

      Adding "/some/ readability"'s not the point of braces. Letting readability not affect semantics is. Didn't your mama teach you about importance of decoupling model from presentation?

      LOL

    23. Re:Not that surprising by Anonymous Coward · · Score: 0

      Well, that's what I keep repeating - whatever someone does to indentation of brace-delimited code, you can easily make it readable. It's just a matter of presentation.

      If someone sends you a snippet of code and somewhere along the road indentation gets stripped by inconsiderate software (don't tell me it never happened to you), as long as it didn't break the block markers, you can just hit "reformat" in your code editor and start reading and understanding it. In Python's case indentation is block markers, so if it's longer than 3-5 lines, you can just hit "reply" and ask your respondent to send it again and stop using dated software. I'd say it's pretty clear which one helps understanding and readability there.
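
      A small sketch of that scenario, assuming a mangling step that simply strips leading whitespace from every line. The sample function is made up for illustration:

```python
# A short function with two levels of nesting.
source = (
    "def total(items):\n"
    "    result = 0\n"
    "    for item in items:\n"
    "        result += item\n"
    "    return result\n"
)

# Simulate software that trims leading whitespace from each line.
stripped = "\n".join(line.lstrip() for line in source.splitlines()) + "\n"

# The original compiles without complaint.
compile(source, "<original>", "exec")

# The stripped text is rejected, and nothing in it says whether
# "return result" belonged inside or outside the loop.
try:
    compile(stripped, "<stripped>", "exec")
    stripped_compiles = True
except IndentationError:
    stripped_compiles = False
```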

      IOW, explicit markers = original author can write how he pleases, software in between can do how it pleases, reader can get it presented how he pleases, implicit markers = all three stages have to be aware of restrictions. At least, reader can change indentation from 3 spaces to 4.

      PS: "read (by humans)" is quite a pertinent note, especially for interpreted languages. Your code may be read (by humans) hundreds of times a month, but it (or its bytecode form) will be read by computers thousands or millions of times a day. I think it's clear which reader's chance of misunderstanding should be minimized. Especially if, as I keep saying, being unambiguous for one means it can be presented to the other's eye in any form.

    24. Re:Not that surprising by dbrueck · · Score: 1

      Well, that's what I keep repeating - whatever someone does to indentation of brace-delimited code, you can easily make it readable. It's just a matter of presentation.

      If someone sends you a snippet of code and somewhere along the road indentation gets stripped by inconsiderate software (don't tell me it never happened to you), as long as it didn't break the block markers, you can just hit "reformat" in your code editor and start reading and understanding it. In Python's case indentation is block markers, so if it's longer than 3-5 lines, you can just hit "reply" and ask your respondent to send it again and stop using dated software. I'd say it's pretty clear which one helps understanding and readability there.

      IOW, explicit markers = original author can write how he pleases, software in between can do how it pleases, reader can get it presented how he pleases, implicit markers = all three stages have to be aware of restrictions. At least, reader can change indentation from 3 spaces to 4.

      So is the contention that code needs to be sent through unreliable media a lot? The issue is not really one of explicit vs implicit block delimiters (in both Python and e.g. C the delimiters are quite explicit and part of the syntax), so we're talking about cases where you're allowing the code to be transferred /and modified/ en route - a byte-for-byte transfer of the code doesn't introduce any errors, it's only cases where you are assuming it's acceptable for modifications to occur. Again, yeah, it could occur, but there are so many ways for it to not occur - any modern website showing code is careful to preserve whitespace using a 'pre' tag if nothing else - regardless of language. Ditto for mail readers and a block with a fixed width font. But then there are plenty of other ways of sending code around that don't permit the code to be modified, be it a file attachment, a revision tag in a VCS, etc.

      So even if I concede the point that sometimes code has to be sent through an unreliable medium (which I don't do, Python or not), that in itself seems like a pretty tenuous argument in favor of delimiters being part of the language. Surely there is a more compelling reason than this, no?

      PS: "read (by humans)" is quite a pertinent note, especially for interpreted languages. Your code may be read (by humans) hundreds of times a month, but it (or its bytecode form) will be read by computers thousands or millions of times a day. I think it's clear which reader's chance of misunderstanding should be minimized. Especially if, as I keep saying, being unambiguous for one means it can be presented to the other's eye in any form.

      You're misunderstanding my point: your argument above is precisely what I'm saying. :) The problem with a language like C is that there are two indicators of block structure - the official one (the braces), and the unofficial one (the whitespace). The one that matters in the end is the one your tool chain is looking for (the braces), yet in terms of understanding the code by the humans working on it, the whitespace is *most definitely* stronger. That's why coding standards typically make mention of it, that's why IDEs are so good at reformatting code, that's why poorly formatted code is so frowned upon - information is conveyed by the indentation, and if it's out of sync with the "official" block delimiters, the formatting of the code is misleading.

      As noted earlier, that's why bugs like this can occur:

      while (condition);
          doSomething();

      Do I think these bugs occur a lot? Nope. Do you understand why they /can/ occur? It's because from the human's perspective, the indentation carries more weight in conveying the block structure of the program. If this isn't intuitively clear to you (which I bet it is), Google for the usability studies that have been done on this subject. This is of course confirmed by my other suggestion of taking a C program

    25. Re:Not that surprising by lattyware · · Score: 1

      Why? You just run your old programs in 2.7. Both can be installed at the same time, and there are well-defined ways of indicating which should be used. Sure, there is some cost to not having backwards compatibility, but the gains from improving the language far outweigh those.
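
      One common way to indicate the version is the shebang line at the top of a script. As a rough illustration of the idea only (not the actual algorithm of any launcher), a hypothetical helper could read it like this. The name wanted_major and the default of 2 are assumptions made for the sketch:

```python
import re


def wanted_major(first_line, default=2):
    """Hypothetical helper: guess the major Python version from a shebang."""
    if not first_line.startswith("#!"):
        return default  # no shebang: fall back to the assumed default
    match = re.search(r"python(\d)?", first_line)
    if match is None or match.group(1) is None:
        return default  # not a Python shebang, or no version given
    return int(match.group(1))


legacy = wanted_major("#!/usr/bin/python2.7")
modern = wanted_major("#!/usr/bin/env python3")
unversioned = wanted_major("#!/usr/bin/env python")
no_shebang = wanted_major("print('hello')")
```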

      --
      -- Lattyware (www.lattyware.co.uk)
    26. Re:Not that surprising by lattyware · · Score: 1

      Except that's simply not true. 3.x is the only time that has happened since the language moved out of its initial stages.

      --
      -- Lattyware (www.lattyware.co.uk)
    27. Re:Not that surprising by Anonymous Coward · · Score: 0

      So is the contention that code needs to be sent through unreliable media a lot?

      No, the contention is that this can happen at all, and there's likely no way to reconstruct the code afterwards. You were talking about implicit blocks as an evolution of languages, but this evolution makes unhealthy assumptions about every medium and every program only doing verbatim transfers. This is far from reality.

      The problem with a language like C is that there are two indicators of block structure - the official one (the braces), and the unofficial one (the whitespace).

      There's _one_ indicator of block structure in C et al. - braces. You could blindly assume there's no indentation at all in all your C files and just set up your workflow to automatically reindent it to your liking when you open it. Someone else might set it up differently and someone in another way - it won't make one bit of a difference to you. There are block markers in the file (model) - how to indent them is just for you and editor to decide (presentation).

      As noted earlier, that's why bugs like this can occur:

      while (condition);
                        doSomething();

      And that's why bugs like this are easily caught simply by hitting the same "reformat" key without needing a static analysis tool. By the way, this wouldn't catch that unindented return False error I mentioned earlier and that gave those fun "Invalid invoice: [blank page] OK" messages (but long live pylint!)

      doesn't it seem a little strange that the strongest indicator of block structure to a human is different than the one that actually matters?

      No, for the reasons I stated in my last post's PS. A robust model is preferable to perceived lower redundancy in presentation. In fact, you could write a plugin for any IDE that'd strip braces from your C and replace them with indentation, or, conversely, would let you write Python with braces, seamlessly changing it on save - the model's the same, "IfStmt(condexpr, truestmt, falsestmt)" etc., it's just a question of presentation. Python's reliance on whitespace is a bad design decision because it believes in perfection - that no one trims whitespace, that every editor is aware of Python, and that programmers are perfect beings who don't misindent. All of this is just false. Good design shouldn't make so many assumptions and trip over the imperfections of the real world.

    28. Re:Not that surprising by phantomfive · · Score: 1

      but the gains from improving the language far outweigh those.

      No, no they do not. Unless you don't have to maintain a very large codebase. Then it's no problem.

      --
      "First they came for the slanderers and i said nothing."
    29. Re:Not that surprising by dbrueck · · Score: 1

      So is the contention that code needs to be sent through unreliable media a lot?

      No, the contention is that this can happen at all, and there's likely no way to reconstruct the code afterwards. You were talking about implicit blocks as an evolution of languages, but this evolution makes unhealthy assumptions about every medium and every program only doing verbatim transfers. This is far from reality.

      I was talking of programmers doing less busywork just to satisfy the language as the evolution of languages. As far as verbatim transfers, to me it seems strange that a language designer would spend any time at all trying to worry about the case where code might be copied through some medium that can't be relied upon. Not saying it can't be a problem, just that it seems to be far outside the realm of what a language designer should care about.

      But regardless of which way you end up on this, I have to emphasize that this is more or less a theoretical problem: we send code from all sorts of languages through these example media and never run into problems: be it showing code snippets on a web page or emailing chunks of code back and forth for quick review or brainstorming. We never mandate any rules or best practices about how this should be done, and yet from your description of things it seems like we should frequently be "burned" by this, but we aren't. Are we just really lucky?

      The problem with a language like C is that there are two indicators of block structure - the official one (the braces), and the unofficial one (the whitespace).

      There's _one_ indicator of block structure in C et al. - braces.

      *Sigh*, no. I encourage you to look past the strict definition of what the tool is consuming and look at it from a language design perspective. Like I've already said before, from the tool chain's perspective, there is just one indicator. I'm not in any way disputing that. Yes, it really is the block structure indicator that matters (to the tools). But what matters to the humans? If you answer 'braces', then why is consistent indentation so widespread and important?

      You could blindly assume there's no indentation at all in all your C files and just set up your workflow to automatically reindent it to your liking when you open it.

      Not in practice, as this would destroy your revision control - you make a one line change to a file and the commit diff ends up being nearly the whole file just because your coworker with a different scheme was the one who edited that file previously.

      But still, this too is missing the point. Even if your editor and my editor decide to display the C code differently, the differences would be in things like the horizontal distance in each level or whether or not the opening curly brace is on its own line or not. The block structure of the program would not realistically be different in my editor than yours. Things that didn't really have anything to do with the program's structure might be displayed differently, yes, but not the structure of the program itself. Why? Because the structure of the program isn't changing, and that formatting conveys to you the structure (far more than the braces do), so of course it's going to be the same materially in both cases.

      As noted earlier, that's why bugs like this can occur:

      while (condition);

      doSomething();

      And that's why bugs like this are easily caught simply by hitting the same "reformat" key without needing a static analysis tool.

      Forget how to catch/correct the problem. I'm asking you to think through /why/ the problem could occur in the first place, and not be instantly obvious after the fact? This isn't rhetorical: I'm genuinely asking you to think through what is going on in a programmer's thought process

    30. Re:Not that surprising by Half-pint+HAL · · Score: 1

      If you hate it that much, it'd be trivial to write an application that compiles braces into indentation. The reality is it is more readable, and nicer to write. If you are having problems with whitespace being significant, maybe you should find software that isn't crazy - everything I use handles it fine.

      I have two problems with that.

      1: There is no reason whatsoever that a code editor couldn't automatically and dynamically handle the indenting. "Human readable code" is basically a myth, because it takes a properly-configured computer to render a series of magnetic polarities into a series of ASCII/Unicode characters and display them on-screen. Any computer recent enough to run something like Python is powerful enough for dynamic code management. I could be writing code with Python-style spacing, and the software could be saving it with braces and no spacing.

      2: Python spacing is a right bugger when you have any considerable depth of nesting at all. Yes, I know you should strive to keep functions "shallow", and use subfunctions for the deepest bit, but this isn't efficient, because sometimes the overheads of a function call end up being greater than the execution time of the code inside it. In a language that's already relatively slow due to its interpreted nature, the lack of a stack-free macro function is a pretty serious problem -- it actually leads to me using copy-and-paste duplicated code, which is something I really would rather not do. In fact, I'm seriously considering writing my own pre-processor as it seems like the only way I can keep my code clean while having it run at a reasonable pace....
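      The call-overhead claim is easy to check on your own machine. A minimal sketch, assuming a trivial helper (the names add_one, with_calls and inlined are made up for illustration and are not from my real code), that times the same loop with and without a function call per iteration:

```python
import timeit


def add_one(x):
    # Hypothetical tiny helper: the work is trivial, so the call itself dominates.
    return x + 1


def with_calls(n):
    # One Python-level function call per iteration.
    total = 0
    for i in range(n):
        total += add_one(i)
    return total


def inlined(n):
    # Same arithmetic, with the helper body pasted into the loop.
    total = 0
    for i in range(n):
        total += i + 1
    return total


# Both versions compute the same value; only the timing differs.
t_calls = timeit.timeit(lambda: with_calls(10000), number=20)
t_inline = timeit.timeit(lambda: inlined(10000), number=20)
print("with calls: %.4fs, inlined: %.4fs" % (t_calls, t_inline))
```

      The absolute numbers depend on the interpreter and the hardware, so treat the output as a rough indication and measure your own hot loop before deciding to duplicate code.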

      --
      Got them moderator blues I blieve I walk out the do', With these mod-points I been gettin', I 'most never post no mo'
    31. Re:Not that surprising by Anonymous Coward · · Score: 0

      (hint: if you're completely honest, you'll almost certainly come up with different answers for #1 and #2 :) )

      You must free your mind even more, grasshopper.
      I don't see why #2 would be different from #1.
      How are 0x20 and 0x0A (or, god forbid, 0x09 and 0x0D) any different from 0x7B and 0x7D?
      What makes you (reader) think a byte is "stronger" than another for a parser ?

      I find the mindstate about spaces being "weak" chars annoying. It's just like saying the air may not exist or may disappear any moment because one does not see it. Or that the "space theory" will be debunked like the phlogiston.

    32. Re:Not that surprising by sjames · · Score: 1

      The 2.x->3.x transition is the one and only time I have seen any breakage. Even there, there are tools that come with the distribution that assist with the porting effort.

      I am slowly but surely bringing all of my code forward, with no issues thus far.

    33. Re:Not that surprising by sjames · · Score: 1

      I would argue that it's a feature. Dig up that old C code with the defective indentation in a few years and I'll bet you'll read it wrong even though it compiles fine.

      OTOH, the python code has to be indented correctly for reading or it won't work at all.
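      A small sketch of what I mean, using only the built-in compile() (the function name compiles and the two source strings are just for illustration). Note the limit: this only catches indentation that is structurally invalid. A line that is mis-indented but still forms a legal block, like the unindented "return False" mentioned elsewhere in this thread, compiles without complaint.

```python
# Correctly indented: the body of the "if" is one level deeper.
source_ok = (
    "def check(invoice):\n"
    "    if not invoice:\n"
    "        return False\n"
    "    return True\n"
)

# Broken: the "if" has no indented body, so the block structure is invalid.
source_bad = (
    "def check(invoice):\n"
    "    if not invoice:\n"
    "    return False\n"
    "    return True\n"
)


def compiles(source):
    """Return True if the source compiles, False on an indentation error."""
    try:
        compile(source, "<example>", "exec")
        return True
    except IndentationError:
        return False


print(compiles(source_ok))
print(compiles(source_bad))
```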

    34. Re:Not that surprising by lattyware · · Score: 1

      1. Why bother? The language can support it natively, so why not just do that?

      2. It should never reach that point - it sounds like your code is convoluted and poorly laid out. If the cost of the function call is actually affecting your program, then the code you are talking about is hugely performance-sensitive, and should probably be offloaded into an extension module.

      --
      -- Lattyware (www.lattyware.co.uk)
    35. Re:Not that surprising by lattyware · · Score: 1

      Yes. They do. Again, I use PHP as my example. It's a horrible mess of a language due to the fact they refused to make breaking changes. It's horrific to work with, and virtually impossible to follow good practises with. Losing backwards compatibility is worth it, when done rarely.

      --
      -- Lattyware (www.lattyware.co.uk)
    36. Re:Not that surprising by olau · · Score: 1

      I disagree. Python 3 uptake has been really slow. The result is that a lot of the good stuff in the Python 3.x series isn't in wide-spread use yet, and if you're writing reusable library code, you can't really assume the majority of your users will have access to it yet.

      Guido should have said: we'll break these things we have to break, and for the rest add shim layers with deprecation warnings instead of just opening the gates. That would probably have seen much faster adoption rates.

      The gradual approach has worked pretty well for the 2.x series - if you use something deprecated, you get a bunch of really visible warnings you can then fix as you go along. I see no upsides in turning these warnings into hard errors, especially not when some of them happen in library code out of your immediate control.
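      To make the shim idea concrete, here is a minimal sketch using the standard warnings module. The names old_api and new_api are hypothetical, not anything in the standard library. Depending on the interpreter version and the active warning filters, DeprecationWarning may be hidden by default, so the sketch turns it on explicitly.

```python
import warnings


def new_api(x):
    # Hypothetical replacement function.
    return x * 2


def old_api(x):
    # Shim: keep the old name working, but tell the caller to move on.
    warnings.warn(
        "old_api() is deprecated; use new_api() instead",
        DeprecationWarning,
        stacklevel=2,  # attribute the warning to the caller, not the shim
    )
    return new_api(x)


# Record warnings so the effect can be inspected instead of just printed.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    result = old_api(21)

print(result, [str(w.message) for w in caught])
```

      The old call still returns the right value; the only difference the caller sees is the warning, which is the whole point of the gradual approach.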

    37. Re:Not that surprising by Half-pint+HAL · · Score: 1

      My problem is that I'm generating a huge dataset -- at last count I had over 797,000 items in it -- as permutations of a few dozen atomic items and several dozen combinable rules. Using explicit, clear labelling (for function names, arguments and variables) is good practice, but I keep running out of horizontal screen space even before I include much nesting. It provides a disincentive to me from using clear naming conventions, and also from using parameterised/named function arguments (which are essential, as my codebase is under rapid, frequent revision).

      --
      Got them moderator blues I blieve I walk out the do', With these mod-points I been gettin', I 'most never post no mo'
    38. Re:Not that surprising by dbrueck · · Score: 1

      (hint: if you're completely honest, you'll almost certainly come up with different answers for #1 and #2 :) )

      You must free your mind even more, grasshopper.
      I don't see why #2 would be different from #1.

      Try the experiment I suggested and then you'll see.

      What makes you (reader) think a byte is "stronger" than another for a parser ?

      Nothing, because I don't think that. Try the experiment, and re-read my posts.

      I find the mindstate about spaces being "weak" chars annoying.

      Er... okie dokie. I'm not suggesting that spaces are weak characters, so if that's the conclusion you've reached I encourage you to re-read my post, as well as those in the related sub-thread.

      I still encourage you to try the experiment for yourself, but here's the answers to the earlier two questions:

      1) indentation - many people would recognize this intuitively, but if you don't, the experiment I suggested makes it very clear. You can also confirm it by glancing at code from a distance, looking at how you write pseudocode, or try writing code without indentation for a few hours. If you're feeling especially masochistic, download a non-trivial module from an OSS project, remove all indentation and blank lines, and then read through it to become familiar with how the code works.

      2) braces - obviously, because that's how the syntax of the language has been defined

      If we were to quantify the "weight" of each block indicator to humans vs computers, it's probably something like:

      - for humans: indentation=95+%, braces=5% or less (again, if the value you come up with is substantially different, then you really ought to try some of the above experiments to see for yourself. Seriously, take a few minutes and give it a whirl.)

      - for computers: indentation=0%, braces=100% (by definition, per the language syntax)

      Now, take a step back, put on your language designer hat, and think through the implications of this.
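      If you want to run the experiment without editing files by hand, something like the following rough sketch would do. The helper names are made up, and the brace stripping is deliberately naive: it also removes braces inside strings and comments, which is fine for eyeballing structure but not for anything else.

```python
SAMPLE = """\
int main(void) {
    while (x) {
        step();
    }
    return 0;
}
"""


def strip_indentation(source):
    # Keep the braces, drop all leading whitespace on every line.
    return "\n".join(line.lstrip() for line in source.splitlines())


def strip_braces(source):
    # Keep the indentation, drop every curly brace (naive: no C parsing).
    return source.replace("{", "").replace("}", "")


print(strip_indentation(SAMPLE))
print(strip_braces(SAMPLE))
```

      Print both versions of a file you don't already know and see which one you can still follow.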

    39. Re:Not that surprising by Anonymous Coward · · Score: 0

      Try this: take a well-formatted C or Java program and remove all the curly braces, and try to objectively quantify how much this affects your ability to determine the program's structure. Now, take the same program and leave the curly braces but remove all the indentation and again make your best guess about how much this affects your ability to determine the program's structure.

      Now ask yourself two questions:
      1) Which of the two (indentation or curly braces) is the much stronger indicator of program structure to a human?
      2) Which of the two is the much stronger indicator of program structure to the computer?

      Am I the only one who likes to use _both_ curly braces and whitespace to demarcate the programs' structure? Which language allows me to do that?

    40. Re:Not that surprising by dbrueck · · Score: 1

      Most people /use/ both in typical modern languages of course; but in terms of conveying the structure of the code to a person reading it, the whitespace is more significant (that's the point of the suggested experiment - to give you first-hand knowledge of the relative importance of each, independent of the other).

      It's not "wrong" that both are used, but this whole sub-thread is in response to the implication that Python is somehow broken because it doesn't have the braces. But what Python has done is neither bad nor all that crazy of an idea: not only does it work extremely well in practice, it avoids the oddity of the tools looking at one structure indicator and the humans using it too but (subconsciously or not) actually relying fairly heavily on another.

      Again, I don't think it's a big deal that other languages have been designed that way, but Python is certainly not poorly designed because it didn't opt to follow suit.

    41. Re:Not that surprising by phantomfive · · Score: 1

      It's only worth it if you can be sure that you won't have more backwards incompatible changes in the future. Otherwise there is no reason to use that language, since there are other languages that work perfectly well. PHP being bad doesn't make Python good; both of them are bad choices, for different reasons.

      --
      "First they came for the slanderers and i said nothing."
    42. Re:Not that surprising by lattyware · · Score: 1

      That doesn't make sense. Of course there is a reason to use Python. I'm not saying PHP being bad makes Python good, I'm using PHP as an example in favour of making breaking changes, as not doing that has caused the language to develop in a way that turns it into a really bad language. As to 'It's only worth it if you can be sure that you won't have more backwards incompatible changes in the future.' - that makes no sense. Potentially, in the future, the language could have new features added, or features changed to make it better that require breaking backwards compatibility. Naturally, there is always a need to weigh up the cost of doing so with the benefits. I doubt we'll see breaking changes in Python for a long time yet, but they may well happen at some point, and there is nothing wrong with that.

      --
      -- Lattyware (www.lattyware.co.uk)
    43. Re:Not that surprising by phantomfive · · Score: 1

      It doesn't make sense to you because you don't understand the importance of backwards compatibility.

      That's fine, go ahead and use Python if it suits your purposes. I'm going to use some other language that doesn't make me rewrite my code at the whim of the creator.

      --
      "First they came for the slanderers and i said nothing."
    44. Re:Not that surprising by lattyware · · Score: 1

      Of course I understand the importance, but your argument is that there should never be any breaking changes, which means languages have to be perfect from the moment they are made, which is clearly never going to happen. Either we accept a broken language, which is not acceptable, or we make breaking changes. That's necessary sometimes. If you need to, 2.x still exists and you can run something using it.

      --
      -- Lattyware (www.lattyware.co.uk)
    45. Re:Not that surprising by phantomfive · · Score: 1

      So is Python 3 good enough now to not make backwards incompatible changes? For me it is good enough, but the creator disagrees. He has stated he is happy enough to make backwards incompatible changes.

      Therefore I will not use his language. It's simple.

      --
      "First they came for the slanderers and i said nothing."
    46. Re:Not that surprising by lattyware · · Score: 1

      It's unlikely to happen in the foreseeable future, but ruling it out would be foolhardy - clearly in time the language could be expanded and need alterations.

      --
      -- Lattyware (www.lattyware.co.uk)
    47. Re:Not that surprising by phantomfive · · Score: 1

      Since the creator has declared his willingness to break things in a backwards incompatible way, that's enough for me to stay away from the language. Check here for further perspective.

      Another example is the Berkeley socket API: it's an ugly, miserable thing to use, but I feel confident that code I write for it will still work in ten years. So I encapsulate the ugliness into some functions, and use it.

      --
      "First they came for the slanderers and i said nothing."
    48. Re:Not that surprising by lattyware · · Score: 1

      There is a huge difference there - the kernel isn't like Python, you can't run two disparate versions on the same system to run older code.

      Programming languages have to be able to make occasional breaking changes like this, because otherwise we settle for crap languages with all the problems that can't be fixed. You know what happens then? People make more languages to fill the gap, and then rather than just making a few changes, you either stick with the old bad version (just as you could have before), or switch to a new language which will require a complete rewrite rather than just modification (which, in Python's case, is often as simple as pushing it through 2to3).

      If you really can't afford to have Python change, continue to use 2.x - guess what, it's still (as evidenced by this post) being supported, and will be for some time yet.

      --
      -- Lattyware (www.lattyware.co.uk)
    49. Re:Not that surprising by phantomfive · · Score: 1

      There is a huge difference there - the kernel isn't like Python, you can't run two disparate versions on the same system to run older code.

      And yet somehow you still miss the point, that backwards compatibility is important. Why do you have trouble understanding that? I'm not sure, maybe you've never worked in a real production environment, or maybe when things get bad, you skip to the next job and stop supporting your work. I don't know why you have trouble understanding this.

      Programming languages have to be able to make occasional breaking changes like this,

      No they don't, this is your inexperience talking.

      If you really can't afford to have Python change, continue to use 2.x - guess what, it's still (as evidenced by this post) being supported, and will be for some time yet.

      Like Redhat, having a ton of legacy code that requires a specific version of the language? No thank you, I'd rather use a mature language and avoid all the hassle.

      --
      "First they came for the slanderers and i said nothing."
    50. Re:Not that surprising by lattyware · · Score: 1

      You say that you need the backwards compatibility, but you never say why? Why does it make a difference that you are running old code in an old interpreter rather than the new one? Why force new code to be the same as that old code? It's the same thing, only you get the choice to advance.

      It's not impossible to upgrade code, and it's always possible to run it in an old (but supported) interpreter. There is no merit to keeping the language at a standstill.

      --
      -- Lattyware (www.lattyware.co.uk)
    51. Re:Not that surprising by phantomfive · · Score: 1

      You say that you need the backwards compatibility, but you never say why?

      I've explained it several times, but you didn't understand. When you learn it, you will be wiser.

      --
      "First they came for the slanderers and i said nothing."
  11. Re:Yay! by K.+S.+Kyosuke · · Score: 0

    I get it, don't allow facts to get in the way of a good quarrel. ;-)

    --
    Ezekiel 23:20
  12. Re:Yay! by Anonymous Coward · · Score: 0

    It's called C++ and it's the only real coding language for anything that isn't a toy or esoteric

    Real programmers can write FORTRAN programs in any language.

  13. A feature still missing by Urkki · · Score: 1, Insightful

    A very important feature of any language still seems to be missing: a sane reference documentation.

    In a duck-typed language this is even more important, because the compiler/IDE can't really help the programmer there. Below is a sample from the core library docs, links included. To fully appreciate this, note that there's no link to this "read()" method, and the BytesIO class documentation does not contain such a method, so you're going to be manually searching the page to find the documentation for read(). Fortunately it is on the same page, which conveniently documents the entire module, so it's really easy to quickly find a particular piece of information in that wall of text.

    read1()

    In BytesIO, this is the same as read()
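    For what it's worth, the behaviour that entry describes can at least be checked directly. A minimal sketch, assuming nothing beyond the standard io module, and passing an explicit size to both calls because some older 3.x releases require the argument for read1():

```python
import io

data = b"hello world"

# Two independent in-memory streams over the same bytes.
via_read = io.BytesIO(data)
via_read1 = io.BytesIO(data)

# For an in-memory buffer there is no underlying raw stream to limit,
# so read() and read1() should hand back the same bytes.
first = via_read.read(5)
second = via_read1.read1(5)
print(first == second)
```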

    1. Re:A feature still missing by lattyware · · Score: 3

      The documentation is great in general, you seem to have found one missing link in a relatively obscure class. As a whole, Python's docs are great. They generally explain well and give full examples.

      --
      -- Lattyware (www.lattyware.co.uk)
    2. Re:A feature still missing by Anonymous Coward · · Score: 0

      Compared to Java and C# I always found Python's documentation lacking.

    3. Re:A feature still missing by lattyware · · Score: 1

      Wow, really? I've not dealt with C# much, so I can't talk on that front, but I've found Python's docs far more useful than Java's.

      --
      -- Lattyware (www.lattyware.co.uk)
    4. Re:A feature still missing by EngnrFrmrlyKnownAsAC · · Score: 0

      It may not be perfect (e.g. your example) but Python takes documentation seriously. How many other languages allow you to embed the documentation right in the freaking source file?

      --
      Howdy howdy howdy
    5. Re:A feature still missing by Anonymous Coward · · Score: 1

      How many other languages allow you to embed the documentation right in the freaking source file?

      ... All of them?

      At least, all the popular ones. You might have heard the name "JavaDoc", for example.

    6. Re:A feature still missing by Urkki · · Score: 1, Insightful

      The documentation is great in general, you seem to have found one missing link in a relatively obscure class. As a whole, Python's docs are great. They generally explain well and give full examples.

      Just compare (note, these are not exactly the same thing, just pretty close):

      Of these, Python's is least clear and useful in my eyes, by quite a margin. YMMV.

    7. Re:A feature still missing by Urkki · · Score: 1

      It may not be perfect (e.g. your example) but Python takes documentation seriously. How many other languages allow you to embed the documentation right in the freaking source file?

      I think the problem is that what the Python community considers good documentation does not match what I consider good documentation.

      Many parts of the Python docs are quite OK even to me, and I guess it's mostly an issue with presentation, cross-linking, and how the whole thing is split into chapters. Also, I think the bad parts are more concentrated in areas used by less experienced Python programmers, and seldom really used by those who might get around to fixing them. It could possibly be fixed just by modifying the documentation tool's output.

      Oh, and yes, documentation comments/embedded documentation are used in most mainstream languages these days...

  14. Say what? by Anonymous Coward · · Score: 0

    Was I the only one who thought, What Monty Python updates....?

    To be followed by a doh?!

    Nice work on the update to Python, guys!

    GreekGeek :-)

  15. Re:Add curly braces and you have C by mrvan · · Score: 5, Informative

    [...]And such stupendous stupidities such as isoweekday() returning a range of 1..7 [...]

    Maybe, from a CS point of view, any index should always be zero-based. However, for weekday there are two compelling arguments why this should not be the case:

    1) Authoritative: The ISO specs clearly state that the weekday number should be 1..7 [from Wikipedia: "A date is specified by the ISO week-numbering year in the format YYYY, a week number in the format ww prefixed by the letter W, and the weekday number, a digit d from 1 through 7, beginning with Monday and ending with Sunday."]. So, any library that returns an "ISO week day number" of 0 is simply non-compliant.

    2) Customary: All human readable date components are 1-based (the first "CE" date is 0001-01-01, not 0000-00-00). So why should weekday (which is intended for human consumption) be different?
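    Note that Python's datetime module gives you both conventions, so nobody is forced into the 1-based one. A short illustration (the dates are arbitrary examples picked because they fall on a Monday and a Sunday):

```python
from datetime import date

monday = date(2013, 4, 8)   # a Monday
sunday = date(2013, 4, 14)  # the following Sunday

# weekday(): zero-based, Monday == 0 ... Sunday == 6
# isoweekday(): ISO 8601 numbering, Monday == 1 ... Sunday == 7
for d in (monday, sunday):
    print(d.isoformat(), d.weekday(), d.isoweekday())
```

    So weekday() is there for indexing into a list, and isoweekday() is there for anything that has to match the ISO numbering.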

  16. Lua tables vs. JavaScript objects by tepples · · Score: 1

    One statement in the article you linked was a bit hard for me to believe: "Although most scripting languages offer associative arrays, in no other language do associative arrays play such a central role." It then goes on to describe what appear to be the semantics of a JavaScript object, albeit with = separating the name and value instead of :.

    1. Re:Lua tables vs. JavaScript objects by Anonymous Coward · · Score: 0

      Yeah, it's an overstatement, though JS objects are certainly less versatile than Lua tables - if only for the simple reason that they actually only map string->object, whereas Lua tables are actual object->object maps. Then there are various goodies added over time, like prototype-based inheritance being the easiest but not the only possible inheritance model, the ability to set weak ref mode on keys and values separately, and so on.

  17. Instead of Pygame by tepples · · Score: 2

    Pygame, an interface layer to SDL, doesn't appear to have broken the top 200. What replacement for Pygame that fully supports Python 3 should developers be using?

    1. Re:Instead of Pygame by baijum81 · · Score: 1

      pyglet 1.2beta1 has partial support for Python 3

  18. Re:Yay! by tepples · · Score: 1

    What's remarkable is that whiners don't whine about their own inabilities. No. They whine about tools or circumstances.

    In circumstances where startups are deliberately given measurably inferior tools, and one needs experience to get experience, how should one proceed without whining? Say, for example, the manufacturer of a device dictates that an application developed by a startup must be written in 100% pure Python for "security reasons", but an application developed by an internationally recognized company may use extensions written in C++. Applications developed by startups would then suffer the disadvantage of poor performance because everything goes through the interpreter, and possibly poor I/O because the Python modules that the manufacturer makes available to Python developers don't expose all of the device's hardware features.

  19. Endless discussions by Anonymous Coward · · Score: 0

    It is stupid to compare languages; a programmer or an engineer should know which language to use for a given project. That said, all languages have pros and cons. But it still looks like some angry teenagers who witnessed some coding practices or discussions come here and criticize things that they don't understand deeply. Thanks to you, good programmers with flexible skills will always take the job while you are still waiting on "thanks for the application, we will call you".

  20. Re:Yay! by Anonymous Coward · · Score: 0

    C++ can't be so complex if its fanboys are so simple-minded.

  21. Re:Yay! by PiSkyHi · · Score: 1

    1. A bad craftsman blames his tools no matter the quality of them.
    2. An average craftsman working with bad tools becomes a bad craftsman when he is afraid to admit the tools are bad.
    3. A good craftsman blames his tools when they are bad and himself when they are not - otherwise he does good work with good tools, for that is why he is a good craftsman.
    From experience, I'd say most people who state no. 1. as paramount are actually number 2. The number 3. people don't usually talk about it too much and they usually think they are number 2. Personally, I think C++ can be replaced by Go and nothing can replace C.

  22. Re:Yay! by dbrueck · · Score: 1

    Is this a real scenario you've encountered (if so, provide details), or just some made up, straw man scenario trying to illustrate... something?

    Not sure it's relevant, but I/O in Python is quite fast, and Python can call non-Python libraries just fine. Also, it seems far more likely that the "internationally recognized company" would be the one with the bizarro artificial coding policies, while the startup would be the one more free to do whatever makes sense.

  23. Re:Still broken by dbrueck · · Score: 2

    lol, there's no static type checking because it's not a statically typed language.

    Static and dynamic typing each have their advantages ya know.

  24. Re:Still broken by znrt · · Score: 2, Insightful

    it's indeed not a language intended for code monkeys. feel free to move along. here, have a banana.

  25. Re:Yay! by Anonymous Coward · · Score: 0

    I get it, don't allow facts to get in the way of a good quarrel. ;-)

    You're responding to an AC, arguing with an AC... And you're even arguing against the wrong one!

  26. Master Craftsman by gd2shoe · · Score: 1

    It took me some time to figure this out:

    A master craftsman is one who produces work that even great craftsmen admire. What makes him a master craftsman, and not just a great one? He chooses the very best tools available, and then creates new ones that do what must be done to create his masterpieces.

    In other words, once someone is good enough that the very best tools available hold him back, then he is about to master his trade.

    --
    I won't join Slashcott. OTOH, If Beta goes live, I just won't be back until it's fixed. Sorry Dice.
    1. Re:Master Craftsman by PiSkyHi · · Score: 1

      Kind of like Miles Davis: "You gotta learn the rules before you can break them."

  27. Cryptographic hardware lockdown by tepples · · Score: 1

    Is this a real scenario you've encountered (if so, provide details)

    I haven't seen this happen with Python particularly, but if you replace "Python" with "C#", you end up with the exact difference between Xbox Live Indie Games (C# only) and Xbox Live Arcade (native allowed), or the difference between downloadable Windows Phone 7 applications (C# only) and applications that the carrier bundles with a device (native allowed). Replace "Python" with "JavaScript" and you end up with several devices, where anyone can write a web application but only certain hand-picked developers can write native applications.

    and Python can call non-Python libraries just fine.

    I'm aware of this. But in the situation I posit, this ability has been locked down for "security reasons".

    while the startup would be the one more free to do whatever makes sense

    Not if "whatever makes sense" is restricted by a security policy that a device's user isn't allowed to change in the interest of locking out Trojan horse programs.

    1. Re:Cryptographic hardware lockdown by dbrueck · · Score: 1

      Ok, thanks. I was trying to understand if this was actually relevant to Python or to this sub-topic of whining.

      Circling back: how should one proceed without whining? A few thoughts:

      1) How effective is whining exactly? Probably not all that much, especially vs e.g. organizing like-minded people/companies into a more powerful group and articulating why a change in policy would be a good thing for all parties involved. Whining is so weak compared to doing something about a problem. Whining is not something that the winners do (but it's not /because/ they are the winners that they don't whine).

      2) The xbox example isn't actually a good one. By that I mean, it's not startups vs big companies. A startup that is genuinely serious about developing for that platform can most certainly get ahold of dev kits and publish native apps. It's pretty easy, at least in the US. Also, there are much larger factors than C# vs native when it comes to Live Indie vs Live Arcade - the companies doing the latter are investing far more resources and are held to much higher and stricter standards, not to mention much higher expectations being placed on their output.

      3) When your business faces an obstacle and you overcome it, you often find that you want that obstacle to remain in place as it could be an obstacle for your competition. It's easy to think you want a level playing field, but the opposite is almost always the case - you want every advantage possible but you want it for /you/.

      4) I think the GP is right in that whining is what people do instead of overcoming challenges. Using the xbox example again, someone who is completely put off by the C# vs native thing, to the point that they feel it's the insurmountable challenge, would probably not be able to succeed anyway - there are much larger challenges they'd have to face on the road to success, so such a minor one up front is probably a filter to weed out those who would later fail on a bigger and more real challenge. In a way, it's a blessing in disguise as it helps divert those people down some other path where they have better odds of success.

  28. Re:Yay! by dkf · · Score: 1

    You're responding to an AC, arguing with an AC... And you're even arguing against the wrong one!

    One had a bit of a point (C++'s template metaprogramming is indeed complicated, especially when combined with some features of C++'s type system, operators and exceptions, and some C++ programmers think that using every last bit of that complexity is a good thing) but obscured it with stupid and irrelevant ranting. (It's not really that important whether the standard library is large or small.) The other was just valueless denialisms from the C++ Internet Defense Force that could only be said to have won anything because the net value of the post being responded to was actually negative.

    I wish I hadn't clicked on the "2 hidden comments" link; watching a battle of wits between two unarmed combatants isn't really my thing.

    --
    "Little does he know, but there is no 'I' in 'Idiot'!"
  29. Re:Add curly braces and you have C by dkf · · Score: 1

    Customary: All human readable date components are 1-based (the first "CE" date is 0001-01-01, not 0000-00-00).

    The first date when that calendar system was used was substantially later; it wasn't invented until 525 and took the best part of 300 years to become widely used.

    Of course, the widely used calendar systems from 2000 years ago were mostly pretty weird; the Romans would name the year according to the consuls in office that year. Think of it a bit like calling the current year "the fifth year of Obama" except that was the name that was widely used for things like censuses, commercial transactions, and not just politics. (The other system, counting from the founding of Rome, was only really used by historians.) To our modern eyes, this is just crazy.

    --
    "Little does he know, but there is no 'I' in 'Idiot'!"
  30. Mac OS X Python madness by danceswithtrees · · Score: 1

    I guess I am firmly in the realm of knowing enough to get myself into trouble. I use Mac OS X 10.7 and have Macports installed. I just installed python 2.7.4 to stay current. When I start up python, it was still v2.7.1. Trying to figure out why, it seems that python is installed in too many places!

    The original Apple installs are in /System/Library/Frameworks/Python and include versions 2.3, 2.5.6, 2.6.7 and 2.7.1.
    The Python Foundation installer puts its copy in /Library/Frameworks/Python. Furthermore, MacPorts has taken over python and decides which one should be executed using a link from /opt/local/bin/python, which currently points to /usr/bin/python2.7, the Apple-installed 2.7.1.
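
    A quick way to confirm which install a given command really runs is to ask the interpreter itself. This is only a sketch: the helper names are made up for illustration, shutil.which needs Python 3.3 or later, and the output depends entirely on the machine.

```python
import os
import shutil
import sys


def describe_interpreter():
    """Return the version and location of the interpreter running this script."""
    executable = sys.executable or ""
    return {
        "version": "%d.%d.%d" % sys.version_info[:3],
        "executable": executable,
        # realpath follows symlinks, e.g. a package manager's link to another install
        "resolved": os.path.realpath(executable),
    }


def python_on_path(name="python"):
    """Return the first executable called `name` found on PATH, or None."""
    return shutil.which(name)


if __name__ == "__main__":
    for key, value in sorted(describe_interpreter().items()):
        print(key, "=", value)
    print("first 'python' on PATH =", python_on_path())
```

    Running the same script with each candidate binary shows which paths are just links to the same install.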

    I once tried to get rid of python 2.3 and 2.5 (who uses those, right?) and found out that iPhoto didn't work anymore!

    Has anyone found a saner, neater and more space efficient way to organize all the python installs on Mac OS X?

    1. Re:Mac OS X Python madness by Anonymous Coward · · Score: 0

      Yes.
      1. Ditch Macports for homebrew (mxcl.github.com/homebrew/) (ok, off-topic and optional, but you won't regret it).
      2. Install pythonbrew (https://github.com/utahta/pythonbrew). Everything is installed in the user dir, you can switch easily between versions : right now I am working in a split terminal with 3.3.1 at my left and 2.7.4 at my right.

  31. Re:Add curly braces and you have C by Anonymous Coward · · Score: 0

    Customary: All human readable date components are 1-based (the first "CE" date is 0001-01-01, not 0000-00-00).

    The first date when that calendar system was used was substantially later; it wasn't invented until 525 and took the best part of 300 years to become widely used.

    So in other words, you consider the analogy to Python to be spot on.

  32. Re:Yay! by K.+S.+Kyosuke · · Score: 1

    C++ can't be so complex if its fanboys are so simple-minded.

    How would that be a contradiction? You'd expect simple-minded people to do unnecessarily complex stuff. As Blaise Pascal once wrote, "I apologize for writing such a long letter, I didn't have enough time to write a short one." Or, another of my favorite quotes, "you don't really understand a problem unless you can simplify it".

    --
    Ezekiel 23:20
  33. Re: Same reason old IE wont go away by Billly+Gates · · Score: 1

    Once something becomes the pillar that everything rests upon, it's impossible to remove. Witness XP and IE 6. A quarter of corps still have not upgraded to the oh-so-cutting-edge four-year-old IE 8 and Windows 7! Those that have upgraded still have apps like Dell EMS that can't run on anything newer. IE 10 is considered broken at work even though it's the first W3C version.

    Python 3 is just that. Broken! Python 2.7.x is the standard now, and why leave since it works fine? Welcome to the legacy club! You can have a seat there next to COBOL and XP.

    The only reason IE 6 support is dying is because MS is EOLing it. Can the Python foundation exert this much control? Too many apps and APIs are dependent on it, and its old quirks are hard-coded into apps.

  34. Re:Still broken by Waffle+Iron · · Score: 1

    No static type checking. Move along, nothing to see here.

    Yeah, it would be better to make most of the code you write boilerplate like copy constructors and redundant interface declarations in order to comply with a static type system. Then, once you realize that you still can't do what you need with static typing, run everything in a bloated "bean" container to get dynamic typing without having to admit it. And as an added benefit, you get a generous dose of XML to go with that!

  35. Re:Still broken by Anonymous Coward · · Score: 0

    Code monkeys? You mean the parents of Python script kiddies? You do realize that the main Python implementation is written in C, a statically typed language? You can keep your banana. You obviously need it.

  36. Mutability by tepples · · Score: 1

    JS objects [...] actually only map string->object, whereas Lua tables are actual object->object maps

    Python dicts also map objects to objects, but objects used as keys need some immutable property from which the key's hash is derived. This is why the immutable tuple and the mutable list coexist in Python, as do the immutable frozenset and the mutable set. I think the designers of JavaScript chose to serialize keys to a string in order to have something guaranteed to be hashable, as opposed to falling back on object identity (analogous to Python id(something) ) as the key.
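
    A small illustration of the hashability rule described above. The variable names are just for the example; the behavior shown is standard for built-in types.

```python
point = (3, 4)                      # tuple: immutable, hashable
visited = {point: "start"}          # fine as a dict key

try:
    visited[[3, 4]] = "oops"        # list: mutable, not hashable
    list_key_error = None
except TypeError as exc:
    list_key_error = str(exc)

groups = {frozenset(["a", "b"]): 1}     # frozenset works as a key
try:
    groups[{"a", "b"}] = 2              # a plain set does not
    set_key_ok = True
except TypeError:
    set_key_ok = False

# Lookup goes by equality plus hash, not by object identity:
same_point = visited[(3, 4)]
same_group = groups[frozenset(["b", "a"])]
```

    Both failed insertions raise TypeError, while an equal but distinct tuple or frozenset finds the existing entry.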

  37. Re:Still broken by Anonymous Coward · · Score: 0

    Static and dynamic typing each have their advantages ya know.

    Then why not have both?

  38. Re:Add curly braces and you have C by phantomfive · · Score: 1

    Maybe, from a CS point of view, any index should always be zero-based.

    And even that isn't particularly clear. The only real good argument for starting indexes at zero is that it's more efficient in some cases. Other than that, it's entirely preference. In some languages, like Pascal, indexes start at 1 (or any other number of your choosing).

    --
    "First they came for the slanderers and i said nothing."
  39. Re:Still broken by dbrueck · · Score: 1

    Maybe some languages do. But if I were to guess, it's probably a couple of things:

    - design - it's a pretty fundamental decision in a language's design, and has implications that kind of ripple through everything. Both the language/tool maintainers as well as the users need to be able to reason about how things behave, and having both and not violating the principle of least surprise (among others) might be tricky, I dunno. There's a sweet spot between firm decisions around the "right" way to do things in a language and having a language be flexible enough to do things many different ways; I suspect that doing both would make it hard to remain in that sweet spot.

    - resources - it's reasonable to assume that most language designers/implementors have finite resources, so they'd probably prefer to spend them on making new language features or libraries, improving performance, etc. than implementing and maintaining the complexity associated with supporting both.

    - philosophy - a language designer probably has an opinion about the general benefits of one vs the other and therefore probably has less interest in whatever that "other" is. It has been shown that both types of languages can be used to build large, complex, reliable systems, so neither has inherent, complete, and overwhelming advantages over the other.

    Just guesses though.

  40. Re:Add curly braces and you have C by Anonymous Coward · · Score: 0

    The first date when that calendar system was used was substantially later....

    True. But the first date in the Anno Domini series IS 0001-01-01. It doesn't matter when its use was started or became popular.

    The intro crawl at the beginning of the original Star Wars did not include "Episode IV". That numbering system was developed later. Lucas can say that he always had nine movies planned out, but he's full of shit. Just like the writers of Lost saying they had the whole story mapped out before the first season.

  41. Re:Still broken by znrt · · Score: 1

    i doubt there are many languages out there whose implementation isn't written in C, or on a language whose implementation isn't written in C in turn, or in another language with static typing. this doesn't alter the fact that you completely fail to see the benefits of having dynamic untyped languages around, to the extent to say "they are broken". my first guess then is that you must be a code monkey. my second guess is that you are some sort of director/manager of a team of code monkeys. if i'm still wrong, my third guess would be that you have no clue about programming whatsoever. but if you don't like the banana, that's just fine. move along :-)

    --
    i like bananas!

  42. Re:Yay! by Anonymous Coward · · Score: 0

    But back on topic, I'll bet all you homos would love to witness the release of my python.

    At least all of us Linux faggots would enjoy seeing that kind of package being released.

  43. Re:Yay! by overlordofmu · · Score: 1

    Never argue with an idiot. Bystanders won't be able to tell the difference.

    Or something like that.

  44. Too late! We switched to Perl. by Thrill+Science · · Score: 0

    Because of the nonsense at the last PyCon, we're switching to Perl

  45. Re:Add curly braces and you have C by Half-pint+HAL · · Score: 3, Informative

    The only real good argument for starting indexes at zero is that it's more efficient in some cases.

    Or more specifically, it's more efficient in a low-level language with compile-time-fixed-length arrays. If your array isn't a fixed block of memory referenced by index+offset, there's no technical reason to have a zero-index. All you're left with is "we've always done it that way". (Which is a fair point considering the number of errors that would arise if people got confused switching from one to the other.)

    --
    Got them moderator blues I blieve I walk out the do', With these mod-points I been gettin', I 'most never post no mo'
  46. Re:Still broken by Half-pint+HAL · · Score: 1

    Static and dynamic typing each have their advantages ya know.

    Then why not have both?

    What interpreted language does have static typing? Static typing is carried out by the compiler, and we don't have a compiler here....

    Anyway, the architecture of Python has particular rules about scoping, and you do not know until a function is called what will be in scope at the time. I have my concerns that this is powerful in a "more than enough rope to hang yourself" kind of way, but it's there, it's part of the language, and (yeah) it's mostly a consequence of being an interpreted language....
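
    A minimal sketch of that late lookup, with made-up names: the function body refers to a global that does not exist when the function is defined, and nothing complains until the call.

```python
def greet():
    # `greeting_target` is looked up only when this line actually runs.
    return "hello, " + greeting_target


try:
    greet()
    first_call = "ok"
except NameError:
    first_call = "NameError"

greeting_target = "world"   # define the global after the function exists
second_call = greet()       # the very same function body now succeeds
```

    A static checker would have to reject or flag the definition; the interpreter just waits and sees what is in scope at call time.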

    --
    Got them moderator blues I blieve I walk out the do', With these mod-points I been gettin', I 'most never post no mo'
  47. Re:Still broken by Half-pint+HAL · · Score: 1

    And you, in turn, realise that when you drill down your C toolchain, you'll eventually find something called "assembler", which hardly classes as having typing of any description. Integers and floats, and that's pretty much your lot... and nothing stopping you from misreading one as the other.

    --
    Got them moderator blues I blieve I walk out the do', With these mod-points I been gettin', I 'most never post no mo'
  48. Re:Add curly braces and you have C by Anonymous Coward · · Score: 0

    No, more specifically, it's more efficient in a low-level language where arrays are actually simply pointers to raw memory and indexing is simply syntax sugar for addition and dereferencing. I believe, actual "compile-time-fixed-length arrays" only appeared in C++ and even there their legacy pointerness shows up here and there. Uses of Pascal arrays can be compiled as efficiently as C arrays, whether their index range is stated as [1..256], [0..255] or [char] (in fact accesses to those three will likely look the same after compilation).
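
    The arithmetic behind that claim can be sketched as follows. The base address and element size are hypothetical numbers, and the functions only model what a compiler might emit when the lower bound is a compile-time constant.

```python
def element_address(base, index, lower_bound, element_size):
    """Address of a[index] for an array whose first valid index is lower_bound."""
    return base + (index - lower_bound) * element_size


def folded_base(base, lower_bound, element_size):
    """Fold the constant lower bound into the base once, ahead of time."""
    return base - lower_bound * element_size


BASE, SIZE = 0x1000, 4   # hypothetical base address and element size in bytes

# Third element of a zero-based array [0..255] is index 2.
zero_based = element_address(BASE, 2, 0, SIZE)
# Third element of a one-based array [1..256] is index 3.
one_based = element_address(BASE, 3, 1, SIZE)
# With the bound folded in, each access is just adjusted_base + index * size.
via_folded = folded_base(BASE, 1, SIZE) + 3 * SIZE
```

    All three expressions land on the same address, so the one-based form costs nothing extra per access once the constant is folded.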

  49. Shebangs, shebangs.... by Half-pint+HAL · · Score: 1

    Yes, but here's the thing: Python, regardless of version number, identifies itself to the shell interpreter as a single thing by its shebang string -- "/usr/bin/python". It shouldn't do that.

    It's broken design -- if you write something that's not backward compatible, you have to give it a new command so that it's different in the shebang, meaning that when a user runs the script, the computer knows which shell interpreter to use.

    Recompiling and non-standard installs work, yes, but they also mean your code is no longer portable, because you're using a non-standard execution mechanism.

    Guido needs to sort out the shebangs....
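
    To make the mechanism concrete, here is a rough sketch of reading the command out of a shebang line. The helper name is hypothetical and this is not how any real launcher is implemented; it only shows that a versioned name such as python3 in the shebang is what would let the system tell the interpreters apart.

```python
import posixpath


def interpreter_from_shebang(first_line):
    """Return the command named on a shebang line, e.g. 'python3', or None."""
    if not first_line.startswith("#!"):
        return None
    parts = first_line[2:].split()
    if not parts:
        return None
    command = posixpath.basename(parts[0])
    # "#!/usr/bin/env python3" names the real command as env's first argument
    if command == "env" and len(parts) > 1:
        command = parts[1]
    return command


examples = [
    "#!/usr/bin/python",
    "#!/usr/bin/env python3",
    "#!/usr/bin/python2.7\n",
    "print('no shebang here')",
]
commands = [interpreter_from_shebang(line) for line in examples]
```

    A bare "python" in the first example carries no version information at all, which is the complaint above.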

    --
    Got them moderator blues I blieve I walk out the do', With these mod-points I been gettin', I 'most never post no mo'
  50. Re:Add curly braces and you have C by Half-pint+HAL · · Score: 1

    I believe, actual "compile-time-fixed-length arrays" only appeared in C++ and even there their legacy pointerness shows up here and there.

    C arrays have their length fixed at compile time too, you know....

    --
    Got them moderator blues I blieve I walk out the do', With these mod-points I been gettin', I 'most never post no mo'
  51. Re:Still broken by Anonymous Coward · · Score: 0

    You missed the point. Python is not *written* in assembler. It's written in C. If static typing is so bad, then why does Guido use a statically typed language to implement Python? He could have chosen Perl for instance.

  52. Re:Still broken by Anonymous Coward · · Score: 0

    What interpreted language does have static typing?

    These came to my mind:
    * Dart has an interpreter and supports both dynamic and static typing
    * Pike has an interpreter that supports both dynamic and static typing
    * TypeScript is a typed superset of JavaScript that supports dynamic typing, but is first compiled into JavaScript

    The first two fulfil your requirement. The last one doesn't, but is nevertheless an interesting approach.

  53. Re:Still broken by Half-pint+HAL · · Score: 1

    You're missing the point. I'll give you a clue: "high-level language". And another: "low-level language". The two have different goals. One is more computer-friendly, the other is (intended to be) either more human-friendly or simply more logically self-complete.

    --
    Got them moderator blues I blieve I walk out the do', With these mod-points I been gettin', I 'most never post no mo'
  54. Re:Add curly braces and you have C by shutdown+-p+now · · Score: 1

    Or more specifically, it's more efficient in a low-level language with compile-time-fixed-length arrays. If your array isn't a fixed block of memory referenced by index+offset, there's no technical reason to have a zero-index. All you're left with is "we've always done it that way".

    The actual distinction is not between 0-based and 1-based, but between end-exclusive and end-inclusive - i.e. [0..n) vs [1..n]. There are some languages which tried to do both at the same time, but the result is the worst of both worlds, and unintuitive to boot - e.g. in BASIC, DIM a(10) actually declares an array of 11 elements, indexed from 0 to 10.

    So if your upper bounds tend to be exclusive, then it makes more sense to start counting from zero, so that the length of the array is also its upper bound. And exclusive upper bound is pretty convenient in general because of various associated properties - e.g. the length of the interval is then always upper bound minus lower bound, and upper bound is never less than lower bound (when it's inclusive, it is less for an empty interval). Simply put, [0..n) tends to be more convenient in practice because your code will have fewer +1 and -1 in it.
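
    Python slices happen to use the same [lower..upper) convention, so the properties listed above can be checked directly. The data and bounds below are arbitrary.

```python
data = list("abcdefgh")
lo, mid, hi = 2, 5, 8

left = data[lo:mid]      # elements at indexes 2, 3, 4
right = data[mid:hi]     # elements at indexes 5, 6, 7

length_matches = len(left) == mid - lo           # length is upper minus lower
joins_cleanly = left + right == data[lo:hi]      # adjacent ranges share `mid`
empty_is_natural = data[mid:mid] == []           # lower == upper means empty
length_is_bound = data[0:len(data)] == data      # length is the exclusive upper bound
```

    None of the four checks needs a +1 or -1 adjustment.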

  55. Re:Still broken by Anonymous Coward · · Score: 0

    I understand that you want to feel as being the winner of this argument, but it doesn't happen by replacing the premise with something you can easily argue about. What you say about high and low level languages is true, it's in fact obvious, but still beside the point. I'll reiterate the question: if static typing makes one a code monkey, then why did Guido choose to be a code monkey, and not use a language where he wasn't one? He could have chosen Perl or assembly. Neither of them is statically typed, and one is higher level than C and the other lower. Of course we know the answer. It's obvious. Static typing is not such a bad idea that it renders a language unusable. There were other practical concerns that were more important than babbling about the evil of static typing.

    The other interesting question is why call a language broken when it doesn't have static typing. I can give you a hint: code coverage and readability. These two things become important factors in large code bases. Software projects tend to grow larger than they were intended, and at some point Python's lack of static checking becomes a burden.

  56. Re:Still broken by Anonymous Coward · · Score: 0

    I infer that your only exposure to static typing is from Java and C++. In that case, I wouldn't blame static typing, but the overly verbose language in general. Try Haskell for a change.

  57. Re:Still broken by Anonymous Coward · · Score: 0

    Then how does it make you feel that your beloved language was then written by a code monkey by your own definition?

    You only manage to appear to possess small intelligence when you throw derogatory terms so casually. I doubt that knowing this will make you any wiser, but all your assumptions are wrong.

  58. Re:Still broken by znrt · · Score: 1

    Then how does it make you feel that your beloved language was then written by a code monkey by your own definition?

    in absolutely no way. i know the benefits of static typing pretty well. what i assume is that someone who says "is broken because it's not statically typed" must be a code monkey. at best.

    You only manage to appear to possess small intelligence when you throw derogatory terms so casually. I doubt that knowing this will make you any wiser, but all your assumptions are wrong.

    my small intelligence suggests you got lost on the very first post of this thread already. reread slowly if you care.

  59. Re:Add curly braces and you have C by Half-pint+HAL · · Score: 1

    The actual distinction is not between 0-based and 1-based, but between end-exclusive and end-inclusive - i.e. [0..n) vs [1..n]. There are some languages which tried to do both at the same time, but the result is the worst of both worlds, and unintuitive to boot - e.g. in BASIC, DIM a(10) actually declares an array of 11 elements, indexed from 0 to 10.

    So if your upper bounds tend to be exclusive, then it makes more sense to start counting from zero, so that the length of the array is also its upper bound. And exclusive upper bound is pretty convenient in general because of various associated properties - e.g. the length of the interval is then always upper bound minus lower bound, and upper bound is never less than lower bound (when it's inclusive, it is less for an empty interval). Simply put, [0..n) tends to be more convenient in practice because your code will have fewer +1 and -1 in it.

    I would argue that most newcomers to programming would be more comfortable with an ordinal number, which would imply that the first element is 1 and the final element is "len", and so to look at one end or the other as the "actual distinction" would be to ignore the validity of this argument.

    I can't think of any technical situation that benefits greatly from the end-exclusive notation. Javascript's array[n]=((whatever)) is generally frowned upon (implement a .append() method instead!) and array[n+1] is functionally identical if the array is 1...n . The tiny performance advantage in the former assumes that you're statistically significantly more likely to append than to access the last element, which seems counter-intuitive to me on the grounds that you normally read values more often than writing them.

    But I'd be genuinely interested in hearing of any technical benefits of [0...n), because I'm geeky like that.

    --
    Got them moderator blues I blieve I walk out the do', With these mod-points I been gettin', I 'most never post no mo'
  60. Re:Add curly braces and you have C by shutdown+-p+now · · Score: 1

    I can't think of any technical situation that benefits greatly from the end-exclusive notation.

    Think about STL algorithms and how they specify ranges. It's the same thing really, just abstracted away from numbers.

    It's not really about perf so much so as less need to add/subtract ones for typical operations.
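
    As a rough analogue of an STL-style algorithm over a [first, last) range, here is a binary search written in Python. It is a sketch, not a port of any particular library routine; the function name is borrowed from the STL, and "not found" is reported as the end of the range, as the STL does with its last iterator.

```python
def lower_bound(items, target, lo=0, hi=None):
    """First index in [lo, hi) whose element is >= target, or hi if there is none."""
    if hi is None:
        hi = len(items)
    while lo < hi:                  # the range is empty exactly when lo == hi
        mid = (lo + hi) // 2        # splits the range into [lo, mid) and [mid, hi)
        if items[mid] < target:
            lo = mid + 1            # everything up to and including mid is too small
        else:
            hi = mid                # items[mid] qualifies; keep looking in [lo, mid)
    return lo


values = [1, 3, 3, 5]
found_first_three = lower_bound(values, 3)
insertion_point = lower_bound(values, 4)
past_the_end = lower_bound(values, 6)
```

    The empty range, the sub-ranges after each split, and the "nothing found" result all fall out of the same half-open convention with no special cases.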