Python 3.4 Released

Ubuntu Install? by LifesABeach · 2014-03-18 15:05 · Score: 2

That would be nice.

Re: Ubuntu Install? by Anonymous Coward · 2014-03-18 23:20 · Score: 1, Funny

Your Momma Uses My Python, Bitch.
Re:Ubuntu Install? by gatkinso · 2014-03-18 23:38 · Score: 1

Ubuntu Server is pretty nice actually.

--
I am very small, utmostly microscopic.
Re:Ubuntu Install? by wjcofkc · 2014-03-19 01:35 · Score: 1

I am a little confused about your request. On my very modest system, Python takes just under three minutes to compile from source: extract > cd > ./configure > make > make install

I run several Ubuntu derivatives and honestly never considered apt-get - but I also often run more than one version of Python on any given system and compiling manually makes that easier to maintain. If you are a Linux user so stuck on apt-get that you cannot work with source code at all, I highly suggest you download the source from here: https://www.python.org/downloa... and give it a try.

--
Brought to you by Carl's Junior.
Re:Ubuntu Install? by wjcofkc · 2014-03-19 02:07 · Score: 1

In fact I will even get you started:
cd ~/Downloads
tar -zxvf Python-3.4.0.tgz
cd Python-3.4.0
./configure
make
sudo make install

This harmless method will only install python in the directory you built it in. So if you type "python" you will still get the old interpreter. If you type ./python you will get 3.4 - As far as replacing your existing installation completely or doing something more complicated, I will leave it to you to Google that so I don't lead you down an irreversible path you did not intend to go down.

--
Brought to you by Carl's Junior.
Re:Ubuntu Install? by wjcofkc · 2014-03-19 03:49 · Score: 3, Insightful

If someone is looking to install Python 3.4 the day it is released, they are not an average office worker. If someone is above average enough as a user to want/need this, they may as well have the expertise that goes with it. What exactly are you expecting them to be doing with it? Fully upgrading a 2.x system to 3.x will only break things - clearly nerdy goals are at hand. Therefore nerdier instruction is required. Plus there is no other way to do it on day one of its release.

--
Brought to you by Carl's Junior.
Re:Ubuntu Install? by wjcofkc · 2014-03-19 07:00 · Score: 1

sudo make install will not install it system wide; without a prefix it will install into the directory you compile it in. Prove it to yourself and try it. Otherwise my preferred location is opt: ./configure --prefix=/opt/python3.4
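
Where a build actually ends up depends on the prefix handed to ./configure (CPython's configure falls back to /usr/local when no --prefix is given). Rather than taking anyone's word for it, one could ask the interpreter itself. A minimal sketch; the helper name describe_install is made up for illustration:

```python
import sys
import sysconfig


def describe_install():
    """Report where the running interpreter lives (hypothetical helper)."""
    return {
        "executable": sys.executable,                  # path of the binary that is running
        "prefix": sys.prefix,                          # the install prefix in effect
        "scripts": sysconfig.get_path("scripts"),      # where console scripts get installed
        "version": "%d.%d" % sys.version_info[:2],
    }


if __name__ == "__main__":
    for key, value in describe_install().items():
        print(key, "=", value)
```

Running this once with "python" and once with "./python" from the build directory shows which interpreter each name resolves to.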

--
Brought to you by Carl's Junior.
Re:Ubuntu Install? by DutchUncle · 2014-03-19 12:09 · Score: 1

Well, no, actually I figure they don't *understand* just how badly they'll break things. And the fact that upgrading always breaks things is another problem that normal people won't put up with. "This is new and better" is *not* supposed to mean "and all of your old stuff is as useless as LPs on a CD player".

and... by Anonymous Coward · 2014-03-18 15:16 · Score: 5, Insightful

And everyone will keep using 2.6/2.7, the windows XP of python.

Re:and... by jasonla · 2014-03-18 15:20 · Score: 1

And in about 20 years, it will make it into the RHEL derivative my company uses... sigh.
Re:and... by sg_oneill · 2014-03-18 15:27 · Score: 1

In fairness Python 3 isn't really as widespread as it should be. I think people have found the 2.7 branch just works well for them.
With that said I do wish people WOULD move to python 3. 2.7's unicode handling is infinitely awful and fragile compared to 3.

--
Excuse the Unicode crap in my posts. That's an apostrophe, and slashdot is busted.
Re: and... by electrosoccertux · 2014-03-18 15:40 · Score: 1

Paraphrasing other Slashdot posts I have seen, there are no compelling reasons to upgrade to Python 3. Removing the global interpreter lock would be one major reason to, but no one has submitted a good patch for that, and besides, someone would probably just backport it to Python 2.7.
Re:and... by gweihir · 2014-03-18 16:03 · Score: 2

There are very few reasons to stick with the old model. Sure, it takes a bit to get used to some of the changes, but it is not that hard. And most good libraries have already moved over or are compatible with both.

--
Most ACs are not even worth the keystrokes to insult them. Be generically insulted by this and ignored otherwise.
Re:and... by Marginal+Coward · 2014-03-18 16:20 · Score: 1

Yup. In fact, the lack of any new features in 2.7 is a primary feature that the 3.x line sadly will lack for the foreseeable future. ;-)
Re: and... by dmbasso · 2014-03-18 16:26 · Score: 4, Insightful

There are plenty of good reasons to use Python 3, it is way more elegant and consistent. The way text and binary data is dealt with is incomparably better. I doubt that anyone who ever had done any serious coding in Python 2 escaped from the mindfuckery of mixing unicode and ascii.
The problem for a wider acceptance continues to be the libraries... for instance, Twisted. It is good that there is an async module in the standard library now, but too bad that my code already relies heavily on Twisted.
And about the GIL: if you are complaining about it, you most probably are not using the right language for the job.

--
`echo $[0x853204FA81]|tr 0-9 ionbsdeaml`@gmail.com
Re: and... by Kremmy · 2014-03-18 16:42 · Score: 1

The way I've heard it, the manner in which Python 3 has modified the Python Standard Library has made it so that cases where you aren't working with pure Unicode data (such as in any real world problem) get all the hassle of Python 2 and more. Interoperability with foreign systems is kind of a basic foundation of data processing; having to work around inconsistencies with the Python 3 Standard Library to do so probably means it's no longer the right tool for the job.
Re: and... by spitzak · 2014-03-18 16:59 · Score: 3, Informative

This exactly.
If your UTF-8 string is not completely valid, Python 3 barfs in useless and unpredictable ways. This is not a problem with Python 2.x.
Until they fix the string so that an arbitrary sequence of bytes can be put into it and pulled out *UNCHANGED* without it throwing an exception then it cannot be used for any serious work. Bonus points if this is actually efficient (ie it is done by storing the bytes with a block copy).
Furthermore it would help if "\xNN" produced raw byte values rather than the UTF-8 encoding of "\u00NN" which I can get by typing (gasp!) "\u00NN".
Re: and... by Kremmy · 2014-03-18 17:51 · Score: 1

I have tried it myself. I actually decided to stick with Python 2 because I ran into plenty of awkward behavior and found that most of the third party modules I was interested in weren't available. Just now, as an example, I've visited this page (note the 3 in the URL). Trying to get up to date GUI library bindings for Python 3, I find the state of them to be quite disheartening. The PyGTK+ binding has been replaced by PyGObject in Python 3, but the PyGTK+ project recommends staying with PyGTK+ on Windows. The rest aren't looking much better - our options are Tkinter and Qt if we want to be '3 clean'. This exercise actually reminded me of something I discovered the last time I made the attempt to support Python 3 on Windows: I was explicitly limited to installing particular combinations of Python versions and libraries if I expected the libraries to actually work. Something was severely broken along the way, and I get the impression that it's one of those things that nobody will ever admit to.
Re: and... by Anonymous Coward · 2014-03-18 18:19 · Score: 5, Informative

This is why there's a bytes type.
If what you have is not text, don't use the text type.
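
A small sketch of that distinction, written for Python 3. The function name roundtrip is invented for the example; the point is only that bytes objects carry arbitrary values without any decoding step, and that "\xNN" means a raw byte in a bytes literal but a code point in a str literal:

```python
def roundtrip(raw: bytes) -> bytes:
    """Copy raw bytes through a buffer and hand them back unchanged."""
    buffer = bytearray()
    buffer.extend(raw)          # a plain byte copy, nothing is decoded
    return bytes(buffer)


blob = b"\xff\xfe not valid UTF-8 \x80"
assert roundtrip(blob) == blob  # no decoding happens, so no decode error can occur

one_byte = b"\xff"              # bytes literal: a single raw byte 0xFF
as_text = "\xff"                # str literal: the code point U+00FF
print(len(one_byte))            # 1
print(as_text.encode("utf-8"))  # b'\xc3\xbf'
```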
Re: and... by Anonymous Coward · 2014-03-18 18:25 · Score: 5, Informative

The inconsistencies are fully within Python 2. My experience is closer to full-scale horror when having to consider different encodings in Python 2, and since I am from a country that actually needs these "bells and whistles" regarding encoding for regular I/O on a regular basis, I have met these issues many times. Using chains of codecs to read and write files, having to intercept exceptions and .encode() .decode() in differing combinations to be able to avoid Python 2 "double-crashing" when reporting an exception, deep level hacking to reinitialize sys.stdout before output on certain machines, etc.
In Python 3, it does not "just work", but that is because character encoding is never a "just works" problem, and languages that say it is fail miserably in this regard as soon as it meets real world international encodings. Python 3 defines the problem correctly, and solves it natively in the best way I can imagine, by always being aware of the problem. No more prepending the u qualifier to every single string that might or might not be output (or combined with any other string that might or might not be output). Python 3 solves it correctly, by acknowledging character encoding as something that is actually an issue, and it does not make the silly assumption that ASCII is the way of the world. This assumption has been silly for at least 40 years, but many products were developed in ASCII centric regions, or at least in regions where you seldom saw more than one encoding, and never fully addressed the problem.
The Python 3 standard library does strings right, and should get credit for it. Instead it gets flak from programmers who do not like that it does not inherit the quirks from Python 2 that we have become accustomed to (and are still miles better than in many other languages; PHP and unicode, anyone?).
Heck, the number one reason that I have converted as many projects as I can to Python 3 is because of the blocks of encoding centered Python 2 code I can just throw out the window, and ease future maintenance. There are still some big module holdouts, but that was a much larger problem in ~2010. Today, the ones I miss in Python 3 are e.g. WXPython (where work is ongoing in the Phoenix project) and MySQLdb (the MySQL connection alternatives for Python 3 are outright silly -- either non-functional or non-documented).
There are several introductory programming courses I know of that focus on Python, and they all use Python 3 by default. I am sincerely looking forward to the day when Python 3 is the natural order.
It takes a lot of motivation to change language structures from Python 2, and those working on the drafts are certainly top-class in their fields, so if one finds any design changes weird, the first instinct should be to read up on the rationales for the decisions. I have yet to encounter a change that seems "silly" or unnecessary after reading about the process.
Also, for the early adopters, note that Python 3.3 (or 3.4, as this article is about) is not 3.0 or 3.1. There are a lot of things that have been fixed along the way.
Re: and... by gbjbaanb · 2014-03-18 20:34 · Score: 1

I'm legitimately curious, what are some that come to mind?
IronPython? :-)
Re:and... by gnupun · 2014-03-18 21:33 · Score: 1

Python 3 // division operator breaks division polymorphism:

Let's do integer division first.

Python 2:
>>> 20 / 2   # int divide int -> int
10

Python 3:
>>> 20 // 2  # int divide int -> int
10
>>> 20 / 2   # int divide int -> float (wtf?)
10.0

Now let's do floating point division.

Python 2:
>>> 2.5 / 5.0   # float divide float -> float
0.5

Python 3:
>>> 2.5 // 5.0  # float intdivide float -> rounded float
0.0
# You have to use "/" for floating-pt division for the right answer:
>>> 2.5 / 5.0
0.5

With python 2, it doesn't matter if the numerator and denominator is int or float, it automatically returns the correct answer all the time -- division operator is polymorphic. With python 3, if operands are integers, you must use "//" for division. If operands are floating-pt, you must use "/" for division. This is retarded because sometimes the programmer can't know what types of a, b and c are in the expression "a = b divide c." If a, b and c can be int or float, how can a programmer implement "a = b divide c" without a lot of ugly type checking? Another poor design in Python 3.
Re:and... by uneek · 2014-03-18 23:34 · Score: 1

> And everyone will keep using 2.6/2.7, the windows XP of python.
2.6 is the standard that is packaged with popular OSes like Red Hat. Also who has the time to upgrade their python code to the new object model?
Re: and... by ultranova · 2014-03-19 00:43 · Score: 1

> In Python 3, it does not "just work", but that is because character encoding is never a "just works" problem, and languages that say it is fail miserably in this regard as soon as it meets real world international encodings.

What's wrong with simply presenting everything with Unicode? It might not be the most efficient possible way of representing text, but that's unlikely to matter much, given that you're using Python.

--
Forget magic. Any technology distinguishable from divine power is insufficiently advanced.
Re: and... by Billly+Gates · 2014-03-19 00:51 · Score: 1

BASIC!!
Just kidding

--
http://saveie6.com/
Re:and... by MurukeshM · 2014-03-19 01:11 · Score: 1

I take issue with your "musts". The way I see it, if I want a real answer to x/y I use x/y. If I want some special meaning of / (like integer division or rounded answers or blah blah), I use //. That is good design.
Re: and... by Megol · 2014-03-19 01:22 · Score: 1

I guess you have no experience with Unicode? It isn't a character encoding, it isn't a glyph encoding, and so one has to support all the other "features" of it to be compliant. Most code doesn't.
Re: and... by godefroi · 2014-03-19 02:39 · Score: 1

If it's not UTF-8, why do you claim it's UTF-8?
That's like arguing that XML parsers should allow unclosed tags, because otherwise, they just throw exceptions and can't be used for serious work.
You're probably the guy we have to thank for "tag soup". Asshole.

--
Karma: Poor (Mostly affected by lame karma-joke sigs)
Re: and... by Jmc23 · 2014-03-19 02:41 · Score: 1

... and by serious work you mean compromising systems right?

--
Don't complain about syntax, grammar, or spelling. There is no.hell like input on android.
Re: and... by jythie · 2014-03-19 02:42 · Score: 2

This kinda highlights why a lot of people are still on Python 2.x. Python 3.x kinda comes across as a language fetish rather than something pragmatic: incompatible changes for the sake of sexiness. Elegance and consistency are great when you are waxing poetic, but are of less importance when you are interested in a language as just a tool. A lot of the library authors fall into that latter category; it is a tool to get a job done, and Python 3 does not really prioritize pragmatism.
Re: and... by mypalmike · 2014-03-19 02:46 · Score: 3, Insightful

> Unless you're a framework author, chances are you'll have to care very little about mucking with bytes.
Right, because none of us write code that interacts with other code or systems that use bytes.
Like C libraries.
Or binary files.
Or network protocols.

--
There are 0x40000000 types of people: those who understand 32-bit IEEE 754 floating point, and those who don't.
Re: and... by jythie · 2014-03-19 02:47 · Score: 1

Exclusive, no, but it has a pretty impressive community for scientific work, and community means priority and visibility, so Python development tends to be more sensitive to the needs of scientific computing than to those of other communities.
Re:and... by jythie · 2014-03-19 02:52 · Score: 2

When stability and consistency are more important than having new toys to play with, the lack of new features is a big selling point.

Adding new features to a language has always been a rather controversial idea, with many feeling that languages themselves should remain limited and stable with libraries being the place new features should be added.
Re:and... by Waffle+Iron · 2014-03-19 03:03 · Score: 1

In the real world, 5 / 2 == 2.5. This is true whether the operands are integers or floats.
In some discrete math systems, 5 / 2 == 2.
A language has no way to really know what kind of problem you are working on and which calculation would be more appropriate. Python 2 made the assumption that if you fed in two integers, then you were working in a discrete math system. This turns out to not usually be the case, and was a source of surprise and bugs for many people. (Python 2's division was modeled after C, which is still problematic, but at least C's division behavior is lexically determined by its static typing system, so it's usually somewhat less surprising. However, it would have been more clear if C had based its decision on the type being *assigned* to.)
With Python 3's implementation, it's always explicitly clear which kind of math you're doing. If your problem's math system depends on the types of the operands and you need to do type checking, then so be it. At least it's clear what's going on.
Re:and... by wiredlogic · 2014-03-19 03:53 · Score: 1

The problem is that 2.7's unicode is a hack that doesn't play nice with legacy code or 2to3. If you have any legacy code or third party library that expects string to behave like a bytes object, 2to3 will turn them into incompatible unicode strings. If you import unicode_literals from __future__, you get a monkeypatched string class which will cause problems with the existing 2.x code that expects string to behave like a bytes object.
The biggest barrier to the Python 3 transition has been the lack of support from key libraries. That issue is rapidly fading away as most of the major third party libraries have been ported.

--
I am becoming gerund, destroyer of verbs.
Re:and... by wiredlogic · 2014-03-19 04:18 · Score: 1

You've decided for yourself that integer division must return an integer result even though it isn't always mathematically correct. You don't have to use truncating division unless you need it. On the other hand, unexpected truncation can cause a wide assortment of problems. If you want guaranteed integer division use //. It's been there since 2.2.
The Python 3 division system is more consistent because you always get the correct result and have the choice to throw away precision when it is unwanted. The truncating operator serves as an indicator to readers that the fractional bits are unneeded in the result. With traditional "polymorphic" division you have to guess the intent and things may break if one of the operands ever changes to a float.
Moreover you shouldn't be dependent on knowing the type of a numeric variable. The Pythonic way is to make things work in the general case and specialize with conversions using int() or // where needed.
The only potential problem is that 2to3 can't figure out where truncating division is needed which could subtly break translated code. The best remedy is to use "from __future__ import division" in all new 2.x code and use the 3.x style division. You will soon learn to use // when truncation is needed.
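
A short sketch of the 3.x-style division described above, written for Python 3. On 2.x the same behaviour needs "from __future__ import division" as the first statement of the module. The function names mean and pages_needed are made up for the example:

```python
def mean(values):
    """True division: the fractional part survives for ints and floats alike."""
    return sum(values) / len(values)


def pages_needed(items, per_page):
    """Floor division, used where a whole-number count is what is wanted."""
    return -(-items // per_page)   # ceiling, via floor division on the negated value


print(mean([1, 2]))         # 1.5
print(mean([1.0, 2.0]))     # 1.5
print(7 // 2)               # 3
print(7.0 // 2)             # 3.0
print(-7 // 2)              # -4
print(pages_needed(7, 2))   # 4
```

Note that // rounds toward negative infinity rather than toward zero, so it only matches C-style truncation for non-negative results.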

--
I am becoming gerund, destroyer of verbs.
Re:and... by Waffle+Iron · 2014-03-19 06:23 · Score: 1

> Personally, if I have to divide up five children between two parents, I don't consider 2.5 to be an acceptable answer.
Acceptable to you or not, that's how it would be done physically.
Your acceptable solution involves a discrete math system. That's fine, but the safety of those children still shouldn't magically depend on whether you allocated them as Python ints or floats.
Re: and... by spitzak · 2014-03-19 11:38 · Score: 2

No, all that means is that EVERYTHING has to be changed to use the bytes type.
I mean every single library function that takes a unicode string, every use of ParseTuple that translates to a string, etc. Pretty much the entire Python library must be rewritten, or a wrapper added around every function that takes a string argument.
Everybody saying that "it's good to catch the error earlier" obviously has ZERO experience programming. Let's see, would it be a good idea if attempting to read a text file failed if there was a spelling error? Or perhaps it might be a good idea to defer this problem until it actually makes a difference?
This crazy belief that somehow some physically possible patterns of bytes will just magically not happen because you said they are "invalid" is inexplicable. No other system than UTF-8 seems to cause this weird brain damage, no other system is so totally unprepared for invalid storage and pretends that all storage will be valid. I cannot explain it except that it seems like exposure to ASCII, where all byte sequences are always valid, has rotted people's minds so that they dismiss the problem.
Re: and... by spitzak · 2014-03-19 11:40 · Score: 1

The text is 99.9999999% UTF-8.
What I want to do is gracefully handle tiny mistakes in the UTF-8 without having to rewrite every function and every library function it calls to take a "bytes" instead of a "string", and thus completely abandon useful Unicode handling!
Come on, it is blindingly obvious why this is needed, and I cannot figure out why people like you seem to think that physically possible arrangements of bytes will not appear in files. The fact that all serious software cannot use Unicode and has to resort to byte twiddling should be a clue, you know.
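
For what it is worth, Python 3 does ship one mechanism aimed at this case: the "surrogateescape" error handler, which maps each undecodable byte to a lone surrogate code point and maps it back on encoding. Whether it covers every situation raised in this thread is a separate question; the sketch below only shows the round trip. The helper names decode_lenient and encode_lenient are invented for the example:

```python
def decode_lenient(raw):
    """Decode UTF-8, carrying undecodable bytes through as lone surrogates."""
    return raw.decode("utf-8", errors="surrogateescape")


def encode_lenient(text):
    """Inverse of decode_lenient: the carried bytes come back out unchanged."""
    return text.encode("utf-8", errors="surrogateescape")


raw = b"caf\xc3\xa9 \xff done"           # one stray 0xFF in otherwise valid UTF-8
text = decode_lenient(raw)               # no exception is raised here
print(text.upper().startswith("CAF"))    # ordinary str methods still work: True
assert encode_lenient(text) == raw       # the round trip is byte-for-byte
```

The resulting str cannot be encoded with the default strict handler, so an error still surfaces if the text is written out without opting in again.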
Re: and... by spitzak · 2014-03-19 11:45 · Score: 1

God damn you people are stupid.
I am trying to PREVENT denial of service bugs. If a program throws an unexpected exception on a byte sequence that it is doing nothing with except reading into a buffer, then it is a denial of service. If you really think that invalid UTF-8 can lead to an exploit you seem to completely misunderstand how things work. All decoders throw errors when they decode UTF-8, including for overlong sequences and all other such bugs. So any code looking at the unicode code points will still get errors. And if you think there is some exploit that relies on the byte pattern that somehow only works for invalid UTF-8 then you have quite a fantastic imagination but no knowledge of reality.
Re: and... by spitzak · 2014-03-19 12:28 · Score: 1

I'm arguing against a design that is the equivalent of saying "you can't run cp on this file because it contains invalid XML".
There is nothing wrong with the xml interpreter throwing an error AT THE MOMENT YOU TRY TO READ DATA FROM THE STRING.
There is a serious problem that just saying "this buffer is XML" causes an immediate crash if you put non-xml into it.
Re: and... by Jmc23 · 2014-03-19 17:45 · Score: 1

God damn you people are humourless.

--
Don't complain about syntax, grammar, or spelling. There is no.hell like input on android.
Re: and... by godefroi · 2014-03-20 02:17 · Score: 1

There's your problem right there. There are no "tiny mistakes" in UTF-8. Either it's valid UTF-8, or it's not. It's valid XML, or it's not. It's valid JSON, or it's not. It's valid HL7, or it's not. There is no "graceful" handling of invalid data, not in the general case.
Physically possible arrangements of bytes will appear in files, yes, but those files are not necessarily UTF-8.
Oh, and all *my* serious software can handle Unicode just fine (in all its various encodings), because I use a platform that was designed FROM THE START to handle it correctly. It does fail gracefully, which is nice, in the invalid-data case, but nonetheless, garbage-in, garbage-out.

--
Karma: Poor (Mostly affected by lame karma-joke sigs)

And it still has the GIL by halfdan+the+black · 2014-03-18 15:41 · Score: 1, Insightful

Yup, Global Interpreter Lock, so Python is still fundamentally single threaded -- only a single thread can be executing any Python code at any given instant.

It's 2014 and we still can't have a multi-threaded Python; this is ridiculous.

If you read Guido's criteria for getting rid of the GIL, he lists so many things that are specific to the current single threaded system (which is evidently perfect) that the only solution that meets his criteria is the current system.

I guess the only solution is to either live with single threaded system or fork it.

Re:And it still has the GIL by steveha · 2014-03-18 16:12 · Score: 5, Insightful

You make it sound as if it were no big deal to remove the GIL. It has been tried, and Python got 2x slower, so that attempt was abandoned. Python 3.2 gained a different implementation of the GIL, and that fixed some problems, but other problems still occur.
The GIL is Python's hardest problem.
https://www.jeffknupp.com/blog/2012/03/31/pythons-hardest-problem/
https://www.jeffknupp.com/blog/2013/06/30/pythons-hardest-problem-revisited/
As noted in the above referenced blog, you can use Jython or IronPython to avoid the GIL; PyPy will be using Software Transactional Memory to avoid the GIL; and you can use the multiprocessing module to use multiple cores without GIL problems. You do have options other than just using CPython.
If removing the GIL was as easy as you seem to think, it would be gone now, at least in a fork of CPython. Yet still it remains.

--
lf(1): it's like ls(1) but sorts filenames by extension, tersely
Re:And it still has the GIL by Anonymous Coward · 2014-03-18 16:50 · Score: 1

Dynamic typing is OK if done well. But JavaScript and typing in JavaScript are a disaster. Python gets some things wrong, but nowhere near as badly as JavaScript.
Re:And it still has the GIL by halfdan+the+black · 2014-03-18 17:00 · Score: 1

I never said it was easy removing the GIL, nor do I know how to do it and meet all of Guido's requirements.

The GIL is a design flaw of the language. If Python remained just a way to add quick scripting to existing programs, just like TCL, I would have no problem with its design. But I do have problems with Python becoming a systems language. It's far far far too dynamic for its own good; it should not encourage dynamically replacing bits of the runtime at runtime. The GIL really shows the age and intent of Python.

This sort of ultra-dynamic language may be good for writing quick and dirty scripts, but such dynamic features make maintaining and understanding any large system a nightmare. After all, bugs are so much more fun to find months after you've released an app than right away, when a static analyzer could have found them.
Re:And it still has the GIL by halfdan+the+black · 2014-03-18 17:52 · Score: 1

And therein lies the problem: Python is an "incredibly dynamic language", which makes any sort of performance difficult if not impossible. The problem is Python is so dynamic that it's impossible to perform any sort of meaningful validation before the code is actually run.
Re:And it still has the GIL by halfdan+the+black · 2014-03-18 17:58 · Score: 1

2. The only reason it's hard to fix is because certain parts of Python are overly dynamic. Since they broke backwards compatibility in Python 3 it would have been the perfect time to fix it. Instead they broke backwards compatibility for stuff 99% of the community doesn't give a fuck about and now nobody is upgrading even though Python 3 has been out for over 5 years.

That is really insightful, seriously. Python 3 did break backwards compatibility; this really would have been the time to fix some original design flaws, but they didn't. Instead, they focused on stuff that, like you said, 99% of the people out there don't care about, hence why so many use 2.7 today and why so many new projects are even started with 2.7.
There's nothing wrong with design flaws, we all make them, you just at some point have to go back and realize you made a mistake and fix it.
Re:And it still has the GIL by steveha · 2014-03-18 18:05 · Score: 1

You actually have two complaints, not really related.
One complaint is the GIL. As I understand it, the GIL is mainly needed because the CPython reference counting model needs to touch a whole bunch of reference counts when churning objects, and this needs to not screw up. Thus Jython and IronPython, leveraging the garbage collection of their respective underlying VM platforms, didn't need a GIL.
The other complaint is that the language is too dynamic. That's just part of the design of the language. I can somewhat sympathize; I don't overuse the dynamic nature of the language (I know I can rebind "list" to do my own thing, but I never ever do that), and since I don't do that I would rather not pay the runtime penalty for it.
PyPy's JIT works by watching what the code is actually doing, and when it sees that (for example) your code never rebinds "list" it can strip out the dynamic lookup of "list" and just hard-code in a reference to the builtin "list" class. It can gain an extreme speedup when it detects a simple for loop, because it can replace all the Python VM machinery (creating new int objects, then freeing them) with a simple loop.
And, on the other hand, I do use the dynamic "duck typing" features to write less code, and I don't want to lose that. I love writing functional Python code where you can pass a function object in to another function and get a lot done with a relatively small amount of code.
It's possible that someday, some other language with a different static/dynamic tradeoff might eclipse Python; but I'm still using it because I get work done in it, and it is fast enough for the things I am doing.

--
lf(1): it's like ls(1) but sorts filenames by extension, tersely
Re:And it still has the GIL by Anonymous Coward · 2014-03-18 18:25 · Score: 1

Ruby also has a GIL.
Perl clones the entire interpreter for every thread.
This is not a new problem, and there are not yet any great solutions. You only hear about it with Python because Python's succeeded well enough for people to be bumping against the limitation.
The async IO in Python 3.4 should help relieve the need for using threads in the first place in a good few cases, at least.
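
A minimal sketch of what that looks like in practice: several I/O-style waits overlapping on a single thread. It is written with the async/await syntax and asyncio.run, both of which arrived in releases later than 3.4 (3.5 and 3.7), so treat it as an illustration of the idea rather than 3.4-era code. The coroutine name fetch is a stand-in, not a real network call:

```python
import asyncio


async def fetch(name, delay):
    """Stand-in for an I/O-bound call; asyncio.sleep yields to the event loop."""
    await asyncio.sleep(delay)
    return name


async def main():
    # The three waits overlap on one thread, so the total time is roughly
    # the longest delay rather than the sum of all three.
    return await asyncio.gather(
        fetch("a", 0.03), fetch("b", 0.01), fetch("c", 0.02)
    )


results = asyncio.run(main())
print(results)  # ['a', 'b', 'c'] (gather keeps the order of its arguments)
```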
Re:And it still has the GIL by WinstonWolfIT · 2014-03-18 19:14 · Score: 1

Jesus if the whole damn stack isn't thread safe the whole damn stack should be deuce-canned. Grow up and adopt a 21st century stack.
Re:And it still has the GIL by kyrsjo · 2014-03-18 21:42 · Score: 1

Yeah, it would be nice if there was some way of *voluntarily* declaring the type of a variable - i.e. state that this variable will ONLY accept an Integer etc. - and that trying to store something else in the variable would raise a TypeError. Same goes for function arguments.
That could lead to
(1) safer code, where the mistake is caught earlier, at the point where the data is stored as the unexpected type, not when a completely different piece of code does some operation on it which expects a different type and a backtrace fireball ensues.
(2) Potential performance increase
For me, when writing Python, (1) would be the main thing.
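
For function arguments, something close to this can be sketched today with Python 3's function annotations plus a decorator that checks them at call time. This is an opt-in runtime check, not a language feature, and the decorator name enforce_types is hypothetical:

```python
import functools
import inspect


def enforce_types(func):
    """Hypothetical decorator: check arguments against annotations at call time."""
    signature = inspect.signature(func)

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        bound = signature.bind(*args, **kwargs)
        for name, value in bound.arguments.items():
            expected = signature.parameters[name].annotation
            # Skip unannotated parameters and annotations that are not plain
            # classes (*args/**kwargs annotations are not handled in this sketch).
            if expected is inspect.Parameter.empty or not isinstance(expected, type):
                continue
            if not isinstance(value, expected):
                raise TypeError(
                    "%s must be %s, got %s"
                    % (name, expected.__name__, type(value).__name__)
                )
        return func(*args, **kwargs)

    return wrapper


@enforce_types
def scale(count: int, factor: float) -> float:
    return count * factor


print(scale(3, 0.5))  # 1.5
```

Calling scale("3", 0.5) raises TypeError at the call site, which is point (1) above. It does nothing for point (2), since the check adds work rather than removing it.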
Re:And it still has the GIL by Daniel+Hoffmann · 2014-03-19 00:30 · Score: 1

If you call C code from python the GIL is released so it can run in parallel in normal python threads. Many of the libraries and most of the core of the language are written in C. How much parallelism that nets you depends on your application, but simply using python threads does provide some parallelism.
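
One concrete case from the standard library: hashlib documents that it releases the GIL while hashing buffers larger than 2047 bytes, so ordinary threads can overlap that work. The sketch below only checks that the threaded and sequential results agree; how much wall-clock time is saved depends on the machine and is not measured here. The function name digest is made up for the example:

```python
import hashlib
from concurrent.futures import ThreadPoolExecutor


def digest(chunk):
    """SHA-256 of one chunk; the hashing itself runs in C."""
    return hashlib.sha256(chunk).hexdigest()


chunks = [bytes([i]) * 1000000 for i in range(4)]   # four 1 MB buffers

with ThreadPoolExecutor(max_workers=4) as pool:
    threaded = list(pool.map(digest, chunks))        # map keeps input order

sequential = [digest(chunk) for chunk in chunks]
assert threaded == sequential   # same answers; only the scheduling differs
```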
Re:And it still has the GIL by BreakBad · 2014-03-19 00:35 · Score: 1

Those stats are just for submissions to codeeval challenges. I believe python is about 7th on the chart, while C/Java top it. Then C++,C#,PHP. Even Obj C is widely used due to iOS. I find it impossible to believe that python has a 30% share...but that would be nice.
Re:And it still has the GIL by halfdan+the+black · 2014-03-19 02:11 · Score: 1

Ugh, yes, single threaded. Multiple processes, yes, but only one thread can run at a time in a single address space.
Re:And it still has the GIL by BreakBad · 2014-03-19 02:28 · Score: 1

> Basically the whole design of Python was so any part of the runtime can be overwritten at runtime, i.e. monkey patching.
> I think the big problem with Python is all the hacker types who think it so cool to swap out bits of the runtime at runtime just because you can. Now this leads to some truly incomprehensible and unmaintainable code.
Agreed, but stated differently: I wouldn't blame the language design as much as I would blame the 'hacker' types.
I am glad the capability is there. Consider it a frontier....maybe we'll strike gold, maybe we'll waste time digging holes.
Re:And it still has the GIL by Bill_the_Engineer · 2014-03-19 02:48 · Score: 2

Perl clones the entire interpreter for every thread.
Which isn't a bad thing. Perl, Python, and Ruby do not run natively within a CPU. Both Python and Ruby settled on a GIL so that the interpreter could have multiple threads of execution. Perl decided that it would be faster to just give each thread its own interpreter and they were right. You can also do cool things like detach long lived threads and whatever.
You can still fork in Python, Perl, and Ruby and give each process its own independent address space and use IPC to share data. However with Perl threads you can share variable data (with caveats) with multiple threads with threads::shared.

--
These comments are my own and do not necessarily reflect the views or opinions of my employer or colleagues...
Re:And it still has the GIL by jythie · 2014-03-19 02:57 · Score: 1

It could be argued that this is a good thing. Multiple threads within a single process, while increasingly considered standard, have always had a bit of controversy behind them. It has been argued that a single thread per process, using well-defined shared memory pages or IPC, is a safer and more easily debugged model for multi-processing. Having multiple threads all share read/write access to a global memory space was a bit of a hack that we have been living with ever since... but it introduced a whole host of nightmares.
Re:And it still has the GIL by jythie · 2014-03-19 03:02 · Score: 1

Thing is, we have plenty of other languages that are not 'ultra-dynamic' already. Python is great because it is extremely dynamic and thus it is one of the few languages that can really be thrown at tasks where that is a benefit.... so much better than expletive like Spring.
Re:And it still has the GIL by jythie · 2014-03-19 03:06 · Score: 2

Well, there is a good solution: don't run multiple threads in a single memory space. The problem with people complaining about the GIL is they are coming at it from the wrong direction. The problem is people want multiple threads all able to mess with each other's data and not bother with all that pesky IPC or locking, which, yes, is quicker and easier, but causes a lot of frustrating problems that people have forgotten they do not actually need to have.
Re: And it still has the GIL by spike+hay · 2014-03-19 06:30 · Score: 1

actual language speed = the time it takes the program to run + the time it takes the program to be written.
Sure Python might be 100x or more slower than C, but the total time is usually faster. For people that need speed, they should check out Julia. It's a scripting language designed mainly for technical computing (although it is perfectly general-purpose) that is fast due to good type inference and JIT. Usually within 2x of C. It also can call Python libraries as easily as within Python.

--
If you don't understand any of my sayings, come to me in private and I shall take you in my German mouth.

Re:Python is the new Pascal by Anonymous Coward · 2014-03-18 17:23 · Score: 3, Insightful

it shouldn't have been a surprise that it would spell doom for Python to fork it into two incompatible branches for a couple of "it would be nice" type features.

No.

The Python community, overall, approves of Python 3.x. The major breakages have to do with Unicode, but that's because Python 3.x does it right and Python 2.x didn't.

If you don't think Unicode matters, my guess is you are an English-speaking American. Others disagree.

There are efforts underway to port the major Python projects to support 3.x. SciPy will be the big one... Django already has support for Python 3.x.

Perl6 never went anywhere, Python 3.x is in wide use.

Re:Python is the new Pascal by Anonymous Coward · 2014-03-18 17:37 · Score: 1

Python 3.x is in wide use.

PyPI download stats indicate that Python 3 packages account for less than 3% of all Python package downloads. That's hardly "widespread use" for something that's been released for over 5 years.

Religeous arguments abound by EmperorOfCanada · 2014-03-18 17:47 · Score: 4, Interesting

I have recently started bathing in the waters of Python. What I have realized is that there is a core group within Python who are rightfully proud of their 3.x accomplishment. But they are solidly ignoring the fact that only a tiny percentage of people are using it. The reason is quite simple: people will need 8 modules for their system, and 1 barely works with 3.x and another says something like "mostly works". Well, most people aren't willing to depend upon "mostly".

Now module after module is going 3.x, but the other problem is that for most people having two pythons on their machine is a pain in the ass. I know there are tools to make this less painful, but I can tell you an easy way to make it painless: don't have two versions.

Then there is this call that you should begin new projects in 3.x; but the problem again is the two versions issue.

What bothers me about all this is that I come from a C++ / PHP world. With C++ I have upgraded countless times over many years and had close to zero problems with my code. I don't even know which compiler XCode is even using right now. With PHP my various upgrades have broken exactly one module and I hear rumours that the next big version of PHP will break one module in my older code. But I don't care as I am replacing my PHP with Python.

Where I am worried is that the core Python people will do something stupid like announce an end of support date for 2.7. The problem there is that it might be easier for some people to install a whole different language to sit alongside Python 2.7 and start playing with that instead of smashing their machine in the teeth and simultaneously installing 3.x.

Re:Religeous arguments abound by Anonymous Coward · 2014-03-18 19:40 · Score: 1

Now module after module is going 3.x but the other problem is that for most people having two pythons on their machine is a pain in the ass.
What are you talking about? Most Linux distributions are shipping 2.x and 3.x in parallel and have been doing so for years without issues.
Re:Religeous arguments abound by MurukeshM · 2014-03-19 01:08 · Score: 1

Python is 4 years older than Java, PHP and Ruby.
Re:Religeous arguments abound by Bill_the_Engineer · 2014-03-19 03:11 · Score: 1

Python is 4 years older than Java, PHP and Ruby.
Technically this is true. However the Python that you know and love is actually younger than the three languages you mentioned. Let me explain:
Python 2.0 was released in October 2000. This is where it gained features that actually made it useful outside of the Amoeba OS. Prior to this release, the other languages were already making themselves useful on more mainstream OSes:
Ruby 1.0, which was released in December 1996 (it implemented most of the features that weren't available until Python 2).
PHP already had three major releases by July 1998.
Java 1.0 was released in 1995.
Incidentally, Python and Ruby were influenced by Perl, which was released in 1987.

--
These comments are my own and do not necessarily reflect the views or opinions of my employer or colleagues...
Re:Religeous arguments abound by EmperorOfCanada · 2014-03-19 06:59 · Score: 1

Borland C++ in 1998 and templates were something that only academics talked about. I have watched C++ change and mutate (into something that I don't like much anymore) C++ was brilliant with map, vector, and set, I sort of like auto, and the new for (each) loop is brilliant. But most use of templates makes code that I have to ponder for a long time to figure out.
Re:Religeous arguments abound by EmperorOfCanada · 2014-03-19 07:04 · Score: 1

As far as I am concerned Python didn't exist until about 8 years ago when I suddenly heard about it more and more.

In a way I see Python prior to 8 years ago as a language before its time. On a computer before that it was just too damn slow. But now with regular desktop computers pushing into the Teraflop range the speed of the computer will usually make up for any speed problems with Python. So development time is the only speed that most should worry about. Then if something does need optimization you can start with your code, try something like Numpy, maybe PyPy, and then start looking into OpenCL or a C++ extension type technology.

Knuth talked about premature optimization being the root of all evil, I have realized that using languages that are potentially faster like C++ is effectively premature optimization.

Re:String Hash Bike Shedding by EmperorOfCanada · 2014-03-18 17:48 · Score: 1

Try xrange if you are using 2.7. Its memory use for such a massive loop is so much better.
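On Python 3 the same advice is built in: `range` is lazy the way `xrange` was in 2.7. A quick sketch to see the difference in memory footprint (the exact byte counts depend on the interpreter build, so none are shown):

```python
import sys

lazy = range(10**9)               # no list of a billion ints is built
small_list = list(range(1000))    # a real list, even though it is far shorter

print(sys.getsizeof(lazy))        # small and independent of the range length
print(sys.getsizeof(small_list))  # grows with the number of elements
print(len(lazy), lazy[-1])        # length and indexing still work on the lazy object
```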

Jesus called by WinstonWolfIT · 2014-03-18 19:28 · Score: 1

And he asks nicely that Python programmers stop invoking his name in vain.

Honestly, this entire stack is so non-deterministic, what is it doing in the Enterprise again? I haven't seen anything this Rube Goldberg since the 50s. Shit-can this stinking pile of merde for almost anything and the world will be a better place.

PyPy by Adam+Jorgensen · 2014-03-18 21:27 · Score: 1

I am personally more interested in the PyPy release that will bring transactional memory.

The project I work on right now is proudly PyPy compatible and I hope to keep it so :-)

Re:PyPy by Adam+Jorgensen · 2014-03-18 21:35 · Score: 1

I suppose I should add that with the adoption of STM PyPy will be essentially removing the GIL, something people have been asking to be done for a very long time...

Ob by Hognoxious · 2014-03-18 22:10 · Score: 4, Funny

Have they fixed the whitespace bug yet?

--
Confucius say, "Find worm in apple - bad. Find half a worm - worse."

There is something called multi-process you know by Viol8 · 2014-03-19 00:00 · Score: 3, Interesting

And python has supported it (at least on unix) virtually since it was first released.

I've never really seen much virtue in multi threading - it's useful in a limited number of cases but usually it creates more problems than it solves (compared to multi process) and is usually used by people who don't really know what they're doing. Essentially multi threading takes all the advantages of protected process virtual memory and throws them in the bin.
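For anyone who hasn't seen the multi-process style in Python, here is a rough Unix-only sketch using `os.fork` and a pipe. The helper name `run_in_child` is invented for this example, and error handling is deliberately minimal (if the function fails in the child, the parent simply sees an empty pipe).

```python
import os
import pickle


def run_in_child(func, *args):
    """Run func(*args) in a forked child and send the result back over a pipe."""
    read_fd, write_fd = os.pipe()
    pid = os.fork()
    if pid == 0:
        # Child process: separate address space, talks to the parent only via the pipe.
        os.close(read_fd)
        try:
            with os.fdopen(write_fd, "wb") as writer:
                pickle.dump(func(*args), writer)
        finally:
            os._exit(0)  # leave without running the parent's cleanup handlers
    # Parent process: read the pickled result, then reap the child.
    os.close(write_fd)
    with os.fdopen(read_fd, "rb") as reader:
        result = pickle.load(reader)
    os.waitpid(pid, 0)
    return result
```

For example, `run_in_child(sum, [1, 2, 3])` computes the sum in a separate process, and nothing the child does to its own memory can corrupt the parent.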

Support on shared web hosts by tepples · 2014-03-19 01:36 · Score: 1

There are plenty of languages that are equally expressive while still being far faster and more efficient.

And available on entry-level web hosting? The only language I know of that's more widely supported on shared hosting is one that's been called a fractal of bad design.

Parallel 2/3 didn't work in Windows until 3.3 by tepples · 2014-03-19 01:39 · Score: 1

Most Linux distributions are shipping 2.x and 3.x in parallel and have been doing so for years without issues.

That's fine if your audience is willing to install Linux, either as a dual boot or in a VM. It took until Python 3.3 before Windows could practically run 2.x and 3.x in parallel.

Re:Parallel 2/3 didn't work in Windows until 3.3 by greg1104 · 2014-03-19 07:26 · Score: 1

OK, it took until 3.3 before this was straightforward in Windows. Why is that still relevant today? Python 3.0 came out at the end of 2008, and several parts of the 2.X transition were still pretty rough then. A poster above made a nice comment about how that's played out: "Python 3.3 (or 3.4, as this article is about) is not 3.0 or 3.1. There is a lot of things that have been fixed along the way." Having an upgrade path that's possible to follow smoothly has been a design goal of 3.0 since its early days, but it wasn't quite there yet when 3.0 first shipped. That's history at this point though.
Re:Parallel 2/3 didn't work in Windows until 3.3 by tepples · 2014-03-19 07:53 · Score: 1

It's not only the Python interpreter but also the C extensions that you use that have to be ported to Python 3. For example, Pygame (Python bindings for SDL 1.2) doesn't have official Windows packages for 3.3 or 3.4 on its download page. (It instead relies on what appears to be an unofficial page.)
Re:Parallel 2/3 didn't work in Windows until 3.3 by spiralx · 2014-03-20 01:47 · Score: 1

Yes, but that unofficial page really does have almost everything on it :)

The type of the result by tepples · 2014-03-19 01:44 · Score: 1

If a, b and c can be int or float, how can a programmer implement "a = b divide c" without a lot of ugly type checking?

By determining what type you want in the result and choosing the operator that produces that type. And you can get Python 3 division behavior in Python 2.6 or 2.7 using from __future__ import division.
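A short sketch of the two operators under Python 3 semantics. On Python 2.6 or 2.7 the same results need `from __future__ import division` as the first statement of the module; it is left as a comment here so the snippet can be pasted anywhere.

```python
# On Python 2.6/2.7, start the module with: from __future__ import division

true_quotient = 7 / 2      # true division: float result even for int operands
floor_quotient = 7 // 2    # floor division: int result for int operands
float_floor = 7.0 // 2     # floor division keeps float if either operand is float
negative_floor = -7 // 2   # floors toward negative infinity, not toward zero

print(true_quotient, floor_quotient, float_floor, negative_floor)
```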

Re:The type of the result by gnupun · 2014-03-19 03:46 · Score: 1

By determining what type you want in the result and choosing the operator that produces that type.

But that's the point, sometimes the programmer implementing "a = b div c" does not know whether the result "a" is int or float because he's writing a general library function where depending on the application "a" may be int or it be float. In python 2, he does not have to know and write "a = b / c". But that won't work in Python 3 without a lot of ugly type checking.

From reading the PEP for the "// operator", the justification for the existence of this operator is that some numeric python dev was having a problem where his library users were passing ints or floats as arguments and he wanted the result of his algorithm to be a float (i.e., using ints in his algorithm was causing problems with division). And instead of just doing some smart conversion of input arguments, they decided to screw up the existing "/" operator and add a new "//" divide operator.

Re:There is something called multi-process you kno by halfdan+the+black · 2014-03-19 02:08 · Score: 1

Of course I know about multiprocessing. Why have one copy of the interpreter and libraries loaded when you can have N, plus its so much more efficient to marshal data across process boundaries than to access a global shared memory block.

I've heard this "processes are so much better" line for so long, because we couldn't do threads. Kind of like if I cut off my right arm, it's so much better to only have a left arm because you only need to move 5 fingers instead of 10.

Sounds like... by HetMes · 2014-03-19 02:33 · Score: 1

...a fanboi talking.

Re:There is something called multi-process you kno by Viol8 · 2014-03-19 02:33 · Score: 1

"Of course I know about multiprocessing. Why have one copy of the interpreter and libraries loaded when you can have N, plus its so much more efficient to marshal data across process boundaries than to access a global shared memory block. "

*snort* You obviously don't have a clue about multiprocess. Look up copy-on-write then get back to me.

As for a shared memory block, uuuh, guess what, there's something called "shared memory" specifically for multiprocess access. It's been around since the 70s in unix but I wouldn't expect a clueless (obviously Windows) programmer to know about it, though ironically even Windows supports it now.

Here, educate yourself:

http://www.cs.cf.ac.uk/Dave/C/...
http://en.wikipedia.org/wiki/I...

Re:There is something called multi-process you kno by jythie · 2014-03-19 03:08 · Score: 1

I take it you are also against private variables and interfaces too? For that matter, why not just expose the entire system memory to every process, let programs screw with each other or alter kernel data directly!

Re:lost of python hate here by jythie · 2014-03-19 03:13 · Score: 1

If throwing more hardware at the problem makes the project work, then it does indeed 'fix' the problem.

New computers are not 'multithreaded', they just have more cores, which Python is more than capable of taking advantage of. The GIL causes problems for one specific threading technique, and that technique is essentially the GOTO of threading. It can speed up development, but it was probably never a good idea outside some very specific use cases.

Re:lost of python hate here by jythie · 2014-03-19 03:14 · Score: 1

Shared memory. Behaves just like having multiple threads in a single process except you *gasp* control how much damage threads can do to each other.
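A rough, Unix-only illustration of that idea: an anonymous `mmap` region created before `os.fork` is visible to both processes, while everything else stays private. The function name is invented for this sketch, and a real program would add locking around the shared bytes.

```python
import mmap
import os
import struct


def child_writes_parent_reads(value):
    """Let a forked child store one integer in a shared page; return what the parent sees."""
    shared = mmap.mmap(-1, 8)  # anonymous mapping, shared with forked children on Unix
    pid = os.fork()
    if pid == 0:
        # Child: only these 8 bytes are shared; the rest of memory is copy-on-write.
        try:
            shared[:8] = struct.pack("q", value)
        finally:
            os._exit(0)
    os.waitpid(pid, 0)  # wait until the child has written and exited
    result = struct.unpack("q", shared[:8])[0]
    shared.close()
    return result
```

The point of the design is that the shared region is explicit and small, so the amount of state two processes can trample is something you choose.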

Re:Python is the new Pascal by jythie · 2014-03-19 03:17 · Score: 1

No, the language fetishist community approves of Python 3.x. In general people who want a tool for doing something are using Python 2.x. This might change over time, but for the moment Python 2.x offers more than Python 3.x does from a developer perspective when one's goals are result oriented rather than language oriented.

The use case for a variable result type by tepples · 2014-03-19 04:17 · Score: 1

sometimes the programmer implementing "a = b div c" does not know whether the result "a" is int or float because he's writing a general library function

If the output of the general library function shall be a float, then use /. If the output of the general library function shall be an integer, then use //. If the output of the general function shall depend on the types of the arguments, then I'm having trouble understanding in what sort of case this would prove useful.

where depending on the application "a" may be int or it be float.

On what aspect of the application would it depend? As far as I can tell, whether a should be int or float depends on what kind of quantity b is intended to represent and what kind of quantity c is intended to represent. In what specific "general library function" would "use floor division if neither b nor c is a float; otherwise, use true division" be helpful? It would prove easier for me to see your point if you can give a concrete example.

Re:The use case for a variable result type by gnupun · 2014-03-19 12:38 · Score: 1

In what specific "general library function" would "use floor division if neither b nor c is a float; otherwise, use true division" be helpful? It would prove easier for me to see your point if you can give a concrete example.

Let's take this example: Write a function that computes the average of a list (or sequence, generally speaking) of numbers. The list may contain elements that are all either integers, floats or complex numbers. The result should have the same type as an individual element of the list. So int list should return int average; float list -> float average; complex list -> complex average.

And here's the Python 2 implementation:

def average(seq):
    return sum(seq) / len(seq)

>>> average([10,20,30])  # int list returns int avg
20
>>> average([1.5, 5.5, 0.5])  # float list returns float avg
2.5
>>> average([1+2j, 5+7j])  # complex list returns complex avg
(3+4.5j)

If you try running the same code in Python 3 with an integer list, you naturally get a float result.

# python 3
>>> average([10, 20, 30])  # error: int list returns float avg
20.0

Can you implement a simple average() function in Python 3 that satisfies the specifications mentioned above? You could, but it won't be as simple or elegant as the Python 2 version.
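For reference, one possible Python 3 version that meets the stated spec is sketched below. It is only an illustration, not a claim about the "right" way to do it: it assumes an all-int list should floor its average (as Python 2's `/` did), and it picks the operator from the type of the sum.

```python
def average(seq):
    """Average that keeps the element type: int in, int out; float or complex otherwise."""
    total = sum(seq)
    if isinstance(total, int):
        # All-int input: floor division reproduces Python 2's int / int result.
        return total // len(seq)
    # float or complex input: true division keeps the type.
    return total / len(seq)


print(average([10, 20, 30]))
print(average([1.5, 5.5, 0.5]))
print(average([1 + 2j, 5 + 7j]))
```

That is one extra branch compared with the Python 2 one-liner, which is roughly the cost being argued about.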

Not all C libraries release the GIL by tepples · 2014-03-19 07:04 · Score: 1

If you call C code from python the GIL is released

Only if the library explicitly releases the GIL. Pillow (Python Imaging Library), for one, happens not to.

Re:Not all C libraries release the GIL by Daniel+Hoffmann · 2014-03-19 07:23 · Score: 1

I was not aware the library needed to explicitly release the GIL. I thought that any C call would release the GIL.
Re:Not all C libraries release the GIL by nneonneo · 2014-03-19 07:56 · Score: 1

Any C library can touch Python objects any time it likes, by nature of being linked to the Python C-API. However, you can only safely access Python objects while holding the GIL. CPython libraries are entered into with the GIL held (otherwise you couldn't even interact with the arguments given to the function), and they may decide to release the GIL some time later (and promise not to touch the Python API while the GIL is not held).
*Many* CPython libraries release the GIL during operations that may be long-running, so you get the illusion that basically any long-running C operation releases the GIL.
PIL not releasing the GIL should be construed a bug in this case.
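A small sketch of the effect, using `time.sleep` as a stand-in for a C-level call that does release the GIL while it waits. The function name and the timings are arbitrary choices for illustration; a C extension that never releases the GIL would not show this overlap.

```python
import threading
import time


def timed_parallel_sleep(n_threads=4, seconds=0.2):
    """Start n_threads that each block in time.sleep; return total wall-clock time."""
    threads = [
        threading.Thread(target=time.sleep, args=(seconds,))
        for _ in range(n_threads)
    ]
    start = time.monotonic()
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return time.monotonic() - start


elapsed = timed_parallel_sleep()
# The sleeps overlap, so this is close to 0.2 s rather than 4 * 0.2 s.
print("elapsed: %.2f s" % elapsed)
```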

Reference counting and unmanaged resources by tepples · 2014-03-19 07:10 · Score: 1

As I understand it, the GIL is mainly needed because the CPython reference counting model needs to touch a whole bunch of reference counts when churning objects, and this needs to not screw up. Thus Jython and IronPython, leveraging the garbage collection of their respective underlying VM platforms, didn't need a GIL.

But these implementations pay a penalty for relying on the VM's tracing garbage collection, and this penalty is loss of finalizer functionality.

Memory managed by the VM isn't the only resource that a program uses, but it's the only resource that some garbage collectors anticipate. Other resources include open files, open network connections, open database connections, and the like. One common design pattern for ensuring that unmanaged resources get released is try/finally, which works with JVM, CLR, and CPython, but doesn't work so well with resources held longer than one method. Another design pattern is Resource Acquisition Is Initialization, which releases the resource in a "finalizer" method associated with the object. CPython, JVM, and CLR all support finalizers in theory (__del__, finalize, and IDisposable respectively), but platforms that use pure tracing collection don't guarantee that they'll get called at all, which causes an unmanaged resource leak. In CPython, so long as an object isn't part of a reference cycle, its finalizer gets called once it's unreachable, which allows closing the file, closing the connection, freeing the unmanaged bitmap, etc.
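The two patterns can be put side by side in a short sketch. `Resource` is a made-up stand-in for a file, socket, or bitmap, and the `log` list only exists to record when the release happens. The timing of the finalizer-based release is a CPython reference-counting behaviour and should not be relied on elsewhere.

```python
class Resource:
    """Hypothetical unmanaged resource that records when it is released."""

    def __init__(self, name, log):
        self.name = name
        self.log = log
        self.closed = False

    def close(self):
        if not self.closed:  # make release idempotent
            self.closed = True
            self.log.append(self.name)

    def __del__(self):
        # Finalizer: runs promptly under CPython reference counting,
        # but a pure tracing collector may run it late or never.
        self.close()


def use_with_finally(log):
    resource = Resource("explicit", log)
    try:
        return resource.name.upper()
    finally:
        resource.close()  # released on every exit path, on any implementation


def use_with_finalizer(log):
    resource = Resource("implicit", log)
    return resource.name.upper()  # release is left to the finalizer
```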

Re:Reference counting and unmanaged resources by steveha · 2014-03-19 07:39 · Score: 3, Insightful

these implementations pay a penalty for relying on the VM's tracing garbage collection, and this penalty is loss of finalizer functionality.
And this is why it is best practice in Python to use the with statement, to make sure that things get cleaned up when you are done with them.
In CPython you can get away with just dropping your objects on the floor, and the reference counting will clean them up for you. In other implementations, not so much.
This works okay in CPython:
def read_data(fname):
    f = open(fname)
    return f.read()
In CPython, with no references to f it gets cleaned up, and the open file gets closed.
In any version of Python, this works great:
def read_data(fname):
    with open(fname) as f:
        return f.read()
The with statement ensures that the open file will be closed, no matter how the function is exited (including by exception).
And I actually prefer the with statement these days... I like how it makes the lifetime explicit for the stuff it wraps.
Raymond Hettinger has expressed the opinion that the with statement is one of the really best ideas in Python. I think that was during his PyCon session where he listed his favorite things in Python.

--
lf(1): it's like ls(1) but sorts filenames by extension, tersely

import multiprocessing by tepples · 2014-03-19 07:15 · Score: 1

Perl clones the entire interpreter for every thread.

So does Python if you import multiprocessing. But then you run into other problems, such as it being hard to send objects back and forth between processes unless they're small and simple.

In a pickle by tepples · 2014-03-19 07:29 · Score: 1

One problem with import multiprocessing is that it's slow on the operating system on which the majority of desktop applications run. From the manual page that you linked:

spawn: Available on Windows, but slow
fork: Available on UNIX only
forkserver: Available on UNIX only

Another problem is that anything sent through a connection between processes has to be picklable with a bytes representation smaller than about 32 MB, so no passing big images around. Does this cause an actual problem in practice?
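A quick way to check both points on a given machine is sketched below. `multiprocessing.get_all_start_methods()` exists from Python 3.4 on; the helper names `pickled_size` and `is_picklable` are invented here, and the sketch does not test the size limit mentioned above, it only measures what would have to be sent.

```python
import multiprocessing
import pickle


def pickled_size(obj):
    """Number of bytes that would travel between processes for this object."""
    return len(pickle.dumps(obj, protocol=pickle.HIGHEST_PROTOCOL))


def is_picklable(obj):
    """True if the standard pickle module can serialize obj."""
    try:
        pickle.dumps(obj)
    except Exception:
        return False
    return True


print(multiprocessing.get_all_start_methods())  # which of spawn/fork/forkserver exist here
print(pickled_size(bytes(10**6)))               # a 1 MB buffer costs at least 1 MB to send
print(is_picklable(lambda x: x))                # lambdas cannot be pickled by the stdlib
```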

VPS by tepples · 2014-03-19 07:31 · Score: 1

If a hosting plan "can support large websites", then it's probably a virtual private server or larger, and you can compile your own Python in your own VPS. Or what am I missing?

Hyper-Threading Technology by tepples · 2014-03-19 07:39 · Score: 1

New computers are not 'multithreaded'

Tell that to the single-core yet simultaneous-multithreaded Atom N450 in my laptop, which injects the second thread's instructions into the first thread's pipeline bubbles and vice versa.

Re:Hyper-Threading Technology by jythie · 2014-03-19 07:59 · Score: 1

Context switching is not exactly a new thing. 'hyperthreading' as a specific implementation does it a little better, but it is the same basic technique processors have been using since the 60s. It is in the same category as branch prediction, we keep getting new methods and buzzwords associated with them, but the basic process is hardly new.

Optimize for the hardware you have by tepples · 2014-03-19 07:40 · Score: 1

There's the old parable about two programmers that are told to double the speed of their program. One guy spends a month rewriting the core in assembly and using hardware acceleration. The other guy waits 6 months and buys a new computer.

The second guy failed because the deadline was in three months. Or the second guy failed because other customers running the program on their own computers weren't likewise willing to buy a new computer just to run one program.

Applications written in Python by tepples · 2014-03-19 07:44 · Score: 1

Then why the hell are they downloading python - a programming language???

To run an application written in that programming language. For the same reason that one downloads Flash Player to run SWFs and Java to run JARs and a web browser to run web applications, one downloads Python to run Python applications.

Interoperating with invalid data by tepples · 2014-03-19 07:46 · Score: 1

What should a well-behaved program do with bytes objects pulled from a database that already contains plenty of invalid encoding?

Re:Interoperating with invalid data by spitzak · 2014-03-19 11:42 · Score: 1

The program should produce an error AT THE MOMENT IT TRIES TO EXTRACT A Unicode CODE POINT. Not before, and not after.
If the program reads the invalid string from one file and does not check it and writes it to another file, I expect, and REQUIRE, that the invalid byte sequence be written to the new file. It should not be considered any more of a problem than the fact that programs don't fix spelling mistakes when copying strings from one place to another.
Re:Interoperating with invalid data by tepples · 2014-03-19 14:12 · Score: 1

The program should produce an error AT THE MOMENT IT TRIES TO EXTRACT A Unicode CODE POINT. Not before, and not after.
Which leaves open the question of how best to clean up all the tens of thousands of existing records that may not be valid UTF-8. Otherwise: "This product is unavailable for purchase because its description contains invalid data. This problem has been reported to the store owner."
Re:Interoperating with invalid data by spitzak · 2014-03-19 15:08 · Score: 1

Well the first thing you need to do to clean up the invalid UTF-8, for instance in filenames, is to detect it.
If reading the filename causes it to immediately throw an exception and dispose of the filename, I think we have a problem. Right now you cannot do this in Python unless you declare it "bytes" and give up on actually looking at the Unicode in the vast majority of filenames that *are* correct.
It is also necessary to pass the incorrect filename to the rename() function, along with the correction. That is impossible with Python 3.0's library, and is probably the more serious problem.
Both of these problems are trivial to fix if it would just consider arbitrary byte sequences valid values for strings, and defer complaining about incorrect encoding until the string actually needs to be *decoded*, which actually is only really needed to display it, and sometimes for parsing in the rare cases that non-ASCII has syntactic value and is not just treated as letters.
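Worth noting for this use case: since Python 3.1 there is a `surrogateescape` error handler (PEP 383) that lets undecodable bytes ride along inside a `str` and come back out unchanged. The sketch below uses a made-up filename; whether this fully answers the complaint above is a separate question, since the escaped characters are still not valid Unicode text.

```python
raw = b"report-\xff\xfe.txt"  # hypothetical filename bytes that are not valid UTF-8

# Strict decoding rejects the whole name immediately.
try:
    raw.decode("utf-8")
    strict_ok = True
except UnicodeDecodeError:
    strict_ok = False

# surrogateescape keeps the bad bytes as lone surrogates inside the str...
name = raw.decode("utf-8", "surrogateescape")

# ...so the valid part can be inspected as text,
looks_like_report = name.startswith("report-")

# ...and the original bytes are recovered exactly when encoding back.
round_tripped = name.encode("utf-8", "surrogateescape")

print(strict_ok, looks_like_report, round_tripped == raw)
```

On most POSIX systems `os.fsdecode` and `os.fsencode` apply this same handler to filenames.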
Re:Interoperating with invalid data by godefroi · 2014-03-20 02:39 · Score: 1

Hey, I figured out what your problem is, where you went wrong. You think that a string and a bunch of bytes are the same thing. They're not. If you have a bunch of bytes, treat it as a bunch of bytes. If you have a string, treat it as a string.
Java, for example, stores strings internally as UTF-16 (or UCS-2, opinions differ). .NET stores them internally as UCS-2.
This is also why there's a difference between CHAR and NCHAR in databases.
There is not a one-to-one mapping from a given string to a given set of bytes, because it depends on how you encode the string. Furthermore, some encodings have constraints on what input can produce a valid string. ASCII (plus non-standard high-ASCII) is not one of these encodings. UTF-8 (and all other Unicode encodings) are.
However, PEP 393 should've solved your particular problem (in Python 3.3), by allowing you to store these unicode-invalid "strings" internally as ASCII. Have fun in code-page land.

--
Karma: Poor (Mostly affected by lame karma-joke sigs)
Re:Interoperating with invalid data by godefroi · 2014-03-20 02:43 · Score: 1

Pretty much every string operation is going to require decoding. Things like substr(), replace(), split(), join(), etc are all going to require decoding the string.

--
Karma: Poor (Mostly affected by lame karma-joke sigs)
Re:Interoperating with invalid data by spitzak · 2014-03-20 05:49 · Score: 1

Aha! Somebody who really does not have a clue.
No, substr() does not require decoding, because offsets can be in code units.
No, replace() does not require decoding, because pattern matching does not require decoding, since UTF-8 is self-synchronizing.
No split() does not require decoding because offsets can be in code units
No, join() does not require decoding (and in fact I cannot think of any reason you would think it does, at least the above have beginning-programmer mistakes/assumptions).
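The claim is easy to try with Python 3 `bytes`, which operate on UTF-8 code units directly. This is just a sketch with an arbitrary sample string; the last lines show the one thing that does go wrong, namely slicing at an offset that was not produced by a search.

```python
text = "naïve café"
data = text.encode("utf-8")  # 12 bytes: ï and é take two bytes each

# split() on an ASCII delimiter never cuts inside a multi-byte character.
parts = data.split(b" ")
decoded_parts = [part.decode("utf-8") for part in parts]

# replace() with a full encoded character as the pattern is also safe.
replaced = data.replace("é".encode("utf-8"), b"e").decode("utf-8")

# Offsets found by searching are byte offsets, and slicing at them is safe.
offset = data.find(b"caf")
tail = data[offset:].decode("utf-8")

# An arbitrary byte offset, however, can land in the middle of a character.
try:
    data[:3].decode("utf-8")
    arbitrary_cut_ok = True
except UnicodeDecodeError:
    arbitrary_cut_ok = False

print(decoded_parts, replaced, offset, tail, arbitrary_cut_ok)
```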
Re:Interoperating with invalid data by spitzak · 2014-03-20 05:59 · Score: 1

Stupid software that thinks it has to convert to UTF-16 is about 95% of the problem.
UTF-16 cannot losslessly store invalid UTF-8. It also cannot losslessly store an odd subset of arrangements of Unicode code points (it can't store a low surrogate followed by a high surrogate, because this pattern is reserved to mean a non-BMP code point). It also forces a weird cutoff at 0x10FFFF which a lot of programmers get wrong (either using 0x1FFFF or 0x1FFFFF). UTF-16 is also variable sized and has invalid sequences, thus it has NO advantages over UTF-8, so the entire scheme is a waste of time.
Unfortunately a bunch of people are so enamored with all the work they did to convert everything to 16-bit that they are refusing to admit they made a mistake. One way is to declare invalid UTF-8 as throwing errors and thus make it virtually impossible to manipulate text in UTF-8 form. Note that they don't throw exceptions on invalid UTF-16, care to explain that??? HMM????
UTF-8 can store all possible UTF-16 strings losslessly (including lone surrogates which are considered "invalid" in UTF-16), as well as storing invalid UTF-8. It can encode a continuous range of code points from 0-0x10FFFF, or 0x1FFFFF with a trivial change (it can do up to 0x7FFFFFFF if you use the original UTF-8 design).
PEP 393 does NOT solve the problem. The "ascii" is limited to only 7-bit characters and thus cannot store UTF-8 (valid or not).
There is a "utf-8" entry in the PEP 393 strings but it appears current design requires it to be translated to UTF-16 and back to UTF-8 to store there, thus disallowing invalid strings. My proposal is that converting bytes to a string copies the data unchanged to this UTF-8 storage, and checking for encoding errors be deferred until there actually is a reason to look at Unicode code points, which is VERY VERY RARE, despite the impression of amateur programmers. I also propose some small changes to how the parser interprets "\xNN" and "\uNNNN" in string constants so that it is possible to swap between bytes and "unicode" strings without having to change the contents of the constant.
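Some of the encoding facts in this comment can be checked from a Python 3 prompt. The sketch below only illustrates how the standard codecs behave (surrogate pairs, the 0x10FFFF limit, and the `surrogatepass` handler for lone surrogates); it says nothing about how CPython stores strings internally.

```python
astral = "\U0001F600"                      # a code point outside the BMP
utf16_units = astral.encode("utf-16-le")   # stored as a surrogate pair: 2 x 16 bits
utf8_units = astral.encode("utf-8")        # stored as four bytes

lone = "\ud800"                            # a lone high surrogate
try:
    lone.encode("utf-8")
    strict_ok = True
except UnicodeEncodeError:
    strict_ok = False                      # the strict UTF-8 codec refuses it
passed = lone.encode("utf-8", "surrogatepass")  # the opt-in handler writes it anyway

top = chr(0x10FFFF)                        # highest code point Python accepts
try:
    chr(0x110000)
    beyond_ok = True
except ValueError:
    beyond_ok = False

print(utf16_units, utf8_units, strict_ok, passed, beyond_ok)
```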
Re:Interoperating with invalid data by godefroi · 2014-03-25 02:45 · Score: 1

Maybe you should design your own platform where strings will be represented internally as UTF-8. It would be an interesting exercise.

--
Karma: Poor (Mostly affected by lame karma-joke sigs)
Re:Interoperating with invalid data by godefroi · 2014-03-25 02:50 · Score: 1

Well, yeah, but that would completely change the way these things work. What if your split() worked on code units, and you broke up a code point? That certainly wouldn't produce results that anyone would consider optimal, or even useful.
You can continue to pretend that byte arrays are strings, and strings are byte arrays, but you're not going to get anywhere. The rest of the world decided that we want a useful abstraction over the underlying data structure. When we're working with strings, we care about characters, not bytes.

--
Karma: Poor (Mostly affected by lame karma-joke sigs)
Re:Interoperating with invalid data by spitzak · 2014-03-25 06:34 · Score: 1

Maybe you should design your own platform where strings will be represented internally as UTF-8. It would be an interesting exercise.
FLTK and Nuke, and the project I am doing at R&H all use UTF-8 with tolerance for encoding errors for all internal storage. It is really easy, far easier than dealing with two types of text.
About 90% of the work is to get around default converters in Python and Qt that screw up the UTF-8.
Re:Interoperating with invalid data by spitzak · 2014-03-25 06:40 · Score: 1

Oh no! What if your split() worked in Unicode code points, and split a combining pair? What would you do, surely your computer will instantly self-destruct in a devastating explosion! What if your split() split an English word in two? What if your split() cut a UTF-16 surrogate pair in half (which EVERY single alternative to UTF-8 does!!!!!!) Yike! Disaster! Um, well, maybe not...
Stop making up non-existent problems.
1. Splitting is done after pattern searching. It is TRIVIAL to make your pattern search (which is likely doing something like "find the next space") only match at the boundaries of complete UTF-8 sequences, never in the middle of one. In fact it will help get you to write stuff that matches more complex structures such as combining pairs.
2. If you are splitting at totally arbitrary points, it is because you are copying the data to a fixed-sized buffer. Virtually every use of this later pastes the contents of the buffers together (think of buffered file I/O) and thus it is harmless.
3. This splitting is 100% detectable because *both* ends will be invalid UTF-8.
4. For some reason nobody seems to worry about this for UTF-16. Hmmmm, I wonder why?
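
A rough sketch of points 1 to 3 in Python, assuming the input is valid UTF-8 to begin with. The helper names (`is_continuation`, `split_at_boundary`, `decodes`) are invented for this example; only `codecs.getincrementaldecoder` is a real standard-library call.

```python
import codecs

def is_continuation(byte):
    """True for UTF-8 continuation bytes (bit pattern 10xxxxxx)."""
    return (byte & 0xC0) == 0x80

def split_at_boundary(data, limit):
    """Split at or before `limit`, backing up off continuation bytes."""
    cut = min(limit, len(data))
    while 0 < cut < len(data) and is_continuation(data[cut]):
        cut -= 1
    return data[:cut], data[cut:]

def decodes(chunk):
    """True if `chunk` is valid UTF-8 on its own."""
    try:
        chunk.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False

data = "h\u00e9llo \u20ac".encode("utf-8")  # e-acute is 2 bytes, euro is 3

# An arbitrary cut through the middle of a character leaves BOTH ends invalid,
left, right = data[:2], data[2:]
both_invalid = not decodes(left) and not decodes(right)
# but pasting the pieces back together is harmless.
rejoined = (left + right).decode("utf-8")

# Backing up to a boundary keeps each piece valid by itself.
head, tail = split_at_boundary(data, 2)

# For buffered I/O, an incremental decoder carries the partial sequence over.
decoder = codecs.getincrementaldecoder("utf-8")()
streamed = decoder.decode(left) + decoder.decode(right, final=True)
```

With data that is already invalid, the back-up loop could walk over a long run of stray continuation bytes; a real implementation would probably cap it at three steps, since a UTF-8 sequence is at most four bytes.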

Chinese or Hindi by tepples · 2014-03-19 07:48 · Score: 1

various internal applications and systems not needing to support more than one language

If a program supports just one language, and that one language is either standard written Chinese or Hindi, then how should it work with even one language without Unicode?

with, try/finally, and long-lived resources by tepples · 2014-03-19 08:06 · Score: 1

One common design pattern for ensuring that unmanaged resources get released is try/finally, which works with JVM, CLR, and CPython, but doesn't work so well with resources held longer than one method.

And this is why it is best practice in Python to use the with statement

This is explicitly syntactic sugar for the try/finally approach (PEP 343) and shares some of its disadvantages. Like the lifetime of a resource allocated in try and cleaned up in finally, the lifetime of a resource allocated at the start of a with block and cleaned up at the end of said block is limited to the lifetime of that block. If you allocate an object in one method, such as the __init__ method of a class, and retain it, how do you ensure that it gets freed? For example, a class that reads a stream of objects out of a file may implement the iterator protocol: open the file in __init__, return an object in __next__ (next in Python 2), and close the file when?
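
To make the "sugar" point concrete, here is a simplified sketch of the two forms side by side. The `Resource` class and both function names are hypothetical, and the hand-written version is only an approximation: the full expansion in PEP 343 also passes exception details to `__exit__` and lets it suppress the exception.

```python
class Resource:
    """Hypothetical resource that records its lifetime in a list."""

    def __init__(self, log):
        self.log = log

    def __enter__(self):
        self.log.append("acquire")
        return self

    def __exit__(self, exc_type, exc, tb):
        self.log.append("release")
        return False  # do not swallow exceptions

def use_with(log):
    with Resource(log):
        log.append("work")

def use_try_finally(log):
    # Roughly what the with statement does on the no-exception path.
    manager = Resource(log)
    manager.__enter__()
    try:
        log.append("work")
    finally:
        manager.__exit__(None, None, None)

log_a, log_b = [], []
use_with(log_a)
use_try_finally(log_b)
print(log_a == log_b)  # True
```

In both forms the release happens when the block ends, which is exactly the limitation being discussed: neither helps when the resource has to outlive the block.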

Deallocating resources when the caller breaks by tepples · 2014-03-19 10:47 · Score: 1

If you are writing an object designed to work with with, you do the cleanup in its __exit__() method function.

So how does the object ensure that every other object that owns it wraps its allocation with with?

sugar can be sweet. I think it's an improvement

I agree with you that with is a syntactic improvement. I was arguing about program semantics that make with necessary, especially when object lifetimes don't correspond directly to program blocks.

You would close the file inside __next__() when you hit end-of-file and right before you raise the StopIteration exception. Am I overlooking something?

You are overlooking the case in which the caller stops using the iterator before the iterator completes, such as a for loop that hits a break statement or an exception not caught within the loop.
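
One possible way to cover both cases, sketched under assumptions: the iterator closes its stream on exhaustion, and it is also a context manager so that a caller who breaks out early still triggers the cleanup. `LineReader` is a made-up class, and it takes an already-open stream (here an `io.StringIO`) rather than opening a file, purely to keep the example self-contained. It does not answer the question of how to force every caller to use with.

```python
import io

class LineReader:
    """Hypothetical iterator over lines that owns and closes its stream."""

    def __init__(self, stream):
        self._stream = stream

    def __iter__(self):
        return self

    def __next__(self):
        if self._stream.closed:
            raise StopIteration
        line = self._stream.readline()
        if not line:
            self.close()  # normal end of data
            raise StopIteration
        return line.rstrip("\n")

    def close(self):
        self._stream.close()

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        self.close()  # also runs on break or on an exception
        return False

stream = io.StringIO("a\nb\nc\n")
with LineReader(stream) as reader:
    for line in reader:
        break  # the caller stops before the iterator completes

print(stream.closed)  # True
```

Without the with block, the early break would leave the stream open until the object is garbage collected, which CPython's reference counting usually makes prompt but other implementations do not promise.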

Re:Guido is the problem by tralfaz2001 · 2014-03-19 11:51 · Score: 1

Oh, does your highly effective, widely popular scripting language not suffer from this problem? Thank goodness Python has a benevolent dictator, otherwise it might end up with a hack at the helm who tries to satisfy every whining idiot's wish feature, and you end up with something as horrid as C++. But Stroustrup is such a crowd pleaser.

I've been using Python since 1.5, and I've always considered it my secret weapon to get things done faster than anyone thought it could be done. All while producing code that is easy to read and maintain, unlike the popular scripting disaster at the time that was called Perl. Over the years its usefulness has only expanded to areas I would have never expected. And so it remains as my not so secret weapon to this day. Is it perfect? No, no language is. Like all languages it has its place where it works well, and plenty where it's a bad choice.

Most of the griping in this thread is by people who clearly have not used Python for anything significant, but have heard about the GIL issue, and feel they must whine that their favorite language is not more popular. The GIL issue can be dealt with in a number of ways, Jython being my favorite. The GIL has never been an issue in anything I've done with Python, for two reasons. One, I've never used Python where that would be an issue, and two, when I have chosen Python, I designed the code so it would not pose a problem. It's a bit crazy, but this seems to work.

average([10, 11]) by tepples · 2014-03-19 14:24 · Score: 1

def average(seq): return sum(seq) / len(seq)

Or in Python 3.4 or later: from statistics import mean as average

Can you implement a simple average() function in Python 3 that satisfies the specifications mentioned above?

I'm not clear what average([-10, -11]) should return under your specification. Or would it raise a ValueError? In general, the average of integers is not an integer but a rational, because integers are not closed under mean. Mean would have to promote to a type capable of representing (or at least approximating) rationals. Python has no exact rational among its built-in types (the standard library does provide fractions.Fraction), but it does have float.
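
For anyone following along, a short sketch of what the different division choices give in Python 3. The variable names are only for illustration; `fractions.Fraction` is the standard library's exact rational type and `statistics.mean` is the 3.4 addition mentioned above.

```python
from fractions import Fraction
from statistics import mean  # new in Python 3.4

def average(seq):
    return sum(seq) / len(seq)  # "/" is true division in Python 3

true_avg = average([10, 11])             # float result
floor_avg = sum([10, 11]) // 2           # floor division keeps an int
floor_neg = sum([-10, -11]) // 2         # rounds toward negative infinity
exact_avg = Fraction(sum([10, 11]), 2)   # exact rational
stats_avg = mean([10, 11])

print(true_avg, floor_avg, floor_neg, exact_avg, stats_avg)
# 10.5 10 -11 21/2 10.5
```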

Re:average([10, 11]) by gnupun · 2014-03-19 20:53 · Score: 1

Or in Python 3.4 or later: from statistics import mean as average
I don't have 3.4, but it would be better if you could implement average using just the sum() function just so you can experience the split personality of the division operators in Python 3.

I'm not clear what average([-10, -11]) should return under your specification.
As mentioned in the spec, int list should return an int. There's no confusion and no rationals because most computer programs either deal with integers or floating point numbers (rationals are extremely rare). So your function should return func([-10,-11]) --> (-10-11)/2 = -11 under the rules of integer division.

average([10, 12.0]) by tepples · 2014-03-20 04:59 · Score: 1

As mentioned in the spec, int list should return an int.

Please clarify the spec further: If the list contains an int and a float (such as average([10, 12.0])), should the result be returning a float or raising a ValueError? I admit that my process of requirements elicitation sounds like I'm asking a lot of nitpicky questions, but it's necessary to avoid an underspecified function. "What Python used to do prior to the addition of true division in 2.2" is still underspecified if the function is ported to any language other than Python.

So your function should return func([-10,-11]) --> (-10-11)/2 = -11 under the rules of integer division.

Until now you have left "the rules of integer division" unspecified. Floor division in Python 2.2+ rounds toward negative infinity, but C integer division rounds toward 0. So you've added to the spec.

Re:average([10, 12.0]) by gnupun · 2014-03-20 06:49 · Score: 1

Please clarify the spec further: If the list contains an int and a float (such as average([10, 12.0])), should the result be returning a float or raising a ValueError?
No need to care about mixed types. But if you want to be that specific:
a) if all elements are ints, result should be an int
b) if any element is a float, result should be a float
c) if any element is complex, result should be complex.
Rule (c) has higher priority over (b), which has higher priority over (a).
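
One way rules (a) to (c) might be written in Python 3, as a sketch only. It leans on the fact that sum() already promotes int to float to complex, so the type of the total tells you which rule applies. It assumes floor division for the all-int case, which is what gives the -11 mentioned earlier in the thread; that choice is an assumption, not something the rules state.

```python
def average(seq):
    total = sum(seq)
    if isinstance(total, int):
        # Rule (a): all ints, so stay in ints (floor division assumed).
        return total // len(seq)
    # Rules (b) and (c): the total is already a float or a complex.
    return total / len(seq)

print(average([10, 11]))       # 10
print(average([-10, -11]))     # -11
print(average([10, 12.0]))     # 11.0
print(average([1, 2.0, 3j]))   # (1+1j)
```

An empty list raises ZeroDivisionError here, which the rules do not cover either way.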

Until now you have left "the rules of integer division" unspecified. Floor division in Python 2.2+ rounds toward negative infinity, but C integer division rounds toward 0. So you've added to the spec.
I don't care what method you use, but the result should match the output of a function/method implemented in C/C++, Java, or C# that averages an "int array."
For example:

public static int average(int[] arr) { int sum = 0; for (int i = 0; i < arr.length; i++) { sum += arr[i]; } return sum / arr.length; }

Implementing this function in C/C++ or C# is very similar. Your function's result should match the result of the Java average() method shown above.

average([2000000000]*4) by tepples · 2014-03-20 10:33 · Score: 1

If your spec is "do what Java does", then Java integer division rounds toward zero, and Java integer addition wraps within (signed) 32 bits. This means the naive Python 2 implementation gets it wrong too, and we're back to square one where we're scanning the list for being all ints and using the "act like Java" method if no float or complex types are found in the array.
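
A sketch of what "act like Java" could look like for the all-int case, assuming 32-bit signed wraparound on every addition and division that truncates toward zero. The function names (`wrap_int32`, `trunc_div`, `java_average`) are invented for this example.

```python
def wrap_int32(n):
    """Wrap an integer into the signed 32-bit range, as Java int addition does."""
    return (n + 2**31) % 2**32 - 2**31

def trunc_div(a, b):
    """Integer division that rounds toward zero, like Java and C."""
    q = abs(a) // abs(b)
    return q if (a >= 0) == (b >= 0) else -q

def java_average(arr):
    total = 0
    for x in arr:
        total = wrap_int32(total + x)  # overflow wraps silently
    return trunc_div(total, len(arr))

print(java_average([10, 11]))           # 10
print(java_average([-10, -11]))         # -10, not the -11 floor division gives
print(java_average([2000000000] * 4))   # -147483648
```

So matching the Java method means reproducing its overflow as well as its rounding, and neither Python 2's "/" on ints nor Python 3's "//" does that by itself.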
