John+Millikin · Slashdot Mirror

User: John+Millikin

John+Millikin's activity in the archive.

Stories: 0
Comments: 2
First seen: 2008-09-09
Last seen: 2008-10-03

Comments · 2

Re:String f**k up on Python 2.6 to Smooth the Way for 3.0, Coming Next Month · 2008-10-03 12:24 · Score: 4, Informative

Spoken like somebody that's never had to deal with encoding issues. Using UTF-8 internally is fine, but exposing it to the programmer is insane and error-prone. And if the programmer then proceeds to manipulate that raw byte buffer as a string, he's an idiot.

The proper solution is to use 8-bit strings, but any functions that care (such as I/O) should treat them as being UTF-8. Most functions do not care and thus the treatment of "Unicode" and "bytes" is the same.
You might not be aware of this, but computers are used for more than just transmitting text. I don't want my binary streams being rewritten to gibberish because some I/O routine was written to be too clever. Furthermore, not every system uses UTF-8. Some may even need to send data over a *gasp* network! Good luck getting every other computer in the world to start using UTF-8 immediately.

The problem with UTF-16 is you cannot losslessly convert a string that *might* be UTF-8 to UTF-16 and then back again. This is because any illegal UTF-8 byte sequences will be lost or altered.
If you try to convert bytes that aren't in UTF-8 using a UTF-8 codec, an error will be raised. This behavior is proper -- if you don't know what format your input is in, there's no way to perform text-based operations on it.
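
As a minimal sketch of that behavior, assuming a Python 3 interpreter (the sample bytes and variable names are made up for illustration): strict decoding rejects a byte sequence that is not valid UTF-8, while lenient decoding with `errors="replace"` accepts it but cannot round-trip the original bytes.

```python
# Hypothetical input: 0xff can never appear in well-formed UTF-8.
raw = b"caf\xff"

# Strict decoding (the default) refuses the input outright.
try:
    raw.decode("utf-8")
    strict_failed = False
except UnicodeDecodeError:
    strict_failed = True

# Lenient decoding substitutes U+FFFD for the bad byte,
# so re-encoding does not reproduce the original bytes.
lossy = raw.decode("utf-8", errors="replace")
round_tripped = lossy.encode("utf-8")

print(strict_failed)          # whether the strict decode raised
print(round_tripped == raw)   # whether the lenient round trip was lossless
```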

This has been a real pain so far in our use of Python, and I am quite alarmed to see that they are changing the meaning of plain quotes in 3.0 to "Unicode".
Every developer I know uses Unicode strings already. The new behavior is just one less character to type in front of literals.
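
A small sketch of what the literal change means in practice, assuming Python 3.3 or later (where the `u` prefix is accepted again; the sample strings are made up for illustration): an unprefixed literal is text, and only a `b` literal is bytes.

```python
plain = "caf\u00e9"      # unprefixed literal: text (str) in Python 3
prefixed = u"caf\u00e9"  # u prefix is still accepted and means the same thing
binary = b"cafe"         # b prefix: raw bytes, ASCII-only literal

# Both text literals have the same type and compare equal.
same_type = type(plain) is type(prefixed)

# Text length counts code points; the encoded form counts bytes.
text_len = len(plain)
utf8_len = len(plain.encode("utf-8"))

# Text and bytes never compare equal without an explicit encode or decode.
mixed_equal = (binary == "cafe")

print(same_type, text_len, utf8_len, mixed_equal)
```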

This is really a serious step backwards, as we will be forced to tell anybody using our system to put 'b' before all their string constants
Otherwise said as: "We're too stupid to fix the glaring encoding errors in our product, so we'll just use bytes everywhere and pretend it's all working". Also, Unicode strings in Python are implemented with either UTF-16 or UCS-4 depending on platform.

Re:The real question is ignored here... on Why Mozilla Is Committed To Using Gecko · 2008-09-09 16:06 · Score: 2, Informative

This article ignores the real question: Why change? I personally see nothing 'outdated' or 'bloated' about Gecko, and there is no point in changing if Webkit provides no real advantage.
Have you ever tried to embed Gecko into an existing program? It's an absolute nightmare. All popular Gecko-using applications are actually written *in* XPCOM because it's easier to write an entire browser in XUL and JavaScript than to try to bind Gecko to a sane language like C or Python.