spitzak · Slashdot Mirror

Re:2.7.4 on Python Family Gets a Triplet Of Updates · 2013-04-08 13:12 · Score: 1

You seem to have forgotten that a string is an ARRAY. If you append a multiple of 64 bits of garbage to an array of 64-bit doubles, you get a larger array of 64-bit doubles (in case you can't figure out how to do this in your favorite language, imagine binary I/O of a data structure containing such an array). I am unaware of any language where it turns into "not a 64-bit double array" because one of the doubles has an invalid bit pattern. In fact, I/O in modern systems prints "NaN", or even the entire bit pattern in hex, for invalid numbers, because the programmers realize that this "bad" data is important and has to be preserved!
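The analogy is easy to check in Python with the standard `array` module. This is a minimal sketch, assuming `array("d")` items are 8-byte IEEE 754 doubles, which is the case on common platforms. Appending eight bytes of arbitrary garbage just yields one more element, and its raw bits survive even though that element is a NaN.

```python
import math
from array import array

# An array of 64-bit doubles (itemsize is 8 on common platforms).
doubles = array("d", [1.0, 2.5])

# Append 8 bytes of arbitrary "garbage": the result is still an array of doubles.
garbage = b"\xff" * 8
doubles.frombytes(garbage)

print(len(doubles))                  # 3
print(math.isnan(doubles[2]))        # True: this bit pattern is a NaN
print(doubles.tobytes()[-8:].hex())  # ffffffffffffffff: original bits preserved
```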

So your attempt to find an example is completely wrong and further proves my point.

Re:2.7.4 on Python Family Gets a Triplet Of Updates · 2013-04-08 10:38 · Score: 1

How to encode invalid UTF-8 into UTF-16 for NTFS filenames

Python itself is the originator of the most popular mapping that preserves errors (the "surrogateescape" error handler from PEP 383). In this case the error bytes (which always have the high bit set and are thus in the range 0x80-0xFF) are mapped to the UTF-16 code units 0xDC80..0xDCFF. These are unpaired surrogates, so the UTF-16 string is technically invalid as well, which is a nice property.
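In Python the mapping is available as the `surrogateescape` error handler. Combining it with `surrogatepass`, which lets lone surrogates through the UTF-16 codecs (Python 3.4+), gives a lossless conversion between a byte filename and UTF-16-LE code units. This is a sketch only: the two helper names are hypothetical, and no actual NTFS calls are made.

```python
def utf8_name_to_utf16(name: bytes) -> bytes:
    """Convert a possibly-invalid UTF-8 name to UTF-16-LE code units.

    Each invalid byte 0xNN (0x80-0xFF) becomes the lone surrogate U+DCNN.
    """
    text = name.decode("utf-8", errors="surrogateescape")
    return text.encode("utf-16-le", errors="surrogatepass")


def utf16_name_to_utf8(raw: bytes) -> bytes:
    """Reverse the mapping: U+DC80..U+DCFF turn back into the original bytes."""
    text = raw.decode("utf-16-le", errors="surrogatepass")
    return text.encode("utf-8", errors="surrogateescape")


original = b"report\xff.txt"            # 0xFF can never appear in valid UTF-8
wide = utf8_name_to_utf16(original)
print(wide[12:14].hex())                # ffdc: code unit 0xDCFF, little-endian
print(utf16_name_to_utf8(wide) == original)  # True
```

A UTF-16 name containing any other lone surrogate (for example U+D800) has no byte to map back to, so `utf16_name_to_utf8` raises `UnicodeEncodeError` for it.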

These are not unique mappings, as most UTF-8 to UTF-16 converters will convert the 3-byte UTF-8 encoding of these code points to the same code units. The Unicode standard says these 3-byte sequences should be treated as errors, but I am not convinced this is a good idea, as it will break data stored by CESU-8 (broken UTF-8 encoders that did not decode UTF-16 to code points before encoding UTF-8). Since NTFS already maps more than one string to the same result (for instance '\' and '/' map to the same file, and trailing spaces are ignored), I'm not sure the multiple mapping is a real problem. You are correct that the only 100% proper solution is to preserve the original byte values, but there are a zillion other problems with Windows file systems and this seems pretty minor.

Somewhat more dangerous, but perhaps more user-friendly, is to convert the error bytes to the equivalent Unicode characters from the CP1252 code page. This would preserve compatibility with the existing 1-byte API in Windows, except in the incredibly rare occurrence that a sequence of CP1252 characters matches a valid UTF-8 sequence.
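One way to sketch the CP1252 fallback in Python is a custom decode error handler registered with `codecs.register_error`. The handler name `cp1252_fallback` is hypothetical. Python's `cp1252` codec leaves five bytes undefined, and falling back to the Latin-1 code point for those is an assumption of this sketch.

```python
import codecs


def _cp1252_fallback(exc: UnicodeError):
    """Decode error handler: replace ONE bad byte with its CP1252 character."""
    if not isinstance(exc, UnicodeDecodeError):
        raise exc
    byte = exc.object[exc.start : exc.start + 1]
    try:
        char = byte.decode("cp1252")
    except UnicodeDecodeError:
        # 0x81, 0x8D, 0x8F, 0x90 and 0x9D are undefined in CP1252;
        # assumption: fall back to the Latin-1 code point for those.
        char = byte.decode("latin-1")
    # Resume at the very next byte so a following valid sequence is not swallowed.
    return char, exc.start + 1


# Hypothetical handler name chosen for this sketch.
codecs.register_error("cp1252_fallback", _cp1252_fallback)

print(b"caf\xc3\xa9".decode("utf-8", "cp1252_fallback"))  # café (valid UTF-8)
print(b"caf\xe9".decode("utf-8", "cp1252_fallback"))      # café (CP1252 byte)
print(b"\x93hi\x94".decode("utf-8", "cp1252_fallback"))   # “hi”
```

The ambiguity the comment mentions is visible here: the CP1252 text "Ã©" is the byte pair C3 A9, which is valid UTF-8 and therefore decodes as "é".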

HFS+ does not expose UTF-16; its API takes UTF-8 (internally it stores names as decomposed UTF-16). It has problems with enforcing normalization, however. This is similar to all the bugs caused by case-folding on older Windows file systems (HFS+ should instead match files using normalization but preserve the name's original form, similar to how modern Windows file systems match case-insensitively but preserve case).
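The "match by normalization, preserve the original" rule can be modeled in a few lines. This is a toy in-memory sketch, not how HFS+ or any real file system is implemented. The class name is hypothetical, and using NFC as the matching key is an assumption; any single normalization form would work.

```python
import unicodedata
from typing import Dict, Optional


class NormalizationPreservingDirectory:
    """Toy directory: matches names by NFC form but stores them as given."""

    def __init__(self) -> None:
        self._names: Dict[str, str] = {}  # NFC key -> name as first spelled

    def create(self, name: str) -> str:
        key = unicodedata.normalize("NFC", name)
        # An equivalent existing name is reused, like a case-insensitive but
        # case-preserving file system reusing an existing file.
        return self._names.setdefault(key, name)

    def lookup(self, name: str) -> Optional[str]:
        return self._names.get(unicodedata.normalize("NFC", name))


d = NormalizationPreservingDirectory()
d.create("cafe\u0301.txt")                        # decomposed spelling
print(d.lookup("caf\u00e9.txt") == "cafe\u0301.txt")  # True: found, spelling kept
```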

Re:2.7.4 on Python Family Gets a Triplet Of Updates · 2013-04-07 18:07 · Score: 1

You do need to know that strings in other encodings are in those encodings. However, allowing a string to contain an arbitrary sequence of bytes, rather than limiting it to valid UTF-8, lets you defer the decision about how to handle the alternative encoding until the string is actually used, rather than having to make it during storage and copying. This is usually a lot easier, and failures are more predictable and understandable.

It is also extremely easy to distinguish UTF-8 from another encoding, because a huge majority of byte patterns with the high bit set are not valid UTF-8 (96% for 2-byte sequences, and higher for longer sequences). So if there is only one alternative encoding, it is quite feasible to detect it at run time without an encoding tag, as text in that encoding will have a greater number of errors than correctly-encoded non-ASCII UTF-8 characters. This does not help if there is more than one non-UTF-8 encoding, as those are difficult to distinguish from each other.
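The detection rule above (more errors than correctly-encoded non-ASCII characters means "probably the other encoding") can be sketched with `surrogateescape`, which marks every undecodable byte as a code point in U+DC80..U+DCFF. The function name `probably_utf8` is hypothetical, and the simple "errors <= good" threshold is an assumption taken directly from the comment's wording, not a tuned heuristic.

```python
def _is_escaped_byte(ch: str) -> bool:
    # surrogateescape maps each undecodable byte 0x80-0xFF to U+DC80-U+DCFF;
    # valid UTF-8 can never produce these code points.
    return 0xDC80 <= ord(ch) <= 0xDCFF


def probably_utf8(data: bytes) -> bool:
    text = data.decode("utf-8", errors="surrogateescape")
    errors = sum(1 for ch in text if _is_escaped_byte(ch))
    good = sum(1 for ch in text if ord(ch) > 0x7F and not _is_escaped_byte(ch))
    # Pure ASCII (no errors, no non-ASCII characters) counts as UTF-8, which it is.
    return errors <= good


print(probably_utf8("café naïve".encode("utf-8")))   # True
print(probably_utf8("café naïve".encode("cp1252")))  # False
```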

Re:2.7.4 on Python Family Gets a Triplet Of Updates · 2013-04-07 14:05 · Score: 1

A UTF-8 string with an "error" in it is NOT a "not an UTF-8 string object". It absolutely is a UTF-8 string, but it has an error in it. This should be plenty obvious from simple transforms: appending an error to a UTF-8 string can't possibly change its type to "not UTF-8", because if you take a substring near the start it suddenly IS UTF-8 again, which is not true of a byte array, which would remain a byte array!

More to the point, if there is a mistake it is MUCH more useful to display as much of the string as possible as UTF-8, with error indicators for the bad bytes. Suddenly turning ALL your text into garbage because of a single byte error is incredibly stupid!

And there is a very clear and unambiguous interpretation of invalid UTF-8. The Unicode standard even defines some explicit rules, in particular that an error must not consume any of the next sequence that forms a valid code point. Some of Python's converters define an error even more explicitly: an error is one byte, and parsing continues by trying the next byte. Python even has a somewhat data-preserving conversion to UTF-16 where it turns the 128 possible error bytes (0x80..0xFF) into 128 different "errors" in UTF-16 (unpaired surrogates in the range 0xDC80..0xDCFF). This was done because at least a few people working on Python realized that the strings are USELESS if we cannot defer checking for errors until after our data manipulations are done.
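A small demonstration of the one-byte-per-error behavior, using Python's `surrogateescape` handler. A truncated 3-byte sequence followed by a valid character shows that the error does not swallow the valid character, and that the mapping round-trips exactly. The sample bytes are made up for illustration.

```python
# A truncated 3-byte sequence (E2 82) directly followed by a valid "A".
bad = b"abc\xe2\x82A"

text = bad.decode("utf-8", errors="surrogateescape")
print([hex(ord(ch)) for ch in text])
# ['0x61', '0x62', '0x63', '0xdce2', '0xdc82', '0x41']: each bad byte becomes
# its own lone surrogate, and the "A" after the error is not consumed.

# The mapping is reversible, so no information is lost.
print(text.encode("utf-8", errors="surrogateescape") == bad)   # True

# Exactly 128 possible error values: bytes 0x80-0xFF <-> U+DC80-U+DCFF.
escapes = {bytes([b]).decode("utf-8", "surrogateescape") for b in range(0x80, 0x100)}
print(len(escapes), hex(ord(min(escapes))), hex(ord(max(escapes))))  # 128 0xdc80 0xdcff
```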

What Python should do in 3.3 is support arbitrary byte arrays as UTF-8. This could be done in the 3.3 string implementation by allowing arbitrary bytes in its UTF-8 representation, and adding an iterator for the string so you can look at each code point without copying it to a UTF-32 array. The iterator could also return more information than a code point number, such as the fact that it is currently at a UTF-8 or UTF-16 error. Iterators would also allow you to visit the code points in various normalization forms, which would help a lot in avoiding string copying.
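Python has no such iterator, but the idea can be sketched in pure Python. The hypothetical `iter_utf8` below walks the raw bytes in place, without building a decoded copy of the whole string. For each valid sequence it yields a code point. For each byte that cannot start a valid sequence it yields an error marker, following the one-byte-per-error rule. Iteration in normalization forms is not attempted here.

```python
from typing import Iterator, Tuple


def iter_utf8(data: bytes) -> Iterator[Tuple[str, int]]:
    """Yield ("cp", code_point) for each valid sequence and ("error", byte)
    for each byte that cannot start one; never raises."""
    i, n = 0, len(data)
    while i < n:
        lead = data[i]
        if lead < 0x80:                       # ASCII
            yield ("cp", lead)
            i += 1
            continue
        if 0xC2 <= lead <= 0xDF:
            length, cp, minimum = 2, lead & 0x1F, 0x80
        elif 0xE0 <= lead <= 0xEF:
            length, cp, minimum = 3, lead & 0x0F, 0x800
        elif 0xF0 <= lead <= 0xF4:
            length, cp, minimum = 4, lead & 0x07, 0x10000
        else:                                 # stray continuation or invalid lead
            yield ("error", lead)
            i += 1
            continue
        ok = i + length <= n
        if ok:
            for j in range(i + 1, i + length):
                if not 0x80 <= data[j] <= 0xBF:
                    ok = False
                    break
                cp = (cp << 6) | (data[j] & 0x3F)
        # Reject overlong forms, UTF-16 surrogates and values past U+10FFFF.
        if ok and cp >= minimum and not 0xD800 <= cp <= 0xDFFF and cp <= 0x10FFFF:
            yield ("cp", cp)
            i += length
        else:
            yield ("error", lead)             # one byte per error; retry next byte
            i += 1


for kind, value in iter_utf8(b"a\xc3\xa9\xff\xe2\x82A"):
    print(kind, hex(value))
```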

Re:2.7.4 on Python Family Gets a Triplet Of Updates · 2013-04-07 09:40 · Score: 1

Concrete example: You have a filesystem that stores filenames as strings of bytes, and any byte sequence is supported. You want to encourage use of UTF-8 for these filenames. Figure out how to write something in Python that can rename a file with a "bad" filename (i.e. invalid UTF-8) to have a "good" filename that is valid UTF-8. You must use the Unicode API.

You will quickly find that this is impossible. Instead you have to use some other API, one THAT IS NOT UNICODE. That is absolutely the wrong direction. Now you need two versions of your filename renamer. Perhaps you need four versions if you want to be able to pass Unicode to any name that is valid Unicode, rather than being forced to use raw bytes.
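For comparison, here is roughly what the bytes-based "other API" version looks like. Passing a `bytes` path to `os.listdir` returns raw `bytes` names. The helper names are hypothetical. Guessing that the bad names are Latin-1 is an assumption, and the sketch does no collision checking.

```python
import os
from typing import List, Tuple


def plan_utf8_renames(names: List[bytes], fallback: str = "latin-1") -> List[Tuple[bytes, bytes]]:
    """Return (old, new) pairs for every name that is not valid UTF-8."""
    plan = []
    for raw in names:
        try:
            raw.decode("utf-8")               # already "good": leave it alone
        except UnicodeDecodeError:
            # Assumption: bad names are in a single-byte legacy encoding.
            fixed = raw.decode(fallback).encode("utf-8")
            plan.append((raw, fixed))
    return plan


def rename_to_utf8(directory: bytes) -> None:
    # bytes in, bytes out: this is the non-Unicode API.
    for old, new in plan_utf8_renames(os.listdir(directory)):
        os.rename(os.path.join(directory, old), os.path.join(directory, new))


print(plan_utf8_renames([b"ok.txt", b"caf\xe9.txt"]))
```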

If Python just did not complain and passed invalid UTF-8 around, we could stop using other encodings. With the 3.3 string implementation's UTF-8 option, this would be entirely possible and cleanly done.

Re:2.7.4 on Python Family Gets a Triplet Of Updates · 2013-04-07 09:35 · Score: 1

If you have a string of bytes that MAY be UTF-8, it can be used in an incredible number of ways. For instance it can be the name of a file. It can be data in that file. It can be copied from one file to another. It can be passed to some code that checks to see if it is UTF-8.

You seem to be under the impression that somehow I want ASCII. That is absolutely false, and the problem is that this refusal to handle invalid sequences is by far the biggest thing FORCING the use of ASCII (or other 8-bit codes where there are no "errors", so you can be pretty safe using the strings for filenames). Your belief that somehow the string cannot be used unless it "conforms to an encoding scheme" is one of the biggest impediments to I18N. In no other area would somebody say that blocks of data must conform to a regexp JUST TO BE COPIED!!!! That is insane, but that seems to be what we get when morons get ahold of Unicode.

Re:2.7.4 on Python Family Gets a Triplet Of Updates · 2013-04-07 06:26 · Score: 1

A killer with Python 3 is the stupid handling of strings.

A string should be a sequence of byte values. If they *happen* to be UTF-8 then it "is Unicode", but I should not be prevented from using my string just because it does not match a complex pattern called "valid UTF-8". Nobody in the history of computer science would ever have made a scheme where the simple act of *looking* at data that has been successfully read from an outside source can throw an exception, but for some reason Unicode and UTF-8 turn otherwise intelligent people, such as Guido, into the most incredible morons or idiot savants.

At least Python 2 does not mangle the strings when you do this, though you have to be careful when you actually want to print them. Python 3 actually makes it impossible to store invalid UTF-8 in a "string". This is counter-productive, as we are forced to use byte arrays for all strings, including ones that are valid UTF-8. That is NOT encouraging use of Unicode, it is going backwards!

Re:2.7.4 on Python Family Gets a Triplet Of Updates · 2013-04-07 06:19 · Score: 1

It seems to me that all that is needed is special handling of a space after the first word. The following statement:

a b

is treated exactly the same as

a(b)

Then old print would be supported since it would turn into the new print() function. It would also let "help foo" work, which is something I keep typing wrong for some reason.

I'm sure there are some odd interactions with the parsing of obscure syntax that need to be figured out, but since all the obvious tests I can do produce syntax errors right now, I suspect rules can be made so this works for all the uses people want.
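The rule, together with the observation that these lines are currently syntax errors, can be prototyped as a source rewrite. This is a sketch under stated assumptions: `rewrite_command` is a hypothetical helper, it handles only a single unindented line, and it rewrites `a b` to `a(b)` only when the original fails to compile and the rewritten line does compile. A trailing `# comment` on the line makes the rewrite fail, so such lines are left unchanged.

```python
import keyword
import re

_COMMAND = re.compile(r"^([A-Za-z_]\w*)[ \t]+(\S.*)$")


def rewrite_command(line: str) -> str:
    """Rewrite 'a b' as 'a(b)', but only when 'a b' is currently a syntax error."""
    try:
        compile(line, "<line>", "exec")
        return line                      # already valid Python: leave it alone
    except SyntaxError:
        pass
    match = _COMMAND.match(line)
    if not match or keyword.iskeyword(match.group(1)):
        return line
    name, rest = match.groups()
    candidate = f"{name}({rest})"
    try:
        compile(candidate, "<line>", "exec")
    except SyntaxError:
        return line                      # rewriting doesn't help; keep the error
    return candidate


print(rewrite_command("print 'hello'"))  # print('hello')
print(rewrite_command("help foo"))       # help(foo)
print(rewrite_command("x = 1"))          # x = 1 (unchanged)
```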

Re:batteries are not rechargable on Israeli Firm Makes Kilomile Claims For Electric Car Battery Tech · 2013-04-04 06:35 · Score: 1

It sounds like the electricity is "stored" in the fact that there is pure aluminum in there. Therefore the "electrical generation" is really "aluminum smelting" (which does use a lot of electricity, so this makes sense) and the manufacture of the battery. Basically the battery comes out of the factory fully-charged. It also sounds like the used battery is pretty much really clean aluminum ore and thus the recycling would be done at the same place. Both could be powered by your nuclear plants.

Re:Unanswered questions... on Remote Desktop Backend Merged into Wayland · 2013-04-03 09:19 · Score: 2

Does a GUI have to be running on the remote server?

No, the clients on the remote machine will be talking to a different Wayland server, one that only sends RDP to a remote display.

Yes, there is special code running on the remote machine, but even X forwarding requires Xlib, and the code that turns a socket connection into actual network packets, to be running on the remote machine. And modern programs require freetype and cairo and libpng and lots of other code that you think of as "GUI" running on the remote machine. X is not magical despite what some here seem to believe.

Re:Fahrvergnügen on A German Parking Garage Parks Your Car For You · 2013-03-28 07:08 · Score: 1

I remember in England the speed camera locations are clearly marked by patterns painted on the roads. I took a taxi in the early morning to the airport, and the driver pretty much slammed on the brakes as he entered these areas (which I thought were very long pedestrian crossings) and accelerated afterwards, which is how I learned this. There are also signs beforehand showing a camera.

Makes sense if the purpose of the cameras is to actually get the speed down, rather than to collect revenue.

Re:Makes sense to me on PlanetIQ's Plan: Swap US Weather Sats For Private Ones · 2013-03-26 04:30 · Score: 1

Why should the USA's taxpayers be funding the weather data collection for the entire globe

For an added cost they will design new technology so that the satellites only orbit over the United States, and special barriers so that weather outside the United States has no effect on weather inside it.

Re:Barn Door on Political Pressure Pushes NASA Technical Reports Offline · 2013-03-21 13:12 · Score: 2

According to the movies, when you delete a file it vanishes simultaneously from every screen that is viewing a copy of it. I'm sure that is what the people who proposed this are relying on.

Re:Don't Be Sony on GoPro Issues DMCA Takedown Over Negative Review · 2013-03-21 13:04 · Score: 1

He's trying to avoid the DMCA take-down notice.

Re:Communism failed? on CCTV Hack Takes Casino For $33 Million · 2013-03-18 10:36 · Score: 1

I think it has been "attempted". Even Marx said that you had to go through a government-controlled stage before the workers' paradise would magically appear. We have several cases of the government-controlled stage, and I think these all count as "attempts". The fact that they did not turn into a workers' paradise means the attempt failed, not that it did not happen. This is like saying that perpetual motion was "never attempted" because every experiment stopped at some point before perpetual motion happened.

PS: "Libertarian Paradise" has never been "attempted" either. That is because it is just as much of a fantasy as pure communism.

This is all just a lot of posturing on Defcad.com Wants To Be the Google of 3D-Printable Guns · 2013-03-12 06:10 · Score: 1

1. There are plenty of people with access to a machine shop and the correct skills to build a gun right now. And they can build *all* the parts, including ones exposed to gasses and pressures different than ambient air. This adds nothing new.

2. If in fact home 3D printing gets to the point that you can actually manufacture a working gun (not just a "part"), then it is also going to be able to manufacture replacement car parts, replacement parts for other machines, or entire machines. Then they are going to get attacked by the people who own the copyrights and trademarks on those designs and who are relying on their control of the design for their own profit. I predict these guys will fold in a week and remove all trademark-infringing content, which will probably put in perspective the relative power of the "evvvvil govmint is trying to take my guns" versus other things in the current world.

Re:Microsoft docs on Developers May Be Getting 50% of Their Documentation From Stack Overflow · 2013-03-05 13:06 · Score: 1

Of course, you're more likely to get a "sprintf(buffer, "DEL %s", filename); system(buffer)" type answer from Stackoverflow...

That is a great example of bad stuff you can get from Stackoverflow. When the only answer is something like this, I cannot tell whether it was posted by an idiot, or posted by a really smart person who knows that this is the *only* method available (but didn't bother to say this fact, because to him it is "obvious").

Re:Blame Google on Developers May Be Getting 50% of Their Documentation From Stack Overflow · 2013-03-05 12:59 · Score: 1

I do this too. I long ago gave up on having the search box on MSDN ever go anywhere I expected, and just type the same query into Google and pick the MSDN link.

Surely they could improve this. Just copy some tech from Bing or something...

Re:A planet full of windfarms could power half the on Study Suggests Generating Capacity of Wind Farms At Large Scales Overestimated · 2013-02-25 13:06 · Score: 3, Informative

Huh? The article says 'If we were to cover the entire Earth with wind farms, he notes, "the system could potentially generate enormous amounts of power, well in excess of 100 terawatts"'.

Re:What is this nascar? on NASCAR Tries To Squelch Video of Spectators Injured By Crash · 2013-02-24 12:46 · Score: 1

No, it's network attached storage car analogy.

Re:Gross? on NASCAR Tries To Squelch Video of Spectators Injured By Crash · 2013-02-24 12:38 · Score: 1

Don't be an idiot. The audio pretty clearly has an "Oh! Here we go!" and "alright" and many other exclamations of joy, which are clearly timed to when the crash started, long after the cars became visible. The announcer on the speaker, however, seemed to be much more into the actual racing and shut up when the crash started.

Asteroid! on Planetary Resources To 'Claim' Asteroids With Beacons · 2013-02-20 12:17 · Score: 1

As shown in this Super-8 film made in about 1980, the claim beacon can be defeated by blocking its radio transmissions. But if the miner has a nuclear license, then watch out!

Please excuse the bad acting:

http://www.youtube.com/watch?v=mPaPe3aJEPI

Re:charge trains?? on Wirelessly Charged Buses Being Tested Next Year · 2013-02-19 08:35 · Score: 1

Maybe it can still accelerate, just slowly.

Re:Why not popular? on Wirelessly Charged Buses Being Tested Next Year · 2013-02-19 08:32 · Score: 1

Are you sure the parking is free?

Even if validated with a 3 hour time limit, such a parking lot in the city is going to fill to capacity immediately and Best Buy is not going to be getting any of those customers.

Re:That's funny.... on Are Plastic Bag Bans Making People Sick? · 2013-02-18 06:53 · Score: 3, Insightful

I agree something is fishy about this. Where I am, the ban does not cover the handle-less cellophane bags that are on a big roll in the produce department. Virtually everybody uses these for produce. I think the cashiers would be very unhappy if you brought loose produce to the checkout, at least for items that can be contaminated this way (i.e. I don't put a pineapple in a bag).

Comments · 5,741