spitzak · Slashdot Mirror

Re:So? on Ruby 1.9.1 Released · 2009-02-01 04:05 · Score: 3, Interesting

An ASCII string can be mapped to a Unicode string. For each byte in an ASCII string there is a matching Unicode symbol.

How should string compare work? Well for simple ==, I think it should fail if the encodings are different, as they really are different. It can also fail if the two strings encode the same Unicode but in different ways (this is possible in some encodings).

There can however be a more complex call that converts both strings to Unicode and compares them. One huge problem with most current implementations, including Python, is absolutely brain-dead (and "politically correct") handling of invalid UTF-8, where invalid encodings throw errors, which makes use of UTF-8 actually impossible for non-trivial programs. Instead it should never throw errors. Error bytes should represent something unique in Unicode; one popular proposal is U+D8xx (which is also an "error" in UTF-16).
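A minimal sketch of the "never throw" idea in Python, assuming a version that has the `surrogateescape` error handler (it was added in Python 3.1, after this comment was written). Note that it maps each undecodable byte to a lone surrogate in the U+DC80..U+DCFF range rather than the U+D8xx range mentioned above. The sample bytes are made up for illustration.

```python
# Not valid UTF-8: 0xFF can never appear, and the final 0xC3 lacks its continuation byte.
raw = b"caf\xc3\xa9 \xff bad \xc3"

# "strict" (the default) would raise here; "surrogateescape" never does.
text = raw.decode("utf-8", errors="surrogateescape")

# Each bad byte became a unique lone surrogate: U+DC00 + byte value.
bad = [hex(ord(ch)) for ch in text if 0xDC80 <= ord(ch) <= 0xDCFF]
print(bad)

# The round trip is lossless, so the original bytes can still be handed back to the OS.
assert text.encode("utf-8", errors="surrogateescape") == raw
```

The valid part of the string decodes normally, and only the two broken bytes are carried along as error code points.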

A problem Python (and a lot of other programs) have is that they think "Unicode" means "some sort of encoding where each Unicode symbol takes the same space". This requires 21 bits per symbol, and the only practical way to support this is to use 32 bits. Almost immediately they run into difficulty in that Windows uses UTF-16, so they punt and use UTF-16, basically abandoning their entire idea. They should instead use UTF-8, which has enormous advantages: UTF-8 errors are not lost and can be translated differently when finally used (ie for display turning them into CP1252 is better, but for a filename turning them into the above error codes is better), all other encodings (which are byte based) can be stored in the same object, and no translation is needed when you load UTF-8 data. Also UTF-16 (including invalid UTF-16) can be losslessly translated to UTF-8 and back, so there is no problem supporting Windows backends either.
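Two of the claims above can be checked in a few lines of Python. This is only a sketch: it assumes a Python 3 recent enough to accept the `surrogatepass` error handler on the UTF-16 codecs, and the byte sequence is an invented example. The UTF-8 it produces for the lone surrogate is a generalized form that strict decoders reject.

```python
# The highest code point is U+10FFFF, which needs 21 bits.
assert (0x10FFFF).bit_length() == 21

# Invalid UTF-16: a lone high surrogate (0xD800) with no low surrogate after it.
units = b"\x41\x00\x00\xd8\x42\x00"   # 'A', 0xD800, 'B' as UTF-16-LE code units

# "surrogatepass" lets the lone surrogate through instead of raising.
text = units.decode("utf-16-le", errors="surrogatepass")
as_utf8 = text.encode("utf-8", errors="surrogatepass")

# Back again: the original code units are recovered exactly.
back = as_utf8.decode("utf-8", errors="surrogatepass").encode(
    "utf-16-le", errors="surrogatepass"
)
assert back == units
```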

My mom on Happy 25th, Macintosh! · 2009-01-23 13:33 · Score: 1

I remember borrowing one from work so I could show my family how a gui would work.

My mom had an interesting misunderstanding of the mouse: she kept moving it as though it moved the screen, while the pointer would stay still, ie she moved it backwards from how you would. This seemed quite logical to her and she did it repeatedly.

I think it is interesting that a lot of things are not as intuitive as you think.

Re:Not a vulnerability on Trojan Hides In Pirated Copies of Apple iWork '09 · 2009-01-23 07:37 · Score: 1

As the parent quite rightly pointed out, the "nagging" is due to the application writers, not Microsoft. If the applications stopped doing things that required Admin privileges the UAC would stop popping up.

You could say that Unix had this problem too (mostly fixed nowadays, btw): lots of programs had the setuid bit turned on. This was because they needed to do one little thing (a typical one was to update utmp) that required root, and the programmer was too lazy to either work around it or to fix the system so root was not needed (for the utmp example, a solution would be to just not bother doing it, who cares if "who" does not show all your terminal emulators?). I think there was even a UAC equivalent: programs would literally quit with an error saying "please change the executable to setuid".

Re:Quicklaunch? on Windows 7 Taskbar Not So Similar To OS X Dock After All · 2009-01-23 06:46 · Score: 1

As I mentioned in the other reply, the huge change is that the running icon *replaces* the launch icon. This is certainly copied from OSX. The icons you are talking about are just more start menu and not part of the dynamic section of the taskbar.

Re:Oh come on, now on Windows 7 Taskbar Not So Similar To OS X Dock After All · 2009-01-23 06:43 · Score: 1

The *replace with the running application* is what makes these part of the taskbar and thus copied from OSX. I never considered those icons to be part of the "taskbar", they are more part of the "start menu" section. In fact can't you make them pop up submenus?

Enabling the text is an excellent idea as it will make it possible to distinguish running and non-running apps, and even know which document is being worked on. Microsoft should switch to this and probably scrap their current scheme for showing the running apps (it is seriously flawed as a non-running app between two running ones looks nearly identical to three running ones in a row). Programs that really think they work without text can set the text to a blank string.

I think Microsoft should get all those indicators (the battery, etc) into the taskbar as well. They really do indicate running programs. This would be much better and consistent than the Mac which put these things on the menubar. They should also maybe take some ideas from Gnome and make the "start" be a sort of running application. Gnome kind of has the right idea but they think the "all the visible windows" is a single "thing" to put on the taskbar, when instead it should be a dynamic collection of many "things", one per window.

Re:Windows itself is a vulnerability. on US-CERT Says Microsoft's Advice On Downadup Worm Bogus · 2009-01-22 16:14 · Score: 1

I'm saying that if the worm can execute "bash foo.sh" it can also execute "rm -rf ~" and other bad stuff.

Oh come on, now on Windows 7 Taskbar Not So Similar To OS X Dock After All · 2009-01-22 15:12 · Score: 4, Insightful

The obvious change in the new Windows Taskbar is that there are icons for non-running-applications. I don't care how you try to word it, that is the major difference between the OSX Dock and the Windows Taskbar. So Damn right it is copying it.

But is that really bad? Yes they copied good ideas, and perhaps made their own improvements to it. But that is how we get better software! Is this somehow wrong when Microsoft does it? You mean you really want Look & Feel Patents and Lawsuits? Don't be idiotic!

And the Microsoft astroturfers should not be showing such knee-jerk stupid reactions. Why not say *proudly* "we copied good ideas and improved on them even more!" instead of convoluted arguments that somehow they did not copy it.

Re:You'll still have to keep ahead of the tide on US-CERT Says Microsoft's Advice On Downadup Worm Bogus · 2009-01-22 11:06 · Score: 1

I would think any such browser exploit could be more easily taken advantage of by just putting the page on the web and getting people to visit it.

Re:Hmmm... on US-CERT Says Microsoft's Advice On Downadup Worm Bogus · 2009-01-22 08:45 · Score: 1

Good point, if it appears to work on current ones. Do those flags help, or is it mostly that the file is a directory? I would think if the virus writer did the work to get rid of a directory, they would also think to ignore any protection those flags provide. If those flags do provide added protection against rmdir, it might help to *not* turn them on, as maybe some will notice that they have to delete the file but not think to do whatever is needed to avoid the flags. Then you can use the flags later once you start seeing ones delete the file.

I'm guessing that just a regular file with the hidden bit set will not work, as that is something the virus writers are doing already, and each virus wants to wipe out the other ones and put their own on.

Re:What DRM is that? on US-CERT Says Microsoft's Advice On Downadup Worm Bogus · 2009-01-22 06:49 · Score: 1

You (and perhaps some other astroturfers) keep coming up with the bogus argument that "without DRM there would be no HDDVD playback on Vista".

How about this for a scenario: Microsoft could have said "fuck you we will output unencrypted all the time because it will make our product a good deal more reliable, faster, and useful. If BluRay does not like it, well I think HDDVD might be happy that only their disks play on Windows computers".

The HD consortium would have rolled over in a minute and we would not have DRM cluttering up the Windows drives and we could have working switches between the computer and the monitor.

It is fascinating how you people somehow ignore this possibility. Working off a script, I guess?

Re:Hmmm... on US-CERT Says Microsoft's Advice On Downadup Worm Bogus · 2009-01-22 06:35 · Score: 1

What makes you think the virus writers did not think to delete the file before trying to write the new one? Are you assuming they are stupid?

Re:You'll still have to keep ahead of the tide on US-CERT Says Microsoft's Advice On Downadup Worm Bogus · 2009-01-22 06:31 · Score: 1

I think autorun could be replaced with "auto open this page in your browser". That would put the full protection the browser has against stuff on the web between your machine and whatever is on the disk. Most likely the best name for the file is index.html in the root of the mounted file system.

It would also be portable between operating systems, which is why Microsoft will never implement it...

Re:Windows itself is a vulnerability. on US-CERT Says Microsoft's Advice On Downadup Worm Bogus · 2009-01-22 06:25 · Score: 1

"bash foo.sh" is itself a script that needs the executable bit set, so it does help.

However I think the -x excuse for security on Linux is bogus.

First of all, it was not designed for that. It was designed so that a shell could quickly read all the executable commands into memory so that they could be instantly located when the user typed a command, and memory was limited so adding this so that non-executable commands were thrown out immediately helped. If it really was a security mechanism I think bash itself might insist on foo.sh having the bit set.

Second if it were not for seeing what happened with Windows, I'm sure the people writing Firefox or Mozilla would have, without a second thought, added a feature so any downloaded executable file got the bit set for you. Or people writing finders would have made it so double-clicking, once it identified the file as an executable or shell script, would conveniently turn on the bit for you. This would have been considered a way to make it more user friendly.

Re:All modern desktop distros are easy on The Secret Lives of Ubuntu and Debian Users · 2009-01-18 15:50 · Score: 1

IMHO the fact that it did not produce an error message is a far worse problem than the reason it could not delete the files. That is inexcusable.

Re:This was not very good, Ubuntu on Ubuntu's Laptop Killing Bug Fixed · 2009-01-17 18:55 · Score: 5, Informative

FIX UBUNTU HARD DISK CYCLING HOW-TO:

The laptop_mode command does the right thing, so most of this is to get it called everywhere it needs to be, and to remove calls that mess with the hdparm settings and thus defeat laptop_mode.

There are claims that "laptop mode" causes problems, but this does *not* enable it. The program "laptop_mode" does other stuff besides the problem part. That is controlled by a line in /etc/laptop-mode/laptop-mode.conf, where you can individually set it on/off for battery, ac, and when the lid is closed. Change them all to zero there if you are worried. It works fine on my machine, however, and the battery lasts far longer now.

1. Edit /etc/laptop-mode/laptop-mode.conf and change correct line to read:

    CONTROL_HD_POWERMGMT=1

   (this makes laptop_mode call hdparm)

2. Edit /etc/default/acpi-support and change correct line to read:

    ENABLE_LAPTOP_MODE=true

   (this makes power.sh run)

3. Edit /etc/acpi/power.sh

   Comment out or delete the 4 for...done loops containing $HDPARM commands. (this stops power-on from messing with the disks)

   And change the arguments to $LAPTOP_MODE from start/stop to "auto" in both cases. (this makes it run the laptop_mode command correctly rather than forcing the mode on and off)

4. Create /etc/pm/power.d/laptop-tools and make it read "exit 0" and then "chmod +x" it. (this stops suspend/resume from messing with hdparm settings)

5. Create /etc/pm/sleep.d/10laptop_mode_restart and make it contain the following:

    #!/bin/bash
    case $1 in
        hibernate)
            /etc/init.d/laptop-mode stop
            ;;
        suspend)
            /etc/init.d/laptop-mode stop
            ;;
        thaw)
            /etc/init.d/laptop-mode start
            ;;
        resume)
            /etc/init.d/laptop-mode start
            ;;
        *)
            echo Something is not right.
            ;;
    esac

   Chmod +x this file (this makes suspend/resume run the laptop tools)

HOW TO TEST:

This command will tell you how your disk is set:

    sudo hdparm -I /dev/sda | grep "Adv"

The correct results to stop disk thrashing are 254 or 255. When laptop_mode is *really* on then the correct value is 1. If you see 128 then things are not working, this is the setting the disk resets to on suspend/sleep/power off.

This command will tell you how bad you have trashed your disk (you may need to install "smartctl"):

    sudo smartctl -a /dev/sda | grep Load_Cycle_Count

The last number is how many times your disk has parked. Over 10,000 is not good. Mine is 101187 before I finally got this fixed.
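If you want to check the park count from a script rather than by eye, something like the following Python sketch would do. The function name and the sample line are hypothetical; the exact column layout of smartctl output is an assumption, and the only thing relied on is that the count is the last number on the Load_Cycle_Count line, as described above.

```python
def load_cycle_count(smartctl_line: str) -> int:
    """Return the last column of a Load_Cycle_Count attribute line as an integer."""
    fields = smartctl_line.split()
    if "Load_Cycle_Count" not in fields:
        raise ValueError("not a Load_Cycle_Count line")
    return int(fields[-1])

# Hypothetical sample line; only the attribute name and the final number matter here.
sample = "193 Load_Cycle_Count 0x0032 001 001 000 Old_age Always - 101187"

count = load_cycle_count(sample)
print(count, "parks:", "not good" if count > 10_000 else "ok")
```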

This was not very good, Ubuntu on Ubuntu's Laptop Killing Bug Fixed · 2009-01-17 17:37 · Score: 4, Informative

I followed the instructions on Ubuntu's forums (what a pain to locate the actual instructions) (I transcribed what I did and will post them).

The actual problem was that manufacturers have messed with their drives and altered the head parking timeout into a "detect if windows went to sleep" method. Basically Windows writes to the disk *all the time* until it sleeps, so the best way to minimize disk use is to park the head almost instantly after any inactivity, as that will park it asap when it sleeps. Furthermore at least 2 manufacturers used the timeout control as <= 195 == "on" and >195 == "off".

Ubuntu/Linux wrote a lot less often, but plenty anyway, like every 15 seconds (doing stupid stuff like writing log files). So the head unparked every 15 seconds.
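For a rough sense of scale, here is the arithmetic as a short Python sketch. It assumes exactly one park/unpark cycle per 15-second write, which is only the estimate given above, and it reuses the 10,000 threshold and the 101187 count quoted in the how-to post.

```python
SECONDS_BETWEEN_WRITES = 15              # "like every 15 seconds"; an estimate
cycles_per_hour = 3600 // SECONDS_BETWEEN_WRITES

hours_to_10k = 10_000 / cycles_per_hour          # time to reach the "not good" mark
hours_for_101187 = 101_187 / cycles_per_hour     # powered-on time behind 101187 parks

print(cycles_per_hour)                   # 240
print(round(hours_to_10k, 1))            # 41.7
print(round(hours_for_101187, 1))        # 421.6
```

So under that assumption the "not good" mark is passed in under two days of powered-on time.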

The fact that Windows "worked" led a lot of people to think Windows was doing secret messing with the drives to turn on extra modes that were not in the documentation, and that Ubuntu could not be fixed until this secret was found. However I think somebody could have figured out that it was not doing anything, there were programs (ported from Ubuntu, apparently!) for reading the disk settings under Windows.

It was also known immediately that setting the disk timeout to 255 stopped this. Who cares if this was not the "secret Windows setting", it was certainly better than how Ubuntu was working at that time. This was known the same day the bug was first talked about! Ubuntu should have immediately patched it, but somehow the fact that this was not "ideal" caused them to delay for 14 months! That is really bad, guys! I "fixed" mine as best I could with a program I had to run every time I opened the lid (because some stupid startup thing kept turning the timeout back on, and the only way to run my program last was to manually run it!) I eventually decided to go through the hair of actually fixing it and killing off that other thing that tried to do it.

There seemed to be a bunch of conflicting programs, all of them trying to set the disk timeout to 128 or 2. You had to get *all* of them (see next posting for what I did). This is what made it Ubuntu-specific. I sure hope this patch straightens it out so exactly ONE service, and exactly ONE file in /etc, controls the disk timeout!

Yea you can blame Windows all you want, but this was really, really, bad!

And I sure hope the update (which I just did) did not get screwed up by trying to merge with all the changes I did. Have not really checked yet. What a PITA. If they had put out a patch immediately then they would not have to patch systems that have a hundred different solutions on them.

Re:All modern desktop distros are easy on The Secret Lives of Ubuntu and Debian Users · 2009-01-16 06:41 · Score: 1

Can you explain exactly what was wrong? You not only had to use a terminal, but you had to use sudo as well, right? Do you know the reason for the inability to delete (it could either be wrong user/group or wrong permissions, or some kind of bug in the file system)? I do think it is annoying that you obviously have enough knowledge to know what was actually wrong but did not report it here.

To all the other posters, the fact that Nautilus (or whatever it is) did not produce any kind of error message is REALLY BAD! There is no excuse for such stupid UI mistakes!

Re:Unicode on The Evolution of Python 3 · 2009-01-15 12:40 · Score: 1

Slicing would not be a problem if you cut at the iterator values. Also a huge amount of slicing is just to fit into fixed-size buffers that are glued together again later, this will not break UTF-8 if provisions are made to remember the parsing state between each block (this usually happens automatically, such as when writing to a pipe).
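A small Python sketch of the "remember the parsing state between blocks" point, using the standard library's incremental decoder. The block size of 4 and the sample string are arbitrary choices for illustration.

```python
import codecs

original = "na\u00efve caf\u00e9 \U0001D11E"
data = original.encode("utf-8")

# Cut into fixed-size blocks with no regard for character boundaries.
blocks = [data[i:i + 4] for i in range(0, len(data), 4)]

# The incremental decoder keeps any partial sequence between calls.
decoder = codecs.getincrementaldecoder("utf-8")()
pieces = [decoder.decode(block) for block in blocks]
pieces.append(decoder.decode(b"", final=True))

assert "".join(pieces) == original
```

Here the 4-byte character at the end is split across two blocks, and nothing breaks when the blocks are glued together again.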

The main reason for creating constant strings containing UTF-8 errors is because they are actually ISO-8859-1 or CP1252 strings. You are probably right that requiring a 'b' before them is acceptable. Most other uses don't require string constants but just preservation of errors, such as "write a Python program to delete this file with an invalid UTF-8 name".

I would hard-code the meaning of \uXXXX in byte strings to mean UTF-8. Most other encodings of any interest are 1-byte ones (and thus \xXX works and is already used by most programmers). There are the older Asian 2-byte encodings, but I think it is safe to require encode() or literal \xXX sequences to quote them. The reason for this is to make it easy for programmers to change their constants between bytes and unicode when using UTF-8.

I would also make \xXX in Unicode strings mean "decode this as though it is UTF-8". The primary reason for this is because this is the only form that is portable to many languages such as C++. You are right that using encode() would work as well but because no compilation error is produced people are going to make mistakes without this.

Re:Your Goal: One Second or Less on Ubuntu 9.04 Daily Build Boots In 21.4 Seconds · 2009-01-15 10:13 · Score: 1

He meant that the machine may sit there for hours before somebody wants to use it and decides to login.

Re:Unicode on The Evolution of Python 3 · 2009-01-15 08:24 · Score: 1

Iterating is pretty easy in UTF-8. The problem is that many people think iterating is "increment an integer and then pass it to this function" and the only way to implement that is to iterate again all the way from the start of the string, which is inefficient. A "iterator object" would do the job and you could select from several iterators depending on whether you wanted combining characters, words, Korean syllables, etc.
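As a rough illustration of the iterator idea, here is a Python generator that walks a UTF-8 byte string once and yields byte offsets along with code points. The name `iter_code_points` is made up, and this is a simplified sketch: it flags stray and truncated bytes but does not reject overlong three- and four-byte forms, encoded surrogates, or values above U+10FFFF.

```python
def iter_code_points(buf: bytes):
    """Yield (byte_offset, code_point) pairs; invalid bytes yield (offset, None)."""
    i, n = 0, len(buf)
    while i < n:
        b = buf[i]
        if b < 0x80:
            length, cp = 1, b
        elif 0xC2 <= b <= 0xDF:
            length, cp = 2, b & 0x1F
        elif 0xE0 <= b <= 0xEF:
            length, cp = 3, b & 0x0F
        elif 0xF0 <= b <= 0xF4:
            length, cp = 4, b & 0x07
        else:
            yield i, None            # stray continuation or impossible lead byte
            i += 1
            continue
        tail = buf[i + 1:i + length]
        if len(tail) != length - 1 or any((t & 0xC0) != 0x80 for t in tail):
            yield i, None            # truncated or malformed sequence
            i += 1
            continue
        for t in tail:
            cp = (cp << 6) | (t & 0x3F)
        yield i, cp
        i += length

sample = "a\u00e9\U0001D11E".encode("utf-8") + b"\xff"
result = list(iter_code_points(sample))
```

Because the offsets are byte offsets, they can be used directly for slicing without ever rescanning from the start of the string.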

The fact that lots of programmers think that you do string[++integer] to iterate is a huge problem and really the big obstacle to using UTF-8 or UTF-16. But it would help Unicode considerably if people were forced to use iterators, as they are the only way to correctly do decomposed characters and a number of languages. I think it would have been nice if Unicode had refused to implement precomposed characters to force people to do it correctly from the start.

"constant".encode() does seem to work. Concatenation of the results can be used if you need to make a string constant containing both UTF-8 errors and readable source of Unicode characters. I still feel this is a lot less obvious that just adding/removing the 'b' in front of the string (ie I did not think of it) and thus I still feel the string constant issue should be addressed.

Re:Ars Technica report on Qt Becomes LGPL · 2009-01-15 06:59 · Score: 1

I agree that if you think the binary API is unlikely to have to change, the LGPL is a good idea.

However for a big library like Qt, it is far more likely that the binary api will change each version. With C++ especially this is almost impossible to prevent (believe me I have tried and been a complete failure at it). In this case the LGPL is a huge hindrance to allowing people to use your library, and in fltk's case we ended up putting in a linking exception.

Re:Unicode on The Evolution of Python 3 · 2009-01-15 05:31 · Score: 1

I believe counting anything other than code units will lead to far more broken strings. The problem is that eventually somebody will use that number as an offset rather than calling the "count n unicode points" function. Far better to use offset at all times.

I do believe that using UTF-8 as an internal representation is a very good idea, but that few have realized it except for Ken Thompson and Rob Pike in Plan 9. The main reason is that it is the easiest way to preserve invalid strings and to avoid the many security and other bugs when a function maps more than one string to the same object.

However another potential solution is a lossless conversion from UTF-8 to UTF-16 such as utf-8b. This is problematical because a simple implementation will break the lossless conversion of UTF-16 to UTF-8, which in effect means you cannot use any UTF-16 api to your program any more. It may be possible to make lossless conversion both ways, but I have tried to figure this out and it is not easy. In any case even if Python switched to UTF-8, this is going to be necessary if we are going to work around Windows' stupid decision to use UTF-16 for filenames.

Certainly with the variable length there are now zero reasons to choose UTF-16 over UTF-8, but UTF-32 I suppose still has an argument for it.

My primary concern with Python3 was that you can no longer write bytestring()=="string constant" without it producing an exception if the bytestring has invalid encoding in it. Actually I learned that Python 3.0 gives you a type conversion error always for this, that is a lot better, but it means an awful lot of Python software will not compile.
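For reference, a quick Python sketch of how the bytes-versus-string case behaves. This reflects current Python 3 releases as far as I can tell, not necessarily 3.0 as it was when the comment was written: the comparison itself evaluates to False without raising, and it is mixing the two types in an operation that raises TypeError.

```python
data = b"abc"

# Comparing bytes with str does not raise; the two types never compare equal.
same = (data == "abc")

# Mixing them in an operation is what raises TypeError.
try:
    data + "abc"
    mixed_ok = True
except TypeError:
    mixed_ok = False

# An explicit decode that cannot fail makes the comparison meaningful again.
decoded_same = data.decode("utf-8", errors="surrogateescape") == "abc"

print(same, mixed_ok, decoded_same)
```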

I am very concerned, just from previous experience, that unless Python makes it trivial to handle invalid UTF-8, most programmers will just punt and translate it as either ISO-8859-1 or even ASCII, which is extremely counter-productive if the intention is to encourage Unicode. And the converter throwing exceptions I think will lead to a lot of DOS attacks.

I am also concerned that it is impossible to correctly change a string constant between unicode and bytes by putting a 'b' in front of it, due to different results for \x and \u. This I think will be a big obstacle for the "just change your code to use bytestrings" argument.
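The \x and \u mismatch is easy to see in Python 3. This is a small sketch with U+00E9 picked as an arbitrary example; the bytes literal is written with a doubled backslash because newer Pythons warn about the unknown \u escape in bytes, but the value is the same one b"\u00e9" would produce.

```python
# In a str literal, \u00e9 and \xe9 both name the code point U+00E9.
assert "\u00e9" == "\xe9"

# In a bytes literal, \xe9 is the single byte 0xE9, which is not the UTF-8 encoding.
utf8_form = "\xe9".encode("utf-8")
assert utf8_form == b"\xc3\xa9"
assert b"\xe9" != utf8_form

# And \u is not an escape in bytes at all: the six characters are kept as written.
literal = b"\\u00e9"
assert len(literal) == 6
```

So putting a 'b' in front of a constant changes its meaning in both cases, just in different ways.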

patch set on Qt Becomes LGPL · 2009-01-14 18:15 · Score: 1

I think the LGPL actually says you have to provide the full modified source code, not a patch set, so that is why I put it in quotes.

In reality, however, any company that wants to do this would instead send their patches to Nokia and try to convince them to put them into the official version. So their output really would be patches.

Re:Unicode on The Evolution of Python 3 · 2009-01-14 11:11 · Score: 1

len of a UTF-16 non-bmp character had certainly better be 2! If you think len() of a variable length encoding should do anything other than return the number of code units, you have certainly never worked with variable length encodings. I can tell you that is USELESS, unless your purpose is to artificially make it impossible to use a variable length encoding (I have seen this ridiculous approach used as "proof" that UTF-8 is unusable, but it is completely bogus. strlen() returns the number of code units and if you think otherwise you are an IDIOT).
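The code unit counts are easy to confirm in Python by encoding explicitly. A sketch, using U+1D11E as the non-BMP example; note that len() on a str in current Python 3 counts code points, so the comparison is done on the encoded forms.

```python
clef = "\U0001D11E"     # MUSICAL SYMBOL G CLEF, outside the BMP

utf8_units = len(clef.encode("utf-8"))             # 8-bit code units
utf16_units = len(clef.encode("utf-16-le")) // 2   # 16-bit code units
utf32_units = len(clef.encode("utf-32-le")) // 4   # 32-bit code units

print(utf8_units, utf16_units, utf32_units)        # 4 2 1
```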

I don't think the Python authors are so stupid. The fact that len returns 2 indicates they have not gone off the deep end, and they are supporting UTF-16 in the correct way.

ISO-8859-1? I don't think that word means what you think it means

What I meant by "ISO-8859-1" is the expected result when lazy programmers are forced to write their own decoders because Python is not providing a lossless UTF-8 decoder. I expect a very common solution will be "change every byte into the matching word". In fact I was shocked when I saw an even *worse* solution which was "change every byte with the high bit set to "\xNN" and then run the normal converter". I had better learn to stop being shocked by how shoddy the things programmers will do. I do not think you have experienced this or you would not be so blase about "oh they will implement their own decoders".

Do you think Microsoft added wide byte support because they were kind?

Microsoft forced wide bytes on us because a bunch of politically-correct idiots thought that there was some horrible problem if English gets the "better" shorter encodings. If they really wanted I18N they would have implemented UTF-8 like intelligent people had done before them in Plan9. Microsoft has done more damage to Unicode than 20 years of ASCII-only programmers could ever do with this boneheaded move.

The only advantage of the Unix wars is that it delayed the same politically correct bullshit from appearing on Unix (Sun was certainly busy on implementing "wide characters" when I was working there), which probably would have forced it onto the internet protocols. Now we only have to deal with this crap on Windows. And with stupid people writing Python, apparently.

Re:Unicode on The Evolution of Python 3 · 2009-01-14 09:36 · Score: 1

I find it very hard to believe that Python turned U+1D11E into a two-word string when it "does not do UTF-16". No plausible bug in UCS-2 converting would do that, it would either produce an error or a 1-word string.

How do you handle incorrect UTF-8? You report the error, use some other error handling scheme instead of 'strict' for decoding, or write a decoder of your own.

Yes you would like to think that. However that is NOT what programmers do, and you are living in fantasy land if you think they will.

Just yesterday, completely by coincidence, a very smart programmer encountered an invalid UTF-8 encoding that they were trying to display, and the resulting "fix" was: she ran EVERY string through a filter she called "sanitize" that replaced EVERY byte with the high bit set with the string "\xNN" (where NN is the byte in Hex). As far as she was concerned, this "fixed" it, because English text still worked, and the ONLY non-English she ever encountered was an invalid UTF-8 encoding. Basically not only did she completely break UTF-8, she even broke ISO-8859-1 characters!
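To make the damage concrete, here is a Python sketch. The `sanitize` function is my guess at what the filter described above did, not her actual code, and `decode_keeping_errors` is a hypothetical alternative that assumes a Python 3 where the `backslashreplace` handler works for decoding (3.5 or later).

```python
def sanitize(raw: bytes) -> str:
    """Hypothetical reconstruction: escape EVERY byte that has the high bit set."""
    return "".join(chr(b) if b < 0x80 else "\\x%02X" % b for b in raw)

def decode_keeping_errors(raw: bytes) -> str:
    """Decode valid UTF-8 normally and escape only the bytes that are invalid."""
    return raw.decode("utf-8", errors="backslashreplace")

# Valid UTF-8 for "cafe" with an accented e, followed by one invalid byte.
raw = "caf\u00e9 ".encode("utf-8") + b"\xff"

# The valid two-byte character is destroyed along with the bad byte.
assert sanitize(raw) == "caf\\xC3\\xA9 \\xFF"

# Only the invalid byte is escaped; the accented character survives.
assert decode_keeping_errors(raw) == "caf\u00e9 \\xff"
```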

Please you have to realize what people will really do!

Also an anonymous poster points out that Python 3.0 forces the argv command line arguments through the default conversion and there is NOTHING the Python program can do about this. This is absolutely one of the worst decisions possible! You basically are unable to put an invalid encoding on the command line, and if there is no way to force UTF-8 before this happens, you cannot put anything other than ISO-8859-1 on the command line! So much for being able to make a Python program that can delete or rename a file with invalid UTF-8 in its name.

User: spitzak

Comments · 5,741