"Very efficient" is still much less efficient, because on 16-bit characters, you simply split at char_count * 2 bytes, where with UTF-8, you have to examine every byte up to char_count to see if any of them are part of a surrogate pair.
The way I understand it is that UCS-2 (not UTF-16, because UTF-16 supports surrogate pairs) gives significant performance advantages for operations like substring(). That's why lots of platforms use it internally (.NET, Java, others I'm sure).
The OP doesn't see the procedure as a liability. That's a lie. The OP sees the CAB as a threat to his authority. He sees the servers as "his", and this will remove some of the power he has over them.
The OP needs to understand that the servers belong to the company, and that the company gets to decide how patches are approved and administered. If the job changes into something he's no longer interested in doing, then he needs to move on.
Being an asshole administrator might seem like fun, but usually, in "normal" businesses, it gets you fired.
Well, yeah, but that would completely change the way these things work. What if your split() worked on code units, and you broke up a code point? That certainly wouldn't produce results that anyone would consider optimal, or even useful.
You can continue to pretend that byte arrays are strings, and strings are byte arrays, but you're not going to get anywhere. The rest of the world decided that we want a useful abstraction over the underlying data structure. When we're working with strings, we care about characters, not bytes.
Pretty much every string operation is going to require decoding. Things like substr(), replace(), split(), join(), etc are all going to require decoding the string.
Hey, I figured out what your problem is, where you went wrong. You think that a string and a bunch of bytes are the same thing. They're not. If you have a bunch of bytes, treat it as a bunch of bytes. If you have a string, treat it as a string.
Java, for example, stores strings internally as UTF-16 (or UCS-2, opinions differ)..NET stores them internally as UCS-2.
This is also why there's a difference between CHAR and NCHAR in databases.
There is not a one-to-one mapping from a given string to a given set of bytes, because it depends on how you encode the string. Furthermore, some encodings have constraints on what input can produce a valid string. ASCII (plus non-standard high-ASCII) is not one of these encodings. UTF-8 (and all other Unicode encodings) are.
However, PEP 393 should've solved your particular problem (in Python 3.3), by allowing you to store these unicode-invalid "strings" internally as ASCII. Have fun in code-page land.
There's your problem right there. There are no "tiny mistakes" in UTF-8. Either it's valid UTF-8, or it's not. It's valid XML, or it's not. It's valid JSON, or it's not. It's valid HL7, or it's not. There is no "graceful" handling of invalid data, not in the general case.
Physically possible arrangements of bytes will appear in files, yes, but those files are not necessarily UTF-8.
Oh, and all *my* serious software can handle Unicode just fine (in all its various encodings), because I use a platform that was designed FROM THE START to handle it correctly. It does fail gracefully, which is nice, in the invalid-data case, but nonetheless, garbage-in, garbage-out.
Well, sure. Just as, however, you (apparently) can't legislate morality (we tried, see prohibition), I believe that you also can't legislate equality.
Even if we pulled down the "1%", we'd simply have a new set of ultra-rich replace them. The fact that the new ultra-rich came from the old ultra-poor would be small comfort. It's human nature to attempt to get wealth and power.
If you're a religious sort of person, of course, you likely have a particular perspective on all this.
So, if the ruling class makes a mess of things, it's their fault, and if, in response, the working class makes a mess of things, it's still the ruling class' fault? Does the working class have any responsibility for their actions?
It seems to me that you're making the case that the working class is inferior, and thus can't be held responsible for their actions, and yet, should be equal to the ruling class. I'm not sure you can have it both ways.
That (4) is a much bigger influence than you imply. What Medicare and Medicaid cover and do not cover, at what levels and under what conditions they cover things have an enormous impact on the healthcare of the rest of us.
I download my TV shows wirelessly. It's very high bandwidth, too. I *NEVER* wait more than an hour for a one-hour 1080i episode (though I never wait less than an hour either), and I can download several episodes at once without slowing down or impacting other users of the network in the house.
The best part? There's no monthly fee, and it's *LEGAL*.
My $99 "radio thermostat" (Mine's the cheaper one from THD) does it. Doesn't need the internet, though it can work over it, when I'm at home my phone talks directly to the thermostat, doesn't need any external servers.
Historically, Persians were Zoroastrian, not Muslim, right?
Re: It was already a dangerous site to visit ... on PHP.net Compromised · Score: 1
Tabs are simply not a good choice for indentation *regardless* of language because there is no standard for how to use them.
That's a stupid thing to say. Replace "Tabs" with "Spaces" and it's just as true.
If we use tabs, I can make them 1, 2, 4, 8, or 32 wide, whatever I prefer. If we use spaces, I have to agree with and accept whatever the team prefers. Seems pretty simple to me.
If you make a standard for how to use any character or set of characters for indentation, then there's a standard.
Oh, and while you're right, I haven't written much Python, I have written a WHOLE LOT of code in brace languages. I use IDEs that brace-match for me, so in the really hairy cases, I have a little help. How do you brace-match spaces?
I'm a senior TrueCrypt developer, and I have access to the Master Keys that can unlock any TrueCrypt encrypted data.
Now do you feel better?
strstr() is fine, yes, but what about strncpy()?
"Very efficient" is still much less efficient, because with 16-bit characters you simply split at char_count * 2 bytes, whereas with UTF-8 you have to examine every byte up to the split point to see whether it starts a new character or continues a multi-byte sequence.
The way I understand it is that UCS-2 (not UTF-16, because UTF-16 supports surrogate pairs) gives significant performance advantages for operations like substring(). That's why lots of platforms use it internally (.NET, Java, others I'm sure).
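To make the comparison concrete, here is a rough Python sketch of the two cases. The function names are made up for illustration, and the UTF-8 version assumes the input is already valid UTF-8; it is not how any particular platform implements substring().

```python
def ucs2_byte_offset(char_count: int) -> int:
    # Fixed-width 16-bit units: the byte offset is pure arithmetic.
    return char_count * 2


def utf8_byte_offset(data: bytes, char_count: int) -> int:
    """Return the byte offset where the first char_count code points end.

    Assumes data is valid UTF-8 (hypothetical helper, for illustration only).
    """
    seen = 0
    for i, byte in enumerate(data):
        # Continuation bytes look like 0b10xxxxxx; any other byte starts a code point.
        if byte & 0xC0 != 0x80:
            if seen == char_count:
                return i
            seen += 1
    # Fewer than char_count + 1 code points: the split lands at the end.
    return len(data)


encoded = "naïve café".encode("utf-8")
cut = utf8_byte_offset(encoded, 5)
print(encoded[:cut].decode("utf-8"))
```

The fixed-width case is O(1), while the UTF-8 case has to walk the bytes up to the split point, which is the whole argument in a nutshell.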
Yes, because it's better than leaving said charred remains to rot. Charred remains smell bad, foreign or not.
If that's true, then it's great for him. It means he'll be able to hire more subordinates, thereby climbing the corporate ladder.
The OP doesn't see the procedure as a liability. That's a lie. The OP sees the CAB as a threat to his authority. He sees the servers as "his", and this will remove some of the power he has over them.
The OP needs to understand that the servers belong to the company, and that the company gets to decide how patches are approved and administered. If the job changes into something he's no longer interested in doing, then he needs to move on.
Being an asshole administrator might seem like fun, but usually, in "normal" businesses, it gets you fired.
Well, yeah, but that would completely change the way these things work. What if your split() worked on code units, and you broke up a code point? That certainly wouldn't produce results that anyone would consider optimal, or even useful.
You can continue to pretend that byte arrays are strings, and strings are byte arrays, but you're not going to get anywhere. The rest of the world decided that we want a useful abstraction over the underlying data structure. When we're working with strings, we care about characters, not bytes.
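Here is a small Python sketch of what goes wrong when a split works on code units. The helper names are hypothetical; it just encodes to UTF-16, cuts at a code-unit index, and checks whether each half still decodes.

```python
from typing import Tuple


def split_at_code_unit(text: str, unit_index: int) -> Tuple[bytes, bytes]:
    # Encode to UTF-16 (little-endian, no BOM) and cut at a 16-bit unit boundary.
    raw = text.encode("utf-16-le")
    cut = unit_index * 2
    return raw[:cut], raw[cut:]


def decodes_cleanly(raw: bytes) -> bool:
    # Strict decoding rejects a lone surrogate on either side of the cut.
    try:
        raw.decode("utf-16-le")
        return True
    except UnicodeDecodeError:
        return False


# U+1F600 lies outside the BMP, so it occupies two code units (a surrogate pair).
left, right = split_at_code_unit("a\U0001F600b", 2)
print(decodes_cleanly(left), decodes_cleanly(right))
```

Cutting at unit index 2 lands in the middle of the surrogate pair, so neither half is a valid string on its own. Cutting at index 1 or 3 is fine.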
Maybe you should design your own platform where strings will be represented internally as UTF-8. It would be an interesting exercise.
Pretty much every string operation is going to require decoding. Things like substr(), replace(), split(), join(), etc., are all going to require decoding the string.
Hey, I figured out what your problem is, where you went wrong. You think that a string and a bunch of bytes are the same thing. They're not. If you have a bunch of bytes, treat it as a bunch of bytes. If you have a string, treat it as a string.
Java, for example, stores strings internally as UTF-16 (or UCS-2, opinions differ). .NET stores them internally as UCS-2.
This is also why there's a difference between CHAR and NCHAR in databases.
There is not a one-to-one mapping from a given string to a given set of bytes, because it depends on how you encode the string. Furthermore, some encodings have constraints on what input can produce a valid string. ASCII (plus non-standard high-ASCII) is not one of these encodings. UTF-8 (and all other Unicode encodings) are.
However, PEP 393 should've solved your particular problem (in Python 3.3), by allowing you to store these unicode-invalid "strings" internally as ASCII. Have fun in code-page land.
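As a quick illustration of the "no one-to-one mapping" point, here is a Python sketch. The try_decode helper is made up for this example; the codecs are the standard ones that ship with Python.

```python
from typing import Optional


def try_decode(raw: bytes, encoding: str) -> Optional[str]:
    # Return the decoded string, or None if the bytes are invalid in that encoding.
    try:
        return raw.decode(encoding)
    except UnicodeDecodeError:
        return None


text = "é"
# One string, three different byte sequences depending on the encoding.
for name in ("utf-8", "latin-1", "utf-16-le"):
    print(name, text.encode(name))

# The single byte 0xE9 is a perfectly good Latin-1 string, but it is not valid UTF-8.
print(try_decode(b"\xe9", "latin-1"))
print(try_decode(b"\xe9", "utf-8"))
```

A single-byte code page accepts any byte you throw at it, while UTF-8 rejects byte sequences that do not follow its rules.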
There's your problem right there. There are no "tiny mistakes" in UTF-8. Either it's valid UTF-8, or it's not. It's valid XML, or it's not. It's valid JSON, or it's not. It's valid HL7, or it's not. There is no "graceful" handling of invalid data, not in the general case.
Physically possible arrangements of bytes will appear in files, yes, but those files are not necessarily UTF-8.
Oh, and all *my* serious software can handle Unicode just fine (in all its various encodings), because I use a platform that was designed FROM THE START to handle it correctly. It does fail gracefully, which is nice, in the invalid-data case, but nonetheless, garbage-in, garbage-out.
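For what it's worth, this is roughly what "valid or not" versus "fail gracefully" looks like in Python. Both helper names are invented for the example, and other platforms will differ in the details.

```python
def is_valid_utf8(raw: bytes) -> bool:
    # Strict decoding: the input is either valid UTF-8 or it is not.
    try:
        raw.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


def decode_lossy(raw: bytes) -> str:
    # "Graceful" failure: invalid sequences become U+FFFD, so garbage in, garbage out.
    return raw.decode("utf-8", errors="replace")


good = b"caf\xc3\xa9"  # "café" encoded as UTF-8
bad = b"caf\xe9"       # "café" encoded as Latin-1, mislabeled as UTF-8

print(is_valid_utf8(good), is_valid_utf8(bad))
print(decode_lossy(bad))
```

The lossy decode does not recover the original character. It only marks where the data was bad.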
Well, sure. However, just as you (apparently) can't legislate morality (we tried; see Prohibition), I believe that you also can't legislate equality.
Even if we pulled down the "1%", we'd simply have a new set of ultra-rich replace them. The fact that the new ultra-rich came from the old ultra-poor would be small comfort. It's human nature to attempt to get wealth and power.
If you're a religious sort of person, of course, you likely have a particular perspective on all this.
If it's not UTF-8, why do you claim it's UTF-8?
That's like arguing that XML parsers should allow unclosed tags, because otherwise, they just throw exceptions and can't be used for serious work.
You're probably the guy we have to thank for "tag soup". Asshole.
So, if the ruling class makes a mess of things, it's their fault, and if, in response, the working class makes a mess of things, it's still the ruling class' fault? Does the working class have any responsibility for their actions?
It seems to me that you're making the case that the working class is inferior, and thus can't be held responsible for their actions, and yet, should be equal to the ruling class. I'm not sure you can have it both ways.
We need to do something about the gaming inequality. We should regulate PC gaming hardware, as it's got an unfair advantage over consoles.
And, let's not forget, that the French Revolution didn't exactly produce utopia...
Won't someone please think of the old ladies?
That (4) is a much bigger influence than you imply. What Medicare and Medicaid cover and do not cover, at what levels, and under what conditions has an enormous impact on the healthcare of the rest of us.
I download my TV shows wirelessly. It's very high bandwidth, too. I *NEVER* wait more than an hour for a one-hour 1080i episode (though I never wait less than an hour either), and I can download several episodes at once without slowing down or impacting other users of the network in the house.
The best part? There's no monthly fee, and it's *LEGAL*.
OTA, it's the future.
Ugh, unless you're on a gbit wired connection, why bother? Sure, wireless can work, but it can be shit too.
Oh, you forgot to wire jacks in your house? That's a problem.
My $99 "radio thermostat" (mine's the cheaper one from THD) does it. It doesn't need the internet, though it can work over it. When I'm at home, my phone talks directly to the thermostat, so it doesn't need any external servers.
Glad it helped you. Is it your argument that you are typical of welfare recipients?
Chew on that for a bit.
Historically, Persians were Zoroastrian, not Muslim, right?