The Most Expensive One-Byte Mistake

Posted by Soulskill on Tuesday August 2, 2011 @04:18PM from the catchy-but-totally-misleading-internet-headline dept.

An anonymous reader writes "Poul-Henning Kamp looks back at some of the bad decisions made in language design, specifically the C/Unix/Posix use of NUL-terminated text strings. 'The choice was really simple: Should the C language represent strings as an address + length tuple or just as the address with a magic character (NUL) marking the end? ... Using an address + length format would cost one more byte of overhead than an address + magic_marker format, and their PDP computer had limited core memory. In other words, this could have been a perfectly typical and rational IT or CS decision, like the many similar decisions we all make every day; but this one had quite atypical economic consequences.'"

The Road Not Taken by symbolset · 2011-08-02 16:18 · Score: 5, Insightful

Two roads diverged in a yellow wood, And sorry I could not travel both And be one traveler, long I stood And looked down one as far as I could To where it bent in the undergrowth; Then took the other, as just as fair, And having perhaps the better claim, Because it was grassy and wanted wear; Though as for that the passing there Had worn them really about the same, And both that morning equally lay In leaves no step had trodden black. Oh, I kept the first for another day! Yet knowing how way leads on to way, I doubted if I should ever come back. I shall be telling this with a sigh Somewhere ages and ages hence: Two roads diverged in a wood, and I— I took the one less traveled by, And that has made all the difference.

- Robert Frost, 1920

--
Help stamp out iliturcy.
1. Re:The Road Not Taken by IICV · 2011-08-02 17:46 · Score: 3, Interesting
  
  Everyone misunderstands that poem.
  Robert Frost had a fairly depressing outlook on life, and the point of the poem is that it doesn't matter what road you take.
  I mean, just pay attention to the narrative tense in the last stanza, the one people take to be so life-affirming and "do something different!". The narrator isn't saying "I did this, and I know it was important"; he's saying "I did this, and I think that in the future I'm going to tell people it was important".
  The narrator is a vain, shallow individual who frets about insignificant decisions like this, thinking that they will have some gigantic impact on his life, and then later on blows those choices up to be of earthshattering proportions. This is all despite the fact that half the poem is about how the roads are effectively identical; and in the end, he doesn't even tell us what was important about the path he took, just that it was the "one less traveled by" (which makes no sense! They were "just as fair", they had been "worn ... really about the same", they "both that morning equally lay".)
  Basically, if we apply this poem to the current situation, what it's saying is that in alternate 2011 we'd have an article about how null-terminated strings would have been better than Pascal strings. It doesn't matter what path you take, if you're the right kind of person you'll always blow up the significance of it in your mind later.
2. Re:The Road Not Taken by j.+andrew+rogers · 2011-08-02 17:53 · Score: 2, Informative
  
  As a nitpick, this poem is not from 1920. I have an original copy that was inscribed by the owner in 1919.
  According to Wikipedia, the original poetry was published in 1916. The 1920 version was a second edition.
3. Re:The Road Not Taken by billstewart · 2011-08-02 19:06 · Score: 4, Funny
  
  Yes, you got the author right. The trick is that in the 1920 edition, he's taking the other road...
  
  --
  
  Bill Stewart
  New Fast-Compression-only CPR http://preview.tinyurl.com/dy575ks
4. Re:The Road Not Taken by Anonymous Coward · 2011-08-03 00:13 · Score: 2, Funny
  
  Clearly the poem was originally two different poems written by two different people: call them person 'H' for 'honest' and person 'C' for 'cynic'. At some later date, the 'H' text and the 'C' text were merged, with modifications, by scribes. In truth, we can't be sure whether the person traditionally held to be the author, Frost, was 'H' or 'C', or whether he, in fact, wrote any part of the text at all.
5. Re:The Road Not Taken by looie · 2011-08-03 00:14 · Score: 5, Insightful
  
  Not sure where you took your "poetical exegesis" class, but you should ask for a refund.
  The narrator as "vain, shallow individual" is entirely a character pulled out of your hindquarters, as there is nothing in the text of the poem to lead to that conclusion.
  The poem is simply a reflection on how we, as individuals, make choices in life. Some of us choose to take the direction taken by most of those around us. That might be university, family, job, retirement in FL. Some of us choose to turn aside from that direction and try another path. Programming a PDP to play "Space Travel," for example. Or writing an operating system "just for fun."
  Frost's suggestion is that these choices of path may seem insignificant at the time -- both paths being nearly the same; but that, as "way leads on to way," there's no going back and thus we may find ourselves down a path that leads to unexpected places. When Linus Torvalds wrote Linux, he could not know that "the path less traveled" would lead to fame and fortune, literally. The college kids who created Slashdot could not know it would make them rich.
  In fact, the point of the poem is exactly that it does matter which path you take. But that you don't always know how your choice is going to turn out. Frost himself might have continued his career as a teacher, a stable and certain means of supporting his family. Instead, he chose to focus on his poetry. He took a chance. And it worked well for him.
  mp
  
  --
  "The secret to strong security: less reliance on secrets." -- Whitfield Diffie
6. Re:The Road Not Taken by brusk · 2011-08-03 02:03 · Score: 2
  
  The meaning of the poem lies in the NUL character at the end.
  
  --
  .sig withheld by request
7. Re:The Road Not Taken by cecille · 2011-08-03 02:26 · Score: 5, Insightful
  
  Would anyone care to join me in flicking a few pebbles in the direction of teachers who are fond of asking the question: "what is the poet trying to say?" as if Thomas Hardy and Emily Dickinson had struggled but ultimately failed in their efforts - inarticulate wretches that they were, biting their pens and staring out of the windows for a clue. Yes, it seems that Whitman, Amy Lowell and the rest could only try and fail, but we in Mrs. Parker's third-period English class here at Springfield High will succeed with the help of these study questions in saying what the poor poet could not, and we will get all this done before that orgy of egg salad and tuna fish known as lunch. -- from Billy Collins, "The Effort"
  
  --
  ...no two people are not on fire.
8. Re:The Road Not Taken by Waffle+Iron · 2011-08-03 03:33 · Score: 3
  
  Whose code is this I think I know
  'Tis filled with buffer overflows
  His pointer is not stopping here
  As the megs of garbage data grow
  My CPU must think it queer
  To scan for null bytes not found here
  Between the stack and blocks of code
  Canary values, segfault near
  It gives the PC bell a quake
  To ask if there is some mistake
  The only other sound's the sweep
  Of swapping pages disk head shake
  The stack is swelling very fast
  But allocated buffer's past
  And megs to fill before a crash
  And megs to fill before a crash
9. Re:The Road Not Taken by drawfour · 2011-08-03 05:52 · Score: 2
  
  Since it was published before 1923, it's already in the public domain. See the footnote at the bottom of the wikisource page for the poem, and then you can follow the links from there if you care to read more.
10. Re:The Road Not Taken by IICV · 2011-08-03 18:08 · Score: 2
  
  Frost's suggestion is that these choices of path may seem insignificant at the time -- both paths being nearly the same; but that, as "way leads on to way," there's no going back and thus we may find ourselves down a path that leads to unexpected places. When Linus Torvalds wrote Linux, he could not know that "the path less traveled" would lead to fame and fortune, literally. The college kids who created Slashdot could not know it would make them rich.
  In fact, the point of the poem is exactly that it does matter which path you take. But that you don't always know how your choice is going to turn out. Frost himself might have continued his career as a teacher, a stable and certain means of supporting his family. Instead, he chose to focus on his poetry. He took a chance. And it worked well for him.
  You know, just because you have a positive interpretation of the poem doesn't mean that it's more supported by the text than a negative one.
  Basically, Robert Frost was trolling, and you got bit by it. Why do you think he ended the poem with such a great couplet? Even though it makes so little sense in the context of the rest of the poem? (The dude's thinking about how he's going to talk about it in the future; the rest of the poem is about how the paths are equal.) Because he knew it would catch people's attention, and that then they'd look into the poem some more and see the dissonance. We've just gotten to the point where the popular interpretation is so positive people just ignore the incongruities.
  Look: the narrator is vain and shallow because he's dithering about a minor choice in his life, and in his head it's this giant, life-altering moment.
  Imagine if someone said to you "It took me a long time to decide if I should wear navy socks or black socks this morning" - you'd think they were kinda silly for even thinking about it.
  If they said "I'll tell people about this day - the day I wore black socks, and not navy socks - I'll tell them with a sigh, that I took the pair less traveled by, and that has made all the difference", you'd think they were, well, vain and shallow. Their choice in socks is the most important thing ever! Oh em gee!
  Look, all those things you got out of the poem, those positive life-affirming things about making choices and stuff - that's all great. It really is. There's definitely a place for that in everyone's life.
  But that's not what this poem is about. You're projecting what you want to see onto the poem, instead of taking it in as a blank slate and seeing what the author wrote.
  I mean, I know what that's like. I was disappointed the first time I read the poem. I'd heard people - people like you, in fact - talk about how it's all about taking the road less traveled and being your own person and taking chances, so when I realized I could just read it on my own I was kinda excited, I thought it was gonna be awesome with him thinking about it and then striking out on his own.
  But no, he doesn't! I was prepared for the "road less traveled" to be some third option, not going through either of the paths but striking out on his own. Instead, the narrator sits and dithers and thinks about it and just picks one basically at random. I mean, what bullshit is this? If you're going to take the road less traveled, then it damn well better be less traveled! If you're picking from two clearly laid out choices that other people have walked through, that's not the road less traveled - there's no chances, there's no being your own person, none of that stuff. You're just picking which footsteps to follow in, and then rationalizing it afterwards as having been "the road less traveled".
  The last lines are ironic, and Robert Frost is spinning in his grave singing the trolololo song.
  (funnily enough, the poem also predicts hipsters before they were popular)
Missed the point by mgiuca · 2011-08-02 16:24 · Score: 5, Informative

Interesting, but I think this article largely misses the point.
Firstly, it makes it seem like the address+length format is a no-brainer, but there are quite a lot of problems with that. It would have had the undesirable consequence of making a string larger than a pointer. Alternatively, it could be a pointer to a length+data block, but then it wouldn't be possible to take a suffix of a string by moving the pointer forward. Furthermore, if they chose a one-byte length, as the article so casually suggests as the correct solution (like Pascal), it would have had the insane limit of 255-byte strings, with no compatible way to have a string any longer. (Though a size_t length would make more sense.) Furthermore, it would be more complex for interoperating between languages -- right now, a char* is a char*. If we used a length field, how many bytes would it be? What endianness? Would the length be first or last? How many implementations would trip up on strings > 127 bytes (treating it as a signed quantity)? In some ways, it is nice that getaddrinfo takes a NUL-terminated char* and not a more complicated monster. I'm not saying this makes NUL-termination the right decision, but it certainly has a number of advantages over addr+length.
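A rough sketch of that trade-off in C. The `counted_str` type and the `nul_suffix` / `counted_suffix` helpers are invented here for illustration; they come from neither the article nor any real library.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical address + length string: always larger than a bare pointer. */
struct counted_str {
    const char *data;  /* bytes of the string, not necessarily NUL-terminated */
    size_t      len;   /* number of bytes at data */
};

/* Suffix of a NUL-terminated string: just move the pointer forward. */
static const char *nul_suffix(const char *s, size_t skip)
{
    assert(skip <= strlen(s));
    return s + skip;
}

/* Suffix of an address + length string: move the pointer and shrink the length. */
static struct counted_str counted_suffix(struct counted_str s, size_t skip)
{
    assert(skip <= s.len);
    struct counted_str out = { s.data + skip, s.len - skip };
    return out;
}
```

With a separate (pointer, length) pair the suffix is still cheap, at the price of a fatter handle. With a single pointer to a length+data block, the suffix would presumably need a copy, because the length has to sit immediately in front of the bytes.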
Secondly, this article puts the blame on the C language. It misses the historical step of B, which had the same design decision (by the same people), except it used ASCII 4 (EOT) to terminate strings. I think switching to NUL was a good decision ;)
Hardware development, performance, and compiler development costs are all valid. But on the security costs section, it focuses on the buffer overflow issue, which is irrelevant. gets is a very bad idea, and it would be whether C had used NUL-terminated strings or addr+len strings. The decision which led to all these buffer overflow problems is that the C library tends to use a "you allocate, I fill" model, rather than an "I allocate and fill" model (strdup being one of the few exceptions). That's got nothing to do with the NUL terminator.
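To make the two allocation models concrete, here is a minimal sketch. `fill_greeting` and `make_greeting` are hypothetical names; the only library behaviour relied on is that C99 `snprintf` returns the length it would have written.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* "You allocate, I fill": the caller guesses a size and passes it in.
   Returns false if the text had to be truncated to fit. */
static bool fill_greeting(char *buf, size_t size, const char *name)
{
    assert(buf != NULL && size > 0);
    int n = snprintf(buf, size, "hello, %s", name);
    return n >= 0 && (size_t)n < size;
}

/* "I allocate and fill": the callee measures first, then allocates.
   Returns NULL on failure; the caller frees the result. */
static char *make_greeting(const char *name)
{
    int n = snprintf(NULL, 0, "hello, %s", name);
    if (n < 0)
        return NULL;
    char *buf = malloc((size_t)n + 1);
    if (buf != NULL)
        snprintf(buf, (size_t)n + 1, "hello, %s", name);
    return buf;
}
```

In the first model the worst case with a size parameter is truncation. `gets` is the same model without the size parameter, which is where the overflow comes from.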
What the article missed was the real security problems caused by the NUL terminator. The obvious fact that if you forget to NUL-terminate a string, anything which traverses it will read on past the end of the buffer for who knows how long. The author blames gets, but this isn't why gets is bad -- gets correctly NUL-terminates the string. There are other, sneaky subtle NUL-termination problems that aren't buffer overflows. A couple of years back, a vulnerability was found in Microsoft's crypto libraries (I don't have a link unfortunately) affecting all web browsers except Firefox (which has its own). The problem was that it allowed NUL bytes in domain names, and used strcmp to compare domain names when checking certificates. This meant that "google.com" and "google.com\0.malicioushacker.com" compared equal, so if I got a certificate for "*.com\0.malicioushacker.com" I could use it to impersonate any legitimate .com domain. That would have been an interesting case to mention rather than merely equating "NUL pointer problem" with "buffer overflow".
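The comparison problem can be reproduced in a few lines. This is only an illustration of the strcmp behaviour described above, not the code from the affected library; both helper names are made up.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Compares the way strcmp-based code does: stops at the first NUL byte. */
static bool names_equal_cstr(const char *a, const char *b)
{
    assert(a != NULL && b != NULL);
    return strcmp(a, b) == 0;
}

/* Compares as counted byte strings: an embedded NUL is just another byte. */
static bool names_equal_counted(const char *a, size_t a_len,
                                const char *b, size_t b_len)
{
    assert(a != NULL && b != NULL);
    return a_len == b_len && memcmp(a, b, a_len) == 0;
}
```

Given `"google.com"` and `"google.com\0.malicioushacker.com"`, the first function reports a match and the second does not, because the second is told the real lengths (10 and 32 bytes).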
1. Re:Missed the point by Anonymous Coward · 2011-08-02 16:37 · Score: 5, Informative
  
  http://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2009-2510
2. Re:Missed the point by snowgirl · 2011-08-02 16:41 · Score: 3, Interesting
  
  Not to mention the argument for "because space was at a premium" is specious, because either you had an 8-bit length prepended to the string, or you had an 8-bit special value appended to the end of the string. Both ways result in the same space usage.
  From what I read in the summary, (didn't read TFA) this whole thing sounds like a propaganda piece supporting the idea that we should use length+string, by presenting it as "this should have been a no-brainer but the idiots making C screwed up."
  As a nitpicky pedantic note though, if C had gone with length+string format, then other languages would have been written around the C standard, since most of them were written around the C standards to begin with to increase interoperability in the first place.
  
  --
  WARNING! This girl exceeds the MAXIMUM SAFE standards established by the FDA for BRATTINESS
3. Re:Missed the point by snowgirl · 2011-08-02 16:44 · Score: 2
  
  "...it would have had the insane limit of 255-byte strings, with no compatible way to have a string any longer."
  Compatible with what? Seems to me they could have just used continuation bit for the size field, much the way UTF-8 works to store non-ASCII characters.
  This would still make the strings incompatible, because you would only have a 127-byte string length before the "continuation bit" comes into play and you need to switch to a 15-bit string length. All the previous code written with longer-than-127-byte strings would be incompatible.
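  For reference, one way a continuation-bit length prefix could look. The layout (7-bit groups, low group first, high bit meaning "more bytes follow") and the names `put_length` / `get_length` are assumptions made for this sketch, not anything C or Pascal ever specified.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Writes len as 7-bit groups, low group first. The high bit of each byte
   means "another length byte follows". Returns the number of bytes written,
   which is at most 10 for a 64-bit value. */
static size_t put_length(uint8_t *out, uint64_t len)
{
    assert(out != NULL);
    size_t n = 0;
    do {
        uint8_t group = (uint8_t)(len & 0x7F);
        len >>= 7;
        if (len != 0)
            group |= 0x80;  /* continuation bit */
        out[n++] = group;
    } while (len != 0);
    return n;
}

/* Reads a length written by put_length from at most avail bytes.
   Returns the number of bytes consumed, or 0 if the prefix is truncated
   or longer than 10 bytes. */
static size_t get_length(const uint8_t *in, size_t avail, uint64_t *len)
{
    assert(in != NULL && len != NULL);
    uint64_t value = 0;
    for (size_t n = 0; n < avail && n < 10; n++) {
        value |= (uint64_t)(in[n] & 0x7F) << (7 * n);
        if ((in[n] & 0x80) == 0) {
            *len = value;
            return n + 1;
        }
    }
    return 0;
}
```

  Lengths up to 127 still cost one byte, which is the compatibility break being described: old code that treated all eight bits of that byte as the length would misread anything from 128 upward.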
  
  --
  WARNING! This girl exceeds the MAXIMUM SAFE standards established by the FDA for BRATTINESS
4. Re:Missed the point by e9th · 2011-08-02 16:46 · Score: 2
  
  My personal fave is strncpy(), which will silently not terminate the string if the buffer is too small, but if you give it a huge buffer it punishes you by NUL padding the string all the way to the end of the buffer.
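  Both behaviours are easy to see in a short sketch. `is_terminated` and `copy_terminated` are made-up helper names; only `strncpy` and `memchr` are standard.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* True if a NUL byte occurs within the first size bytes of buf. */
static bool is_terminated(const char *buf, size_t size)
{
    return memchr(buf, '\0', size) != NULL;
}

/* strncpy wrapper that always terminates dst.
   Returns true if src fit without truncation. */
static bool copy_terminated(char *dst, size_t size, const char *src)
{
    assert(dst != NULL && src != NULL && size > 0);
    /* If src has size or more characters, strncpy writes no NUL at all.
       If src is shorter, strncpy pads the rest of dst with NUL bytes. */
    strncpy(dst, src, size);
    bool fit = is_terminated(dst, size);
    dst[size - 1] = '\0';
    return fit;
}
```

  Copying "abcdef" into a 4-byte buffer with plain `strncpy` leaves "abcd" with no terminator, while copying "ab" into a 16-byte buffer writes 14 NUL bytes after it.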
5. Re:Missed the point by snowgirl · 2011-08-02 16:48 · Score: 4, Informative
  
  I'm correcting myself here... apparently they weren't considering going with a 255-byte limit, but a 65535-byte limit, which would have increased the size overhead by one.
  
  --
  WARNING! This girl exceeds the MAXIMUM SAFE standards established by the FDA for BRATTINESS
6. Re:Missed the point by mgiuca · 2011-08-02 16:49 · Score: 2
  
  They could have but they didn't (e.g., in Pascal, where strings actually are limited to 255 bytes). So, history has made some worse string representations than C.
7. Re:Missed the point by mgiuca · 2011-08-02 16:56 · Score: 2
  
  Good point.
  
  As a nitpicky pedantic note though, if C had gone with length+string format, then other languages would have been written around the C standard, since most of them were written around the C standards to begin with to increase interoperability in the first place.
  Yes, but perhaps the simplicity was partly why it caught on. The reason I raised all of the "what about..." questions was to illustrate just how many small variations in an address+length standard there could have been. Even if C had made a decision on all of those, how many implementations would have gotten it wrong?
  Not just implementations, but individual programs. Assume a hypothetical universe in which C doesn't use NUL-terminated strings but is still a low-level, unsafe language in general: how would this have been any different? Unlike C++ or Java, in C, programs manually construct strings. So we wouldn't have people forgetting to NUL-terminate strings. We would instead have people forgetting to set the length field, or setting the wrong length, or being given a 257-byte string and writing a "1" in the length field due to wraparound (granted, that wouldn't often be a security risk, just a bad result). If they had decided to use a variable-length length field, people would have found some way to screw that up. I'm sure hackers would have found a way to inject a long length into a short string and thus read past the end.
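  The wraparound case is a one-liner in C. The helpers below are hypothetical and assume a one-byte length field like the Pascal-style layout discussed above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Stores a length in a one-byte field with no range check:
   the value is silently reduced modulo 256. */
static uint8_t store_length_unchecked(size_t len)
{
    return (uint8_t)len;
}

/* Stores a length only if it fits in one byte; returns false otherwise. */
static bool store_length_checked(size_t len, uint8_t *out)
{
    assert(out != NULL);
    if (len > UINT8_MAX)
        return false;
    *out = (uint8_t)len;
    return true;
}
```

  `store_length_unchecked(257)` yields 1, which is exactly the "257-byte string with a 1 in the length field" scenario. The checked version just moves the burden onto every caller remembering to use it.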
  At the end of the day, the problem is that C lets programmers do whatever they want with memory, not the NUL terminator. And you can't really say "they should have designed it better," because it is rather the point of C that it lets you do this.
8. Re:Missed the point by dbc · 2011-08-02 17:02 · Score: 5, Informative
  
  Oh, Lordy, if you had ever programmed in a language with a 255 character limit for strings you would praise $DEITY every time you use a C string. Dealing with length-limited strings is the largest PITA of any senseless and time-wasting programming task.
  Suppose C had originally had a length for strings? The only thing that makes sense is for the string length count to be the same size as a pointer, so that it could effectively be all of memory. A long is, by C language definition, large enough to hold a pointer that has been cast into it. So string length computations all become longs. Not such a big deal for most of life... until.... 64 bit addressing. Then all sorts of string breakage occurs.
  The bottom line is that in an application programming language strings need to be atomic, as they are in Python. You just should not care how strings are implemented, and you should never worry about a length limit. The trouble is, C is a systems programming language, so it is imperative that the language allow direct access to bit-level implementation. If you choose to use a systems programming language for application programming, well, then it sucks to be you. So why did we do that for so long? Because all the other alternatives were worse.
  Hell, I've used languages where the statement separator was a 12-11-0-7-8-9 punch. (Bonus points if you can tell me what that is and how to make one.) So a NUL terminated string looks positively modern compared to that.
9. Re:Missed the point by arth1 · 2011-08-02 17:25 · Score: 3, Informative
  
  That's still an arbitrary limit.
  The advantages that I see for counted length are:
  - it makes copying easier - you know beforehand how much space to allocate, and how much to copy.
  - it makes certain cases of strcmp() faster - if the length doesn't match, you can assume the strings are different.
  - It makes reverse searches faster.
  - You can put binary in a string.
  But that must be weighed against the disadvantages, like not being able to take advantage of CPUs' zero-test conditions, but instead having to maintain a counter which eats up a valuable register. Or having to convert text blocks to print them. Or not being well suited for piped text or multiple threads; you can't just spew the text into an already nulled area, and it will be valid as it comes in; you have to update a text length counter for every byte you make available. And... and...
  Getting a free strlen() is NOT an advantage, by the way. In fact, that became a liability when UTF-8 arrived. With a library strlen() function, all you had to do was update the library, but when the compiler was hardcoded to just return the byte count, that wasn't an option. Sure, one could go to UTF-16 instead, but then there's a lot of wasted space.
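  To illustrate the gap between byte count and character count that UTF-8 opened up, here is a small sketch. `utf8_count` is an invented name, and the function assumes its input is already valid UTF-8.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Counts UTF-8 code points in a NUL-terminated string by skipping
   continuation bytes (those of the form 10xxxxxx).
   Assumes the input is valid UTF-8. */
static size_t utf8_count(const char *s)
{
    assert(s != NULL);
    size_t n = 0;
    for (; *s != '\0'; s++) {
        if (((unsigned char)*s & 0xC0) != 0x80)
            n++;
    }
    return n;
}
```

  For the bytes `"na\xC3\xAFve"` (the word "naïve"), `strlen` reports 6 while `utf8_count` reports 5. A stored length field has to pick one of those meanings up front.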
  All in all, having worked with both systems, I find more advantages with null-termination.
  There's also a third system for text - linked lists. It doesn't have the disadvantage of an artificial string length limit, and allows for easy cuts and pastes, and even COW speedups, but requires far more advanced (and thus slower) routines and housekeeping, and has many of the same disadvantages as byte-counted text. Some text processors have used this as a native string format, due to the specific advantages.
  I'd still take NUL-terminated for most purposes.
10. Re:Missed the point by MrEricSir · 2011-08-02 17:51 · Score: 2
  
  If we were to switch now, is that the compatibility you're referring to? Well sure.
  But nobody's talking about switching now, the point of the topic is that C should have been designed differently. In those days there was very little backwards compatibility to worry about.
  
  --
  There's no -1 for "I don't get it."
11. Re:Missed the point by mgiuca · 2011-08-02 17:54 · Score: 2
  
  Is it, or are you just used to dealing with NUL-terminated strings?
  Nope, they are simpler. Re-read all of the questions I asked regarding design decisions that could be made around address+length formatted strings and tell me that they are just as simple. Now I think higher-level languages should be using lengths, because their libraries abstract the details (e.g., C++ or Java). But in a language where programmers fabricate their own strings, simplicity is best.
  
  That's what libraries are for :-)
  Well, let's assume a hypothetical universe in which C is still exactly the same C, only with length-delimited strings (still the same level of safety, still malloc and free, still pretty much the same library, only the string functions are implemented differently, etc). Could you write a library that abstracts over the string representation without ever requiring the user to manually read or write the string? I think if you did that (and certainly, C++ does that), you would have a much higher-level library. That isn't what C is good for. C is for when you need low-level access to the underlying representation.
  The beauty of using C (and there aren't many) is that you can write your own efficient string manipulation code. For example, if you know you are going to concatenate three strings, you can allocate enough space for all three, then manually copy the bytes over and seal it with a NUL. In C++, you would probably have a stringstream and push each of the strings onto the end, but it would mean the library is internally adjusting lengths and so on -- the programmer can't make the code do exactly what he asks; there is a layer of abstraction. So you could change C's string representation and then provide a high-level API for manipulating it, but someone is going to get pissed off that the library doesn't do exactly what he wants, and dive down and do it himself. It would be very un-C-like to provide that API.
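  A minimal sketch of the hand-rolled concatenation described above: measure, allocate once, copy the bytes, seal with a NUL. `concat3` is a made-up name and the overflow checks are my own addition.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Concatenates three NUL-terminated strings into one malloc'd buffer.
   Returns NULL on allocation failure or if the total size would overflow.
   The caller frees the result. */
static char *concat3(const char *a, const char *b, const char *c)
{
    assert(a != NULL && b != NULL && c != NULL);
    size_t la = strlen(a), lb = strlen(b), lc = strlen(c);

    size_t total = la;
    if (lb > SIZE_MAX - total)
        return NULL;
    total += lb;
    if (lc > SIZE_MAX - total)
        return NULL;
    total += lc;
    if (total == SIZE_MAX)      /* no room left for the terminator */
        return NULL;

    char *out = malloc(total + 1);
    if (out == NULL)
        return NULL;

    memcpy(out, a, la);
    memcpy(out + la, b, lb);
    memcpy(out + la + lb, c, lc);
    out[total] = '\0';          /* seal it with a NUL */
    return out;
}
```

  Each input is walked twice (once by `strlen`, once by `memcpy`), which is the usual cost of not carrying the length around.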
  To put it another way, if you were going to provide a high-level string API for C and tell programmers "never ever manipulate strings on your own; use this library," then you might as well use NUL-terminated strings anyway, since the library will handle it, and programmers will never make a mistake. But again, that would be very un-C-like.
  So once again, it comes down to this: NUL-terminated strings aren't the problem with C. C is the problem with C: the fact that it gives programmers a lot of power. You might argue that we should stop using C to write programs that don't need that speed or power. But there's no point arguing that C should have been a higher-level language, because then it wouldn't be C.
12. Re:Missed the point by mgiuca · 2011-08-02 18:02 · Score: 2
  
  I think the bigger point that's missed is that if a size field were used, you'd still have the same buffer overflow problem if someone simply specified a size that didn't match the allocated memory, same as strncpy will happily try to keep writing to a buffer if you give it bad size information.
  Exactly. The real problem* is that C lets programmers fabricate data however they want.
  *I say "problem" but it really is the whole point of C. It is a dangerous and powerful tool. To make it less dangerous would make it less powerful, and if you wanted such a language, there are plenty available.
13. Re:Missed the point by mgiuca · 2011-08-02 18:06 · Score: 2
  
  Well C++ includes a class that is pretty much exactly what you ask for. It wouldn't make sense for C to include that, as the whole point is that C gives you the ability to manipulate data however you want. If C included that, it would be criticised for having two incompatible string types. If it only included that, it would be criticised for not being low-level enough (the programmer is forced to call all these inefficient string manipulation functions that do bounds checking).
  You might ask why C doesn't include closures and list comprehensions: if you want high-level language features, then C isn't the language for you.
14. Re:Missed the point by yuhong · 2011-08-02 18:27 · Score: 2
  
  I think it was intended to convert null-terminated strings to fixed-length null padded strings, as used in many places in the Unix kernel at the time it was invented, like filenames.
15. Re:Missed the point by stderr_dk · 2011-08-02 18:32 · Score: 4, Insightful
  
  Poor guy. I guess sooner or later he's going to have to learn how to manage his memory and understand how the underlying physical hardware works. That must be a real toughie for anyone who learned to "program" in the Java/C# world.
  Yeah, clearly PHK doesn't know anything about memory allocation. (Except for the malloc library he wrote for FreeBSD...)
  
  Maybe he should RTFM.
  I don't have a FreeBSD system at hand, but I wouldn't be surprised if the malloc page was written by PHK.
  
  --
  alias sudo="echo make it yourself #" ; # https://pipedot.org/~stderr & http://soylentnews.org/~stderr
16. Re:Missed the point by mgiuca · 2011-08-02 19:01 · Score: 2
  
  They are simpler IFF you never use the null value. Do you have any files on your system which have NUL bytes in them? Hint: yes.
  Yes -- this is a good reason not to use NUL-terminated strings (which, once again, TFA missed). Remember: I never said NUL terminated strings were good, just that the article missed the point by blaming NUL strings for a different, unrelated problem, and not actually picking up on any of the problems with NUL strings.
  If you need a 0 byte in your strings, then this won't work. However, to be technically correct, strings should contain text, and text should not contain a 0-byte. What about binary strings? Those should absolutely not be stored as NUL-terminated. Remember, nothing in C forces you to use NUL-terminated strings -- it just means you should not use the string.h functions on binary strings. Instead, you MUST separately keep the length around, as you do for an array of ints. Think of a binary string as an "array of chars" and not a NUL-terminated string, and there *shouldn't* be any trouble. (Yet as I pointed out with the MS certificate bug, there can still be trouble.)
  
  /* strnew and strfill are hypothetical helpers for a length-prefixed string type */
  string concat (string a, string b, string c) {
      string ret = strnew( strlen(a)+strlen(b)+strlen(c) );
      strfill(ret, 0, a);
      strfill(ret, strlen(a), b);
      strfill(ret, strlen(a)+strlen(b), c);
      return ret;
  }
  What's so hard about that?
  Nothing was hard about it. It's just that you had to invent two new library functions (strnew and strfill) which are much higher-level than other C library functions (with the possible exception of strdup, which combines allocation and copying). You are now saying to your C users (in the hypothetical "C with length-delimited strings" language) "you must never manually manipulate your own strings -- only ever use these library functions." That is antithetical to the way C works. C programmers want absolute control over the representation of everything. If you want a higher-level language, use a higher-level language.
17. Re:Missed the point by mcvos · 2011-08-02 20:45 · Score: 2
  
  Allow me to summarize that for the tl;dr crowd:
  C's "everything is a pointer" approach gives you the power to easily do lots of cool stuff, and adding length to a string would break that elegance. But using NUL-terminated strings creates a lot of security problems, not merely limited to buffer overflows, which are really caused by C's backward memory allocation.
18. Re:Missed the point by snowgirl · 2011-08-02 20:52 · Score: 3, Insightful
  
  If we were to switch now, is that the compatibility you're referring to? Well sure.
  But nobody's talking about switching now, the point of the topic is that C should have been designed differently. In those days there was very little backwards compatibility to worry about.
  And if it had been decided to be 1-byte length + data, and everyone used it like that, and assumed that the full 8-bits are available for the length, then when we switch to variable-byte length encoding, it would create an incompatibility. The incompatibility I speak of is the hypothetical one switching from 1-byte fixed-length length encoding to variable-byte length encoding.
  "They could have just used variable length encoding from the beginning." Sure, and they could have programmed everything in Java from the start... the idea of a variable length encoding would have been over-engineering the problem that they were facing.
  
  --
  WARNING! This girl exceeds the MAXIMUM SAFE standards established by the FDA for BRATTINESS
19. Re:Missed the point by TheRaven64 · 2011-08-02 21:08 · Score: 2
  
  They also need to be defined in terms of abstract characters
  How big is an abstract character? When C was created, the choices were basically ASCII or EBCDIC, so 7 bits. Then you got 8-bit encodings, but they were all incompatible. When OpenStep was published, Objective-C strings were defined as ordered collections of Unicode characters, which were 16-bit values. Modern versions of the Unicode specification require more than 16 bits for the entire range, so you end up needing 32 bits for each character. In a high-level language, you can just have a character type that you periodically redefine to be bigger and let the VM / runtime sort it out. In a low-level language like C, you break the ABI every time you do that.
  
  --
  I am TheRaven on Soylent News
20. Re:Missed the point by jeremyp · 2011-08-02 22:02 · Score: 3, Interesting
  
  That's still an arbitrary limit.
  An arbitrary limit equal to the virtual machine size of the computer that was originally targeted.
  
  The advantages that I see for counted length are:
  - it makes copying easier - you know beforehand how much space to allocate, and how much to copy.
  - it makes certain cases of strcmp() faster - if the length doesn't match, you can assume the strings are different.
  - It makes reverse searches faster.
  - You can put binary in a string.
  - It all but eliminates the possibility of buffer overruns for strings.
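  A minimal sketch of two items from the list above (the early-out comparison and binary-safe contents). The `counted_str` type and `counted_equal` function are hypothetical names made up for illustration; they are not part of any standard library.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical counted-string type; not part of any standard library. */
struct counted_str {
    size_t len;        /* number of bytes in data; may include '\0' bytes */
    const char *data;  /* not necessarily NUL-terminated */
};

/* Equality test: differing lengths settle the question without reading data. */
static bool counted_equal(struct counted_str a, struct counted_str b)
{
    if (a.len != b.len)
        return false;
    return memcmp(a.data, b.data, a.len) == 0;
}
```

  Because the length travels with the pointer, a value such as `{ 3, "a\0b" }` compares correctly even though it contains an embedded zero byte, which `strcmp()` would treat as the end of the string.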
  
  But that must be weighed against the disadvantages, like not being able to take advantage of CPUs' zero-test conditions, but instead having to maintain a counter which eats up a valuable register.
  But lots of CPUs have an instruction a bit like "decrement register and jump if not zero" which can be used for length+data strings.
  
  Or not being well suited for piped text or multiple threads; you can't just spew the text into an already nulled area, and it will be valid as it comes in;
  With modern character encodings, you can't guarantee that whatever string format you use. Couple that with the fact that streamed data tends to be read and written in blocks with a length parameter anyway, and the whole advantage is gone. This is why almost all modern languages have some variation on length + data for their strings and utilities for manipulating raw byte buffers.
  
  Getting a free strlen() is NOT an advantage, by the way. In fact, that became a liability when UTF-8 arrived. With a library strlen() function, all you had to do was update the library, but when the compiler was hardcoded to just return the byte count, that wasn't an option.
  Except that strlen() has always and still does count the number of C chars before the null byte. This is enshrined in the C99 standard. UTF-8 has not changed the implementation of strlen(). Also, gcc and probably many other compilers will normally optimise things like strlen() to a few lines of assembler rather than a call to libc, so you'd have to recompile anyway if it does change.
  
  Sure, one could go to UTF-16 instead, but then there's a lot of wasted space.
  All in all, having worked with both systems, I find more advantages with null-termination.
  There's also a third system for text - linked lists. It doesn't have the disadvantage of an artificial string length limit, and allows for easy cuts and pastes, and even COW speedups, but requires far more advanced (and thus slower) routines and housekeeping, and has many of the same disadvantages as byte-counted text. Some text processors have used this as a native string format, due to the specific advantages.
  I'd still take NULL-terminated for most purposes.
  Most modern languages have a proper string type and I would always take that over null terminated char sequences. You can bet that Java's internal implementation of String uses length+data.
  
  --
  All I want is a secure system where it's easy to do anything I want. Is that too much to ask ~~ Randall Munroe
21. Re:Missed the point by bjourne · 2011-08-02 23:56 · Score: 2
  
  "But who would waste all that space just to make calculations faster?" Actually, Perl represents all Unicode in UTF-32 (native byte-alignment) and renders out to UTF-8 when printed. Precisely because space is relatively cheap, and processing time is still often the current limiting factor.
  Which is not always true either. Often the limiting factor is the space in the cpu cache, since ram access is relatively expensive.
  
  --
  Football Odds
22. Re:Missed the point by arth1 · 2011-08-03 01:22 · Score: 2
  
  But that must be weighed against the disadvantages, like not being able to take advantage of CPUs' zero-test conditions, but instead having to maintain a counter which eats up a valuable register.
  But lots of CPUs have an instruction a bit like "decrement register and jump if not zero" which can be used for length+data strings.
  Um, that's pretty much what I said, isn't it? The "instead having to maintain a counter which eats up a valuable register" part.
23. Re:Missed the point by KiloByte · 2011-08-03 02:17 · Score: 2
  
  On any sane platform, "long" has that property, and can also hold the machine word or more. Win64 is not sane.
  Because of it, the standard had to add intptr_t, which is the only type portably known to be of the same width as void* and char* (but _not_ necessarily the same as a pointer to any other type!). Of course, MSVC doesn't follow standards and doesn't have this type nor stdint.h/inttypes.h at all.
  Thus, your code will not work there, and will cut pointers. To make things worse, it will actually work if your pointers are within the first 2GB of address space and not on the stack.
  I do fully agree with your core point, though: Pascal strings would suffer from all these problems as well, and more besides.
  
  --
  The creatures outside looked from Alt-Right to Antifa; but already it was impossible to say which was which.
24. Re:Missed the point by KiloByte · 2011-08-03 02:39 · Score: 2
  
  UTF-16 is not fixed width. It combines all disadvantages of UTF-8 and UCS4 while having no advantages of either.
  
  --
  The creatures outside looked from Alt-Right to Antifa; but already it was impossible to say which was which.
25. Re:Missed the point by blane.bramble · 2011-08-03 02:40 · Score: 2
  
  No, that is just the stupid. The obvious fix is to store the length of the string *after* the string itself.
26. Re:Missed the point by solkimera · 2011-08-03 03:38 · Score: 2
  
  Java's String implementation is: char array, offset, length. That way when doing a substring, the resulting string uses the same char array.
27. Re:Missed the point by ceswiedler · 2011-08-03 04:25 · Score: 2
  
  Are you suggesting strlen() should return the number of UTF-8 characters, not the number of bytes? That's insane... the entire point of UTF-8 is that stuff like strlen() can treat it as a narrow string. If you want to have a function for returning the number of printable characters in a UTF-8 string, that's going to be a separate function, and isn't any easier or harder with sized strings vs. null-terminated strings.
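  A rough sketch of what such a separate function might look like. `utf8_codepoints` is a hypothetical helper, it assumes the input is already valid UTF-8, and it counts code points, which is only an approximation of "printable characters" (combining marks and the like are not handled).

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: counts code points in a NUL-terminated string that is
   assumed to be valid UTF-8, by skipping continuation bytes (10xxxxxx). */
static size_t utf8_codepoints(const char *s)
{
    size_t n = 0;
    for (; *s != '\0'; s++) {
        if (((unsigned char)*s & 0xC0) != 0x80)
            n++;
    }
    return n;
}
```

  For the bytes `"na\xc3\xafve"`, `strlen()` reports 6 while this helper reports 5. The same loop would work over a length-delimited buffer by swapping the terminator test for a byte count.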
28. Re:Missed the point by darkwing_bmf · 2011-08-03 08:14 · Score: 2
  
  Really? How is Ada less powerful than C?
Maybe a better candidate by phantomfive · 2011-08-02 16:33 · Score: 5, Interesting

C. A. R. Hoare, the inventor of Quicksort, also invented the NULL pointer. Something he apologized for:

I call it my billion-dollar mistake. It was the invention of the null reference in 1965. At that time, I was designing the first comprehensive type system for references in an object oriented language (ALGOL W). My goal was to ensure that all use of references should be absolutely safe, with checking performed automatically by the compiler. But I couldn't resist the temptation to put in a null reference, simply because it was so easy to implement. This has led to innumerable errors, vulnerabilities, and system crashes, which have probably caused a billion dollars of pain and damage in the last forty years.

--
"First they came for the slanderers and i said nothing."
1. Re:Maybe a better candidate by mike.mondy · 2011-08-02 18:53 · Score: 2
  
  Huh? Not allowing a "null" pointer or otherwise illegal pointer value makes no sense. Either the pointer represented by all zeros is a valid pointer or it's an illegal value. If it's not treated as somehow special or illegal, it's by definition valid. Which would not be nearly as useful as having it be illegal. At best, it would be treated like any other random bit pattern in a pointer -- maybe pointing to legal memory and maybe not. In most languages close to assembly, a valid all zeros pointer would probably point to the beginning of memory; in virtual memory systems, it would probably point to the beginning of the process's space. IIRC, Algol had "references" that were not pointers. Actually Algol is the language with pass-by-name and thunking where it's infamously impossible to write a swap(a,b) function that could handle something like swap(A[i], i).
  The word reference doesn't always mean exactly the same thing as pointer (See C++). I imagine Hoare did not mean "null pointer" when he said "null reference".
  Trivia: Multics had multiple illegal pointers. I think they were 0 for null, -1 for new-process, and -2 for disconnect process. (Terminal login sessions had a single process). The "new_process" command that threw away your (single) munged process and gave you a fresh clean one was implemented in maybe two lines of code looking vaguely like: declare pointer shiva = addr(-1); *shiva = 666;
This was modded offtopic by symbolset · 2011-08-02 16:36 · Score: 2, Insightful

Slashdot is lost.

--
Help stamp out iliturcy.
Whatever by Old+Wolf · 2011-08-02 16:37 · Score: 4, Funny

Come on , this is complete rubbish___8^)_#;3,2,.3root>^$)(^(943hellomax0984)_))1..l2l2_}[[}{
The cost of a byte - or was that the value? by Teunis · 2011-08-02 16:47 · Score: 2

hmm. marker character, or a length.

Marker: same type as string, so no need to worry about bit size, start/stop bits or other extraneous. String can be any size and only restricted by available memory. (given the ability to swap darn near unlimited pages in current hardware.... and the ability to virtualize across computers... this means strings have a potentially infinite limit)

Length: What's the size? What byte order? What bit size? How will this affect communications between platforms?

IMO, C and the null terminated string -saved- more than it cost. It's entirely (theoretically anyway) possible - given the kind of code I've seen in browsers and server code -that the web couldn't have existed without some of these assumptions. The "streaming" so core to unix depends on this... how else does one know when one hits the end of a file or a buffer?

When you mark cost, know what you pay. Not all costs are negative.
Slashdot Sensation Prevention Section by gmhowell · 2011-08-02 16:50 · Score: 4, Informative

FTA:

We learn from our mistakes, so let me say for the record, before somebody comes up with a catchy but totally misleading Internet headline for this article, that there is absolutely no way Ken, Dennis, and Brian could have foreseen the full consequences of their choice some 30 years ago, and they disclaimed all warranties back then. For all I know, it took at least 15 years before anybody realized why this subtle decision was a bad idea, and few, if any, of my own IT decisions have stood up that long.
In other words, Ken, Dennis, and Brian did the right thing.

--
Jesus was all right but his disciples were thick and ordinary. -John Lennon
Got it wrong by Spazmania · 2011-08-02 17:05 · Score: 3, Insightful

It probably wasn't about the bytes. The factors are:
1. Complexity. Without exception, every variable in C is an integer, a pointer or a struct. A null terminated string is a pointer to a series of integers -- barely one step more complex than a single integer. To keep the string length, you'd have to employ a struct. That or you'd have to create a magic type for strings that's on the same level as integers, pointers and structs. And you don't want to use a magic type because then you can't edit it as an array. Simplicity was important in C -- keep it close to the metal.
2. Computational efficiency. Many if not most operations on strings don't need to know how long they are. So why suffer the overhead of keeping track? That makes string operations on null terminated strings on average faster than string operations on a string bounded by an integer.
3. Bytes. It's only one extra byte with a magic type or an advanced topic struct. In both cases with an assumption that the maximum possible length on which the standard string functions will work is 64kb. If you're talking about a more mundane struct then you're talking about an int and a pointer to a block of memory which has an extra set of malloc overhead. That's a lot of extra bytes, not just one.
For the kind of language C aimed to be -- a replacement for assembly language -- the choice of null terminated strings was both obvious and correct.

--
Moderating "-1, Disagree" is simple censorship. Have the guts to post your opinion.
1. Re:Got it wrong by PCM2 · 2011-08-02 17:15 · Score: 2
  
  Beyond those points:
  
  It is interesting to compare C's approach with that of two nearly contemporaneous languages, Algol 68 and Pascal [Jensen 74]. Arrays in Algol 68 either have fixed bounds, or are `flexible:' considerable mechanism is required both in the language definition, and in compilers, to accommodate flexible arrays (and not all compilers fully implement them.) Original Pascal had only fixed-sized arrays and strings, and this proved confining [Kernighan 81]. Later, this was partially fixed, though the resulting language is not yet universally available.
  C treats strings as arrays of characters conventionally terminated by a marker. Aside from one special rule about initialization by string literals, the semantics of strings are fully subsumed by more general rules governing all arrays, and as a result the language is simpler to describe and to translate than one incorporating the string as a unique data type. Some costs accrue from its approach: certain string operations are more expensive than in other designs because application code or a library routine must occasionally search for the end of a string, because few built-in operations are available, and because the burden of storage management for strings falls more heavily on the user. Nevertheless, C's approach to strings works well.
  And that's coming from Dennis Ritchie, who was there.
  
  --
  Breakfast served all day!
2. Re:Got it wrong by osu-neko · 2011-08-02 18:36 · Score: 2
  
  What are the common cases you are thinking of where C-style strings are faster?
  strcpy(char *d, char *s)
  {
      while ( *d++ = *s++ );
  }
  Challenge: come up with the equivalent for pascal-style strings in a way that doesn't compile into at least twice as much code.
  In fact, aside from strlen, are there any string functions that aren't made at least twice as complex by using P-style instead of C-style strings? Most of strlib.c can be implemented as one-liners, assuming C-style strings.
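  For comparison, one possible answer to the challenge, assuming the classic Pascal layout of a single length byte followed by the data. `pstrcpy` is a hypothetical name, and whether this compiles to more or less code than the C version depends on the compiler and on how `memcpy()` is implemented, so treat it as a sketch rather than a verdict.

```c
#include <string.h>

/* Assumed layout (classic Pascal style): byte 0 holds the length (0..255),
   followed by that many data bytes and no terminator. */
static void pstrcpy(unsigned char *d, const unsigned char *s)
{
    memcpy(d, s, (size_t)s[0] + 1);  /* length byte plus payload */
}
```

  Like `strcpy()`, this trusts the caller to supply a destination that is large enough.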
  
  --
  "Convictions are more dangerous enemies of truth than lies."
Re:"typical and rational IT or CS decision" by perpenso · 2011-08-02 17:09 · Score: 2, Insightful

They don't look the same to me, these days the "IT" decisions are taken by the MBA type guys, with the sole purpose of maximizing their chances to get more visibility, "exceed objectives" and get a larger bonus/promotion/whatever. Sure they're rational too but what do they have in common with CS?
Programmer for 20+ years here, BS and MS in CS. I used to share such opinions. Then I went to business school. I really enjoyed business school in part because I was constantly amused by how ignorant and wrong I had been regarding such opinions. May I be bold enough to suggest that the portrayal of MBAs in popular and nerd cultures is about as accurate as the portrayal of programmers in popular and non-nerd cultures.

None of the above should be interpreted to mean that business school makes one appreciate Dilbert any less. Dilbert is actually pretty popular with MBA types and their professors as well.
PHK wide of the mark by epine · 2011-08-02 17:18 · Score: 5, Insightful

Normally I tend to agree with what I've read from PHK, but this one seems wide of the mark. If you involve a *real* C guru in the discussion, I don't think there would be much sentiment toward nixing the sentinel.
C makes a big deal about the equivalence of pointers and arrays. Plus in C a string also represents every suffix string.
char string[] = { 't', 'e', 's', 't', '\0' };
char *cdr_string = string + 1;
Perfectly valid, as God intended. A string with a length prefix is a hybrid data structure. What is the size of the length structure up front? It can be interesting in C to sort all suffixes of a string, having only one copy of the string itself. Try that with length prefix strings. (The trivial algorithm is far from ideal for large or degenerate character sequences, but it does provide insight into position trees and the Burrows-Wheeler transform.)
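The trivial suffix-sorting algorithm mentioned above can be sketched roughly as follows. The function names are made up for illustration, and this is the naive version, not a serious suffix-array construction.

```c
#include <stdlib.h>
#include <string.h>

/* qsort comparator: each element is a char * pointing into the same buffer. */
static int cmp_suffix(const void *a, const void *b)
{
    const char *sa = *(const char *const *)a;
    const char *sb = *(const char *const *)b;
    return strcmp(sa, sb);
}

/* Fills out[0..n-1] with pointers to every suffix of text, sorted.
   out must have room for strlen(text) pointers. Returns n. */
static size_t sort_suffixes(const char *text, const char **out)
{
    size_t n = strlen(text);
    for (size_t i = 0; i < n; i++)
        out[i] = text + i;   /* every suffix is just an address */
    qsort(out, n, sizeof out[0], cmp_suffix);
    return n;
}
```

For "banana" this yields a, ana, anana, banana, na, nana, with all six pointers aimed into the one original copy of the text.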
Nor would I blame all the stupid coding errors on the '\0' terminator convention. In C, a determined idiot can mess up just about anything, unless the compiler takes over and does things for you, a la Pascal by another name. If that had been the bias, would we all be using C now, or some other language? Repeat after me: Generativity Rocks. Nanny languages usually manage to bork generativity over. Correct Programming Made Easy never strays far from the subtitle Composition Made Difficult.
No one who ever read Dijkstra and took him seriously ever made a tiny fraction of the stupid mistakes blamed on hapless zero.
If you want to point to a real steaming pile, strcpy() was designed by a moron with a bad hang-over and no copy of Dijkstra within a 100 mile radius. It was tantamount to declaring "you don't really need to test your preconditions ... what kind of sissy would do that?"
C is a nice design, as evidenced by how seamlessly the STL was grafted onto C++ at the abstraction layer (at the syntax layer, not so much). The problem with C was always a communication problem. To use C well one must test preconditions on operation validity. To use algebra well one must test preconditions on operation validity.
Where does PHK lay the blame for the algebraist who made it possible to divide both sides of an equation by zero, or multiply an inequality by -1? Preferably with the complete moron who doesn't check preconditions on the validity of the operation. Two thousand years later, now we have a better solution?
PHK is right about cache hierarchies. By the time cache hierarchies arrived, we had C++ with entirely different string representations.
For some reason I've never been keen on having a programmer who can't manage to correctly test the precondition for buffer overflow making deep design decisions about little blocks of lead in the radiation path.
And it's not even much of a burden. As Dijkstra observed, for many algorithms, once you have all your preconditions right and you've got a provable variant, there's often very little left to decide. It actually makes the design of many algorithms simpler in the mode of divide and conquer: first get your preconditions and variant right (you're now half done and you've barely begun to think hard), *then* worry about additional logic constraints (or performance felicitous sequencing of legal alternatives).
The coders who first try to get their logical requirements correct and then puzzle out the preconditions do indeed make the original task more difficult than not bothering with preconditions at all, supposing there's some kind of accurate measure over crap solutions, which I refuse to concede.
1. Re:PHK wide of the mark by EvanED · 2011-08-02 18:30 · Score: 3, Insightful
  
  If you want to point to a real steaming pile, strcpy() was designed by a moron with a bad hang-over and no copy of Dijkstra within a 100 mile radius. It was tantamount to declaring "you don't really need to test your preconditions ... what kind of sissy would do that?"
  To play Devil's advocate, strcpy cannot check its precondition. You can't tell whether a pointer you're given is valid, or how much space is left in the buffer.
  (Well, I guess you could go make malloc record far more information than it otherwise has to, and make strcpy grovel through that and some other data, but even I don't think that'd have been worth it. And I'm pretty far on the side of "why the heck are we using languages that are as unsafe as C".)
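  The usual workaround is to make the caller state the destination size, so the check can happen inside the function. A minimal sketch, where `copy_checked` is a hypothetical helper and not a standard function:

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not a standard function. Copies src into dst only if
   it fits, terminator included; cap is the caller's claim about dst's size. */
static bool copy_checked(char *dst, size_t cap, const char *src)
{
    size_t need = strlen(src) + 1;
    if (need > cap)
        return false;          /* precondition failed: dst left untouched */
    memcpy(dst, src, need);
    return true;
}
```

  Note that `cap` is still only the caller's word; the function has no way to verify it, which is the point being made above.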
The trouble is arrays, not strings. by Animats · 2011-08-02 17:31 · Score: 3, Interesting

The problem with C isn't strings. It's arrays. Strings are just a special case of arrays.
Understand that when C came out, it barely had types. "structs" were not typed; field names were just offsets. All fields in all structs, program-wide, had to have unique names. There was no "typedef". There was no parameter type checking on function calls. There were no function pointers. All parameters were passed as "int" or "float", including pointers and chars. Strong typing and function prototypes came years later, with ANSI C.
This was rather lame, even for the late 1970s. Pascal was much more advanced at the time. Pascal powered much of the personal computer revolution, including the Macintosh. But you couldn't write an OS in Pascal at the time; it made too many assumptions about object formats. In particular, arrays had descriptors which contained length information, and this was incompatible with assembly-language code with other conventions. By design, C has no data layout conventions built into the language.
Why was C so lame? Because it had to run on PDP-11 machines, which were weaker than PCs. On a PC, at least you had 640Kb. On a PDP-11, you had 64Kb of data space and (on the later PDP-11 models) 64Kb of code space, for each program. The C compiler had to be crammed into that. That's why the original C is so dumb.
The price of this was a language with a built in lie - arrays are described as pointers. The language has no idea how big an array is, and there's not even a way to usefully talk about array size in C. This is the fundamental cause of buffer overflows. Millions of programs crash every day because of that problem.
That's how we got into this mess.
As I point out occasionally, the right answer would have been array syntax like
int read(int fd, char[n]& buf, size_t n);
That says buf is an array of length n, passed by reference. There's no array descriptor and no extra overhead, but the language now says what's actually going on. The classic syntax,
int read(int fd, char* buf, size_t n);
is a lie - you're not passing a pointer by value, you're passing an array by reference.
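For what it's worth, C99 does accept a nearby notation, provided the size parameter is declared before the array. A small sketch (`fill_buf` is a made-up function; the feature is optional in C11, so compiler support is an assumption here):

```c
#include <stddef.h>

/* C99 array-parameter notation: n must be declared before buf.
   buf is still adjusted to a plain char *; the [n] documents intent and
   is not a bounds check enforced by the language. */
static size_t fill_buf(size_t n, char buf[n], char value)
{
    for (size_t i = 0; i < n; i++)
        buf[i] = value;
    return n;
}
```

This states the relationship between the buffer and its length in the signature, but unlike the proposed syntax it changes nothing about what is actually passed or checked.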
C++ tries to wallpaper over the problem by hiding it under a layer of templates, but the mold always seeps through the wallpaper when a C pointer is needed to call some API.
1. Re:The trouble is arrays, not strings. by hey+hey+hey · 2011-08-02 20:00 · Score: 2
  
  Why was C so lame? Because it had to run on PDP-11 machines, which were weaker than PCs. On a PC, at least you had 640Kb. On a PDP-11, you had 64Kb of data space and (on the later PDP-11 models) 64Kb of code space, for each program.
  
  Your relative comparisons are a bit off. The Altair from 1975 (the first versions of C were finished around 1973) had a whopping 1KB of memory. The mini computers of the day ran rings around what PCs there were, both in raw power and in memory.
Well I differ in my view. by hamster_nz · 2011-08-02 17:41 · Score: 3, Informative

After 25 years of using C, I don't mind the strings being terminated by nulls. If you want to do something else, just don't include string.h.
Terminating with a null is only a convention - the C language itself has no concept of strings. As others point out, it is either an array of bytes or a pointer to bytes.
it isn't forced on to you - you don't have to follow it.
1. Re:Well I differ in my view. by shutdown+-p+now · 2011-08-03 11:36 · Score: 2
  
  it isn't forced on to you - you don't have to follow it.
  It's forced in practice by the fact that the entire standard library, and all third-party libraries, all produce and consume null-terminated strings.
  What's far worse is that, since C FFI is the lowest common denominator that we have across various languages, null-terminated strings become the standard way to marshal strings between libraries written in different languages. This means many things: for one, no embedded nulls, which is bad for many scenarios where handling them is desired.
  For another, it means that high-level languages and frameworks often have to take C representation into account when designing their own strings, just so that they can be efficiently converted to a C string. For example, in Qt and .NET, strings have separately stored length, but they're also null-terminated just so that (assuming the string has no embedded nulls, which is otherwise valid) a pointer to the first character is a valid C string, and can be used to call some C API. This is especially sad when the library in question is itself written in another language that, in fact, has its own string representation which supports embedded nulls.
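  A rough sketch of that dual representation, with hypothetical names and a small fixed capacity to keep it short. It is not how Qt or .NET actually lay out their strings, only an illustration of storing the length and the terminator together.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical dual-representation string: the length is stored, and the
   buffer is also NUL-terminated so data can be handed to C APIs directly. */
struct dual_str {
    size_t len;
    char data[16];   /* fixed capacity keeps the sketch short */
};

/* Stores len bytes plus a terminator; returns -1 if they do not fit. */
static int dual_set(struct dual_str *s, const char *bytes, size_t len)
{
    if (len >= sizeof s->data)
        return -1;
    memcpy(s->data, bytes, len);
    s->data[len] = '\0';
    s->len = len;
    return 0;
}
```

  After storing the five bytes `"ab\0cd"`, `len` is 5 but `strlen(data)` sees only 2, which is the embedded-null loss described above as soon as the value crosses a C API.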
Re:Why not both? by c0lo · 2011-08-02 17:47 · Score: 2

I'll argue that's the correct decision at such a low level as C.
1. with NULL-terminated strings, there's no distinction (other than in the string.h and related library) between a char * and any other_type *. Inventing a "string" type in C (not C++) would have made the compiler more complex (see footnote **)
2. because char * is no different than other_type *, I can pass the address of the middle of a string as a char * for processing. Not so much for a std::string. How does it matter? Well, take parsing, for example (the most trivial case being strtok): with a length-prefixed string you would need not only the extra string-len prefix but also a separate "curr_pos".
If you have a NULL-terminated char* string, one can invent/use a std::string (or GString, or NSString, or Pascal-string). The reverse is not true: if the compiler accepts only Pascal-strings, it's not possible to start using the NULL-terminated convention.
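A small sketch of point 2, using made-up helper names: a pointer into the middle of a string is itself a usable string, so a parser can walk the text with one pointer and no separate position or length.

```c
#include <ctype.h>
#include <stddef.h>

/* Returns a pointer to the first non-space character of s. The result is
   itself a valid NUL-terminated string that shares storage with s. */
static const char *skip_spaces(const char *s)
{
    while (isspace((unsigned char)*s))
        s++;
    return s;
}

/* Counts whitespace-separated words by walking one pointer through the text;
   no separate length or current-position field is kept. */
static size_t count_words(const char *s)
{
    size_t words = 0;
    for (s = skip_spaces(s); *s != '\0'; s = skip_spaces(s)) {
        words++;
        while (*s != '\0' && !isspace((unsigned char)*s))
            s++;
    }
    return words;
}
```

With a length-prefixed representation, the equivalent would have to carry an offset and a remaining length alongside the base address.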

many uses are much easier and faster when we know the length and for others few things beat a null-terminated string.
While in other cases (when you pass a std::string by-value and invoke the copy constructor, which tends to happen a lot), you have a hefty performance penalty.
Footnote ** - Dennis M. Ritchie on the C history.

C treats strings as arrays of characters conventionally terminated by a marker. Aside from one special rule about initialization by string literals, the semantics of strings are fully subsumed by more general rules governing all arrays, and as a result the language is simpler to describe and to translate than one incorporating the string as a unique data type.

--
Questions raise, answers kill. Raise questions to stay alive.
Faster loops by Sloppy · 2011-08-02 17:55 · Score: 4, Insightful

TFA suggests the decision was to save a byte, but I don't believe that's the main reason it happened.
If you're traversing a string anyway, what happens is that when you load the data into your register (which you'll be doing anyway, for whatever reason you're traversing the string), you get a status flag set "for free" if it's zero, so that's your loop test right there. Branch if zero. If you have to compare an offset to a length on every iteration, then now you're having to store this offset in another register (great, like I have lots of registers to spare on 1970s CPUs?) and compare (i.e. subtract) to the length which is stored in memory (great, a memory access) or another register (oh great, I need to use another register in the 1970s?!) and the code is bigger and slower.
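The two loop shapes look roughly like this in C. The function names are invented for the example, and what a modern compiler actually emits will vary; the sketch only shows where the extra live value comes from.

```c
#include <stddef.h>

/* Sentinel style: the loaded byte doubles as the loop test. */
static unsigned sum_sentinel(const char *s)
{
    unsigned total = 0;
    unsigned char c;
    while ((c = (unsigned char)*s++) != '\0')
        total += c;
    return total;
}

/* Counted style: a second value (the remaining length) has to stay live
   alongside the pointer for the whole loop. */
static unsigned sum_counted(const char *s, size_t len)
{
    unsigned total = 0;
    while (len-- > 0)
        total += (unsigned char)*s++;
    return total;
}
```

The counted version does handle embedded zero bytes, which the sentinel version cannot.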
It's easy to laugh these days about anyone caring about how many clock cycles a loop takes and whether it uses 2 registers or 4 registers, but this stuff used to be pretty important (and more recently than the 1970s). Kids these days: if you weren't there, you just don't know what it was like.
BTW, I have a hunch K & R didn't know they were building such an eternal legacy. It's reasonable to speculate that this is still going to be part of systems a hundred years from now, but in 1970 you would have been a mad man to suggest such a thing. (Not that this invalidates TFA's point at all; I'm just making excuses for K&R I guess.)

--
As copyright owner of this comment, I authorize everyone to defeat any technological measure which limits access to it.
Re:Actually tradeoff may not have been rational by c0lo · 2011-08-02 17:58 · Score: 2

this could have been a perfectly typical and rational IT or CS decision, like the many similar decisions we all make every day
Actually the tradeoff may not have been rational.
Actually, the choice was rational (at least, on purpose) - you see, it's not about a single byte, it's about a new data type.

C treats strings as arrays of characters conventionally terminated by a marker. Aside from one special rule about initialization by string literals, the semantics of strings are fully subsumed by more general rules governing all arrays, and as a result the language is simpler to describe and to translate than one incorporating the string as a unique data type. Some costs accrue from its approach: certain string operations are more expensive than in other designs because application code or a library routine must occasionally search for the end of a string, because few built-in operations are available, and because the burden of storage management for strings falls more heavily on the user.

--
Questions raise, answers kill. Raise questions to stay alive.
Re:Actually tradeoff may not have been rational by osu-neko · 2011-08-02 18:22 · Score: 2

Actually the tradeoff may not have been rational. The storage bytes saved may have been offset by the extra code bytes necessary for handling unknown length strings.
Not really, no. Having written basic library code for both, it usually requires more code to handle Pascal-style (length+data) strings than C-style (data+null) strings. You save quite a bit of code ("quite a bit" being relative, but I've had to squeeze code into 208 bytes of RAM before) by using the C-style strings most of the time.

--
"Convictions are more dangerous enemies of truth than lies."
Re:Why not both? by Psychotria · 2011-08-02 18:35 · Score: 2

Aside from your apparent confusion between NULL-terminated (0x00) and NUL-terminated ('\0') I completely agree.
Re:Everyone misunderstands that poem. by mwvdlee · 2011-08-02 18:51 · Score: 2

Which would seem to imply you have reason to believe the GP is incorrect in his interpretation of the poem.
Please enlighten us with your insights.

--
Slashdot social media options: AIM, ICQ, Yahoo, Jabber and Mobile Text. Why no MySpace?
Be careful what you wish for... by dutchd00d · 2011-08-02 18:52 · Score: 2

If they had gone with the embedded length option we'd be sitting around bitching about how short-sighted it was to use just two bytes for the length. Including how Dennis Ritchie supposedly said "64K strings should be enough for anybody".
Re:The Road Not Taken by other Slashdot FPers by o'reor · 2011-08-02 19:18 · Score: 5, Funny

I for one welcome that refreshing new way of writing "Frost's pissed."

--
In Soviet Russia, our new overlords are belong to all your base.
Re:Why do I need a subject? by Darfeld · 2011-08-02 20:22 · Score: 2

But you reply to him knowing he won't read you? ( Or, if I let my paranoia run, you're AC and you reply so you have a link to your prior post and can check answers... And so you'd be trolling.)

--
(\__/) This is Lapinator
(='.'=) copy it in your sig
(")_(") so it can take over the world
Frost comments on his own poem by doug141 · 2011-08-03 07:58 · Score: 2

http://poetrypages.lemon8.nl/life/roadnottaken/roadnottaken.htm Robert Frost on his own poetry: "One stanza of 'The Road Not Taken' was written while I was sitting on a sofa in the middle of England: Was found three or four years later, and I couldn't bear not to finish it. I wasn't thinking about myself there, but about a friend who had gone off to war, a person who, whichever road he went, would be sorry he didn't go the other. He was hard on himself that way."
This is why there are breadth reqs... by snowwrestler · 2011-08-03 11:25 · Score: 2

The narrator as "vain, shallow individual" is entirely a character pulled out of your hindquarters, as there is nothing in the text of the poem to lead to that conclusion.
Ahem.

The ironic interpretation, widely held by critics,[2][3] is that the poem is instead about making personal choices and rationalizing our decisions, whether with pride or with regret.
Source: http://en.wikipedia.org/wiki/The_Road_Not_Taken_(poem)
I'm tempted to bookmark this response as a great example of why engineers should not fear breadth requirements. (I'm assuming anyone with such a low Slashdot ID works in engineering...)
The ironic interpretation is widely held because it's supported not only by the text, but also Frost's own statements, and the broader context of his work--in which seemingly simple descriptive verse hides darker, more complex themes. (A major reason why he is held in such high regard.) This particular poem is a common subject for lessons on critical analysis of literature. The key starting point is that first-person narrators are not necessarily reliable.

--
Build a man a fire, he's warm for one night. Set him on fire, and he's warm for the rest of his life.