Slashdot Mirror


Mr. Pike, Tear Down This ASCII Wall!

theodp writes "To move forward with programming languages, argues Poul-Henning Kamp, we need to break free from the tyranny of ASCII. While Kamp admires programming language designers like the Father-of-Go Rob Pike, he simply can't forgive Pike for 'trying to cram an expressive syntax into the straitjacket of the 95 glyphs of ASCII when Unicode has been the new black for most of the past decade.' Kamp adds: 'For some reason computer people are so conservative that we still find it more uncompromisingly important for our source code to be compatible with a Teletype ASR-33 terminal and its 1963-vintage ASCII table than it is for us to be able to express our intentions clearly.' So, should the new Hello World look more like this?"

115 of 728 comments

  1. The thing with ASCII by enec · · Score: 5, Insightful

    The thing with ASCII is that it's easy to write on standard keyboards, and does not require a specialized layout. Once someone can cram the necessary unicode symbols into a keyboard so that I don't have to remember arcane meta-codes or fiddle with pressing five different dead keys to get one symbol, I'm all for it.

    --
    I'm sorry, I only accept criticism in the form of sed expressions.
    1. Re:The thing with ASCII by angus77 · · Score: 4, Informative

      Japanese is typed using a more-or-less standard QWERTY keyboard.

    2. Re:The thing with ASCII by MichaelSmith · · Score: 5, Informative

      Japanese is typed using a more-or-less standard QWERTY keyboard.

      Tediously.

    3. Re:The thing with ASCII by arth1 · · Score: 5, Insightful

      Once you've had to do an ad-hoc codefix through a serial console or telnet, you appreciate that you can write the code in 7-bit ASCII.

      It's not about being conservative. It's about being compatible. Compatibility is not a bad thing, even if it means you have to run your unicode text through a filter to embed it, or store it in external files or databases.

      It'd also be hell to do code review on unicode programs. You can't tell many of the symbols apart. Is that a hyphen or a soft hyphen at the end of that line? Or perhaps a minus? And is that a diameter sign, a zero, or the DaNo letter "Ø" over there? Why doesn't that multiplication work? Oh, someone used an asterisk instead of the multiplication symbol which looks the same in this font.

      No, thanks, keep it compatible, and parseable by humans, please.
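
      To make the review problem concrete, here is a minimal Python sketch that prints the code point and official Unicode name of the characters mentioned above. Python and the choice of code points are assumptions for illustration, not something from the post; in particular, U+2217 is only a guess at which "multiplication symbol" was meant. It uses only the standard `unicodedata` module, and the characters are written as escapes so the listing itself stays plain ASCII.

```python
import unicodedata

# Characters a reviewer might confuse; the selection is illustrative.
LOOKALIKES = {
    "hyphen-minus": "\u002d",
    "soft hyphen": "\u00ad",
    "minus sign": "\u2212",
    "digit zero": "\u0030",
    "letter O with stroke": "\u00d8",
    "diameter sign": "\u2300",
    "asterisk": "\u002a",
    "asterisk operator": "\u2217",
}

def describe(ch):
    """Return the code point and Unicode name of a single character."""
    return f"U+{ord(ch):04X} {unicodedata.name(ch)}"

for label, ch in LOOKALIKES.items():
    print(f"{label:22} {describe(ch)}")
```

      Every entry prints a different code point, which is exactly the information a rendered font can hide from a reviewer.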

    4. Re:The thing with ASCII by Ernesto+Alvarez · · Score: 3, Informative

      Japanese is typed using a more-or-less standard QWERTY keyboard.

      ...then requiring the input to pass through what amounts to a tokenizer to get the phonetic spelling, and into another program, which needs a database of words and has to prompt you for each one in order to select the proper one from a list.

      Not something as simple as writing ASCII by a long shot.

    5. Re:The thing with ASCII by BronsCon · · Score: 2, Informative

      I recommend that everyone GOAT SEe the parent video ASAP

      --
      APK quotes people (including myself) without context and should not be trusted. Just thought you should know.
    6. Re:The thing with ASCII by Kagetsuki · · Score: 5, Informative

      I'm Japanese, so let me clarify how entering Japanese works here: Japanese is composed of two sets of Kana (characters with no meaning but they have a sound) and Kanji (characters with meaning). To enter a word in Japanese, let's say the word "Me/I", you would hit a key to activate your IME [input method editor] - usually the key on the top left of the keyboard, then type "watashi", just like that, and you would get the word in kana (hiragana). Next hit the space key, that converts it to kanji. Now hit enter to finish input or just start typing your next word. You can also enter multiple words, hit space, and then break up and convert the sentence all at once. It is not difficult, you don't actually need a special keyboard, and I've never heard of anybody capable of using a keyboard using voice recognition because they found the act of entering in words laborious.
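
      The steps described above (type romaji, get kana, press space to get kanji candidates) can be sketched as a toy in Python. Everything here is an assumption for illustration: the tables `KANA` and `CANDIDATES` hold only the entries needed for "watashi", and the function names are hypothetical. A real IME uses a full syllabary, a large dictionary, and context to rank candidates.

```python
# Toy romaji -> hiragana table; only the syllables needed for the example.
KANA = {"wa": "わ", "ta": "た", "shi": "し"}

# Toy candidate dictionary, most likely kanji first (hypothetical data).
CANDIDATES = {"わたし": ["私", "渡し"]}

def to_kana(romaji):
    """Convert romaji to hiragana by greedy longest-match lookup."""
    out, i = [], 0
    while i < len(romaji):
        for size in (3, 2, 1):
            chunk = romaji[i:i + size]
            if chunk in KANA:
                out.append(KANA[chunk])
                i += size
                break
        else:
            raise ValueError(f"no kana for {romaji[i:]!r}")
    return "".join(out)

def convert(kana, presses=1):
    """Pick the candidate selected by pressing space `presses` times."""
    options = CANDIDATES.get(kana, [kana])
    return options[(presses - 1) % len(options)]

kana = to_kana("watashi")
print(kana)                      # hiragana as typed
print(convert(kana))             # first candidate after one press of space
print(convert(kana, presses=2))  # next candidate after a second press
```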

    7. Re:The thing with ASCII by Tablizer · · Score: 3, Informative

      It'd also be hell to do code review on unicode programs. You can't tell many of the symbols apart. Is that a hyphen or a soft hyphen at the end of that line?

      If you want to test and/or frustrate a newbie, replace one of those in their program and see how long it takes them to fix it.

      The first time I ran into something like that it took me a good while. I ended up comparing hex dumps to find it. I should have just retyped the suspect code sections from scratch instead, but I was determined to get to the bottom of it and find out exactly why it crashed.

      It certainly turned me back into an ASCII fan.
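
      Instead of comparing hex dumps by eye, a short script can point at the offending character. This is a minimal Python sketch, not the method used in the post; the function name `find_non_ascii` and the sample source are made up for illustration.

```python
import unicodedata

def find_non_ascii(source):
    """Return (line, column, code point, name) for every non-ASCII character."""
    hits = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        for col, ch in enumerate(line, start=1):
            if ord(ch) > 0x7F:
                name = unicodedata.name(ch, "<unnamed>")
                hits.append((lineno, col, f"U+{ord(ch):04X}", name))
    return hits

# The "minus" in this sample is U+2212 MINUS SIGN, not the ASCII hyphen-minus.
sample = "total = a \u2212 b\nprint(total)\n"
for hit in find_non_ascii(sample):
    print(hit)
```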

    8. Re:The thing with ASCII by Anonymous Coward · · Score: 3, Funny

      I find the act of reading your descriptions laborious, and have decided to never bother learning Japanese just so I don't have to put up with that kind of thing EVER.

      If I could, I'd probably go about eliminating the whole language just as a gift to humanity.

      But I'm still working on my ultimate plan for destruction of English, and that has priority.

    9. Re:The thing with ASCII by znerk · · Score: 2, Informative

      Japanese characters are mostly sound-based rather than meaning-based, though a single Japanese character will generally map to two latin characters.

      I assume you're referring to the katakana, here... So, yes, using a phonetic set of approximately 50 characters, your writing will be sound-based.
      Unfortunately, you are also underinformed, as there are actually 3 character-based written languages in use in Japanese writing.
      Part of the problem, here, would be that the same (spoken) word can refer to many different concepts, and the (non-phonetic) written language reflects the meanings, rather than the pronunciation. For example:

      Some Japanese words are written with different kanji depending on the specific usage of the word—for instance, the word naosu (to fix, or to cure) is written as "" when it refers to curing a person, and "" when it refers to fixing an object.

      Bah, slashdot apparently doesn't like my attempt to use the characters. Whatever, the quoted text is from the linked article.

      --
      This work is licensed under a Creative Commons Attribution 3.0 Unported License.
    10. Re:The thing with ASCII by Z34107 · · Score: 3, Insightful

      I find the act of reading your descriptions laborious, and have decided to never bother learning Japanese just so I don't have to put up with that kind of thing EVER.

      "That kind of thing" is quite literally hitting the "space" key between words. I'm surprised you managed to put up with it long enough to finish your post.

      --
      DATABASE WOW WOW
    11. Re:The thing with ASCII by angus77 · · Score: 2, Interesting

      Japanese is typed using a more-or-less standard QWERTY keyboard.

      Tediously.

      Not in the least. I do it every day at work. It takes little more effort than writing in English. Unless, of course, your Japanese reading skills are not up to the job---but that won't be the fault of the keyboard.

      Please let me emphasize that typing with a QWERTY keyboard is the standard way of typing in Japan. In fact, despite the existence of other methods, I don't know a single person who actually uses those methods.

    12. Re:The thing with ASCII by Angst+Badger · · Score: 5, Insightful

      Funny you mention it, but the first thing I thought of was Japanese text entry, followed by the autocorrect/text-expansion facility that most word processors have, which is much the same thing applied to western languages. I've also thought it would be good to be able to make use of mathematical symbols for, you know, mathematics. The same could be said of word processor-like formatting for comments. I'm dubious about using it for actual code, but I'm open to having my mind changed about that.

      (Color-as-syntax has already been done in Chuck Moore's latest implementation of Forth. It's not a bad idea, though I suspect it works better with low-level languages like Forth than it would with a higher level language.)

      The second thing I thought of was what I always think when someone starts complaining about what languages should and shouldn't have, which is this: Quit bitching and go implement it, smart boy. Come up with something good, and I'll use it, but I am not about to run out and implement someone else's ideas. I have a day job where I get to do that all fucking day long, and they actually pay me. And contrary to popular belief, ideas are cheap and plentiful, including good ideas. The time, effort, and dedication that it takes to actually implement them are what's in short supply.

      --
      Proud member of the Weirdo-American community.
    13. Re:The thing with ASCII by Z34107 · · Score: 5, Funny

      Typing Japanese is exactly like typing in English - you press the "space" key between words. The IMEs are pretty smart, and usually the first kanji is the one you want. If it's not you might have to press "space" a second or third time, but it's rare to have to dig through a giant list of kanji to get what you want.

      So, you might have to hit the space key more often if you're typing Japanese. Or, you might not - you can space-to-kanji entire sentences at once, whilst the romance languages are stuck hitting space between every word like shmucks. Except for the Germans. I don't think their language uses spaces.

      The Japanese keyboard layout also produces kana (most of which are romanized with two latin characters) rather than individual letters. Instead of typing w-a-t-a-s-h-i-space, you type wa-ta-shi-space.

      So, it's really not that bad. What's worse is the irony of seeing an article on slashdot complain about the persistence of ASCII. I mean, really now, slashdot.jp manages to display non-ASCII characters.

      --
      DATABASE WOW WOW
    14. Re:The thing with ASCII by rainer_d · · Score: 4, Funny

      Typing Japanese is exactly like typing in English - you press the "space" key between words. The IMEs are pretty smart, and usually the first kanji is the one you want. If it's not you might have to press "space" a second or third time, but it's rare to have to dig through a giant list of kanji to get what you want.

      So, you might have to hit the space key more often if you're typing Japanese. Or, you might not - you can space-to-kanji entire sentences at once, whilst the romance languages are stuck hitting space between every word like shmucks. Except for the Germans. I don't think their language uses spaces.

      NatürlichhabenwirLeerzeichen!

      --
      Windows 2000 - from the guys who brought us edlin
    15. Re:The thing with ASCII by angus77 · · Score: 2, Informative

      The kind of opinion you'd come to from checking Wikipedia rather than actually using it.

      Millions upon millions of Japanese (and some non-Japanese, like myself) have found the IMEs to be more than satisfactorily efficient and easy to use. Not only that, but they sometimes have predictive input as well (especially on cell phones), which makes typing in Japanese even faster and easier.

    16. Re:The thing with ASCII by BrokenHalo · · Score: 2, Interesting

      Seriously. This programmer (I use the term loosely) has problems with expression? If this is the case, he needs to go back to school and try learning assembly or fortran programming. Any program worth writing can be coded in fortran, and if it can't be coded in assembler, then it can't be done at all.

      If he really wants to go into creative writing, we might remind him that the 26 letters of the alphabet were good enough for Shakespeare.

    17. Re:The thing with ASCII by Dahamma · · Score: 2, Insightful

      This proposal isn't about giving programmers more power to code, it's about making it easier for non-english speakers who aren't coders to read the code that their programmers write.

      No, actually, it's not. Java already allows Unicode variable and function names. This is about using Unicode in basic syntax of the language, which is IMO idiotic if you ever want your language to be adopted. I mean, he says it himself in the last paragraph - he didn't use any Unicode in his article because he was using vi, which makes it difficult - not to mention even if it was doable, it would be tedious as hell with a standard keyboard.

    18. Re:The thing with ASCII by angus77 · · Score: 3, Insightful

      And we only use the Roman alphabet for English because it was a widespread standard, even though we already had a functioning writing system that suited Englisc better and had worked for us for centuries (runes). We mangle the system with digraphs and multiple sounds for many of the characters (especially the vowels). It's a hack. We've made do.

    19. Re:The thing with ASCII by robbak · · Score: 2, Interesting

      I wonder if those with a non-alphabetic language, like the various Chineses or Japanese, would have chosen a keyboard at all? It seems to me that the keyboard is really designed around a language that uses a limited number of glyphs. Even the addition of dïaçrìtîçs is really a hack on the keyboard.

      --
      Prediction for end of Universe #42: Fencepost error in Quantum_bogosort.cpp
    20. Re:The thing with ASCII by Jurily · · Score: 5, Insightful

      If he really wants to go into creative writing, we might remind him that the 26 letters of the alphabet were good enough for Shakespeare.

      Exactly. Completely Missing The Point at its best.

      1. The idea behind modern programming is reducing complexity. That can't really be done by using symbols no other programmer has ever seen before.
      2. Most programming fonts go out of their way to make those symbols look distinct. You simply have to know if that's a zero or an upper-case O. Imagine trying to figure out if that there is a Greek upper-case Omega or a "Dentistry symbol light down and horizontal with wave" (taken from TFA).
      3. APL died for a reason.
      4. Author cites C++ operator overloading as a good thing. 'Nuff said.

    21. Re:The thing with ASCII by spongman · · Score: 2, Insightful

      Typing Japanese is exactly like typing in English

      hardly. when you type in english, you think of the word and you type in the letters of that word.

      when you type in japanese, you think of the word, then you have to translate it at least once (maybe twice) in your head before you have a list of roman letters to type. Then you have to assist the computer in guessing the reverse of the translations you just did. certainly, much of this is simple for the typist, and for the computer, but it's fundamentally different from typing a roman language.

    22. Re:The thing with ASCII by angus77 · · Score: 2, Informative

      500 kanji? Surely you're trolling? Elementary school children know more than that before they even get to Junior High. I know *I* can write more than that, and I've never even taken a calligraphy class. You couldn't read a (post-adolescent) *comic book* with only 1000 kanji, let alone a newspaper. And I know this because I read the newspaper every day (the Shizuoka Shinbun), not because I heard it from some asshat in an internet forum.

      I don't know the kanji for "bara", but I've definitely seen "kani" any number of times---not in texts, but definitely on signs and labels.

      "Arigatou" is certainly not something you'd see in kanji in texts, but I've been mailed with the kanji any number of times (and you'll certainly see it in the form "arigatai"). I doubt there's a junior high school graduate in this country who doesn't know the kanji for that.

    23. Re:The thing with ASCII by HonIsCool · · Score: 2, Insightful

      When I think of, por ejemplo, the word pronounced as 'hait' [*], I don't have to "translate" that at all. No, sir! Just type it straight in, exactly as it is pronounced: "height" of course! =)

      [*] IPA doesn't work on /.

      --
      "Give me six lines of C++ code written by the most competent programmer, and I will find enough in there to hang him."
    24. Re:The thing with ASCII by Bodrius · · Score: 2, Funny

      Wasn't there already a seminal paper on this topic?
      http://public.research.att.com/~bs/whitespace98.pdf

      --
      Freedom is the freedom to say 2+2=4, everything else follows...
    25. Re:The thing with ASCII by Chrisq · · Score: 3, Informative

      You know, this was tried. It was called APL. It sucked, and I mean, like the environment outside the ISS.

      I thought it sucked. You thought it sucked. A load of guys from the maths department that wanted to do quick mathematical computations loved it. APL was not meaningless symbols to everyone.

    26. Re:The thing with ASCII by Chrisq · · Score: 3, Interesting

      Plus the fact that a spoken language changes - good chance you would not be able to understand English as it was spoken say 500 years ago. They would not only have used different words, also used a different pronunciation.

      That depends on what accents you are used to. Many Northern British and lowland Scots accents were not changed by the "Great Vowel Shift" nearly as much as Southern English, Received Pronunciation, or General American.

      Being your slave, what should I do but tend
      Upon the hours and times of your desire?

      Will have an immediately obvious meaning when read in a lowland Scots accent

    27. Re:The thing with ASCII by TheRaven64 · · Score: 4, Interesting

      Apple's documentation in HTML form has a few of the standard ASCII characters replaced with other unicode characters. If you copy and paste into a text editor, you get compiler warnings which seem to be saying that they're expecting the character that is there. They also sometimes contain ligatures, which you don't notice unless you look one character at a time. One of the most irritating problems I found was on the Nouveau wiki a load of constants have 0x prefixes where the x is actually a unicode multiplication symbol. Copy them into the code and it looks right, but the compiler rejects it as an invalid constant type.
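
      The two failure modes described here are easy to reproduce. The following Python sketch is an illustration under assumptions (the constant `0x1F` and the word "file" are invented examples, and Python stands in for the C compiler in the post): a hex literal whose "x" is really U+00D7 is rejected, and a ligature pasted from a web page can be expanded with compatibility normalization.

```python
import unicodedata

good = "0x1F"
bad = "0\u00d71F"  # looks similar in many fonts, but the "x" is U+00D7 MULTIPLICATION SIGN

print(int(good, 16))  # parses as a hexadecimal constant
try:
    int(bad, 16)
except ValueError as err:
    print("rejected:", err)

# NFKC normalization expands compatibility characters such as the
# U+FB01 "fi" ligature into plain letters.
pasted = "\ufb01le"
print(len(pasted), unicodedata.normalize("NFKC", pasted))
```

      Note that NFKC does not turn the multiplication sign into a letter "x"; that kind of lookalike still has to be found by inspecting code points.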

      --
      I am TheRaven on Soylent News
    28. Re:The thing with ASCII by NickFortune · · Score: 2, Insightful

      I thought it sucked. You thought it sucked. A load of guys from the maths department that wanted to do quick mathematical computations loved it. APL was not meaningless symbols to everyone.

      Right. It's a niche language, very useful for a fairly narrow subset of programmers, but something of an impediment for the rest of us.

      The point is that using an expanded set of glyphs didn't, of itself, make a language that was widely useful, let alone better. At the same time, it brought considerable drawbacks, many of which have already been mentioned in this thread.

      Of course, that doesn't mean you couldn't leverage unicode to create a more expressive syntax. But TFA doesn't really have any ideas on how this is to be done apart from "obviously, more glyphs would be better", which I think APL disproves, at least in the general case.

      --
      Don't let THEM immanentize the Eschaton!
    29. Re:The thing with ASCII by Anonymous Coward · · Score: 2, Insightful

      Color-as-syntax has already been done [colorforth.com] in Chuck Moore's latest implementation of Forth. It's not a bad idea,

      As a color-blind person, I'd like to say... yes. Yes, it is.

    30. Re:The thing with ASCII by imakemusic · · Score: 2, Informative

      They have a point though. Presumably, if typing something up you would have to look back and forth between the source text and the screen as opposed to English where you can stare at the source text and be sure that when you press the "a" key you get an "a".

      --
      Brain surgery - it's not rocket science!
    31. Re:The thing with ASCII by fyngyrz · · Score: 5, Insightful

      As a martial artist of many decades, I have learned to read Chinese. Both traditional characters and the nasty simplified ones. So I'm well aware of the upside - the power, and even beauty, of high-speed recognition from a large symbol set.

      But writing Chinese through a keyboard or a GUI has many cautionary lessons for us here that transfer directly to the idea of a many-symbol programming language. Take Python, for instance. A beautiful language in almost every way; visually well structured, minimalist in its core tools, yet so well thought out that it is almost unlimited in what can be done with it.

      If you were, say, to create a symbol for each Python grammar atom, you'd soon have a symbol set equal to or surpassing that required for college in China... thousands of them. This takes your average Chinese person many years to learn, by the way -- and it's non-technical.

      Now, assuming you've learned these in the first place, and stipulating that somehow, you've made them as beautiful and intuitive as the language itself, how do you select these symbols when programming? Therein lies the rub, and as no one yet has come up with a good answer for Chinese, I suspect the idea desert is just as dry for Python, or any other language one might like to turn into a concise symbolic tool.

      Now, speech has very fast mapping (although you get into context a lot... for instance "ma" can mean quite a few different things) to Chinese symbols, and so one could reasonably assume that it could also have reasonably fast mapping to my hypothetical Python symbols, but speech recognition isn't ready for this yet; and a programmer speaking "Pythonese" into a microphone isn't going to be a very good cube-mate, either.

      In the meantime, I'm quite convinced that ASCII is an excellent character set for programming, and that UNICODE belongs inside quotes for use in input and output parsing, no more, no less.

      APL suffered from all of this. You needed a special keyboard, or a GUI or other mechanism to input the "simple" symbol. You had to learn the symbolic mapping. It really represents a huge extra load in aim of simplification. All of which is completely unnecessary if you simply use ASCII. And frankly... the time it takes me to type sin(x) is going to beat your mapped keyboard input time until you've been doing it for 50 years. In which time I will have leveraged my ASCII toolkit into innumerable languages, and your APL toolkit is still only enabling you to work in APL.

      So like I said... ASCII.

      --
      I've fallen off your lawn, and I can't get up.
    32. Re:The thing with ASCII by Kagetsuki · · Score: 2

      Just to clarify again I wasn't trying to support the ideas in the article, just pointing out how Japanese was entered. But for the complex mathematical symbols I very much agree, the fact I can enter in the name of a symbol and actually get that symbol as a character is great. That does not however mean we should replace "!=" with "≠", I'd quickly get sick of having to constantly activate the IME just to code.

    33. Re:The thing with ASCII by AmiMoJo · · Score: 2, Informative

      My experience is with Japanese but they share the Chinese writing system (as well as their own).

      While there are a large number of symbols most of them are made up of two or more other, simpler symbols. If you find a symbol you don't know you can often guess the general meaning just from the simpler ones it is made up from.

      That is not totally unlike how words in English work. Often they are made up of smaller parts or derived from other words.

      To bring this back to programming I'm not sure there is much to be gained by extending the available symbols. I don't feel any great desire to type the greater-than-or-equal-to symbol instead of >=.

      --
      const int one = 65536; (Silvermoon, Texture.cs)
      SJW, n: "Someone I don't like, and by the way I'm a fuckwit" - AC
    34. Re:The thing with ASCII by Luyseyal · · Score: 2, Informative

      I've also thought it would be good to be able to make use of mathematical symbols for, you know, mathematics. The same could be said of word processor-like formatting for comments. I'm dubious about using it for actual code, but I'm open to having my mind changed about that.

      Yeah, I like the idea of TeX-style typing that autoparses to a "nice" display. You can edit the display or drop to TeX (or Maple or whatever) input if you need more specificity.

      I'm not sure the benefit conveyed is sufficient to overcome the awkwardness (if you've ever used a Maple worksheet for programming, you'll understand what I mean), but I would like to see an editor take advantage of the beauty, even if the code itself is ASCII.
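
      One way to picture "pretty display, ASCII underneath" is a reversible mapping that an editor applies only when drawing the text. The Python sketch below is a deliberately naive illustration: the `DISPLAY` table and both function names are assumptions, and plain string replacement would also rewrite operators inside string literals and comments, which a real editor would have to avoid by tokenizing first.

```python
# ASCII operator -> display glyph (illustrative table, written as escapes).
DISPLAY = {"!=": "\u2260", ">=": "\u2265", "<=": "\u2264"}

def to_display(ascii_source):
    """Swap ASCII operators for Unicode glyphs, for rendering only."""
    for op, glyph in DISPLAY.items():
        ascii_source = ascii_source.replace(op, glyph)
    return ascii_source

def to_ascii(displayed):
    """Map the glyphs back, so the file on disk stays plain ASCII."""
    for op, glyph in DISPLAY.items():
        displayed = displayed.replace(glyph, op)
    return displayed

line = "if a != b and c >= d:"
shown = to_display(line)
print(shown)
print(to_ascii(shown) == line)
```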

      -l

      --
      Help cure AIDS, cancer, and more. Donate your unused computer time to worldcommunitygrid.org. Join Team Slashdot!
  2. Don't we all know... by AsmCoder8088 · · Score: 2, Insightful

    "Syntactic sugar causes cancer of the semicolon" - Alan Perlis.

  3. Project Gutenberg by symbolset · · Score: 5, Insightful

    Michael decided to use this huge amount of computer time to search the public domain books that were stored in our libraries, and to digitize these books. He also decided to store the electronic texts (eTexts) in the simplest way, using the plain text format called Plain Vanilla ASCII, so they can be read easily by any machine, operating system or software.

    - Marie Lebert

    Since its humble beginnings in 1971 Project Gutenberg has reproduced and distributed thousands of works to millions of people in - ultimately - billions of copies. They support ePub now and simple HTML, as well as robo-read audio files, but the one format that has been stable this whole time has been ASCII. It's also the format that is likely to survive the longest without change. Project Gutenberg texts can now be read on every e-reader, smartphone, tablet and PC.

    If you want to use Rich Text format, or XML, or PostScript or something else then fine - please do. But don't go trying to deprecate ASCII.

    --
    Help stamp out iliturcy.
    1. Re:Project Gutenberg by shutdown+-p+now · · Score: 5, Insightful

      If you want to use Rich Text format, or XML, or PostScript or something else then fine - please do. But don't go trying to deprecate ASCII.

      This is a false dichotomy. Plain text can be non-ASCII, and ASCII doesn't necessarily imply plain text. All the formats you've listed allow you to add either visual or semantic markup to text, whereas ASCII is simply a way to encode individual characters from a certain specific set. They do not propose to move to rich text for coding, but to move away from ASCII.

      There are still many reasonable arguments against it, but this isn't one of them.

    2. Re:Project Gutenberg by Netbrian · · Score: 5, Informative

      This is untrue.

      First off, Simplified and Traditional characters are separated in Unicode.

      Second off, Cyrillic characters and Latin characters have always been considered two different scripts, while Chinese logographs are considered to be the same script, used in different contexts.

      See http://unicode.org/notes/tn26/.

      In any event, it would make good sense for programming environments to be able to handle Unicode source.

    3. Re:Project Gutenberg by pz · · Score: 5, Insightful

      When I was a young graduate student building my first experimental setup, a professor who was older and wiser than me suggested that data should be saved in ASCII whenever possible because space was relatively inexpensive and time is always scarce. Although I thought that a bit odd, I did follow his advice.

      The result? I can use almost any editor to read my data files from the very start of my career, closing in on 30 years ago. Just this past week, that was an important factor in salvaging some recently-collected data. In contrast, I can't always read the MS Word files -- an example of an extended character set -- from even a few years ago, and I sure as hell can't view them in almost any editor. Sure, with enough time, I could figure out how to read them, but, as the wise professor rightly pointed out, time is scarce.

      Thus, compatibility is important, and the most compatible data and document format is human-readable plain ASCII.

      --

      Put my fist through my alarm clock with its ding-dong death inside my ear. - The Blackjacks.
    4. Re:Project Gutenberg by shutdown+-p+now · · Score: 2, Insightful

      Good point. Maybe one day Unicode will win out

      It's not a question anymore. Unicode has already won. The sheer amount of other specifications and standards that reference various versions of Unicode spec is such that it's going to stick around for decades to come.

      Yes, we still don't have 100% support in software (but we do have 99%). Time will fix that.

      or perhaps EBCDIC will have a resurgence. 'twixt now and then it's best to write the text in ascii, perhaps with a well-documented human-readable escape table for symbols that aren't represented - perhaps even a complete Unicode escape table current to the document.

      For programming languages, we already have that - \u1234 or \U12345678 are used as escape sequences in C++, Java and C# for just this purpose. There's nothing stopping an IDE from rendering them as if they were actual symbols and not escape sequences, too, though I haven't seen that in practice.
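
      Python is not one of the languages named above, but its string literals accept the same `\u` and `\U` forms, so it serves for a quick sketch of the idea (the sample strings are made up). The last line goes the other way and turns non-ASCII text into a 7-bit-safe escaped form.

```python
# Both literals denote the same one-character string.
escaped = "\u03a9"
literal = "Ω"
print(escaped == literal)

# \U takes eight hex digits and reaches code points beyond U+FFFF.
wide = "\U0001F600"
print(len(wide), hex(ord(wide)))

# Escaping non-ASCII text so the result fits in plain ASCII.
ascii_safe = "a \u2260 b".encode("unicode_escape").decode("ascii")
print(ascii_safe)
```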

      But this is purely an encoding issue, not a character set issue, which is what TFA is about. They are asking why we still design languages with syntax that is restricted to characters only present in the ASCII character set, even though Unicode has many handy symbols that can represent the same things better and/or shorter. Quote:

      Unicode has the entire gamut of Greek letters, mathematical and technical symbols, brackets, brockets, sprockets, and weird and wonderful glyphs such as "Dentistry symbol light down and horizontal with wave" (0x23c7). Why do we still have to name variables OmegaZero when our computers now know how to render 0x03a9+0x2080 properly?
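
      For readers who want to see what the code points in this quote refer to, a small Python sketch (Python is an assumption here; the article names no language) builds the string from the numbers and prints each character's official name:

```python
import unicodedata

# 0x03a9 + 0x2080 from the quote: capital omega followed by subscript zero.
omega_zero = chr(0x03A9) + chr(0x2080)
print(omega_zero)

# Print the code point and Unicode name of each character mentioned.
for ch in omega_zero + chr(0x23C7):
    print(f"U+{ord(ch):04X}", unicodedata.name(ch))
```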

      For a good example of what is possible there, have a look at the Fortress [PDF] programming language, which uses various traditional math symbols heavily.

      Unicode? How many revisions will Unicode see between now and then? Thousands?

      Unicode has been there for 18 years now (the second volume of Unicode 1.0 spec was published in 1992), and we've seen 5 revisions, so the rate is roughly 1 per 3.5 years. Assuming it stays the same, we're looking at Unicode 35.0 by 2100. But it won't, because in practice it will slow down eventually as we add most (and, eventually, all) scripts that we know and care about. In fact, if you look at the recent additions to the standard, they do not affect the vast majority of texts ever created in any way.

      On the other hand, it doesn't really matter in the slightest, since Unicode versions are all backwards-compatible (characters get added, but never removed or moved around). Assuming that trait persists, they'll just use the most recent version of the spec available to them.

      But then why would things be any different for ASCII-encoded text with escapes for Unicode characters? You'd still need a Unicode character table to make sense of those escapes.

      It would seem that you're arguing that any character set other than basic Latin is not future-proof. This implies that any text written in any language other than English is also not future-proof. I think this assertion is rather Anglo-centric, and not very realistic.

    5. Re:Project Gutenberg by the_womble · · Score: 3, Insightful

      The article is talking about using unicode, not a proprietary format. Do you think it likely that future text editors will be able to handle ASCII but not UTF-8?

  4. huh by stoolpigeon · · Score: 3, Insightful

    so we should start coding in Chinese?

    Seems easier to spell words with a small set of symbols than to learn a new symbol for every item in a huge set of terms.

    --
    It's hard to believe that's how Micronians are made. Why don't we see it right now by having you both kiss one another?
    1. Re:huh by MightyYar · · Score: 4, Insightful

      so we should start coding in Chinese?

      Exactly! Keep the "alphabet" small, but the possible combination of "words" infinite.

      You don't need a glyph for "=>" for instance. Anyone who knows what = and > mean individually can discern the meaning.

      And further (I know, why RTFA?):

      But programs are still decisively vertical, to the point of being horizontally challenged. Why can't we pull minor scopes and subroutines out in that right-hand space and thus make them supportive to the understanding of the main body of code?

      This is easily done with a split screen, and sounds like an editor feature to me. Not sure why you'd want a programming language that was tied to monitor size and aspect ratio.

      Why not make color part of the syntax? Why not tell the compiler about protected code regions by putting them on a framed light gray background? Or provide hints about likely and unlikely code paths with a green or red background tint?

      Again, if you want this, do it in the editor. Doesn't he know anyone who is colorblind? And even a normally sighted user can only differentiate so many color choices, which would limit the language. And forget looking up things on Google: "Meaning of green highlighted code"... no wait "Meaning of hunter-green highlighted code" hmmmm... "Meaning of light-green highlighted code"... you get the idea.

      --
      W..w..W - Willy Waterloo washes Warren Wiggins who is washing Waldo Woo.
    2. Re:huh by jonbryce · · Score: 2, Interesting

      No, but I think the idea of being able to draw flowcharts on the screen and attach code to each of the boxes could be an idea that has mileage.

    3. Re:huh by CensorshipDonkey · · Score: 4, Interesting

      Have you ever used a visual diagrammatic code language before, such as LabView? Every scientist I've ever met that had any experience writing code vastly prefers the C based LabWindows to the diagrammatic LabView - diagrammatic is simply a fucking pain in the ass. Reading someone else's program is an exercise in pain, and they are impossible to debug. Black and white, unambiguous plain text coding may not be pretty to look at but it is damn functional. Coding requires expressing yourself in an explicitly clear fashion, and that's what the current languages offer.

    4. Re:huh by ScrewMaster · · Score: 5, Insightful

      diagrammatic is simply a fucking pain in the ass.

      Amen.

      Every scientist I've ever met that had any experience writing code vastly prefers the C based LabWindows to the diagrammatic LabView

      Well, I'm not a scientist, just a humble software engineer, and back in my contract coding days I was always faced by managers that would try to push me to use LabView. They had this mistaken belief that because it was "visual" a. they could understand it, b. it was simpler, and c. I should charge less if I used it.

      I told them that a. it's still programming, and beyond a certain level of complexity understanding still requires sufficient knowledge and b. refer to a. and c. if they were going to force me to waste time fighting such an environment up 'til the point where I found something critical that it couldn't do (such as run fast enough) and would end up re-coding the right way anyway, they damn well weren't going to pay me less.

      --
      The higher the technology, the sharper that two-edged sword.
    5. Re:huh by mr_mischief · · Score: 3, Insightful

      Let someone who reads and writes Chinese develop a programming language with Chinese keywords and syntax, then. Programming in English-like languages has largely been a waste of time, remember. English keywords are great, but using English syntax for a programming language is a nightmare. Everyone uses a syntax that's simpler than English. Even Perl's grammar is simpler than English, and that grammar is massive compared to most programming languages.

    6. Re:huh by tftp · · Score: 4, Insightful

      it might be fully reasonable to have classes related to financial years (finansår), close of year (årsavslutning), the tax report (årsoppgave) and so on.

      And one day the code is sold to China or India, and then people there can't even find a way to enter the glyph. Same if a visiting programmer has to work on the code, or if you need to send a class to another country for some reason.

      How far would Linux have gotten if Linus had decided to use Finnish (or Swedish) words written with all the proper UNICODE characters for all the variables and types?

    7. Re:huh by bh_doc · · Score: 3, Insightful

      As a scientist who has a fair bit of coding experience, including LabVIEW, ++ this.

      What particularly annoys me about visual code like LabVIEW is that you can't diff. So change tracking is a pain in the arse, and forget distributed development.
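      For contrast, this is roughly all it takes to diff textual source. A minimal Python sketch using the standard difflib module; the two "versions" of the code are made up for illustration.

```python
import difflib

# Two hypothetical revisions of a small text-based program.
old = ["x = read_sensor()", "y = x * 2", "log(y)"]
new = ["x = read_sensor()", "y = x * 2.5", "log(y)"]

# A unified diff pinpoints the changed line; nothing comparable
# falls out of a block diagram stored as a binary file.
diff = list(difflib.unified_diff(old, new, fromfile="v1", tofile="v2", lineterm=""))
for line in diff:
    print(line)
```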

      LabVIEW itself is good for setting up a quick UI and connecting things to it, but any serious processing? ...No, thanks. If I could get my hands on something else that had the UI prototyping ease, connectivity to experimental devices (motion controllers, for example), but based on a textual language, I'd be a happy camper. (There are some things that come close, I'm sure, though I've not had the time to properly search. Busy scientist is busy...)

  5. Learn2code by santax · · Score: 4, Insightful

    I can express my intentions just fine with ASCII. They have cunningly invented a system for that. It's called language and it comes in very handy. The only thing I would consider missing is a pile of shit-character. I could use that one right now.

    1. Re:Learn2code by MightyYar · · Score: 3, Funny

      You mean "@"? Looks like a pile of shit to me.

      --
      W..w..W - Willy Waterloo washes Warren Wiggins who is washing Waldo Woo.
    2. Re:Learn2code by santax · · Score: 2, Informative

      Oh crap... I guess you can forget about my earlier comment. I'm adopting unicode as we speak! U+1F4A9 ftw.

    3. Re:Learn2code by Noughmad · · Score: 5, Funny

      I don't know about you, but I have a pile-of-shit key on my keyboard, right between the left Ctrl and Alt.

      --
      PlusFive Slashdot reader for Android. Can post comments.
    4. Re:Learn2code by isorox · · Score: 2, Interesting

      I don't know about you, but I have a pile-of-shit key on my keyboard, right between the left Ctrl and Alt.

      It's a very useful "meta" key. Aside from controlling my music from amarok, I have a variety of mappings set up, Meta-s shades the window I'm using, Meta-R pops up a run dialog, Meta-CapsLock pops up an rxvt terminal window, Meta-F4 runs xrandr --auto and reconfigures when I plug in an external monitor.

      (Capslock itself is mapped to Escape, which I find a lot easier on the wrists on my laptop than using the real escape key -- I rebound it about 5 years ago when my escape key broke and haven't looked back)

  6. Yes, Unicode is "the new black" by Antique+Geekmeister · · Score: 2, Informative

    Yes, it's the next fad that just _everyone_ has to wear this season. Within 5 years, it will be something else, and given the ability of major vendors like Microsoft to get Unicode _wrong_, it's not stable for mission critical applications. If you want your code to remain parseable and cross-platform compatible and stable in both large and small tools, write it in flat, 7-bit ASCII. You also get a significant performance benefit from avoiding the testing and decoding and localization and most especially the _testing_ costs for multiple regions.

    Look up "microsoft unicode error" on Google for hundreds if not thousands of examples. ASCII for code is like flat text for email. It assures that you're not simply publishing coding spam, and actually wrote what you meant.

    1. Re:Yes, Unicode is "the new black" by shutdown+-p+now · · Score: 2, Insightful

      Yes, it's the next fad that just _everyone_ has to wear this season. Within 5 years, it will be something else

      Unicode has been around for, what, over 15 years now? It's part of countless specifications from W3C and ISO. All modern OSes and DEs (Windows, OS X, KDE, Gnome) use one or another encoding of Unicode as the default representation for strings. No, it's not going away anytime soon.

      If you want your code to remain parseable and cross-platform compatible and stable in both large and small tools, write it in flat, 7-bit ASCII.

      This may be a piece of good advice. Even for languages where Unicode in the source is officially allowed by the spec (e.g. Java or C#), many third-party tools are broken in that regard.

      You also get a significant performance benefit from avoiding the testing and decoding and localization and most especially the _testing_ costs for multiple regions.

      I don't see how this has any relevance to your previous point (writing the source code in ASCII). If your app source is in Unicode, it will still compile (or not compile) the same on any locale. And what would you be testing? The compiler?

      I've no idea what "decoding and localization" means in this context, either.

      Well, unless you're also advocating for the use of ASCII as the default runtime string encoding in apps, and completely forgoing localization. Which is fine if you only intend your app to be used in the USA, I guess (and even then, considering take-up of Spanish, it may not be such a wise idea).

    2. Re:Yes, Unicode is "the new black" by Anonymous Coward · · Score: 2, Insightful

      "Yes, it's the next fad that just _everyone_ has to wear. this season."

      Like the Metric System.

    3. Re:Yes, Unicode is "the new black" by scdeimos · · Score: 3, Insightful

      Unicode has been around for, what, over 15 years now? It's part of countless specifications from W3C and ISO. All modern OSes and DEs (Windows, OS X, KDE, Gnome) use one or another encoding of Unicode as the default representation for strings. No, it's not going away anytime soon.

      And yet major vendors like Microsoft still get Unicode wrong. A couple of examples:

      • Windows Find/Search cannot find matches in Unicode text files, surely one of the simplest file formats of all, even though the command line FIND tool can (unless you install/enable Windows Indexing Service which then cripples the system with its stupid default indexing policies). This has been broken since Windows NT 4.0.
      • Microsoft Excel cannot open Unicode CSV and tab-delimited files automatically (i.e.: by drag-and-drop or double-click from Explorer) - you have to go through Excel's File/Open menu and go through the stupid import wizard.
      • Abuse of Unicode code points by various Office apps, causing interoperability issues even amongst themselves.
    4. Re:Yes, Unicode is "the new black" by shutdown+-p+now · · Score: 2, Interesting

      From your reference to Latin-1, I suspect you're from Western Europe, then. If so, then you guys didn't have it all that bad - most non-Unicode-aware apps are not truly ASCII (since we don't have 7-bit bytes around), and so the default encoding more often than not is Latin-1. Even if Americans mostly use it for "funny chars" like special quotation marks etc, you end up with a bunch of useful symbols as well. And your text doesn't end up all garbled.

      For folks from Eastern European countries, especially those with non-Latin-based alphabets - like mine - it's a rather different story. Extrapolating that, it must really suck for people with more "exotic" requirements, like Arabic or Chinese...
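      A minimal Python sketch of what that garbling looks like, assuming Russian text saved in the Windows-1251 code page and opened by an app that assumes Latin-1 (the sample word is just an illustration):

```python
original = "Привет"  # Cyrillic sample text

# Stored on disk as single-byte Windows-1251...
raw = original.encode("cp1251")

# ...then read by software that assumes every byte is Latin-1.
garbled = raw.decode("latin-1")
print(garbled)

# No bytes were lost, so reversing the wrong assumption recovers the text.
recovered = garbled.encode("latin-1").decode("cp1251")
print(recovered == original)
```

      Western European text mostly survives that wrong assumption; anything in another script comes out as accented-letter soup.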

    5. Re:Yes, Unicode is "the new black" by icebraining · · Score: 5, Insightful

      I'm Portuguese and our language uses accents, but if I ever get a source code file with accents in variable names I'll insult the person. Writing with accents in programming serves absolutely no purpose and it only causes problems. It's slower (two key presses instead of one), it's less compatible, it can be troublesome if I need to send the code to someone without accents in the keyboard, etc.

      In fact, not only do I disagree with accents in programming, but I prefer writing all the names in English. Where would OSS be if all the Gnome devs had to learn Spanish to contribute to De Icaza's code, or Finnish to contribute to Linux?

  7. We've tried this before by FeatherBoa · · Score: 4, Informative

    Everyone who tried to do something useful in APL, put up your hand.

    1. Re:We've tried this before by SimonInOz · · Score: 4, Interesting

      Incredibly, I worked for a major investment company who had, indeed, done something useful in APL. In fact they had written their entire set of analysis routines in it, and deeply interwoven it with SQL. I had to untangle it all. (Would you believe they had 6 page SQL stored procedures? No, nor did I - but they did).
      APL is great sometimes - especially if you happen to be a maths whizz and good at weird scripts. Not exactly easy to debug, though. Sort of a write-only language.

      For the last ten plus years, we have been steadily moving in the direction of more human readable data - the move to XML was supposed to be a huge improvement. It meant you could - sort of - read what was going on at every level. It also meant we had a common interchange between multiple platforms.

      So you want to chuck all that away to get better symbols for programming? No, I don't think so.
      I must point out that the entire canon of English Literature is written in - surprise - English, and that's definitely ASCII text. I don't think it has suffered due to lack of expressive capability.

      What does surprise me, though, is how fundamentally weak our editors are. Programs, to me, are a collection of parts - objects, methods, etc, all with internal structure. We seem very poor at further abstracting that - why, oh tell me why, when I write a simple - trivial - bit of Java code, do I need to write functions for getters and setters all over the place - dammit, just declare them as gettable and settable - or (to keep full source code compatibility) the editor could do it. Simply, easily, transparently. And why can't the editor hide everything except what I am concerned with?
      Microsoft does a better job of this in C#, but we could go much, much further. We seem stuck in the third generation language paradigm.

      --
      "Cats like plain crisps"
    2. Re:We've tried this before by 644bd346996 · · Score: 2, Insightful

      Try reading an EULA and then come back and tell me that English is sufficiently expressive as-is.

    3. Re:We've tried this before by cgenman · · Score: 2, Insightful

      Things I would love to see standard in all new editors:

      1. Little triangles that hide blocks of code unless you explicitly open and investigate them.
      2. Dynamic error detection. Give me a little underline when I write out a variable that hasn't been defined yet. Give a soft red background to lines of code that wouldn't compile. That sort of thing.
      3. While we're at it, "warning" colors. When "=" is used in a conditional, for example, that's an unusual situation that should be underlined in Yellow.
      4. Hard auto-indent. It may be two spaces in the source code, but accidentally copying the indentation, and putting it in the wrong places, etc, should just be taken care of. That shouldn't even be an issue any more.
      5. Code-hint hover. When you hover over a function name, bring up a window with the first few lines of that function. Maybe open it in a "related code" pane?
      6. Right-click to jump to anything. Right-click a variable to jump to the declaration, or goto other places it is used. Right-click a class name to bring up that class definition.
      7. Start typing out a function, and get a menu of variable-specific functions that can be called. Flash actually does this surprisingly well, or did before CS5.

    4. Re:We've tried this before by Yetihehe · · Score: 2, Interesting

      1. Little triangles that hide blocks of code unless you explicitly open and investigate them.

      Netbeans. (view > code folds > collapse all)

      2. Dynamic error detection. Give me a little underline when I write out a variable that hasn't been defined yet. Give a soft red background to lines of code that wouldn't compile. That sort of thing.

      Netbeans.

      3. While we're at it, "warning" colors. When "=" is used in a conditional, for example, that's an unusual situation that should be underlined in Yellow.

      Netbeans, though not as a background: it gives you little yellow icons on the left side of the code and yellow lines near the scrollbar (to track errors in the whole document).

      4. Hard auto-indent. It may be two spaces in the source code, but accidentally copying the indentation, and putting it in the wrong places, etc, should just be taken care of. That shouldn't even be an issue any more.

      Netbeans. (ctrl+shift+v - paste formatted).

      5. Code-hint hover. When you hover over a function name, bring up a window with the first few lines of that function. Maybe open it in a "related code" pane?

      Netbeans. If you use comments before functions, it will show those.

      6. Right-click to jump to anything. Right-click a variable to jump to the declaration, or goto other places it is used. Right-click a class name to bring up that class definition.

      Netbeans. But with ctrl+click.

      7. Start typing out a function, and get a menu of variable-specific functions that can be called. Flash actually does this surprisingly well, or did before CS5.

      Netbeans. Also, Flash did this surprisingly badly compared to Netbeans.

      Another nice feature: ctrl+shift+arrow down - copies the current line or selection and inserts it below (+arrow up - inserts it above). It's a surprisingly good idea, one I miss in many other editors.

      --
      Extreme Programming - Redundant Array of Inexpensive Developers
  8. If you can't express yourself in ASCII... by MaggieL · · Score: 4, Funny

    ...the character set isn't the problem.

    And I say this as an old APL coder.

    (There aren't many new APL coders.)

    --
    -=Maggie Leber=-
  9. Re:Examples? by 0123456 · · Score: 3, Funny

    So, what are his ideas?

    EBCDIC?

  10. It all winds up as binary anyway. by foodnugget · · Score: 4, Funny

    How silly of us to be compiling to binary all this time!
    We've been relegating ourselves to only two different options for decades!

    I reckon that a memory cell and single bit of a processor opcode should have --at least-- 7000 different possibilities. Think of everything a computer could accomplish *then*!

    Seriously, someone tell this guy you're allowed to use more than one character to represent a concept or action, and that these groups of characters represent things rather well.

  11. It ain't broke! by webbiedave · · Score: 5, Insightful

    Let's take our precious time on this planet to fix what's broken, not break what has clearly worked.

  12. Not only no, by Anonymous Coward · · Score: 4, Funny

    but fuck no.
    I eagerly await comments saying how anglo-centric, racist, bigoted, culturally-imperialist the insistence of using ASCII is.
    The nuanced indignation is salve for my frantic masturbation.
    (If my post is the only one that mentions this, all the better)

    1. Re:Not only no, by sznupi · · Score: 2, Insightful

      Also: Slashdot would never, ever, ever be able to display code snippets of such thing.

      --
      One that hath name thou can not otter
  13. limiting? by Tei · · Score: 2, Insightful

    the chinese have problems to learn his own language, because have all that signs, it make it unncesary complex.

    26 letter lets you write anything, you dont need more letters, really. ask any novelist.

    also, programming languages are something international, and not all keyboards have all keys, even keys like { or } are not on all keyboards, so tryiing to use funny characters like ñ would make programming for some people really hard.

    all in all, this is not a very smart idea , imho

    --

    -Woof woof woof!

    1. Re:limiting? by Sycraft-fu · · Score: 3, Interesting

      For that matter, we could probably even get away with less letters. Some of them are redundant when you get down to it. What you need are enough letters that you can easily denote all the different sounds that are valid in a language. You don't have to have a dedicated letter for all of them either, it can be through combination (for example the oo in soothe) or through context sensitivity (such as the o in some in context with the e on the end). We could probably knock off a few characters if we tried. If that is worth it or not I don't know but we sure as hell shouldn't be looking at adding MORE.

      Also in terms of programming a big problem is that of ambiguity. Compilers can't handle it, their syntax and grammar is rigidly defined, as it must be. That's the reason we have programming languages rather than simply programming in a natural language: Natural language is too imprecise, a computer cannot parse it. We need a more rigidly defined language.

      Well as applied to unicode programming that means that languages are going to get way more complex if you want to provide an "English" version of C and then a "Chinese" version and a "French" version and so on where the commands, and possibly the grammar, differ slightly. It would get complex probably to the point of impossibility if you then want them to be able to be blended, where you could use different ones in the same function, or maybe on the same line.

    2. Re:limiting? by yuje · · Score: 3, Informative

      China has greater than 90% literacy, and the more advanced Chinese speaking societies (Hong Kong, Taiwan, Macau, Singapore) basically have full Chinese literacy. While Japan uses a smaller subset of those characters, the Japanese have full literacy and seemed to have functioned perfectly well while retaining those characters in their writing system. The Chinese people hardly have problems learning, reading, or writing their own language.

      the chinese have problems to learn his own language, because have all that signs, it make it unncesary complex.

      26 letter lets you write anything, you dont need more letters, really. ask any novelist.

      also, programming languages are something international, and not all keyboards have all keys, even keys like { or } are not on all keyboards, so tryiing to use funny characters like ñ would make programming for some people really hard.

      all in all, this is not a very smart idea , imho

      Judging by your post, it appears that you have problems learning your own language. It certainly appears that simple spelling, capitalization, punctuation and correct grammar in the English language are apparently beyond your abilities.

    3. Re:limiting? by pipatron · · Score: 2, Insightful

      Judging by your post, it appears that you have problems learning your own language. It certainly appears that simple spelling, capitalization, punctuation and correct grammar in the English language are apparently beyond your abilities.

      Did it ever occur to you that the person you replied to isn't a native English speaker?

      --
      c++; /* this makes c bigger but returns the old value */
  14. This is nonsense by Kohath · · Score: 4, Insightful

    Programming languages usually have too much syntax and too much expressiveness, not too little. We don't need them to be even more cryptic and even more laden with hidden pitfalls for someone who is new, or imperfectly vigilant, or just makes a mistake.

    If anything, programming needs to be less specific. Tell the system what you're trying to do and let the tools write the code and optimize it for your architecture.

    We don't need longer character sets. We don't need more programming languages or more language features. We need more productive tools, software that adapts to multithreaded operation and GPU-like processors, tools that prevent mistakes and security bugs, and ways to express software behavior that are straightforward enough to actually be self-documenting or easily explained fully with short comments.

    Focusing on improving programming languages is rearranging the deck chairs.

    1. Re:This is nonsense by Twinbee · · Score: 2, Interesting

      One day, I think we'll have a universal language that everyone uses (yeah English would suit me, but I don't care as long as whatever language it is, everyone uses it). Efficiency would rocket through the roof, and hence we'll save billions or trillions of pounds.

      In the same way, we'll all be using a single programming language too (even if that language combines more than one paradigm). Yes competition is good in the meantime, but I mean ultimately. It'll be as fast as C or machine code, but as readable as a much higher level language. It won't have baggage such as headers or be unnecessarily verbose either.

      Until that point, we need to do a lot more to improve languages, and it won't just be deckchair arranging.

      --
      Why OpalCalc is the best Windows calc
  15. No we don't by Sycraft-fu · · Score: 4, Informative

    Because I don't want to have to own a 2000 key keyboard, or alternatively learn a shitload of special key combos to produce all sorts of symbols. The usefulness of ASCII, and just of the English/Germanic/Latin character set and Arabic numerals in general is that it is fairly small. You don't need many individual glyphs to represent what you are talking about. A normal 101 key keyboard is enough to type it out and have enough extra keys for controls that we need.

    To see the real absurdity of it, apply the same logic to the numerals of the character set. Let's stop using Arabic numerals, let's use something more. Let's have special symbols to denote commonly used values (like 20, 25, 100, 1000). Let's have different number sets for different bases so that a 3 can be told what base it's in just by the way it looks! ...

    Or maybe not. Maybe we should stick with the Arabic numerals. There's a reason they are so widely used: The Indians/Arabs got it right. It is simple, direct, and we can represent any number we need easily. Combining them with simple character indicators like H to indicate hex works just fine for base as well.

    You might notice that even languages that don't use the English/ASCII character set tend to use keyboards that use it. Japanese and Chinese enter transliterated expressions that the computer then interprets as glyphs. Doesn't have to be that way, they could use different keyboards, some of them rather large depending on the character set being used, but they don't. It is easy and convenient to just use the smaller, widely used, character set.

    Now none of this means that you can't use Unicode in code, that strings can't be stored using it, that programs can't display it. Indeed most programs these days can handle it, just fine. However to start coding in it? To try and design languages to interpret it? To make things more complex for their own sake? Why?

    I am just trying to figure out what he thinks would be gained here. Also remembering that the programming languages, the compilers, would need to be changed at the low level. Compilers do not take ambiguity, if a command is going to change from a string of ASCII characters to a single unicode one, that has to be changed in the compiler, made clear in the language specs and so on.

  16. ASCII art is cool! by Joe+The+Dragon · · Score: 4, Insightful

    ASCII art is cool!

  17. What about Sun's Fortress language by philgross · · Score: 4, Informative

    Sun's Fortress language allowed you to use real, LaTeX-formatted math as source code. They reasoned, correctly I think, that for the mathematically literate, this would make the programs far clearer. Google for Fortress Programming Language Tutorial.

  18. Fortress allows Unicode, but has ASCII equivalent by thisisauniqueid · · Score: 3, Interesting

    Fortress allows you to code in UTF-8. However it has a multi-char ASCII equivalent for every Unicode mathematical symbol that you can use, so there is a bijective map between the Unicode and ASCII versions of the source, and you can view/edit in either. That is the only acceptable way to advocate using Unicode anywhere in programming source other than string constants. Programming languages that use ASCII have done well over those that don't, for the same reason that Unicode has done well over binary formats.
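    A rough Python sketch of the two-way mapping idea. The table below is hypothetical and chosen only for illustration (it is not Fortress's actual mapping), and it works on space-separated tokens to keep things simple.

```python
# Hypothetical ASCII <-> Unicode operator table, NOT Fortress's real one.
ASCII_TO_UNICODE = {"<=": "≤", ">=": "≥", "=/=": "≠", "->": "→", "SUM": "∑"}
UNICODE_TO_ASCII = {sym: tok for tok, sym in ASCII_TO_UNICODE.items()}

# The mapping is only invertible if no two ASCII tokens share a symbol.
assert len(UNICODE_TO_ASCII) == len(ASCII_TO_UNICODE)

def translate(source, table):
    """Replace each space-separated token found in the table; keep the rest."""
    return " ".join(table.get(tok, tok) for tok in source.split(" "))

ascii_src = "if a <= b -> SUM xs"
unicode_src = translate(ascii_src, ASCII_TO_UNICODE)
print(unicode_src)

# Going back yields the original ASCII source, so either view can be edited.
print(translate(unicode_src, UNICODE_TO_ASCII) == ascii_src)
```

    As long as the table stays one-to-one, a team can keep the ASCII form in version control and render the Unicode form in the editor.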

  19. Haskell by kshade · · Score: 3, Interesting

    Haskell supports various Unicode characters as operators and it makes me want to puke. http://hackage.haskell.org/trac/haskell-prime/wiki/UnicodeInHaskellSource IMO one of the great things about programming nowadays is that you can use descriptive names without feeling bad. Single-character identifiers from different alphabets are something that rubs me the wrong way in mathematics. Keep 'em out of my programming languages!

    Bullshit from the article:

    Unicode has the entire gamut of Greek letters, mathematical and technical symbols, brackets, brockets, sprockets, and weird and wonderful glyphs such as "Dentistry symbol light down and horizontal with wave" (0x23c7). Why do we still have to name variables OmegaZero when our computers now know how to render 0x03a9+0x2080 properly?

    OmegaZero is at least something everybody will recognize. And why would you name a variable like that anyway? It's programming, not math, use descriptive names.

    But programs are still decisively vertical, to the point of being horizontally challenged. Why can't we pull minor scopes and subroutines out in that right-hand space and thus make them supportive to the understanding of the main body of code?

    Because we're not using the same IDE?

    And need I remind anybody that you cannot buy a monochrome screen anymore? Syntax-coloring editors are the default. Why not make color part of the syntax? Why not tell the compiler about protected code regions by putting them on a framed light gray background? Or provide hints about likely and unlikely code paths with a green or red background tint?

    ... what?

    For some reason computer people are so conservative that we still find it more uncompromisingly important for our source code to be compatible with a Teletype ASR-33 terminal and its 1963-vintage ASCII table than it is for us to be able to express our intentions clearly.

    ... WHAT? If you don't express your intentions clearly in a program it won't work!

    And, yes, me too: I wrote this in vi(1), which is why the article does not have all the fancy Unicode glyphs in the first place.

    vim does Unicode just fine. And from the Wikipedia entry on the author (http://en.wikipedia.org/wiki/Poul-Henning_Kamp):

    A post by Poul-Henning is responsible for the widespread use of the term bikeshed colour to describe contentious but otherwise meaningless technical debates over trivialities in open source projects.

    Irony? Why does this guy come off as an idiot who got annoyed by VB in this article when he clearly should know better?

  20. Re:Perl 6 by russotto · · Score: 2, Interesting

    Sure, but Perl is often derided as a "write only language", and Perl 6 is simply continuing the tradition.

  21. Idiocracy Hospital Keyboard by theodp · · Score: 2, Interesting
  22. Wingdings of Disease by theodp · · Score: 2, Funny
  23. Author seems to be high or something by Tridus · · Score: 5, Insightful
    He comes up with a bunch of ideas at the end that are out to lunch. Let's take a look:

    Unicode has the entire gamut of Greek letters, mathematical and technical symbols, brackets, brockets, sprockets, and weird and wonderful glyphs such as "Dentistry symbol light down and horizontal with wave" (0x23c7). Why do we still have to name variables OmegaZero when our computers now know how to render 0x03a9+0x2080 properly?

    Well, let's think. Possibly because nobody knows what 0x03a9+0x2080 does without looking it up, and nobody seeing the character it produces would know how to type said character again without looking it up? I know consulting a wall-sized "how to type X" chart is the first thing I want to do every 3 lines of code.
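
    For anyone who does want to look those code points up without the wall chart, here is a minimal sketch using Python's standard unicodedata module. The constant and function names are made up for illustration; only the three code points come from the article.

```python
import unicodedata

# Code points quoted in the article; the names below are illustrative only.
OMEGA = chr(0x03A9)
SUBSCRIPT_ZERO = chr(0x2080)
DENTISTRY = chr(0x23C7)

# The "OmegaZero" pair from the article, as a two-character string.
omega_zero = OMEGA + SUBSCRIPT_ZERO


def describe(text):
    """Return (code point, official Unicode name) for each character."""
    return [(f"U+{ord(ch):04X}", unicodedata.name(ch)) for ch in text]


if __name__ == "__main__":
    for code, name in describe(omega_zero + DENTISTRY):
        print(code, name)
```

    This only shows what the characters are. It does nothing about the problem of typing them.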

    While we are at it, have you noticed that screens are getting wider and wider these days, and that today's text processing programs have absolutely no problem with multiple columns, insert displays, and hanging enclosures being placed in that space? But programs are still decisively vertical, to the point of being horizontally challenged. Why can't we pull minor scopes and subroutines out in that right-hand space and thus make them supportive to the understanding of the main body of code?

    If you actually look at word processing programs, the document is also highly vertical. The horizontal stuff is stuff like notes, comments, revisions, and so on. Putting source code comments on the side might be a useful idea, but putting the code over there won't be unless the goal is to make it harder to read. (That said, widescreen monitors suck for programming.)

    And need I remind anybody that you cannot buy a monochrome screen anymore? Syntax-coloring editors are the default. Why not make color part of the syntax? Why not tell the compiler about protected code regions by putting them on a framed light gray background? Or provide hints about likely and unlikely code paths with a green or red background tint?

    So anybody who has some color-blindness (which is not a small number) can't understand your program? Or maybe we should make a red + do something different than a blue +? That's great once you do it six times, then it's just a mess. (Now if you want to have the code editor put protected regions on a framed light gray background, sure. But there's nothing wrong with sticking "protected" in front of it to define what it is.) It seems like he's trying to solve a problem that doesn't really exist by doing something that's a whole lot worse.

    --
    -- "So they told me that using the download page to download something was not something they anticipated." - Bill Gates
  24. Re:Examples? by izomiac · · Score: 2, Interesting

    From TFA, apparently he wants to be able to use Ω (Omega) to name a variable, and ÷ (Division Sign) as an operator. My interpretation of his opinion is that a descriptive name for a variable is inferior to using Greek letters, and that mathematical operators that take an extra five or so keystrokes are superior to the standard +-*/^ set that people have become accustomed to.

    IMHO, if you use more than 26 single-letter variables something is seriously wrong, and trying to make mathematical formulas pretty in code isn't practical without a whole lot of unneeded complexity. Sure, having an eight-line formula with fractions within fractions and tiny exponent numbers might be (slightly) better than five layers of parentheses, but you aren't going to get that with just Unicode (AFAIK), and the pain of dealing with a slightly misplaced term confounding the Unicode-to-math converter isn't one I'd like to experience. Unicode or even LaTeX code for comments might be useful though.

  25. Would it be less tedious to have 10,000+ keys? by Sycraft-fu · · Score: 5, Insightful

    Because that's what you find in JIS X 0213:2000. Even if you simplify it to just what is needed for basic literacy, you are talking 2000 characters. If you have that many characters your choices are either a lot of keys, a lot of modifier keys, or some kind of transliteration, which is what is done now. There is just no way around this. You cannot have a language that is composed of a ton of glyphs but yet also have some extremely simple, small, entry system.

    You can have a simple system with few characters, like we do now, but you have to enter multiple ones to specify the glyph you want. You could have a direct entry system where one keypress is one glyph, but you'd need a massive amount of keys. You could have a system with a small number of keys and a ton of modifier keys, but then you have to remember what modifier, or modifier combination, gives what. There is no easy, small, direct system, there cannot be.

    Also, is it any more tedious than any Latin/Germanic language that only uses a small character set? While you may enter more characters than final glyphs, do you enter more characters than you would to express the same idea in French or English?

    1. Re:Would it be less tedious to have 10,000+ keys? by SnapShot · · Score: 2, Informative

      When it was first announced (5 years ago now?), I thought the Optimus Maximus keyboard was going to solve this problem. With a little smarts built into the keyboard I wouldn't mind esoteric key combinations if the result was displayed directly on the keyboard. Something like this might, someday, be the solution, but at $1500 it's going to be a while, assuming a direct-brain interface doesn't come first.

      --
      Waltz, nymph, for quick jigs vex Bud.
    2. Re:Would it be less tedious to have 10,000+ keys? by MichaelSmith · · Score: 2, Interesting

      But few people really look at keyboards. Our fingers know where the button will be. I don't want to hunt and peck for special characters.

    3. Re:Would it be less tedious to have 10,000+ keys? by siride · · Score: 4, Funny

      Why bother? We already have machines that are good at that: two year olds. Two year olds aren't good at doing trend analysis on a million data points, which is why we have computers. We'd gain pretty much nothing from making a silicon-based two year old. It'd probably be just as slow and would cost considerably more than a two year old.

    4. Re:Would it be less tedious to have 10,000+ keys? by The+Mighty+Buzzard · · Score: 3, Insightful

      Hunt and peck? I don't even want to have to remember that many glyphs exist, much less where to find them. If it can't be expressed with a standard qwerty keyboard and one (shift) modifier key, it's too fucking complicated to bother with as general text entry.

      --
      Violence is like duct tape. If it doesn't solve the problem, you didn't use enough.
    5. Re:Would it be less tedious to have 10,000+ keys? by the_womble · · Score: 3, Funny

      You do not have kids, do you? I can assure you that the cumulative cost of a two year old, starting from the first pre-natal medical costs, including lost work and productivity, food, drink, accommodation, etc. is considerable.

      Unlike computers, kids get more expensive every year, and there are laws about getting them to do useful work.

    6. Re:Would it be less tedious to have 10,000+ keys? by gknoy · · Score: 3, Informative

      It'll be interesting when you go to write some Perl code with your pen+tablet. The text recognition assumes you're writing in a natural language, so braces and punctuation are often tedious to get right. Write some basic Perl (with hashes, arrays, and some scalars) on your local handwriting-recognizing device, and let us know how amusing it is.

    7. Re:Would it be less tedious to have 10,000+ keys? by TheLink · · Score: 4, Insightful

      So how are you going to tell the difference between:
      a) a hyphen
      b) a dash
      c) a minus sign

      And worse the different unicode versions of hyphens and dashes:

      http://en.wikipedia.org/wiki/Hyphen#Unicode
      http://en.wikipedia.org/wiki/Dash#Common_dashes

      Yes, there's more than one unicode hyphen and dash! There are plenty of confusing characters like that too.

      So for programming you're still going to have to stick to a subset for keywords and symbols, and not use the full "tons of glyphs". Or at least you're going to need an entry system that allows you to switch.

      Maybe that Poul guy just wants a few extra symbols for some stuff. Good luck with that, many already complain about perl :).
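
      To make the lookalike problem concrete, here is a small sketch using Python's standard unicodedata module. The list of five characters is just a sample picked for illustration, not the full set from the linked pages, and the helper names are hypothetical.

```python
import unicodedata

# A sample of characters that render almost identically in many fonts.
LOOKALIKES = ["\u002D", "\u2010", "\u2013", "\u2014", "\u2212"]


def names(chars):
    """Map each character's code point to its official Unicode name."""
    return {f"U+{ord(c):04X}": unicodedata.name(c) for c in chars}


def typable_in_ascii(ch):
    """True only for characters inside the 7-bit ASCII range."""
    return ord(ch) < 128


if __name__ == "__main__":
    for code, name in names(LOOKALIKES).items():
        print(code, name)
```

      Only the first of the five is in ASCII. The other four are distinct code points, so a lexer that compares characters exactly will treat each one as a different token unless the language says otherwise.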

  26. it's not ASCII to blame by lkcl · · Score: 4, Insightful

    the point has been entirely missed, and blame placed on ASCII [correlation is not causation]. when you look at the early languages - FORTH, LISP, APL, and later even Awk and Perl, you have to remember that these languages were living in an era of vastly less memory. FORTH interpreters fit into 1k with room to spare for goodness sake! these languages tried desperately to save as much space and resources as possible, at the expense of readability.

    it's therefore easy to place blame onto ASCII itself.

    then you have compiled languages like c, c++, and interpreted ones like Python. these languages happily support unicode - but you look at free software applications written in those languages and they're still by and large kept to under 80 chars in length per line - why is that? it's because the simplest tools are not those moronic IDEs; the simplest programming tools for editing are straightforward ASCII text editors: vi and (god help us) emacs. so by declaring that "Thou Shalt Use A Unicode Editor For This Language" you've just shot the chances of success of any such language stone dead: no self-respecting systems programmer is going to touch it.

    not only that, but you also have the issue of international communication and collaboration. if the editor allows Kanji, Cyrillic, Chinese and Greek, contributors are quite likely to type comments in Kanji, Cyrillic, Chinese and Greek. the end-result is that every single damn programmer who wants to contribute must not only install Kanji, Cyrillic, Chinese and Greek unicode fonts, but also they must be able to read and understand Kanji, Cyrillic, Chinese and Greek. again: you've just destroyed the possibility of collaboration by terminating communication and understanding.

    then, also, you have the issue of revision control, diffs and patches. by moving to unicode, git, svn, bazaar, mercurial and cvs all have to be updated to understand how to treat unicode files - which they can't (they'll treat it as binary) - in order to identify lines that are added or removed, rather than store the entire file on each revision. bear in mind that you've just doubled (or quadrupled, for UCS-4) the amount of space required to store the revisions in the revision control systems' back-end database, and bear in mind that git repositories such as linux2.6 are 650mb if you're lucky (and webkit 1gb): you have enough of a problem with space for big repositories as it is!

    but before that, you have to update the unix diff command and the unix patch command to do likewise. then, you also have to update git-format-patch and the git-am commands to be able to create and mail patches in unicode format (not straight SMTP ASCII). then you also have to stop using standard xterm and standard console for development, and move to a Unicode-capable terminal, but you also have to update the unix commands "more" and "less" to be able to display unicode diffs.

    there are good reasons why ASCII - the lowest common denominator - is used in programming languages: the development tools revolve around ASCII, the editors revolve around ASCII, the internationally-recognised language of choice (english) fits into ASCII. and, as said right at the beginning, the only reason why stupid obtuse symbols instead of straightforward words were picked was to cram as much into as little memory as possible. well, to some extent, as you can see with the development tools nightmare described above, it's still necessary to save space, making UNICODE a pretty stupid choice.

    lastly it's worth mentioning python's easy readability and its bang-per-buck ratio. by designing the language properly, you can still get vast amounts of work done in a very compact space. unlike, for example java, which doesn't even have multiple inheritance for god's sake, and the usual development paradigm is through an IDE not a text editor. more space is wasted through fundamental limitations in the language and the "de-facto" GUI development environment than through any "blame" attached to ASCII.

  27. Re:The thing with ASCII [COBOL 2.0?] by Tablizer · · Score: 5, Interesting

    This proposal isn't about giving programmers more power to code, it's about making it easier for non-english speakers who aren't coders to read the code that their programmers write.

    COBOL was originally designed so that managers and customers could read it. But in practice they rarely did because programming logic is typically too low-level and requires knowing the technical context to understand by a non-programmer and/or non-team member anyhow. Being "English-like" or grammatically proper didn't really help that goal in practice. This is why the idea was abandoned in later languages.

    Perhaps it's comparable to legalese. Making it proper English doesn't necessarily improve readability by non-lawyers. It's still gibberish to most of us without a legal background.

    It's not worthwhile to slow down production programmers in a trade for the rare case where non-programmers will want to read code for an actual need (not just curiosity). Thus, it's an uneconomical requirement as long as there is such a trade-off.

  28. Grep on ascii rules by goombah99 · · Score: 2, Insightful

    Grep on ascii is more than 100x faster for complex string expressions. There are a lot of good reasons not to use unicode.

    --
    Some drink at the fountain of knowledge. Others just gargle.
  29. Article author didn't read spec by kongtomorrow · · Score: 2, Informative
    Mr. Pike _did_ tear down the wall. The author didn't read the spec for Pike's language. From the article:

    Unicode has the entire gamut of Greek letters, mathematical and technical symbols, brackets, brockets, sprockets, and weird and wonderful glyphs such as "Dentistry symbol light down and horizontal with wave" (0x23c7). Why do we still have to name variables OmegaZero when our computers now know how to render 0x03a9+0x2080 properly?

    The Go spec is defined in terms of Unicode, and specifically gives non-ASCII characters as example identifiers. Go source code is defined to be UTF-8.

  30. Re:Go Cry at the Romans by Zobeid · · Score: 2, Informative

    I've read that story before, and it's very neat. It's just too bad there's so little truth to it. Here's an example where it really falls apart: "As the railroads were built they were built using the same standard width of all the wagons since the tools had been standardized to that width." Anybody with casual knowledge of railway history should remember the crazy profusion of different -- widely varying -- gauge standards in the early days.

  31. vim, svn, etc. can handle utf8 just fine ... by MadMaverick9 · · Score: 2, Insightful
    From TFA:

    And, yes, me too: I wrote this in vi(1), which is why the article does not have all the fancy Unicode glyphs in the first place.

    Excuse me - vim can handle utf-8 just fine. utf-8 file names and utf-8 content. on a vanilla slackware 13.1.
    http://www.cl.cam.ac.uk/~mgk25/unicode.html#apps [cam.ac.uk]
    # Vim (the popular clone of the classic vi editor) supports UTF-8 with wide characters and up to two combining characters starting from version 6.0.
    # Emacs has quite good basic UTF-8 support starting from version 21.3. Emacs 23 changed the internal encoding to UTF-8.
    And svn can handle utf-8 as well - http://svnbook.red-bean.com/en/1.4/svn.advanced.l10n.html [red-bean.com].

    The repository stores all paths, filenames, and log messages in Unicode, encoded as UTF-8.

    All it requires is ... set your locale and lang. "export LANG=en_DK.utf8" in "/etc/profile.d/lang.sh" (Slackware 13.1) and add some better fonts maybe.

    I apologize for repeating myself. I've written the same thing further down already in reply to another user's post. But I just read tfa and felt the need to reply to the author of tfa.

  32. Re:visual GUI-based programming by rubycodez · · Score: 2, Funny

    visual programming has stagnated because it produces crap. Exhibit A, Microsoft Windows. Exhibit B, all Microsoft Applications not acquired by Microsoft.

    GUI code wizard 'tards, hated to have them on my coding teams....

  33. Re:visual GUI-based programming by MadMaverick9 · · Score: 2, Insightful

    I blame the cult of Unix/Linux to some degree. The whole OS and all its tools and standards are based on ASCII text

    you ever heard of the nls_utf8 kernel module? ever seen the "LANG" environment variable? set it to "en_DK.utf8" for example and you're ready to go.
    vim, svn, rm, mv, cp can handle utf8 just fine. this being on slackware 13.1.

  34. French and English are quite different by Pezbian · · Score: 2, Informative

    I worked for a Canada-based company and one of the magazines in the break room was Forces Quebec. It was something about packaging technology and had the articles written in both English and French, as is standard in Canada.

    The bilingual nature isn't what caught my eye, though. What caught my eye was the fact that the typeface for the French articles was just plain smaller in order to fit more text in a certain space. It looked to me like the same page real estate was dedicated to each language, but the typeface for the French text was set to a smaller point size with tight kerning and spacing.

    No wonder French people talk so fast. They have to!

    In fact, when I mentioned the same thing to one of my coworkers, a Mexico native, he wasn't surprised at all. He said the same is true for Spanish as well.

    When he told me that, I remembered Cheech Marin's "Born in East L.A.", where he sings about being deported to Mexico despite being a US citizen: "Next thing I know I'm in a foreign land. People talkin so fast I could not understand."

    --
    In a world of the blind, the one-eyed man is king--and the two-eyed man is a heretic.
  35. Re:visual GUI-based programming by santax · · Score: 4, Insightful

    Visual programming isn't big for the same reason people talk and don't use drawings to communicate in day-to-day life. A decent, well-explained and understood language is faster, universal and more convenient. Drawings are used in situations where you can't communicate through a spoken or written language, as a replacement tool. It's very basic, since with a spoken or written language you can uniformly have a much more precise interpretation of your intentions. The same goes for visual programming at this moment in time. I won't say there isn't a future for it, but as a replacement tool for the tried and tested programming environments it has a long way to go. Come up with a visual programming system for writing actually sophisticated code and you might have yourself a winner. The only party that comes to mind is LabVIEW from NI.

  36. very bad idea by t2t10 · · Score: 2, Insightful

    Using full Unicode for programming causes lots of problems; even string equality is a tricky proposition for Unicode, let alone precise parsing. Most people don't even know how to enter Unicode characters not found in their own language. And once you allow Unicode, people will do things like they did in APL.

    The only place Unicode should be allowed--if at all--is in comments. Everything else should be in ASCII.
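
    A minimal sketch of the string equality problem, in Python with the standard unicodedata module. The same_text helper is hypothetical, and NFC is just one of the normalization forms a language could pick.

```python
import unicodedata

# Two ways to write the same visible character "é".
composed = "\u00E9"      # single code point, LATIN SMALL LETTER E WITH ACUTE
decomposed = "e\u0301"   # "e" followed by COMBINING ACUTE ACCENT


def same_text(a, b, form="NFC"):
    """Compare two strings after normalizing both to the same form."""
    return unicodedata.normalize(form, a) == unicodedata.normalize(form, b)


if __name__ == "__main__":
    print(composed == decomposed)           # raw comparison of code points
    print(same_text(composed, decomposed))  # comparison after normalization
```

    The raw comparison fails even though both strings look identical on screen, so a compiler that allows such characters in identifiers has to decide whether and how to normalize before comparing names.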

  37. I wouldn't consider Mr. Pike an authority on by melted · · Score: 2, Funny

    I wouldn't consider Mr. Pike an authority on programming language design. At Google, he's known for designing Sawzall (described here: http://static.googleusercontent.com/external_content/untrusted_dlcp/research.google.com/en/us/archive/sawzall-sciprog.pdf) - a language that's so feature poor, esoteric, and ass-backwards, that Google engineers curse at length every time they have to use it. And use it they have, since it's darn near impossible, for various reasons, to do certain things without it. Try as I may, I don't see anything in Go that would make it better than half a dozen existing alternatives. It's like reinventing the bicycle again, but this time with square wheels and without the saddle. Yes, you guessed it right, that's where that pipe goes on this particular bicycle.

  38. If ASCII was good enough for Jesus Christ by garethw · · Score: 2, Funny

    ... it should be good enough for anyone. Just sayin'...

    --
    garethw
  39. pros? by Charliemopps · · Score: 2, Insightful

    Ok, so everyone agrees this is a stupid idea... but are there ANY pros? I just don't understand the premiss at all...

  40. The trouble with huge character sets. by Animats · · Score: 2, Interesting

    This has come up in the context of domain names, where a long, painful set of rules has been devised to try to prevent having two domain names which look similar but are different to DNS. If exact equality of text matters, it's helpful to have a limited character set for identifiers.

    There's currently a debate underway on Wikipedia over whether user names with unusual characters should be allowed. This isn't a language question; the issue is willful obfuscation by users who choose names with hard-to-type characters.

    As for having more operators, it's probably not worth it. It's been tried; both MIT and Stanford had, at one time, custom character sets, with most of the standard mathematical operators on the keys. This never caught on. In fact, operator overloading is usually a lose. Python ran into this. "+" was overloaded for concatenation. Then somebody decided that "*" should be overloaded, so that "a" + "a" was equivalent to 2*"a". The result is thus "aa". This leads to results like 2*"10" being "1010". The big mistake was defining a mixed-mode overload.
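
    The Python behaviour described above can be checked directly. A short sketch follows; the function name is made up for illustration.

```python
def repeat_demo():
    """Collect the results of the string overloads discussed above."""
    return {
        "concat": "a" + "a",        # "+" concatenates two strings
        "repeat": 2 * "a",          # int * str repeats the string
        "mixed": 2 * "10",          # still repetition, not arithmetic
        "numeric": 2 * int("10"),   # explicit conversion gives arithmetic
    }


if __name__ == "__main__":
    for label, value in repeat_demo().items():
        print(label, repr(value))
```

    The "mixed" entry is the surprising one: the string is repeated, and an explicit int() is needed to get multiplication.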

    In C++, mixed-mode overloads are fully supported by the template system and a nightmare when reading code.

    In Mathematica, the standard representation for math uses long names for functions, completely avoiding the macho terseness the math community has historically embraced.

  41. Don't take everything so seriously by glassware · · Score: 3, Insightful

    I'm truly saddened to see so many people took this article summary so literally. If you read TFA, it's actually a very bright, intelligent, humorous example of programming insight. I found it a very delightful read and I wholeheartedly felt that the article presented its thoughts lightheartedly and without expectation of seriousness. To hear all the commenters here, it's as if the article ran puppies over with a steamroller.

    Please guys - I'm all for silly commentary. But read the article if you're going to pretend to write something clever. It's thoroughly tongue-in-cheek.

  42. Unicode in C, C++ and Perl by rl117 · · Score: 2, Informative

    One thing many people aren't aware of is that for several years now (since GCC3), GCC and G++ accept UTF-8 as their default input encoding, and internally store narrow and wide strings as UTF-8 and UTF-32, respectively. It's recoded to the output stream locale when you do any output. This means you can write your source code in Unicode (in strings and comments at least) and it all works perfectly. It has full support in the C and C++ standard libraries. I've been using it for years; it works perfectly. It would be nice to get support for UTF-8 symbols in the linker, so we can have UTF-8 variable names as well. The same applies to Perl, though perl6 even gives you the ability to have Unicode operators, and possibly variable names.

    I do routinely use UTF-8 symbols in R (example: "deltaCt" can be replaced with the actual Delta symbol [Slashdot ate the Unicode--seriously poor!]). It makes the code more readable, and entry isn't the massive issue people make it out to be. AltGr/compose keys handle the common symbols, and you can look up the few odd ones that aren't in the compose tables.

    Having the ability to use Unicode does not in any way detract from the ability to use ASCII. Since ASCII is a strict Unicode subset, the ability to use Unicode imposes zero overhead on those who wish to stick with ASCII, so the extent of the hate seen for wanting a bit of progress is a bit shocking. People pointed out how unreadable code could be made, but the reality is that when used sensibly and judiciously, it can make code more concise and readable.
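
    The zero-overhead point can be illustrated with a short Python sketch. The sample source line and the function name are made up for illustration; the encodings are standard Python codec names.

```python
def encoded_sizes(text):
    """Return the size in bytes of `text` under several encodings."""
    encodings = ("ascii", "utf-8", "utf-16-le", "utf-32-le")
    return {enc: len(text.encode(enc)) for enc in encodings}


# A pure-ASCII line of source code, used as the sample input.
source = "x = 1\n"

if __name__ == "__main__":
    print(source.encode("utf-8") == source.encode("ascii"))
    print(encoded_sizes(source))
```

    For pure-ASCII input the UTF-8 bytes are identical to the ASCII bytes. The doubling and quadrupling only show up with the fixed-width 16-bit and 32-bit encodings.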

    See http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=522776 for information about some of the issues.
    Having native Unicode support end-to-end by default is still a goal we want to achieve; the ASCII C locale is the last holdout. Getting a UTF-8 C locale is the last remaining step, though it'll take a few years to get there.

    Regarding editing Unicode sources, both Emacs and vim have pretty decent Unicode support, and Linux distributions have had unicode support for a decade now, and really good support for at least six years. Broken tools are no longer an excuse for not using Unicode.

    Regards,
    Roger

  43. Re:BASIC translations by arth1 · · Score: 2, Insightful

    One problem is that word-for-word translations don't work. Other languages have both cases and genders applied to words, and often a different sentence structure too. Should "LET A=10" become "A=10 LASSEN"? What about Russian where the gender is significant? Or Japanese, where the status between speaker and listener determines the word? And what about right-to-left languages? Or top-to-bottom ones? But the biggest problems are, of course, compatibility and maintainability. You can't hire consultants who don't speak the language. And what if you branch out from Iceland to Sweden? Will you hire Swedes who speak Icelandic, or port all your apps to Swedish and maintain two different versions and prohibit unported e-mail attachments? Ask yourself why Microsoft doesn't have localized Office Basic anymore.

  44. Hardly - more like the different JVM languages by ZmeiGorynych · · Score: 2, Insightful

    I really, really don't think so. Different tools for different jobs - a language for writing reliable infrastructure should look very very different from a language for exploration of datasets, for example - the first one must place emphasis on reliability and performance, the second on flexibility. Eg adding members to data structures on the fly is a great idea in the second case, but not in the first.

    Sure you can try to sweep that under 'different paradigms', and indeed you could mix two arbitrary languages in the same file using some delimited blocks for example, and call it 'one language with different paradigms', but why would you want to? The convoluted multi-paradigm monstrosity that is C++ is a terrible example to us all there, in my opinion.

    I think instead the shape of the future will be more like all those different languages that compile on the JVM - jython, Scala, Lua, and whatnot. They compile into interoperable modules without extra hassle, so in each module you can use the right tool for the job at hand.