Slashdot Mirror


Mr. Pike, Tear Down This ASCII Wall!

theodp writes "To move forward with programming languages, argues Poul-Henning Kamp, we need to break free from the tyranny of ASCII. While Kamp admires programming language designers like the Father-of-Go Rob Pike, he simply can't forgive Pike for 'trying to cram an expressive syntax into the straitjacket of the 95 glyphs of ASCII when Unicode has been the new black for most of the past decade.' Kamp adds: 'For some reason computer people are so conservative that we still find it more uncompromisingly important for our source code to be compatible with a Teletype ASR-33 terminal and its 1963-vintage ASCII table than it is for us to be able to express our intentions clearly.' So, should the new Hello World look more like this?"

8 of 728 comments (clear)

  1. Fortress allows Unicode, but has ASCII equivalent by thisisauniqueid · · Score: 3, Interesting

    Fortress allows you to code in UTF-8. However it has a multi-char ASCII equivalent for every Unicode mathematical symbol that you can use, so there is a bijective map between the Unicode and ASCII versions of the source, and you can view/edit in either. That is the only acceptable way to advocate using Unicode anywhere in programming source other than string constants. Programming languages that use ASCII have done well over those that don't, for the same reason that Unicode has done well over binary formats.

  2. Haskell by kshade · · Score: 3, Interesting
    Haskell supports various unicode characters as operators and it makes me wanna to puke. http://hackage.haskell.org/trac/haskell-prime/wiki/UnicodeInHaskellSource IMO one of the great things about programming nowadays is that you can use descriptive names without feeling bad. Single character identifiers from different alphabets are something that rub me the wrong way in mathematics. Keep 'em out of my programming languages!

    Bullshit from the article:

    Unicode has the entire gamut of Greek letters, mathematical and technical symbols, brackets, brockets, sprockets, and weird and wonderful glyphs such as "Dentistry symbol light down and horizontal with wave" (0x23c7). Why do we still have to name variables OmegaZero when our computers now know how to render 0x03a9+0x2080 properly?

    OmegaZero is at least something everybody will recognize. And why would you name a variable like that anyway? It's programming, not math, use descriptive names.

    But programs are still decisively vertical, to the point of being horizontally challenged. Why can't we pull minor scopes and subroutines out in that right-hand space and thus make them supportive to the understanding of the main body of code?

    Because we're not using the same IDE?

    And need I remind anybody that you cannot buy a monochrome screen anymore? Syntax-coloring editors are the default. Why not make color part of the syntax? Why not tell the compiler about protected code regions by putting them on a framed light gray background? Or provide hints about likely and unlikely code paths with a green or red background tint?

    ... what?

    For some reason computer people are so conservative that we still find it more uncompromisingly important for our source code to be compatible with a Teletype ASR-33 terminal and its 1963-vintage ASCII table than it is for us to be able to express our intentions clearly.

    ... WHAT? If you don't express your intentions clearly in a program it won't work!

    And, yes, me too: I wrote this in vi(1), which is why the article does not have all the fancy Unicode glyphs in the first place.

    vim does Unicode just fine. And from the Wikipedia entry on the author (http://en.wikipedia.org/wiki/Poul-Henning_Kamp):

    A post by Poul-Henning is responsible for the widespread use of the term bikeshed colour to describe contentious but otherwise meaningless technical debates over trivialities in open source projects.

    Irony? Why does this guy come off as an idiot who got annoyed by VB in this article when he clearly should know better?

  3. Re:limiting? by Sycraft-fu · · Score: 3, Interesting

    For that matter, we could probably even get away with less letters. Some of them are redundant when you get down to it. What you need are enough letters that you can easily denote all the different sounds that are valid in a language. You don't have to have a dedicated letter for all of them either, it can be through combination (for example the oo in soothe) or through context sensitivity (such as the o in some in context with the e on the end). We could probably knock off a few characters if we tried. If that is worth it or not I don't know but we sure as hell shouldn't be looking at adding MORE.

    Also in terms of programming a big problem is that of ambiguity. Compilers can't handle it, their syntax and grammar is rigidly defined, as it must be. That's the reason we have programming languages rather than simply programming in a natural language: Natural language is too imprecise, a computer cannot parse it. We need a more rigidly defined language.

    Well as applied to unicode programming that means that languages are going to get way more complex if you want to provide an "English" version of C and then a "Chinese" version and a "French" version and so on where the commands, and possibly the grammar, differ slightly. It would get complex probably to the point of impossibility if you then want them to be able to be blended, where you could use different ones in the same function, or maybe on the same line.

  4. Re:huh by CensorshipDonkey · · Score: 4, Interesting

    Have you ever used a visual diagrammatic code language before, such as LabView? Every scientist I've ever met that had any experience writing code vastly prefers the C based LabWindows to the diagrammatic LabView - diagrammatic is simply a fucking pain in the ass. Reading someone else's program is an exercise in pain, and they are impossible to debug. Black and white, unambiguous plain text coding may not be pretty to look at but it is damn functional. Coding requires expressing yourself in an explicitly clear fashion, and that's what the current languages offer.

  5. Re:We've tried this before by SimonInOz · · Score: 4, Interesting

    Incredibly, I worked for a major investment company who had, indeed, done something useful in APL. In fact they had written their entire set of analysis routine in it, and deeply interwoven it with SQL. I had to untangle it all. (Would you beleive they had 6 page SQL stored procedures? No, nor did I - but they did).
    APL is great sometimes - especially if you happen to be a maths whizz and good at weird scripts. Not exactly easy to debug, though. Sort of a write-only language.

    For the last ten plus years, we have been steadily moving in the direction of more human readable data - the move to XML was supposed to be a huge improvement. It meant you could - sort of - read what was going on at ever level. It also meant we had a common interchange between multiple platforms.

    So you want to chuck all that away to get better symbols for programming? No, I don't think so.
    I must point out that the entire canon of English Literature is written in - surprise - English, and that's definitely ascii text. I don't think it has suffered due to lack of expressive capability.

    What does supriose me, though, is how fundementally weak our editors are. Programs, to me, are a collection of parts - objects, methods, etc, all with internal structure. We seem very poor at further abstracting that - why, oh tell me why, when I write a simple - trivial - bit of Java code, do I need to write funtions for getters and setters all over the place - dammit, just declare them as gettable and settable - or (to keep full source code compatibility) the editor could do it. Simply ,easily, tranparently. And why can't the editor hide everything except what I am concerned with?
    Microsoft does a better job of this in C#, but we could go much, much further. We seem stuck in the third generation language paradigm.

    --
    "Cats like plain crisps"
  6. Re:The thing with ASCII [COBOL 2.0?] by Tablizer · · Score: 5, Interesting

    This proposal isn't about giving programmers more power to code, it's about making it easier for non-english speakers who aren't coders to read the code that their programmers write.

    COBOL was originally designed so that managers and customers could read it. But in practice they rarely did because programming logic is typically too low-level and requires knowing the technical context to understand by a non-programmer and/or non-team member anyhow. Being "English-like" or grammatically proper didn't really help that goal in practice. This is why the idea was abandoned in later languages.

    Perhaps it's comparable to legalese. Making it proper English doesn't necessarily improve readability by non-lawyers. It's still gibberish to most of us without a legal background.

    It's not worth-while to slow down production programmers in a trade for the rare case where non-programmers will want to read code for an actual need (not just curiosity). Thus, it's an uneconomical requirement as long as there is such a trade-off.

  7. Re:The thing with ASCII by Chrisq · · Score: 3, Interesting

    Plus the fact that a spoken language changes - good chance you would not be able to understand English as it was spoken say 500 years ago. They would not only have used different words, also used a different pronunciation.

    That depends on what accents you are used to. Many Northern British and lowland Scotts accents were not changed by the "Great Vowel Shift" nearly as much as Southern English, Received Pronunciation, or General American.

    Being your slave, what should I do but tend
    Upon the hours and times of your desire?

    Will have an immediately obvious meaning when read in a lowland Scots accent

  8. Re:The thing with ASCII by TheRaven64 · · Score: 4, Interesting

    Apple's documentation in HTML form has a few of the standard ASCII characters replaced with other unicode characters. If you copy and paste into a text editor, you get compiler warnings which seem to be saying that they're expecting the character that is there. They also sometimes contain ligatures, which you don't notice unless you look one character at a time. One of the most irritating problems I found was on the Nouveau wiki a load of constants have 0x prefixes where the x is actually a unicode multiplication symbol. Copy them into the code and it looks right, but the compiler rejects it as an invalid constant type.

    --
    I am TheRaven on Soylent News