Slashdot Mirror


Mr. Pike, Tear Down This ASCII Wall!

theodp writes "To move forward with programming languages, argues Poul-Henning Kamp, we need to break free from the tyranny of ASCII. While Kamp admires programming language designers like the Father-of-Go Rob Pike, he simply can't forgive Pike for 'trying to cram an expressive syntax into the straitjacket of the 95 glyphs of ASCII when Unicode has been the new black for most of the past decade.' Kamp adds: 'For some reason computer people are so conservative that we still find it more uncompromisingly important for our source code to be compatible with a Teletype ASR-33 terminal and its 1963-vintage ASCII table than it is for us to be able to express our intentions clearly.' So, should the new Hello World look more like this?"

19 of 728 comments

  1. The thing with ASCII by enec · · Score: 5, Insightful

    The thing with ASCII is that it's easy to write on standard keyboards and does not require a specialized layout. Once someone can cram the necessary Unicode symbols into a keyboard so that I don't have to remember arcane meta-codes or fiddle with pressing five different dead keys to get one symbol, I'm all for it.

    --
    I'm sorry, I only accept criticism in the form of sed expressions.
    1. Re:The thing with ASCII by MichaelSmith · · Score: 5, Informative

      Japanese is typed using a more-or-less standard QWERTY keyboard.

      Tediously.

    2. Re:The thing with ASCII by arth1 · · Score: 5, Insightful

      Once you've had to do an ad-hoc codefix through a serial console or telnet, you appreciate that you can write the code in 7-bit ASCII.

      It's not about being conservative. It's about being compatible. Compatibility is not a bad thing, even if it means you have to run your Unicode text through a filter to embed it, or store it in external files or databases.

      It'd also be hell to do code review on Unicode programs. You can't tell many of the symbols apart. Is that a hyphen or a soft hyphen at the end of that line? Or perhaps a minus? And is that a diameter sign, a zero, or the Danish/Norwegian letter "Ø" over there? Why doesn't that multiplication work? Oh, someone used an asterisk instead of the multiplication sign, which looks the same in this font.

      No, thanks, keep it compatible, and parseable by humans, please.
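      A minimal Python sketch of the review problem described above: it lists every non-ASCII character in a piece of source text with its code point and official Unicode name, so a reviewer can see that a "minus" is really U+2212 rather than the ASCII hyphen. The function name `find_non_ascii` and the sample text are illustrative assumptions, not part of any existing tool, and this is only a rough aid; Unicode Technical Standard #39 describes proper confusable detection.

```python
import unicodedata


def find_non_ascii(source: str):
    """Yield (line, column, code point, Unicode name) for each non-ASCII character."""
    for lineno, line in enumerate(source.splitlines(), start=1):
        for col, ch in enumerate(line, start=1):
            if ord(ch) > 0x7F:  # 7-bit ASCII ends at U+007F
                # unicodedata.name() raises ValueError for unnamed code points
                # unless a default is supplied.
                yield lineno, col, f"U+{ord(ch):04X}", unicodedata.name(ch, "<unnamed>")


# Hypothetical sample: in many fonts each line looks like plain ASCII arithmetic.
sample = "a = b \u2212 c\nd = e \u00d7 f\nx = \u00d8 + \u2300\u00ad\n"

for hit in find_non_ascii(sample):
    print(*hit)
# 1 7 U+2212 MINUS SIGN
# 2 7 U+00D7 MULTIPLICATION SIGN
# 3 5 U+00D8 LATIN CAPITAL LETTER O WITH STROKE
# 3 9 U+2300 DIAMETER SIGN
# 3 10 U+00AD SOFT HYPHEN
```

      Pure-ASCII code produces no output at all, which is exactly the property being defended here.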

    3. Re:The thing with ASCII by Kagetsuki · · Score: 5, Informative

      I'm Japanese, so let me clarify how entering Japanese works here. Japanese is written with two sets of kana (hiragana and katakana; characters that have a sound but no meaning of their own) and kanji (characters with meaning). To enter a word in Japanese, say the word "Me/I", you hit a key to activate your IME [input method editor] (usually the key at the top left of the keyboard), then type "watashi", just like that, and you get it in kana (hiragana). Next, hit the space key, which converts it to kanji. Now hit Enter to finish input, or just start typing your next word. You can also enter multiple words, hit space, and then break up and convert the sentence all at once. It is not difficult, you don't actually need a special keyboard, and I've never heard of anybody who can use a keyboard switching to voice recognition because they found entering words laborious.
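      The two-step flow described above (romaji typed on an ordinary keyboard becomes kana, then space converts the kana reading to kanji) can be sketched in Python. Everything here is a hypothetical toy: both tables hold only a handful of entries, the function names are made up, and the candidate cycling (pressing space again to move to the next candidate) mirrors common IME behavior rather than any particular IME's algorithm.

```python
# Hypothetical, tiny romaji-to-hiragana table; a real IME covers every syllable.
ROMAJI_TO_KANA = {
    "a": "あ", "i": "い", "ka": "か", "mi": "み",
    "wa": "わ", "ta": "た", "shi": "し",
}

# Hypothetical conversion dictionary: kana reading -> kanji candidates,
# most likely first. Real IMEs use large dictionaries plus context.
KANJI_CANDIDATES = {
    "わたし": ["私", "渡し"],
    "かみ": ["紙", "神", "髪"],
}


def to_kana(romaji: str) -> str:
    """Convert typed romaji to hiragana by greedy longest match."""
    longest = max(len(key) for key in ROMAJI_TO_KANA)
    out, i = [], 0
    while i < len(romaji):
        for size in range(longest, 0, -1):
            chunk = romaji[i:i + size]
            if chunk in ROMAJI_TO_KANA:
                out.append(ROMAJI_TO_KANA[chunk])
                i += len(chunk)  # advance by what was actually consumed
                break
        else:
            raise ValueError(f"no kana for {romaji[i:]!r}")
    return "".join(out)


def convert(kana: str, space_presses: int = 1) -> str:
    """Pick a kanji candidate; each extra press of space moves to the next one."""
    candidates = KANJI_CANDIDATES.get(kana, [kana])  # unknown readings stay as kana
    return candidates[(space_presses - 1) % len(candidates)]


print(to_kana("watashi"))           # わたし
print(convert(to_kana("watashi")))  # 私
print(convert(to_kana("kami"), 3))  # 髪
```

      The keyboard only ever sends Latin letters; the large character set lives in the conversion dictionary, which is why no special keyboard is needed.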

    4. Re:The thing with ASCII by Angst+Badger · · Score: 5, Insightful

      Funny you mention it, but the first thing I thought of was Japanese text entry, followed by the autocorrect/text-expansion facility that most word processors have, which is much the same thing applied to western languages. I've also thought it would be good to be able to make use of mathematical symbols for, you know, mathematics. The same could be said of word processor-like formatting for comments. I'm dubious about using it for actual code, but I'm open to having my mind changed about that.

      (Color-as-syntax has already been done in Chuck Moore's latest implementation of Forth. It's not a bad idea, though I suspect it works better with low-level languages like Forth than it would with a higher level language.)

      The second thing I thought of was what I always think when someone starts complaining about what languages should and shouldn't have, which is this: Quit bitching and go implement it, smart boy. Come up with something good, and I'll use it, but I am not about to run out and implement someone else's ideas. I have a day job where I get to do that all fucking day long, and they actually pay me. And contrary to popular belief, ideas are cheap and plentiful, including good ideas. The time, effort, and dedication that it takes to actually implement them are what's in short supply.

      --
      Proud member of the Weirdo-American community.
    5. Re:The thing with ASCII by Z34107 · · Score: 5, Funny

      Typing Japanese is exactly like typing in English - you press the "space" key between words. The IMEs are pretty smart, and usually the first kanji is the one you want. If it's not you might have to press "space" a second or third time, but it's rare to have to dig through a giant list of kanji to get what you want.

      So, you might have to hit the space key more often if you're typing Japanese. Or, you might not - you can space-to-kanji entire sentences at once, whilst the Romance languages are stuck hitting space between every word like schmucks. Except for the Germans. I don't think their language uses spaces.

      The Japanese keyboard layout also produces kana (most of which are romanized with two Latin characters) rather than individual letters. Instead of typing w-a-t-a-s-h-i-space, you type wa-ta-shi-space.

      So, it's really not that bad. What's worse is the irony of seeing an article on Slashdot complain about the persistence of ASCII. I mean, really now, slashdot.jp manages to display non-ASCII characters.

      --
      DATABASE WOW WOW
    6. Re:The thing with ASCII by Jurily · · Score: 5, Insightful

      If he really wants to go into creative writing, we might remind him that the 26 letters of the alphabet were good enough for Shakespeare.

      Exactly. Completely Missing The Point at its best.

      1. The idea behind modern programming is reducing complexity. That can't really be done by using symbols no other programmer has ever seen before.
      2. Most programming fonts go out of their way to make those symbols look distinct. You simply have to know if that's a zero or an upper-case O. Imagine trying to figure out if that character is a Greek upper-case Omega or a "Dentistry symbol light down and horizontal with wave" (taken from TFA).
      3. APL died for a reason.
      4. Author cites C++ operator overloading as a good thing. 'Nuff said.

    7. Re:The thing with ASCII by fyngyrz · · Score: 5, Insightful

      As a martial artist of many decades, I have learned to read Chinese, both traditional characters and the nasty simplified ones. So I'm well aware of the upside: the power, and even beauty, of high-speed recognition from a large symbol set.

      But writing Chinese through a keyboard or a GUI has many cautionary lessons for us here that transfer directly to the idea of a many-symbol programming language. Take Python, for instance. A beautiful language in almost every way; visually well structured, minimalist in its core tools, yet so well thought out that it is almost unlimited in what can be done with it.

      If you were, say, to create a symbol for each Python grammar atom, you'd soon have a symbol set equal to or surpassing that required for college in China... thousands of them. This takes your average Chinese person many years to learn, by the way -- and it's non-technical.

      Now, assuming you've learned these in the first place, and stipulating that somehow, you've made them as beautiful and intuitive as the language itself, how do you select these symbols when programming? Therein lies the rub, and as no one yet has come up with a good answer for Chinese, I suspect the idea desert is just as dry for Python, or any other language one might like to turn into a concise symbolic tool.

      Now, speech has very fast mapping (although you get into context a lot... for instance "ma" can mean quite a few different things) to Chinese symbols, and so one could reasonably assume that it could also have reasonably fast mapping to my hypothetical Python symbols, but speech recognition isn't ready for this yet; and a programmer speaking "Pythonese" into a microphone isn't going to be a very good cube-mate, either.

      In the meantime, I'm quite convinced that ASCII is an excellent character set for programming, and that Unicode belongs inside quotes for use in input and output parsing, no more, no less.

      APL suffered from all of this. You needed a special keyboard, or a GUI or other mechanism, to input the "simple" symbol. You had to learn the symbolic mapping. It really represents a huge extra load in the name of simplification, all of which is completely unnecessary if you simply use ASCII. And frankly... the time it takes me to type sin(x) is going to beat your mapped keyboard input time until you've been doing it for 50 years. In which time I will have leveraged my ASCII toolkit into innumerable languages, and your APL toolkit is still only enabling you to work in APL.

      So like I said... ASCII.

      --
      I've fallen off your lawn, and I can't get up.
  2. Project Gutenberg by symbolset · · Score: 5, Insightful

    Michael decided to use this huge amount of computer time to search the public domain books that were stored in our libraries, and to digitize these books. He also decided to store the electronic texts (eTexts) in the simplest way, using the plain text format called Plain Vanilla ASCII, so they can be read easily by any machine, operating system or software.

    - Marie Lebert

    Since its humble beginnings in 1971 Project Gutenberg has reproduced and distributed thousands of works to millions of people in - ultimately - billions of copies. They support ePub now and simple HTML, as well as robo-read audio files, but the one format that has been stable this whole time has been ASCII. It's also the format that is likely to survive the longest without change. Project Gutenberg texts can now be read on every e-reader, smartphone, tablet and PC.

    If you want to use Rich Text format, or XML, or PostScript or something else then fine - please do. But don't go trying to deprecate ASCII.

    --
    Help stamp out iliturcy.
    1. Re:Project Gutenberg by shutdown+-p+now · · Score: 5, Insightful

      If you want to use Rich Text format, or XML, or PostScript or something else then fine - please do. But don't go trying to deprecate ASCII.

      This is a false dichotomy. Plain text can be non-ASCII, and ASCII doesn't necessarily imply plain text. All the formats you've listed let you add visual or semantic markup to text, whereas ASCII is simply a way to encode individual characters from a specific set. The proposal is not to move to rich text for coding, but to move away from ASCII.

      There are still many reasonable arguments against it, but this isn't one of them.
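      A short Python sketch of the distinction drawn above: whether text is "plain" (free of markup) and whether it fits in ASCII are independent properties. The helper name `fits_in_ascii` and both sample strings are illustrative assumptions; the RTF-style fragment is not a complete RTF document.

```python
def fits_in_ascii(text: str) -> bool:
    """True if every character can be encoded in 7-bit ASCII."""
    try:
        text.encode("ascii")
    except UnicodeEncodeError:
        return False
    return True


plain_unicode = "\u03a9\u2080 = 0"      # plain text, no markup, but not ASCII
rtf_markup = r"{\rtf1 {\b bold} text}"  # rich-text markup, yet pure ASCII

print(fits_in_ascii(plain_unicode))   # False
print(fits_in_ascii(rtf_markup))      # True
print(plain_unicode.encode("utf-8"))  # b'\xce\xa9\xe2\x82\x80 = 0'
```

      UTF-8 stores the same plain text in a few extra bytes; nothing about the encoding adds markup.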

    2. Re:Project Gutenberg by Netbrian · · Score: 5, Informative

      This is untrue.

      First, Simplified and Traditional characters are separate in Unicode.

      Second, Cyrillic characters and Latin characters have always been considered two different scripts, while Chinese logographs are considered to be the same script, used in different contexts.

      See http://unicode.org/notes/tn26/.

      In any event, it would make good sense for programming environments to be able to handle Unicode source.
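      To make the code-point point concrete, here is a Python sketch that prints the code point and Unicode name of a few characters: the traditional and simplified forms of "horse" (馬, 马) have separate code points, and so do Latin "A" and Cyrillic "А", even though the latter two usually look identical. The helper name `describe` is an illustrative assumption.

```python
import unicodedata


def describe(text: str) -> list[str]:
    """Return 'U+XXXX NAME' for each character in text."""
    return [f"U+{ord(ch):04X} {unicodedata.name(ch, '<unnamed>')}" for ch in text]


# Traditional/simplified "horse", then Latin A and Cyrillic A.
for line in describe("\u99ac\u9a6c" "A\u0410"):
    print(line)
# U+99AC CJK UNIFIED IDEOGRAPH-99AC
# U+9A6C CJK UNIFIED IDEOGRAPH-9A6C
# U+0041 LATIN CAPITAL LETTER A
# U+0410 CYRILLIC CAPITAL LETTER A
```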

    3. Re:Project Gutenberg by pz · · Score: 5, Insightful

      When I was a young graduate student building my first experimental setup, a professor who was older and wiser than me suggested that data should be saved in ASCII whenever possible because space was relatively inexpensive and time is always scarce. Although I thought that a bit odd, I did follow his advice.

      The result? I can use almost any editor to read my data files from the very start of my career, closing in on 30 years ago. Just this past week, that was an important factor in salvaging some recently collected data. In contrast, I can't always read the MS Word files -- an example of an extended character set -- from even a few years ago, and I sure as hell can't view them in almost any editor. Sure, with enough time, I could figure out how to read them, but, as the wise professor rightly pointed out, time is scarce.

      Thus, compatibility is important, and the most compatible data and document format is human-readable plain ASCII.

      --

      Put my fist through my alarm clock with its ding-dong death inside my ear. - The Blackjacks.
  3. It ain't broke! by webbiedave · · Score: 5, Insightful

    Let's take our precious time on this planet to fix what's broken, not break what has clearly worked.

  4. Author seems to be high or something by Tridus · · Score: 5, Insightful
    He comes up with a bunch of ideas at the end that are out to lunch. Let's take a look:

    Unicode has the entire gamut of Greek letters, mathematical and technical symbols, brackets, brockets, sprockets, and weird and wonderful glyphs such as "Dentistry symbol light down and horizontal with wave" (0x23c7). Why do we still have to name variables OmegaZero when our computers now know how to render 0x03a9+0x2080 properly?

    Well, let's think. Possibly because nobody knows what 0x03a9+0x2080 does without looking it up, and nobody seeing the character it produces would know how to type said character again without looking it up? I know consulting a wall-sized "how to type X" chart is the first thing I want to do every 3 lines of code.

    While we are at it, have you noticed that screens are getting wider and wider these days, and that today's text processing programs have absolutely no problem with multiple columns, insert displays, and hanging enclosures being placed in that space? But programs are still decisively vertical, to the point of being horizontally challenged. Why can't we pull minor scopes and subroutines out in that right-hand space and thus make them supportive to the understanding of the main body of code?

    If you actually look at word processing programs, the document is also highly vertical. The horizontal stuff is stuff like notes, comments, revisions, and so on. Putting source code comments on the side might be a useful idea, but putting the code over there won't be unless the goal is to make it harder to read. (That said, widescreen monitors suck for programming.)

    And need I remind anybody that you cannot buy a monochrome screen anymore? Syntax-coloring editors are the default. Why not make color part of the syntax? Why not tell the compiler about protected code regions by putting them on a framed light gray background? Or provide hints about likely and unlikely code paths with a green or red background tint?

    So anybody who has some color-blindness (which is not a small number of people) can't understand your program? Or maybe we should make a red + do something different than a blue +? That might work once, but do it six times and it's just a mess. (Now if you want to have the code editor put protected regions on a framed light gray background, sure. But there's nothing wrong with sticking "protected" in front of it to define what it is.) It seems like he's trying to solve a problem that doesn't really exist by doing something that's a whole lot worse.

    --
    -- "So they told me that using the download page to download something was not something they anticipated." - Bill Gates
  5. Would it be less tedious to have 10,000+ keys? by Sycraft-fu · · Score: 5, Insightful

    Because that's what you find in JIS X 0213:2000. Even if you simplify it to just what is needed for basic literacy, you are talking about 2,000 characters. If you have that many characters, your choices are a lot of keys, a lot of modifier keys, or some kind of transliteration, which is what is done now. There is just no way around this. You cannot have a language composed of a ton of glyphs and also have an extremely simple, small entry system.

    You can have a simple system with few characters, like we do now, but you have to enter multiple ones to specify the glyph you want. You could have a direct entry system where one keypress is one glyph, but you'd need a massive amount of keys. You could have a system with a small number of keys and a ton of modifier keys, but then you have to remember what modifier, or modifier combination, gives what. There is no easy, small, direct system, there cannot be.
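    The counting argument above can be made concrete with a back-of-the-envelope calculation. The figures are assumptions chosen for illustration: 26 letter keys for sequence entry, 47 character keys (roughly a US layout) for chord entry, and the figure of about 2,000 characters for basic literacy. Treating romaji as a fixed-length code is a simplification of how transliteration input really works.

```latex
% k keys, sequences of L keystrokes: k^L distinct codes
k^{L} \ge N \;\Rightarrow\; L = \lceil \log_k N \rceil,
\qquad 26^{2} = 676 < 2000 \le 26^{3} = 17\,576 \;\Rightarrow\; L = 3

% k keys plus m independent modifiers held in any combination: k \cdot 2^{m} chords
k \cdot 2^{m} \ge N,
\qquad 47 \cdot 2^{5} = 1504 < 2000 \le 47 \cdot 2^{6} = 3008 \;\Rightarrow\; m = 6
```

    Even the 10,000+ characters mentioned above fit within the 17,576 three-letter codes, so short sequences on a small keyboard scale far better than one key per glyph or a hand full of modifiers.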

    Also, is it any more tedious than any Latin/Germanic language that only uses a small character set? While you may enter more characters than final glyphs, do you enter more characters than you would to express the same idea in French or English?

  6. Re:Learn2code by Noughmad · · Score: 5, Funny

    I don't know about you, but I have a pile-of-shit key on my keyboard, right between the left Ctrl and Alt.

    --
    PlusFive Slashdot reader for Android. Can post comments.
  7. Re:huh by ScrewMaster · · Score: 5, Insightful

    diagrammatic is simply a fucking pain in the ass.

    Amen.

    Every scientist I've ever met that had any experience writing code vastly prefers the C based LabWindows to the diagrammatic LabView

    Well, I'm not a scientist, just a humble software engineer, and back in my contract coding days I was always faced with managers who would try to push me to use LabView. They had the mistaken belief that because it was "visual", a. they could understand it, b. it was simpler, and c. I should charge less if I used it.

    I told them that a. it's still programming, and beyond a certain level of complexity understanding still requires sufficient knowledge and b. refer to a. and c. if they were going to force me to waste time fighting such an environment up 'til the point where I found something critical that it couldn't do (such as run fast enough) and would end up re-coding the right way anyway, they damn well weren't going to pay me less.

    --
    The higher the technology, the sharper that two-edged sword.
  8. Re:The thing with ASCII [COBOL 2.0?] by Tablizer · · Score: 5, Interesting

    This proposal isn't about giving programmers more power to code, it's about making it easier for non-English speakers who aren't coders to read the code that their programmers write.

    COBOL was originally designed so that managers and customers could read it. But in practice they rarely did because programming logic is typically too low-level and requires knowing the technical context to understand by a non-programmer and/or non-team member anyhow. Being "English-like" or grammatically proper didn't really help that goal in practice. This is why the idea was abandoned in later languages.

    Perhaps it's comparable to legalese. Making it proper English doesn't necessarily improve readability by non-lawyers. It's still gibberish to most of us without a legal background.

    It's not worthwhile to slow down production programmers in exchange for the rare case where non-programmers will want to read code for an actual need (not just curiosity). Thus, it's an uneconomical requirement as long as there is such a trade-off.

  9. Re:Yes, Unicode is "the new black" by icebraining · · Score: 5, Insightful

    I'm Portuguese and our language uses accents, but if I ever get a source code file with accents in variable names I'll insult the person. Writing with accents in programming serves absolutely no purpose and only causes problems. It's slower (two key presses instead of one), it's less compatible, it can be troublesome if I need to send the code to someone without accents on their keyboard, etc.

    In fact, not only do I disagree with accents in programming, but I prefer writing all names in English. Where would OSS be if all the GNOME devs had to learn Spanish to contribute to de Icaza's code, or Finnish to contribute to Linux?