Mr. Pike, Tear Down This ASCII Wall!
theodp writes "To move forward with programming languages, argues Poul-Henning Kamp, we need to break free from the tyranny of ASCII. While Kamp admires programming language designers like the Father-of-Go Rob Pike, he simply can't forgive Pike for 'trying to cram an expressive syntax into the straitjacket of the 95 glyphs of ASCII when Unicode has been the new black for most of the past decade.' Kamp adds: 'For some reason computer people are so conservative that we still find it more uncompromisingly important for our source code to be compatible with a Teletype ASR-33 terminal and its 1963-vintage ASCII table than it is for us to be able to express our intentions clearly.' So, should the new Hello World look more like this?"
The thing with ASCII is that it's easy to write on standard keyboards, and does not require a specialized layout. Once someone can cram the necessary unicode symbols into a keyboard so that I don't have to remember arcane meta-codes or fiddle with pressing five different dead keys to get one symbol, I'm all for it.
I'm sorry, I only accept criticism in the form of sed expressions.
Michael decided to use this huge amount of computer time to search the public domain books that were stored in our libraries, and to digitize these books. He also decided to store the electronic texts (eTexts) in the simplest way, using the plain text format called Plain Vanilla ASCII, so they can be read easily by any machine, operating system or software.
- Marie Lebert
Since its humble beginnings in 1971 Project Gutenberg has reproduced and distributed thousands of works to millions of people in - ultimately - billions of copies. They support ePub now and simple HTML, as well as robo-read audio files, but the one format that has been stable this whole time has been ASCII. It's also the format that is likely to survive the longest without change. Project Gutenberg texts can now be read on every e-reader, smartphone, tablet and PC.
If you want to use Rich Text format, or XML, or PostScript or something else then fine - please do. But don't go trying to deprecate ASCII.
Help stamp out iliturcy.
so we should start coding in Chinese?
Seems easier to spell words with a small set of symbols than to learn a new symbol for every item in a huge set of terms.
It's hard to believe that's how Micronians are made. Why don't we see it right now by having you both kiss one another?
I can express my intentions just fine with ASCII. They have cunningly invented a system for that. It's called language and it comes in very handy. The only thing I would consider missing is a pile of shit-character. I could use that one right now.
Let's take our precious time on this planet to fix what's broken, not break what has clearly worked.
Programming languages usually have too much syntax and too much expressiveness, not too little. We don't need them to be even more cryptic and even more laden with hidden pitfalls for someone who is new, or imperfectly vigilant, or just makes a mistake.
If anything, programming needs to be less specific. Tell the system what you're trying to do and let the tools write the code and optimize it for your architecture.
We don't need longer character sets. We don't need more programming languages or more language features. We need more productive tools, software that adapts to multithreaded operation and GPU-like processors, tools that prevent mistakes and security bugs, and ways to express software behavior that are straightforward enough to actually be self-documenting or easily explained fully with short comments.
Focusing on improving programming languages is rearranging the deck chairs.
ASCII art is cool!
Unicode has the entire gamut of Greek letters, mathematical and technical symbols, brackets, brockets, sprockets, and weird and wonderful glyphs such as "Dentistry symbol light down and horizontal with wave" (0x23c7). Why do we still have to name variables OmegaZero when our computers now know how to render 0x03a9+0x2080 properly?
Well, let's think. Possibly because nobody knows what 0x03a9+0x2080 does without looking it up, and nobody seeing the character it produces would know how to type said character again without looking it up? I know consulting a wall-sized "how to type X" chart is the first thing I want to do every 3 lines of code.
While we are at it, have you noticed that screens are getting wider and wider these days, and that today's text processing programs have absolutely no problem with multiple columns, insert displays, and hanging enclosures being placed in that space? But programs are still decisively vertical, to the point of being horizontally challenged. Why can't we pull minor scopes and subroutines out in that right-hand space and thus make them supportive to the understanding of the main body of code?
If you actually look at word processing programs, the document is also highly vertical. The horizontal stuff is stuff like notes, comments, revisions, and so on. Putting source code comments on the side might be a useful idea, but putting the code over there won't be unless the goal is to make it harder to read. (That said, widescreen monitors suck for programming.)
And need I remind anybody that you cannot buy a monochrome screen anymore? Syntax-coloring editors are the default. Why not make color part of the syntax? Why not tell the compiler about protected code regions by putting them on a framed light gray background? Or provide hints about likely and unlikely code paths with a green or red background tint?
So anybody who has some color-blindness (which is not a small number) can't understand your program? Or maybe we should make a red + do something different then a blue +? That's great once you do it six times, then it's just a mess. (Now if you want to have the code editor put protected regions on a framed light gray background, sure. But there's nothing wrong with sticking "protected" in front of it to define what it is.) It seems like he's trying to solve a problem that doesn't really exist by doing something that's a whole lot worse.
-- "So they told me that using the download page to download something was not something they anticipated." - Bill Gates
Because that's what you find in JIS X 0213:2000. Even if you simplify it to just what is needed for basic literacy, you are talking 2000 characters. If you have that many characters your choices are either a lot of keys, a lot of modifier keys, or some kind of transliteration which is what it done now. There is just no way around this. You cannot have a language that is composed of a ton of glyphs but yet also have some extremely simple, small, entry system.
You can have a simple system with few characters, like we do now, but you have to enter multiple ones to specify the glyph you want. You could have a direct entry system where one keypress is one glyph, but you'd need a massive amount of keys. You could have a system with a small number of keys and a ton of modifier keys, but then you have to remember what modifier, or modifier combination, gives what. There is no easy, small, direct system, there cannot be.
Also, is it any more tedious than any Latin/Germanic language that only uses a small character set? While you may enter more characters than final glyphs, do you enter more characters than you would to express the same idea in French or English?
Unicode has been around for, what, over 15 years now? It's part of countless specifications from W3C and ISO. All modern OSes and DEs (Windows, OS X, KDE, Gnome) use one or another encoding of Unicode as the default representation for strings. No, it's not going away anytime soon.
And yet major vendors like Microsoft still get Unicode wrong. A couple of examples:
the point has been entirely missed, and blame placed on ASCII [correlation is not causation]. when you look at the early languages - FORTH, LISP, APL, and later even Awk and Perl, you have to remember that these languages were living in an era of vastly less memory. FORTH interpreters fit into 1k with room to spare for goodness sake! these languages tried desperately to save as much space and resources as possible, at the expense of readability.
it's therefore easy to place blame onto ASCII itself.
then you have compiled languages like c, c++, and interpreted ones like Python. these languages happily support unicode - but you look at free software applications written in those languages and they're still by and large kept to under 80 chars in length per line - why is that? it's because the simplest tools are not those moronic IDEs; the simplest programming tools for editing are straightfoward ASCII text editors: vi and (god help us) emacs. so by declaring that "Thou Shalt Use A Unicode Editor For This Language" you've just shot the chances of success of any such language stone dead: no self-respecting systems programmer is going to touch it.
not only that, but you also have the issue of international communication and collaboration. if the editor allows Kanji, Cyrillic, Chinese and Greek, contributors are quite likely to type comments in Kanji, Cyrillic, Chinese and Greek. the end-result is that every single damn programmer who wants to contribute must not only install Kanji, Cyrillic, Chinese and Greek unicode fonts, but also they must be able to read and understand Kanji, Cyrillic, Chinese and Greek. again: you've just destroyed the possibility of collaboration by terminating communication and understanding.
then, also, you have the issue of revision control, diffs and patches. by moving to unicode, git svn bazaar mercury and cvs all have to be updated to understand how to treat unicode files - which they can't (they'll treat it as binary) - in order to identify lines that are added or removed, rather than store the entire file on each revision. bear in mind that you've just doubled (or quadrupled, for UCS-4) the amount of space required to store the revisions in the revision control systems' back-end database, and bear in mind that git repositories such as linux2.6 are 650mb if you're lucky (and webkit 1gb) you have enough of a problem with space for big repositories as it is!
but before that, you have to update the unix diff command and the unix patch command to do likewise. then, you also have to update git-format-patch and the git-am commands to be able to create and mail patches in unicode format (not straight SMTP ASCII). then you also have to stop using standard xterm and standard console for development, and move to a Unicode-capable terminal, but you also have to update the unix commands "more" and "less" to be able to display unicode diffs.
there are good reasons why ASCII - the lowest common denominator - is used in programming languages: the development tools revolve around ASCII, the editors revolve around ASCII, the internationally-recognised language of choice (english) fits into ASCII. and, as said right at the beginning, the only reason why stupid obtuse symbols instead of straightforward words were picked was to cram as much into as little memory as possible. well, to some extent, as you can see with the development tools nightmare described above, it's still necessary to save space, making UNICODE a pretty stupid choice.
lastly it's worth mentioning python's easy readability and its bang-per-buck ratio. by designing the language properly, you can still get vast amounts of work done in a very compact space. unlike, for example java, which doesn't even have multiple inheritance for god's sake, and the usual development paradigm is through an IDE not a text editor. more space is wasted through fundamental limitations in the language and the "de-facto" GUI development environment than through any "blame" attached to ASCII.
I'm Portuguese and our language uses accents, but if I ever get a source code file with accents in variable names I'll insult the person. Writing with accents in programming serves absolutely no purpose and it only causes problems. It's slower (two key presses instead of one), it's less compatible, it can be troublesome if I need to send the code to someone without accents in the keyboard, etc.
In fact, not only I disagree with accents in programming, but I prefer writing all the names in English. Where would OSS be if all the Gnome devs had to learn Spanish to contribute to De Icaza's code, or Finnish to contribute to Linux?
Dilbert RSS feed
Visual programming isn't big for the same reason people talk and not use drawings to communicate in day to day life. A decent well explained and understood language is faster, universal and more convenient. Drawings are used in situations where you can't communicate true a spoken or written language. As a replacement tool. It's very basic since with a spoken or written language you can uniformly have so much more precise interpretation of your intentions. Same goes for visual programming at this moment in time. I won't say there isn't a future for it, but as a replacement tool for the tried and tested programming environments it has a long way to go. Come up with a visual programming system for writing actually sophisticated code and you might have yourself a winner. Only party that comes in mind is Labview from NI.
I'm truly saddened to see so many people took this article summary so literally. If you read TFA, it's actually a very bright, intelligent, humorous example of programming insight. I found it a very delightful read and I wholeheartedly felt that the article presented its thoughts lightheartedly and without expectation of seriousness. To hear all the commenters here, it's as if the article ran puppies over with a steamroller.
Please guys - I'm all for silly commentary. But read the article if you're going to pretend to write something clever. It's thoroughly tongue-in-cheek.