Mr. Pike, Tear Down This ASCII Wall!
theodp writes "To move forward with programming languages, argues Poul-Henning Kamp, we need to break free from the tyranny of ASCII. While Kamp admires programming language designers like the Father-of-Go Rob Pike, he simply can't forgive Pike for 'trying to cram an expressive syntax into the straitjacket of the 95 glyphs of ASCII when Unicode has been the new black for most of the past decade.' Kamp adds: 'For some reason computer people are so conservative that we still find it more uncompromisingly important for our source code to be compatible with a Teletype ASR-33 terminal and its 1963-vintage ASCII table than it is for us to be able to express our intentions clearly.' So, should the new Hello World look more like this?"
The thing with ASCII is that it's easy to write on standard keyboards, and does not require a specialized layout. Once someone can cram the necessary unicode symbols into a keyboard so that I don't have to remember arcane meta-codes or fiddle with pressing five different dead keys to get one symbol, I'm all for it.
I'm sorry, I only accept criticism in the form of sed expressions.
"Syntactic sugar causes cancer of the semicolon" - Alan Perlis.
Michael decided to use this huge amount of computer time to search the public domain books that were stored in our libraries, and to digitize these books. He also decided to store the electronic texts (eTexts) in the simplest way, using the plain text format called Plain Vanilla ASCII, so they can be read easily by any machine, operating system or software.
- Marie Lebert
Since its humble beginnings in 1971 Project Gutenberg has reproduced and distributed thousands of works to millions of people in - ultimately - billions of copies. They support ePub now and simple HTML, as well as robo-read audio files, but the one format that has been stable this whole time has been ASCII. It's also the format that is likely to survive the longest without change. Project Gutenberg texts can now be read on every e-reader, smartphone, tablet and PC.
If you want to use Rich Text format, or XML, or PostScript or something else then fine - please do. But don't go trying to deprecate ASCII.
Help stamp out iliturcy.
so we should start coding in Chinese?
Seems easier to spell words with a small set of symbols than to learn a new symbol for every item in a huge set of terms.
It's hard to believe that's how Micronians are made. Why don't we see it right now by having you both kiss one another?
I can express my intentions just fine with ASCII. They have cunningly invented a system for that. It's called language and it comes in very handy. The only thing I would consider missing is a pile of shit-character. I could use that one right now.
Yes, it's the next fad that just _everyone_ has to wear this season. Within 5 years, it will be something else, and given the ability of major vendors like Microsoft to get Unicode _wrong_, it's not stable for mission critical applications. If you want your code to remain parseable and cross-platform compatible and stable in both large and small tools, write it in flat, 7-bit ASCII. You also get a significant performance benefit from avoiding the testing and decoding and localization and most especially the _testing_ costs for multiple regions.
Look up "microsoft unicode error" on Google for hundreds if not thousands of examples. ASCII for code is like flat text for email. It assures that you're not simply publishing coding spam, and actually wrote what you meant.
Everyone who tried to do something useful in APL, put up your hand.
It was called APL. It never really caught on all that well.
The World Wide Web is dying. Soon, we shall have only the Internet.
...the character set isn't the problem.
And I say this as an old APL coder.
(There aren't many new APL coders.)
-=Maggie Leber=-
And more than 10 years ago, in Bjarne Stroustrup's "Generalizing Overloading for C++2000". PDF can be downloaded here:
www2.research.att.com/~bs/whitespace98.pdf
Pages 4-5 deal with this.
It was also a joke paper. Like I hope this article is.
So, what are his ideas?
How silly of us to be compiling to binary all this time!
We've been relegating ourselves to only two different options for decades!
I reckon that a memory cell and single bit of a processor opcode should have --at least-- 7000 different possibilities. Think of everything a computer could accomplish *then*!
Seriously, someone tell this guy you're allowed to use more than one character to represent a concept or action, and that these groups of characters represent things rather well.
Let's take our precious time on this planet to fix what's broken, not break what has clearly worked.
On a serious note, this article reminded me of this project I saw the other day: http://github.com/ehamberg/vim-cute-python. It makes vim show various Unicode characters for Python keywords, such as "alpha" and "not".
Kinda neat :)
but fuck no.
I eagerly await comments saying how anglo-centric, racist, bigoted, culturally-imperialist the insistence of using ASCII is.
The nuanced indignation is salve for my frantic masturbation.
(If my post is the only one that mentions this, all the better)
The Chinese have trouble learning their own language because it has so many signs; that makes it unnecessarily complex.
26 letters let you write anything. You don't need more letters, really. Ask any novelist.
Also, programming languages are international, and not all keyboards have all keys. Even keys like { or } are not on all keyboards, so trying to use funny characters like ñ would make programming really hard for some people.
All in all, this is not a very smart idea, imho.
-Woof woof woof!
Yes, I want all the keys on my expensive LCD screen keyboard to look like it came straight from Fisher-Price just so I can do some programming.
Now, where's the :rolleyes: ascii character on this traditional keyboard...
This is lame. If you can't program using just the keyboard in front of you, GTFO
Programming languages usually have too much syntax and too much expressiveness, not too little. We don't need them to be even more cryptic and even more laden with hidden pitfalls for someone who is new, or imperfectly vigilant, or just makes a mistake.
If anything, programming needs to be less specific. Tell the system what you're trying to do and let the tools write the code and optimize it for your architecture.
We don't need longer character sets. We don't need more programming languages or more language features. We need more productive tools, software that adapts to multithreaded operation and GPU-like processors, tools that prevent mistakes and security bugs, and ways to express software behavior that are straightforward enough to actually be self-documenting or easily explained fully with short comments.
Focusing on improving programming languages is rearranging the deck chairs.
We like economy and precision in programming languages. You may have many complaints about English, but it's a pretty damn good common language due to its slutty tendency - it soaks in whatever is useful from other languages.
In general, I don't want poetry in coding. I definitely don't want Egyptian glyphs or Chinese ideograms.
Fuck systemd. Fuck Redhat. Fuck Soylent, too. Wait, scratch the last one.
I like how he mentions perl, but completely neglects to mention Perl6.
One of the most derided or most lauded features (depending on your POV) in perl6 is the copious use of additional syntax operators in the interests of further Huffman coding. There are certain operators (for example, the "hyper" operators) that are defined in terms of unicode symbols ("»") and use ASCII digraphs as an alternate form (">>").
So, it's there now in a mostly stable form... you can program in unicode-laced form all you like at this point.
Hire a Linux system administrator, systems engineer,
Because I don't want to have to own a 2000 key keyboard, or alternatively learn a shitload of special key combos to produce all sorts of symbols. The usefulness of ASCII, and just of the English/Germanic/Latin character set and Arabic numerals in general is that it is fairly small. You don't need many individual glyphs to represent what you are talking about. A normal 101 key keyboard is enough to type it out and have enough extra keys for controls that we need.
To see the real absurdity of it, apply the same logic to the numerals of the character set. Let's stop using Arabic numerals, let's use something more. Let's have special symbols to denote commonly used values (like 20, 25, 100, 1000). Let's have different number sets for different bases so that a 3 can be told what base it's in just by the way it looks! ...
Or maybe not. Maybe we should stick with the Arabic numerals. There's a reason they are so widely used: The Indians/Arabs got it right. It is simple, direct, and we can represent any number we need easily. Combining them with simple character indicators like H to indicate hex works just fine for base as well.
You might notice that even languages that don't use the English/ASCII character set tend to use keyboards that use it. Japanese and Chinese enter transliterated expressions that the computer then interprets as glyphs. Doesn't have to be that way, they could use different keyboards, some of them rather large depending on the character set being used, but they don't. It is easy and convenient to just use the smaller, widely used, character set.
Now none of this means that you can't use Unicode in code, that strings can't be stored using it, that programs can't display it. Indeed most programs these days can handle it, just fine. However to start coding in it? To try and design languages to interpret it? To make things more complex for their own sake? Why?
I am just trying to figure out what he thinks would be gained here. Also remembering that the programming languages, the compilers, would need to be changed at the low level. Compilers do not take ambiguity, if a command is going to change from a string of ASCII characters to a single unicode one, that has to be changed in the compiler, made clear in the language specs and so on.
Let's go back to APL!
ASCII art is cool!
I don't get it.
When coding, I already am annoyed by the placement of < and > on my keyboard, on a key that I don't reach easily (good thing i don't do html, hehe). Using lots of symbols that require me to do a two-key combination slows me down.
Now I'm supposed to use Unicode? Is that guy insane?
How am I supposed to type out unicode expressions on my keyboard, without typing in the whole 4 digit number?
And if I want to address a unicode-named variable, but I forgot the magical number to make it appear.. then what? Copy paste?
Must be a joke then, right.
Sun's Fortress language allowed you to use real, LaTeX-formatted math as source code. They reasoned, correctly I think, that for the mathematically literate, this would make the programs far clearer. Google for Fortress Programming Language Tutorial.
While I agree that compatibility with ASR-33 should be tossed to the side, replacing ASCII alone isn't going to solve this problem. The article argues that language developers have had to squeeze reliable syntax out of a small character set, but this is a result of the problem, not the cause of it. Extensibility is the key. Where we are trapped is in syntax definition. As mentioned with C/C++ being unable to define custom operators. If a problem need be solved here (which, IMHO this isn't really a problem), then its solution is making every keyword and type user extensible. However, doing this sort of thing can and would have major repercussions across the business world. When types and basic math become a matter of contention things can get ugly really quick. We'd spend the first 5 years hoping that the market would pan out an interface library of common custom types.
But I digress. Sane localization of syntax via unicode isn't too horrible an idea. 1 to 1 translation of words should be fairly easy to implement without loss of meaning when re-localized. However, development is about logic, not necessarily math. While mathematics does define a whole slew of operators we don't have the option of typing in 1 character, typing/reading their names works well for logic.
Full Disclosure: While I develop in many languages my day to day development is done in Visual Studio, and I'm therefore one of those bastards that's at least a bit spoiled by Intellisense.
Fortress allows you to code in UTF-8. However it has a multi-char ASCII equivalent for every Unicode mathematical symbol that you can use, so there is a bijective map between the Unicode and ASCII versions of the source, and you can view/edit in either. That is the only acceptable way to advocate using Unicode anywhere in programming source other than string constants. Programming languages that use ASCII have done well over those that don't, for the same reason that Unicode has done well over binary formats.
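To make the "bijective map" idea concrete, here is a minimal Python sketch of a two-way table between ASCII digraphs and single Unicode symbols. The four operator spellings are made up for illustration and are not Fortress's actual table; the naive string replacement would also need a real tokenizer to cope with overlapping spellings or string literals.

```python
# Hypothetical operator table: illustrative spellings, NOT Fortress's real ones.
ASCII_TO_UNICODE = {
    "<=": "\u2264",  # less-than or equal to
    ">=": "\u2265",  # greater-than or equal to
    "!=": "\u2260",  # not equal to
    "->": "\u2192",  # rightwards arrow
}
UNICODE_TO_ASCII = {uni: asc for asc, uni in ASCII_TO_UNICODE.items()}

# The map is only invertible if no two ASCII spellings share a symbol.
assert len(UNICODE_TO_ASCII) == len(ASCII_TO_UNICODE)


def to_unicode(source: str) -> str:
    """Rewrite every ASCII digraph in the table as its Unicode symbol."""
    for asc, uni in ASCII_TO_UNICODE.items():
        source = source.replace(asc, uni)
    return source


def to_ascii(source: str) -> str:
    """Rewrite every Unicode symbol in the table as its ASCII digraph."""
    for uni, asc in UNICODE_TO_ASCII.items():
        source = source.replace(uni, asc)
    return source


ascii_src = "if a <= b and b != c: f = x -> y"
unicode_src = to_unicode(ascii_src)
print(unicode_src)
print(to_ascii(unicode_src) == ascii_src)
```

With a table like this an editor could show either form while the file on disk stays in whichever one the team prefers.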
Sure, strings and other items that can be seen on the screen would benefit from an expanded character set, but otherwise, why bother?
The only advantage I can think of is so that variable names, function names, and other user-defined non-display values can be in languages other than English or other Latin-letter languages. However, as English is currently the lingua franca of the technology world, encouraging fragmentation in this area is not a good idea.
Besides, nothing stops you from writing your code in Chinese or whatever other Unicode character set you want and using a preprocessor to convert it into ASCII before it hits the compiler or interpreter. The only "gotcha" is that there isn't a standardized way of doing the conversion, which can make it hard to link to binary libraries unless you use the same pre-processor.
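A rough Python sketch of such a preprocessor, assuming a made-up `_uXXXX` escape scheme (exactly the kind of non-standard convention the "gotcha" is about). It blindly rewrites every non-ASCII character, so string literals and comments get mangled too; a real tool would have to tokenize first.

```python
import re

# Matches any single character outside the 7-bit ASCII range.
NON_ASCII = re.compile(r"[^\x00-\x7f]")


def to_ascii_source(source: str) -> str:
    """Replace each non-ASCII character with a hypothetical _uXXXX escape."""
    return NON_ASCII.sub(lambda m: "_u%04X" % ord(m.group()), source)


# 'int áé = 1;' written with escapes so the sample itself stays ASCII.
line = "int \u00e1\u00e9 = 1;"
converted = to_ascii_source(line)
print(converted)
```

Two people using different escape schemes would end up with different symbol names for the same source, which is where the linking problem comes from.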
Knowledge is how to play a game, intelligence is how to win, wisdom is knowing what game to play.
Sorry, meant to say "for the same reason XML [not Unicode] has done well over binary formats".
Bullshit from the article:
OmegaZero is at least something everybody will recognize. And why would you name a variable like that anyway? It's programming, not math, use descriptive names.
Because we're not using the same IDE?
... WHAT? If you don't express your intentions clearly in a program it won't work!
vim does Unicode just fine. And from the Wikipedia entry on the author (http://en.wikipedia.org/wiki/Poul-Henning_Kamp):
Irony? Why does this guy come off as an idiot who got annoyed by VB in this article when he clearly should know better?
you're just trying to scare us ... right? ... right?
I really can't think of a lot of coding that would usefully be done with a more 'expressive' character set. The output of the code often has to be expressive but that isn't the same.
The most popular programming languages are Java, C, and C++ (http://langpop.com/). They aren't popular because they are easy to use. They are used because they are effective. The innovative languages are well down the list.
You can read many reasons why the more innovative languages are better; in theory. C is either the most popular or second most popular language. There's a reason for that. Theory be damned.
Perl 6 has guillemets in its standard syntax (equivalent to "<<" and ">>"). These are non-ASCII symbols. It will also be possible to declare new operators using whatever character you want (e.g. a snowman operator, see: http://perl6advent.wordpress.com/2009/12/17/day-17-making-snowmen/).
Perl 5.8 and above have native Unicode string and I/O support, per the first chapter of the most current rev of the Perl Cookbook, and you can use utf8 as well to write your scripts in Unicode.
We should all use trinary systems instead of binary!
From Idiocracy: Keyboard for hospital admissions
Wired: Wingdings of Disease
all you really need is two keys, 1 and 0
Unicode has the entire gamut of Greek letters, mathematical and technical symbols, brackets, brockets, sprockets, and weird and wonderful glyphs such as "Dentistry symbol light down and horizontal with wave" (0x23c7). Why do we still have to name variables OmegaZero when our computers now know how to render 0x03a9+0x2080 properly?
Well, let's think. Possibly because nobody knows what 0x03a9+0x2080 does without looking it up, and nobody seeing the character it produces would know how to type said character again without looking it up? I know consulting a wall-sized "how to type X" chart is the first thing I want to do every 3 lines of code.
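For anyone who does want to look it up, a few lines of Python with the standard `unicodedata` module will name the two code points from the quote. This is only a lookup sketch; it says nothing about whether a given language accepts the pair as an identifier.

```python
import unicodedata

# The two code points quoted from the article.
omega_zero = chr(0x03A9) + chr(0x2080)

# Collect (code point, official Unicode name) for each character.
names = [(f"U+{ord(ch):04X}", unicodedata.name(ch)) for ch in omega_zero]

for code_point, name in names:
    print(code_point, name)
```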
While we are at it, have you noticed that screens are getting wider and wider these days, and that today's text processing programs have absolutely no problem with multiple columns, insert displays, and hanging enclosures being placed in that space? But programs are still decisively vertical, to the point of being horizontally challenged. Why can't we pull minor scopes and subroutines out in that right-hand space and thus make them supportive to the understanding of the main body of code?
If you actually look at word processing programs, the document is also highly vertical. The horizontal stuff is stuff like notes, comments, revisions, and so on. Putting source code comments on the side might be a useful idea, but putting the code over there won't be unless the goal is to make it harder to read. (That said, widescreen monitors suck for programming.)
And need I remind anybody that you cannot buy a monochrome screen anymore? Syntax-coloring editors are the default. Why not make color part of the syntax? Why not tell the compiler about protected code regions by putting them on a framed light gray background? Or provide hints about likely and unlikely code paths with a green or red background tint?
So anybody who has some color-blindness (which is not a small number) can't understand your program? Or maybe we should make a red + do something different than a blue +? That's great once you do it six times, then it's just a mess. (Now if you want to have the code editor put protected regions on a framed light gray background, sure. But there's nothing wrong with sticking "protected" in front of it to define what it is.) It seems like he's trying to solve a problem that doesn't really exist by doing something that's a whole lot worse.
-- "So they told me that using the download page to download something was not something they anticipated." - Bill Gates
I'm also wondering why we insist that:
The truth is that we programmers prefer to be able to type things quickly without having to memorize character codes for a variety of Unicode characters; we want to be able to type simple variable and function names using a standard set of glyphs and not have to worry about remembering which variation of a Chinese pictograph was used.
If it comes down to it, we could all just use Ook and not worry about language barriers (or getting much of anything done for that matter).
"Ubuntu" - an African word meaning "Slackware is too hard for me."
Now, what kind of marvelous and innovative language will the author of the article propose?
Though this is just a programmer's dream, I always wished that we had a character solely for the purpose of escaping other characters. This would have a few benefits:
1. You won't need to escape this escape-character.
2. Makes it easier for different languages to escape things the same way. I won't need to worry about this string that gets escaped in SQL, ASP then JavaScript.
3. Having a new escaping character shouldn't impact the old code. It just gives the user another option.
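A small Python sketch of the idea, assuming the reserved character is ASCII ESC (0x1B) and, crucially, that it never occurs in the payload. That assumption is what makes point 1 hold; the function names here are invented for illustration.

```python
# Assumption: ESC is reserved and never appears in the text being escaped.
ESC = "\x1b"


def escape(text: str, specials: str) -> str:
    """Prefix every character listed in `specials` with the reserved ESC."""
    if ESC in text:
        raise ValueError("payload must not contain the reserved escape character")
    return "".join(ESC + ch if ch in specials else ch for ch in text)


def unescape(text: str) -> str:
    """Drop each ESC and keep the character that follows it literally."""
    out = []
    chars = iter(text)
    for ch in chars:
        if ch == ESC:
            ch = next(chars, "")  # the escaped character, taken as-is
        out.append(ch)
    return "".join(out)


sample = "O'Reilly \\ \"quoted\""
escaped = escape(sample, "'\"")
print(repr(escaped))
print(unescape(escaped) == sample)
```

Note that the backslash in the sample passes through untouched, since it is no longer doing double duty as the escape character.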
Microsoft Visual C++ and C# allow Unicode identifiers; that is, variable and function names. Visual C++ allows this:
int meow()
{
    int áéíóú = 1;
    return áéíóú;
}
"Screw Sun, cross-platform will never work. Let's move on and steal the Java language." - Visual J++ Product Manager
Because that's what you find in JIS X 0213:2000. Even if you simplify it to just what is needed for basic literacy, you are talking 2000 characters. If you have that many characters your choices are either a lot of keys, a lot of modifier keys, or some kind of transliteration which is what is done now. There is just no way around this. You cannot have a language that is composed of a ton of glyphs but yet also have some extremely simple, small, entry system.
You can have a simple system with few characters, like we do now, but you have to enter multiple ones to specify the glyph you want. You could have a direct entry system where one keypress is one glyph, but you'd need a massive amount of keys. You could have a system with a small number of keys and a ton of modifier keys, but then you have to remember what modifier, or modifier combination, gives what. There is no easy, small, direct system, there cannot be.
Also, is it any more tedious than any Latin/Germanic language that only uses a small character set? While you may enter more characters than final glyphs, do you enter more characters than you would to express the same idea in French or English?
Unicode/UTF(8) compatibility as a base feature of the language - very good, I fight constantly with languages and code conversion because some dipshit didn't realize some people want to use multibyte strings. What's worse is people like Microsoft who assume they can just add crap to files to specify they contain multibyte strings (like their "BOM" for UTF8 - add that and you'll never read the file properly again in anything but Visual Studio).
Unicode/UTF(8) compatibility within the language (function names, variable names) - questionable, but it would be nice sometimes. Some languages already do this (I think I've seen it in ruby even?). You would make your code unreadable to someone who didn't read your language but sometimes that could be a good thing, and hey worst case scenario run the code through a translator.
Unicode/UTF(8) is required to enter the language - NO. WHY WOULD YOU DO THIS?
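For reference on the BOM complaint above, this Python sketch shows what the UTF-8 byte order mark looks like on disk and what happens when a reader does or does not expect it. It uses only the standard `codecs` names and the `utf-8-sig` codec, and makes no claim about what any particular editor writes.

```python
import codecs

text = "hello"

# 'utf-8-sig' prepends the three-byte UTF-8 BOM when encoding.
with_bom = text.encode("utf-8-sig")
print(with_bom)

# A BOM-aware decoder strips it again.
clean = with_bom.decode("utf-8-sig")

# A plain UTF-8 decoder keeps it as a leading U+FEFF character.
leaky = with_bom.decode("utf-8")

print(clean == text)
print(repr(leaky[0]))
```

Tools that read the file as plain UTF-8 see that stray U+FEFF in front of the first token, which is usually where the trouble starts.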
You cannot attach a royalty payment to ASCII so clearly, in this enlightened age when even implementing public APIs risks copyright litigation, we need to move away from this dangerously socialist encoding. We need a new encoding so that the relevant "owners" of the "intellectual property" embodied in the computer language can bill appropriately for your level of use of their "artistic endeavours". Each ideogram should contain encoded information on the rights owner so that corporate publisher birth-rights can be honoured in perpetuity on a per-instance basis. Of course, such encoding should be preserved through compilation to enable the collection of royalty payments for each end-use of the system also. After all, it's only fair.
Patent litigation: A doctrine of Mutually Assured Destruction... in which everyone seems willing to push the button
About 15 years ago I worked in a Japanese office where the database had its own scripting language. The company that created the database had translated all the keywords into Japanese and made it so that it would display correctly, so IF --> , etc. Further, you could flip back and forth between English and Japanese versions easily and not have problems with the compiler. But not one of the Japanese programmers used the Japanese version. They thought it was just weird, and they'd already learned how to use IF in English anyway. I suspect using non-ASCII symbols is a solution without a problem.
In the words of Ozzy ... "This looser should just fuck off and stop this insane shit man.
What an idiot this kemp guy. He'd be better off and a lot happier as a gay transvestite.
Come on, What a pervert the guy.
So 'e wants the Bank of Britan to scrap the old and reliable PDPs. Fat chance pervert
boy Poul-Henning Kemp. Fuck off man! Why not get a jpb giving male-massuage at
the local parlor man. Oh, excuse me... Kenp thingy got not balls. There's the story."
Thompson invented UTF-8 and Pike and others implemented UTF-8 in Plan 9. I think your Unicode fetish owes a debt to Pike and his colleagues!
“Common sense is not so common.” — Voltaire
now we need to discuss if this is a good thing or a bad thing. I'm going to cast my vote for Bad Thing.
“Common sense is not so common.” — Voltaire
The only place i really want unicode is directly in the strings i type. When i type a wstring in C/C++ it'd be nice to type or copy-paste the Kanji directly in between my quotation marks rather than the unicode codes. The actual language keywords and standard library function can be in any language as long as my keyboard can type them.
I'll start coding with Unicode when Americans can spell COLOUR correctly.
Well maybe it did for a couple decades after WW II, but ASCII brought it right back again.
Rule, Britannia! ASCII rules the waves...
the point has been entirely missed, and blame placed on ASCII [correlation is not causation]. when you look at the early languages - FORTH, LISP, APL, and later even Awk and Perl, you have to remember that these languages were living in an era of vastly less memory. FORTH interpreters fit into 1k with room to spare for goodness sake! these languages tried desperately to save as much space and resources as possible, at the expense of readability.
it's therefore easy to place blame onto ASCII itself.
then you have compiled languages like c, c++, and interpreted ones like Python. these languages happily support unicode - but you look at free software applications written in those languages and they're still by and large kept to under 80 chars in length per line - why is that? it's because the simplest tools are not those moronic IDEs; the simplest programming tools for editing are straightforward ASCII text editors: vi and (god help us) emacs. so by declaring that "Thou Shalt Use A Unicode Editor For This Language" you've just shot the chances of success of any such language stone dead: no self-respecting systems programmer is going to touch it.
not only that, but you also have the issue of international communication and collaboration. if the editor allows Kanji, Cyrillic, Chinese and Greek, contributors are quite likely to type comments in Kanji, Cyrillic, Chinese and Greek. the end-result is that every single damn programmer who wants to contribute must not only install Kanji, Cyrillic, Chinese and Greek unicode fonts, but also they must be able to read and understand Kanji, Cyrillic, Chinese and Greek. again: you've just destroyed the possibility of collaboration by terminating communication and understanding.
then, also, you have the issue of revision control, diffs and patches. by moving to unicode, git, svn, bazaar, mercurial and cvs all have to be updated to understand how to treat unicode files - which they can't (they'll treat it as binary) - in order to identify lines that are added or removed, rather than store the entire file on each revision. bear in mind that you've just doubled (or quadrupled, for UCS-4) the amount of space required to store the revisions in the revision control systems' back-end database, and bear in mind that git repositories such as linux2.6 are 650mb if you're lucky (and webkit 1gb) you have enough of a problem with space for big repositories as it is!
but before that, you have to update the unix diff command and the unix patch command to do likewise. then, you also have to update git-format-patch and the git-am commands to be able to create and mail patches in unicode format (not straight SMTP ASCII). then you also have to stop using standard xterm and standard console for development, and move to a Unicode-capable terminal, but you also have to update the unix commands "more" and "less" to be able to display unicode diffs.
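As a rough check on the size figures, here is a Python sketch that encodes one ASCII-only source line several ways and runs a line-based diff over text containing Greek letters. It uses Python's own `difflib`, not the unix diff or any revision control system, so it only illustrates the encodings themselves; how a given tool behaves is a separate question.

```python
import difflib

ascii_line = "x = x + 1\n"

# Byte counts for the same ASCII-only line under different encodings.
sizes = {
    enc: len(ascii_line.encode(enc))
    for enc in ("ascii", "utf-8", "utf-16-le", "utf-32-le")
}
print(sizes)

# A line-based diff over text that contains non-ASCII characters.
old = ["total = a + b\n", "print(total)\n"]
new = ["total = \u03b1 + \u03b2\n", "print(total)\n"]
diff = list(difflib.unified_diff(old, new, "old", "new"))
print("".join(diff), end="")
```

The doubling and quadrupling show up for the fixed-width UTF-16 and UTF-32 encodings; for ASCII-only text the UTF-8 bytes are identical to the ASCII bytes.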
there are good reasons why ASCII - the lowest common denominator - is used in programming languages: the development tools revolve around ASCII, the editors revolve around ASCII, the internationally-recognised language of choice (english) fits into ASCII. and, as said right at the beginning, the only reason why stupid obtuse symbols instead of straightforward words were picked was to cram as much into as little memory as possible. well, to some extent, as you can see with the development tools nightmare described above, it's still necessary to save space, making UNICODE a pretty stupid choice.
lastly it's worth mentioning python's easy readability and its bang-per-buck ratio. by designing the language properly, you can still get vast amounts of work done in a very compact space. unlike, for example java, which doesn't even have multiple inheritance for god's sake, and the usual development paradigm is through an IDE not a text editor. more space is wasted through fundamental limitations in the language and the "de-facto" GUI development environment than through any "blame" attached to ASCII.
The big problem here is that language designers want their languages to get used.
There is a difference between telling your users that they *can* use unicode and telling them that they *have to*. Every language I can think of that said you *had to* use non-ASCII characters is dead: APL, Algol. I don't know about the detailed reasons why nobody actually codes in Algol (maybe just because it was mainly meant as a language for describing algorithms, not for writing practical programs), but APL's absurdly inconvenient character set was surely a reason that it expanded to fill a tiny niche and then quickly died even in that niche.
*Allowing* programmers to use non-ASCII characters is a lot more reasonable, but this is not exactly the world's biggest innovation. Perl allows you to use unicode characters inside string literals, but it also allows you to use, e.g., Chinese characters as names of variables. Is this a good thing? I guess so, in the sense that choice is good. But what happens when someone who doesn't speak Chinese wants to maintain code that uses Chinese variable names? Sure, we shouldn't be cultural chauvinists, but realistically, every literate Chinese person can recognize the letters of the Latin alphabet, whereas the converse isn't true -- coders in New York or Mumbai can't read Chinese characters.
There is also a nontrivial issue of look-alike characters, which could be a source of errors. For example, do I really want to be able to have one variable named Y (upper-case Latin Y) and another named Υ (upper-case Greek upsilon)?
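The look-alike problem is easy to demonstrate. This Python sketch compares the Latin and Greek capitals by code point and name, and uses a plain dict as a stand-in for a symbol table; the dict is just an illustration, not how any particular compiler stores names.

```python
import unicodedata

latin_y = "Y"             # U+0059
greek_upsilon = "\u03a5"  # U+03A5, renders almost identically in many fonts

print(latin_y == greek_upsilon)
for ch in (latin_y, greek_upsilon):
    print(f"U+{ord(ch):04X}", unicodedata.name(ch))

# Stand-in symbol table: the two names do not collide.
symbols = {}
symbols[latin_y] = 1
symbols[greek_upsilon] = 2
print(len(symbols))
```

So a language that accepts both gets two separate variables that a reviewer may not be able to tell apart on screen.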
Find free books.
Any programming language expands until every available set of brace characters is valid in every context.
() {} []
take C#... say you have an identifier 'x': x() is for method calls, x[] is for indexing, {} is reserved for code blocks, and x<> is for generics.
I think unicode would be nice for non-english native developers to use identifiers in their native language, but would lead to an explosion of operators and braces, neither of which would help readability of code.
You could define a language with compound braces, just as C derived languages have += == !=, etc. You could define combo braces; f vs f vs f could each represent different things.
But it'll all boil down to invoking a method with some parameters, it's all syntax sugar, just like x[n] to access an indexed item could be x.Lookup(n)
XML is interesting as something written with XML basically has an unlimited set of braces, "" allowing virtually infinite ways to expand the definition of objects. however, XML would make a very painful base for a programming language.
The article borders on the ridiculous. Colour coding blocks of code to mark them private? Yeah, that is much more readable than, say using a sequence of pre-historic ASCII characters like 'private'.
Nothing wrong with some food for thought and the article certainly gives some. I believe languages can be more verbose as typing is no longer a slow process over a TTY, and source code size is no longer an issue. This does not require new characters, just more actual words.
Laziness is good. Why would I waste time and effort on something that doesn't matter in the slightest, when I could instead do something useful? Or for that matter, do nothing?
-- "So they told me that using the download page to download something was not something they anticipated." - Bill Gates
The thing is, we need to get rid of programming applications altogether. With proper adaptive systems, one should be able to tell the computer what to do, and not worry about the details of how to do it. That is work I started on when at Brooks Automation about 10 years ago (in the division now part of Applied Materials). At an internal developers' conference I once said that my job was to make my job obsolete by the time I retired. Unfortunately, I got RIF'd five years after that, about 10 years before I would be ready to retire... :-)
Sometimes, real fast is almost as good as real-time.
Give me a keyboard with the symbols in question on it directly and I'll agree with him. But if I've got to remember arcane multi-key combinations for symbols not printed on keycaps or immediately obvious from what's printed (eg. dead keys for accents and such), or if I've got to remember 3-digit codes for characters, then it's a no-go and I'll stick to what's on the keyboard.
COBOL was originally designed so that managers and customers could read it. But in practice they rarely did, because programming logic is typically too low-level and requires too much technical context for a non-programmer or non-team member to understand anyhow. Being "English-like" or grammatically proper didn't really help that goal in practice. This is why the idea was abandoned in later languages.
Perhaps it's comparable to legalese. Making it proper English doesn't necessarily improve readability by non-lawyers. It's still gibberish to most of us without a legal background.
It's not worthwhile to slow down production programmers in trade for the rare case where non-programmers will want to read code for an actual need (not just curiosity). Thus, it's an uneconomical requirement as long as there is such a trade-off.
Table-ized A.I.
It would be nice to have new symbols for some programming functions.
For example, there are assembly language mnemonics for things like an 8/16/32-bit rotate left while moving the top bit to the bottom.
However, they are difficult to express in higher level languages, and the compiler might not code it efficiently depending on the compiler and underlying CPU.
When I use this instruction to create a shift register, I can code it more easily in assembler than in C.
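For comparison, this is roughly what the rotate looks like without a dedicated symbol: two shifts, an OR, and a mask. It is a minimal Python sketch; the function name `rotl` and the default width of 32 bits are assumptions made for illustration.

```python
def rotl(value, shift, width=32):
    """Rotate `value` left by `shift` bits inside a `width`-bit word.

    Bits pushed out of the top re-enter at the bottom.
    """
    mask = (1 << width) - 1
    shift %= width            # rotating by the full width is a no-op
    value &= mask             # ignore anything above the word size
    return ((value << shift) | (value >> (width - shift))) & mask


print(hex(rotl(0x80000001, 1)))      # top bit wraps around to bit 0
print(hex(rotl(0x81, 1, width=8)))   # same idea for an 8-bit word
```

One line of assembler becomes four lines of bookkeeping here, which is the complaint in a nutshell.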
Sure; not knocking laziness!
I'm just saying with one HARD thing to do, which needs to be mastered before you count on it, versus an easy thing you do all the time... that's not being conservative.
--- For a good time mail uce@ftc.gov
Grep on ASCII is more than 100x faster for complex string expressions. There are a lot of good reasons not to use Unicode.
Some drink at the fountain of knowledge. Others just gargle.
Well I don't think anyone here has much of an issue with writing their source code in ASCII - as it's been pointed out ASCII is simple, well understood, sufficient for our current languages and extremely portable.
But what about comments? What I'd like to get my hands on is an editor that:
1) Understands utf-8 source code (so we can get nice characters in comments)
2) Allows diagrams to be embedded in source code as comments. ASCII may be fine for code, but it sure sucks for diagrams.
Does such a thing exist?
Source code is chock full of inherent structure. Why confine ourselves to flat text that has to be parsed? If we're going to invent yet another new programming language that forces us to throw out all our old code, then we may as well go for broke. Make it some binary format that encapsulates all the structure, and work with it using an IDE that understands the format and represents it visually. We don't even need to all agree on the visualization.
We don't need yet another new programming language. Let's just pick an existing language and fix its flaws.
Restricting digital storage to ones and zeros is needlessly polarizing and limiting. Why not allow a 0.5 bit value?
Why should code be tied to text only anyway? I know there have been some experiments that never really took off, but even if we could expand programs to more than simple text just for comments that would be a huge help. A diagram or picture can often more accurately, and quickly, convey how a piece of code should work than a long piece of text. It would also be nice if we could reference non-code files from a code file. How about linking a class or method to a specification document (or part of it)? It would also be nice if you were alerted to check correctness of the linking code if the relevant section of the specification document changed.
We currently write source code as if the compiler were the only consumer of the file that matters and humans were some inconvenient afterthought that we begrudgingly make the code accessible to. Thinking of people as first-class consumers of source code may have a significant impact on programming.
The problem is I/O that (still) isn't 8-bit clean and the setlocale, wchar bullshit in the C std library (compare with Plan 9).
We can talk about using non-ASCII glyphs for syntax when we can easily and reliably display UTF-8 everywhere.
This seems like one of the least important issues about today's programming languages. Is anyone having problems because their source code uses ASCII? The guy even suggests making color a part of the language syntax, such as marking protected regions with gray frames. The problems with these ideas (which are not original) are almost an entire article themselves. An amusing Sunday night article, but no thanks.
If a character set from the 60's is the only legacy standard we carry forwards in programming, we're doing pretty good. Look at how axle length of Roman chariots has dominated transportation systems -- http://www.associatedcontent.com/article/390903/how_the_romans_influenced_the_space.html
Weapons of Mass Analysis
I tell you what, Poul-Henning Kamp... when you can write your argument concisely and clearly in Unicode symbols instead of English language using "plain ASCII text" I'll consider it.
Hypocrite.
- For the complete works of Shakespeare: cat
You've not seen Hello World until you've seen it in the original Klingon!
The original post completely misses the point.
Sure, we could replace "if" with a cute icon representing puzzlement and "for" with some kind of circular arrow - but that wouldn't change the nature of the programming language. ASCII permits us to generate a nearly infinite number of "symbols" - we call them "words" - adding more symbols into the character set would do very little for the actual nature of the language we're working in.
What makes programming hard is not the typing of the actual characters - but the logical thinking behind those characters. Most programmers can easily type faster than they can think (and those who claim to be able to think more quickly than they type are to be avoided since they are most likely thinking shallowly and turning out poor code).
Speeding up the entry of the "symbols" by replacing words with icons or new characters simply doesn't help.
I also have my doubts that it would speed our code entry anyway. Modern keyboards are about the perfect size for two hands. They allow input at the maximum possible bandwidth by allowing all ten fingers to reach as many symbols as possible with reasonable motion distance. If you have more symbols in your character set - then you need more multi-key operations - and you don't gain bandwidth. Picking symbols with a mouse is also ineffectual - it's like typing on an on-screen keyboard - and we know how much that sucks compared to the real thing.
Finally, this is far from a new idea. The language APL uses a wild profusion of non-ASCII characters... and that's the single feature that can be blamed for its failure to become more popular.
What if, as suggested partially in posts above, we display source code using Unicode, but allow editing it in ASCII?
I have used APL on a keyboard manufactured specifically for that purpose (IBM, in the 1980s, on a 3277 terminal). While the language was terse, it was comprehensible. Where it failed was that I had to use a special terminal to edit the code. If I wasn't on that terminal, I was effectively locked out. That wasn't good. Worse, in my opinion, was the time I spent hunting for the right key to press to enter a particular character - sure, I learnt the frequent characters very quickly, but the less frequent ones demoted me to a hunt-and-peck typist.
What I suggest is that we use Unicode to represent our code on display, but we enter it using the keyboards we have (or special ones, if we have them). Let me type for right arrow, and so forth. Allow HTML or XML type shortcuts for more obscure characters - let me type if I need a left up arrow - don't make me type a meaningless sequence like \u123 (but allow it if I happen to know it).
The idea that the source code I see and the source code I enter have to be the same is old-fashioned. I edit source code on a specialised editor that barely uses the resources of the PC or Mac it is running on. The cost of parsing code back and forth between reading form and editing form would be minimal - consider that many source code editors provide instant source code error detection already - that requires parsing of the code on the fly.
All up, I think altering the paradigm between what I type and what I see is an appropriate solution to this problem. What do you think?
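A rough sketch of the "type ASCII, see Unicode" idea, in Python. The digraph table `DISPLAY_FORMS` and the function names are hypothetical; a real editor would work on tokens, so that string literals and comments are left alone, rather than doing blind text replacement as here.

```python
# Hypothetical ASCII digraph -> Unicode glyph table; a real language or
# editor would define its own set.
DISPLAY_FORMS = {
    "->": "\u2192",  # rightwards arrow
    "<=": "\u2264",  # less-than or equal to
    ">=": "\u2265",  # greater-than or equal to
    "!=": "\u2260",  # not equal to
}


def to_display(ascii_source):
    """Replace each typed digraph with the glyph shown on screen."""
    for typed, shown in DISPLAY_FORMS.items():
        ascii_source = ascii_source.replace(typed, shown)
    return ascii_source


def to_ascii(display_source):
    """Inverse mapping: turn displayed glyphs back into what was typed."""
    for typed, shown in DISPLAY_FORMS.items():
        display_source = display_source.replace(shown, typed)
    return display_source


typed = "if a != b and c <= d -> e"
shown = to_display(typed)
print(shown)
print(to_ascii(shown) == typed)  # the file on disk can stay plain ASCII
```

Because the mapping is reversible, the stored file, grep, and diff can all stay in ASCII while only the display layer changes.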
I don't ever want to be stuck maintaining a system written by some dork who thought it was a great idea to write crucial components in Unicode Ogham runes.
Unicode has the entire gamut of Greek letters, mathematical and technical symbols, brackets, brockets, sprockets, and weird and wonderful glyphs such as "Dentistry symbol light down and horizontal with wave" (0x23c7). Why do we still have to name variables OmegaZero when our computers now know how to render 0x03a9+0x2080 properly?
The Go spec is defined in terms of Unicode, and specifically gives non-ASCII characters as example identifiers. Go source code is defined to be UTF-8.
...a language full of symbols to represent lots of different stuff: Perl. A shame it's write-only.
It's like a geek version of "What Not To Wear." Only code.
The reason we have so many human languages is that for most of human history, people couldn't communicate with others who lived more than a few miles away. That problem has been solved, so eventually we'll have one language that everyone speaks.
But the reason we have so many programming languages is that each one represents a different set of tradeoffs between expressiveness, efficiency, portability, high- or low-level constructs, etc.
We have so many programming languages for the same reason that a woodworker has so many different tools: they are each useful for different things. Sure, you might be able to use a generic chisel in place of several other more specialized tools, but it's not *optimal* for any of the tasks that those specialized tools are designed for. And thus it is with programming languages. C/C++ are good for low-level apps, Java for big bloated enterprise apps, Python or Ruby for clever apps that need to be written in a hurry and don't have to be very efficient, and so on.
If you think ASCII is a straightjacket, you're not going to break out of it merely by moving to a larger character set. You have to grow beyond character-based, text-based programming. The way you do that is with a GUI IDE.
I could easily point to the old CanDo programming environment on Amiga, or to Smalltalk (including Squeak), or Hypercard, or various visual GUI programming tools starting with Apple's and moving forward from there. The point being. . . All of them included ASCII-based program code, but they supplanted it to varying degrees with GUI-based structure. In the more advanced examples (such as CanDo), you could create simple-but-useful programs using only the mouse, whereas typing code was required only for advanced features.
I'm disappointed, actually, by how visual programming has stagnated. I blame the cult of Unix/Linux to some degree. The whole OS and all its tools and standards are based on ASCII text, and it's very hard for coders to get out of that mindset after growing up with it. The internet too, which was built on a foundation of Unix and HTML, is a pretty backwards place when it comes to GUI operation. Large parts of it still need to catch up with the late 1980s, to say nothing of the 21st Century.
Eh, they sell stickers you can stick on your keys.
Are you adequate?
And, yes, me too: I wrote this in vi(1), which is why the article does not have all the fancy Unicode glyphs in the first place.
Excuse me - vim can handle utf-8 just fine. utf-8 file names and utf-8 content. on a vanilla slackware 13.1.
http://www.cl.cam.ac.uk/~mgk25/unicode.html#apps [cam.ac.uk]
# Vim (the popular clone of the classic vi editor) supports UTF-8 with wide characters and up to two combining characters starting from version 6.0.
# Emacs has quite good basic UTF-8 support starting from version 21.3. Emacs 23 changed the internal encoding to UTF-8.
And svn can handle utf-8 as well - http://svnbook.red-bean.com/en/1.4/svn.advanced.l10n.html [red-bean.com].
The repository stores all paths, filenames, and log messages in Unicode, encoded as UTF-8.
All it requires is ... set your locale and lang. "export LANG=en_DK.utf8" in "/etc/profile.d/lang.sh" (Slackware 13.1) and add some better fonts maybe.
I apologize for repeating myself. I've written the same thing further down already in reply to another user's post. But I just read tfa and felt the need to reply to the author of tfa.
I agree. I use them in everyday writing, but never in programming!
This kind of thinking makes all the sense of the US trying to force countries to go back to the old standards of measure. While some in the US would be happy with that most of us can see how straight up stupid this is. For those who refuse to adapt to ASCII? Fuck 'em. We don't need them. Standards have made societies thrive for thousands of years.
The core programming language is still mainly ASCII constrained. However, mathematical and logical expressions can be written in TeX-style, publishable format. Makes for easy to read functions and expressions.
I like Lisp a lot (well, elisp anyway and scheme which is really where I've had a lot of exposure).
But when you reduce typing, the problem is that you quickly develop a DSL - Domain Specific Language. That's great for you, as long as you really understand the domain well. But almost never is someone else's abstraction of a domain the same as your own so it's hell to maintain. And if you didn't understand the domain well you can end up with a DSL that is a poor way to express what needs to get done.
Mainstream languages stay mainstream exactly because they impose a certain level of impediment to expressing yourself so easily that others can get confused about what you meant to say...
"There is more worth loving than we have strength to love." - Brian Jay Stanley
I worked for a Canada-based company and one of the magazines in the break room was Forces Quebec. It was something about packaging technology and had the articles written in both English and French, as is standard in Canada.
The bilingual nature isn't what caught my eye, though. What caught my eye was the fact that the typeface for the French articles was just plain smaller in order to fit more text in a certain space. It looked to me like the same page real estate was dedicated to each language, but the typeface for the French text was set to a smaller point size with tight kerning and spacing.
No wonder French people talk so fast. They have to!
In fact, when I mentioned the same thing to one of my coworkers, a Mexico native, he wasn't surprised at all. He said the same is true for Spanish as well.
When he told me that, I remembered Cheech Marin's "Born in East L.A." where he sings about being deported to Mexico despite being a US citizen "Next thing I know I'm in a foreign land. People talkin so fast I could not understand."
In a world of the blind, the one-eyed man is king--and the two-eyed man is a heretic.
You don't need a glyph for "=>" for instance. Anyone who knows what = and > mean individually can discern the meaning.
=> != >=
I don't believe in time. It's a grand conspiracy designed to sell watches.
Using full Unicode for programming causes lots of problems; even string equality is a tricky proposition for Unicode, let alone precise parsing. Most people don't even know how to enter Unicode characters not found in their own language. And once you allow Unicode, people will do things like they did in APL.
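The string-equality trap is easy to show with Python's standard `unicodedata` module. The same visible character can be stored as one code point or as a base letter plus a combining mark, and the two only compare equal after normalization. The variable names are illustrative.

```python
import unicodedata

composed = "\u00e9"     # "é" as a single precomposed code point
decomposed = "e\u0301"  # "e" followed by COMBINING ACUTE ACCENT

print(composed == decomposed)            # False: different code point sequences
print(len(composed), len(decomposed))    # 1 2

# Normalizing both sides to the same form (NFC here) makes them comparable.
same = unicodedata.normalize("NFC", decomposed) == composed
print(same)                              # True
```

A compiler that accepts Unicode identifiers has to pick a normalization policy like this, or two names that look identical on screen can refer to different variables.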
The only place Unicode should be allowed--if at all--is in comments. Everything else should be in ASCII.
n/t
In a world of the blind, the one-eyed man is king--and the two-eyed man is a heretic.
If you can write programs with just 8 characters, there is NO NEED to go beyond the base ASCII set.
Browsing at +1 - no ACs, I ignore their posts. So refreshing!
I've always wondered why nobody made compilers to write code in non-English languages. Are we ever going to see a Hindi version of BASIC?
I wouldn't consider Mr. Pike an authority on programming language design. At Google, he's known for designing Sawzall (described here: http://static.googleusercontent.com/externIal_content/untrusted_dlcp/research.google.com/en/us/archive/sawzall-sciprog.pdf) - a language that's so feature poor, esoteric, and ass-backwards, that Google engineers curse at length every time they have to use it. And use it they have, since it's darn near impossible, for various reasons, to do certain things without it. Try as I may, I don't see anything in Go that would make it better than half a dozen existing alternatives. It's like reinventing the bicycle again, but this time with square wheels and without the saddle. Yes, you guessed it right, that's where that pipe goes on this particular bicycle.
... it should be good enough for anyone. Just sayin'...
garethw
Is that a standard keyboard only has ascii, or at least not much else.
and the human mind is unlikely to cope well by adding more characters.
Using a small set of the ASCII characters is likely the only way to make a language that anyone can program in efficiently.
You guys realize that Rob co-invented UTF-8 (with Ken Thompson), don't you?
This is how I became so completely wapuro-baka...
For reference: "Wapuro Baka" is a phrase that means 'Word Processor Stupid'. This describes someone who can write in Japanese on a computer by typing the Romaji (English letter) sound-only syllables, but cannot write the harder meaning words by hand because they do not know the symbols in detail.
If the only way you can accept an assertion is by faith, then you are conceding that it can't be taken on its own merits
... for either Han or Sanskrit characters in programming languages.
The way the s/w development market is going, ASCII support for Latin character sets is becoming pointless.
Have gnu, will travel.
Ok, so everyone agrees this is a stupid idea... but are there ANY pros? I just don't understand the premiss at all...
C++ already supports Unicode for identifiers and in comments. What more do you expect? If the operators and keywords were supposed to support Unicode, wouldn't the programming language be encumbered with many different translations for each code set (or language)? I think identifier support is plenty. Let's not make it harder to develop and maintain correct apps; it's already hard enough.
I can see value in characters that improve readability, or that appear often enough that their absence is a nuisance.
Left-arrow for assignment, so that the equal sign can be reserved for comparisons.
Single characters for .NE. .LE. .GE.
Floor and ceiling symbols
A "degree" symbol
An upward arrow for exponentiation, so that caret always means xor.
Something new to make the declaration and use of pointers clearer, the way C does it is just too confusing.
But these are just my pet peeves; I'd be surprised to see many people agreeing with me.
Contribute to civilization: ari.aynrand.org/donate
A natural language isn't one spoken by humans, it is one that came about naturally. It grew up from usage by humans, the rules formed from long convention, and it is living and changing. If you invent a language for a special purpose, be it computer programming or clear communication, it isn't a natural language. Also you'll notice that nobody is going around speaking Lojban. It hasn't taken off, at all, in society.
For that matter even if it did, it probably wouldn't work as a programming language. Programming must be unambiguous in a way that is just hard for most humans to understand. Everything must be precise, everything must be spelled out (and done so correctly). Trying to construct a spoken language like that would be a waste because people would never want to use it. Humans can get multiple levels of meaning, analogies, metaphors, and so on that computers can't handle. Useful in human communication though.
It was called APL.
This guy is typical of the modern generation of programmers who think software should be made for _people_ to read !
Pity the dude trying to write a compiler to interpret his (or worse, my) pictures...
Software is primarily made for computers to read, it is the job of a programmer to translate real world problems into the machine world.
Damn idiot money men think that using high level languages so idiots can write software is going to lead somewhere other than lots of idiotic software...
GET OFF MY LAWN !!!
OR HE MIGHT HAVE NOTICED THE LACK OF A CHUNK OF THE ASCII TABLE. (No, it's not like yelling, it's like an ASR-33 you insensitive clod! *sigh*)
One line blog. I hear that they're called Twitters now.
I'm all for languages which allow Unicode characters in their source (Unicode strings, Unicode comments). That simply makes it easier for foreign developers and foreign language strings. Luckily, most modern languages (including Go) do allow this.
But Unicode syntax is a nightmare to type. It should be perfectly possible for me to type an entire program using only the symbols I see on my standard US keyboard.
This has come up in the context of domain names, where a long, painful set of rules has been devised to try to prevent having two domain names which look similar but are different to DNS. If exact equality of text matters, it's helpful to have a limited character set for identifiers.
There's currently a debate underway on Wikipedia over whether user names with unusual characters should be allowed. This isn't a language question; the issue is willful obfuscation by users who choose names with hard-to-type characters.
As for having more operators, it's probably not worth it. It's been tried; both MIT and Stanford had, at one time, custom character sets, with most of the standard mathematical operators on the keys. This never caught on. In fact, operator overloading is usually a lose. Python ran into this. "+" was overloaded for concatenation. Then somebody decided that "*" should be overloaded, so that "a" + "a" was equivalent to 2*"a". The result is thus "aa". This leads to results like 2*"10" being "1010". The big mistake was defining a mixed-mode overload.
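The Python behaviour described above is easy to reproduce. The snippet uses only built-in operators and conversions.

```python
print("a" + "a")      # aa   ("+" concatenates strings)
print(2 * "a")        # aa   ("*" with an int repeats the string)
print(2 * "10")       # 1010 (still repetition, not arithmetic)
print(2 * int("10"))  # 20   (an explicit conversion gives the number)
```

The mixed int-and-string case is the surprising one: the operator is legal, so nothing warns you that a string of digits was never converted.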
In C++, mixed-mode overloads are fully supported by the template system and a nightmare when reading code.
In Mathematica, the standard representation for math uses long names for functions, completely avoiding the macho terseness the math community has historically embraced.
Key calculations for the design of the first implosion-type atomic bomb, which involved solving nonlinear three-dimensional differential equations to make sure the little booms that caused the big boom reached the core at the same time were solved by punching octal code into paper tape and running it on a mechanical computer.
we only used 1's and 0's and we got along fine.
when i use inkscape (SVG program) and i want to pick a color for my square, so my square looks nice and pretty, i have 4 color choosing tools to use, 3 that let me select the color based on some color rules and sliding bars, and a color picker that lets me select a color that i've used on some other part of my drawing. i could also manually enter a symbol/value representing the color on 2 or 3 of the color tools. the document saves the color information in XML/SVG format, and i can find it with the raw-xml editor in inkscape and change the color value from there. i can also write my own SVG/XML in the xml tool in inkscape.
SVG is a language for making graphics, almost a DSL (it's in XML, so it can't be a pure DSL). the SVG language is pretty complex and is pretty hard to write by hand. it's a bit hard to read and edit without a good xml editor.
anyway, an individual color is a value, it is represented by multiple values, for anything other than simple colors you use a tool to pick it. this is idiomatic, and has been for many years. sometimes this way of coding works very well. the best example i can think of is sikuli. in sikuli you program by taking screenshots and your program is like a state-machine where the input is detecting images that the programmer has extracted from their screenshots. the IDE shows thumbnails of the screenshots and text. it would be less effective to replace the thumbs with the pathnames of the image files they represent (which is how they are in the code files).
i think that most people's arguments are about separation of the programming language and the IDE, but i think there is something good to a language that has interfacing with an IDE, or general interfacing, in mind. I think there are benefits for having the language be a bit less human readable in order for it to better interface with things, like an IDE. examples: javadoc, java annotations. these are hacks, make code less readable, but are there just for other programs to use.
so, do we blame IDE makers for not giving us IDEs that help us to understand our code in the best way possible, or do we blame the language makers who don't include rich meta-code constructs in their languages?
why the hell do i have to write in HTML to get separated paragraphs for posting on this form?
I once wrote a programming language which was syntax free. For example, here is a program which calculates square roots using Newton's method, written in Japanese:
http://www.youtube.com/watch?v=vwgvVpCRecE
The types and function names are in Japanese. The variables are in English, but this needn't be the case.
For those of you who want to see more, this video shows me writing a calculator application:
http://www.youtube.com/watch?v=SSZBc2ohR2o
For more information, please see the following page:
https://sites.google.com/site/rathereasy/eastwest
#include <stdio.h>
int main(char** argc, int argv) {
int x=–5;
printf("%d\n", x);
}
The errors are fairly obvious if you compile, but they're not easy to see. Now you could tell me that C isn't designed to be written in Unicode, and you'd be right, but at least it's pretty clear which characters are wrong. A language designed for Unicode would be even worse, since the characters wouldn't be illegal outright, and it might try to convert em dashes to - for a subtraction, etc.
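A small checker makes stray characters like the dash in the C snippet above easy to spot. This is a minimal Python sketch; the function name `find_non_ascii` is made up, and the sample line is written with an escape (`\u2013`, EN DASH) so the offending character is unambiguous.

```python
import unicodedata


def find_non_ascii(source):
    """Yield (line, column, char, unicode_name) for each non-ASCII character."""
    for line_no, line in enumerate(source.splitlines(), start=1):
        for col, ch in enumerate(line, start=1):
            if ord(ch) > 127:
                yield line_no, col, ch, unicodedata.name(ch, "UNKNOWN")


# "int x=" followed by an EN DASH where an ASCII hyphen-minus was intended.
snippet = "int x=\u20135;\n"

for line_no, col, ch, name in find_non_ascii(snippet):
    print(f"line {line_no}, column {col}: U+{ord(ch):04X} {name}")
```

Run over a source tree before compiling, this turns an invisible typo into a one-line report.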
Bad idea. Code is terse by design. Ever noticed how much harder it is to say precisely what you mean in, say, Applescript? Adding a character set of 100k is a terrible idea.
I have developed a truly marvelous proof of this comment, which this signature is too narrow to contain.
You don't need a glyph for "=>" for instance. Anyone who knows what = and > mean individually can discern the meaning.
If slashdot didn't eat it, you could be seeing ⇒ displaying that glyph here... ""
You can't take the sky from me...
Do you people even read before posting? Someone dares to posit that the emperor has no clothes and everyone begins throwing stones. The core of his statement is that: "Programming languages can be more information dense by not being confined to ASCII". And what does the community at large do? Begins stuffing an effigy and mounting it on his front lawn. I've got one for you; how about a heliocentric solar system? Or a system of matter based on discrete atoms? Or how about a biological system based on cells with inheritance? Burn the witch! Burn it!
Still with me? How about a little use of the forebrain and a little less of the midbrain and Molotov cocktails?
Could a programming language be less visually and conceptually obtuse if the information density per character is increased? The answer is so obviously yes that any naysayer must be racist. Yes, I said racist. Look at your keyboard, now look at a globe. Know what? Over 90 percent of that globe doesn't use that character set. Doesn't matter who you are. Look, I even used percent instead of 0/0 because I don't know how to enter that character.
*sigh* Yah know, sometime it will be time to kill the golden calf and move on.
Poul-Henning Kamp seems to know his languages. Good for him.
But I take it he's never worked in a financial institution where application programmers have to produce working code. Cursed are the idiots that write programs using glyphs outside the intersection of the ASCII and EBCDIC sets. IMHO it is a sin to put "fancy" characters even in comments.
Ever tried outsourcing code support to Hyderabad? How do you fancy your chances of being efficient or effective without limiting yourself to English in ASCII? Would you be pleased to see method and variable names in Urdu?
The god given language for programming is English, encoded in ASCII or in EBCDIC. Programming languages needn't support any other spoken language. To further emphasise this point, I speak as a non-native English speaker, not living in an English speaking country.
I hadn't the slightest objection to his spending his time planning massacres for the bourgeoisie... (P.G. Wodehouse)
I don't think it should be that difficult to translate BASIC, just as a teaching tool for non-English societies.
Say, take something like http://www.freebasic.net/ , and change the string constants for FOR, WHILE, PRINT, etc. to something in your own language.
MS used to have localized Office Basic.
I'm not a lawyer, but I play one on the Internet. Blog
How about coming up with a new alphabet with just 10 characters, or fewer. That would make my keyboard smaller. Just so long as the letters U, K, C and F are included, I'll be able to express myself with my usual panache and I might even be able to find the keys on my phone.
Nullius in verba
ASCII will be with us forever, because it's enshrined in the bottom 7 bits of Unicode! U+0000 through U+007F are ASCII. The most common/useful encoding of Unicode is UTF-8, which is backwards compatible with ASCII. Your ASCII data is Unicode, in UTF-8 guise.
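The backwards-compatibility claim can be checked in a few lines of Python. Pure ASCII text encodes to identical bytes under both codecs, and only characters beyond U+007F need more than one byte. The sample strings are arbitrary.

```python
text = "Hello, World!"

ascii_bytes = text.encode("ascii")
utf8_bytes = text.encode("utf-8")
print(ascii_bytes == utf8_bytes)   # True: ASCII text is already valid UTF-8

omega = "\u03a9"                   # GREEK CAPITAL LETTER OMEGA
print(omega.encode("utf-8"))       # b'\xce\xa9': two bytes, both above 0x7F
```

Since every byte of a multi-byte UTF-8 sequence has its high bit set, ASCII-only tools never mistake part of a non-ASCII character for an ASCII one.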
I'm truly saddened to see so many people took this article summary so literally. If you read TFA, it's actually a very bright, intelligent, humorous example of programming insight. I found it a very delightful read and I wholeheartedly felt that the article presented its thoughts lightheartedly and without expectation of seriousness. To hear all the commenters here, it's as if the article ran puppies over with a steamroller.
Please guys - I'm all for silly commentary. But read the article if you're going to pretend to write something clever. It's thoroughly tongue-in-cheek.
Or even Algol 68
If not, no way. .. all those underscore numbers meaning something weird and a new symbol for a new operation.
I Don't like math expressions
In a sense, math syntax vs programming language syntax (yes, I know there are some exceptions) is the same as, for example, written Finnish vs written Chinese.
We have a new mathematical operation, what to do? Invent a new symbol? No Fucking Way! NO!
You can assemble it to binary and then disassemble it to any mnemonic set you like.
There's no reason to be locked to an array of arrays of characters as the only code format. Program code is inherently tree-structured.
Recent Blizzard editors (WC3, SC2) have a tree-based system for their code. I'm not sure if there are many other examples, but IMO this is the way forward.
Pluses:
- No compilation errors.
- Faster compilation since parsing is already done.
- No typos.
- Perfect code-completion / syntax-highlighting.
- No arguments about style guidelines because there's nothing but content. No whitespace etc. Style can be set from your end in your editor.
- Changing all copies of an identifier name is instant and flawless.
- Smaller files.
Downsides:
- Can't use regular text editors.
- Difficult to work backwards (can't write a variable before it's declared).
- Catch 22 when there are no editors that deal with the format. Also, format wars.
- Included files need to be read by editor.
There'll be more pluses and minuses I'm sure.
There's nothing stopping anyone from making a C++ (or C++-esque) frontend like this.
There's nothing stopping these editors from being as quick to use as text editors. In fact, they should be way faster.
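To make the "renaming an identifier is instant and flawless" point concrete, here is a rough sketch using Python's standard ast module as a stand-in for a tree-based editor. This is not how the Blizzard editors work; the Rename class and the sample program are made up for illustration. The rename touches only identifier nodes in the tree, so text inside string literals or comments would not be hit the way a blind search-and-replace would.

```python
import ast

source = """
total = 0
for item in [1, 2, 3]:
    total = total + item
"""


class Rename(ast.NodeTransformer):
    """Rename every use of one identifier by editing the tree, not the text."""

    def __init__(self, old, new):
        self.old = old
        self.new = new

    def visit_Name(self, node):
        # Only Name nodes are touched; string constants are separate node types.
        if node.id == self.old:
            node.id = self.new
        return node


tree = ast.parse(source)
tree = Rename("total", "running_sum").visit(tree)
renamed = ast.unparse(tree)  # ast.unparse needs Python 3.9 or later
print(renamed)
```

Note that the text printed at the end is just one rendering of the tree; whitespace and layout are decided by the tool, which is the "style can be set from your end" idea.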
You agree with me.
Unicode programming?! I get pissed when programming languages include shift-accessed characters in their standard syntax. Like PHP using '->' instead of '.' . Unicode programming sounds about as irrational as the natural language programming ideals of COBOL.
slashdot.binspam.add(this);
Quick, concise, logical, objective... Too bad parentheses and curly brackets are shifted...
The entire point of a programming language is to write something in a language sufficiently native to both the programmer and the compiler. Programming languages use the glyph set many humans are familiar with in order to provide a rigid framework in which a compiler can write machine language to lead the machine into performing the desired task(s).
If a larger character set is part of your native language, great. Use some version of your favorite programming language that gets along well with what you're trying to say. Really, this only applies to variable names as commands and other reserved words are quite finite and well defined.
Expanding the number of glyphs used to represent a command is a solution to a problem that doesn't exist. Do I really need to learn an otherwise meaningless language to express the command "cout" as a single character instead of a combination of 4 glyphs? Once you reach a certain number of glyphs (a couple hundred if I remember correctly, depending on your end instruction set / hardware arch) the advantage of using a compiler becomes more of an optimization question than a translation question. If some future generation is going to be forced to learn a glyph set with a higher count than the machine itself, then why the hell should we have anything but machine language with code optimizers instead of compilers?
from the article:
Unicode has the entire gamut of Greek letters, mathematical and technical symbols, brackets, brockets, sprockets, and weird and wonderful glyphs such as "Dentistry symbol light down and horizontal with wave" (0x23c7). Why do we still have to name variables OmegaZero when our computers now know how to render 0x03a9+0x2080 properly?
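For reference, the two code points the article mentions for "OmegaZero" can be looked up with Python's standard unicodedata module. This is only a sketch for inspecting the characters; whether a given language accepts such a name as an identifier depends on that language's own rules.

```python
import unicodedata

# The two code points named in the article: 0x03a9 followed by 0x2080.
omega_zero = "\u03a9\u2080"

# Look up the official Unicode character names.
names = [unicodedata.name(ch) for ch in omega_zero]
print(names)

# In Python, a lone Greek capital omega is accepted as an identifier.
omega_ok = "\u03a9".isidentifier()
print(omega_ok)
```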
Uh, yeah, Go allows you to use the full Unicode range just fine, in identifiers and everything. People should read the spec before making sensationalist comments that waste everyone's time.
That the linked ASCII chart is SVG... which can render Unicode and is encoded in an explicitly Unicode medium...
No - I think we're already in the future, sorry PHK.
you had me at #!
When you need him.
And somebody else said something about 26 soldiers of lead conquering the world but the interwebs can't seem to decide who, or if. That's progress!
Oh YES YES Mr. Poul-Henning Kamp can you buy me a space cadet keyboard where I have to press Alt-Shift-Ctrl-Meta-Cokebottle to get a Q?
He should get his head examined... If he means what he says, he certainly isn't a programmer. If you need more characters than ASCII offers to write your code, you should really get back to basics or get out of the business altogether. There is no need for even more characters to write your code.
For one, you have to be good at handwriting. Many people, like me, aren't. Computers do not have an easy time with character recognition. They are much better these days, but they still lag a good bit behind humans. So whatever trouble a person has with recognizing your writing, a computer will have more. Next, this is even more problematic when you have a language with lots of glyphs, because so many are very similar. It can be really difficult for a computer to tell the difference, and many of the tricks that have made recognition better won't work for those kinds of languages. Then there's the fact that you have to learn a new writing skill: looking at a screen that you aren't writing on. It is difficult for people not to look at the hand they are writing with, which degrades your penmanship further. Finally, there's the fact that it is much slower. Even a fast scribe has nothing on a normal typist.
The current transliteration solution we have works well, and that is why it continues to be used. Has the other advantage that the typing skills you learn apply just as well to Western languages, you don't have to learn a new set of skills.
What about novice or occasional programmers? I've done a few things in Python over the years, and I've found that even with this fairly simple language, my knowledge of the syntax etc. leaks away with disuse. I recently had to write a program after about 1 year of not using Python, and I spent half the time relearning the language. Terse, non-English languages like C have a higher barrier to entry than Python because of this.
and 1400 different ways of spelling them. If there were 42 characters, it would vastly simplify English spelling.
Wikipedia claims that ASCII grew the backslash [\] specifically to support ALGOL's /\ and \/ Boolean operators. No source is provided for the claim. ftfa
Here's one of the two sources that Wikipedia cites, straight from the inventor of the backslash: HOW ASCII GOT ITS BACKSLASH citing his book [ R.W.Bemer, "A view of the history of the ISO character code", Honeywell Computer J. 6, No. 4, 274-286, 1972 ]
"I had called a joint meeting of IBM, SHARE, and GUIDE, to regularize the IBM 6-bit set to become the standard BCD Interchange Code [76]. Frequency studies of symbol occurrence had been prepared, particularly from ALGOL programs. The meeting of 1961 July 6 produced general agreement on a basic 60-64-character set, which included the two square brackets and the reverse slant, which was chosen in conjunction with "/" to yield 2-character representations for the AND and OR of early ALGOL. This is reflected in the set I proposed to ANSI X3.2 on 1961 September 18."
(Note: I had put the backslash in position 5/15. It enabled the ALGOL "and" to be "/\" and the "or" to be "\/".)
Apparently he also invented ten other ASCII codepoints (called himself the father of ASCII), timesharing, escape sequences, the Y2K bug, word processors... and COBOL.
The Norges are sure fond of their umlauts.
Funny given Rob Pike's involvement with the creation of UTF-8.
Sad that, as has become common, everyone and their dog wants their pet feature in Go, totally missing the point of the language, which is: a small and very carefully selected set of features that work well together and don't interfere with each other in unexpected ways.
Sad also that ken's involvement in the creation of both UTF-8 and Go goes unmentioned.
In any case, are there people out there who have forgotten what a huge pain it was to program in APL?
There is a reason why modern successors of APL, like K (which by the way is a super cool language), stick to ASCII: you can actually write code without going insane!
"When in doubt, use brute force." Ken Thompson
Seriously, it's all about 1's and 0's. Does it really matter what language and syntax abstraction you use to enter them?
If it ain't broken, why fix it? And is ASCII broken? No, it works just fine for expressing the programming languages we've used so far, and as far as I know they work just fine in solving all our algorithmic needs.
"For every complex problem, there is a solution that is simple, neat, and wrong." -- H.L. Mencken (1880-1956) --
Except for the Germans. I don't think their language uses spaces.
...richtig, wir Deutschen benutzen Tabs anstatt Spaces. ;)
Translates to: "That's right, Germans use tabs instead of spaces."
ASCII wall? What ASCII wall? ASCII is a simple way to encode messages. What are the alternatives? 32-bit Unicode? No, surely not; think about all the characters that look identical and are not. That will result in confusion exploitable by spammers and skimmers. There is a reason why programming languages are restricted to ASCII: it represents a workable set. Going beyond it will mean nobody (except computers) will ever read the programs. Then you can go directly to binary. In short, ASCII solves more problems than it creates. It is a universal standard these days; do not make the world more complicated than it already is.
One thing many people aren't aware of is that for several years now (since GCC3), GCC and G++ accept UTF-8 as their default input encoding, and internally store narrow and wide strings as UTF-8 and UTF-32, respectively. It's recoded to the output stream locale when you do any output. This means you can write your source code in Unicode (in strings and comments at least) and it all works perfectly. It has full support in the C and C++ standard libraries. I've been using it for years; it works perfectly. It would be nice to get support for UTF-8 symbols in the linker, so we can have UTF-8 variable names as well. The same applies to Perl, though perl6 even gives you the ability to have Unicode operators, and possibly variable names.
I do routinely use UTF-8 symbols in R (example: "deltaCt" can be replaced with the actual Delta symbol [Slashdot ate the Unicode--seriously poor!]). It makes the code more readable, and entry isn't the massive issue people make it out to be. AltGr/compose keys handle the common symbols, and you can look up the few odd ones that aren't in the compose tables.
Having the ability to use Unicode does not in any way detract from the ability to use ASCII. Since ASCII is a strict Unicode subset, the ability to use Unicode imposes zero overhead on those who wish to stick with ASCII, so the extent of the hate seen for wanting a bit of progress is a bit shocking. People pointed out how unreadable code could be made, but the reality is that when used sensibly and judiciously, it can make code more concise and readable.
See http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=522776 for information about some of the issues.
Having native Unicode support end-to-end by default is still a goal we want to achieve; the ASCII C locale is the last holdout. Getting a UTF-8 C locale is the last remaining step, though it'll take a few years to get there.
Regarding editing Unicode sources, both Emacs and vim have pretty decent Unicode support, and Linux distributions have had unicode support for a decade now, and really good support for at least six years. Broken tools are no longer an excuse for not using Unicode.
Regards,
Roger
Sorry, this idea is utter bullshit. I'd like to see that guy's face when he tries to include a library by some European programmer and he doesn't even have the keys on his keyboard to spell out the function names!
Here, have some Umlauts: äüöß
Just in case you need to copy and paste them to include fahrvergnügen.h
bickerdyke
should have no bearing on my ability to express myself. If you are unable to make your intentions clear within the language system that makes up most of what our species reads and writes, it's your own fault, not that of the language. Have you considered that you're just not talented?
why, oh tell me why, when I write a simple - trivial - bit of Java code, do I need to write functions for getters and setters all over the place - dammit, just declare them as gettable and settable
Python is exactly like that.
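A minimal sketch of what that looks like in practice. The Point and Account classes are hypothetical examples, not from any library: attributes are gettable and settable by default, and a property can be added later for validation without changing any calling code.

```python
class Point:
    """Plain attributes: readable and writable with no getter/setter code."""

    def __init__(self, x, y):
        self.x = x
        self.y = y


class Account:
    """Same attribute syntax for callers, but writes are validated."""

    def __init__(self, balance):
        self.balance = balance  # goes through the setter below

    @property
    def balance(self):
        return self._balance

    @balance.setter
    def balance(self, value):
        if value < 0:
            raise ValueError("balance cannot be negative")
        self._balance = value


p = Point(1, 2)
p.x = 10  # direct assignment, no setX() needed

acct = Account(100)
acct.balance = 25  # looks like plain assignment, but runs the check
print(p.x, acct.balance)
```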
There is a reason that elegant alphabets won out over hieroglyphics. I'm always amused at how many people wish to take us back.
"That's a simple one.
Bird. Man with spear. Sideways fish. Beetle. Vase.
It means, and this is just a rough translation,
'A man with a spear trapped a bird and a sideways fish in a vase.'
And there was also a beetle.
That's just one possible translation."
-- Teddy Roosevelt
Of course, the typical denizen of slashdot (such as myself) is of two minds on problems like this.
The geek recognizes the simplicity of keeping the existing solution, but also recognizes the inherent attraction of the difficult problem of going with the technological solution.
(Did I just say that?)
Computer memory is just fancy paper, CPUs just fancy pens with fancy erasers; the 'net is just a fancy backyard fence.
I have my own bug for this on the Red Hat Bugzilla, which made it blocker for me, but I wonder how somebody could write in the 21st century a groupware server which is capable of working only with windows-1251 charset.
Perhaps it's comparable to legalese. Making it proper English doesn't necessarily improve readability by non-lawyers. It's still gibberish to most of us without a legal background.
It's not worth-while to slow down production programmers in a trade for the rare case where non-programmers will want to read code for an actual need (not just curiosity). Thus, it's an uneconomical requirement as long as there is such a trade-off.
Agreed. There's these things called "documentation" and "specification" to communicate programmatic ideas where actual code cannot be applied: useful for both manager-types and new coders on a project.
Whoopee
ECMAScript has been composed of Unicode characters since at least ECMA-262-3. The first version of JavaScript was UCS-2, so any version can use the basic multilingual plane. And even IE-6 runs edition 3.
Do daemons dream of electric sleep()?
You mean it stinks and nobody likes it?
One example, using Japanese and the typical Romaji (latin) input method filters for the word "apple".
English, well, there it is: "apple". four different keys, one repeated stroke, and space or punctuation to delimit -- six strokes, and, with practice, you don't have to look at either the keyboard or the screen.
The Japanese word Latinizes to "ringo", and that is what you usually type when using the IME in Romaji mode. But (problem 1) there is no standard small set of delimiters. So you reach for a conversion (henkan) key, which is usually either the space bar or a small key next to a shrunken space bar.
Some methods have several henkan keys, depending on whether you want to just dump the conversion buffer (probably hiragana) or force to the other kana (probably to katakana) or get a list of candidate Kanji (Han characters). More often, there is only one (effective) conversion key that pops up a list of candidates, and the method assumes, for each candidate vocabulary element, a preferred conversion which it puts at the top of the list.
So most people will end up having to check the list of candidates and make sure the desired one is selected. If not, the conversion key is repeated to select the next in the list. (Cursor keys can be used to scroll through the list in many IMEs.)
Oh, and, incidentally, you need to be able to decide for yourself whether the current reference to "apple" is best represented with hiragana (for usual words in modern times), katakana (for foreign words, or emphasis, or to indicate that there is something special -- foreign? -- about this apple), or Kanji (in many cases, you may have an option on which Kanji). Or maybe you actually want to use Romaji for some reason (although, in that case, you might have typed "appuru" instead). Options, options, options.
Does this sound like something you're going to have an easy time touch typing?
It gets worse. In order to improve efficiency, the method "learns" from the user and reorders the candidate list for you.
Professional data entry operators have professional input methods that do allow touch typing, but they take a lot of learning and training.
Oh, there is kana mode for the keyboard, where the 46 (erm, 50 plus or minus) kana are laid out on the qwerty keyboard, on the right-hand side of the keys. (I ought to put a link to something here, but I won't.) If you noticed that we have just laid kana out where there are numbers, that is correct. I am trying to learn to touch type the kana keyboard, but there are several versions with minor variations between them, and, really, the prevailing common sense is to use Romaji mode.
In kana mode, "ri-n-go" is three keys, but you have to hit a modifier key after the "ko" to voice it (to "go"), so that's four keystrokes. Even if I do learn to touch-type in kana mode, it is not that much more efficient. I'm just being a typing geek trying to learn to do that.
The efficiencies are essentially leveled by the conversion step.
Chinese is getting a phonetic conversion, but, historically, a stroke-radical input method has been preferred. Kanji (Hanji or Xanji when talking about Chinese) are constructed of moderately regular parts (radicals), but there are around 300 of those. That list of 300 is broken down for the keyboard. (Not unreasonable, most of the radicals are composites of simpler stroke sets.)
But you still end up de-parsing at the character and word level, where in Latin you mostly de-parse at the word level.
I'm not sure about China, but the typical Japanese attitude towards computer keyboards is that they would rather write and edit on paper and then type stuff in when they've got it fixed so that they can minimize typing. It takes considerable experience to get past the perceived inconveniences.
One of the reasons the English context worked well in developing computers was the paucity of characters. A small set of glyphs is generally an advantage, even at the cost of overloading the punctuation and such.
That extra layer of parsing makes it much more difficult to touch type while looking at a source document, especially with the stock (ahem, Microsoft) IME.
That's what the Unicode consortium wants you to believe.
(Yeah, I beg to differ. I work with this stuff. There are issues in ideographs that the Unicode Consortium is still either ignoring or not aware of.)
Japanese is typed using a more-or-less standard QWERTY keyboard.
Tediously.
I've seen my wife going at it (and Japanese people in all types of computerized businesses in Japan). They certainly don't have much of a problem. Kanji writing is a unique and complex task not easily amenable to typing on a keyboard. Hiragana and katakana are much more amenable, and most Japanese know how to write with Romaji. The software they use simply finds the appropriate kanji, hiragana and katakana after they type the corresponding Romaji.
So in essence, they are typing just as we do using a Roman alphabet, with the software doing the translation automatically, just as modern word processors automatically correct misspellings for you (and in both cases, the software gets it right most of the time).
This is more like a badly thought-out solution looking for a problem no one practical wants to even touch with a 10-foot pole.
I really, really don't think so. Different tools for different jobs - a language for writing reliable infrastructure should look very very different from a language for exploration of datasets, for example - the first one must place emphasis on reliability and performance, the second on flexibility. Eg adding members to data structures on the fly is a great idea in the second case, but not in the first.
Sure you can try to sweep that under 'different paradigms', and indeed you could mix two arbitrary languages in the same file using some delimited blocks for example, and call it 'one language with different paradigms', but why would you want to? The convoluted multi-paradigm monstrosity that is C++ is a terrible example to us all there, in my opinion.
I think instead the shape of the future will be more like all those different languages that compile on the JVM - jython, Scala, Lua, and whatnot. They compile into interoperable modules without extra hassle, so in each module you can use the right tool for the job at hand.
I have the same issue with Perl, I use it for a big project about once a year, even though I write small scripts (50 lines) with it regularly. And yes, I spend about half the time relearning or finding new ways to do things for the job. This also means my "style" has been very fluid over the years. I go back now to make changes to a 5000 line Perl program I made 7 or 8 years ago, and I'm like "wtf did I do it this way?".
That said, Unicode would make it even harder, since 99.9% of my programming is through an ssh shell. What I don't need is more characters and having to remember odd character key combinations.
Tequila: It's not just for breakfast anymore!
It seems to me that this is an editor problem. And a lot of the blame for the parlous state of editors at the moment can be laid at the feet of Cpp, the C preprocessor.
"In retrospect, maybe the worst aspect of Cpp is that it has stifled the development of programming environments for C. The anarchic and character-level operation of Cpp makes nontrivial tools for C and C++ larger, slower, less elegant, and less effective than one would have thought possible." - Stroustrup, Design and Evolution of C++.
We should have a much better view of a program than a bunch of files containing characters.
Sean Ellis
Follow OfQuack's antics on Twitter.
Sir, Please Step Away from the APL keyboard.
I'll give you my ASCII when you pry it from my cold, dead hands!
But seriously. EBCDIC would work just as well.
ASCII is bad enough, with hidden characters and tabs and spaces that look the same.
Where 1 & l & I or 0 & O or ' & ` are nearly identical in the wrong fonts.
How many times have I tried to compile only to get errors related to some invisible character that was imported from DOS or some guy's weird editor in Korea?
Really, we want simpler. 8 bits is 2 bits too many already, and this guy wants 16-bit or 24-bit characters.
Imagine 20 varieties of A that all look the same but behave differently!
I'm telling you right now: I will be doing ASCII for the rest of my life. I don't even like GUI IDEs; I still prefer VI!
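The "varieties of A" worry is easy to demonstrate. A small Python sketch, using three capital letters from different scripts that render almost identically in many fonts (the choice of these three is just an example):

```python
import unicodedata

# Latin A, Greek Alpha, Cyrillic A: near-identical glyphs in many fonts.
lookalikes = ["\u0041", "\u0391", "\u0410"]

# Print the code point and official name of each one.
for ch in lookalikes:
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)}")

# They are distinct code points, so string comparisons treat them as different.
all_distinct = len(set(lookalikes)) == 3
latin_equals_cyrillic = lookalikes[0] == lookalikes[2]
print(all_distinct, latin_equals_cyrillic)
```

A compiler or linker comparing identifiers code point by code point would treat these as three different names, which is exactly the kind of invisible bug described above.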
I am always doing that which I can not do, in order that I may learn how to do it. - Pablo Picasso
I used to be an APL guru, programming in VSAPL, APLSV and some other APL versions. APL allows super fast development for many kinds of problem solving. It could have a rebirth if we could extend ASCII.
Leslie Satenstein Montreal Quebec Canada
I'm going to go out on a limb and say the problem this guy has is probably more with editors and GUIs, as you mention in passing, than with ASCII.
:)
Editors can and SHOULD go much, much farther in mating code (functions written as they are now) with structure, that is, functions grouped and abstracted (and available to be edited) in groups that do not need a top to bottom representation. We ARE NOT talking diagrams here - that implies a tree or flow of direction. The flow is defined via function calls in ASCII, like normal, and no connecting lines are needed. What is needed is the idea of groups (or aspects, or categories) that can reveal structure on a less restricted plane (pun intended) than a top to bottom file structure. Files are an unnecessary and evil holdover from people building what they could, not what they wanted to. Think about it; the flow of a program in general depends NOT AT ALL on the linear presentation of code that's rampant (let's ignore Python's use of files-as-module, that can be tweaked).
The reason I bring this up is that the argument here is that the manipulation of symbols using ASCII is tedious, slow, constrictive. I submit the time spent and problems encountered ARE how symbols are presented and analyzed (esp.for people learning a code base for the first time) -- but on the function or class level, not the individual character.
I strongly believe in (and will eventually build, if I'm not beaten to it) editor environments that successfully understand code, to deliver us from the ultimate tyranny of the file itself.
And that's my rant
CS majors know the time/space tradeoff, but they never get taught the 3rd, crucial, tradeoff of the set: comprehension!
There are some esoteric non-English based programming languages out there. Just imagine the fun porting this and similar OSS programs!
cpghost at Cordula's Web.
You're a bit confused---Classical Chinese had the 'one word one character' thing, and Japanese has three character sets (five if you include Arabic numerals and the extensive use of the Roman alphabet).
Less confused than imprecise, I think. Both have a 'one word one character' set, with the confusion that in China you also had multiple spoken languages all using the same written language. China also ended up with phonetic characters as well, so there's a different character set. Thus my use of 'like 3 character sets', because how many sets they have depends on how you or your school defines them.
Book sales are amongst the highest in the world, and Japanese newspapers have the highest circulations in the world. The extra time spent reading in school must be paying off for the Japanese.
You have a point there. I used to be a book a day guy. Eyes can't take that anymore, unfortunately. I actually find a monitor easier to read from these days, but so many e-books are so annoying with the software that I prefer (free) fanfiction. Sure, there's a lot of dreck out there, but there's enough that the best 1% rivals commercial books.
I don't read AC A human right
I can see it now: code written in fonts that were only used once, and then no one ever wanted to use them again. I've got friends who did that, but they got better....
So, they're not teaching logic and functions anymore, it's all Magic! what computers do....
mark
Michael Kaplan has an interesting blog.
As unlikely as it sounds from context, he seems to care a great deal about correctness. It also paints a vivid picture of how hard Unicode is to get right.
By moving to Unicode, git, svn, bazaar, mercurial and cvs all have to be updated to understand how to treat Unicode files, which they can't (they'll treat them as binary), in order to identify lines that are added or removed rather than store the entire file on each revision. Bear in mind that you've just doubled (or quadrupled, for UCS-4) the amount of space required to store the revisions in the revision control systems' back-end database.
There is this thing called UTF-8 which VCS already handle just fine (including even humble CVS, afaik).
Not appreciably larger. No larger at all, for characters in ASCII set.
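A quick sketch of the size argument in Python. The two sample strings are made up for illustration: one is pure ASCII, the other starts with a Greek capital Delta.

```python
def encoded_sizes(text):
    """Return the byte length of text in three Unicode encodings."""
    return {enc: len(text.encode(enc)) for enc in ("utf-8", "utf-16-le", "utf-32-le")}


ascii_only = "for (i = 0; i < n; i++)"
with_greek = "\u0394Ct = target - reference"  # leading character is U+0394

ascii_sizes = encoded_sizes(ascii_only)
greek_sizes = encoded_sizes(with_greek)

# Pure ASCII costs one byte per character in UTF-8; the fixed-width
# encodings double or quadruple it.
print(len(ascii_only), ascii_sizes)

# One non-ASCII letter adds a single extra byte in UTF-8.
print(len(with_greek), greek_sizes)
```

The doubling and quadrupling only happen if the files are stored as UTF-16 or UTF-32; with UTF-8, an ASCII-only source file is byte-for-byte the same as before.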
Railways, like character sets, are one of those situations where "close" doesn't quite cut it.
The author admits writing the article in vi. Can we mod TFA as -1 Hypocritical?
> Everyone who tried to do something useful in APL, put up your hand.
APL is a wonderful language.
> Restricting digital storage to ones and zeros is needlessly polarizing
> and limiting. Why not allow a 0.5 bit value?
Word is the Russians tried to build trinary computers but the
magnetic cores wouldn't stay unmagnetized.
My stupid keyboard has redundant keys for the digits and a few others,
but no Umlaut, no Eszett and no Greek letters. Who designs this crap?
Some things can't even handle plain ASCII. Can anyone explain how
to google for "DVD-RW" or for "DVD+RW" without getting a gazillion
false hits? Google would be *so* much more useful if it handled
regular expressions.
A better way to go would be to *reduce* the number of symbols allowed in computer programs. This would reduce the number of errors, making programmers more productive.
Well, if Perl can get along with pure ASCII... And humans are definitely not descended from octopuses, so no.
The same thing we already do for user applications - use i18n mappings.
This could be done quite simply. Starting with the predefined symbols in the language ('+', 'sin', etc.), provide a translation table to any human language. Then at the top of the file, provide a DEFINE or equivalent for the language that says what human language the code is stored in (e.g., Greek).
Then the reader can open it in his/her own language, for example in English. Then it will look like it was originally coded in English, can be edited in English, then resubmitted. Then the person working in Greek would be able to open it and it would be in Greek for that person.
This can be extended to function names, etc. without too much additional work - the major work would be that the original coder or someone in the group that supports this open source (of course!! :) ) body of work, would have to build the proper translation tables.
This should actually be easier at this level than human languages, because at this level, programming languages have regular syntax, have fixed semantic content and lack idiomatic expressions.
Since there is already so much infrastructure for supporting and defining i18n translations, it should be relatively easy to modify the language compilers and interpreters to perform this step in a pre-parser.
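A toy sketch of such a pre-parser step in Python, assuming a BASIC-like language. The German keyword spellings and the translate_keywords helper are invented for illustration; a real implementation would hook into the language's actual tokenizer rather than use a regular expression.

```python
import re

# Hypothetical translation table: German spellings -> canonical keywords.
GERMAN_TO_CANONICAL = {
    "FÜR": "FOR",
    "BIS": "TO",
    "DRUCKE": "PRINT",
    "NÄCHSTES": "NEXT",
}


def translate_keywords(source, table):
    """Swap whole-word keywords using the table; leave quoted strings alone."""
    # Match either a double-quoted string or a run of word characters.
    pattern = re.compile(r'"[^"]*"|\w+')

    def swap(match):
        token = match.group(0)
        if token.startswith('"'):
            return token  # string literal: never translated
        return table.get(token, token)

    return pattern.sub(swap, source)


def invert(table):
    """Build the reverse table, for showing canonical code in a local language."""
    return {canonical: local for local, canonical in table.items()}


german = 'FÜR I = 1 BIS 3\nDRUCKE "BIS bald"\nNÄCHSTES I'
canonical = translate_keywords(german, GERMAN_TO_CANONICAL)
back_again = translate_keywords(canonical, invert(GERMAN_TO_CANONICAL))
print(canonical)
```

The word BIS inside the quoted string is left untouched, which is the minimum a pre-parser has to get right. Translating function names, as suggested above, would need a per-project table on top of this.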
I was going to suggest this for PHP6 but I don't know if I got around to posting it to the PHP6 team.
It's easier to be a result of the past, but more fun to be a cause of the future! http://www.spacefinancegroup.com/
If we want to go non-ASCII we could always switch to programming in APL (or maybe ObjectAPL), and have completely unreadable programs. Or else learn to program in Chinese.
By the taping of my glasses, something geeky this way passes
I vote for FIELDATA. Upper case letters only, just six bits per character. Aaaah, the days of core memory...