Why Haven't Special Character Sets Caught On?

\n cryptic? by Anonymous+Crowhead · 2005-10-17 08:55 · Score: 2, Insightful

And special characters wouldn't be?

Why? by wowbagger · 2005-10-17 08:55 · Score: 5, Insightful

Why are we not using characters that are:

Hard to generate on a standard keyboard
Not standardized in the specifications of the language.
Not standardized in the character sets of most non-bitmapped displays.
Not standardized in HTML markup.

Gosh, I don't know!

Now, if you will excuse me, I need to create a local variable named <The Symbol for the Artist Formerly Known as "The Artist Formerly Known As Prince">

--
www.eFax.com are spammers

Re:Why? by usrusr · 2005-10-17 12:14 · Score: 2, Interesting

fewer bugs?

i think you remember what happened when "they" introduced extended characters in the DNS: the only people who really used them were the phishers who could now create domain names with the new characters that looked very much the same as the names they were trying to imitate so browsers had to make a 180 for security reasons.

source code is a slightly different environment, but there it can already be difficult enough to visually distinguish between l and 1 in many fonts, or the various variations of an empty circle (oO0). You may argue that nobody willingly uses variable names like ll1O0, l1100 and lllOO, but rare as they are, those bugs _will_ happen more often once people start using exotic characters in source. source code has to be as unambiguous as possible on every level.
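A minimal Python sketch of the look-alike problem described above. The sample word, the choice of Cyrillic letters, and the helper name `scripts_used` are illustrative assumptions, not anything from the post, and the script check is a rough heuristic based on Unicode character names rather than a real confusables detector.

```python
import unicodedata

latin = "paypal"
# Same word, but with Cyrillic ER (U+0440) and Cyrillic A (U+0430)
# swapped in for the Latin letters p and a. In many fonts the two
# strings render almost identically.
mixed = "\u0440\u0430y\u0440\u0430l"


def scripts_used(text):
    """Rough script guess: the first word of each character's Unicode name."""
    return {unicodedata.name(ch).split()[0] for ch in text}


print(latin == mixed)        # the strings compare as different
print(scripts_used(latin))
print(scripts_used(mixed))
```

Flagging identifiers that mix scripts is one cheap warning sign, but it would not catch the plain-ASCII l/1 and O/0 cases mentioned in the same paragraph.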

ps: and while standard encodings exist it is still commonly seen that nonstandard characters get lost, for example in cvs ("why waste time on configuring when only some umlauts in comments get lost?"). but well, there are "standards", not "one standard", and that makes a great difference.

--
[i have an opinion and i am not afraid to use it]

Input method simplicity by Kelson · 2005-10-17 08:56 · Score: 3, Insightful

In programming? Most languages seem to be designed with ASCII in mind, so you have to stick with what's available there.

In general? I think it's a matter of input methods. Give me an input method where it takes only two keystrokes to type "" and I'll use it instead of "NE" or "". If I need to use a vulcan death grip, remember a code, or find it in a character map, I'm only going to bother when I have motivation: either making a point, like earlier in this paragraph, or making a polished document. Why go to the effort in a casual email, or a forum post, when it's much easier to type "" instead?

Re:Input method simplicity by Eric+Giguere · 2005-10-17 09:03 · Score: 2, Insightful

If I need to use a vulcan death grip

If you think emacs editing sequences are obscure now, imagine how much more fun they'd be with all those "special characters"...

If you're a touch typist, you really want to minimize the number of keys you have to press simultaneously to get something done, especially if you can't use hands separately to do it. Typing two or more normal characters together is much easier.
Eric
Get some stroller advice here

Why haven't dvorak keyboards caught on? by Neil+Blender · 2005-10-17 08:58 · Score: 2

I mean, they're better, right?

Argh! Here's another reason! by Kelson · 2005-10-17 08:59 · Score: 4, Funny

I entered an actual not-equal sign in that post, and Slashcode stripped it out!

YOU might be by DrSkwid · 2005-10-17 09:02 · Score: 2, Funny

my OS is where UTF8 [pdf] was invented.

--
There are places where the networks are not touching,and there are places where they are-Boeing's Lori Gunter

Take a step back and look at this question again by LeninZhiv · 2005-10-17 09:06 · Score: 4, Insightful

\n is cryptic and APL isn't?

I'd say it's more a question of 'choose your poison'. There is a learning curve whether one aims at mathematics-based notation schemes or historical computer science notations, and the market has already chosen (30 years ago) which one it prefers.

And not without cause. Human language looks a lot more like modern programming languages than mathematical notation, and a major goal of programming language design is to make it as straightforward as possible to tell the computer what you want it to do. One might object that by that argument Cobol is better than C, but humans, especially experts working in a specific domain, like abbreviations too. Cobol is hated because it doesn't allow you to abbreviate, not because it is hard to read, after all. APL or other such specialised syntaxes are hard to read and they don't fit closely enough with the way non-mathematicians think to be intuitive.

Listen to me by Profane+MuthaFucka · 2005-10-17 09:08 · Score: 5, Interesting

Now sonny, sit down a second and listen to grandpa rant about the good old days. The truth is, when I talk about the good old days, it's not because the days were actually good. It's because I have a sucky memory and questionable taste.

Now it is TRUE that I once did do programming in APL. This was on an old Zenith 8088 based PC clone with 640K of memory, a CGA display, and a 20 meg hard drive. The system itself worked rather well. If you could work a line editor, the development environment was all you could want. The problem was all the little stickers that went on the keys. Every key mapped to about three other symbols besides the normal ones, and just about every key had a little sticker on it. It was NOT fun. Just because your computers can display characters that look like Chinese doesn't mean that it's a good idea.

--
Fascism trolls keeping me up every night. When I starts a preachin', he HITS ME WITH HIS REICH!

Efficient by Threni · 2005-10-17 09:09 · Score: 2, Insightful

Because we don't need to change for the sake of it, to a system which isn't supported by a lot of software and hardware. Why not just change your software to interpret the characters as an image, like some already does with smilies?

Simple by fm6 · 2005-10-17 09:11 · Score: 4, Insightful

Same reason the Dvorak keyboard has never caught on -- nobody wants to learn to type all over again.

Display was never the issue with APL. There are implementations of APL that use keywords instead of symbols. It's just that turning everything into an operator makes for really dense, hard-to-maintain code.

I'm reminded of Forth, which lacks APL's weird symbols, but shares its reputation for dense code. In its heyday, Forth programmers justified using it by claiming it made them more productive. And that's true — if you define "productivity" as "number of lines of new code hacked out per day". But code isn't just written, it's maintained, and dense languages are not maintenance friendly.

Because of old and crappy software, and laziness by metamatic · 2005-10-17 09:13 · Score: 2, Insightful

Because standardization of extended character sets, via Unicode, is a relatively recent development. Hence, there's a lot of software around that still doesn't handle Unicode.

For example, I switched to bash because tcsh didn't cope with Unicode. Mozilla's Unicode support is incomplete--card symbols defined in the HTML 4.01 standard don't show up properly on the Mac, even though it definitely has them in its standard fonts. Many text editors don't support Unicode. And so on.

In fact, it's only recently that Slashdot was fixed to allow us to use words like "cliché" and enter amounts of money in Pounds Sterling like £5.99, even though those 'special' characters were part of HTML 1.0. Forget about using the aforementioned card symbols on Slashdot—we got 1996's CSS a couple of months ago, maybe we'll get 1999's HTML 4 in 2008?
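For reference, a small Python sketch of what those HTML entities actually stand for. It uses the standard library's `html.unescape`; the particular list of entities is just an illustrative pick based on the characters mentioned in this post.

```python
import html

# Named HTML entities for an accented letter, the pound sign,
# and two of the card-suit symbols.
entities = ["&eacute;", "&pound;", "&spades;", "&hearts;"]

for entity in entities:
    ch = html.unescape(entity)
    # Show the character, its code point, and its UTF-8 byte sequence.
    print(entity, ch, f"U+{ord(ch):04X}", ch.encode("utf-8").hex())
```

The accented letter and the pound sign fit in Latin-1, while the card suits need a wider encoding such as UTF-8, which may be part of why support for them lagged.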

Next you add in the fact that most people are too lazy to even learn to spell correctly, far less learn how to type an e with an acute accent, and you have a recipe for today's state of the web.

--
GCHQ Quantum Insert installed. If only our tongues were made of glass, how much more careful we would be when we speak

it is really a deep one. by torpor · 2005-10-17 09:17 · Score: 4, Insightful

{disclaimer: i'm a closet fontographer.}

i've thought about this question since 1978, as i have encountered over the years since then a grand litany of different ways of describing symbols in such a way that they can be standardly used, and i have come to a very simple answer. humans are stuck on a symbol treadmill with infinitely smooth bearings.

fontography is a lesson of symbols .. and the description of these symbols is limited by strict hardware limits: economic, social, cultural elements all have a part to play in the definition of input devices. where i say QWERTYZXCV, you say QWERTZYXCV.

we haven't seen terribly wide-spread specialization of symbols because of the producer-/consumer- cults of USKEY101, and people's unfamiliarity with alt-numkeypad chops, and Mac vs. PC, and ASCII vs. UTF-8, and XML vs. .bin, and "X" vs. "Y", blah blah, ad infinitum..

the fact is, perhaps deep down inside we know we should be grateful for what we've got, and let the "!=" and ">=" expressions, 2 lonely bytes in a vast nasty sea, stand as testament to the human desire to at least, a little bit, get along on the same key. they may not be pretty, but pretty much everyone can get to those two bytes and use them when they need to .. it's only a tiny clique that can do the alt-numpad thing, and even fewer who choose to jump out of the ASCII pool and towel off..

--
; -- the corruption of government starts with its secrets. a truly free people keep no secrets. --

data entry by TheSHAD0W · 2005-10-17 09:30 · Score: 2, Insightful

I think a large part of it is because, even if we have the ability to display the characters, we don't have a convenient way to enter them. The keyboard doesn't have a Sine symbol key. Further, expanding the keyboard to include these symbols will just make it unwieldy. I suppose one could have the display automatically convert sequences into special characters, much like modern word processors perform auto-superscript, but this might cause problems when editing. I personally prefer it as-is.
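A rough Python sketch of the display-side conversion idea above, and of why it might cause trouble when editing. The `DIGRAPHS` table and the function names `prettify` and `plainify` are hypothetical, made up for this illustration.

```python
# Hypothetical table: ASCII sequence -> single display symbol.
DIGRAPHS = {
    "!=": "\u2260",  # NOT EQUAL TO
    "<=": "\u2264",  # LESS-THAN OR EQUAL TO
    ">=": "\u2265",  # GREATER-THAN OR EQUAL TO
}


def prettify(source):
    """Replace each ASCII digraph with its single-character symbol."""
    for ascii_seq, symbol in DIGRAPHS.items():
        source = source.replace(ascii_seq, symbol)
    return source


def plainify(text):
    """Reverse of prettify: turn the symbols back into ASCII digraphs."""
    for ascii_seq, symbol in DIGRAPHS.items():
        text = text.replace(symbol, ascii_seq)
    return text


line = "if a != b and c >= d:"
shown = prettify(line)
print(shown)
print(len(line), len(shown))    # the displayed line is shorter than the stored one
print(plainify(shown) == line)  # round trip back to plain ASCII
```

The stored text and the displayed text no longer have the same length, so cursor columns and backspace behaviour have to be mapped between the two. This naive version would also rewrite digraphs inside string literals and comments.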

Lowest common denominator by Craig+Maloney · 2005-10-17 09:31 · Score: 4, Insightful

It's pretty simple: Lowest common denominator. Creating special character sets creates incompatibilities with other machines out there. That's why ASCII was such a boon, and why character sets like PETASCII, ATASCII, and others fell by the wayside. (And if you really want some character set fun, try EBCDIC sometime).
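To see the incompatibility concretely, here is a small Python sketch comparing ASCII with one EBCDIC code page. The choice of `cp037` (one of several EBCDIC variants shipped with Python's standard codecs) and the sample text are assumptions for illustration only.

```python
text = "Hello"

ascii_bytes = text.encode("ascii")
ebcdic_bytes = text.encode("cp037")  # cp037: one common EBCDIC code page

# Same text, completely different byte values.
print(ascii_bytes.hex())
print(ebcdic_bytes.hex())

# Reading EBCDIC bytes with the wrong decoder gives gibberish,
# while the matching decoder recovers the text.
print(repr(ebcdic_bytes.decode("latin-1")))
print(ebcdic_bytes.decode("cp037"))
```

Two machines that disagree on the table cannot exchange even plain letters without a conversion step, which is the lowest-common-denominator argument in miniature.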

Re:Which characters by AuMatar · 2005-10-17 10:27 · Score: 2, Funny

No, you have the right font. That's what perl always looks like.

--
I still have more fans than freaks. WTF is wrong with you people?

Take no RISCs by mnmn · 2005-10-17 10:35 · Score: 2, Funny

Have a large number of individual characters rather than a few characters that can be combined in many ways?

Why, you sound like you're in favor of CISC.

--
"Give orange me give eat orange me eat orange give me eat orange give me you." -Nim Chimpsky

And music is hard to read for non-musicians by Quarters · 2005-10-17 10:47 · Score: 3, Insightful

\r\n, =, !=, etc... make sense to programmers. They understand the language. Just like the design of 32nd, 16th, 8th, 1/4, 1/2, and whole notes, along with extra notation to modify their true length of play and volume, makes sense to musicians. Why waste time and effort to make it readable for the masses when the masses probably don't care? If they did they'd learn to read the language.

Special Characters by russ_allegro · 2005-10-17 12:00 · Score: 2, Interesting

What might be interesting is if you can have your keyboard switch modes.

I could put the keyboard in math notation and automatically the keys on the keyboard display math symbols in a standardised pattern (like QWERTY is for letters but for math). Other modes could be added later.

On slashdot a few months back there was a keyboard in which the labels on the keys are dynamic. I think that is going in the interesting direction.

It reminds me of maybe how the computers in Star Trek Next Gen might behave. Where the terminal/key layout is specific to what you are doing (Engineering, medical, etc).

Basically instead of having a stupid windows dialogs where you click on stuff, you can use the keyboard designed to do your task.

It also amazes me seeing asian languages being typed in a computer.

droolings of an idiot by epine · 2005-10-17 14:12 · Score: 2, Interesting

The concept that APL code is "hard to maintain" is correct to first approximation, but it's more myth than reality when one digs deeper into the question. Most of the densest lines of code I once concocted in APL were 100% maintenance free: efficient and correct over the entire usable operand range. The density of the code squeezed out many degrees of freedom for making stupid errors even before you began.

There were other factors, having little to do with code density, that made APL systems hard to maintain. One was the psychological feeling that writing twenty lines of comments to describe one line of code was somehow ungainly. I overcame this feeling within myself rather early. In fact, I wrote so many lines of comments for each line of code that my first work-term supervisor wrote a program to crawl through all the functions in one of my workspaces to *remove* every line of comment I had written, because he somehow thought he would understand my code better if he could fit it all onto the screen at once. His problem was that he didn't understand the *concepts* in my code. One of the things about raw APL code is that there are few surface markers that distinguish necessary manipulations from deep concepts. In my C language code, the necessary manipulations are largely gathered together in the initialization statements for local variables. Well, how much should a language design be based on protecting the programming team from supreme idiocy?

The second factor that made APL hard to maintain is that it tried to force every concept to become a nail. The array primitive was surprisingly powerful, but it just didn't handle certain kinds of data aggregates at all well. And neither could you push this structure in the lexical direction, because there was no regex facility either.

And finally, the notion of a "workspace" was itself suspect. Every function was its own text. There was no text anywhere that declared or described or controlled all the global variables that the workspace would necessarily include. There was no textual grouping of related functions into a higher-order interface or language facility. These decisions were made because APL originated in the teletype era. It had nothing to do with expressive density.

I think there is also an illusion at work that if you spend the day performing a maintenance task by visiting twenty source files and pawing through several thousand lines of code, you are working with some greater efficiency than the guy who spent the whole day staring at a single screenful of starkly beautiful APL scratchings. That's not so obvious to me, having been there.

OK, here's the thing. In APL if a programmer decided to cut corners and forsake the "stark beauty" that made APL a workable language, there was precious little left behind on the surface to betray the sloppy work standard. Take one look at a C program written by a programmer in a sloppy mindset, and you know right away you are maintaining the droolings of an idiot. In APL, it could take you an hour to parse beneath the surface to find that same incompetence leaping out. The C language has far more expressive scope for the droolings of idiots and I guess those markers are worth a lot at the end of the day.

Re:Questioning the Status Quo by TheSkepticalOptimist · 2005-10-17 14:16 · Score: 2, Informative

It's not the compiler programmers have to worry about.

Sure, a compiler could in essence sort out a file written in a dozen different programming languages, but imagine a team of developers all with different programming backgrounds trying to figure out what each coded? Software design would cease to work.

Software language is like spoken language in general: we all need a set of syntax and grammar rules so we can simply understand each other and effectively communicate. If you write a book using a random assortment of languages, is anybody going to read it? Talk to someone in 4 different languages and are you going to have an understandable conversation?

Also, in reality, a compiler CAN'T actually be designed to parse and compile a file written in a variety of languages. Symbols and characters mean different things in different languages. How do you know when a code statement is complete? C uses the ; character, other languages rely on a linefeed or some other delimiting character. Some languages even impose restrictions on how you place tabs in code. How is a compiler going to know you're intending a line to be written in Java while the next is in Pascal, then the next in C? How will a Java object be referenced by a C pointer or a FORTRAN formula, or interact with other languages in the same program? Compilers need a rigid set of rules in order to parse code properly, and the syntax is important in order to ensure there are no errors while writing code or generating the machine language.

There are really no specialized languages these days. Fortran may have been used for math/science based projects, COBOL for business, but most people could easily link a library of C functions that offer enhanced math capabilities even though C wasn't specialized for math. A good generalized language allows you to write the specialized code using it without limitations.

Finally, I don't want to have to learn a dozen different languages to get the job done. I specialize in C++ because I find it an effective way to develop the applications I am commissioned to write. While I can easily adapt to other languages or scripts for specific purposes, I don't want to have to learn FORTRAN simply to write some math formulas into my app. And I would shudder the day LISP makes a comeback in any way, shape or form. Learning one language is easier than a dozen, and keeping specialized means you're more effective and adept in using that one language. It's the old "Jack of all trades, master of none" adage: specialize or get left behind.

Lastly, we could use single characters to represent those "klunky" two character symbols. But why? What is the beef with writing != for inequality? Would using single characters make the language easier to write or understand? You're assuming that all people can easily recognize a single character symbol as meaningful. Looking at the APL programming language, I couldn't understand half the characters and I have been programming for 12 years. Again, as someone mentioned, if you're a software developer != and such are easily understandable and I don't think we need to rewrite software tools and keyboards to make a few people happy, yes, even if Apple released a GUI OS so many years ago (what that has to do with anything, I don't know).

--
I haven't thought of anything clever to put here, but then again most of you haven't either.

Re: \n as newline by some+guy+I+know · 2005-10-17 19:44 · Score: 3, Insightful

And for non-visual characters like 'newline'.... what other idea, exactly, did you have?

How about U+2424?
Actually, that's the symbol for a graphic representing a newline (a slightly raised N next to a slightly lowered L, shrunk and crammed together into an area approximately a single em-space wide), so maybe that's not such a good idea (as how would you represent the graphic itself in a string?).
OTOH, a \ followed by U+2424 could better represent a newline graphically in a string.

The reason that \n seems "pretty straightforward" is that most of us are used to it.
The concept of backslash followed by a letter representing a control character started in C in the early 1970s (or possibly even in earlier languages), and has been copied into dozens of other languages, along with other things like using % in printf strings to format variables (although some languages, like Ruby, are starting to offer alternative representations to %).
Note that, in Common LISP, a newline is represented by ~% and ~& in formatting strings, and #\Newline (spelled just that way) represents a newline character outside of formatting strings.
In Object Pascal/Delphi, a newline is represented by its decimal or hexadecimal equivalent, #10 or #$0A.
Some languages, like Python and sh/ksh/bash/etc., allow an actual newline in a string itself, so no representation is necessary (although Python allows \n as well, in its non-raw strings).
Other representations that I have seen in the past include ^J and ^M^J (for line feed and carriage return/line feed as control characters) and $ (for end-of-line in regular expressions (although the $ doesn't (usually) match the actual newline itself)) and in "list" mode in vi.
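A few of the representations listed above can be checked against each other in Python. This is only a sketch covering the Python-visible ones; the helper `caret` is a made-up name for decoding ^X control-character notation.

```python
import unicodedata

newline = "\n"

# Escape, decimal code, and hex code all name the same character.
print(newline == chr(10) == "\x0a" == "\u000a")

# A raw string keeps the backslash and the letter as two characters.
print(len(r"\n"), len("\n"))

# A literal line break inside a triple-quoted string is the same newline.
literal = """a
b"""
print(literal == "a\nb")


def caret(letter):
    """Decode ^X caret notation: control code = letter's code minus 64."""
    return chr(ord(letter.upper()) - 64)


print(caret("J") == "\n", caret("M") == "\r")

# U+2424 is a visible picture of a newline, not a newline itself.
picture = "\u2424"
print(unicodedata.name(picture), picture == newline)
```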

--
Those who sacrifice security to condemn liberty deserve to repeat history or something. - Benjamin Santayana

I disagree by gurps_npc · 2005-10-18 03:31 · Score: 2, Insightful

Most of you people are listing problems related to keyboards.

That demonstrates a lack of vision.

MAKE A NEW KEYBOARD.

Not that hard to do. Almost all computers have function keys on top. The majority of users DON'T USE THEM.

Just print up some new keyboards that have single symbols representing the major programmer stuff, such as >=. To use them, print them above the F1, F2, F3, etc. and access them by typing shift-F1, etc. Allow them to be overridden by programs that want to override them.

If Apple did this, it would catch on instantly. In one year, Microsoft would steal the idea.

--
excitingthingstodo.blogspot.com

Re:I disagree by gurps_npc · 2005-10-18 06:55 · Score: 2, Insightful

Our job is not to design a new keyboard for all languages. Non-english speakers already make their own keyboards. But for English speakers, there are a bunch of simple symbols that should definitely go in.
...Math...
Greater than or equal to
Less than or equal to.
Not equal to.
...Programming...
New line symbol.
Is it Alphabetically equal to (does not set, only used for asking. Equivalent to EQ, could co-opt the wavy equal sign)
Is it Numerically equal to (does not set, only used for asking. Equivalent to == in many computer languages, could co-opt the triple line equal sign.)
Then there are the common symbols that are not on the keyboard. These include
paragraph mark
pound mark
the cross used to signify footnotes
The copyright mark
the registered trademark
the small circle indicating temperature.
These 12 symbols are used throughout the english world. Again, the idea is NOT to make an english keyboard usable for other languages, but instead to expand the use of the keyboard to include the 12 most common symbols used within the english world. Non-english language keyboards should of course expand their own keyboard, but that is up to them, not those of us that speak english.

--
excitingthingstodo.blogspot.com

Re:Take a step back and look at this question agai by crmartin · 2005-10-18 10:27 · Score: 2, Informative

No, you're not my son, you're just another young moron who thinks links reflect knowledge. Of course, if you read your links you'll see that COBOL was driven by FLOW-MATIC; Java wasn't designed by a committee, but the version of C you've most certainly used was; and that LISP, FORTRAN, and COBOL are in fact exactly contemporary.

If you had much deep knowledge of programming languages --- or had read the links you posted --- you'd also realize that Java has more in common with Smalltalk than pretty much any other conventional language in any way except syntax.

What you probably don't know is that I stopped sleeping with your mother when I realized she'd have children that were ugly and dress funny.

Slashdot Mirror
