Unicode 6.1 Released
An anonymous reader writes "The latest version of the Unicode standard (v. 6.1.0) was officially released January 31. The latest version includes 732 new characters, including seven brand new scripts. It also adds support for distinguishing emoji-style and text-style symbols and emoticons with variation selectors, updates to the line-breaking algorithm to more accurately reflect Japanese and Hebrew texts, and updates other algorithms and technical notes to reflect new characters and newly documented text behaviors."
Unicode seems to break everything and is completely unnecessary.
13 new emoticons1!1! http://www.unicode.org/charts/PDF/Unicode-6.1/U61-1F600.pdf
Take a good look at glyph 27cb aka \diagup part of the Misc Math Symbols. People are gonna try embedding that in html now. Can't wait.
"Science flies us to the moon. Religion flies us into buildings." - Victor Stenger
has got to be the Love Hotel.
Does anyone know why this is even there?
Before anyone chimes in complaining that Slashdot doesn't even support an old version of Unicode, this is for several reasons. For one thing, there was once a fad of posting pornographic ASCII art on Slashdot, so it appears Slashdot disallows any character that would be more useful for glyph art than for English text. For another, there was once a fad of using bidirectionality override control characters for turning text backwards, which would break the layout and allow spoofing a comment's moderation score.
Seriously, emoticons? Who ever thought it a good idea to include those in a standard? Should we have an encoding for hearts as dots over lower case i as well? And little horseys, too? And y with a big tail that wraps around to the front of the word?
Put my fist through my alarm clock with its ding-dong death inside my ear. - The Blackjacks.
Raise your hand if you couldn't code a parser that detects those characters and takes appropriate action, such as popping bidi characters.
🙋 If I were writing such a parser, I don't know how I'd get it to automatically check for the release of a new version of the standard and determine which code points are new bidi characters to be popped.
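For what it's worth, here is a minimal sketch of one way around the "new version" problem: ask the runtime's own copy of the Unicode Character Database for each character's bidi class instead of hard-coding code points. It strips the explicit embedding, override, and isolate controls rather than balancing them with pops. The function name and the sample string are made up for illustration, and the result is only as current as the Unicode data shipped with the Python build in use.

```python
import unicodedata

# Bidi classes of the explicit embedding, override, and isolate controls.
# Which code points carry these classes is looked up in the runtime's
# Unicode data, so no code point list is hard-coded here.
BIDI_CONTROL_CLASSES = {"LRE", "RLE", "LRO", "RLO", "PDF",
                        "LRI", "RLI", "FSI", "PDI"}


def strip_bidi_controls(text: str) -> str:
    """Return text with explicit bidi formatting characters removed."""
    return "".join(
        ch for ch in text
        if unicodedata.bidirectional(ch) not in BIDI_CONTROL_CLASSES
    )


# Hypothetical comment that uses U+202E (right-to-left override)
# to reverse the display order of what follows it.
spoofed = "Score:\u202e5 ,gnitseretnI\u202c"
print(strip_bidi_controls(spoofed))
```

Directional marks such as U+200E and U+200F have ordinary strong classes (L and R), so this sketch leaves them alone.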
I'd love to be able to write IPA when discussing pronunciation
It'd be nice but not necessary: X-SAMPA.
or actually write out words in other languages
I guess the rationale is that most moderators would not be able to read foreign words without transliteration into Latin characters.
pound and yen signs for currency
£ is Alt+0163 on a Windows machine, and ¥ is Alt+0165. They're probably Ctrl+Shift+U A 3 Enter and Ctrl+Shift+U A 5 Enter on a Linux machine, but I don't have one in front of me right this minute with which to test.
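A quick sanity check of the arithmetic, as a sketch: the Alt+0nnn codes are decimal values in the Windows ANSI code page, while Ctrl+Shift+U takes the hexadecimal code point. The helper name is hypothetical, and cp1252 is assumed to be the active ANSI code page, which is typical for Western European and US installs but not universal.

```python
def alt_code_to_char(code: int, codepage: str = "cp1252") -> str:
    """Decode a decimal Alt+0nnn code as one byte of the given code page."""
    return bytes([code]).decode(codepage)


for decimal_code in (163, 165):
    ch = alt_code_to_char(decimal_code)
    # ord() gives the Unicode code point; format it as hex for Ctrl+Shift+U.
    print(f"Alt+0{decimal_code} gives {ch}, code point U+{ord(ch):04X}")
```

Decimal 163 and 165 are hexadecimal A3 and A5, which matches the Ctrl+Shift+U sequences guessed above.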
Trolls gonna troll; that's what moderation is for.
At one point, ASCII art spammers were filling pages with sexually explicit ASCII art, such as Goatse, male masturbation, and birds perched on a penis, so fast that moderators could not keep up.
So filter those character ranges.
Blacklisting doesn't work because the next version of the standard, such as Unicode 6.1, may introduce more undesirable character ranges.
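The alternative is an allowlist. Below is a rough sketch, not anything Slashdot is known to run: keep only characters whose general category is on an approved list, so anything unassigned or unanticipated is dropped by default. The category set and function names are assumptions picked for illustration. A category-based list still admits newly encoded letters once the runtime's Unicode data is updated, which may or may not be what a site wants.

```python
import unicodedata

# Hypothetical allowlist of general categories that ordinary prose needs.
ALLOWED_CATEGORIES = {
    "Lu", "Ll", "Lt", "Lm", "Lo",              # letters
    "Mn", "Mc",                                  # combining marks
    "Nd",                                        # decimal digits
    "Pc", "Pd", "Ps", "Pe", "Pi", "Pf", "Po",  # punctuation
    "Sc",                                        # currency symbols
    "Zs",                                        # space separators
}


def is_allowed(ch: str) -> bool:
    """Allow newlines plus any character in an approved category."""
    return ch == "\n" or unicodedata.category(ch) in ALLOWED_CATEGORIES


def filter_comment(text: str) -> str:
    """Drop every character that is not explicitly allowed."""
    return "".join(ch for ch in text if is_allowed(ch))


# Block elements (category So) and bidi overrides (category Cf) fall out;
# accented letters and the pound sign stay.
print(filter_comment("caf\u00e9 \u00a35 \u2588\u2588 \u202eabc"))
```

Unassigned code points report category Cn, so characters the runtime has never heard of are rejected without anyone having to update a blocklist.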
Standardise the world on English. It'll be easier. It's already the second-most-spoken language, and Chinese is a real nightmare of character encoding in itself. Then we can go back to good old ASCII.
Seriously, emoticons? Who ever thought it a good idea to include those in a standard?
Unicode had to be able to round-trip (losslessly encode and decode) all old popular encodings. This includes the encoding now called "code page 437", introduced with the first IBM PC, which includes a smile emoticon at code value 0x01. It also includes the encodings associated with the widely distributed system fonts Zapf Dingbats and Wingdings.
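To make "round-trip" concrete, here is a small sketch using Python's built-in cp437 codec: decode every possible byte to Unicode, encode it back, and check that nothing changed. The helper name is made up. One caveat: as far as I know, Python's codec treats the bytes below 0x20 as control codes, so 0x01 decodes to U+0001 there rather than to the smiling face glyph the IBM PC displayed (U+263A in Unicode).

```python
import unicodedata


def round_trips(data: bytes, encoding: str) -> bool:
    """True if decoding and then re-encoding reproduces the original bytes."""
    return data.decode(encoding).encode(encoding) == data


every_byte = bytes(range(256))
print(round_trips(every_byte, "cp437"))

# The character Unicode provides for the IBM PC smiley glyph.
print(unicodedata.name("\u263a"))
```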
Because my browser doesn't support Unicode 6.1 yet...
all the Tetris pieces
The polyominoes up to five squares can be composed from U+2580 (upper half block), U+2584 (lower half block), and U+2588 (full block) characters. Unicode tends not to introduce precomposed ligatures except when needed for round-tripping with pre-Unicode encodings.
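A sketch of how that composition could work: treat each character cell as two squares stacked vertically, and pick the upper half block, lower half block, full block, or a space depending on which of the two squares is filled. The grid representation and function name are my own invention for illustration.

```python
UPPER, LOWER, FULL = "\u2580", "\u2584", "\u2588"


def render(grid):
    """Render rows of 0/1 squares, packing two grid rows into each text line."""
    rows = [list(row) for row in grid]
    if len(rows) % 2:
        rows.append([0] * len(rows[0]))  # pad to an even number of rows
    lines = []
    for top, bottom in zip(rows[0::2], rows[1::2]):
        line = ""
        for t, b in zip(top, bottom):
            if t and b:
                line += FULL
            elif t:
                line += UPPER
            elif b:
                line += LOWER
            else:
                line += " "
        lines.append(line)
    return "\n".join(lines)


# The T tetromino: three squares across, one below the middle.
T_TETROMINO = [(1, 1, 1),
               (0, 1, 0)]
print(render(T_TETROMINO))
```

For the T tetromino this yields an upper half block, a full block, and another upper half block on a single line. Since terminal cells are usually about twice as tall as they are wide, each half block comes out roughly square.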
glyphs of game pieces of all well known games
A lot of well-known pre-1923 tabletop games' game pieces already exist in Unicode. Chess is U+2654 through U+265F, and Checkers is U+26C0 through U+26C3. A lot of game pieces are simple enough in form that the Geometric Shapes (U+25A0 through U+25FF) represent them just fine. For example, Othello is U+25CB and U+25CF, as is Connect Four. Even the enemy in Fast Eddie for Atari 2600 is in Miscellaneous Technical (U+237E) as is home plate in Baseball (U+2302).
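Anyone who wants to check those code points can ask Python's `unicodedata` module for the official character names. This is only a lookup sketch; the list below just repeats the code points cited above, and the helper name is hypothetical.

```python
import unicodedata

# Code points mentioned in the comment above.
CITED = [0x2654, 0x265F, 0x26C0, 0x26C3, 0x25CB, 0x25CF, 0x237E, 0x2302]


def describe(code_point: int) -> str:
    """Format a code point as 'U+XXXX <char> <official name>'."""
    ch = chr(code_point)
    return f"U+{code_point:04X} {ch} {unicodedata.name(ch)}"


for cp in CITED:
    print(describe(cp))
```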
heck, instead of just the suit symbols why not 52 glyphs for a standard deck of cards
Those can already be composed from a Basic Latin letter or number and a suit symbol. Unicode tends not to introduce precomposed ligatures except when needed for round-tripping with pre-Unicode encodings.
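As a sketch of that composition, a full deck falls out of thirteen rank strings and the four suit symbols already in Miscellaneous Symbols, with no dedicated card characters involved. The rank spellings and function name are just one possible choice.

```python
RANKS = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]
# Spade, heart, diamond, club (the solid "black" suit symbols).
SUITS = ["\u2660", "\u2665", "\u2666", "\u2663"]


def deck():
    """Return all 52 cards as rank followed by suit symbol."""
    return [rank + suit for suit in SUITS for rank in RANKS]


cards = deck()
print(len(cards))
print(" ".join(cards[:13]))
```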
throw the Major Arcana tarot cards in there too
I don't know about Tarot, but all twelve signs of the zodiac are in Miscellaneous Symbols, even the "69" looking sign of Cancer (U+264B).
gang symbols
The symbol of "Folk Nation" gangs is similar to that of Judaism: a Star of David (U+2721). The symbol of "People Nation" gangs is similar to that of Islam: a 5-point star and crescent (U+262A).
http://xkcd.com/927/
Unicode has different *pages*. You can filter by page.
New versions of Unicode introduce new pages. If you're blocking a page for some reason, the next version of Unicode might introduce another page that extends the functionality of the old page, reintroducing the behavior that led you to block the old page.
What's stopping us from just creating a Greasemonkey script that translates back and forth from HTML with square brackets and allows the full HTML set
Slashdot's lameness filter would probably confuse those square brackets with ASCII art, and even if not, the comment would likely draw negative moderations from moderators who haven't installed the Greasemonkey script.
by putting every message in its own e.g. IFRAME
There was a time when hundreds of <iframe> elements on a page would cause the browser to become unusably slow or even crash. I reported this to bugzilla.mozilla.org as Bug 103649, and a decade later it's still not RESOLVED FIXED. And are you going to put the subject of a comment in its own iframe too?
and force a maximum size on the comment content.
Until April 2014, when IE 6 passes out of extended support, one can't assume that all supported browsers support CSS max-width.
If they wrote a brilliant paragraph a day ago, then deleted it in the morning, they can view the document as it existed yesterday, copy the paragraph back out, and be done with it.
For one thing, an application that saves (and sends) a document's undo history along with the document can disclose things that the document's author did not want to disclose. I seem to vaguely remember scandals with Word's AutoRecover being used to recover redacted parts of a document. For another, how much of the limited space on the drive should be dedicated to saving a document's undo history since creation, especially when the document is a large layered picture or multitrack audio project?
And that's because people forget to save - why not have the OS do it for them?
I agree, but how often should the OS spin up the hard drive to do so?
Well said, that man. If you feel the desire to "write" with stick figures and squiggles use a bastarding graphic, for fuck's sake.
Eklinóringëon my arse.
Confucius say, "Find worm in apple - bad. Find half a worm - worse."
Àçcênts aré easy (if you have Windows). See http://vulpeculox.net/ax.
Works for 'any' application. Free. No stupid picking or codes.
In a toolbar full of icons, the word "Save" or its localization without an icon will probably look out of place. Is this out-of-placeness somehow superior to the use of a floppy disk icon?
First Google Dart, then Mozilla Rust, and now this "Unicode"? Yet another attempt for a universal "one language for all uses" that is destined to fail.
They've got symbols for a love hotel, a horse, and a steaming pile of poo, along with emoticons, and they still haven't accepted the Tengwar draft that's been around since '93? Where are these people's priorities!?
"It needed to be flexible, so it's a VM now."
I fear this is the next step. The right to left and line wrapping BS is complicated enough that I'd welcome a specialized VM with loadable bytecode & glyph data. Yes, from a security standpoint this could create a wider attack surface. However, I'd argue it would be less attack surface considering that the VM for my unlimited precision scientific & programming calculator is smaller than my UTF-8 text display implementation.
I'd also argue that it would be faster to adopt new glyphs and behaviors if all I needed was to drop in a new batch of bytecode.
I'd also argue just to argue... because, well this IS Unicode we're talking about.
....oh there it is.
I'm sure we could have found some way to get along without "Mathematical Rising Diagonal" and "Kissing Face".
That is all.
A glyph for an ice cream cone but still no half-stars to do movie ratings?
We already have a two-character icon for wanker: Latin capital letter W (U+0057), followed by Anchor (U+2693).
Done as well.