Slashdot Mirror


Unicode 6.1 Released

An anonymous reader writes "The latest version of the Unicode standard (v. 6.1.0) was officially released January 31. It includes 732 new characters, among them seven brand-new scripts (Chakma, Meroitic Cursive, Meroitic Hieroglyphs, Miao, Sharada, Sora Sompeng and Takri). It also adds support for distinguishing emoji-style and text-style symbols and emoticons with variation selectors, updates the line-breaking algorithm to better handle Japanese and Hebrew text, and updates other algorithms and technical notes to reflect new characters and newly documented text behaviors."
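The emoji-style versus text-style distinction works by appending a variation selector to the base character: U+FE0E (VARIATION SELECTOR-15) requests text presentation and U+FE0F (VARIATION SELECTOR-16) requests emoji presentation. Below is a minimal Python sketch, assuming U+2764 HEAVY BLACK HEART as the example base character. Whether a particular sequence is actually standardized has to be checked against the Unicode Character Database's variation-sequence data, and how it renders is still up to the font. The helper names are hypothetical.

```python
# Variation selectors that request a presentation style for the preceding character.
VS15_TEXT = "\ufe0e"   # VARIATION SELECTOR-15: text presentation
VS16_EMOJI = "\ufe0f"  # VARIATION SELECTOR-16: emoji presentation


def presentation_variants(base: str) -> dict:
    """Return the text-style and emoji-style sequences for one base character."""
    return {"text": base + VS15_TEXT, "emoji": base + VS16_EMOJI}


def strip_presentation(s: str) -> str:
    """Drop VS15/VS16 so both styles compare equal as plain text."""
    return s.replace(VS15_TEXT, "").replace(VS16_EMOJI, "")


def code_points(s: str) -> str:
    """Spell out a string as U+XXXX code points."""
    return " ".join(f"U+{ord(ch):04X}" for ch in s)


heart = "\u2764"  # HEAVY BLACK HEART (example base character)
for style, seq in presentation_variants(heart).items():
    print(style, code_points(seq))
# text U+2764 U+FE0E
# emoji U+2764 U+FE0F
```

The base character is the same in both cases; only the trailing selector differs, which is why stripping the selectors is enough for plain-text comparison.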

29 of 170 comments

  1. 27cb appearing in HTML in 5.4.3.2.1... by vlm · · Score: 2

    Take a good look at glyph U+27CB (aka \diagup), part of the Miscellaneous Mathematical Symbols-A block. People are gonna try embedding that in HTML now. Can't wait.

    --
    "Science flies us to the moon. Religion flies us into buildings." - Victor Stenger
    1. Re:27cb appearing in HTML in 5.4.3.2.1... by BetterThanCaesar · · Score: 2

      Parsers of XML, HTML and SGML need to, and may only, accept U+002F SOLIDUS as the "closing slash". If that weren't the case, we'd already have problems with people writing tags using the look-alike slashes Unicode already has.
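To see why a look-alike such as U+27CB can't close a tag, here is a rough Python sketch using the standard library's `html.parser`. It only treats the ASCII "/>" as a self-closing tag ending; any other slash-like character simply ends up as part of the tag name. `TagRecorder` and `tokenize` are hypothetical helper names, and this is Python's lenient tokenizer, not a full HTML5 tree builder.

```python
from html.parser import HTMLParser


class TagRecorder(HTMLParser):
    """Records how each tag in the input is tokenized."""

    def __init__(self):
        super().__init__()
        self.events = []

    def handle_starttag(self, tag, attrs):
        self.events.append(("start", tag))

    def handle_startendtag(self, tag, attrs):
        # Only reached for tags ending in the ASCII "/>".
        self.events.append(("self-closing", tag))

    def handle_endtag(self, tag):
        self.events.append(("end", tag))


def tokenize(markup):
    parser = TagRecorder()
    parser.feed(markup)
    parser.close()
    return parser.events


print(tokenize("<br/>"))        # [('self-closing', 'br')]
print(tokenize("<br\u27cb>"))   # U+27CB ends up inside the tag name
```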

      --
      "Stop failing the Turing test!" -- Dilbert
  2. Favourite unicode character by Cocodude · · Score: 3, Interesting

    has got to be the Love Hotel (U+1F3E9).

    Does anyone know why this is even there?

    1. Re:Favourite unicode character by vlm · · Score: 2

      As if the floppy disk (U+1F4BE, http://www.fileformat.info/info/unicode/char/1f4be/index.htm) makes sense to anyone under age 30. I demand the addition of a punch card glyph...

      --
      "Science flies us to the moon. Religion flies us into buildings." - Victor Stenger
    2. Re:Favourite unicode character by am+2k · · Score: 2

      The "don't bother me with those implementation details"-icon?

    3. Re:Favourite unicode character by DragonWriter · · Score: 2

      Back to my original question: If not a floppy disk, what icon should be used for this action of committing an edited document to the part of the file system viewable by other users and applications?

      The generic flowchart datastore symbol with an inbound arrow (retrieving something previously committed would use the same symbol with an outbound arrow.)

      For products with less technical audiences, a stone tablet with an etching instrument, since committing results in the data being "carved in stone".

    4. Re:Favourite unicode character by snowgirl · · Score: 2

      They have 17 planes of 65,536 code points each... even after including massive syllabaries and the unified CJK ideographs, they still had really only used the first plane. Now that they find they're using only about 7% of the space available, they've started chucking just about every pictograph they could possibly come up with into it...
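For reference, the code space runs from U+0000 through U+10FFFF: 17 planes of 65,536 code points, where the plane number is everything above the low 16 bits. A small Python sketch follows. The `assigned_fraction` helper is hypothetical and measures whatever Unicode version the local `unicodedata` module ships (see `unicodedata.unidata_version`), which is usually much newer than 6.1, so its result is not the 6.1 figure.

```python
import unicodedata

PLANE_SIZE = 0x10000                   # 65,536 code points per plane
NUM_PLANES = 17                        # plane 0 (the BMP) through plane 16
CODE_SPACE = NUM_PLANES * PLANE_SIZE   # 1,114,112 code points: U+0000..U+10FFFF


def plane_of(cp: int) -> int:
    """The plane is everything above the low 16 bits of the code point."""
    return cp >> 16


def assigned_fraction() -> float:
    """Share of the code space holding characters in this Python's Unicode
    database, excluding unassigned (Cn), private-use (Co) and surrogate (Cs)
    code points."""
    excluded = {"Cn", "Co", "Cs"}
    assigned = sum(
        1 for cp in range(CODE_SPACE) if unicodedata.category(chr(cp)) not in excluded
    )
    return assigned / CODE_SPACE


print(CODE_SPACE)          # 1114112
print(plane_of(0x27CB))    # 0 (Basic Multilingual Plane)
print(plane_of(0x1F3E9))   # 1 (Supplementary Multilingual Plane)
print(unicodedata.unidata_version, f"{assigned_fraction():.1%}")
```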

      I'm sorry, but while I'm down for having every script that is actually used, and every script that has been decoded, I don't see why we should have all of these pictographs before we have something like Tengwar and Cirth. Sure, Tengwar and Cirth are made-up fantasy scripts, but they're more widely used than Linear B...

      --
      WARNING! This girl exceeds the MAXIMUM SAFE standards established by the FDA for BRATTINESS
  3. Why Slashdot won't adopt it by tepples · · Score: 5, Informative

    Before anyone chimes in complaining that Slashdot doesn't even support an old version of Unicode: there are several reasons for this. For one thing, there was once a fad of posting pornographic ASCII art on Slashdot, so Slashdot appears to disallow any character that would be more useful for glyph art than for English text. For another, there was once a fad of using bidirectional override control characters to turn text backwards, which would break the layout and allow spoofing a comment's moderation score.

    1. Re:Why Slashdot won't adopt it by BetterThanCaesar · · Score: 4, Insightful

      Raise your hand if you couldn't code a parser that detects those characters and takes appropriate action, such as popping bidi characters.
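A minimal sketch of the "popping" approach in Python, assuming the comment system only has to deal with the explicit embedding and override controls U+202A through U+202E: every LRE, RLE, LRO or RLO still open at the end of a comment is closed with U+202C POP DIRECTIONAL FORMATTING, and stray PDFs that would pop the page's own context are dropped. `balance_bidi` is a hypothetical name, and the sketch does not cover the directional isolates added later in Unicode 6.3.

```python
# Explicit embedding/override controls (U+202A..U+202E).
LRE, RLE, PDF, LRO, RLO = "\u202a", "\u202b", "\u202c", "\u202d", "\u202e"
OPENERS = {LRE, RLE, LRO, RLO}


def balance_bidi(comment: str) -> str:
    """Close every embedding/override the comment opens and drop unmatched PDFs."""
    depth = 0
    out = []
    for ch in comment:
        if ch in OPENERS:
            depth += 1
            out.append(ch)
        elif ch == PDF:
            if depth > 0:  # keep a PDF only if it closes something this comment opened
                depth -= 1
                out.append(ch)
        else:
            out.append(ch)
    out.append(PDF * depth)  # pop whatever is still open at the end of the comment
    return "".join(out)


# An RLO left open would otherwise reverse the text that follows the comment.
fixed = balance_bidi("Score: 5\u202e, Funny")
```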

      I'd love to be able to write IPA when discussing pronunciation, or actually write out words in other languages, ohm character for discussing electronics, pound and yen signs for currency ... Hey, even a bigger whitelist than what we have now would be great!
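A bigger whitelist doesn't have to be a hand-maintained list of code points. One possible policy, sketched here in Python, keys on the Unicode General Category instead: letters, marks, numbers, punctuation, currency, modifier and math symbols, and spaces pass, while "other symbols" (So, which covers the block elements and emoji) and format characters (Cf, which covers the bidi overrides) are dropped. The category choices and function names are assumptions for illustration, not anything Slashdot actually runs.

```python
import unicodedata

# Hypothetical policy: allowed General_Category major classes and exact categories.
ALLOWED_MAJOR = {"L", "M", "N", "P"}        # letters, marks, numbers, punctuation
ALLOWED_EXACT = {"Sc", "Sk", "Sm", "Zs"}    # currency, modifier and math symbols, spaces
ALLOWED_CONTROLS = {"\n", "\t"}             # the only control characters let through


def is_allowed(ch: str) -> bool:
    if ch in ALLOWED_CONTROLS:
        return True
    category = unicodedata.category(ch)
    return category[0] in ALLOWED_MAJOR or category in ALLOWED_EXACT


def filter_comment(text: str) -> str:
    """Keep only characters whose General_Category passes the whitelist."""
    return "".join(ch for ch in text if is_allowed(ch))


# IPA, the ohm sign and pound/yen signs survive; FULL BLOCK (So) and RLO (Cf) do not.
print(filter_comment("a\u2588b\u202ec"))  # abc
```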

      --
      "Stop failing the Turing test!" -- Dilbert
    2. Re:Why Slashdot won't adopt it by Hentes · · Score: 2

      For one thing, there was once a fad of posting pornographic ASCII art on Slashdot, so it appears Slashdot disallows any character that would be more useful for glyph art than for English text.

      If ASCII can be used for trolling just the same, then there is little point in not implementing Unicode. The point of moderation is to prevent these issues.

      For another, there was once a fad of using bidirectionality override control characters for turning text backwards, which would break the layout and allow spoofing a comment's moderation score.

      That's because of a buggy/insecure implementation. It doesn't mean it can't be done right.

  4. Re:Stick to ASCII by cc1984_ · · Score: 5, Funny

    Yeah but can you write a pile of poo in ASCII?

    http://www.fileformat.info/info/unicode/char/1f4a9/index.htm

  5. Re:Zomg by piripiri · · Score: 3, Insightful

    Yes, lolcats are a standard now.

  6. emoticons? by pz · · Score: 3, Insightful

    Seriously, emoticons? Who ever thought it a good idea to include those in a standard? Should we have an encoding for hearts as dots over lower case i as well? And little horseys, too? And y with a big tail that wraps around to the front of the word?

    --

    Put my fist through my alarm clock with its ding-dong death inside my ear. - The Blackjacks.
    1. Re:emoticons? by snowgirl · · Score: 3, Informative

      And little horseys, too?

      U+1F40E ... no, seriously...

      --
      WARNING! This girl exceeds the MAXIMUM SAFE standards established by the FDA for BRATTINESS
    2. Re:emoticons? by Hentes · · Score: 3, Funny

      The next thing will be teenagers building bigger emoticons out of emoticon characters. Then they will have to be included in the standard as well, and so on...

  7. Re:Zomg by fuzzyfuzzyfungus · · Score: 4, Funny

    I believe you mean to say that lolcats are in ur standardz, occupyin ur code-points; but not necessarily prescribing ur particular choice of glyph...

  8. Re:Checking for the release of a new version by Canazza · · Score: 5, Funny

    £ is Shift+3, what are you on about?

    --
    It pays to be obvious, especially if you have a reputation for being subtle.
  9. Re:The next version of the standard by StuartHankins · · Score: 4, Funny

    ...filling pages with sexually explicit ASCII art, such as Goatse, male masturbation, and birds perched on a penis...

    Yeah, the way they are going they might actually *have* these characters in the set now...

  10. Tetris, Chess, Baseball, and gang symbols by tepples · · Score: 4, Informative

    all the Tetris pieces

    The polyominoes up to five squares can be composed from U+2580 (upper half block), U+2584 (lower half block), and U+2588 (full block) characters. Unicode tends not to introduce precomposed ligatures except when needed for round-tripping with pre-Unicode encodings.
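The composition trick works because each character cell can show two stacked squares. A Python sketch, assuming pieces are given as rows of "#" (filled) and "." (empty); `render` is a hypothetical helper that pairs up rows and picks the matching half or full block.

```python
# Each character cell shows two stacked squares: (top filled?, bottom filled?).
HALF_BLOCKS = {
    (False, False): " ",
    (True, False): "\u2580",   # UPPER HALF BLOCK
    (False, True): "\u2584",   # LOWER HALF BLOCK
    (True, True): "\u2588",    # FULL BLOCK
}


def render(shape):
    """Render rows of '#'/'.' as text, two rows per output line."""
    rows = [[c == "#" for c in row] for row in shape]
    width = max(len(r) for r in rows)
    rows = [r + [False] * (width - len(r)) for r in rows]  # pad ragged rows
    if len(rows) % 2:
        rows.append([False] * width)  # pad to an even number of rows
    lines = []
    for top, bottom in zip(rows[0::2], rows[1::2]):
        lines.append("".join(HALF_BLOCKS[(t, b)] for t, b in zip(top, bottom)))
    return "\n".join(lines)


print(render(["###", ".#."]))       # T tetromino: ▀█▀
print(render(["#.", "#.", "##"]))   # L tetromino, two output lines
```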

    glyphs of game pieces of all well known games

    A lot of well-known pre-1923 tabletop games' game pieces already exist in Unicode. Chess is U+2654 through U+265F, and Checkers is U+26C0 through U+26C3. A lot of game pieces are simple enough in form that the Geometric Shapes (U+25A0 through U+25FF) represent them just fine. For example, Othello is U+25CB and U+25CF, as is Connect Four. Even the enemy in Fast Eddie for the Atari 2600 is in Miscellaneous Technical (U+237E), as is home plate in Baseball (U+2302).
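Those ranges are easy to check against the character names in Python's `unicodedata` module. `describe_range` is a hypothetical helper; the names come from whatever Unicode version the local Python ships, but these game-piece characters were already encoded well before 6.1.

```python
import unicodedata


def describe_range(first: int, last: int):
    """Yield (U+XXXX label, character, Unicode name) for first..last inclusive."""
    for cp in range(first, last + 1):
        ch = chr(cp)
        yield f"U+{cp:04X}", ch, unicodedata.name(ch, "<unnamed>")


for label, ch, name in describe_range(0x2654, 0x265F):  # chess pieces
    print(label, ch, name)
for label, ch, name in describe_range(0x26C0, 0x26C3):  # draughts/checkers pieces
    print(label, ch, name)
```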

    heck, instead of just the suit symbols why not 52 glyphs for a standard deck of cards

    Those can already be composed from a Basic Latin letter or number and a suit symbol. Unicode tends not to introduce precomposed ligatures except when needed for round-tripping with pre-Unicode encodings.
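Composing the deck from a rank plus one of the suit symbols U+2660 (spades), U+2665 (hearts), U+2666 (diamonds) and U+2663 (clubs) takes only a couple of lines; the Python sketch below uses hypothetical names. (For what it's worth, Unicode 6.0 did also add a separate Playing Cards block at U+1F0A0 through U+1F0FF.)

```python
RANKS = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]
SUITS = {
    "spades": "\u2660",    # BLACK SPADE SUIT
    "hearts": "\u2665",    # BLACK HEART SUIT
    "diamonds": "\u2666",  # BLACK DIAMOND SUIT
    "clubs": "\u2663",     # BLACK CLUB SUIT
}


def build_deck():
    """52 cards, each a Basic Latin rank followed by a suit symbol."""
    return [rank + suit for suit in SUITS.values() for rank in RANKS]


deck = build_deck()
print(len(deck), deck[0], deck[-1])  # 52 A♠ K♣
```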

    throw the Major Arcana tarot cards in there too

    I don't know about Tarot, but all twelve signs of the zodiac are in Miscellaneous Symbols, even the "69" looking sign of Cancer (U+264B).

    gang symbols

    The symbol of "Folk Nation" gangs is similar to that of Judaism: a Star of David (U+2721). The symbol of "People Nation" gangs is similar to that of Islam: a 5-point star and crescent (U+262A).

  11. Disclosure, drive space, and spinning up by tepples · · Score: 2

    If they write a brilliant paragraph a day ago, then deleted it in the morning, they can view the document as it existed yesterday, copy the paragraph back out, and be done with it.

    For one thing, an application that saves (and sends) a document's undo history along with the document can disclose things that the document's author did not want to disclose. I seem to vaguely remember scandals with Word's AutoRecover being used to recover redacted parts of a document. For another, how much of the limited space on the drive should be dedicated to saving a document's undo history since creation, especially when the document is a large layered picture or multitrack audio project?

    And that's because people forget to save - why not have the OS do it for them?

    I agree, but how often should the OS spin up the hard drive to do so?
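One way to bound how often the drive gets spun up is to flush only after the user has been idle for a while, and never more often than some minimum interval. A hypothetical Python sketch of such a policy; the default thresholds are made up, and timestamps are passed in explicitly so it is easy to test.

```python
class AutosavePolicy:
    """Hypothetical autosave policy: flush only after an idle period, and never
    more often than min_interval seconds, so the disk isn't woken on every keystroke."""

    def __init__(self, idle_seconds=5.0, min_interval=60.0):
        self.idle_seconds = idle_seconds    # quiet time required after the last edit
        self.min_interval = min_interval    # minimum spacing between two saves
        self.last_edit = None
        self.last_save = float("-inf")
        self.dirty = False                  # unsaved changes pending?

    def on_edit(self, now):
        self.last_edit = now
        self.dirty = True

    def should_save(self, now):
        if not self.dirty:
            return False
        idle_long_enough = now - self.last_edit >= self.idle_seconds
        spaced_out = now - self.last_save >= self.min_interval
        return idle_long_enough and spaced_out

    def mark_saved(self, now):
        self.last_save = now
        self.dirty = False


policy = AutosavePolicy(idle_seconds=5, min_interval=60)
policy.on_edit(0)
print(policy.should_save(1))  # False: the user is still typing
print(policy.should_save(5))  # True: idle long enough, no recent save
```

The trade-off is explicit in the two parameters: a shorter `min_interval` loses less work on a crash but wakes the disk more often.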

  12. Re:Alternative proposal: by DragonWriter · · Score: 2

    Standardise the world on English. It'll be easier. It's already the second-most-spoken language, and Chinese is a real nightmare of character encoding in itself. Then we can go back to good old ASCII.

    ASCII leaves off a lot of English punctuation, and accents that are, in fact, used in English (sure, in words of foreign origin, but they are still used.)
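Here is a quick Python way to see which characters in ordinary English text fall outside ASCII (code points above U+007F). `non_ascii_chars` is a hypothetical helper, and the sample sentence is made up.

```python
import unicodedata


def non_ascii_chars(text: str):
    """List (char, U+XXXX, Unicode name) for every character outside ASCII."""
    return [
        (ch, f"U+{ord(ch):04X}", unicodedata.name(ch, "<unnamed>"))
        for ch in text
        if ord(ch) > 0x7F
    ]


sample = "The caf\u00e9\u2019s na\u00efve r\u00e9sum\u00e9"  # The café’s naïve résumé
for entry in non_ascii_chars(sample):
    print(entry)
print(sample.isascii())  # False (str.isascii needs Python 3.7+)
```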

  13. Re:Alternative proposal: by snowgirl · · Score: 2

    English also has the second-worst spelling system on the planet (only outdone by Japanese).

    ??? WTF are _YOU_ on about? English does not have the worst spelling system on the planet, and Japanese certainly doesn't qualify as the worst. "But they have three different scripts: two syllabaries, and an ideographic set" but...

    Look, perhaps I'd better just show you what a really bad spelling system looks like: go look at Irish.

    --
    WARNING! This girl exceeds the MAXIMUM SAFE standards established by the FDA for BRATTINESS
  14. Re:Alternative proposal: by snowgirl · · Score: 2

    ASCII leaves off a lot of English punctuation, and accents that are, in fact, used in English (sure, in words of foreign origin, but they are still used.)

    Some that aren't foreign as well. "Coöperate" is an archaic spelling. Basically, any prefix that ends in "o" that is attached to a word that starts with an "o" can archaically be spelled with a diaeresis, in the French/Dutch method of "this vowel should be pronounced separately, and not as part of a diphthong".
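One wrinkle with "coöperate" in Unicode: the ö can be stored either precomposed (U+00F6) or as a plain "o" followed by U+0308 COMBINING DIAERESIS. The two look identical but compare unequal until they are normalized, as this short Python illustration shows:

```python
import unicodedata

precomposed = "co\u00f6perate"  # ö as a single code point
decomposed = "coo\u0308perate"  # o + COMBINING DIAERESIS

print(len(precomposed), len(decomposed))                        # 9 10
print(precomposed == decomposed)                                # False
print(unicodedata.normalize("NFC", decomposed) == precomposed)  # True
print(unicodedata.normalize("NFD", precomposed) == decomposed)  # True
```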

    --
    WARNING! This girl exceeds the MAXIMUM SAFE standards established by the FDA for BRATTINESS
  15. Re:Stick to ASCII by Pieroxy · · Score: 3, Informative

    ASCII is just 128 characters.

  16. Re:Stick to ASCII by metamatic · · Score: 3, Funny

    This is Slashdot, I'm sure you can find any number of examples of people who've written a pile of poo in ASCII.

    --
    GCHQ Quantum Insert installed. If only our tongues were made of glass, how much more careful we would be when we speak
  17. Re:Stick to ASCII by Xtifr · · Score: 2

    Yeah but can you write a pile of poo in ASCII?

    As far as I know, Windows was originally written in ASCII... :)

  18. Re:No hoops... by icebraining · · Score: 2

    They're only "easy" if you have your system configured for ISO-8859-1. Those of us who use UTF-8 get this result: à é.
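Both failure directions are easy to reproduce in Python. When UTF-8 bytes are read as ISO-8859-1, each accented letter turns into two characters. When ISO-8859-1 bytes are read as UTF-8, they are invalid and come out as U+FFFD replacement characters.

```python
text = "\u00e0 \u00e9"  # "à é"

# UTF-8 bytes misread as ISO-8859-1: each accented letter becomes two characters.
utf8_as_latin1 = text.encode("utf-8").decode("iso-8859-1")
print(repr(utf8_as_latin1))  # 'Ã\xa0 Ã©'

# ISO-8859-1 bytes misread as UTF-8: invalid sequences become U+FFFD.
latin1_as_utf8 = text.encode("iso-8859-1").decode("utf-8", errors="replace")
print(latin1_as_utf8 == "\ufffd \ufffd")  # True
```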

  19. Re:Stick to ASCII by petermgreen · · Score: 2

    I'm pretty sure that in HTML5, as in HTML4, the document is considered to be made up of Unicode characters, and other charsets are treated as encodings of Unicode. Of course, the HTML5 spec doesn't list every Unicode character explicitly; that would be insane.
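One way to see the "document is Unicode" model in practice: a numeric character reference names a Unicode code point no matter which charset the page is served in. Python's `html.unescape` applies the HTML5 rules for character references, so it makes a handy check:

```python
import html

print(html.unescape("&#x27CB;") == "\u27cb")       # True: hex reference to U+27CB
print(html.unescape("&#128169;") == "\U0001F4A9")  # True: decimal reference to U+1F4A9
print(html.unescape("&pound;5") == "\u00a35")      # True: named reference for U+00A3
```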

    --
    note: i'm known as plugwash most places but i screwd up registering that here somehow in the past and now can't register
  20. Re:Obligatory XKCD by marcosdumay · · Score: 4, Insightful

    You know that this is the exact situation that Unicode AVOIDED, don't you?

    Now we have one standard with three different representations, and it replaced literally thousands of standards. Yep, sometimes making that new standard works.
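The three representations are the encoding forms UTF-8, UTF-16 and UTF-32, and all of them encode the same code points. Here is a short Python comparison. It uses the big-endian codec variants so that no byte-order mark is added. `encodings` is a hypothetical helper name.

```python
def encodings(s: str) -> dict:
    """Hex bytes for the same string in each Unicode encoding form (big-endian, no BOM)."""
    return {name: s.encode(name).hex() for name in ("utf-8", "utf-16-be", "utf-32-be")}


for sample in ("A", "\u00e9", "\U0001F4A9"):
    print(f"U+{ord(sample):04X}", encodings(sample))
```

ASCII text stays one byte per character in UTF-8. U+1F4A9, which lies outside the BMP, takes four bytes in every form: one four-byte UTF-8 sequence, one UTF-16 surrogate pair, or one UTF-32 unit.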