Slashdot Mirror


Unicode 6.1 Released

An anonymous reader writes "The latest version of the Unicode standard (v. 6.1.0) was officially released January 31. The latest version includes 732 new characters, including seven brand new scripts. It also adds support for distinguishing emoji-style and text-style symbols and emoticons with variation selectors, updates to the line-breaking algorithm to more accurately reflect Japanese and Hebrew texts, and updates other algorithms and technical notes to reflect new characters and newly documented text behaviors."
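
For readers who have not met variation selectors, here is a minimal sketch in Python of what the emoji-style/text-style distinction looks like at the code point level. It assumes U+2764 HEAVY BLACK HEART as the base character purely for illustration; whether a given renderer honors the selector is up to the font and platform.

```python
import unicodedata

TEXT_STYLE = "\uFE0E"   # VARIATION SELECTOR-15: request text presentation
EMOJI_STYLE = "\uFE0F"  # VARIATION SELECTOR-16: request emoji presentation

base = "\u2764"  # HEAVY BLACK HEART, chosen only as an example base character

text_heart = base + TEXT_STYLE
emoji_heart = base + EMOJI_STYLE

# Same base character, but two distinct two-code-point sequences.
print(unicodedata.name(base))
print([f"U+{ord(c):04X}" for c in text_heart])
print([f"U+{ord(c):04X}" for c in emoji_heart])
print(text_heart == emoji_heart)
```

The two strings compare unequal even though they share a base character, so software that searches or deduplicates text may want to strip the selectors first.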

170 comments

  1. Zomg by Anonymous Coward · · Score: 0

    13 new emoticons1!1! http://www.unicode.org/charts/PDF/Unicode-6.1/U61-1F600.pdf

    1. Re:Zomg by aepurniet · · Score: 1

      9 cat faces emoticons? is this really necessary in a character standard?

    2. Re:Zomg by piripiri · · Score: 3, Insightful

      Yes, lolcats are a standard now.

    3. Re:Zomg by BSAtHome · · Score: 1

      The correct sequence for business, politics and everything is as of now:
      #1F648 #1F649 #1F64A

      Gotta love the effort that went into providing the proper symbols.

    4. Re:Zomg by fuzzyfuzzyfungus · · Score: 4, Funny

      I believe you mean to say that lolcats are in ur standardz, occupyin ur code-points; but not necessarily prescribing ur particular choice of glyph...

    5. Re:Zomg by willie3204 · · Score: 1

      There are at least 3 I know of right now.

      (=^^=)
      (=^^=)
      =^_^=

  2. 27cb appearing in HTML in 5.4.3.2.1... by vlm · · Score: 2

    Take a good look at glyph 27cb aka \diagup part of the Misc Math Symbols. People are gonna try embedding that in html now. Can't wait.

    --
    "Science flies us to the moon. Religion flies us into buildings." - Victor Stenger
    1. Re:27cb appearing in HTML in 5.4.3.2.1... by GrangerX · · Score: 1

      If I read the character list correctly, that's the division / slash symbol. That does sound somewhat ominous from a malformed-URL perspective. (They also added something that looks like backslash as 27cd).

    2. Re:27cb appearing in HTML in 5.4.3.2.1... by Anonymous Coward · · Score: 0

      These aren't the only slash-like characters. U+2044 and U+2215 look even more slash-like. And why shouldn't they be in Unicode? If your concern is hostname spoofing, I can assure you that the set of non-Latin characters allowed in the hostname part of URLs is very restricted. In the path part it doesn't matter.

    3. Re:27cb appearing in HTML in 5.4.3.2.1... by vlm · · Score: 1

      That's a good one, but I'm also worried about html parsers needing to understand half a dozen variants of the "closing slash".

      --
      "Science flies us to the moon. Religion flies us into buildings." - Victor Stenger
    4. Re:27cb appearing in HTML in 5.4.3.2.1... by BetterThanCaesar · · Score: 2

      Parsers of XML, HTML and SGML need and may only support U+002F SOLIDUS as "closing slash". If that weren't the case, we'd already have problems with people writing and .
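
      A rough sketch of that point, in Python: the slash lookalikes mentioned in this thread are separate code points, so a check against U+002F never matches them. `is_closing_slash` is a made-up helper for illustration, not any real parser's API.

```python
import unicodedata

SOLIDUS = "\u002F"

# Slash-like characters mentioned in this thread (all distinct code points).
LOOKALIKES = ["\u2044", "\u2215", "\u27CB"]

def is_closing_slash(ch: str) -> bool:
    """Hypothetical helper: markup syntax only recognises U+002F."""
    return ch == SOLIDUS

for ch in [SOLIDUS] + LOOKALIKES:
    # Fall back to a placeholder if this Python's Unicode data lacks the name.
    name = unicodedata.name(ch, "<unnamed in this Unicode database>")
    print(f"U+{ord(ch):04X} {name}: closing slash? {is_closing_slash(ch)}")
```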

      --
      "Stop failing the Turing test!" -- Dilbert
  3. Favourite unicode character by Cocodude · · Score: 3, Interesting

    has got to be the Love Hotel.

    Does anyone know why this is even there?

    1. Re:Favourite unicode character by vlm · · Score: 2

      As if http://www.fileformat.info/info/unicode/char/1f4be/index.htm makes sense to anyone under age 30. I demand the addition of a punchcard glyph...

      --
      "Science flies us to the moon. Religion flies us into buildings." - Victor Stenger
    2. Re:Favourite unicode character by tepples · · Score: 1

      What better icon is there for the action of committing an edited document to storage?

    3. Re:Favourite unicode character by am+2k · · Score: 2

      The "don't bother me with those implementation details"-icon?

    4. Re:Favourite unicode character by Anonymous Coward · · Score: 0

      It amuses me how, in our zeal to infuse ourselves with whatever the newest whatever is, we're accelerating towards "lawl anything made last month is so ancient and wut is taht lawl". Amusing in that I've already lost my faith in "internet culture" years ago, and it'll be hilarious to watch the entire thing inevitably implode due to everyone repeating the same mistakes a few years earlier solely because it's not trendy and non-trendy things are boring.

      But, don't mind me, I'll just play along and laugh from the sidelines. lawl wut iz a floppeedsik is taht liek a ipod lawl?

    5. Re:Favourite unicode character by tepples · · Score: 1

      What exactly did you mean by this statement? What are you calling an implementation detail with which the user shouldn't be bothered?

    6. Re:Favourite unicode character by Anonymous Coward · · Score: 0

      For the "under 30" group: cloud
      It even has an error character too: thundercloud

    7. Re:Favourite unicode character by am+2k · · Score: 1

      What exactly did you mean by this statement? What are you calling an implementation detail with which the user shouldn't be bothered?

      The location where the data is stored (RAM vs. harddrive). There are some effects that play against each other here:

      • For editing, the data has to be in RAM (at least the part that's edited at the moment).
      • When the data is in RAM, but not on the disk, the state is lost after a crash or sudden power loss. This is undesirable.
      • Copying from RAM to harddrive (aka "saving") takes time.

      As computers get better, the latter effect becomes negligible. This means that when this is done automatically in the background (which is certainly possible for most data these days), the user doesn't have to manage this technical detail. Less management means that the user has more time thinking about the things that he/she really wants to do using the application, and it reduces the number of errors (losing hours of work, because the user forgot to save).

    8. Re:Favourite unicode character by piripiri · · Score: 1

      Like "I want to store my document on the hard disk but the available feature is saving to a floppy", or for younglings these days: "WTF is that?"

    9. Re:Favourite unicode character by JDG1980 · · Score: 1

      Oh, come on. Everyone who uses computers even casually knows that the floppy-disk icon means "Save." That it no longer reflects the underlying hardware is irrelevant.

    10. Re:Favourite unicode character by tepples · · Score: 1

      ...Copying from RAM to harddrive (aka "saving") takes time. As computers get better, the latter effect becomes negligible.

      Continuous autosave is possible with current technology, but it requires wasting battery power on spinning a hard drive's platter at all times while the user continues to edit the document. I agree that it's an implementation issue, but the underlying technical reason for the implementation issue is still present in 2012 technology. I don't see the distinction between fast temporary storage and large nonvolatile storage "becom[ing] negligible" until large SSDs and cellular data become a lot cheaper. In addition, one ordinarily doesn't want to create a new numbered revision of the document in a revision control system after each keypress; there has to be some way to mark one's changes as suitable for being viewed by other editors of the document, not unlike the SQL keyword COMMIT.

    11. Re:Favourite unicode character by am+2k · · Score: 1

      Continuous autosave is possible with current technology, but it requires wasting battery power on spinning a hard drive's platter at all times while the user continues to edit the document. I agree that it's an implementation issue, but the underlying technical reason for the implementation issue is still present in 2012 technology. I don't see the distinction between fast temporary storage and large nonvolatile storage "becom[ing] negligible" until large SSDs and cellular data become a lot cheaper. In addition, one ordinarily doesn't want to create a new numbered revision of the document in a revision control system after each keypress; there has to be some way to mark one's changes as suitable for being viewed by other editors of the document, not unlike the SQL keyword COMMIT.

      Yes, you shouldn't save after every single keypress, but a timer for saving every minute or so (if there are any changes) should suffice. Committing for others to see is a different thing, that's something a user can be expected to understand.
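
      Something like this would do, as a minimal Python sketch. The `Autosaver` class and its method names are invented for illustration, and the 60-second interval is just the "every minute or so" from above; the clock is injectable so the behavior can be checked without waiting.

```python
import time

class Autosaver:
    """Hypothetical sketch: save at most once per interval, and only if dirty."""

    def __init__(self, save_fn, interval_seconds=60.0, clock=time.monotonic):
        self._save_fn = save_fn
        self._interval = interval_seconds
        self._clock = clock
        self._dirty = False
        self._last_save = clock()

    def mark_dirty(self):
        """Call on every edit; cheap, does no I/O."""
        self._dirty = True

    def tick(self):
        """Call periodically (e.g. from a UI timer). Returns True if it saved."""
        now = self._clock()
        if self._dirty and now - self._last_save >= self._interval:
            self._save_fn()
            self._dirty = False
            self._last_save = now
            return True
        return False

# Drive it with a fake clock instead of real time.
fake_now = [0.0]
saves = []
saver = Autosaver(lambda: saves.append(fake_now[0]), clock=lambda: fake_now[0])

saver.tick()            # nothing edited yet, so nothing is saved
saver.mark_dirty()
fake_now[0] = 30.0
saver.tick()            # edited, but the interval has not elapsed
fake_now[0] = 61.0
saver.tick()            # edited and a minute has passed: saves
print(saves)
```

      An idle document never touches the disk, which is the part that matters for the battery argument.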

      Ultimately, for revert/versions there should be a timeline slider like there was in Google Wave, where you can go back to your document's state of any point in the past.

      btw, affordable SSDs are already large enough for everyday use. My notebook has a 256GB SSD in it, and I didn't have to sell my car for it.

    12. Re:Favourite unicode character by tlhIngan · · Score: 1

      What exactly did you mean by this statement? What are you calling an implementation detail with which the user shouldn't be bothered?

      Why should the user be bothered with it? There aren't many real-life instances where a user creates and it isn't "autosaved".

      It's one of the things that OS X Lion is doing - it's asking "why do we still do this?". Lion-aware apps automatically autosave in the background, and have a time-machine like feature that lets users view their document as it existed in the past. If they wrote a brilliant paragraph a day ago, then deleted it in the morning, they can view the document as it existed yesterday, copy the paragraph back out, and be done with it.

      Heck. Lion is trying to get away from the whole "You need to manage your application's state" as well - the OS can manage its resources.

      Right now, most apps implement some form of autorecovery. Word keeps crashing on me so I'm thankful when it seems to only lose a few minutes work. Ditto vim. And that's because people forget to save - why not have the OS do it for them? (And with Lion's autosave, it won't commit unrecoverable changes so you can always go back to an earlier revision).

    13. Re:Favourite unicode character by tepples · · Score: 1

      Committing for others to see is a different thing, that's something a user can be expected to understand.

      Back to my original question: If not a floppy disk, what icon should be used for this action of committing an edited document to the part of the file system viewable by other users and applications?

      btw, affordable SSDs are already large enough for everyday use.

      Not when "everyday use" includes storing a large collection of purchased music and purchased movies.

      I didn't have to sell my car for [a 256 GB SSD].

      But you did have to pay more than one would for the stock hard drive that comes bundled with a low-end laptop. Google Product Search shows 256 GB SSD in the $300-$400 range. Until the ultrabook market matures, autosave will still waste the computer's hardware resources.

    14. Re:Favourite unicode character by GreatBunzinni · · Score: 1

      Here, a punch card glyph. Not quite what I expected but still...
      http://www.fileformat.info/info/unicode/char/5361/index.htm

      There is also a card index glyph, too:
      http://www.fileformat.info/info/unicode/char/1f4c7/index.htm

      There might not be a punchcard glyph, but there is a minidisk one:
      http://www.fileformat.info/info/unicode/char/1f4bd/index.htm

      and an optical disk one:
      http://www.fileformat.info/info/unicode/char/1f4bf/index.htm

      and a DVD one:
      http://www.fileformat.info/info/unicode/char/1f4c0/index.htm

      I cannot imagine how this can ever be used in a useful manner, instead of being simply an irrelevant gimmick. Does anyone know why this stuff found its way into the standard?

      --
      Slashdot, fix your code or at least hire someone who is competent at it to do it for you.
    15. Re:Favourite unicode character by DragonWriter · · Score: 2

      Back to my original question: If not a floppy disk, what icon should be used for this action of committing an edited document to the part of the file system viewable by other users and applications?

      The generic flowchart datastore symbol with an inbound arrow (retrieving something previously committed would use the same symbol with an outbound arrow.)

      For products with less technical audiences, a stone tablet with an etching instrument, since committing results in the data being "carved in stone".

    16. Re:Favourite unicode character by snowgirl · · Score: 2

      They have 14 planes of ~65,536 characters... even after including massive syllabaries, and the unified CJK ideographs, they still had really only used the first plane. Now they're presented with only using about 7% of the space available, and so they started chucking just about every pictograph that they could possibly come up with into it...

      I'm sorry, but while I'm down for having every script that is actually used, and every script that has been decoded, I don't see why we should have all of these pictographs, before we have something like tengwar, and cirth. Sure, tengwar and cirth are made up fantasy scripts, but they're more widely used than Linear B...

      --
      WARNING! This girl exceeds the MAXIMUM SAFE standards established by the FDA for BRATTINESS
    17. Re:Favourite unicode character by X0563511 · · Score: 1

      The first one you link is a Chinese symbol. Looks totally valid to me.

      Remember, Chinese has symbols for entire words or ideas, it is not "alphabetical" like most other popular languages.

      --
      For large sets, this will be our guide even unto death, for the LORD will work for each type of data it is applied to...
    18. Re:Favourite unicode character by Hognoxious · · Score: 1

      What better icon is there for the action of committing an edited document to storage?

      One with the word "Save" on it.

      --
      Confucius say, "Find worm in apple - bad. Find half a worm - worse."
    19. Re:Favourite unicode character by jbengt · · Score: 1

      If the user never saved it, then where is it when the user needs it later? Auto-saved, OK, but where and under what name? There still needs to be a save option, and an icon, even if outdated, is useful for that.

    20. Re:Favourite unicode character by tepples · · Score: 1

      The generic flowchart datastore symbol with an inbound arrow

      Thank you. I had forgotten about the flowchart symbols because nowadays none of them appear to see popular use except an oval for module entry and exit, a box for a step, and a diamond for a decision.

    21. Re:Favourite unicode character by GreatBunzinni · · Score: 1

      Yes, it is. I don't question that character. The others, on the other hand, are a bit silly though.

      --
      Slashdot, fix your code or at least hire someone who is competent at it to do it for you.
    22. Re:Favourite unicode character by X0563511 · · Score: 1

      Agreed. Myself, I think it would be better to just reserve the space for future use, giving us plenty of expansion room without having to increase the word size (utf8 to utf16 to utf32) - instead of just filling the section up with nonsense.

      --
      For large sets, this will be our guide even unto death, for the LORD will work for each type of data it is applied to...
    23. Re:Favourite unicode character by Actually,+I+do+RTFA · · Score: 1

      They have 14 planes of ~65,536 characters

      I thought unicode was unlimited? The coding methods might each have a limit, but the standard is unlimited.

      --
      Your ad here. Ask me how!
    24. Re:Favourite unicode character by pjt33 · · Score: 1

      But you did have to pay more than one would for the stock hard drive that comes bundled with a low-end laptop.

      You could remove "the stock hard drive that comes bundled with" from that sentence and it would still be true.

    25. Re:Favourite unicode character by amorsen · · Score: 1

      I'm sorry, but while I'm down for having every script that is actually used, and every script that has been decoded, I don't see why we should have all of these pictographs

      If they are or were in use in real programs, it sucks to not have them in the standard. Unicode started out as a quite political project (e.g. Han Unification) but it has become much more pragmatic over time.

      We need the emoji and the other junk in the standard so that we are able to use Unicode as a credible archiving format.

      --
      Finally! A year of moderation! Ready for 2019?
    26. Re:Favourite unicode character by Anonymous Coward · · Score: 0

      Ey, I know what it is, and I'm 19. It's a 3.5" (un-)floppy disk, nicknamed the diskette, unlike the 5.25" floppy my 1570 uses.

    27. Re:Favourite unicode character by am+2k · · Score: 1

      If the user never saved it, then where is it when the user needs it later? Auto-saved, OK, but where and under what name? There still needs to be a save option, and an icon, even if outdated, is useful for that.

      Saved to an internal directory, and will be opened as an untitled document the next time you open the application.

    28. Re:Favourite unicode character by Anonymous Coward · · Score: 0

      Not one of these guys again...

    29. Re:Favourite unicode character by am+2k · · Score: 1

      Yes, however I don't think that many users know what an internal hard drive looks like... So using this as an icon for saving is not a solution either. USB sticks and external drives vary too wildly in their looks to be recognized at that size.

    30. Re:Favourite unicode character by Anonymous Coward · · Score: 0

      Actually, it was ISO/IEC 10646 that started out as a Han Unification project. Unicode actually began as a universal character encoding standard. Between version 1.0 and 1.01, Unicode merged with 10646, and they became one big squabbling family, where everyone got to act like they were Unicode, but got named after 10646. The Tibetans got lost when they moved into the new house, and somehow the Koreans ended up being triplets, but they eventually found their way back home. Eventually the Cherokees brought some native flair, while the Mormons made everyone stop drinking, at least for a while. Eventually the Chinese decided they needed a place for all their ancestors' ashes, and the Japanese kids spread lolcats all over the place. Of course, now we've got French stenographers and old Hungarians knocking at the door, trying to get in, not to mention a bunch of African tribesmen and some more Minoans trying to force the gate.

      PS, the "pictographs" are encoded because of a need to catalogue all of those emoji-laden text messages they send in Japan.

    31. Re:Favourite unicode character by Anonymous Coward · · Score: 0

      The first one just plainly means "card", and is pronounced as "ka" in mandarin.

    32. Re:Favourite unicode character by unixisc · · Score: 1

      If it is a 16 bit standard, how can it be unlimited? It can support at the most 2^16, or 65,536 characters. Where does it get planes from?

    33. Re:Favourite unicode character by Xest · · Score: 1

      I had no idea but was intrigued to find out myself, and stumbled upon this, which presumably explains it:

      http://www.developerfusion.com/news/91207/unicode-6-out-with-2000-new-characters-but-what-support-does-it-have/

      I knew the Japanese would be involved somewhere!

    34. Re:Favourite unicode character by neonsignal · · Score: 1

      The "love hotel" symbol is part of the Emoji set. These are a semi-standardized set of emoticons that had widespread use in Japan. It was Google that proposed their inclusion in Unicode. http://sites.google.com/site/unicodesymbols/Home/emoji-symbols

    35. Re:Favourite unicode character by snowgirl · · Score: 1

      They have 14 planes of ~65,536 characters

      I thought unicode was unlimited? The coding methods might each have a limit, but the standard is unlimited.

      The limit is mostly arbitrary, as newer encodings allow for much more expanded coding sequences. However, due to the way UTF-16 encodes values above U+FFFF it is limited to expressing at most a 20-bit codepoint, meaning that the Unicode standard is basically limited practically to 16 pages of 65536 values. So, short of breaking changes to the UTF-16 standards you're basically SOL.

      --
      WARNING! This girl exceeds the MAXIMUM SAFE standards established by the FDA for BRATTINESS
    36. Re:Favourite unicode character by snowgirl · · Score: 1

      If it is a 16 bit standard, how can it be unlimited? It can support at the most 2^16, or 65,536 characters. Where does it get planes from?

      UTF-16 is NOT a naive 16-bit encoding, and has a set of surrogate pairs that allow one to construct codepoints of up to 20-bits in a UTF-16 stream. Subtract out the 16-bits per plane, and you're left with 4-bits, which is 16.

      I misquoted 14 in my post, the Unicode standard only defines 14 planes, and 2 private use areas.
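
      For the curious, the surrogate arithmetic can be sketched in a few lines of Python. The function names are made up for illustration. Note that the pair carries a 20-bit offset from U+10000, so the 16 planes it reaches sit on top of the original 16-bit plane.

```python
def to_surrogates(code_point: int) -> tuple:
    """Split a supplementary code point (U+10000..U+10FFFF) into a UTF-16 pair."""
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("only code points above U+FFFF need a surrogate pair")
    offset = code_point - 0x10000          # fits in 20 bits
    high = 0xD800 + (offset >> 10)         # top 10 bits
    low = 0xDC00 + (offset & 0x3FF)        # bottom 10 bits
    return high, low

def from_surrogates(high: int, low: int) -> int:
    """Recombine a high/low surrogate pair into the original code point."""
    return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)

high, low = to_surrogates(0x1F40E)         # U+1F40E, the horse from this thread
print(f"{high:04X} {low:04X}")             # D83D DC0E
print(hex(from_surrogates(high, low)))     # 0x1f40e
```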

      --
      WARNING! This girl exceeds the MAXIMUM SAFE standards established by the FDA for BRATTINESS
    37. Re:Favourite unicode character by Actually,+I+do+RTFA · · Score: 1

      If it is a 16 bit standard, how can it be unlimited?

      If it were a 16-bit standard, it couldn't be unlimited. But it's not. In two ways. First, Unicode is simply a number->meaning table, and doesn't specify actual in memory format. There are a lot of competing standards for that. Second, UTF-16 has 1.1 M values. UTF-32 has 4B. UTF-8 has a 2B or a 1.1M limit depending on the version.
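
      A quick Python sketch of where the 1.1 M figure comes from, assuming the usual count of 17 planes (plane 0 plus the 16 reachable through UTF-16 surrogates):

```python
PLANES = 17                 # plane 0 (the BMP) plus 16 supplementary planes
PER_PLANE = 0x10000         # 65,536 code points per plane

total = PLANES * PER_PLANE
print(total)                # 1114112, i.e. the "1.1 M" figure
print(hex(total - 1))       # 0x10ffff, the highest code point

# Python's chr() enforces the same ceiling.
try:
    chr(total)
except ValueError as exc:
    print("rejected:", exc)
```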

      --
      Your ad here. Ask me how!
    38. Re:Favourite unicode character by unixisc · · Score: 1

      Okay, I haven't been keeping track of the standard, beyond knowing that it's a 16-bit standard, and so I thought that 65,536 characters is probably enough to cover all characters in all known languages. I stand corrected.

      But UTF-8? Is that an 8-bit standard, and if yes, how different is it from ASCII, aside from the 8th bit? Also, in UTF-32, beyond the UTF-16 set, what are the characters that UTF-32 supports? Somehow, I miss the point about why card symbols, Tetris pieces and other pictographs should be supported - I can understand emoticons, since in this thread, they mentioned the Japanese using them, but I'm not seeing where the others would be used.

      Incidentally, do they have a separate set for mathematical symbols, such as surface integral, volume integral, partial derivatives & so on? In the case that Greek symbols are used, do they simply re-use them, or do they redefine them elsewhere, things like Sigma (for summation), pi, and so on?

    39. Re:Favourite unicode character by Actually,+I+do+RTFA · · Score: 1

      But UTF-8? Is that a 8 bit standard, and if yes, how different is it from ASCII, aside from the 8th bit?

      UTF-8 and UTF-16 (as well as others, like UTF-7), are variable length. Hence, both can exceed the number of characters you may expect.
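
      A small Python sketch of what "variable length" means in practice. The sample characters are arbitrary picks (two of them appear elsewhere in this thread); the last line shows the answer to the ASCII question, which is that plain ASCII text is already valid UTF-8.

```python
samples = ["A", "\u00E9", "\u5361", "\U0001F4BE"]  # A, e-acute, U+5361, floppy disk

for ch in samples:
    utf8 = ch.encode("utf-8")
    utf16 = ch.encode("utf-16-be")   # big-endian, no byte-order mark
    print(f"U+{ord(ch):04X}: {len(utf8)} byte(s) in UTF-8, {len(utf16)} in UTF-16")

# ASCII text is byte-for-byte identical in UTF-8.
print("plain ASCII".encode("utf-8") == "plain ASCII".encode("ascii"))
```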

      I have no clue why various symbols are added. I assume it's a result of having way too much space.

      I believe they do have mathematical symbols. And I believe that the Greek symbols are redefined. I mean, I can see slightly different font renderings for Capital Sigma (summation) and Capital Sigma (in Greek).

      --
      Your ad here. Ask me how!
  4. Re:Stick to ASCII by aepurniet · · Score: 1

    seriously, why would people need more than 256 characters? and why would they need more than 640k of memory?

  5. Why Slashdot won't adopt it by tepples · · Score: 5, Informative

    Before anyone chimes in complaining that Slashdot doesn't even support an old version of Unicode, this is for several reasons. For one thing, there was once a fad of posting pornographic ASCII art on Slashdot, so it appears Slashdot disallows any character that would be more useful for glyph art than for English text. For another, there was once a fad of using bidirectionality override control characters for turning text backwards, which would break the layout and allow spoofing a comment's moderation score.

    1. Re:Why Slashdot won't adopt it by countertrolling · · Score: 1

      Before anyone chimes in complaining that Slashdot doesn't even support an old version of Unicode...

      Oops.. But I kinda wish the <i> tag still worked

      --
      For justice, we must go to Don Corleone
    2. Re:Why Slashdot won't adopt it by BetterThanCaesar · · Score: 4, Insightful

      Raise your hand if you couldn't code a parser that detects those characters and takes appropriate action, such as popping bidi characters.
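
      For what it's worth, a minimal sketch of the "popping" part in Python. `close_bidi` is a hypothetical helper, not anything from Slashcode, and it only handles the embedding and override controls (U+202A, U+202B, U+202D, U+202E) with U+202C as the pop. The directional isolate controls added in later Unicode versions are not covered.

```python
PDF = "\u202C"                                       # POP DIRECTIONAL FORMATTING
OPENERS = {"\u202A", "\u202B", "\u202D", "\u202E"}   # LRE, RLE, LRO, RLO

def close_bidi(text: str) -> str:
    """Drop stray PDFs and append one PDF per still-open embedding/override."""
    depth = 0
    out = []
    for ch in text:
        if ch in OPENERS:
            depth += 1
        elif ch == PDF:
            if depth == 0:
                continue          # unmatched pop: discard it
            depth -= 1
        out.append(ch)
    return "".join(out) + PDF * depth

comment = "\u202Edesrever"        # right-to-left override with no matching pop
safe = close_bidi(comment)
print(safe.endswith(PDF))         # True
```

      With every comment balanced this way, an override cannot leak into the markup that follows it.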

      I'd love to be able to write IPA when discussing pronunciation, or actually write out words in other languages, ohm character for discussing electronics, pound and yen signs for currency ... Hey, even a bigger whitelist than what we have now would be great!

      --
      "Stop failing the Turing test!" -- Dilbert
    3. Re:Why Slashdot won't adopt it by Kjella · · Score: 1

      Just admit that it's because it's old and random, there's a few HTML entities working but there's no reason why &aelig; = æ should work and &mu; = shouldn't - like in micrograms, or uTorrent. It's a geeky site, but it's made for writing English prose with some half-hearted Latin1 support, no math or science.

      --
      Live today, because you never know what tomorrow brings
    4. Re:Why Slashdot won't adopt it by Anonymous Coward · · Score: 1

      Before anyone chimes in complaining that Slashdot doesn't even support an old version of Unicode, this is for several reasons. For one thing, there was once a fad of posting pornographic ASCII art on Slashdot, so it appears Slashdot disallows any character that would be more useful for glyph art than for English text.

      Trolls gonna troll; that's what moderation is for.

      For another, there was once a fad of using bidirectionality override control characters for turning text backwards, which would break the layout and allow spoofing a comment's moderation score.

      So filter those character ranges.

    5. Re:Why Slashdot won't adopt it by Fastolfe · · Score: 1

      There are technical solutions to these problems, such as tracking language/BIDI overrides when embedding strings provided by users (and reversing the effect afterward). You could also do it the "easy" way and just filter out characters based on their Unicode property (e.g. disallow all 'other' characters, which would include these formatting characters).
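
      The "easy" way is only a few lines in Python, using the standard `unicodedata` module. `strip_other` is a made-up name, and keeping newline and tab is an assumption about what a comment form would want to preserve.

```python
import unicodedata

def strip_other(text: str, keep: str = "\n\t") -> str:
    """Remove characters whose general category starts with 'C' (Other)."""
    return "".join(
        ch for ch in text
        if ch in keep or not unicodedata.category(ch).startswith("C")
    )

# U+202E (right-to-left override) and U+200B (zero width space) are both
# category Cf, so they are dropped; the accented letter survives.
sample = "score:\u202E 5 \u200Bcaf\u00E9\n"
print(repr(strip_other(sample)))
```

      One caveat: the 'C' categories also include unassigned code points, so characters newer than the Unicode data shipped with the interpreter get stripped as well.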

    6. Re:Why Slashdot won't adopt it by alex67500 · · Score: 1

      You can write your comments in RAW html no? HTML entities might help...

    7. Re:Why Slashdot won't adopt it by Anonymous Coward · · Score: 0

      The old bullshit excuses...

      Unicode has different *pages*. You can filter by page. This *guarantees* that nobody will do any tricks with e.g. direction reversal etc. So that "argument" is out.
      And about the ASCII art: Hell, other blogs have, *gasp* IMAGE links!
      How about that?

      You know what? What's stopping us from just creating a Greasemonkey script that translates back and forth from HTML with square brackets and allows the full HTML set, by putting every message in its own e.g. IFRAME so it can't mess with the stuff around it. (Or alternatively, just disallow style parameters, allow only certain CSS classes, and force a maximum size on the comment content.)

      Come on, it's not that hard! You're just either too lazy, too stupid, or both.

    8. Re:Why Slashdot won't adopt it by X0563511 · · Score: 1

      Looks like extended-ASCII, not necessarily UTF/UCS. For example, 0xE9: é

      --
      For large sets, this will be our guide even unto death, for the LORD will work for each type of data it is applied to...
    9. Re:Why Slashdot won't adopt it by X0563511 · · Score: 1

      Here's the reason: æ = 0xE6 (or 0xC6 for capital) in extended ASCII, where Mu is not present in extended ASCII. It appears slashdot dumps anything outside of that range.

      Let's try an experiment:
      0xAB and 0xBB:
      0xA7 and 0xB6:
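
      The range check can be sketched in Python, assuming "extended ASCII" here means ISO-8859-1 (Latin-1). `in_latin1` is a made-up helper. One wrinkle: the Greek letter behind &mu; (U+03BC) is outside that range, but the separate micro sign behind &micro; (U+00B5) is inside it.

```python
def in_latin1(ch: str) -> bool:
    """True if the character has a single-byte ISO-8859-1 encoding."""
    try:
        ch.encode("latin-1")
        return True
    except UnicodeEncodeError:
        return False

for label, ch in [("ae ligature (&aelig;)", "\u00E6"),
                  ("Greek small mu (&mu;)", "\u03BC"),
                  ("micro sign (&micro;)", "\u00B5")]:
    print(f"{label}: U+{ord(ch):04X} in Latin-1? {in_latin1(ch)}")

print("\u00E6".encode("latin-1"))   # b'\xe6'
```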

      --
      For large sets, this will be our guide even unto death, for the LORD will work for each type of data it is applied to...
    10. Re:Why Slashdot won't adopt it by X0563511 · · Score: 1

      False! Only a subset is allowed, but anything outside of it most definitely seems to fail.

      --
      For large sets, this will be our guide even unto death, for the LORD will work for each type of data it is applied to...
    11. Re:Why Slashdot won't adopt it by Hentes · · Score: 2

      For one thing, there was once a fad of posting pornographic ASCII art on Slashdot, so it appears Slashdot disallows any character that would be more useful for glyph art than for English text.

      If ASCII can be used for trolling just the same then there is little point in not implementing Unicode. The point of moderation is to prevent these issues.

      For another, there was once a fad of using bidirectionality override control characters for turning text backwards, which would break the layout and allow spoofing a comment's moderation score.

      That's because of a buggy/insecure implementation. It doesn't mean it can't be done right.

    12. Re:Why Slashdot won't adopt it by bill_mcgonigle · · Score: 1

      Raise your hand if you couldn't code a parser that detects those characters and takes appropriate action, such as popping bidi characters.

      Um, so do it and submit a patch against Slashcode?

      --
      My God, it's Full of Source!
      OUTSIDE_IP=$(dig +short my.ip @outsideip.net)
    13. Re:Why Slashdot won't adopt it by unixisc · · Score: 1

      Do raw HTML 4 symbols show up, if one explicitly typed them? Such as &#923 or &Lambda? That would help quite a bit - one is not likely to usually see morons use them to make pornographic ASCII or Unicode art.

  6. Re:Stick to ASCII by cc1984_ · · Score: 5, Funny

    Yeah but can you write a pile of poo in ASCII?

    http://www.fileformat.info/info/unicode/char/1f4a9/index.htm

  7. Re:Stick to ASCII by countertrolling · · Score: 1

    Slashdot seems to believe so, seeing that we can't type accents and whatnot without jumping through a few hoops

    --
    For justice, we must go to Don Corleone
  8. Re:Stick to ASCII by Anonymous Coward · · Score: 0

    Yes. +2D3cqQ

  9. emoticons? by pz · · Score: 3, Insightful

    Seriously, emoticons? Who ever thought it a good idea to include those in a standard? Should we have an encoding for hearts as dots over lower case i as well? And little horseys, too? And y with a big tail that wraps around to the front of the word?

    --

    Put my fist through my alarm clock with its ding-dong death inside my ear. - The Blackjacks.
    1. Re:emoticons? by snowgirl · · Score: 3, Informative

      And little horseys, too?

      U+1F40E ... no, seriously...

      --
      WARNING! This girl exceeds the MAXIMUM SAFE standards established by the FDA for BRATTINESS
    2. Re:emoticons? by leonbloy · · Score: 1

      There's also a carousel horse.

    3. Re:emoticons? by Anonymous Coward · · Score: 0

      Well then, we need to include:
        - all the sprites and playfield object tiles from the NES Super Mario Bros. 1 CHR ROM.
        - all the Tetris pieces
        - glyphs of game pieces of all well known games
        - heck, instead of just the suit symbols why not 52 glyphs for a standard deck of cards
        - throw the Major Arcana tarot cards in there too
        - gang symbols

    4. Re:emoticons? by Anonymous Coward · · Score: 0

      Seriously, emoticons? Who ever thought it a good idea to include those in a standard? Should we have an encoding for hearts as dots over lower case i as well? And little horseys, too? And y with a big tail that wraps around to the front of the word?

      Look, unless you want to put the Unicode committee members out of jobs, we have GOT to keep looking for new characters!

    5. Re:emoticons? by gutnor · · Score: 1

      Unicode encodes old characters from dead languages that only a few professors will ever use. That makes a lot less sense than emoticons, characters that are actually used daily by lots of people.

    6. Re:emoticons? by GreatBunzinni · · Score: 1

      The U+1F4AF character is a bit harder to explain than little horses, because it relies on a 4-octet character to express something that can easily be expressed with three 1-octet characters.
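
      The octet counts are easy to check, assuming UTF-8 is the encoding meant here. A short Python sketch comparing the two forms:

```python
hundred_points = "\U0001F4AF"   # U+1F4AF, the character discussed above
plain_digits = "100"            # the three ASCII digits it stands for

emoji_octets = len(hundred_points.encode("utf-8"))
digit_octets = len(plain_digits.encode("utf-8"))
print(emoji_octets, digit_octets)
```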

      --
      Slashdot, fix your code or at least hire someone who is competent at it to do it for you.
    7. Re:emoticons? by Anonymous Coward · · Score: 0

      Seriously, emoticons? Who ever thought it a good idea to include those in a standard?

      The 100+ million Japanese who use them to communicate on a daily basis? Or the phone manufacturers, who only have to worry about one encoding system going forward to display these characters?

    8. Re:emoticons? by Anonymous Coward · · Score: 0

      Who ever thought it a good idea to include those in a standard?

      Umm, Google, for one. Yahoo, for another. Turns out there were something like 15 proprietary versions of Shift JIS that the different phone carriers were using to transmit and store emoji. Instead of having to tag every message with an encoding - welcome to the bad old days of mojibake! - or having to settle for only one of the JIS versions that wouldn't necessarily reflect every carrier's emoji set, the Unicode repertoire allows all those SMS messages to get stored and indexed.

    9. Re:emoticons? by Anonymous Coward · · Score: 0

      Well then, we need to include:

        - heck, instead of just the suit symbols why not 52 glyphs for a standard deck of cards

      Done and done.

      - throw the Major Arcana tarot cards in there too

      Working on it.

    10. Re:emoticons? by Hentes · · Score: 3, Funny

      The next thing will be teenagers building bigger emoticons out of emoticon characters. Then they will have to be included in the standard as well, and so on...

    11. Re:emoticons? by unixisc · · Score: 1

      How about Mahjong? ;)

  10. Checking for the release of a new version by tepples · · Score: 1

    Raise your hand if you couldn't code a parser that detects those characters and takes appropriate action, such as popping bidi characters.

    &#x1F64B; If I were writing such a parser, I don't know how I'd get it to automatically check for the release of a new version of the standard and determine which code points are new bidi characters to be popped.

    I'd love to be able to write IPA when discussing pronunciation

    It'd be nice but not necessary: X-SAMPA.

    or actually write out words in other languages

    I guess the rationale is that most moderators would not be able to read foreign words without transliteration into Latin characters.

    pound and yen signs for currency

    £ is Alt+0163 on a Windows machine, and ¥ is Alt+0165. They're probably Ctrl+Shift+U A 3 Enter and Ctrl+Shift+U A 5 Enter on a Linux machine, but I don't have one in front of me right this minute with which to test.

    1. Re:Checking for the release of a new version by Canazza · · Score: 5, Funny

      £ is Shift+3, what are you on about?

      --
      It pays to be obvious, especially if you have a reputation for being subtle.
    2. Re:Checking for the release of a new version by Anonymous Coward · · Score: 0

      £ is AltGr+Shift+4. What are you on about?

    3. Re:Checking for the release of a new version by Anonymous Coward · · Score: 0

      -1, does not have a Linux machine within arm's reach. And admits to it.

    4. Re:Checking for the release of a new version by Anonymous Coward · · Score: 0

      &#x1F64B; If I were writing such a parser, I don't know how I'd get it to automatically check for the release of a new version of the standard and determine which code points are new bidi characters to be popped.

      Bidi ranges are already set by the Unicode roadmaps. It's just a range check.
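
      A rough sketch of such a check in Python. It swaps the range check for a lookup of each character's bidi class in the standard `unicodedata` module, so the result depends on which Unicode version the Python build ships with. The function name and the choice of classes are assumptions made for illustration.

```python
import unicodedata

# Bidi classes of the explicit embedding/override controls (U+202A..U+202E).
EXPLICIT_BIDI_CLASSES = {"LRE", "RLE", "LRO", "RLO", "PDF"}

def strip_bidi_controls(text: str) -> str:
    """Drop characters whose bidi class marks them as explicit controls."""
    return "".join(
        ch for ch in text
        if unicodedata.bidirectional(ch) not in EXPLICIT_BIDI_CLASSES
    )

cleaned = strip_bidi_controls("abc\u202edef\u202c")
print(cleaned)
```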

    5. Re:Checking for the release of a new version by Nivag064 · · Score: 1

      Shift+3 is the 'hash' symbol '#' for me!

    6. Re:Checking for the release of a new version by pjt33 · · Score: 1

      I guess the rationale is that most moderators would not be able to read foreign words without transliteration into Latin characters.

      So at least give us Latin-1. There are English words which use accents in high registers.

    7. Re:Checking for the release of a new version by hackertourist · · Score: 1

      Only on UK keyboard layouts.

    8. Re:Checking for the release of a new version by Anonymous Coward · · Score: 0

      Not on any keyboard *I've* thrown xmodmap at (ok, xkbcomp now). Whoever thought that was a good idea (as well as Shift+2 not being quotedbl) needs to be drowned in the river.

    9. Re:Checking for the release of a new version by Anonymous Coward · · Score: 0

      Sometimes. But it's always Compose, L, -. At least on well-designed operating systems.

    10. Re:Checking for the release of a new version by tepples · · Score: 1

      What high registers are you talking about that use accents other than what one can already get by typing Alt+0233 to get é and similar?

    11. Re:Checking for the release of a new version by GrandTeddyBearOfDoom · · Score: 1

      And on the UK Mac keyboard, you then have to use option-3 to get a hash (#), whereas typical UK keyboards have dedicated keys for both. Makes programming certain languages on the Mac a little tedious (having to use a shift-key combination every time you want to comment in e.g. Perl or Ruby).

      --
      -- The Grand Teddy Bear has Spoken: "Windows 8 Source Code Available NOW! more disgusting than your pr..."
    12. Re:Checking for the release of a new version by pjt33 · · Score: 1

      I actually use HTML escape codes when I remember, but I have to remember not to just hit the combining accent key, and I forget more often than I remember. The same is probably true of most of those of us who use keyboards with keys for Latin-1 accents.

  11. The next version of the standard by tepples · · Score: 1

    Trolls gonna troll; that's what moderation is for.

    At one point, ASCII art spammers were filling pages with sexually explicit ASCII art, such as Goatse, male masturbation, and birds perched on a penis, so fast that moderators could not keep up.

    So filter those character ranges.

    Blacklisting doesn't work because the next version of the standard, such as Unicode 6.1, may introduce more undesirable character ranges.

    1. Re:The next version of the standard by StuartHankins · · Score: 4, Funny

      ...filling pages with sexually explicit ASCII art, such as Goatse, male masturbation, and birds perched on a penis...

      Yeah, the way they are going they might actually *have* these characters in the set now...

    2. Re:The next version of the standard by afabbro · · Score: 1

      Blacklisting doesn't work because the next version of the standard, such as Unicode 6.1, may introduce more undesirable character ranges.

      That would lead to the Slashdot "editors" having to maintain their code, and we can't have that.

      --
      Advice: on VPS providers
    3. Re:The next version of the standard by Dahan · · Score: 1

      Blacklisting doesn't work because the next version of the standard, such as Unicode 6.1, may introduce more undesirable character ranges.

      It's not difficult to update a simple file/DB entry/whatever to add more characters to the blacklist. Include a little util to parse the UnicodeData file and automatically blacklist all control characters. But even if you wanted to go with a whitelist instead of a blacklist, there's no reason for the whitelist to be as small as it currently is. And then there's what I assume is a Slashcode bug where non-ASCII characters that are in the whitelist don't come through properly. I've seen numerous posts where a stray character gets included. I don't feel like looking for examples right now, but I don't think people are all making the same consistent typos.
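
      A minimal sketch of the "little util" idea in Python. The three sample lines follow the semicolon-separated UnicodeData.txt layout; the function name and the decision to block categories Cc and Cf are assumptions, and the First/Last range entries in the real file are not handled.

```python
# Sample lines in UnicodeData.txt layout: field 0 is the code point in hex,
# field 2 is the general category.
SAMPLE = """\
0000;<control>;Cc;0;BN;;;;;N;NULL;;;;
0041;LATIN CAPITAL LETTER A;Lu;0;L;;;;;N;;;;0061;
202E;RIGHT-TO-LEFT OVERRIDE;Cf;0;RLO;;;;;N;;;;;
"""

def build_blacklist(lines, blocked_categories=("Cc", "Cf")):
    """Return the set of code points whose general category is blocked."""
    blocked = set()
    for line in lines:
        fields = line.strip().split(";")
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        if fields[2] in blocked_categories:
            blocked.add(int(fields[0], 16))
    return blocked

blacklist = build_blacklist(SAMPLE.splitlines())
print(sorted(hex(cp) for cp in blacklist))
```

      Re-running this against each new release of the data file would be the update step the comment describes.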

    4. Re:The next version of the standard by Hognoxious · · Score: 1

      At one point, ASCII art spammers were filling pages with sexually explicit ASCII art, such as Goatse, male masturbation, and birds perched on a penis, so fast that moderators could not keep up.

      They can do that with or without unicode, so how does blocking unicode help?

      Blacklisting doesn't work because the next version of the standard, such as Unicode 6.1, may introduce more undesirable character ranges.

      How often do new versions come out? We aren't talking about Firefox here.

      --
      Confucius say, "Find worm in apple - bad. Find half a worm - worse."
  12. Smile emoticon at CP437 code 0x01 by tepples · · Score: 1

    Seriously, emoticons? Who ever thought it a good idea to include those in a standard?

    Unicode had to be able to round-trip (losslessly encode and decode) all old popular encodings. This includes the encoding now called "code page 437", introduced with the first IBM PC, which includes a smile emoticon at code value 0x01. It also includes the encodings associated with the widely distributed system fonts Zapf Dingbats and Wingdings.
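
    The round-trip property can be checked for code page 437 with Python's built-in codec. This is only a sketch: Python's table keeps 0x01 as the control character U+0001, whereas the smiley glyph the IBM PC drew at that position corresponds to U+263A.

```python
all_bytes = bytes(range(256))             # every possible CP437 code value

as_unicode = all_bytes.decode("cp437")    # CP437 -> Unicode
back_again = as_unicode.encode("cp437")   # Unicode -> CP437

lossless = back_again == all_bytes
print(lossless)
```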

    1. Re:Smile emoticon at CP437 code 0x01 by Anonymous Coward · · Score: 1

      This has nothing to do with emoticons. Emoticons are by definition composed of several characters, or reinterpretations of existing characters (such as the tilde with dieresis).

      GP probably means emoji. There is an emoji encoding widely used on Japanese mobile devices, so it makes perfect sense. Either Unicode includes emojis, or Japanese mobile devices are never going to switch to Unicode. Unicode emojis were originally requested by Google in 2007 and released with Unicode 6.0 in October 2010, after a long-winded open discussion and many changes.

  13. Re:Alternative proposal: by Anonymous Coward · · Score: 0

    English also has the second-worst spelling system on the planet (only outdone by Japanese). I may have to use it on /. but I'm happy I don't have to resort to it for daily usage. And even if your idiotic proposal were to be universally accepted (which it won't, it's like asking everyone to use DOS) we'd still be in need of a way to encode historical documents and such.

  14. I can't see the new characters by Anonymous Coward · · Score: 0

    Because my browser doesn't support Unicode 6.1 yet...

  15. Tetris, Chess, Baseball, and gang symbols by tepples · · Score: 4, Informative

    all the Tetris pieces

    The polyominoes up to five squares can be composed from U+2580 (upper half block), U+2584 (lower half block), and U+2588 (full block) characters. Unicode tends not to introduce precomposed ligatures except when needed for round-tripping with pre-Unicode encodings.
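
    As an illustration of that composition, here is a small Python sketch that packs two rows of squares into one line of text using the three block characters. The `render` helper and the grid layout are made up for this example.

```python
UPPER, LOWER, FULL = "\u2580", "\u2584", "\u2588"

# (top square, bottom square) -> character covering both
CELLS = {(1, 1): FULL, (1, 0): UPPER, (0, 1): LOWER, (0, 0): " "}

def render(grid):
    """Render a grid of 0/1 rows, packing two grid rows into one text line."""
    if len(grid) % 2:
        grid = grid + [[0] * len(grid[0])]   # pad to an even number of rows
    lines = []
    for top, bottom in zip(grid[0::2], grid[1::2]):
        lines.append("".join(CELLS[(t, b)] for t, b in zip(top, bottom)))
    return "\n".join(lines)

t_piece = [[1, 1, 1],
           [0, 1, 0]]
print(render(t_piece))
```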

    glyphs of game pieces of all well known games

    A lot of well-known pre-1923 tabletop games' game pieces already exist in Unicode. Chess is U+2654 through U+265F, and Checkers is U+26C0 through U+26C3. A lot of game pieces are simple enough in form that the Geometric Shapes (U+25A0 through U+25FF) represent them just fine. For example, Othello is U+25CB and U+25CF, as is Connect Four. Even the enemy in Fast Eddie for Atari 2600 is in Miscellaneous Technical (U+237E) as is home plate in Baseball (U+2302).

    heck, instead of just the suit symbols why not 52 glyphs for a standard deck of cards

    Those can already be composed from a Basic Latin letter or number and a suit symbol. Unicode tends not to introduce precomposed ligatures except when needed for round-tripping with pre-Unicode encodings.

    throw the Major Arcana tarot cards in there too

    I don't know about Tarot, but all twelve signs of the zodiac are in Miscellaneous Symbols, even the "69" looking sign of Cancer (U+264B).

    gang symbols

    The symbol of "Folk Nation" gangs is similar to that of Judaism: a Star of David (U+2721). The symbol of "People Nation" gangs is similar to that of Islam: a 5-point star and crescent (U+262A).

    1. Re:Tetris, Chess, Baseball, and gang symbols by Anonymous Coward · · Score: 0

      All playing cards have their own symbol. Wikipedia.
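
      Both views can be checked from Python's `unicodedata` module. This is a sketch using the ace of spades as the example; the composed form follows the parent comment's suggestion of a Basic Latin letter plus a suit symbol.

```python
import unicodedata

ace_of_spades = "\U0001F0A1"
name = unicodedata.name(ace_of_spades)

# Composed alternative: rank letter followed by a suit symbol.
spade = "\u2660"
composed = "A" + spade
print(name, composed, unicodedata.name(spade))
```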

  16. Obligatory XKCD by eternaldoctorwho · · Score: 0
    1. Re:Obligatory XKCD by marcosdumay · · Score: 4, Insightful

      You know that this is the exact situation that Unicode AVOIDED, don't you?

      Now we have one standard with 3 different representations. Those replaced literally thousands of standards. Yep, sometimes making that new standard works.

    2. Re:Obligatory XKCD by Anonymous Coward · · Score: 0

      It didn't replace them. For those of us living in a country where one of the single-byte encodings (like the iso8859-series or even ascii) is enough, it's much easier to continue with those, rather than going for UCS-2 (or even worse, UCS-4) which would double (or quadruple) the amount of memory and disk space required, or even worse, the nightmare-encodings (UTF).

    3. Re:Obligatory XKCD by marcosdumay · · Score: 1

      UTF-8 would have a negligible impact on file size, and I really doubt ISO8859 is enough for anybody, since everybody gets texts in foreign languages once in a while. And who is concerned with the size of text files nowadays?

      Anyway, ISO8859 weren't the only encodings with widespread use before Unicode.
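
      A quick way to see the size difference is to encode the same mostly-ASCII text several ways. In this Python sketch the sample string is arbitrary, and "utf-16-le" and "utf-32-le" stand in for UCS-2 and UCS-4, which is an assumption that holds for characters like these.

```python
text = "na\u00efve caf\u00e9"   # "naïve café": 10 characters, two outside ASCII

sizes = {enc: len(text.encode(enc))
         for enc in ("iso-8859-1", "utf-8", "utf-16-le", "utf-32-le")}
for enc, size in sizes.items():
    print(f"{enc:>10}: {size} bytes")
```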

    4. Re:Obligatory XKCD by Anonymous Coward · · Score: 0

      Now we have one standard with 3 different representations. Those replaced literally thousands of standards. Yep, sometimes making that new standard works.

      Hey, I thought we hated "embrace and extend"?

  17. Hundreds of iframes by tepples · · Score: 1

    Unicode has different *pages*. You can filter by page.

    New versions of Unicode introduce new pages. If you're blocking a page for some reason, the next version of Unicode might introduce another page that extends the functionality of the old page, reintroducing the behavior that led you to block the old page.

    What's stopping us from just creating a Greasemonkey script that translates back and forth from HTML with square brackets and allows the full HTML set

    Slashdot's lameness filter would probably confuse those square brackets with ASCII art, and even if not, the comment would likely draw negative moderations from moderators who haven't installed the Greasemonkey script.

    by putting every message in its own e.g. IFRAME

    There was a time when hundreds of <iframe> elements on a page would cause the browser to become unusably slow or even crash. I reported this to bugzilla.mozilla.org as Bug 103649, and a decade later it's still not RESOLVED FIXED. And are you going to put the subject of a comment in its own iframe too?

    and force a maximum size on the comment content.

    Until April 2014, when IE 6 passes out of extended support, one can't assume that all supported browsers support CSS max-width.

    1. Re:Hundreds of iframes by Jesus_666 · · Score: 1

      Why not use a reasonable whitelist? It's unlikely that a new version of Unicode would turn a printable character into a bidi control character and printable JIS characters are not automatically evil, especially not if the lameness filter treats them as non-letters.

      As for "people could spam ASCII art": People could also flood Slashdot with bizarre textual porn copypasta. The key part of "posting ASCII art faster than the mods can cope" is "faster than the mods can cope", not "ASCII art".

      It is fairly weird that a geek-centric website like Slashdot doesn't support Unicode but instead relies on an undocumented subset of Latin-1. Especially in 2012.

      --
      USE HOT GRITS WITH STATUE OF NATALIE PORTMAN (NAKED AND PETRIFIED)
    2. Re:Hundreds of iframes by JDG1980 · · Score: 1

      New versions of Unicode introduce new pages. If you're blocking a page for some reason, the next version of Unicode might introduce another page that extends the functionality of the old page, reintroducing the behavior that led you to block the old page.

      So use a whitelist instead of a blacklist for pages.
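
      A whitelist could be as simple as a list of allowed code point ranges. The ranges below are hypothetical and chosen only for illustration; they are not Slashdot's actual filter.

```python
# Hypothetical whitelist: inclusive code point ranges chosen for illustration.
ALLOWED_RANGES = [
    (0x0020, 0x007E),   # printable Basic Latin (ASCII)
    (0x00A0, 0x00FF),   # Latin-1 Supplement, printable part
    (0x0370, 0x03FF),   # Greek and Coptic block
]

def is_allowed(ch: str) -> bool:
    cp = ord(ch)
    return any(lo <= cp <= hi for lo, hi in ALLOWED_RANGES)

def filter_comment(text: str) -> str:
    """Keep whitelisted characters and drop everything else."""
    return "".join(ch for ch in text if is_allowed(ch))

filtered = filter_comment("caf\u00e9 \u039b\u202e\U0001F4A9")
print(filtered)
```

      Anything a future version of the standard adds outside these ranges is dropped by default, which is the point of whitelisting.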

    3. Re:Hundreds of iframes by amorsen · · Score: 1

      Until April 2014, when IE 6 passes out of extended support, one can't assume that all supported browsers support CSS max-width.

      Who the fuck cares whether Slashdot renders on IE6?!

      Although to be fair, it does seem like that is the only browser that Slashdot does care about. All the others probably spend more time supporting Slashdot than Slashdot spends supporting them.

      --
      Finally! A year of moderation! Ready for 2019?
  18. Disclosure, drive space, and spinning up by tepples · · Score: 2

    If they write a brilliant paragraph a day ago, then deleted it in the morning, they can view the document as it existed yesterday, copy the paragraph back out, and be done with it.

    For one thing, an application that saves (and sends) a document's undo history along with the document can disclose things that the document's author did not want to disclose. I seem to vaguely remember scandals with Word's AutoRecover being used to recover redacted parts of a document. For another, how much of the limited space on the drive should be dedicated to saving a document's undo history since creation, especially when the document is a large layered picture or multitrack audio project?

    And that's because people forget to save - why not have the OS do it for them?

    I agree, but how often should the OS spin up the hard drive to do so?

  19. Re:Stick to ASCII by unixisc · · Score: 1

    Yeah, it's fantastic that Cyrillic or Katakana or Devanagari scripts can be so beautifully supported in ASCII. Speaking of which, does HTML5 have a complete character list for Unicode, or is it still restricted to ASCII?

  20. Re:Alternative proposal: by DragonWriter · · Score: 2

    Standardise the world on English. It'll be easier. It's already the second-most-spoken language, and Chinese is a real nightmare of character encoding in itself. Then we can go back to good old ASCII.

    ASCII leaves off a lot of English punctuation, and accents that are, in fact, used in English (sure, in words of foreign origin, but they are still used.)

  21. I blame Star Trek & LotR. by Hognoxious · · Score: 1

    Well said, that man. If you feel the desire to "write" with stick figures and squiggles use a bastarding graphic, for fuck's sake.

    Eklinóringëon my arse.

    --
    Confucius say, "Find worm in apple - bad. Find half a worm - worse."
  22. No hoops... by Anonymous Coward · · Score: 0

    Àçcênts aré easy (if you have Windows). See http://vulpeculox.net/ax.
    Works for 'any' application. Free. No stupid picking or codes.

    1. Re:No hoops... by icebraining · · Score: 2

      They're only "easy" if you have your system configured for ISO-8859-1. Those of us who use UTF-8 get this result: à é.
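
      The kind of mismatch described here can be reproduced in a few lines of Python. This sketch assumes the client sends UTF-8 and the other side reads the bytes as ISO-8859-1; which component actually does the misreading is not established in this thread.

```python
original = "\u00e0 \u00e9"                 # "à é"
utf8_bytes = original.encode("utf-8")      # what a UTF-8 client would send

# A reader that assumes ISO-8859-1 treats each byte as one character.
misread = utf8_bytes.decode("iso-8859-1")
print(repr(misread))

# The damage is reversible as long as no bytes were dropped along the way.
repaired = misread.encode("iso-8859-1").decode("utf-8")
```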

    2. Re:No hoops... by marcosdumay · · Score: 1

      Hey, so it is /.'s web server that doesn't do encoding right? I always thought it was the CGI code.

      WTF are they using to serve those pages?

  23. Re:Alternative proposal: by snowgirl · · Score: 2

    English also has the second-worst spelling system on the planet (only outdone by Japanese).

    ??? WTF are _YOU_ on about? English does not have the worst spelling system on the planet, and Japanese certainly doesn't qualify as the worst. "But they have three different scripts: two syllabaries, and an ideographic set" but...

    Look, perhaps I better just demonstrate to you what a real bad spelling system looks like; go look at Irish.

    --
    WARNING! This girl exceeds the MAXIMUM SAFE standards established by the FDA for BRATTINESS
  24. Re:Alternative proposal: by snowgirl · · Score: 2

    ASCII leaves off a lot of English punctuation, and accents that are, in fact, used in English (sure, in words of foreign origin, but they are still used.)

    Some that aren't foreign as well. "Coöperate" is an archaic spelling. Basically, any prefix that ends in "o" that is attached to a word that starts with an "o" can archaically be spelled with a diaeresis, in the French/Dutch method of "this vowel should be pronounced separately, and not as part of a diphthong".

    --
    WARNING! This girl exceeds the MAXIMUM SAFE standards established by the FDA for BRATTINESS
  25. Re:Stick to ASCII by Pieroxy · · Score: 3, Informative

    ASCII is just 128 characters.

  26. Re:Alternative proposal: by shutdown+-p+now · · Score: 1

    ??? WTF are _YOU_ on about?

    Can you concisely explain why the English word "psyche" is pronounced the way it is to a non-native speaker of the language?

  27. "Save" with no icon in a toolbar full of icons by tepples · · Score: 1

    In a toolbar full of icons, the word "Save" or its localization without an icon will probably look out of place. Is this out-of-placeness somehow superior to the use of a floppy disk icon?

    1. Re:"Save" with no icon in a toolbar full of icons by Hognoxious · · Score: 1

      Yes, because none of the [working] machines here has a floppy drive and nobody under the age of twenty has ever even seen one except in a museum, you smug wanker.

      --
      Confucius say, "Find worm in apple - bad. Find half a worm - worse."
    2. Re:"Save" with no icon in a toolbar full of icons by m2shariy · · Score: 1

      We need an icon for smug wanker!

  28. Re:Stick to ASCII by metamatic · · Score: 0

    There's a bug in WebKit on the Mac that stops font fallback working properly.

    Reported by me in Chrome, reported up the chain to Apple.

    It works fine in Chrome for Linux, so it's something weird and Mac-specific Apple will probably need to fix.

    --
    GCHQ Quantum Insert installed. If only our tongues were made of glass, how much more careful we would be when we speak
  29. Re:Stick to ASCII by metamatic · · Score: 3, Funny

    This is Slashdot, I'm sure you can find any number of examples of people who've written a pile of poo in ASCII.

    --
    GCHQ Quantum Insert installed. If only our tongues were made of glass, how much more careful we would be when we speak
  30. Re:Great, yet another "unified language" by Anonymous Coward · · Score: 0

    What the hell are you talking about? 'Scuse me, are you from the past?!

  31. And where's Tengwar? by Xtifr · · Score: 1

    They've got symbols for a love hotel, a horse, and a steaming pile of poo, along with emoticons, and they still haven't accepted the Tengwar draft that's been around since '93? Where are these people's priorities!?

    1. Re:And where's Tengwar? by Anonymous Coward · · Score: 0

      These proposals don't just write themselves. Quite frankly, the reason why Tengwar is still waiting in the wings is because the Tengwar community has either been satisfied with the status quo, uninvolved in the process moving forward - support of any modern user community is important to these proposals - or no one has been willing to come forward and spearhead the process for Tengwar. Trust me, if you wrote to the Unicode list tomorrow, saying that you wanted to dedicate the time, effort, and research resources to get Tengwar encoded, you'd have helpers, some grant funding, and community contact info in a snap. So what are you waiting for?

    2. Re:And where's Tengwar? by Xtifr · · Score: 1

      The proposal has already been written. The encoding has already been done. The draft is on its third revision and seems quite stable. I don't see what else is needed except...drumming up support among the "modern user community". Hence my post. (My guess is that Slashdot has more Tengwar fans than the Unicode list.)

  32. Re:Stick to ASCII by Xtifr · · Score: 2

    Yeah but can you write a pile of poo in ASCII?

    As far as I know, Windows was originally written in ASCII... :)

  33. Re:Alternative proposal: by Anonymous Coward · · Score: 0

    Can you explain why any word in french is pronounced the way it is?
    It seems like they have different rules for what letters to pronounce for every word.

    Anyway the reason you pronounce psyche like that is because it sounds better than psitsh.

  34. It needed to be flexible, so it's a VM now. by VortexCortex · · Score: 1, Offtopic

    "It needed to be flexible, so it's a VM now."

    I fear this is the next step. The right to left and line wrapping BS is complicated enough that I'd welcome a specialized VM with loadable bytecode & glyph data. Yes, from a security standpoint this could create a wider attack surface. However, I'd argue it would be less attack surface considering that the VM for my unlimited precision scientific & programming calculator is smaller than my UTF-8 text display implementation.

    I'd also argue that it would be faster to adopt new glyphs and behaviors if all I needed was to drop in a new batch of bytecode.

    I'd also argue just to argue... because, well this IS Unicode we're talking about.

  35. Re:Stick to ASCII by rishistar · · Score: 1

    Something's wrong with the Java code for this, though: Character.getNumericValue() is documented as returning -1 for this character, when quite clearly it should be a number 2.

    --
    Professor Karmadillo Songs of Science
  36. Re:Alternative proposal: by Whibla · · Score: 1

    Can you concisely explain why the English word "psyche" is pronounced the way it is to a non-native speaker of the language?

    ps: Pronounced s. Whenever you see the letters 'p' and 's' together at the start of a word, do not pronounce the 'p', for example in pseudonym, psilocybin, or psst!
    y: Pronounced eye or, more simply i. Why? Exactly! Um...
    che. Pronounced key. Because without it everything would remain locked up in your brain.

    And if that doesn't work there's always the fall back: Because it is!

  37. Re:Stick to ASCII by Megane · · Score: 1

    I'm particularly fond of this set:

    1F648 SEE-NO-EVIL MONKEY
    1F649 HEAR-NO-EVIL MONKEY
    1F64A SPEAK-NO-EVIL MONKEY

    The only thing better would be a smoking monkey character. Because there ain't nothing funnier than a smoking monkey!

    --
    #naabhaprzrag, #sverubfr-000, #agi-fcbafberq, negvpyr[pynff*=' negvpyr-ary-'] { qvfcynl: abar !vzcbegnag; }
  38. Re:Alternative proposal: by pjt33 · · Score: 1

    It's a loan-word from Greek. It follows the basic English rules for borrowing Greek words.

  39. Re:Alternative proposal: by shutdown+-p+now · · Score: 1

    The rules for regular English words are no better, to be honest. It's like someone was trying to come up with the most perverted way to make a letter represent something as different as possible from what it does in most European languages (and Latin, where it originates). The only language that's possibly worse in that regard is French, but at least they are consistent in the way they mutilate their phonemes (and most of it is just dropping them altogether), whereas in English you have to guess which of the possible 2-4 radically different pronunciations of a single letter applies (like, say, /e/ vs /i/ vs /ai/), often with no rules other than "you just kinda look at which words are similar and go from there, but mind the exceptions!".

    I understand that this is not really the fault of the language itself, it's just that its spelling effectively represents how the words were pronounced several centuries ago (circa Chaucer), rather than how they are pronounced today. It would be great to fix that, but it's probably too late by now - it already is the "world language" in its present shape, and all those millions of people are not going to re-learn. So all that we can do is rant about it.

  40. There's no emoticon for what I'm feeling!!!! by paranoid123 · · Score: 1

    ....oh there it is.

  41. Re:Alternative proposal: by shutdown+-p+now · · Score: 1

    Can you explain why any word in french is pronounced the way it is? It seems like they have different rules for what letters to pronounce for every word.

    You know, "better than French" is not a great achievement. Indeed, one of the reasons why English is in such a sorry shape is because it absorbed an unhealthy dose of French poison as part of its history.

    Anyway, the rule of thumb in French seems to be, if you don't know how to pronounce any given letter, just skip it altogether - >50% chance of you getting it right in that case. ~

    Anyway the reason you pronounce psyche like that is because it sounds better than psitsh.

    Technically, it should be /psixe/, which sounds reasonable to me.

  42. I don't know... by frank_adrian314159 · · Score: 1

    I'm sure we could have found some way to get along without "Mathematical Rising Diagonal" and "Kissing Face".

    --
    That is all.
  43. Re:Alternative proposal: by SuricouRaven · · Score: 1

    Drop the accents, people will know what you mean... and in a long enough period of time, only historians will care.

  44. Re:Stick to ASCII by petermgreen · · Score: 2

    I'm pretty sure that in HTML5, as in HTML4, the document is considered to be made up of Unicode characters, and other charsets are considered encodings of Unicode. Of course the HTML5 spec doesn't include all Unicode characters explicitly; that would be insane.

    --
    note: i'm known as plugwash most places but i screwd up registering that here somehow in the past and now can't register
  45. Re:Stick to ASCII by Anonymous Coward · · Score: 1

             )
            (
              ,
           ___)\
          (_____)
         (_______)

  46. half stars still missing? by Anonymous Coward · · Score: 0

    A glyph for an ice cream cone but still no half-stars to do movie ratings?

  47. U+0057 U+2693 = wanker by tepples · · Score: 1

    We already have a two-character icon for wanker: Latin capital letter W (U+0057), followed by Anchor (U+2693).

  48. Re:Stick to ASCII by unixisc · · Score: 1

    But that defeats the purpose of Unicode, doesn't it? I'm not expecting that HTML5 support, for instance, Wingdings, but if someone, for whatever reason, in an English document needs to type a foreign character outside ASCII, such as a word in Cyrillic, or Mandarin, and can't, what's the good of making the spec Unicode, as opposed to ASCII compliant? I'd just want all the characters in all languages to be supported, but things like card symbols, or emoticons are okay not to support.

  49. Re:Mahjongg? by Anonymous Coward · · Score: 0
  50. Re:Stick to ASCII by neonsignal · · Score: 1

    The character entities in HTML are only to try to get around legacy encodings. And since you can specify numerical Unicode entities, all of the Unicode set is accessible, there is no need for explicit names for everything.

    If you aren't constrained to legacy encodings, then the obvious approach is just to set the encoding to something sensible, for example UTF-8. There are several ways to do this in HTML. http://www.w3.org/TR/html5-diff/#character-encoding
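
    To see numeric and named references resolving to the same characters, here is a small Python sketch using the standard `html` module. The sample references are borrowed from elsewhere in this thread, with the closing semicolons added.

```python
import html

lam_decimal = html.unescape("&#923;")      # decimal numeric reference
lam_named = html.unescape("&Lambda;")      # named reference for the same letter
raised_hand = html.unescape("&#x1F64B;")   # hexadecimal reference beyond the BMP

print(lam_decimal, lam_named, f"U+{ord(raised_hand):X}")
```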

  51. Re:Stick to ASCII by petermgreen · · Score: 1

    Specifying the "document character set" as Unicode means that even if the charset you are writing your document in doesn't support the character you want, you can still enter it as a numeric (or named, if one is defined) entity. Whether it will be displayed is mostly a matter of whether appropriate fonts are installed, but generally I'd expect someone who writes Chinese to have Chinese fonts installed.
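
    The fallback to numeric entities can be sketched with Python's "xmlcharrefreplace" error handler, which stands in here for whatever tool writes the page. The sample text is arbitrary.

```python
text = "\u039b is capital lambda"   # starts with a character outside ASCII

# Characters the target charset cannot encode become &#N; references.
ascii_html = text.encode("ascii", errors="xmlcharrefreplace")
print(ascii_html)
```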

    Generally it's the GUI system's job to handle input and output of text, not an individual application's. Is it reasonable to expect browsers to ship a massive font full mostly of characters that most of its users will either find meaningless or have already? Is it reasonable to expect browsers to implement their own input methods in case the operating system's one is deficient? Is it reasonable to expect them to implement their own font rendering for the same reason?

    --
    note: i'm known as plugwash most places but i screwed up registering that here somehow in the past and now can't register
  52. Re:Stick to ASCII by jrumney · · Score: 1

    95

  53. Re:Alternative proposal: by snowgirl · · Score: 1

    ??? WTF are _YOU_ on about?

    Can you concisely explain why the English word "psyche" is pronounced the way it is to a non-native speaker of the language?

    The word being originally from Greek and pronounced /psyxe/ was transliterated and taken into English. English phonology does not allow for a word to start with /ps/, and so the rules change that to a /s/. English phonology does not allow for a /x/, and so the rules change that to a /k/. English phonology does not allow for a word to end with /e/, and so the rules change that to either a /ej/ or an /i/, but more commonly /i/ (e.g. Japanese "sake" is typically pronounced /saki/). All that is left is the /y/ which also cannot occur in English phonology, and thus the rules treat it as if it were orthographically an "i", and then apply phonological rules based on this. Since the "i" would be long (CVCe rule) it is pronounced /aj/.

    Thus, after a whole bunch of interference from English phonology /psyxe/ comes out as /sajki/.
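
    The chain of substitutions above can be sketched as ordered string rewrites. This is a toy in Python, assuming plain ASCII stand-ins for the phonemes and treating each "rule" as a regular expression; real phonology is not this tidy, and the names RULES and anglicize are made up for the example.

```python
import re

# Toy rewrite rules, applied in the order the explanation lists them.
RULES = [
    (r"^ps", "s"),  # no word-initial /ps/
    (r"x", "k"),    # no /x/
    (r"e$", "i"),   # no word-final /e/ (taking the more common outcome)
    (r"y", "aj"),   # /y/ read as a long "i", hence /aj/
]

def anglicize(ipa: str) -> str:
    """Apply each rewrite rule once, in order, to a rough phonemic string."""
    for pattern, replacement in RULES:
        ipa = re.sub(pattern, replacement, ipa)
    return ipa

print(anglicize("psyxe"))
print(anglicize("sake"))
```

    The same four rules also turn "sake" into "saki", which matches the Japanese example given above.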

    Oh, you wanted a concise answer: "Because English can't pronounce 'psyche' properly, and fucks it up." The same way "keyboard" becomes "kiiboodo" in Japanese, and "Merry Christmas" becomes "Mele Kalikimaka" in Hawai'ian.

    --
    WARNING! This girl exceeds the MAXIMUM SAFE standards established by the FDA for BRATTINESS
  54. Re:Alternative proposal: by snowgirl · · Score: 1

    Can you explain why any word in french is pronounced the way it is?
    It seems like they have different rules for what letters to pronounce for every word.

    Actually, French orthography guarantees that if you know how something is spelled, then you can pronounce it, but if you only know how it is pronounced, then you cannot know how to spell it.

    So, while it might be difficult for some people learning the language, it is at least consistent in spelling to pronunciation (unlike English).

    --
    WARNING! This girl exceeds the MAXIMUM SAFE standards established by the FDA for BRATTINESS
  55. Re:Alternative proposal: by snowgirl · · Score: 1

    It's like someone was trying to come up with the most perverted way to make a letter represent something as different as possible from what it does in most European languages

    No, seriously... go look at Irish Gaelic spelling. You will be amazed at how much more unpredictably the system works, and how incredibly variant the letters are from the way that they're used by everyone else.

    Playing Exalted, I get a head-explosion every time someone talks about "geis" as /giz/ rather than as /geS/... (I am at least understanding of their inability to pronounce /J\/ properly, as I don't think I have any experience producing it properly either.)

    --
    WARNING! This girl exceeds the MAXIMUM SAFE standards established by the FDA for BRATTINESS
  56. Re:Alternative proposal: by GrandTeddyBearOfDoom · · Score: 1

    The classic in English is the ough ending. Count how many different ways it can be pronounced. (Oooh, Uff, Ow, Oh, Uh... any more? And I have no explanation for a non-native speaker as to how that came about.)

    --
    -- The Grand Teddy Bear has Spoken: "Windows 8 Source Code Available NOW! more disgusting than your pr..."
  57. Re:Alternative proposal: by Drinking+Bleach · · Score: 1

    It's not always so clear. When you see "resume" typed out, for example, don't you ever stop and have to think if it means resume or résumé?

    Two quite different words, even pronounced differently, that can be muddled by dropping the accents in text.
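
    For what it's worth, the muddling is easy to reproduce. A small Python sketch, assuming "dropping the accents" means decomposing to NFD and discarding combining marks (a common approach, not the only one); strip_accents is a name invented here.

```python
import unicodedata

def strip_accents(word: str) -> str:
    """Decompose to NFD, then drop the combining marks (the accents)."""
    decomposed = unicodedata.normalize("NFD", word)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))

verb = "resume"
noun = "r\u00e9sum\u00e9"  # the accented spelling

print(verb == noun)                 # distinct strings while accents are kept
print(strip_accents(noun) == verb)  # indistinguishable once they are dropped
```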

  58. Re:Alternative proposal: by DragonWriter · · Score: 1

    Drop the accents, people will know what you mean...

    I think you missed the first part of the sentence:

    ASCII leaves off a lot of English punctuation, and accents that are, in fact, used in English

    Standardizing on ASCII, even accents aside, would be insufficient for English. There's some punctuation used in English in the high end of Latin-1 (outside of the low-end which is ASCII), and even more in the Unicode general punctuation range (2000-206F).
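
    A quick way to see where common English punctuation actually lives. This is just a Python sketch; the block boundaries are the ones mentioned above, the sample characters are my own picks, and block_of is a hypothetical helper.

```python
import unicodedata

def block_of(ch: str) -> str:
    """Classify a character by the three ranges discussed above."""
    cp = ord(ch)
    if cp < 0x80:
        return "ASCII"
    if cp <= 0xFF:
        return "Latin-1 high half"
    if 0x2000 <= cp <= 0x206F:
        return "General Punctuation"
    return "other"

# Straight quote, guillemets, curly quotes, ellipsis.
samples = ['"', "\u00ab", "\u00bb", "\u2018", "\u2019", "\u201c", "\u201d", "\u2026"]
for ch in samples:
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)}: {block_of(ch)}")
```

    Only the first of those samples survives a restriction to ASCII.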

  59. Re:Alternative proposal: by SuricouRaven · · Score: 1

    Such as? Is it that important to have your quote marks angled?

  60. Re:Alternative proposal: by DragonWriter · · Score: 1

    Such as?

    Well, you get one of the biggies on your own:

    Is it that important to have your quote marks angled?

    Sure, it greatly improves readability. Same thing with visual distinction between hyphens, various forms of dashes, and minus signs. There is a reason why professionally-published documents rarely restrict themselves to the subset of English punctuation supported by ASCII.
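
    The hyphen/dash/minus distinction is visible in the character names themselves. A short Python sketch, with the five code points chosen here for illustration:

```python
import unicodedata

# ASCII hyphen-minus, true hyphen, en dash, em dash, minus sign.
dash_like = ["\u002d", "\u2010", "\u2013", "\u2014", "\u2212"]

names = {f"U+{ord(ch):04X}": unicodedata.name(ch) for ch in dash_like}
for code_point, name in names.items():
    print(code_point, name)

# ASCII offers exactly one of these, and its name admits it does double duty.
ascii_only = [ch for ch in dash_like if ord(ch) < 0x80]
print(len(ascii_only))
```

    Note that the minus sign sits in the Mathematical Operators block rather than General Punctuation, so even the 2000-206F range doesn't cover everything a typesetter would reach for.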

  61. Re:Alternative proposal: by Anonymous Coward · · Score: 0

    The difference is that the Japanese and Hawaiians actually write (the kana for) "kiiboodo" or "Mele Kalikimaka" according to the phonology of their own languages.