Slashdot Mirror


Google Releases An Open Source Font That Supports 800 Languages (googleblog.com)

An anonymous Slashdot reader quotes Hot Hardware: It's been working on the project over the past five years in collaboration with Monotype in hopes of eradicating so-called "tofu" -- the blank boxes you see when a PC or website can't display a particular text -- from the web. Noto, or No more tofu, is Google's answer, and it's available now to download...

"We are thrilled to have played such an important role in what has become one of the most significant type projects of all time," said Scott Landers, president and CEO of Monotype... Monotype played the biggest role, though Google also collaborated with Adobe and had a network of volunteer reviewers. As far as Monotype is concerned, Noto is one of the expansive typography projects ever undertaken.

There are 110,000 characters, and Google says the project "required design and technical testing in hundreds of languages."

7 of 175 comments

  1. "Now available to download" link by aneroid · · Score: 4, Informative

    https://www.google.com/get/not... You're welcome

    Came across this a few days ago when I borked my Slackware upgrade. Everything went fine except GUI login; X kept crashing because I deleted the fonts it was trying to use. One of the google search results was Noto.

    All fonts = 472.6 MB.

    1. Re:"Now available to download" link by aneroid · · Score: 4, Informative

      1. In the emoji fonts there's "Raised Hand With Part Between Middle And Ring Fingers" - WhyTF is that not called "live long and prosper"? Some emoji are described by how they look while others are described by what they mean. A bit inconsistent but I guess that's more of a Unicode Consortium issue.

      2. Some of the hand emoji like "White Left Pointing Backhand Index" are all called "white..." even though they've clearly done the race/skin tone colour spectrum à la WhatsApp.

      2b. The colours are a second Unicode code point (an emoji modifier, which together with the base forms an emoji modifier sequence) appended to the emoji, ranging from U+1F3FB (white/pale) to U+1F3FF (black/dark). (Btw, that's counterintuitive to programmers since RGB colour codes have "#00" being dark and "#FF" being light.) P.S. I haven't decided if the skin colour aspect of emoji is racist or not. There may be some people who found the default yellow emoji racist.
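To make point 2b concrete, here is a minimal sketch of building emoji modifier sequences, assuming Python 3.5 or later. The helper name `with_skin_tone` is made up for illustration, and U+270B RAISED HAND is just one base character picked as an example.

```python
import unicodedata

# The five emoji modifiers occupy the contiguous range U+1F3FB..U+1F3FF.
MODIFIERS = [chr(cp) for cp in range(0x1F3FB, 0x1F3FF + 1)]

# U+270B RAISED HAND, chosen here as an example base character.
RAISED_HAND = "\u270B"


def with_skin_tone(base, modifier):
    """Hypothetical helper: a modifier sequence is the base followed by the modifier."""
    return base + modifier


# One sequence per modifier, from lightest to darkest.
sequences = [with_skin_tone(RAISED_HAND, m) for m in MODIFIERS]

for m, seq in zip(MODIFIERS, sequences):
    # Each sequence is two code points, even if a font draws it as one glyph.
    print("U+%04X" % ord(m), unicodedata.name(m, "<unnamed>"), len(seq))
```

Whether a sequence shows up as a single tinted glyph or as a hand followed by a colour swatch depends on the font and renderer, not on the string itself.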

      Answer to #2:
       

      Names of symbols such as BLACK MEDIUM SQUARE or WHITE MEDIUM SQUARE are not meant to indicate that the corresponding character must be presented in black or white, respectively; rather, the use of “black” and “white” in the names is generally just to contrast filled versus outline shapes, or a darker color fill versus a lighter color fill. Similarly, in other symbols such as the hands U+261A BLACK LEFT POINTING INDEX and U+261C WHITE LEFT POINTING INDEX, the words “white” and “black” also refer to outlined versus filled, and do not indicate skin color.

      and

      General-purpose emoji for people and body parts should also not be given overly specific images: the general recommendation is to be as neutral as possible regarding race, ethnicity, and gender. Thus for the character U+1F477 CONSTRUCTION WORKER, the recommendation is to use a neutral graphic (with an orange skin tone) instead of an overly specific image (with a light skin tone). This includes the emoji modifier base characters listed in Sample Emoji Modifier Bases. The emoji modifiers allow for variations in skin tone to be expressed.
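The character names discussed in this comment can be checked directly against the Unicode character database. A small sketch, assuming Python 3.5 or later (whose bundled `unicodedata` tables include these code points); the `samples` list is just my selection of the characters mentioned above.

```python
import unicodedata

# Code points mentioned in the thread: squares, pointing hands, and the
# hand sign from point 1.
samples = [
    "\u25FC",      # filled square
    "\u25FB",      # outlined square
    "\u261A",      # filled pointing hand
    "\u261C",      # outlined pointing hand
    "\U0001F448",  # backhand index emoji
    "\U0001F596",  # the "live long and prosper" hand
]

# Map "U+XXXX" labels to the formal Unicode character names.
names = {"U+%04X" % ord(c): unicodedata.name(c) for c in samples}

for label, name in names.items():
    print(label, name)
```

The names are fixed identifiers describing the original glyph shape (filled versus outlined), which is why "BLACK" and "WHITE" show up on characters that have nothing to do with skin tone.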

    2. Re:"Now available to download" link by Qzukk · · Score: 4, Informative

      Way back when Unicode decided to unify all the CJK glyphs they made several screwups in unifying characters that were not actually the same in each of the languages. Aside from the character looking wrong in Chinese or Japanese (whichever language you don't have installed as default) they may sort differently in different languages so collation is wrong too. More information (note that you'll need a full CJK font and a browser supporting language selection to see the differences).

      Noto's solution was to create a font with every possible glyph, then for systems which can't support identifying the correct glyph based on language, they made versions of the fonts where the default characters are the Japanese versions or the Chinese versions or so on, then for embedded stuff they made versions of the fonts with just one language's characters. Noto's explanation of their CJK fonts. In other words, you only need one of the 110MB font files.

      --
      If I have been able to see further than others, it is because I bought a pair of binoculars.
    3. Re:"Now available to download" link by Anonymous Coward · · Score: 2, Informative

      German and Swedish might be a better example.
      They both have ö and ä, but German orders ö like o and ä like a, while Swedish puts them after z.
      And those very much ARE the same characters.
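A rough illustration of the point: the same strings sort differently depending on which language's rules you apply. This is a simplified sketch in Python, not a real collation implementation. The key functions `german_key` and `swedish_key` are hypothetical, the German rule assumed here is dictionary-style ordering (umlauts sort as their base vowels), and the word list is made up.

```python
def german_key(word):
    # Assumption: dictionary-style German ordering, where ä/ö/ü sort like a/o/u.
    table = str.maketrans({"ä": "a", "ö": "o", "ü": "u", "ß": "ss"})
    return word.lower().translate(table)


# Swedish treats å, ä, ö as separate letters placed after z.
SWEDISH_ALPHABET = "abcdefghijklmnopqrstuvwxyzåäö"
SWEDISH_RANK = {ch: i for i, ch in enumerate(SWEDISH_ALPHABET)}


def swedish_key(word):
    # Unknown characters sort after everything in the alphabet.
    return tuple(SWEDISH_RANK.get(ch, len(SWEDISH_ALPHABET)) for ch in word.lower())


words = ["zon", "öl", "ost", "äpple", "apa"]

german_order = sorted(words, key=german_key)
swedish_order = sorted(words, key=swedish_key)

print(german_order)   # ['apa', 'äpple', 'öl', 'ost', 'zon']
print(swedish_order)  # ['apa', 'ost', 'zon', 'äpple', 'öl']
```

Real software would use a locale-aware collator rather than hand-written keys; the sketch only shows that ordering is a property of the language, not of the code points.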

  2. Re:Keeping up with the emojis by dmoen · · Score: 5, Informative

    Bitstream Cyberbit was closed source, and had a license incompatible with GPL. Noto is free and open source. The source files for the fonts, and the build tools, are all open.

    Noto is an ongoing open source project that will continue to track the Unicode standard, while Cyberbit implemented Unicode 1.0.1 and then just stopped.

    Noto has Sans and Serif variants in a range of weights and styles, unlike Cyberbit, which had only a single style and weight (serif).

    So that's more than just "the same thing all over again".

    --
    I have written a truly remarkable program which this sig is too small to contain.
  3. Re:Repairing the Unicode Consortium Clusterfuck by AmiMoJo · · Score: 4, Informative

    It's even worse than that. On many systems, e.g. Windows, wchar_t is defined as 16 bits, meaning it can only ever support the Unicode Basic Multilingual Plane without hacks. Since a lot of the fixed CJK characters are outside this plane, software that uses wchar_t usually doesn't support them. Some of this is baked into hardware, for example Unicode uses UTF-16,
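To show what "outside this plane" means for a 16-bit character type, here is a short sketch in Python. The function names are mine, and the two sample characters (U+4E2D inside the BMP, U+20BB7 in CJK Extension B) are just examples.

```python
def utf16_code_units(ch):
    """Return the 16-bit code units a string occupies when encoded as UTF-16."""
    data = ch.encode("utf-16-le")
    return [int.from_bytes(data[i:i + 2], "little") for i in range(0, len(data), 2)]


def surrogate_pair(cp):
    """Split a code point above U+FFFF into its high and low surrogates."""
    if cp < 0x10000:
        raise ValueError("code point is inside the BMP; no surrogates needed")
    offset = cp - 0x10000
    return 0xD800 + (offset >> 10), 0xDC00 + (offset & 0x3FF)


bmp_char = "\u4E2D"      # inside the Basic Multilingual Plane
ext_char = "\U00020BB7"  # CJK Extension B, outside the BMP

print([hex(u) for u in utf16_code_units(bmp_char)])  # one 16-bit unit
print([hex(u) for u in utf16_code_units(ext_char)])  # two 16-bit units
```

Code that assumes one 16-bit unit equals one character will count, index, or truncate the second string incorrectly, which is the failure mode described above.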

    I'm seriously thinking about writing an open source library to support TRON encoding. The lack of a good alternative seems to be what is preventing Unicode from being deprecated in favour of something better.

    --
    const int one = 65536; (Silvermoon, Texture.cs)
    SJW, n: "Someone I don't like, and by the way I'm a fuckwit" - AC
  4. Re:Keeping up with the emojis by AmiMoJo · · Score: 3, Informative

    There are still multiple font files for different languages, because you can't have a unified "all language" font with Unicode. It's impossible to support Chinese, Japanese and Korean in the same font, for example.

    Android's font rendering is excellent, has been for years. It also helps that many Android phones, even mid range ones from a few years back, have 1080p or better displays that start to rival print for DPI (400-500 PPI on the screen, 3x that horizontally with sub-pixel rendering, vs. 600 DPI for prints).
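As a back-of-the-envelope check of those PPI figures, a small Python sketch. The 5.0 inch diagonal is my assumption (the comment doesn't give a screen size), and `pixels_per_inch` is a hypothetical helper name.

```python
import math


def pixels_per_inch(width_px, height_px, diagonal_inches):
    """Pixel density: diagonal length in pixels divided by diagonal in inches."""
    return math.hypot(width_px, height_px) / diagonal_inches


# Assumed example: a 1080p panel with a 5.0 inch diagonal.
ppi = pixels_per_inch(1920, 1080, 5.0)

# With three sub-pixels per pixel, horizontal sub-pixel density is 3x the PPI.
subpixels_per_inch = 3 * ppi

print(round(ppi, 1), round(subpixels_per_inch, 1))
```

For that assumed panel the result is roughly 440 PPI, which falls in the 400-500 range quoted above.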

    Google just want consistency everywhere and the ability to ship one font that covers all possible languages. You still need hacks because of the Unicode flaw mentioned above, but it's a big step nonetheless. AFAIK the only other open source font that tries to do this is GNU Unifont, but it's more functional than pretty.

    --
    const int one = 65536; (Silvermoon, Texture.cs)
    SJW, n: "Someone I don't like, and by the way I'm a fuckwit" - AC