Google Releases An Open Source Font That Supports 800 Languages (googleblog.com)

Posted by EditorDavid on Sunday October 9, 2016 @01:40PM from the Unicode-complete dept.

An anonymous Slashdot reader quotes Hot Hardware: [Google has] been working on the project for the past five years in collaboration with Monotype, in hopes of eradicating so-called "tofu" (the blank boxes you see when a PC or website can't display a particular character) from the web. Noto, or "No more tofu," is Google's answer, and it's available now to download...

"We are thrilled to have played such an important role in what has become one of the most significant type projects of all time," said Scott Landers, president and CEO of Monotype... Monotype played the biggest role, though Google also collaborated with Adobe and had a network of volunteer reviewers. As far as Monotype is concerned, Noto is one of the most expansive typography projects ever undertaken.
There are 110,000 characters, and Google says the project "required design and technical testing in hundreds of languages."

27 of 175 comments

"Now available to download" link by aneroid · 2016-10-09 13:59 · Score: 4, Informative

https://www.google.com/get/not... You're welcome
Came across this a few days ago when I borked my Slackware upgrade. Everything went fine except GUI login; X kept crashing because I deleted the fonts it was trying to use. One of the google search results was Noto.
All fonts = 472.6 MB.
1. Re:"Now available to download" link by aneroid · 2016-10-09 14:42 · Score: 4, Informative
  
  1. In the emoji fonts there's "Raised Hand With Part Between Middle And Ring Fingers". WhyTF is that not called "live long and prosper"? Some emoji are described by how they look, while others are described by what they mean. A bit inconsistent, but I guess that's more of a Unicode Consortium issue.
  2. Some of the hand emoji, like "White Left Pointing Backhand Index", are all called "white..." even though they've clearly done the race/skin-tone colour spectrum à la WhatsApp.
  2b. The colours come from a second code point, an emoji modifier, placed after the base emoji (together they form an emoji modifier sequence). The modifiers range from U+1F3FB (white/pale) to U+1F3FF (black/dark). (Btw, that's counterintuitive to programmers, since in RGB colour codes "#00" is dark and "#FF" is light.) P.S. I haven't decided whether the skin-colour aspect of emoji is racist or not. There may be some people who found the default yellow emoji racist.
  Answer to #2:
  
  Names of symbols such as BLACK MEDIUM SQUARE or WHITE MEDIUM SQUARE are not meant to indicate that the corresponding character must be presented in black or white, respectively; rather, the use of “black” and “white” in the names is generally just to contrast filled versus outline shapes, or a darker color fill versus a lighter color fill. Similarly, in other symbols such as the hands U+261A BLACK LEFT POINTING INDEX and U+261C WHITE LEFT POINTING INDEX, the words “white” and “black” also refer to outlined versus filled, and do not indicate skin color.
  and
  
  General-purpose emoji for people and body parts should also not be given overly specific images: the general recommendation is to be as neutral as possible regarding race, ethnicity, and gender. Thus for the character U+1F477 CONSTRUCTION WORKER, the recommendation is to use a neutral graphic (with an orange skin tone) instead of an overly specific image (with a light skin tone). This includes the emoji modifier base characters listed in Sample Emoji Modifier Bases. The emoji modifiers allow for variations in skin tone to be expressed.
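
The character names and skin-tone modifiers discussed above can be checked with Python's standard `unicodedata` module. A skin-tone variant is simply a base emoji followed by one of the five Fitzpatrick modifiers U+1F3FB to U+1F3FF. A minimal sketch; the `with_skin_tone` helper is hypothetical and only for illustration:

```python
import unicodedata

# Code points mentioned in the thread: the "Vulcan salute" hand, the
# backhand index, and the two older pointing-index symbols.
for cp in (0x1F596, 0x1F448, 0x261A, 0x261C):
    print(f"U+{cp:04X}", unicodedata.name(chr(cp)))

# The five Fitzpatrick skin-tone modifiers, lightest (U+1F3FB) to darkest (U+1F3FF).
MODIFIERS = [chr(cp) for cp in range(0x1F3FB, 0x1F400)]

def with_skin_tone(base: str, level: int) -> str:
    """Build an emoji modifier sequence: the base emoji followed by a modifier.

    level 0 selects U+1F3FB (lightest); level 4 selects U+1F3FF (darkest).
    """
    return base + MODIFIERS[level]

waving = with_skin_tone("\U0001F44B", 2)  # WAVING HAND SIGN + medium modifier
print([f"U+{ord(c):04X}" for c in waving])
```

A font that doesn't support a given sequence typically falls back to drawing the base emoji followed by a separate colour swatch.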
2. Re:"Now available to download" link by Qzukk · 2016-10-09 14:55 · Score: 4, Informative
  
  Way back when Unicode decided to unify all the CJK glyphs, they made several screwups by unifying characters that were not actually the same in each of the languages. Aside from the character looking wrong in Chinese or Japanese (whichever language you don't have installed as default), they may sort differently in different languages, so collation is wrong too. More information (note that you'll need a full CJK font and a browser supporting language selection to see the differences).
  Noto's solution was to create a font with every possible glyph. Then, for systems that can't pick the correct glyph based on language, they made versions of the fonts where the default characters are the Japanese versions, the Chinese versions, and so on; for embedded stuff they made versions of the fonts with just one language's characters. Noto's explanation of their CJK fonts. In other words, you only need one of the 110MB font files.
  
  --
  If I have been able to see further than others, it is because I bought a pair of binoculars.
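
The per-language selection described above can be sketched as a lookup from a BCP 47 language tag to a region-specific Noto CJK family, truncating the tag from the right in the spirit of RFC 4647 "lookup". The family names (JP, KR, SC, TC) and the `pick_cjk_family` helper are assumptions for illustration; check the names in the actual Noto CJK release before relying on them:

```python
# Hypothetical mapping from language tags to region-specific Noto CJK families.
# The same code point (e.g. U+76F4) is drawn differently in each family, so the
# language of the text, not the character itself, decides which one to use.
NOTO_CJK_BY_LANG = {
    "ja": "Noto Sans CJK JP",
    "ko": "Noto Sans CJK KR",
    "zh-hans": "Noto Sans CJK SC",
    "zh-cn": "Noto Sans CJK SC",
    "zh-hant": "Noto Sans CJK TC",
    "zh-tw": "Noto Sans CJK TC",
}

def pick_cjk_family(lang_tag: str, default: str = "Noto Sans CJK SC") -> str:
    """Return the family for a language tag, trying shorter prefixes in turn."""
    parts = lang_tag.lower().split("-")
    # Drop subtags from the right: "zh-hant-hk" -> "zh-hant" -> "zh".
    while parts:
        candidate = "-".join(parts)
        if candidate in NOTO_CJK_BY_LANG:
            return NOTO_CJK_BY_LANG[candidate]
        parts.pop()
    return default

print(pick_cjk_family("ja-JP"))
print(pick_cjk_family("zh-Hant-HK"))
```
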
3. Re:"Now available to download" link by ptaff · 2016-10-09 15:32 · Score: 3, Insightful
  
  Google Web Fonts is still the way to go.
  And helps Google track users one more way. Please be a good hacker and serve fonts from your own domain. Thank you.
4. Re:"Now available to download" link by Travis+Mansbridge · 2016-10-09 15:48 · Score: 2
  
  In HTML5 you can serve fonts, so it's just a matter of including Noto on sites where tofu might be a problem.
5. Re:"Now available to download" link by _merlin · 2016-10-09 16:57 · Score: 3, Interesting
  
  Yeah, but it's like "90% of people use 10% of features" - everyone uses a different 10%, so 100% of features are used. Similarly, everyone needs a different combination of languages, so if you're going to use one family of fonts, you want to have massive coverage.
6. Re: "Now available to download" link by TheRaven64 · 2016-10-09 20:20 · Score: 4, Insightful
  
  It's not always laziness (or tracking, from Google's perspective). Google sets a long cache lifetime for most of these resources. If 10 different sites all host them individually, then someone visiting all 10 sites will have to download the fonts 10 times. If they all point to Google instead, the visitor downloads them once and uses the cached copy on the other 9 sites.
  There was a proposal a couple of years ago to embed a cryptographic hash of the resource in the link. This would allow you to specify a download location, but if you've already downloaded the file from another source then you could still use it (it would also make caches more efficient, because you could set an infinite timeout and make clients redownload by having a different hash in the link - clients would keep their copy potentially forever, until you updated the version). I don't know of any browsers that implemented it though.
  
  --
  I am TheRaven on Soylent News
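
The hash-in-the-link idea overlaps with the W3C Subresource Integrity (SRI) `integrity` attribute, whose value is an algorithm name plus a base64-encoded digest of the file. SRI applies to scripts and stylesheets and only verifies what was fetched; it does not by itself let a browser reuse a copy cached from another site, which is the part the comment says no browser implemented. A minimal sketch of computing such a value (the `sri_digest` helper and the placeholder bytes are hypothetical):

```python
import base64
import hashlib

def sri_digest(resource: bytes, algorithm: str = "sha384") -> str:
    """Return an SRI-style value such as 'sha384-<base64 digest>'."""
    digest = hashlib.new(algorithm, resource).digest()
    return f"{algorithm}-{base64.b64encode(digest).decode('ascii')}"

# Placeholder content; in practice, read the exact bytes the server delivers.
resource_bytes = b"placeholder for the contents of a script or stylesheet"
print(sri_digest(resource_bytes))
```
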
7. Re:"Now available to download" link by Anonymous Coward · 2016-10-09 20:59 · Score: 2, Informative
  
  German and Swedish might be a better example.
  They both have ö and ä, but German orders ö like o and ä like a, while Swedish puts them after z.
  And those very much ARE the same characters.
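
The German/Swedish difference can be shown with two simplified sort keys: Swedish treats å, ä and ö as separate letters after z, while German dictionary order (DIN 5007 variant 1) sorts ä as a and ö as o. A minimal sketch assuming lowercase input; these key functions are hypothetical, and real locale-aware sorting should use a collation library such as ICU:

```python
# Simplified collation keys for lowercase words only.
SWEDISH_ALPHABET = "abcdefghijklmnopqrstuvwxyzåäö"
GERMAN_FOLDS = {"ä": "a", "ö": "o", "ü": "u", "ß": "ss"}

def swedish_key(word: str) -> list[int]:
    # å, ä and ö are letters of their own, ranked after z.
    return [SWEDISH_ALPHABET.index(c) for c in word]

def german_key(word: str) -> tuple[str, str]:
    # Fold umlauts to their base letter; break ties with the original word.
    folded = "".join(GERMAN_FOLDS.get(c, c) for c in word)
    return (folded, word)

words = ["zebra", "äpple", "apa", "öl", "ost"]
print(sorted(words, key=swedish_key))  # ['apa', 'ost', 'zebra', 'äpple', 'öl']
print(sorted(words, key=german_key))   # ['apa', 'äpple', 'öl', 'ost', 'zebra']
```
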
Re: Keeping up with the emojis by Anonymous Coward · 2016-10-09 14:30 · Score: 2, Funny

I just need the Klingon word for mocking condescension to belittle you with.
Re:Keeping up with the emojis by dmoen · 2016-10-09 14:33 · Score: 5, Informative

Bitstream Cyberbit was closed source, and had a license incompatible with GPL. Noto is free and open source. The source files for the fonts, and the build tools, are all open.
Noto is an ongoing open source project that will continue to track the Unicode standard, while Cyberbit implemented Unicode 1.0.1 and then just stopped.
Noto has Sans and Serif variants in a range of weights and styles, unlike Cyberbit, which had only a single style and weight (serif).
So that's more than just "the same thing all over again".

--
I have written a truly remarkable program which this sig is too small to contain.
Re:Keeping up with the emojis by Anonymous Coward · 2016-10-09 15:14 · Score: 4, Interesting

Hate to say it but I consider the conversion of all emojis to tofu a feature, not a bug. The tofu neatly summarises the vacuousness of the original abomination... I mean, message.
This should have been put together by Unicode by complete+loony · 2016-10-09 15:44 · Score: 5, Insightful

The Unicode consortium should have published glyphs like these as part of the effort of defining the standard.
Why did it take a separate private company to do this?

--
09F91102 no, 455FE104 nope, F190A1E8 uh-uh, 7A5F8A09 that's not it, C87294CE no. Ah! 452F6E403CDF10714E41DFAA257D313F.
1. Re:This should have been put together by Unicode by speedplane · 2016-10-09 18:18 · Score: 2
  
  The Unicode consortium should have published glyphs like these as part of the effort of defining the standard.
  Why did it take a separate private company to do this?
  Probably because building a consortium to even define the characters is hard enough and expensive. Getting buy-in from everyone in the consortium to develop high-quality glyphs for orphan languages would have reduced overall support. I agree they should have, but I don't think most companies are as generous as Google.
  
  --
  Fast Federal Court and I.T.C. updates
2. Re:This should have been put together by Unicode by TheRaven64 · 2016-10-09 21:26 · Score: 2
  
  The entire point of Unicode is that the glyphs are separate from the code points. The code points (defined by the Unicode spec) convey semantics, not presentation. There are lots of different (valid) ways of representing each code point (if there weren't, then you wouldn't need fonts at all).
  Then along came emojis and the entire clusterfuck that led to.
  
  --
  I am TheRaven on Soylent News
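
One concrete place where emoji blur the code point/presentation split: some characters, such as U+2764 HEAVY BLACK HEART, have both a text and an emoji form, and the text itself can request one via variation selectors (U+FE0E for text style, U+FE0F for emoji style). A short sketch; whether the request is honoured is up to the font and renderer:

```python
import unicodedata

HEART = "\u2764"                # a character with both text and emoji forms
TEXT_STYLE = HEART + "\ufe0e"   # VARIATION SELECTOR-15: request text presentation
EMOJI_STYLE = HEART + "\ufe0f"  # VARIATION SELECTOR-16: request emoji presentation

for s in (TEXT_STYLE, EMOJI_STYLE):
    print([f"U+{ord(c):04X} {unicodedata.name(c)}" for c in s])
```

Both strings carry the same heart code point; only the trailing selector, a presentation hint stored in the text, differs.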
No programmers' typeface by tdelaney · 2016-10-09 15:51 · Score: 4, Insightful

They have a monospaced typeface, but it's not usable for programming: it doesn't even have a significant distinction between zero and O, let alone any other programmer-friendly features.
Since I presume they're going to want people at Google to use Noto as standard, it seems sensible to me that they create a programmers' version.
1. Re:No programmers' typeface by Anonymous Coward · 2016-10-09 16:32 · Score: 2, Insightful
  
  I don't see why distinguishing between the zero digit and the letter O is more important for programmers than for anyone else. Sure, programmers might make mistakes when writing code and want to fix them; but that's true for other people writing text that might contain digits and letters, too.
  If anything, distinguishing between the characters is less important for programmers than for other people, because programmers will already notice the problem when their code won't compile. I think it very probable that not distinguishing the zero digit and the letter O was a deliberate design decision, and I doubt distinguishing between letters is as important as programmers seem to think it is.
2. Re:No programmers' typeface by Hypoon · 2016-10-09 17:43 · Score: 4, Insightful
  
  ...because programmers will already notice the problem when their code won't compile.
  Substitutions of the letter 'O' for the number zero in numeric literals, function names, variable names, and other similar constructs will usually generate syntax errors, yes. (This makes me want to create a library called "Input0utput", just for headaches.)
  However, the compiler probably won't notice if you make the substitution within a string or character literal (if the user types "Outbound", but the software is expecting "0utbound", this might be a hard problem to debug). I've only done this once or twice, but it was infuriating. It's one of those few times when commenting out the line and retyping it verbatim will actually fix the problem.
  The fact that the keys are adjacent on QWERTY keyboards doesn't help anything.
  
  ...but that's true for other people writing text that might contain digits and letters, too.
  I misunderstood this at first. I was picturing something like, "Mr. Orville's appointment is at 1O:OO.", where the substitution is harmless, so I didn't understand. In something like a model number, "MSO001" might be the first (001) release of a Mixed Signal Oscilloscope (MSO). Writing it as "MSOOO1" definitely obfuscates the meaning behind the model number. Of course, "MSO-001" would probably be best, but it's preferable to match the label on the hardware itself. So yes, I see your point.
  But no, I'm firmly of the belief that the average programmer has a greater need (than the average typist) for easily distinguishable characters.
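
The string-literal case above is hard to spot by eye but easy to flag mechanically. A minimal sketch of a hypothetical review heuristic that lists tokens mixing the digit 0 with letters; it will also flag legitimate identifiers like the "MSO001" model number, so treat hits as prompts to look, not as errors:

```python
import re

# Runs of ASCII letters and digits, e.g. identifiers, words, model numbers.
TOKEN = re.compile(r"[A-Za-z0-9]+")

def suspicious_tokens(text: str) -> list[str]:
    """Return tokens that contain the digit 0 alongside at least one letter."""
    return [
        tok for tok in TOKEN.findall(text)
        if "0" in tok and any(ch.isalpha() for ch in tok)
    ]

print(suspicious_tokens('if mode == "0utbound": route("MSO001")'))
```
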
3. Re:No programmers' typeface by Nethead · 2016-10-09 19:01 · Score: 3, Insightful
  
  Where I find the problem is in randomly generated passwords. I have a large spreadsheet of VPN passwords for users at work, and I had to change the password column to an OCR font just to make sure I was giving out the correct code.
  The original C64 had this issue, which was worse on the SX64 with its 5" screen. I went as far as to design a custom font and burn it into the font EPROM.
  
  --
  -- I have a private email server in my basement.
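
For the password case, another approach besides switching the display font is to never generate look-alike characters in the first place. A minimal sketch using Python's `secrets` module; the exact set of characters treated as ambiguous is an assumption, so adjust it for your fonts:

```python
import secrets
import string

# Assumed look-alike characters: 0/O/o, 1/l/I, 5/S, 2/Z.
AMBIGUOUS = set("0Oo1lI5S2Z")
ALPHABET = "".join(
    c for c in string.ascii_letters + string.digits if c not in AMBIGUOUS
)

def unambiguous_password(length: int = 12) -> str:
    """Generate a random password drawn only from non-confusable characters."""
    return "".join(secrets.choice(ALPHABET) for _ in range(length))

print(unambiguous_password())
```

Dropping 10 of the 62 characters lowers the entropy per character slightly (about 5.7 bits instead of about 5.95), so an extra character of length more than makes up the difference.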
4. Re:No programmers' typeface by UberVegeta · 2016-10-09 20:38 · Score: 2
  
  00 1 - oh oh one. (Don't know why we don't say double oh but I've never heard it said that way.)
  You mean in the same way that nobody says "double oh seven?"
  
  --
  I knew I needed to stop reading Slashdot and finish my PhD when I started to miss articles by Bennett Haselton.
Re:Keeping up with the emojis by DraconPern · 2016-10-09 16:33 · Score: 2, Insightful

I think it's more that this is all the glyphs in one font, whereas before you had Chinese, Arabic, etc. all in separate fonts. The other half of the problem Google had was that they didn't have good font rendering in Android, i.e. how you actually render the font. Microsoft, Apple, and Adobe had it figured out a long time ago and all that knowledge is part of the OS. So Google is basically just playing catch-up and open sourcing the data part. Also... do we really want to load that large a font when most people only use a fraction of the data?
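
On the size question: a web page does not have to download a whole font. CSS `@font-face` accepts a `unicode-range` descriptor, and browsers generally fetch that file only if the page actually uses characters in the range. A minimal sketch that derives the range from a sample of text; the family name and file URL are hypothetical, and the font file itself would still need to be subset with a tool such as fontTools' `pyftsubset`:

```python
def unicode_ranges(text: str) -> list[tuple[int, int]]:
    """Collapse the code points used in `text` into sorted, contiguous ranges."""
    ranges: list[tuple[int, int]] = []
    for cp in sorted({ord(ch) for ch in text}):
        if ranges and cp == ranges[-1][1] + 1:
            ranges[-1] = (ranges[-1][0], cp)  # extend the current run
        else:
            ranges.append((cp, cp))           # start a new run
    return ranges

def font_face_rule(family: str, url: str, text: str) -> str:
    """Build a CSS @font-face rule whose unicode-range covers only `text`."""
    parts = [
        f"U+{lo:X}" if lo == hi else f"U+{lo:X}-{hi:X}"
        for lo, hi in unicode_ranges(text)
    ]
    return (
        "@font-face {\n"
        f"  font-family: '{family}';\n"
        f"  src: url('{url}') format('woff2');\n"
        f"  unicode-range: {', '.join(parts)};\n"
        "}"
    )

print(font_face_rule("Noto Sans", "/fonts/noto-sans-subset.woff2", "abc xyz"))
```
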
Repairing the Unicode Consortium Clusterfuck by Anonymous Coward · 2016-10-09 17:32 · Score: 5, Interesting

Thank you Google! This is badly needed because the Unicode Consortium screwed up Asian language support badly. The problem started when a bunch of Silicon Valley WASPs got together and formed the Unicode Consortium. Their experts were a joke. They had a foreign language expert who by his own admission couldn't speak the language he was supposedly expert in.

Then, without consulting Asian language speakers, they decided to combine all the Asian language characters, including those that were physically different. The result was like some elitist looking at the Greek and Roman alphabets and deciding 'a' is a lot like alpha and 'b' a lot like beta, so why not combine the two of them into a single alphabet, then tell you your name isn't Sam, it's "S". (Slashdot probably won't display this but you get the idea.) This affected East, Central and Southeast Asian languages.

This created the absurd situation where some people couldn't even write their names or enter them into databases, prompting the famous "I Can Text You A Pile of Poo, But I Can't Write My Name" https://modelviewculture.com/p...

When it was pointed out, did the Unicode Consortium admit they fucked up and fix it? Nope. They dug in their heels and insisted each country produce their own font which would display each Unicode character differently to suit their own language. Given the original goals of Unicode this was an amazing backflip. https://en.wikipedia.org/wiki/... https://books.google.com/books... https://plus.google.com/+LizHa... There are other problems too: the encoding the consortium expected makes Asian text use more space than the legacy standards it was supposed to replace. This was stupid since ASCII was already super efficient for English, so what was the point?
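
The space claim depends on which encoding is meant. For common Chinese characters, Big5 and UTF-16 both use two bytes each, while UTF-8 uses three. A quick check with Python's built-in codecs (the sample string is chosen for illustration):

```python
# Bytes needed for the same short Chinese string ("Chinese") in a legacy
# double-byte encoding and in two Unicode encoding forms.
text = "中文"

for encoding in ("big5", "utf-16-le", "utf-8"):
    print(f"{encoding:>9}: {len(text.encode(encoding))} bytes")
```

Characters outside the Basic Multilingual Plane cost four bytes in both UTF-16 and UTF-8.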

If you only write English language software and ASCII is good enough, you won't notice any of this, but if you have to write international software it's a nightmare. Yes, you might think adding Unicode support allows your app to run in any language, but it doesn't work like that because of this clusterfuck. You still have to provide different fonts for different countries, and you often have to provide fallback support for the old codepages (the various Big5 variants) that Unicode was supposed to replace. It also makes translation very hard.

But Unicode fixed it eventually? Nope. The Unicode consortium continued to ignore it to this very day and instead started churning out stupid emoji: a steaming pile of poo, a taco, and farcical 'equality' emoticons. https://www.theguardian.com/te... https://www.theguardian.com/ar...

I hope this new font gives us one font which can display all languages and fuck the Unicode Consortium
1. Re:Repairing the Unicode Consortium Clusterfuck by KozmoStevnNaut · 2016-10-09 21:00 · Score: 2
  
  I've been using the Noto font(s) for a while, they're installed by default in Linux Mint (probably Ubuntu and others, too), so I assume this is an incremental release, where they've finally achieved some semblance of full(ish) coverage.
  While I have a couple of minor issues with the font's design (the lowercase 'm' and the 0/O distinction in Noto Mono are atrocious), the font is quite nice on the whole. And while I will never personally use all of the myriad different scripts included, I wholeheartedly applaud the effort taken to produce a font family that finally covers East Asian languages in a sensible way. I have many colleagues from India (specifically Bengal) and China. It has been a real shitshow for them how the Unicode Consortium first completely neglected and then mishandled their languages.
  We can blame Google for a great many things, but Noto is one thing they definitely got right, and I hope they continue to evolve and refine it, perhaps fix the small font design annoyances, even though they're relatively minor for what is an absolutely huge project.
  
  --
  Eat the rich.
2. Re:Repairing the Unicode Consortium Clusterfuck by AmiMoJo · 2016-10-09 21:44 · Score: 4, Informative
  
  It's even worse than that. On many systems, e.g. Windows, wchar_t is defined as 16 bits, meaning it can only ever support the Unicode Basic Multilingual Plane without hacks. Since a lot of the fixed CJK characters are outside this plane, software that uses wchar_t usually doesn't support them. Some of this is baked into hardware, for example Unicode uses UTF16,
  I'm seriously thinking about writing an open source library to support TRON encoding. The lack of a good alternative seems to be what is preventing Unicode from being deprecated in favour of something better.
  
  --
  const int one = 65536; (Silvermoon, Texture.cs)
  SJW, n: "Someone I don't like, and by the way I'm a fuckwit" - AC
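
To make the 16-bit point concrete: a character outside the Basic Multilingual Plane does not fit in one 16-bit unit, so UTF-16 splits it into a surrogate pair, and code that assumes one 16-bit wide character (`wchar_t` on Windows) per character miscounts or truncates it. A minimal sketch of the standard UTF-16 arithmetic; the example character is chosen for illustration:

```python
def utf16_surrogates(cp: int) -> tuple[int, int]:
    """Split a supplementary-plane code point into a UTF-16 surrogate pair."""
    if not 0x10000 <= cp <= 0x10FFFF:
        raise ValueError("only code points above U+FFFF need surrogates")
    offset = cp - 0x10000            # 20-bit value
    high = 0xD800 + (offset >> 10)   # top 10 bits
    low = 0xDC00 + (offset & 0x3FF)  # bottom 10 bits
    return high, low

# U+20BB7 is in CJK Unified Ideographs Extension B (plane 2).
hi, lo = utf16_surrogates(0x20BB7)
print(f"U+20BB7 -> {hi:04X} {lo:04X}")
# Python's codec agrees: two 16-bit code units, four bytes.
print("\U00020BB7".encode("utf-16-be").hex())
```
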
Re: Keeping up with the emojis by Guppy · 2016-10-09 19:36 · Score: 4, Funny

toDSaH
Wow, Klingons have a word for everything. They're like Space Germans.
Re: Keeping up with the emojis by Ash-Fox · 2016-10-09 19:58 · Score: 2

You just wrote it in English though, your point is invalid.

--
Change is certain; progress is not obligatory.
hells teeth by johnjones · 2016-10-09 22:12 · Score: 3, Interesting

Honestly,
where are the mathematical fonts and symbols for science?
STIX goes some way, but why is this not in Noto?
Why would you send a mathematical explanation into the stars when we can't express those notations on the machines we use every day?
thanks
John Jones
Re:Keeping up with the emojis by AmiMoJo · 2016-10-09 22:58 · Score: 3, Informative

There are still multiple font files for different languages, because you can't have a unified "all language" font with Unicode. It's impossible to support Chinese, Japanese and Korean in the same font, for example.
Android's font rendering is excellent, has been for years. It also helps that many Android phones, even mid range ones from a few years back, have 1080p or better displays that start to rival print for DPI (400-500 PPI on the screen, 3x that horizontally with sub-pixel rendering, vs. 600 DPI for prints).
Google just want consistency everywhere and the ability to ship one font that covers all possible languages. You still need hacks because of the Unicode flaw mentioned above, but it's a big step nonetheless. AFAIK the only other open source font that tries to do this is GNU Unifont, but it's more functional than pretty.

--
const int one = 65536; (Silvermoon, Texture.cs)
SJW, n: "Someone I don't like, and by the way I'm a fuckwit" - AC