Gmail Now Rejects Emails With Misleading Combinations of Unicode Characters

FÜÇK ÿèàh by Anonymous Coward · 2014-08-12 09:32 · Score: 5, Funny

...

Re:FÜÇK ÿèàh by Wootery · 2014-08-13 01:41 · Score: 1

Could've sworn Slashdot had zero support for unicode characters.
(I appear to be unable to paste in a 'Trademark' symbol. What is this magic, AC?!)
Re:FÜÇK ÿèàh by Eunuchswear · 2014-08-13 05:40 · Score: 1

iso-8859-1
Slashdot supports the first 256 unicode characters (except some of C0 and C1)

--
Watch this Heartland Institute video
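
A minimal sketch of what "the first 256 Unicode characters" means in practice, assuming the site stores comments as ISO-8859-1 (Latin-1) as the parent says. The helper name `fits_latin1` is made up for illustration and is not anything Slashdot actually runs.

```python
def fits_latin1(text):
    """Return True if every character in text has a Latin-1 encoding."""
    try:
        text.encode("latin-1")
        return True
    except UnicodeEncodeError:
        return False

# Latin-1 maps each code point below 256 to the single byte of the same value.
same_value = all(chr(i).encode("latin-1") == bytes([i]) for i in range(256))

accented_ok = fits_latin1("F\u00dc\u00c7K \u00ff\u00e8\u00e0h")  # accented Latin letters
trademark_ok = fits_latin1("\u2122")                              # TRADE MARK SIGN, U+2122
```

The trademark sign sits at U+2122, well above 255, so it has no Latin-1 encoding. Windows code page 1252 does assign it a byte (0x99), which is one place where the two encodings differ.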

Good that this applies to from: and not the body by CRCulver · 2014-08-12 09:33 · Score: 4, Interesting

...of the e-mail. Any attempt to block spam or phishing on the basis of mixing character sets would have to confront the fact that some people do need to mix character sets. Typically representations of Mari in the Latin alphabet, for example, also make use of the Greek letters beta and eta. In fact, eta is used in Latin representations of several minority languages of Russia. And the Reddit crowd loves making weird smilies in their English-language writing by means of symbols drawn from Indian scripts.

Re:Good that this applies to from: and not the bod by Russ1642 · 2014-08-12 09:36 · Score: 3, Funny

If this spells death to those ridiculous smilies then it's ok with me.

Re:Good that this applies to from: and not the bod by mi · 2014-08-12 09:37 · Score: 3, Interesting

I routinely substitute Cyrillic letters for Latin on Disqus and other forums to get around their filters (which block for more than mere "profanity").

Slashdot does not allow non-ASCII characters — although it does not attempt to screen out profanity either.

--
In Soviet Washington the swamp drains you.

Homoglyph protection at last, sort of. by Animats · 2014-08-12 09:38 · Score: 1

OK, good. Now if ICANN applied that tougher standard to domain name registrars, we'd make progress. But no, ICANN still allows registrars to register domain names without forcing them to comply with the most restrictive profile.

all of them then? by hurfy · 2014-08-12 09:38 · Score: 1

This looks like fun, I probably wouldn't catch that bank example and family certainly wouldn't. Looks like pretty much any word could substitute one letter.

No idea exactly what these "combinations" are. The example used one letter substitution. Using this example and the little display of new letters there would appear to be billions of potentially misleading combinations.

Re:all of them then? by TheGavster · 2014-08-12 09:58 · Score: 2

The "restrictive profile" that Google is using for the filtering is defined in Unicode as any combination of the Latin character set with another set or sets, with the exception of very specific combinations (selected legitimate combinations of Asian sets that contain radically different letter forms and thus are unlikely to cause confusion).
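
A rough sketch of the "Latin mixed with another script" test described above. Python's standard library does not expose the Unicode Script property, so this approximates a character's script by the first word of its Unicode name; that shortcut, and the function names, are assumptions for illustration and not how Gmail implements it.

```python
import unicodedata

def scripts_used(text):
    """Approximate the set of scripts in text from Unicode character names."""
    scripts = set()
    for ch in text:
        if ch.isalpha():
            # e.g. "CYRILLIC SMALL LETTER A" -> "CYRILLIC"
            scripts.add(unicodedata.name(ch, "UNKNOWN").split(" ")[0])
    return scripts

def looks_mixed(label):
    """True if the label draws letters from more than one script."""
    return len(scripts_used(label)) > 1

spoofed = "myb\u0430nk"   # the "a" is CYRILLIC SMALL LETTER A (U+0430)
honest = "mybank"         # all Latin
```

Here `looks_mixed(spoofed)` is true and `looks_mixed(honest)` is false, even though the two strings render almost identically in many fonts.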

--
"Because Science" is one step from "Because old book". Try "Because of my experiment testing my falsifiable assertion".
Re:all of them then? by godrik · 2014-08-13 03:44 · Score: 1

I'd like to see the precise rules (but too lazy to RTFA now). There are many non-English words that can be highly confusing. In French "telephone" is "tÃ©lÃ©phone", which could be thought of as a way to trick users. Also, Turkish has a dotless i; I would not be surprised if it appears in words with similar spelling in English.
Re:all of them then? by Eunuchswear · 2014-08-13 05:44 · Score: 1

ITYM téléphone.

--
Watch this Heartland Institute video
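
The garbled spelling two comments up is what you get when UTF-8 bytes are read back as ISO-8859-1. A small sketch reproducing it and undoing it, on the assumption that this is the only mis-decoding involved:

```python
original = "t\u00e9l\u00e9phone"   # "telephone" with two e-acute characters

# Encode as UTF-8, then wrongly decode those bytes as Latin-1.
garbled = original.encode("utf-8").decode("latin-1")

# Reverse the mistake: back to the raw bytes, then decode them as UTF-8.
repaired = garbled.encode("latin-1").decode("utf-8")
```

Each e-acute becomes two bytes in UTF-8 (0xC3 0xA9), which Latin-1 shows as two separate characters. That is why the word grows by two characters when garbled.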

Sounds bad by Anonymous Coward · 2014-08-12 09:38 · Score: 2, Insightful

If I start a business with a unicode domain, and if later a scammer registers an ascii domain that is similar looking, then Gmail will blackhole my business, not the scammer, because I'm the one using unicode.

Re:Sounds bad by Anonymous Coward · 2014-08-12 09:45 · Score: 0

I was also thinking about private people buying look-alike domains because "their" name was taken. especially for people wanting their own name or cool nick as email address.
Re:Sounds bad by Immerman · 2014-08-12 10:28 · Score: 1

Probably a bad idea - what exactly is the legitimate point of having a cool web address if *everyone* will *always* mistype it and go to the original site instead?

--
--- Most topics have many sides worth arguing, allow me to take one opposite you.
Re:Sounds bad by Anonymous Coward · 2014-08-12 13:42 · Score: 0

Probably a bad idea - what exactly is the legitimate point of having a cool web address if *everyone* will *always* mistype it and go to the original site instead?
no, it's just like yahoo.com except you have to use ALT + 224 for the a. What could be easier to remember than that?
Re:Sounds bad by Desolation+Row · 2014-08-12 14:18 · Score: 1

In the Russian borrowed word "radio", the Cyrillic characters a and o look identical to the same English letters (the rest are completely different).

The Russian word "radio" should be in (the specific Russian Cyrillic subset of) Unicode/UTF, while the English word "radio" should be in Unicode/ASCII.

Mixing and matching character sets in URLs or email address typically indicates "intent to confuse". Within text, it usually just confuses translators and spell checkers.
Re: Sounds bad by Anonymous Coward · 2014-08-14 19:02 · Score: 0

I've read your comment several times and still don't understand.
First ÑÐÐÐÐ¾ would be allowed as well as radio, but paÐÐo etc. would be disallowed.
Could you simplify and restate your point?

whack-a-mole 3.0 by Anonymous Coward · 2014-08-12 09:40 · Score: 2, Insightful

And the latest round of whack-a-mole begins...

This is going to do... by sudden.zero · 2014-08-12 09:40 · Score: 1

...absolutely nothing! The scammers will just find some other way to create their automated email garbage.

Re:This is going to do... by pla · 2014-08-12 10:54 · Score: 1

This is going to do...absolutely nothing! The scammers will just find some other way to create their automated email garbage.

You kidding? Thanks to allowing these new email addresses, I have an entirely new category of auto-deletable spam. These won't "confuse" me because I'll never see them. Win/Win!

Go ahead, spammers, get cute. Just makes my life thaaat much easier.
Re:This is going to do... by AHuxley · 2014-08-12 11:22 · Score: 1

It looks after working with ads in English.
It looks after other interested parties looking for expected keywords.

--
Domestic spying is now "Benign Information Gathering"

&^*308cbpBO)780i76D$^*.//.we0-fw by pigiron · 2014-08-12 09:40 · Score: 0

q898(^*$*EUIDXEZ{Pm;vd80eGUIOIO:>P{
{}.

det6767ir6768P)I*)&%B(()_}K>?YIBV$WCJ!!!!!

Re:&^*308cbpBO)780i76D$^*.//.we0-fw by sudden.zero · 2014-08-12 09:44 · Score: 1

Ah but did you use Unicode to make those characters?
Re:&^*308cbpBO)780i76D$^*.//.we0-fw by pigiron · 2014-08-12 09:52 · Score: 1

ÂÂâ¥
Ã¥â(TM)â(TM)âs--ðYfâSâ±âââOEâSoeâ...â'ââoe

Re:Good that this applies to from: and not the bod by tlhIngan · 2014-08-12 09:43 · Score: 1

...of the e-mail. Any attempt to block spam or phishing on the basis of mixing character sets would have to confront the fact that some people do need to mix character sets. Typically representations of Mari in the Latin alphabet, for example, also make use of the Greek letters beta and eta. In fact, eta is used in Latin representations of several minority languages of Russia. And the Reddit crowd loves making weird smilies in their English-language writing by means of symbols drawn from Indian scripts.

Or perhaps more practically, needing to send email with multiple translations in them. Either as a courtesy to your audience who may speak English or French, or German, and you're not quite sure which they're more comfortable with. So you send your email with all three languages in it.

North American based companies may do English, French and Spanish in their email.

Though perhaps one area where they could block in the body is in HTML tags - if there's a restricted character in a link, perhaps that's a reason to block.

Re:Good that this applies to from: and not the bod by TubeSteak · 2014-08-12 09:47 · Score: 1

Good that this applies to from: and not the body of the e-mail.

That's not at all good, and filtering the body is exactly what I want.
Spammers already spoof the from: domain and then link you out to exactly the type of domain that Gmail is now filtering.

There's no reason Gmail can't flag [body] links to domains that use mixed character sets.

--
[Fuck Beta]
o0t!

Finally! by nospam007 · 2014-08-12 09:48 · Score: 1

Damn, now i see it's just domains, i tought they killed all my german and french spammers.

Re:Finally! by Anonymous Coward · 2014-08-12 09:56 · Score: 0

That would be murder
Re:Finally! by Anonymous Coward · 2014-08-12 10:18 · Score: 0

He said "spammers". Murder victims have to be human.
Re:Finally! by Drumhellar · 2014-08-12 10:28 · Score: 1

At worst, it's illegal dumping.
Re:Finally! by Anonymous Coward · 2014-08-12 11:10 · Score: 0

i tought they killed all my german and french spammers
I tought I taw a puddy tat.
Re:Finally! by Anonymous Coward · 2014-08-13 02:40 · Score: 0

Untrue. Go kill a police dog and they'll put you away for life.
Re:Finally! by Anonymous Coward · 2014-08-13 03:05 · Score: 0

Romulan...

Why are we still blocking spam ? by Anonymous Coward · 2014-08-12 09:56 · Score: 3, Interesting

90% of the population would be better off with a white listed email account, i.e. if you are not on their list the email does not get through. END OF STORY.

It would seem to be more efficient to filter mail IN than to filter it out. Most people would have 20 or so people they actually want mail from.
I have mail accounts strictly for family and my local email rules enforce this
I have mail accounts for "sign up" sessions for competitions that I know are going to get spammed to hell
I have mail account for work, another for my business , etc etc all with differing contacts.

White listing would pretty much kill off spam: if there is zero chance of it getting through, what is the point? Currently spammers get through because of outdated spam lists, new tricks to get around Bayesian filters, etc etc etc. White lists would negate the need.

Google, if you set up a white listed email system, my friends and family will happily sign up.

Re:Why are we still blocking spam ? by Dutch+Gun · 2014-08-12 14:58 · Score: 1

E-mail authentication seems like a better solution than whitelisting in the long term. Whitelisting can kill off spam, but that's sort of like saying you can fix a broken arm by amputation. It's technically true, but removes a lot of useful functionality.
The big problem with e-mail spam is that the e-mail sender can be trivially forged. If we employed ubiquitous authentication systems that proved a specific domain was used, and blocked non-authenticated users (or at the very least, flag them with a big warning), it would go a long way to solving the spam problem. Moreover, if a particular domain is repeatedly being used by spammers or scammers, that can provide additional heuristic information to the filters.
Unfortunately, there are too many competing authenticating standards and (presumably) far too much legacy code that would be broken by moving to such a system. Given the ridiculous amounts of spamming and scamming going on by e-mail, it really seems like it would be worth the short-term pain to buckle down and select a single, robust solution, and block anything that doesn't use it.
The world just isn't the same as when SMTP was invented. It's ridiculous, not to mention slightly worrisome, that the only way we can practically use e-mail is if the combined technical might of Google or some other large enterprise helps us to filter out 99% of the crap so we can view the 1% that isn't.

--
Irony: Agile development has too much intertia to be abandoned now.
Re:Why are we still blocking spam ? by Anonymous Coward · 2014-08-12 16:03 · Score: 0

Think of white listing like giving people access to your house.
A few people will get keys
If you are not there it is locked and therefore you need a key to get in (a white list)
If you are there you can admit other people, the choice is yours, however it is simply a white list with 1 time admittance.
(yes I know about burglars, however that does not change the point being made)
So with a white list, even if someone hacks the system to send spam, it will most likely only go to 20-30 people, a lot of effort for little gain
Currently spambots send out billions of spams; this could not easily happen with a white list that authenticates the sender. And the easy solution is to remove them from the white list until they fix their machine. If a secure web page with 2 factor authentication is used to modify the whitelist with the ISP then adding/changing the whitelist is also difficult.
Spam would effectively die a natural death because with so few people getting any, and so difficult to send it to anyone the money chain would break. However
Nigerian scams... gone
Penis enlarger spams ....gone
Fake Pharmacies ..... gone
penny stock scams .... gone
fake invoice scams .... gone
phishing scams ..... severely damaged
What the internet needs is a Unix like approach where security and user separation is built in as opposed to the security as an add-on Windows approach. One is much harder to break into than the other, neither is perfect, but one is definitely a better first choice.
Re:Why are we still blocking spam ? by Anonymous Coward · 2014-08-12 18:46 · Score: 0

90% of people don't have the time to manage a whitelist. 9% can't afford the risk of a false positive. Congratulations! You finally made it to the 1%!!
Re:Why are we still blocking spam ? by IamTheRealMike · 2014-08-12 21:28 · Score: 1

Google, if you set up a white listed email system, my friends and family will happily sign up.
They already happily sign up. Gmail is the largest email provider in the world.
BTW the Gmail spam filter, like any good one, does have per-user whitelists. If you reply to mail or mark mail from a sender as not spam, the filter will leave mail from those senders alone (modulo caveats like the sender properly authenticating). Thus the filter spends almost all of its effort on email from senders you haven't interacted with, like, for example, the password reset mail from the website you used 3 years ago and forgot how to log in. You wouldn't want to lose those, would you?
Re:Why are we still blocking spam ? by AmiMoJo · 2014-08-13 00:52 · Score: 1

A whitelist would break site sign-up and password reset emails. You could never whitelist every legit site as hundreds are launched every day. Users will never figure out how to add sites to their whitelists before signing up, and can barely cope with such emails ending up in their spam folders.
Having said that, gmail filters 99.9% of spam for me, and I can tolerate hitting delete for the 1 in 1000 that gets through.

--
const int one = 65536; (Silvermoon, Texture.cs)
SJW, n: "Someone I don't like, and by the way I'm a fuckwit" - AC

Don't use Unicode for network stuff by Anonymous Coward · 2014-08-12 09:57 · Score: 1

If you use Unicode for domains, addresses, certificates and whatnot you are begging for an endless cascade of support problems and glitches, not to mention security vulnerabilities. Let others exercise all these broken code paths for you while you avoid the fail. Eventually, after most of the broken code gets cycled out of use, many years from now, you may then safely allow this stuff into real systems.

Unicode breaks all sorts of stuff in subtle and unfixed ways. A fine example from a widely used Microsoft system (W2K8 R2 SP1, three years old) is this gem: http://support.microsoft.com/kb/2597665; IIS can't handle Unicode attributes in x509 certs. You have to "hotfix" that broken OS to deal with Unicode.

Just leave it be another decade or so, if you can.

For those of you frothing at the mouth to write "BUT BUT I HAVE TOO!!!!1" re-read the end of that last sentence over and over till it sinks in; not everyone can avoid dealing with this. My sympathies. I'm writing for those that can.

Re:Don't use Unicode for network stuff by Immerman · 2014-08-12 10:39 · Score: 1

But why would anyone waste resources properly fixing a bug that doesn't affect anyone? The only way these things will get fixed properly, is if they start causing a lot of problems. And the only way they'll cause problems is if people start using them.
Meanwhile, why should most of the world's population have to deal with an internet incapable of handling addresses in their language? How would you like it if you woke up tomorrow to discover that all web addresses could only be written in Arabic? The Web may have been invented in the US, but it belongs to the world now.

--
--- Most topics have many sides worth arguing, allow me to take one opposite you.
Re:Don't use Unicode for network stuff by Anonymous Coward · 2014-08-12 11:21 · Score: 0

why should most of the world's population have to deal with an internet incapable of handling addresses in their language?
The world has been using the internet for close to two decades without domains and email addresses in anything but ASCII. It works, and that is more important than vanity. The web was invented in Switzerland (not in the US), but if computers and the internet had been invented in the Middle East, we would all have learned enough Arabic to use them. Addresses must be interoperable for a world wide network to function. The content may well be in any number of native languages and scripts, but the addresses must be simple, and Unicode isn't. It is better for all to learn simple addresses than for all to learn complex addresses.

Slippery slope by Dishwasha · 2014-08-12 09:59 · Score: 1

As much as I can appreciate the intent and the fact that this will solve 99.999% of people's problems for this type of spamming and create 00.0000000001% of problems for legitimate users, it still feels a little like Google is trying to be the thought police on this one; you know free speech and all.

... because we make new friends by Anonymous Coward · 2014-08-12 10:04 · Score: 1

Seriously,
most filters are now "very good". And, I make new acquaintances, connections and friends. They have new email addresses that aren't in the whitelist. But, the filters pretty much just work.

Re:... because we make new friends by Anonymous Coward · 2014-08-12 11:29 · Score: 1

One way you could make whitelists work is to have a "secret handshake", a word that you require in the subject of mail from addresses that aren't whitelisted yet. You would regularly change that word and give it to new acquaintances along with your email address.
The problem with the whitelist approach is something else: A lot of spam already pretends to be from someone you know. Spammers don't just collect individual email addresses anymore. They collect email address pairs: Who knows who.
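
The "secret handshake" idea above can be sketched in a few lines. Everything here (the function name, the lowercase whitelist set, the example addresses and word) is hypothetical, and as the parent points out, it does nothing against spam that forges a sender you already know.

```python
def accept(sender, subject, whitelist, handshake_word):
    """Deliver mail from whitelisted senders, or from anyone who knows the word.

    whitelist is assumed to be a set of lowercase addresses.
    """
    if sender.lower() in whitelist:
        return True
    # Unknown sender: require the current handshake word in the subject line.
    return handshake_word.lower() in subject.lower()

whitelist = {"mum@example.org"}
word = "pineapple"  # rotated regularly and handed out with the address
```

A new acquaintance who puts the word in the subject gets through once, after which the address can be added to the whitelist by hand.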
Re:... because we make new friends by Anonymous Coward · 2014-08-12 12:17 · Score: 0

It's easy to add them to a white list
Re:... because we make new friends by Anonymous Coward · 2014-08-12 14:10 · Score: 0

Yes, but a white list would help stop the morons from clicking on the link of "xxxyyyy-celebrity-nude.exe"
A whitelist that also confirms where it was actually sent from would also prevent that form of abuse.

More generally by SigmundFloyd · 2014-08-12 10:07 · Score: 1

IME, Gmail is rejecting a lot of legitimate mail nowadays.

Their filters used to be good, but they completely fucked it up lately.

--
Knowledge is power; knowledge shared is power lost.

Re:Good that this applies to from: and not the bod by Anonymous Coward · 2014-08-12 10:21 · Score: 0

Unfortunately those aren't likely to be mistaken for latin characters... They probably get a free pass.

non Latin characters? by Anonymous Coward · 2014-08-12 10:22 · Score: 0

I never did see a domain with non-Latin characters in spam. I have seen Russian, Chinese or Japanese text in the body and subject line.

Re:Good that this applies to from: and not the bod by Ichijo · 2014-08-12 10:25 · Score: 2

Slashdot does not allow non-ASCII characters...

...unless they're in code page 1252.

--
Any sufficiently unpopular but cohesive argument is indistinguishable from trolling.

But how... by Anonymous Coward · 2014-08-12 10:26 · Score: 0

will I talk to ZALGO!

Al by jones_supa · 2014-08-12 10:31 · Score: 1

As an interesting background fact, I heard that Google has an advanced Al doing all this stuff completely autonomously.

His real name is Albert, by the way.

Re:Good that this applies to from: and not the bod by ericloewe · 2014-08-12 10:37 · Score: 1

They're reason enough for me to almost believe whoever designed ASCII was a genius.

Unicode for addresses is a bad idea by Anonymous Coward · 2014-08-12 10:39 · Score: 0

Addresses should be simple and easy to learn and transmit over as many means of transport as possible. We had a working world-wide de-facto standard: 7-bit ASCII. Sure, there were no accented letters, no support for Asian scripts, etc., but it worked. Addresses are infrastructure. You can send anything you want as content. If you need to write Hindi in an email, then do so. That should not require all mail masters to upgrade their software to handle Hindi.

(I write this as someone whose native language has letters beyond ASCII.)

GMail doesn't take everything... by The+New+Guy+2.0 · 2014-08-12 10:50 · Score: 1

GMail doesn't accept all comers. Get too many complaints and they'll reject you... this is just new ideas to add to that filter. There's a list of words you can't say on GMail without it getting read, they don't publish those lists because that'll never be said to them.

Re:GMail doesn't take everything... by Anonymous Coward · 2014-08-12 11:03 · Score: 0

There's a list of words you can't say on GMail without it getting read
"Shit, piss, fuck, cunt, cocksucker, motherfucker, and tits"?

Unicode the standard .. by lippydude · 2014-08-12 10:58 · Score: 1

And so this "standard" was designed in this way because country A didn't want its script mixed up with country B, introducing vulnerabilities into the DNS system in the process. As in '' '' and 'A' all encode to different unicode er .. codes.

Re:Unicode the standard .. by Anonymous Coward · 2014-08-12 11:11 · Score: 1

They're called "code points" actually. A particular code point can be encoded in different ways (for example, the encoding of 'ß' in UTF-8 is different from the encoding in UTF-16, but they both represent the same code point.) Yeah, something like that ought to be used for network addresses...
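
To make the code point versus encoding distinction concrete, a short sketch using the 'ß' from the comment above:

```python
sharp_s = "\u00df"                         # LATIN SMALL LETTER SHARP S

code_point = ord(sharp_s)                  # one code point: U+00DF
as_utf8 = sharp_s.encode("utf-8")          # two bytes
as_utf16 = sharp_s.encode("utf-16-be")     # two different bytes

# Decoding either byte sequence gives back the same single code point.
round_trip = as_utf8.decode("utf-8") == as_utf16.decode("utf-16-be")
```

The bytes differ between the two encodings, but both decode to the same abstract character.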
Re:Unicode the standard .. by Anonymous Coward · 2014-08-12 12:50 · Score: 0

Why yes, we should collapse all similar looking letters into one, why didn't anyone think about that before!
Not that it matters, for example, that tolower("AT") should properly return "at", "a(cyrillic t)" or "(alpha)(tau)" depending on whether it was originally typed in English, Russian or Greek.
Re:Unicode the standard .. by lippydude · 2014-08-12 13:14 · Score: 1

"Why yes, we should collapse all similar looking letters into one, why didn't anyone think about that before! Not that it matters, for example, that tolower("AT") should properly return "at", "a(cyrillic t)" or "(alpha)(tau)" depending on whether it was originally typed in English, Russian or Greek."

Have one set of 'code points' for every language on the planet and remove the duplicates. That way they wouldn't have needed to hack unicode in order to allow for the following:

'the code point U+006E (the Latin lowercase "n") followed by U+0303 (the combining tilde) is defined by Unicode to be canonically equivalent to the single code point U+00F1 (the lowercase letter "ñ" of the Spanish alphabet)'
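
The canonical equivalence quoted above can be checked directly with Python's `unicodedata` module. This is only a demonstration of normalization, not a claim about how DNS or any mail system compares names.

```python
import unicodedata

decomposed = "n\u0303"     # U+006E followed by U+0303 COMBINING TILDE
precomposed = "\u00f1"     # U+00F1 LATIN SMALL LETTER N WITH TILDE

# The raw strings differ, but normalizing to NFC or NFD makes them equal.
raw_equal = decomposed == precomposed
nfc_equal = unicodedata.normalize("NFC", decomposed) == precomposed
nfd_equal = unicodedata.normalize("NFD", precomposed) == decomposed
```

Any code that compares user-supplied strings without normalizing first will treat these two spellings as different.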
Re:Unicode the standard .. by Anonymous Coward · 2014-08-12 14:48 · Score: 0

I see you've already drifted away from "introducing vulnerabilities into DNS".
Combining characters are extremely helpful for text input - I need a lot of accented characters, and I have them all with a bit of customization and without special support from OS or complex layout including every character. I've simply added combining ~/`/'/" on AltGr+corresponding key, and got myself full set of diacritics. I can easily do this in any OS that supports Unicode.
With your proposal, I'd either need a special IME that converts letter+AltGr+... into accented character, or a layout that accommodates all the variants I need.
Also, don't forget that combining marks are not limited to umlauts/acute marks and so on. There are more complex writing systems than Latin alphabet with specific and more involved combining characters in Unicode.
TL;DR: What you consider "simplifying" is actually only a dubious simplification of a single facet of text processing to a detriment of other facets covered by Unicode.

Re:Good that this applies to from: and not the bod by hondo77 · 2014-08-12 11:14 · Score: 1

Slashdot does not allow non-ASCII characters...

Óh réällý?

--
I live ze unknown. I love ze unknown. I am ze unknown.

Re:Good that this applies to from: and not the bod by mi · 2014-08-12 11:30 · Score: 1

Óh réällý?

That's pretty cool. I guess, the entire ISO-8859-15 is Ok? But not Cyrillics :-( Or else, you would've seen some Ukrainian-Russian conflict right here...

--
In Soviet Washington the swamp drains you.

They are right - Uses of unicode ambiguous letters by enriquevagu · 2014-08-12 11:36 · Score: 1

They are right doing so. There are letters in different alphabets whose typing is very very similar -- or in fact they are written exactly the same, depending on the font used.

This can be exploited for interesting uses. For example, "E" and "ÃZ"** are respectively the Latin "E" and the Greek capital epsilon, but they are indistinguishable in caps, at least in Arial. The second one is Unicode code point U+0395. My name has an "E" in it, and for my email signature I spell my name using the traditional Latin letter from the keyboard when the email is important and should be archived. By contrast, when the email is mostly irrelevant for future use (such as meeting arrangement emails, which are useless after the meeting takes place) I spell my name using the Greek epsilon letter (hint: 395 followed by Alt+X in most Windows programs). There is no obvious difference for the receiver, but a search tool can be used to quickly find all sent emails which can be deleted safely.

While the previous is a somewhat "legit" use, in general any word which combines letters from different alphabets could be used to confuse and trick the receiver, for example by creating an email account which reads exactly the same as the one from another person. There is a nice image of 5 letters a-b-c-d-e in different alphabets in the linked post. I agree with Google in preventing such combinations for email accounts. It would be interesting to know the exact policy used to forbid account names, which is not detailed.

** At the time of writing, these two letters look exactly the same. Classic Slashdot lacks Unicode support and does not represent the greek Unicode letter from my comment. I tried logging into Slashdot Beta (first time, I swear it!!) and it seems to represent a different letter... Please try this on your own computer!
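
A small sketch of the signature trick described in this comment: Latin "E" (U+0045) and Greek capital epsilon (U+0395) usually render the same but are different characters, so a plain substring search can tell them apart. The sample signatures are invented for illustration.

```python
import unicodedata

latin_e = "E"
greek_epsilon = "\u0395"

sent_signatures = [
    "Regards, Enrique",          # Latin E: keep
    "Regards, \u0395nrique",     # Greek epsilon: safe to delete later
]

# Find the mails that were signed with the Greek letter.
disposable = [s for s in sent_signatures if greek_epsilon in s]
names = (unicodedata.name(latin_e), unicodedata.name(greek_epsilon))
```

The same property that makes this handy for filing mail is what makes mixed-script sender names useful for spoofing.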

You're a poopy head :P by Anonymous Coward · 2014-08-12 12:37 · Score: 0

I found it amusing that you are aware of the existence of different fields in email and then used your post to demonstrate that you have no fucking clue how to use them. Sentences should not be split across the subject and body fields.

How about starting with dropping obvious spam? by Anonymous Coward · 2014-08-12 12:52 · Score: 0

I must have about 50 filters to auto-delete some of the really basic, obvious spam that Google accepts to my gmail account, and I still get 20-30 spams a day. Auto-filtered into my spam folder, but even so I still have to look at it because I do get the occasional false positive.

In contrast to my DNS-RBL+SpamAssassin+procmail I have for my own domain MTA which successfully turns away or drops about 99.99% of the spam that arrives at that email address.

I thought those guys at google were supposed to be smart. If they're so smart, why can't their mail system recognize the obvious spam?

Re:Good that this applies to from: and not the bod by Dutch+Gun · 2014-08-12 14:16 · Score: 1

Heuristics could pretty easily determine if someone communicates only in English in their e-mails, and as such, any supposedly legitimate e-mails that contain large amounts of non-English words or characters should be viewed with greater suspicion. For those that routinely communicate in more than one language and use non-ASCII sets, the heuristic should be able to account for that fact.

These sorts of rules are always fuzzy by nature. Obviously, whether an e-mail is determined to be legitimate or not is due to many different factors. This could simply be one of those contributing factors.

--
Irony: Agile development has too much intertia to be abandoned now.

Re:Good that this applies to from: and not the bod by Anonymous Coward · 2014-08-12 19:29 · Score: 0

Boo, slashdot discards Esperanto and Polish letters; surprisingly lame!

It's not as if UTF-8 is some crazy new thing, sheesh.

Sounds rather ethnocentric by Chrisq · 2014-08-12 19:30 · Score: 2

It allows combinations of Latin + Han + Hiragana + Katakana; Latin + Han + Bopomofo; or Latin + Han + Hangul.

There are a lot of equally safe combinations - what about Latin + Devanagari + Tamil? There would be no look-alike characters and it would allow a lot of people to put their name in multiple scripts that are likely to be meaningful to certain audiences (e.g. someone from Tamil Nadu sending an email to people throughout India and internationally). I'm sure that there are many other combinations that wouldn't have "look alike" issues but which would be useful
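
A sketch of an allow-list check built from the three combinations listed above. Script detection is approximated from Unicode character names because the Python standard library has no Script property lookup; the table, the function names and that shortcut are all assumptions, not the actual Unicode or Gmail implementation.

```python
import unicodedata

# First word of the character name -> script label (rough approximation).
PREFIX_TO_SCRIPT = {
    "LATIN": "Latin", "CJK": "Han", "HIRAGANA": "Hiragana",
    "KATAKANA": "Katakana", "BOPOMOFO": "Bopomofo", "HANGUL": "Hangul",
}

# The three multi-script combinations named in the comment above.
ALLOWED_COMBOS = [
    {"Latin", "Han", "Hiragana", "Katakana"},
    {"Latin", "Han", "Bopomofo"},
    {"Latin", "Han", "Hangul"},
]

def scripts_of(text):
    found = set()
    for ch in text:
        if ch.isalpha():
            prefix = unicodedata.name(ch, "UNKNOWN").split(" ")[0]
            found.add(PREFIX_TO_SCRIPT.get(prefix, prefix.title()))
    return found

def passes(text):
    """Single-script text passes; mixed text must fit one allowed combination."""
    scripts = scripts_of(text)
    return len(scripts) <= 1 or any(scripts <= combo for combo in ALLOWED_COMBOS)
```

Under this sketch Latin mixed with Japanese scripts passes, while Latin + Devanagari + Tamil is rejected, which is exactly the gap the comment is complaining about.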

Re:Good that this applies to from: and not the bod by Anonymous Coward · 2014-08-12 19:30 · Score: 0

I'm curious about why you need to get around the filters. If you disagree with filtering in general, do you really think rebelling against it on some Internet forums is going to make a difference? Why not just move on to less-restricted forums or stay and follow the rules?

Insufficient by taikedz · 2014-08-12 20:56 · Score: 1

The "highly restricted" spec is meant to catch suspicious combos like in the mybank example - but does not catch full-ascii (which is an even more restrictive level) trickery like tvvitter.com (notice the two "v" chars). that combo in particular is now known, but goes to demonstrate that trickery does not need charsets larger than 7-bit... some people simply get caught by hsbc.net...

--
-- "Simplicity is prerequisite for reliability." --Dijkstra
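
The tvvitter.com point can be illustrated with a toy "skeleton" comparison that stays entirely inside ASCII. The substitution table is a tiny made-up sample, nowhere near a complete list of confusable sequences, and it does not address look-alike TLD tricks such as the hsbc.net case.

```python
# Hypothetical sample of ASCII sequences that can be mistaken for another letter.
LOOKALIKES = [("vv", "w"), ("rn", "m"), ("0", "o"), ("1", "l")]

def ascii_skeleton(name):
    """Collapse known look-alike sequences to a common form."""
    s = name.lower()
    for seq, repl in LOOKALIKES:
        s = s.replace(seq, repl)
    return s

def could_be_confused(a, b):
    """True if two different names collapse to the same skeleton."""
    return a != b and ascii_skeleton(a) == ascii_skeleton(b)
```

With this table, `could_be_confused("tvvitter.com", "twitter.com")` is true although neither name contains anything beyond 7-bit ASCII.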

Do observe by Anonymous Coward · 2014-08-12 21:34 · Score: 0

That we have a supposedly "universal" characterset that is not universally usable without considerable bolted on as an afterthought blacklists and whitelists. In fact, spotify already learned the hard way that the standard ways to compare unicode strings just don't cut it, and inventing your own is fraught with peril. There's much more slightly, subtly, insidiously "off" with unicode, before we consider the cost in code size and its associated costs.

In other words, it's not really suitable for real-world use, for you can only (and then only so-so) trust it if you generated it yourself. As soon as the unicode comes from elsewhere it's a liability to safely reading the input.
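
One common way to compare Unicode identifiers is to normalize with NFKC and then case-fold. The sketch below shows that approach and one of its limits; it is an illustration only, not a description of what Spotify or anyone else actually deployed.

```python
import unicodedata

def canonical_username(name):
    """Fold compatibility forms and case so visually equal names compare equal."""
    return unicodedata.normalize("NFKC", name).casefold()

fullwidth = "\uff21dmin"   # starts with FULLWIDTH LATIN CAPITAL LETTER A
cyrillic = "\u0430dmin"    # starts with CYRILLIC SMALL LETTER A
```

`canonical_username(fullwidth)` collapses to "admin", but the Cyrillic variant does not, because NFKC does not map across scripts. That residual gap is the kind of thing the parent is complaining about.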

Re:Good that this applies to from: and not the bod by Anonymous Coward · 2014-08-14 01:07 · Score: 0

...and should be hanged by his nuts

Re:Good that this applies to from: and not the bod by Anonymous Coward · 2014-08-14 07:34 · Score: 0

He's trying to post on local news websites (TV stations) is my guess, they all have their comments farmed out to Disqus or Topix these days. The problem is you can't engage in a conversation without editing your comment 10 times, with no indication at any point what is actually being flagged as inappropriate, or else by subverting the filter as GP mentioned. They aren't just filtering vulgarity. Words like bribe and corrupt are blocked by my TV station's Topix comments, so it's hard to discuss politicians for example.

Why are we still blocking spam ? by perryizgr8 · 2014-08-18 16:17 · Score: 1

Google, if you set up a white listed email system, my friends and family will happily sign up.

They did, it's called Google+. Nobody seems to like it.

--
Wealth is the gift that keeps on giving.

Slashdot Mirror

79 comments