Spammers Using Soft Hyphen To Hide Malicious URLs
Trailrunner7 writes with this excerpt from ThreatPost illustrating the ongoing Spy-vs.-Spy battle between spammers and the rest of us:
"Spammers have jumped on the little-used soft hyphen (or SHY character) to fool URL filtering devices. According to researchers, spammers are larding up URLs for sites they promote with the soft hyphen character, which many browsers ignore. Spammers aren't shy about jumping humans flexible cognitive abilities to slip past the notice of spam filters (H3rb41 V14gr4, anyone?). ... The latest trend involves the use of an obscure character called the soft hyphen or 'SHY' character to obscure malicious URLs in spam messages. Writing on the Symantec Connect blog, researcher Samir Patil said that the company has seen recent spam messages that insert the HTML symbol for the soft hyphen to obfuscate URLs for Web pages promoted by the spammers."
I never got the leet speak in spam thing. Sure, it might get past the filter, but who can read it? Are they trying to sell drugs to script kiddies?
There's no -1 for "I don't get it."
Why don't modern browsers render this character?
GENERATION 25: The first time you see this, copy it into your sig on any forum and add 1 to the generation. Social exper
Spammers are getting more shy? That's a relief!
When you're afraid to download music illegally in your own home, then the terrorists have won!
Why didn't they just put the friggin character in the summary so I didn't have to read the article?
Anyways, according to the article it's &shy;, which looks "identical to a regular hyphen." Are you happy now, Slashdot? I had to read TFA to find that out.
Is there any good reason not to just treat the presence of soft hyphens in a URL as a reliable indicator of spam and use it as the basis of a spam filter?
"Prefiero morir de pie que vivir siempre arrodillado!"
Tongue-tied, (I'm) short of breath, don't even try
Try a little harder
Something's wrong, you're not naive, you must be strong
Ooh, baby, try
Hey girl, move a little closer.
You're
CHORUS:
Too shy shy
Hush hush, eye to eye
Too shy shy
Hush hush, eye to eye
Too shy shy
Hush hush, eye to eye
Too shy shy
So now spam filters will pick up on soft hyphens used in URIs inside emails (when was the last time you saw one used legitimately?), making the spam easier to spot.
Not a typewriter
No good shysters.
The advent of HTML 5 within the next couple years - and browsers that support it - is expected to solve many of these problems, because that specification finally standardizes how HTML code should be parsed by Web browsers, rather than leaving it up to individual platform vendors to develop their own interpretations of how the code should be parsed.
I bet 4pple is behind the spam trying to further promote 1-1TML-5.......$t3v3 J0b$ l0v3s v14gr4
www.RacquetUp.org - Helping Detroit Youth
Just tested this in SpamAssassin with http ://exa ­ mple.com (spaced to evade slashdot's own obfuscation-eliminator) - Result: The URL domain (example.com) is properly extracted without the obfuscation.
That said, SA is fully capable of detecting the obfuscation attempt itself (using a rawbody rule)...
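For anyone curious what "extract the domain without the obfuscation" could look like outside of SA, here is a rough Python sketch. It is not SpamAssassin's code; the names LinkCollector and extract_domains are made up for illustration. It assumes the link arrives as an HTML anchor, lets the standard library parser decode the entity, and then drops U+00AD before pulling out the host.

```python
from html.parser import HTMLParser
from urllib.parse import urlsplit

SOFT_HYPHEN = "\u00ad"  # what &shy; decodes to


class LinkCollector(HTMLParser):
    """Collect href values from <a> tags (hypothetical helper).

    HTMLParser decodes character references in attribute values,
    so "&shy;" arrives here as the U+00AD character.
    """

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def extract_domains(html_body):
    """Return (host, was_obfuscated) for each link in an HTML body."""
    collector = LinkCollector()
    collector.feed(html_body)
    results = []
    for href in collector.hrefs:
        obfuscated = SOFT_HYPHEN in href
        cleaned = href.replace(SOFT_HYPHEN, "")
        host = urlsplit(cleaned).hostname
        if host:
            results.append((host, obfuscated))
    return results


print(extract_domains('<a href="http://exa&shy;mple.com/buy">cheap pills</a>'))
```

The second element of each pair is the interesting part for scoring: the cleaned host can go to a blocklist lookup, while the flag records that someone tried to hide it.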
Use my userscript to add story images to Slashdot. There's no going back.
Note the use of the phrase "should be". I see this a lot when reading about HTML 5. Are people really that stupid and/or naive that they think all browsers will follow the HTML 5 spec exactly? (yes Microsoft I'm looking at you)
The thing that really grates on the nerves is using a soft hyphen to sell Viagra.
"There is more worth loving than we have strength to love." - Brian Jay Stanley
I'm not too worried about flexible cognitive abilities, but jumping humans do bother me too.
Spammers are using the soft hyphen. Spammers are using the soft hyphen. ...Spammers are using the soft hyphen. "Spammers are using the soft hyphen."
Yes, each sentence says a little bit more, but it still repeats the same fact over and over. I usually don't complain about slashdot summaries, but this was honestly painful to read. Just because you copy+pasted what TFA says doesn't mean it's okay.
My webcomic
It is only supposed to be rendered when the word is split across multiple lines.
For example, if your text was "super&shy;cali&shy;fragilistic&shy;expialidocious" then all of the following are valid renderings, depending on where the renderer decides to start a new line:
supercalifragilisticexpialidocious
or
supercalifragilistic-
expialidocious
or
supercali-
fragilistic-
expialidocious
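The three renderings above can be reproduced with a toy line breaker. This is only a sketch of the idea, not how any real browser lays out text: the function name wrap_at_soft_hyphens and the greedy strategy are my own assumptions, and width counts letters only, ignoring the trailing hyphen.

```python
SOFT_HYPHEN = "\u00ad"


def wrap_at_soft_hyphens(word, width):
    """Greedily break a word, but only at soft hyphen positions.

    A soft hyphen stays invisible unless a break is taken there,
    in which case it shows up as a visible "-" at the end of the line.
    """
    lines = []
    current = ""
    for chunk in word.split(SOFT_HYPHEN):
        if not current:
            current = chunk
        elif len(current) + len(chunk) <= width:
            current += chunk              # no break: hyphen not rendered
        else:
            lines.append(current + "-")   # break taken: hyphen rendered
            current = chunk
    lines.append(current)
    return lines


word = "super\u00adcali\u00adfragilistic\u00adexpialidocious"
for width in (40, 20, 14):
    print(width, wrap_at_soft_hyphens(word, width))
```

With widths of 40, 20 and 14 letters this yields the one-line, two-line and three-line versions respectively.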
I don't get how you can put a soft hyphen in a URL and have it work. It's a formatting character; it shouldn't ever be legal to have a formatting character as part of a URL. Are they registering domain names with soft hyphens in the name? Or is this a case where the browser 'helpfully' replaces a soft hyphen with a regular hyphen when actually trying to connect to the web server, but for some reason does NOT render the hyphen when displaying it to the user? It seems like the browser should behave consistently: if it doesn't render a hyphen when displaying it, it shouldn't use a hyphen when making the DNS lookup.
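One possible explanation, which nobody in this thread has confirmed for any particular browser: the IDNA 2003 name preparation step (nameprep) lists U+00AD among the characters that are "mapped to nothing", so a client that runs a hostname through it ends up looking up the name with the soft hyphen simply removed. Python's standard library implements that mapping, so it can be checked quickly. The helper name to_lookup_name is hypothetical.

```python
import stringprep

SOFT_HYPHEN = "\u00ad"


def to_lookup_name(hostname):
    """Convert a hostname to its ASCII lookup form using the
    standard library's IDNA 2003 codec (which applies nameprep)."""
    return hostname.encode("idna").decode("ascii")


# Table B.1 of stringprep holds the "commonly mapped to nothing" characters.
print(stringprep.in_table_b1(SOFT_HYPHEN))

# The soft hyphen is dropped rather than turned into a regular hyphen.
print(to_lookup_name("exa" + SOFT_HYPHEN + "mple.com"))
```

If a browser behaves like this, no domain with a soft hyphen ever needs to be registered: the obfuscated link and the plain one resolve to the same name, and only a filter that compares raw strings sees a difference.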
My theory of how this submission reads is as follows.
how is babby formed?
It doesn't make any sense; it's probably just some nonsense to scare people into buying their product (Symantec).
By using a soft hyphen in an IDN, a spammer can get a spoof domain that looks like an authentic domain on screen. But how can that fool any spam filters?
In other words, security flaws will be baked in so that vendors can't fix them without breaking standards compliance?
Is it just me or is this summary terrible? Every sentence says the same thing, just slightly reworded. In the summary, it's as if each new sentence doesn't give any additional information, but it's worded as if it does. Researchers have found that this summary is repetitive. Some say this can indicate the repetitiveness of a summary.
Does this work on, say, "Smart"Filter?
Would be lovely if there was a way to craft your own bookmarks to bypass the damn filters. Say, by using a Greasemonkey script or something...
Or I would have been, if that had been a real URL. Who could resist that?
Especially if you're into überwachungsaufgaben. Not that I am! No sir. Just saufgaben-curious.
...unless they get a lucky break.
I think I learned that in junior high.
This is so easy to defeat with a simple regular expression in your spam filter. I doubt spammers will continue with this tactic for long.
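A minimal sketch of such a regular expression in Python, assuming the filter sees the raw HTML of the message before entities are decoded. The pattern names and the function hrefs_with_soft_hyphen are made up here, and the href matching is deliberately naive (quoted attributes only).

```python
import re

# Soft hyphen as a raw character, a named entity, or a decimal/hex reference.
SHY_PATTERN = re.compile("\u00ad|&shy;|&#0*173;|&#[xX]0*[aA][dD];")

# Naive href extractor: only handles quoted attribute values.
HREF_PATTERN = re.compile(r"""href\s*=\s*["']([^"']*)["']""", re.IGNORECASE)


def hrefs_with_soft_hyphen(raw_html):
    """Return every href value that contains some form of soft hyphen."""
    return [
        href
        for href in HREF_PATTERN.findall(raw_html)
        if SHY_PATTERN.search(href)
    ]


sample = (
    '<a href="http://exa&shy;mple.com/">pills</a> '
    '<a href="http://example.org/">news</a>'
)
print(hrefs_with_soft_hyphen(sample))
```

Restricting the check to link targets, rather than the whole body, avoids penalizing legitimate text that uses soft hyphens for line breaking.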
In a real emergency, we would have all fled in terror, and you would not have been notified.
Soft Hymen for Viagra and Porn Spammers!
Makes it easier and less bloody Mary....
The number of attacks made possible by allowing non-ASCII chars in URLs is huge. Bruce Schneier warned about it years ago. This ain't the first such "attack". This definitely ain't the last. And security-conscious people knew about it years and years, if not decades, ago. The problem is that in our industry there are still people who don't understand that there are a *lot* of bad guys out there who will try to wreak havoc. For fun or for profit. As long as there are decision makers who don't understand that security should be the first concern, we're fuxx0red.
Atlas Shrugged : Thematic Story
Alright I did some testing in Chrome, Firefox, Internet Explorer and Opera (all latest versions)
A simple link, with a SHY character in the link. Depending on the format of the link (with an http, or without), all 4 browsers did exactly what we expected them to do: the link either showed the hyphen and linked to a hyphenated page correctly (when I say "showed", I mean that if you mouse over the link, you see the hyphen in the status bar) or just didn't show it and didn't link to a hyphenated page.
So, I don't see the problem here... I call this FUD.
I wouldn't mind you in my head, if you weren't so clearly mad -Lews Therin Telamon
I'm a SpamAssassin developer, both on the official project and on a commercial derivative. Others on my commercial team independently verified my claim as well. I highly doubt we're all wrong.
That said, I decided to FULLY dig into the issue to see what's going on under the hood. In addition to a careful analysis of the spamassassin debug output, I spun up Wireshark to look at the actual DNS queries. Since SA knows what example.com is ([84234] dbg: uridnsbl: domain example.com in skip list), I had to use something else. I ran two tests: one on a nonexistent domain, split by the SHY character so that the latter portion does not form an existing domain, and one on a heavily spammed domain, with a SHY character again breaking it into a nonexistent domain.
Analysis: Debug output from SA 3.3.1 and SVN trunk (rev 1005948, build reports version as 3.4.0-r929098) displays the SHY character (which my terminal renders as a space but after a paste, my browser does not) and uses \255 in its DNS lookups (older versions display it as \173 and I didn't capture the raw lookups). In addition to looking for the domain with the SHY character, it also queries without the SHY character. My live test confirmed a hit in URIBL for the defanged domain and no hit for the obfuscated one (I didn't test a real sample of the obfuscation -- presumably, the blocklists can learn the obfuscated domain in addition to the defanged one). I see no reference to the IDN syntax you mentioned.
A sample of the debug output (tweaked to convert the SHY to a space so it is distinguishable on the web):
$ grep obinemedic ~/url.eml.output |grep -i uribl |sed 's/r.obin/r obin/'
Oct 8 15:06:55.950 [1570] dbg: dns: providing a callback for id: 64792/r obinemedic.ru.multi.uribl.com/A/IN
Oct 8 15:06:55.950 [1570] dbg: async: starting: URI-DNSBL, DNSBL:multi.uribl.com.:r obinemedic.ru (timeout 15.0s, min 3.0s)
Oct 8 15:06:55.955 [1570] dbg: dns: providing a callback for id: 40779/robinemedic.ru.multi.uribl.com/A/IN
Oct 8 15:06:55.955 [1570] dbg: async: starting: URI-DNSBL, DNSBL:multi.uribl.com.:robinemedic.ru (timeout 15.0s, min 3.0s)
Oct 8 15:06:55.985 [1570] dbg: uridnsbl: domain "robinemedic.ru" listed (URIBL_DBL_SPAM): 127.0.1.2
Oct 8 15:06:55.987 [1570] dbg: uridnsbl: domain "robinemedic.ru" listed (URIBL_AB_SURBL): 127.0.0.102
Oct 8 15:06:55.988 [1570] dbg: uridnsbl: domain "robinemedic.ru" listed (URIBL_WS_SURBL): 127.0.0.102
Oct 8 15:06:55.988 [1570] dbg: uridnsbl: domain "robinemedic.ru" listed (URIBL_JP_SURBL): 127.0.0.102
Oct 8 15:06:55.989 [1570] dbg: uridnsbl: domain "robinemedic.ru" listed (URIBL_SC_SURBL): 127.0.0.102
Oct 8 15:06:56.121 [1570] dbg: async: completed in 0.162 s: URI-DNSBL, DNSBL:multi.uribl.com.:robinemedic.ru
Oct 8 15:06:56.121 [1570] dbg: uridnsbl: domain "robinemedic.ru" listed (URIBL_BLACK): 127.0.0.2
Oct 8 15:06:56.122 [1570] dbg: async: completed in 0.167 s: URI-DNSBL, DNSBL:multi.uribl.com.:r obinemedic.ru
Oct 8 15:06:57.980 [1570] dbg: async: timing: 0.162 . DNSBL:multi.uribl.com.:robinemedic.ru
Oct 8 15:06:57.980 [1570] dbg: async: timing: 0.167 . DNSBL:multi.uribl.com.:r obinemedic.ru
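The paired lookups in the debug output above (one name with the SHY character, one without) can be sketched in a few lines of Python. This is not SpamAssassin's actual code, just an illustration of the behavior the log shows; the function name uribl_query_names is hypothetical and the zone is the one that appears in the log.

```python
SOFT_HYPHEN = "\u00ad"


def uribl_query_names(domain, zone="multi.uribl.com"):
    """Build the DNSBL query names for a domain found in a message.

    The domain is queried as found, and, if it contains a soft hyphen,
    once more with the soft hyphen stripped (the "defanged" form).
    """
    candidates = [domain]
    defanged = domain.replace(SOFT_HYPHEN, "")
    if defanged != domain:
        candidates.append(defanged)
    return [f"{name}.{zone}" for name in candidates]


for name in uribl_query_names("r" + SOFT_HYPHEN + "obinemedic.ru"):
    # repr() makes the otherwise invisible U+00AD show up as \xad
    print(repr(name))
```

Only the defanged name can match an existing blocklist entry, which is consistent with the hits in the log all being reported for the domain without the SHY character.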