Spammers Using Soft Hyphen To Hide Malicious URLs
Trailrunner7 writes with this excerpt from ThreatPost illustrating the ongoing Spy-vs.-Spy battle between spammers and the rest of us:
"Spammers have jumped on the little-used soft hyphen (or SHY character) to fool URL filtering devices. According to researchers, spammers are larding up URLs for sites they promote with the soft hyphen character, which many browsers ignore. Spammers aren't shy about exploiting humans' flexible cognitive abilities to slip past the notice of spam filters (H3rb41 V14gr4, anyone?). ... The latest trend involves the use of an obscure character called the soft hyphen or 'SHY' character to obscure malicious URLs in spam messages. Writing on the Symantec Connect blog, researcher Samir Patil said that the company has seen recent spam messages that insert the HTML symbol for the soft hyphen to obfuscate URLs for Web pages promoted by the spammers."
I never got the leet speak in spam thing. Sure, it might get past the filter, but who can read it? Are they trying to sell drugs to script kiddies?
There's no -1 for "I don't get it."
Spammers are getting more shy? That's a relief!
When you're afraid to download music illegally in your own home, then the terrorists have won!
Why didn't they just put the friggin character in the summary so I didn't have to read the article?
Anyways, according to the article it's &shy; (the HTML entity for the soft hyphen, U+00AD), which looks "identical to a regular hyphen." Are you happy now, Slashdot? I had to read TFA to find that out.
Why don't modern browsers render this character?
The character isn't supposed to be rendered. Soft hyphen indicates where to break words if necessary. The hyphens are not rendered if the word doesn't need to be broken.
Lurking at the bottom of the gravity well, getting old
Is there any good reason not to just treat the presence of soft hyphens as a reliable indicator of spam and use it as the basis of a spam filter?
"Prefiero morir de pie que vivir siempre arrodillado!"
Why don't modern browsers render this character?
From Wikipedia:
"Since it is difficult for a computer program to automatically make good decisions on when to hyphenate a word, the concept of a soft hyphen was introduced to allow manual specification of a place where a hyphenated break was allowed without forcing a line break in an inconvenient place if the text was later re-flowed."
So a soft hyphen marks a position where you can hyphenate a word. If you don't do it, you of course shouldn't print anything at that position.
The Tao of math: The numbers you can count are not the real numbers.
So now spam filters will pick up on soft hyphens used in URIs inside emails (when was the last time you saw one used legitimately?), making the spam easier to spot.
Not a typewriter
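For what it's worth, a check like the one suggested above could be sketched in Python roughly as follows. This is only an illustration and not how any particular filter does it: the function name `urls_with_soft_hyphen` and the URL regex are made up for the example, and the pattern is deliberately loose.

```python
import html
import re

SOFT_HYPHEN = "\u00ad"  # U+00AD, what &shy; / &#173; / &#xAD; decode to

# Loose pattern for http(s) URLs (an assumption for this sketch).
# \s does not match U+00AD, so a soft hyphen stays inside the match.
URL_RE = re.compile(r"https?://[^\s\"'<>]+", re.IGNORECASE)

def urls_with_soft_hyphen(body):
    """Return every URL in an HTML or plain-text body that contains a soft hyphen."""
    text = html.unescape(body)  # decode entity forms to the raw character
    return [url for url in URL_RE.findall(text) if SOFT_HYPHEN in url]

sample = '<a href="http://exa&shy;mple.com/buy">pills</a> and http://example.org/'
suspicious = urls_with_soft_hyphen(sample)
```

Here only the first URL in `sample` would be flagged. Whether a hit should score as spam outright or just add weight is a policy question this sketch does not try to answer.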
No good shysters.
Why don't modern browsers render this character?
Two reasons, the first being that the HTML 4 spec calls for it not to be rendered unless a line is actually broken at that point. Here is the full blurb:
The other reason is that the current Unicode standard does not really specify when and where it should be displayed as a hyphen, and leaves that open to the interpretation of whoever is implementing it. Here is the blurb from the Unicode standard on it:
Please ignore my parent post. It seems that GP is correct.
GENERATION 25: The first time you see this, copy it into your sig on any forum and add 1 to the generation. Social exper
Just tested this in SpamAssassin with http ://exa ­ mple.com (spaced to evade slashdot's own obfuscation-eliminator) - Result: The URL domain (example.com) is properly extracted without the obfuscation.
That said, SA is fully capable of detecting the obfuscation attempt itself (using a rawbody rule)...
Use my userscript to add story images to Slashdot. There's no going back.
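The behaviour described in the SpamAssassin test above (recover the real domain, but still notice that someone tried to hide it) can be sketched in a few lines of Python. To be clear, this is not SpamAssassin's code or rule syntax; `deobfuscate_url` is a hypothetical helper written for this example, and it only handles the soft hyphen, not other invisible characters.

```python
import html
from urllib.parse import urlsplit

SOFT_HYPHEN = "\u00ad"  # U+00AD

def deobfuscate_url(raw_url):
    """Return (hostname, was_obfuscated) for a URL that may hide soft hyphens."""
    decoded = html.unescape(raw_url)             # &shy; and &#173; become U+00AD
    cleaned = decoded.replace(SOFT_HYPHEN, "")   # drop the invisible characters
    host = urlsplit(cleaned).hostname or ""
    return host, cleaned != decoded

host, was_obfuscated = deobfuscate_url("http://exa&shy;mple.com/")
```

Returning the flag alongside the host keeps both pieces of information: the cleaned domain can go to a blocklist lookup, and the flag can be scored separately.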
The thing that really grates on the nerves is using a soft hyphen to sell Viagra.
"There is more worth loving than we have strength to love." - Brian Jay Stanley
I'm a pretty IT-savvy guy, but WHAT IS that bloody character?
Say you're laying out a book. You have the word Sauerkraut at a line wrap, but it is broken into Sauerk-raut because your layout software doesn't know where to break it. You then put in a soft hyphen between r and k, which tells your software that the word should be broken there. It turns into Sauer-kraut, which is correct.
Later you get angry with the Sauerkraut and call it "bloody Sauerkraut". Now the whole word will be on the next line, and the soft hyphen won't show because your software doesn't need to break the word. Thus you can insert these freely without fretting about words containing a stray hyphen later on; they'll only be rendered when used as a break hint.
HTH
Are you a grammar Nazi? I'm trying to improve my English; please correct my errors!
It is only supposed to be rendered when the word is split across multiple lines.
For example, if your text was "super&shy;cali&shy;fragilistic&shy;expialidocious", then all of the following are valid renderings, depending on where the renderer decides to start a new line:
supercalifragilisticexpialidocious
or
supercalifragilistic-
expialidocious
or
supercali-
fragilistic-
expialidocious
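The three renderings above can be reproduced with a small Python sketch. It assumes a fixed-width "page" and a greedy layout rule (fit as many chunks on each line as possible), which is just one plausible strategy and not what any particular browser does; `render_word` is a made-up name for the example.

```python
SOFT_HYPHEN = "\u00ad"  # U+00AD

def render_word(word, width):
    """Break a word only at soft hyphens, showing '-' only where a break happens."""
    chunks = word.split(SOFT_HYPHEN)
    lines = []
    start = 0
    while start < len(chunks):
        end = start + 1  # always take at least one chunk, even if it overflows
        # Try the longest candidate first; a line that ends in a break needs
        # one extra column for the visible hyphen.
        for j in range(len(chunks), start, -1):
            needed = len("".join(chunks[start:j])) + (0 if j == len(chunks) else 1)
            if needed <= width:
                end = j
                break
        line = "".join(chunks[start:end])
        if end < len(chunks):
            line += "-"
        lines.append(line)
        start = end
    return lines

word = "super\u00adcali\u00adfragilistic\u00adexpialidocious"
for width in (40, 25, 14):
    print(width, render_word(word, width))
```

With widths of 40, 25 and 14 columns this yields the one-line, two-line and three-line forms respectively; the soft hyphens that are not used as break points simply vanish.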
Speaking of bloody sauerkraut, I think there was some sort of hyphen-depression when the inventors of the German language decided it would be fun to glue adjectives and nouns together. For example, when I see something like unabhaengigkeitserklaerungen, I have a nigh-irresistible urge to shout Gesundheit!
I'm still not sure why the Nazis went to all of the trouble of building cipher machines. The language looks sufficiently jumbled from the start.
Constitutional rights may be respected, repealed, or modified; but they must never be ignored.
Is it just me or is this summary terrible? Every sentence says the same thing, just slightly reworded. In the summary, it's as if each new sentence doesn't give any additional information, but it's worded as if it does. Researchers have found that this summary is repetitive. Some say this can indicate the repetitiveness of a summary.
Where one in English might use a series of adjectives plus a noun a German would use a single agglomerative word - what is your problem?
Deutsch is a sufficiently sophisticated language without your assistance.
It doesn't work the same as your native tongue - get a life and stop trolling my forum - twat.
This is purely senseless and is a mark of poor language design.
Languages (in general) aren't designed, they evolve. Which makes your (all too long-winded) point quite moot.
"Total destruction the only solution" - Bob Marley