Slashdot Mirror


Spammers Using Soft Hyphen To Hide Malicious URLs

Trailrunner7 writes with this excerpt from ThreatPost illustrating the ongoing Spy-vs.-Spy battle between spammers and the rest of us: "Spammers have jumped on the little-used soft hyphen (or SHY character) to fool URL filtering devices. According to researchers, spammers are larding up URLs for sites they promote with the soft hyphen character, which many browsers ignore. Spammers aren't shy about jumping on humans' flexible cognitive abilities to slip past the notice of spam filters (H3rb41 V14gr4, anyone?). ... The latest trend involves the use of an obscure character called the soft hyphen or 'SHY' character to obscure malicious URLs in spam messages. Writing on the Symantec Connect blog, researcher Samir Patil said that the company has seen recent spam messages that insert the HTML symbol for the soft hyphen to obfuscate URLs for Web pages promoted by the spammers."

22 of 162 comments

  1. H3rb41 V14gr4? by MrEricSir · · Score: 4, Insightful

    I never got the leet speak in spam thing. Sure, it might get past the filter, but who can read it? Are they trying to sell drugs to script kiddies?

    --
    There's no -1 for "I don't get it."
    1. Re:H3rb41 V14gr4? by caffeinemessiah · · Score: 3, Insightful

      I never got the leet speak in spam thing. Sure, it might get past the filter, but who can read it? Are they trying to sell drugs to script kiddies?

      I don't know about you, but I can't stop trying to figure out what word they're trying to represent with the symbols. For example, I know the second word in your subject means viagra, but what is "H3rb41"? Oh..."herbal". It's naturally (perhaps unknowingly) targeted towards geeks and puzzle-solvers, which perhaps isn't the worst market to target available-without-human-contact penis drugs towards.

      --
      An old-timer with old-timey ideas.
    2. Re:H3rb41 V14gr4? by maxwell+demon · · Score: 3, Insightful

      I thought the only situation where you need Viagra is exactly human contact (in the most literal meaning of the word).

      --
      The Tao of math: The numbers you can count are not the real numbers.
    3. Re:H3rb41 V14gr4? by commodore64_love · · Score: 3, Funny

      I think this photograph is appropriate. And I'm happy to say: No I can't read it.

      http://media.ebaumsworld.com/picture/strober/get_laid.jpg

      --
      "I disapprove of what you say, but I will defend to the death your right to say it." - historian Evelyn Beatrice Hall
  2. Shy Spammers by biryokumaru · · Score: 3, Funny

    Spammers are getting more shy? That's a relief!

    --
    When you're afraid to download music illegally in your own home, then the terrorists have won!
  3. What is it? by iONiUM · · Score: 5, Funny

    Why didn't they just put the friggin character in the summary so I didn't have to read the article?

    Anyways, according to the article it's &shy;, which looks "identical to a regular hyphen." Are you happy now slashdot? I had to read TFA to find that out.

    1. Re:What is it? by maxwell+demon · · Score: 3, Insightful

      Are registrars accepting domain names with soft hyphens? And if so, why? It's rather obvious that such domain names would only be used for fraud.
      IMHO registrars should not accept any non-printable character in domain names.

      --
      The Tao of math: The numbers you can count are not the real numbers.
    2. Re:What is it? by sexconker · · Score: 3, Insightful

      Yes, they are. Otherwise this story wouldn't exist.
      Why? Because they like money, and don't give a fuck.
      Of course they should not accept any non-printable characters.

      Registrars are pretty much only half a step above the spammers in terms of ethics / shittiness.

  4. Re:Why by TopSpin · · Score: 5, Informative

    Why don't modern browsers render this character?

    The character isn't supposed to be rendered. Soft hyphen indicates where to break words if necessary. The hyphens are not rendered if the word doesn't need to be broken.

    --
    Lurking at the bottom of the gravity well, getting old
  5. So how often is it used legitimately? by JesseL · · Score: 4, Interesting

    Is there any good reason not to just treat the presence of soft hyphens as a reliable indicator of spam and use it as the basis of a spam filter?

    --
    "Prefiero morir de pie que vivir siempre arrodillado!"
    1. Re:So how often is it used legitimately? by Anonymous Coward · · Score: 4, Informative

      Is there any good reason not to just treat the presence of soft hyphens as a reliable indicator of spam and use it as the basis of a spam filter?

      Yes, there is: languages other than English. In German, for example, the use of soft hyphens, while not universal, is at least becoming more common, and for a reason: longer words that can't automatically be hyphenated by the browser as necessary lead to ugly layout, especially when there's not a lot of horizontal space (e.g. on news sites, which often tend to emulate printed newspapers).
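
The distinction drawn above (soft hyphens are legitimate in running text but suspicious inside a link) can be sketched in Python. This is a minimal illustration, not any real filter's rule: the names `URL_RE`, `SHY_FORMS` and `urls_with_soft_hyphen` are made up for this example, and the URL pattern is deliberately crude.

```python
import re

# Rough URL matcher for illustration only; a real filter would use a proper parser.
URL_RE = re.compile(r"https?://[^\s<>\"']+", re.IGNORECASE)

# Forms in which a soft hyphen can show up in raw HTML mail:
# the literal character U+00AD, the named entity, and the numeric references.
SHY_FORMS = re.compile(r"\u00ad|&shy;|&#0*173;|&#x0*ad;", re.IGNORECASE)


def urls_with_soft_hyphen(text):
    """Return the URLs found in text that contain a soft hyphen in any form."""
    return [url for url in URL_RE.findall(text) if SHY_FORMS.search(url)]


if __name__ == "__main__":
    # A soft hyphen in a long German word is ignored; one inside a URL is reported.
    sample = "Rind\u00adfleisch is fine. Visit http://exa&shy;mple.com/buy now"
    print(urls_with_soft_hyphen(sample))
```

Scoping the check to URLs is the design choice here: it leaves hyphenation hints in body text alone and only flags the place where they have no obvious legitimate use.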

    2. Re:So how often is it used legitimately? by treeves · · Score: 3, Informative

      So, when I get an email with a link to www.Rindfleischetikettierungsüberwachungsaufgabenübertragungsgesetz.de, should I avoid clicking the link, or what?

      --
      ...the future crusty old bastards are already drinking the Kool-Aid.
  6. Re:Why by maxwell+demon · · Score: 3, Informative

    Why don't modern browsers render this character?

    From Wikipedia:

    "Since it is difficult for a computer program to automatically make good decisions on when to hyphenate a word, the concept of a soft hyphen was introduced to allow manual specification of a place where a hyphenated break was allowed without forcing a line break in an inconvenient place if the text was later re-flowed."

    So a soft hyphen marks a position where you can hyphenate a word. If you don't do it, you of course shouldn't print anything at that position.

    --
    The Tao of math: The numbers you can count are not the real numbers.
  7. shy by Anonymous Coward · · Score: 3, Funny

    No good shysters.

  8. Re:Why by Tynin · · Score: 4, Informative

    Why don't modern browsers render this character?

    Two reasons, the first being that HTML 4 specs call for it to not be rendered unless it meets the criteria. Here is the full blurb:

    9.3.3 Hyphenation

    In HTML, there are two types of hyphens: the plain hyphen and the soft hyphen. The plain hyphen should be interpreted by a user agent as just another character. The soft hyphen tells the user agent where a line break can occur.

    Those browsers that interpret soft hyphens must observe the following semantics: If a line is broken at a soft hyphen, a hyphen character must be displayed at the end of the first line. If a line is not broken at a soft hyphen, the user agent must not display a hyphen character. For operations such as searching and sorting, the soft hyphen should always be ignored.

    In HTML, the plain hyphen is represented by the "-" character (&#45; or &#x2D;). The soft hyphen is represented by the character entity reference &shy; (&#173; or &#xAD;)
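
The "searching and sorting" rule quoted above can be illustrated with a short Python sketch. The helper names (`strip_shy`, `contains`, `sort_key`) are hypothetical, chosen only for this example; the sketch just drops U+00AD before comparing strings.

```python
SOFT_HYPHEN = "\u00ad"  # U+00AD, what &shy; decodes to


def strip_shy(s):
    """Remove every soft hyphen from s."""
    return s.replace(SOFT_HYPHEN, "")


def contains(haystack, needle):
    """Substring search that ignores soft hyphens on both sides."""
    return strip_shy(needle) in strip_shy(haystack)


def sort_key(s):
    """Sort key that ignores soft hyphens."""
    return strip_shy(s)


if __name__ == "__main__":
    print(contains("Sauer\u00adkraut", "Sauerkraut"))
    # Naive sorting would compare U+00AD (code point 173) against 'b';
    # with the key, the first word is compared as plain "aa".
    print(sorted(["ab", "a\u00ada"], key=sort_key))
```

A URL filter that compares raw strings without a step like this will treat a hyphenated and an unhyphenated spelling of the same address as different, which is the gap the spammers in the story are relying on.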

    The other reason is that the current Unicode standard basically doesn't specify when and where it should be displayed as a hyphen, and leaves that open to the interpretation of whoever is coding for it. Here is the blurb from the Unicode standard on it:

    Hyphenation. U+00AD soft hyphen (SHY) indicates an intraword break point, where a line break is preferred if a word must be hyphenated or otherwise broken across lines. Such break points are generally determined by an automatic hyphenator. SHY can be used with any script, but its use is generally limited to situations where users need to override the behavior of such a hyphenator. The visible rendering of a line break at an intraword break point, whether automatically determined or indicated by a SHY, depends on the surrounding characters, the rules governing the script and language used, and, at times, the meaning of the word. The precise rules are outside the scope of this standard, but see Unicode Standard Annex #14, “Unicode Line Breaking Algorithm,” for additional information. A common default rendering is to insert a hyphen before the line break, but this is insufficient or even incorrect in many situations.

    Contrast this usage with U+2027 hyphenation point, which is used for a visible indication of the place of hyphenation in dictionaries. For a complete list of dash characters in the Unicode Standard, including all the hyphens, see Table 6-3.

    The Unicode Standard includes two nonbreaking hyphen characters: U+2011 non-breaking hyphen and U+0F0C tibetan mark delimiter tsheg bstar. See Section 10.2, Tibetan, for more discussion of the Tibetan-specific line breaking behavior.

  9. Re:Why by KillaGouge · · Score: 4, Insightful

    please ignore my parent post. It seems that GP is correct

    --
    GENERATION 25: The first time you see this, copy it into your sig on any forum and add 1 to the generation. Social exper
  10. SpamAssassin is not vulnerable to this by Khopesh · · Score: 4, Informative

    Just tested this in SpamAssassin with http ://exa ­ mple.com (spaced to evade slashdot's own obfuscation-eliminator) - Result: The URL domain (example.com) is properly extracted without the obfuscation.

    That said, SA is fully capable of detecting the obfuscation attempt itself (using a rawbody rule)...
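
For readers who want to see the idea in code: the following Python sketch decodes HTML character references, strips soft hyphens, and then pulls out the host. It is not SpamAssassin's implementation and makes no claim about how SA does it; `normalize_url` and `extract_host` are hypothetical names, and only the standard-library calls `html.unescape` and `urllib.parse.urlsplit` are real.

```python
import html
from urllib.parse import urlsplit

SOFT_HYPHEN = "\u00ad"


def normalize_url(raw):
    """Decode HTML character references, then drop soft hyphens.

    Returns (cleaned_url, was_obfuscated), where was_obfuscated is True
    if at least one soft hyphen had to be removed.
    """
    decoded = html.unescape(raw)  # turns &shy;, &#173; and &#xAD; into U+00AD
    cleaned = decoded.replace(SOFT_HYPHEN, "")
    return cleaned, cleaned != decoded


def extract_host(raw):
    """Host name of the URL after normalization."""
    cleaned, _ = normalize_url(raw)
    return urlsplit(cleaned).hostname


if __name__ == "__main__":
    print(normalize_url("http://exa&shy;mple.com/"))
    print(extract_host("http://exa&#173;mple.com/path"))
```

Returning the flag alongside the cleaned URL keeps both uses available: the cleaned host can be checked against blocklists, and the flag can be scored as an obfuscation attempt in its own right.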

    --
    Use my userscript to add story images to Slashdot. There's no going back.
  11. The Wrongest Part by SuperKendall · · Score: 5, Funny

    The thing that really grates on the nerves is using a soft hyphen to sell Viagra.

    --
    "There is more worth loving than we have strength to love." - Brian Jay Stanley
  12. Re:Why by Man+Eating+Duck · · Score: 5, Informative

    I'm a pretty IT-savvy guy, but WHAT IS that bloody character?

    Say you're laying out a book. You have the word Sauerkraut at a line wrap, but it is broken into Sauerk-raut because your layout software doesn't know where to break it. You then put in a soft hyphen between r and k; this indicates to your software that this word should be broken there. It turns into Sauer-kraut, which is correct.

    Later you get angry with the Sauerkraut and call it "bloody Sauerkraut". Now the whole word will be on the next line, and the soft hyphen won't show because your software doesn't need to break the word. Thus you can insert these freely without fretting about words containing a hyphen later on; they'll only be rendered when used as a hint.

    HTH

    --
    Are you a grammar Nazi? I'm trying to improve my English; please correct my errors! :)
  13. Not always by pavon · · Score: 4, Informative

    It is only supposed to be rendered when the word is split across multiple lines.

    For example if your text was "super­cali­fragilistic­expialidocious" then all of the following are valid renderings, depending on where the renderer decides to start a new line:

    supercalifragilisticexpialidocious

    or

    supercalifragilistic-
    expialidocious

    or

    supercali-
    fragilistic-
    expialidocious
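
The three renderings above can be reproduced with a toy line breaker in Python. This is a simplified greedy sketch for a single word at a fixed column width, not how any browser's layout engine works; `render_word` is a name made up for this example.

```python
SOFT_HYPHEN = "\u00ad"


def render_word(word, width):
    """Break one word at soft hyphens so lines fit in `width` columns.

    A visible "-" is emitted only where a break is actually taken;
    soft hyphens that are not used as break points produce no output.
    """
    parts = word.split(SOFT_HYPHEN)
    lines = []
    current = parts[0]
    last = len(parts) - 1
    for i in range(1, len(parts)):
        # Keep one column free for a possible "-" unless this is the final part.
        reserve = 0 if i == last else 1
        if len(current) + len(parts[i]) + reserve <= width:
            current += parts[i]          # fits: the soft hyphen stays invisible
        else:
            lines.append(current + "-")  # break here and show the hyphen
            current = parts[i]
    lines.append(current)
    return lines


if __name__ == "__main__":
    word = "super\u00adcali\u00adfragilistic\u00adexpialidocious"
    for width in (40, 21, 14):
        print(width, render_word(word, width))
```

With widths of 40, 21 and 14 columns the sketch yields the one-line, two-line and three-line renderings listed above, in that order.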

  14. Re:Why by modecx · · Score: 3, Funny

    Speaking of bloody sauerkraut, I think there was some sort of hyphen-depression when the inventors of the German language decided it would be fun to glue adjectives and nouns together. E.g. when I see something like: unabhaengigkeitserklaerungen, I have a nigh-irresistible urge to shout Gesundheit!

    I'm still not sure why the Nazis went to all of the trouble of building cipher-machines. The language looks sufficiently jumbled from the start.

    --
    Constitutional rights may be respected, repealed, or modified; but they must never be ignored.
  15. terrible summary by Laxori666 · · Score: 3, Funny

    Is it just me or is this summary terrible? Every sentence says the same thing, just slightly reworded. In the summary, it's as if each new sentence doesn't give any additional information, but it's worded as if it does. Researchers have found that this summary is repetitive. Some say this can indicate the repetitiveness of a summary.