Slashdot Mirror


Spammers Using Soft Hyphen To Hide Malicious URLs

Trailrunner7 writes with this excerpt from ThreatPost illustrating the ongoing Spy-vs.-Spy battle between spammers and the rest of us: "Spammers have jumped on the little-used soft hyphen (or SHY character) to fool URL filtering devices. According to researchers, spammers are larding up URLs for sites they promote with the soft hyphen character, which many browsers ignore. Spammers aren't shy about jumping humans flexible cognitive abilities to slip past the notice of spam filters (H3rb41 V14gr4, anyone?). ... The latest trend involves the use of an obscure character called the soft hyphen or 'SHY' character to obscure malicious URLs in spam messages. Writing on the Symantec Connect blog, researcher Samir Patil said that the company has seen recent spam messages that insert the HTML symbol for the soft hyphen to obfuscate URLs for Web pages promoted by the spammers."

34 of 162 comments

  1. H3rb41 V14gr4? by MrEricSir · · Score: 4, Insightful

    I never got the leet speak in spam thing. Sure, it might get past the filter, but who can read it? Are they trying to sell drugs to script kiddies?

    --
    There's no -1 for "I don't get it."
    1. Re:H3rb41 V14gr4? by caffeinemessiah · · Score: 3, Insightful

      I never got the leet speak in spam thing. Sure, it might get past the filter, but who can read it? Are they trying to sell drugs to script kiddies?

      I don't know about you, but I can't stop trying to figure out what word they're trying to represent with the symbols. For example, I know the second word in your subject means viagra, but what is "H3rb41"? Oh..."herbal". It's naturally (perhaps unknowingly) targeted towards geeks and puzzle-solvers, which perhaps isn't the worst market to target available-without-human-contact penis drugs towards.

      --
      An old-timer with old-timey ideas.
    2. Re:H3rb41 V14gr4? by MysteriousPreacher · · Score: 2, Interesting

      I never understood how it actually worked, except as you suggested, the script kiddy crowd are heavily into giving money to strangers in exchange for uber zomg epic sexual prowess.

      Maybe I'm old fashioned, but I'm kind of reluctant to whip out my credit card to buy something from a company that employs mittens-wearing illiterates to write their adverts. Sure I'll eat at a Chinese restaurant with an amusingly translated menu, but that's a little different.

      --
      -- Using the preview button since 2005
    3. Re:H3rb41 V14gr4? by maxwell+demon · · Score: 3, Insightful

      I thought the only situation where you need Viagra is exactly human contact (in the most literal meaning of the word).

      --
      The Tao of math: The numbers you can count are not the real numbers.
    4. Re:H3rb41 V14gr4? by commodore64_love · · Score: 3, Funny

      I think this photograph is appropriate. And I'm happy to say: No I can't read it.

      http://media.ebaumsworld.com/picture/strober/get_laid.jpg

      --
      "I disapprove of what you say, but I will defend to the death your right to say it." - historian Evelyn Beatrice Hall
    5. Re:H3rb41 V14gr4? by Anonymous Coward · · Score: 2, Funny

      I like my hyphen hard, not soft. That's why I use H3rb41 V14gr4.

    6. Re:H3rb41 V14gr4? by Abstrackt · · Score: 2, Informative

      I thought the only situation where you need Viagra is exactly human contact (in the most literal meaning of the word).

      There's the rub, so to speak. Most men using viagra don't need it, they just like using it, and nothing prevents them from enjoying it on their own.

      --
      They say a little knowledge is a dangerous thing, but it's not one half so bad as a lot of ignorance. - Terry Pratchett
  2. Shy Spammers by biryokumaru · · Score: 3, Funny

    Spammers are getting more shy? That's a relief!

    --
    When you're afraid to download music illegally in your own home, then the terrorists have won!
  3. What is it? by iONiUM · · Score: 5, Funny

    Why didn't they just put the friggin character in the summary so I didn't have to read the article?

    Anyways, according to the article it's &shy;, which looks "identical to a regular hyphen." Are you happy now slashdot? I had to read TFA to find that out.

    1. Re:What is it? by maxwell+demon · · Score: 3, Insightful

      Are registrars accepting domain names with soft hyphens? And if so, why? It's rather obvious that such domain names would only be used for fraud.
      IMHO registrars should not accept any non-printable character in domain names.

      --
      The Tao of math: The numbers you can count are not the real numbers.
    2. Re:What is it? by sexconker · · Score: 3, Insightful

      Yes, they are. Otherwise this story wouldn't exist.
      Why? Because they like money, and don't give a fuck.
      Of course they should not accept any non-printable characters.

      Registrars are pretty much only half a step above the spammers in terms of ethics / shittiness.

    3. Re:What is it? by arth1 · · Score: 2, Informative

      Yes, they are. Otherwise this story wouldn't exist.

      Who modded this insightful?

      No, domain registrars don't allow soft hyphens in domain name registrations. Give me a single example of a registered domain with a soft hyphen in it.

      As the other user said, this is used for masking URLs in e-mails, and thus trying to thwart spam filters.

      Get your ch&shy;eap Vi&shy;agra at http://www.ch&shy;eapvi&shy;agra.com/

      This will render as

      Get your cheap Viagra at http://www.cheapviagra.com/

      Yet many spam filters will not trigger on the words "cheap" and "Viagra", and the e-mail has a greater chance of getting through filters.

      A similar technique was used in the past for domain names and e-mail addresses, back when e-mail actually followed the standards, and no-one read their e-mail in an HTML browser.
        is a legal e-mail address that is interpreted as cheapviagra@hotmail.com, but many newer programs don't follow the standards and will barf, which is why spammers seldom do this anymore.
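The filter evasion described in this comment can be sketched in a few lines of Python. This is only an illustration, assuming a naive filter that does plain substring matching; `naive_filter` and its word list are hypothetical and not taken from any real product.

```python
import html

SOFT_HYPHEN = "\u00ad"  # U+00AD, the character that the &shy; entity decodes to

def naive_filter(text, blocked=("cheap", "viagra")):
    """Hypothetical filter: flags text that contains a blocked word verbatim."""
    lowered = text.lower()
    return any(word in lowered for word in blocked)

# Message source as a spammer might write it, with &shy; inside the words.
source = "Get your ch&shy;eap Vi&shy;agra"
decoded = html.unescape(source)             # entities become real U+00AD characters
visible = decoded.replace(SOFT_HYPHEN, "")  # what a reader sees on an unbroken line

print(naive_filter(source))   # False: the entity text splits the words
print(naive_filter(decoded))  # False: U+00AD still splits the words
print(naive_filter(visible))  # True: this is the text the reader actually sees
```

The gap between what the filter matches and what the reader sees is the whole trick.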

    4. Re:What is it? by maxwell+demon · · Score: 2, Informative

      So the problem is browsers silently removing them.

      A browser should never modify a URL. In particular, it should not remove invalid characters. It should give an error if you try to go to an invalid URL. It should not try to be "helpful" here.

      --
      The Tao of math: The numbers you can count are not the real numbers.
  4. Re:Why by TopSpin · · Score: 5, Informative

    Why don't modern browsers render this character?

    The character isn't supposed to be rendered. Soft hyphen indicates where to break words if necessary. The hyphens are not rendered if the word doesn't need to be broken.

    --
    Lurking at the bottom of the gravity well, getting old
  5. So how often is it used legitimately? by JesseL · · Score: 4, Interesting

    Is there any good reason not to just call the presence of soft hyphens as a reliable indicator of spam and use it as the basis of a spam filter?

    --
    "Prefiero morir de pie que vivir siempre arrodillado!"
    1. Re:So how often is it used legitimately? by Cinder6 · · Score: 2, Funny

      Well, I know I've certainly never seen it!

      --
      If you can't convince them, convict them.
    2. Re:So how often is it used legitimately? by Anonymous Coward · · Score: 4, Informative

      Is there any good reason not to just call the presence of soft hyphens as a reliable indicator of spam and use it as the basis of a spam filter?

      Yes, there is: languages other than English. In e.g. German, the use of soft hyphens, while not universal, is becoming more common, at least, and for a reason: longer words that can't automatically be hyphenated by the browser as necessary lead to ugly layout, especially when there's not a lot of horizontal space (e.g. on news sites, which often tend to emulate printed newspapers).

    3. Re:So how often is it used legitimately? by ceoyoyo · · Score: 2, Interesting

      I would think most spam filters would do that automatically as they learn.

      Symantec seems to think people still use character-for-character text matching spam filters that don't learn. Maybe Symantec products do.

    4. Re:So how often is it used legitimately? by AltairDusk · · Score: 2, Insightful

      Shouldn't be too hard for the spam filter to strip the soft hyphens then analyze the URL, I don't see this being useful to the spammers for too long unless I'm missing something.
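A rough sketch of the "strip, then analyze" idea in Python. The `normalize_url` and `is_blocked` helpers and the blocklist contents are assumptions made up for this example; a real filter would do far more than a hostname lookup.

```python
import html
from urllib.parse import urlsplit

SOFT_HYPHEN = "\u00ad"  # U+00AD

def normalize_url(raw):
    """Decode HTML entities, then drop soft hyphens before any analysis."""
    return html.unescape(raw).replace(SOFT_HYPHEN, "")

def is_blocked(raw, blocklist):
    """Hypothetical check: compare the cleaned hostname against a blocklist."""
    host = urlsplit(normalize_url(raw)).hostname
    return host is not None and host in blocklist

blocklist = {"www.cheapviagra.com"}
print(normalize_url("http://www.ch&shy;eapvi&shy;agra.com/"))
print(is_blocked("http://www.ch&shy;eapvi&shy;agra.com/", blocklist))
```

Normalizing before matching means the obfuscated and the plain form of the URL end up as the same string, so existing URL rules keep working unchanged.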

    5. Re:So how often is it used legitimately? by treeves · · Score: 3, Informative

      So, when I get an email with a link to www.Rindfleischetikettierungsüberwachungsaufgabenübertragungsgesetz.de, should I avoid clicking the link, or what?

      --
      ...the future crusty old bastards are already drinking the Kool-Aid.
    6. Re:So how often is it used legitimately? by jthill · · Score: 2, Interesting
      DNS permits everything in domain names. You can implement any restrictions you want on names you issue on your own authority, but

      Implementations of the DNS protocols must not place any restrictions on the labels that can be used. In particular, DNS servers must not refuse to serve a zone because it contains labels that might not be acceptable to some DNS client programs.

      --
      As always, all IMO. Insert "I think" everywhere grammatically possible.
  6. Re:Why by maxwell+demon · · Score: 3, Informative

    Why don't modern browsers render this character?

    From Wikipedia:

    "Since it is difficult for a computer program to automatically make good decisions on when to hyphenate a word, the concept of a soft hyphen was introduced to allow manual specification of a place where a hyphenated break was allowed without forcing a line break in an inconvenient place if the text was later re-flowed."

    So a soft hyphen marks a position where you can hyphenate a word. If you don't do it, you of course shouldn't print anything at that position.

    --
    The Tao of math: The numbers you can count are not the real numbers.
  7. Good News! by hardburn · · Score: 2, Interesting

    So now spam filters will pick up on soft hyphens used in URIs inside emails (when was the last time you saw one used legitimately?), making the spam easier to spot.
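One way such a check might look, as a hedged Python sketch: it flags soft hyphens only when they occur inside something that looks like a URL, so ordinary body text that uses them for hyphenation is left alone. The regular expression is deliberately loose and the function name is invented for this example.

```python
import html
import re

SOFT_HYPHEN = "\u00ad"  # U+00AD
# Deliberately loose: a scheme followed by a run of non-whitespace characters.
URL_RE = re.compile(r"https?://\S+", re.IGNORECASE)

def urls_with_soft_hyphens(body):
    """Return the URL-like tokens in body that contain a soft hyphen."""
    decoded = html.unescape(body)  # turn &shy; and friends into real characters
    return [url for url in URL_RE.findall(decoded) if SOFT_HYPHEN in url]

body = "Visit http://www.ch&shy;eapvi&shy;agra.com/ or http://example.com/ today"
suspicious = urls_with_soft_hyphens(body)
print(len(suspicious))  # 1
```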

    --
    Not a typewriter
  8. shy by Anonymous Coward · · Score: 3, Funny

    No good shysters.

  9. Re:Why by Tynin · · Score: 4, Informative

    Why don't modern browsers render this character?

    Two reasons, the first being that HTML 4 specs call for it to not be rendered unless it meets the criteria. Here is the full blurb:

    9.3.3 Hyphenation

    In HTML, there are two types of hyphens: the plain hyphen and the soft hyphen. The plain hyphen should be interpreted by a user agent as just another character. The soft hyphen tells the user agent where a line break can occur.

    Those browsers that interpret soft hyphens must observe the following semantics: If a line is broken at a soft hyphen, a hyphen character must be displayed at the end of the first line. If a line is not broken at a soft hyphen, the user agent must not display a hyphen character. For operations such as searching and sorting, the soft hyphen should always be ignored.

    In HTML, the plain hyphen is represented by the "-" character (&#45; or &#x2D;). The soft hyphen is represented by the character entity reference &shy; (&#173; or &#xAD;)

    The other reason is that the current Unicode standard basically doesn't specify when and where it should be displayed as a hyphen, and leaves that open to the interpretation of whoever is coding for it. Here is the blurb from the Unicode standard on it:

    Hyphenation. U+00AD soft hyphen (SHY) indicates an intraword break point, where a line break is preferred if a word must be hyphenated or otherwise broken across lines. Such break points are generally determined by an automatic hyphenator. SHY can be used with any script, but its use is generally limited to situations where users need to override the behavior of such a hyphenator. The visible rendering of a line break at an intraword break point, whether automatically determined or indicated by a SHY, depends on the surrounding characters, the rules governing the script and language used, and, at times, the meaning of the word. The precise rules are outside the scope of this standard, but see Unicode Standard Annex #14, “Unicode Line Breaking Algorithm,” for additional information. A common default rendering is to insert a hyphen before the line break, but this is insufficient or even incorrect in many situations.

    Contrast this usage with U+2027 hyphenation point, which is used for a visible indication of the place of hyphenation in dictionaries. For a complete list of dash characters in the Unicode Standard, including all the hyphens, see Table 6-3.

    The Unicode Standard includes two nonbreaking hyphen characters: U+2011 non-breaking hyphen and U+0F0C tibetan mark delimiter tsheg bstar. See Section 10.2, Tibetan, for more discussion of the Tibetan-specific line breaking behavior.
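The HTML 4 rule quoted above, that searching and sorting should ignore the soft hyphen, can be illustrated with a short Python sketch. The `strip_shy` helper and the word list are made up for the example.

```python
SOFT_HYPHEN = "\u00ad"  # U+00AD

def strip_shy(text):
    """Return text with every soft hyphen removed, for comparison purposes only."""
    return text.replace(SOFT_HYPHEN, "")

words = ["Sauerteig", "Sauer\u00adkraut"]

# Plain code-point order puts U+00AD (173) after "t" (116), so Sauerkraut sorts last.
plain = sorted(words)
# Ignoring the soft hyphen restores the expected order: "k" before "t".
ignoring = sorted(words, key=strip_shy)

# Searching: the raw string does not contain the unbroken word, the stripped one does.
found_raw = "Sauerkraut" in words[1]
found_stripped = "Sauerkraut" in strip_shy(words[1])
print(plain != ignoring, found_raw, found_stripped)
```

The stored text keeps its soft hyphens; only the comparison key drops them.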

  10. Re:Why by KillaGouge · · Score: 4, Insightful

    please ignore my parent post. It seems that GP is correct

    --
    GENERATION 25: The first time you see this, copy it into your sig on any forum and add 1 to the generation. Social exper
  11. SpamAssassin is not vulnerable to this by Khopesh · · Score: 4, Informative

    Just tested this in SpamAssassin with http ://exa ­ mple.com (spaced to evade slashdot's own obfuscation-eliminator) - Result: The URL domain (example.com) is properly extracted without the obfuscation.

    That said, SA is fully capable of detecting the obfuscation attempt itself (using a rawbody rule)...

    --
    Use my userscript to add story images to Slashdot. There's no going back.
  12. The Wrongest Part by SuperKendall · · Score: 5, Funny

    The thing that really grates on the nerves is using a soft hyphen to sell Viagra.

    --
    "There is more worth loving than we have strength to love." - Brian Jay Stanley
  13. Re:Why by Man+Eating+Duck · · Score: 5, Informative

    I'm a pretty IT-savvy guy, but WHAT IS that bloody character?

    Say you're laying out a book. You have the word Sauerkraut at a line wrap, but it is broken into Sauerk-raut because your layout software doesn't know where to break it. You then put in a soft hyphen between r and k; this indicates to your software that this word should be broken there. It turns into Sauer-kraut, which is correct.

    Later you get angry with the Sauerkraut and call it "bloody Sauerkraut". Now the whole word will be at the next line, and the soft hyphen won't show because your software doesn't need to break the word. Thus you can insert these freely without fretting about words containing a hyphen later on, they'll only be rendered when used as a hint.

    HTH

    --
    Are you a grammar Nazi? I'm trying to improve my English; please correct my errors! :)
  14. Not always by pavon · · Score: 4, Informative

    It is only supposed to be rendered when the word is split across multiple lines.

    For example, if your text was "super&shy;cali&shy;fragilistic&shy;expialidocious" then all of the following are valid renderings, depending on where the renderer decides to start a new line:

    supercalifragilisticexpialidocious

    or

    supercalifragilistic-
    expialidocious

    or

    supercali-
    fragilistic-
    expialidocious
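The three renderings above can be reproduced with a small greedy sketch in Python. This is not how any real browser lays out text; it handles a single word, measures width in characters, and always reserves one column for a possible hyphen, all of which are simplifying assumptions. The `render` function is hypothetical.

```python
SOFT_HYPHEN = "\u00ad"  # U+00AD

def render(word, width):
    """Greedy sketch: break one word at soft hyphens so each line fits in width."""
    parts = word.split(SOFT_HYPHEN)
    lines = []
    current = parts[0]
    for i, part in enumerate(parts[1:], start=1):
        is_last = i == len(parts) - 1
        # Reserve a column for the visible hyphen unless this is the final part.
        needed = len(current) + len(part) + (0 if is_last else 1)
        if needed <= width:
            current += part                # no break here: the soft hyphen is invisible
        else:
            lines.append(current + "-")    # break here: the hyphen becomes visible
            current = part
    lines.append(current)
    return lines

word = "super\u00adcali\u00adfragilistic\u00adexpialidocious"
for width in (40, 21, 14):
    print("\n".join(render(word, width)), end="\n\n")
```

With widths of 40, 21 and 14 characters this prints the one-line, two-line and three-line forms shown above, in that order.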

  15. Re:Why by modecx · · Score: 3, Funny

    Speaking of bloody sauerkraut, I think there was some sort of hyphen-depression when the inventors of the German language decided it would be fun to glue adjectives and nouns together. E.g. when I see something like unabhaengigkeitserklaerungen, I have a nigh-irresistible urge to shout Gesundheit!

    I'm still not sure why the nazis went to all of the trouble of building cipher-machines. The language looks sufficiently jumbled from the start.

    --
    Constitutional rights may be respected, repealed, or modified; but they must never be ignored.
  16. terrible summary by Laxori666 · · Score: 3, Funny

    Is it just me or is this summary terrible? Every sentence says the same thing, just slightly reworded. In the summary, it's as if each new sentence doesn't give any additional information, but it's worded as if it does. Researchers have found that this summary is repetitive. Some say this can indicate the repetitiveness of a summary.

  17. Re:Why by JSG · · Score: 2, Insightful

    Where one in English might use a series of adjectives plus a noun a German would use a single agglomerative word - what is your problem?

    Deutsch is a sufficiently sophisticated language without your assistance.

    It doesn't work the same as your native tongue - get a life and stop trolling my forum - twat.

  18. Re:Why by stjobe · · Score: 2, Insightful

    This is purely senseless and is a mark of poor language design.

    Languages (in general) aren't designed, they evolve. Which makes your (all too long-winded) point quite moot.

    --
    "Total destruction the only solution" - Bob Marley