Spammers Using Soft Hyphen To Hide Malicious URLs

Posted by timothy on Thursday October 7, 2010 @09:32AM from the conservative-in-what-you-accept dept.

Trailrunner7 writes with this excerpt from ThreatPost illustrating the ongoing Spy-vs.-Spy battle between spammers and the rest of us: "Spammers have jumped on the little-used soft hyphen (or SHY character) to fool URL filtering devices. According to researchers, spammers are larding up URLs for sites they promote with the soft hyphen character, which many browsers ignore. Spammers aren't shy about jumping on humans' flexible cognitive abilities to slip past the notice of spam filters (H3rb41 V14gr4, anyone?). ... The latest trend involves the use of an obscure character called the soft hyphen or 'SHY' character to obscure malicious URLs in spam messages. Writing on the Symantec Connect blog, researcher Samir Patil said that the company has seen recent spam messages that insert the HTML symbol for the soft hyphen to obfuscate URLs for Web pages promoted by the spammers."
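
For readers who want to see why the trick works and how a filter might counter it, here is a minimal sketch in Python. It assumes a filter that compares hostnames against a blocklist; the `BLOCKLIST` contents, the `spam.example` domain, and the function names are hypothetical and are not taken from the Symantec write-up.

```python
import html
from urllib.parse import urlsplit

SOFT_HYPHEN = "\u00ad"

# Hypothetical blocklist, for illustration only.
BLOCKLIST = {"spam.example"}


def normalize_url(raw: str) -> str:
    """Decode HTML character references, then drop soft hyphens."""
    # html.unescape turns &shy;, &#173; and &#xAD; into U+00AD.
    decoded = html.unescape(raw)
    return decoded.replace(SOFT_HYPHEN, "")


def is_blocked(raw_url: str) -> bool:
    """Check the hostname of the normalized URL against the blocklist."""
    host = urlsplit(normalize_url(raw_url)).hostname or ""
    return host in BLOCKLIST


obfuscated = "http://sp&shy;am.ex&#173;ample/buy"
print("spam.example" in obfuscated)  # False: a naive substring check misses it
print(is_blocked(obfuscated))        # True: normalizing first catches it
```

The point of the sketch is the order of operations: decode the markup the way a browser would, strip the invisible character, and only then match.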

162 comments

H3rb41 V14gr4? by MrEricSir · 2010-10-07 09:33 · Score: 4, Insightful

I never got the leet speak in spam thing. Sure, it might get past the filter, but who can read it? Are they trying to sell drugs to script kiddies?

--
There's no -1 for "I don't get it."
1. Re:H3rb41 V14gr4? by caffeinemessiah · 2010-10-07 09:43 · Score: 3, Insightful
  
  I never got the leet speak in spam thing. Sure, it might get past the filter, but who can read it? Are they trying to sell drugs to script kiddies?
  I don't know about you, but I can't stop trying to figure out what word they're trying to represent with the symbols. For example, I know the second word in your subject means viagra, but what is "H3rb41"? Oh..."herbal". It's naturally (perhaps unknowingly) targeted towards geeks and puzzle-solvers, which perhaps isn't the worst market to target available-without-human-contact penis drugs towards.
  
  --
  An old-timer with old-timey ideas.
2. Re:H3rb41 V14gr4? by MysteriousPreacher · 2010-10-07 09:49 · Score: 2, Interesting
  
  I never understood how it actually worked, except as you suggested, the script kiddy crowd are heavily in to giving money to strangers in exchange for uber zomg epic sexual prowess.
  Maybe I'm old fashioned, but I'm kind of reluctant to whip out my credit card to buy something from a company that employs mittens-wearing illiterates to write their adverts. Sure I'll eat at a Chinese restaurant with an amusingly translated menu, but that's a little different.
  
  --
  -- Using the preview button since 2005
3. Re:H3rb41 V14gr4? by Anonymous Coward · 2010-10-07 09:51 · Score: 0
  
  Dude, you can't read that? Man, you missed out as a kid.
  Calculators > leet speak.
4. Re:H3rb41 V14gr4? by maxwell+demon · 2010-10-07 10:02 · Score: 3, Insightful
  
  I thought the only situation where you need Viagra is exactly human contact (in the most literal meaning of the word).
  
  --
  The Tao of math: The numbers you can count are not the real numbers.
5. Re:H3rb41 V14gr4? by Anonymous Coward · 2010-10-07 10:04 · Score: 0
  
  H3rb41 = Herbal
6. Re:H3rb41 V14gr4? by clarkkent09 · 2010-10-07 10:10 · Score: 1
  
  Funny, I read it immediately as herbal viagra. I guess different people's brains may be handling the job of reading differently. Reminds me of Richard Feynman's "experiments" with reading and counting at the same time etc: http://www.youtube.com/watch?v=Cj4y0EUlU-Y
  
  --
  Negative moral value of force outweighs the positive value of good intentions.
7. Re:H3rb41 V14gr4? by commodore64_love · 2010-10-07 10:21 · Score: 3, Funny
  
  I think this photograph is appropriate. And I'm happy to say: No I can't read it.
  http://media.ebaumsworld.com/picture/strober/get_laid.jpg
  
  --
  "I disapprove of what you say, but I will defend to the death your right to say it." - historian Evelyn Beatrice Hall
8. Re:H3rb41 V14gr4? by froggymana · 2010-10-07 10:24 · Score: 1
  
  "wh47 7h3 h3|| d035 7h47 54y?" This might be the phrase you were looking for to describe your conundrum....
  
  --
  "To prevent this day from getting any worse, I'll just read ERROR as GOOD THING" 1GJU8xLuDKDxEs4KLf8fAGyptoDsqvEsBT
9. Re:H3rb41 V14gr4? by Beale · 2010-10-07 10:44 · Score: 1
  
  I hear you can get better by grinding.
10. Re:H3rb41 V14gr4? by Obfuscant · 2010-10-07 11:11 · Score: 1
  
  I never understood how it actually worked, except as you suggested, the script kiddy crowd are heavily in to giving money to strangers in exchange for uber zomg epic sexual prowess.
  Never watched late night cable channels, have we? Does the word "Extenze" ring a bell? Those ads are taking the word "ubiquitous" to a whole new level, and proving that "skank hoes" ain't just on the street corner anymore. Well, ok, they DO go out and do "man on the street" interviews, where amazingly enough, every man they come across is a satisfied user and every woman is a satisfied usee, and they all crow about "the certain part of the male anatomy".
  Too bad they ain't talking about their brains.
11. Re:H3rb41 V14gr4? by LambdaWolf · 2010-10-07 11:14 · Score: 1
  
  You know how, even though only a tiny fraction of a percent of people actually respond to spam by buying the product, sending the spam is so cheap that it's still profitable to do so? I always assumed that the incomprehensible leetspeak just tacks on another factor of 0.1 or so but the resulting sales still justify the spamming. Or at least that's what the spammers think; who knows whether they're being economically rational.
  
  --
  "This algorithm runs in constant time. Come on, 2,147,483,648 is a constant..."
12. Re:H3rb41 V14gr4? by bill_kress · 2010-10-07 11:57 · Score: 1
  
  Since I didn't see anyone mention it I'll take the chance you weren't just making a joke and give you the answer:
  The point of the character substitutions / "leet speak" is exactly the same as the URL mangling they are talking about here: getting around spam filters. When the spam filters know to search for anything with "Viagra" in it, you just change that to V1agra, problem solved. The next week go with V1@gra.
  The people who buy this stuff are likely not to mind.
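
The arms race described above can be sketched in a few lines of Python. This is an illustration under assumptions, not how any particular spam filter works: the substitution table `LEET_MAP` and the function names are made up for the example.

```python
# Hypothetical substitution table; real obfuscation uses many more variants.
LEET_MAP = str.maketrans({
    "0": "o", "1": "i", "3": "e", "4": "a",
    "5": "s", "7": "t", "@": "a", "$": "s",
})


def deleet(text: str) -> str:
    """Lower-case the text and undo common digit/symbol substitutions."""
    return text.lower().translate(LEET_MAP)


def mentions(text: str, keyword: str) -> bool:
    """True if the keyword appears once the substitutions are undone."""
    return keyword in deleet(text)


for subject in ("Viagra", "V1agra", "V1@gra", "V14gr4"):
    print(subject, mentions(subject, "viagra"))
```

The table is ambiguous by nature: "1" stands for "i" in V14gr4 but for "l" in H3rb41, so `deleet("H3rb41")` yields "herbai" rather than "herbal". A filter built this way would need several candidate mappings, and it should only use the normalized text for matching, since ordinary numbers get mangled too.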
13. Re:H3rb41 V14gr4? by Anonymous Coward · 2010-10-07 12:58 · Score: 0
  
  You just blew my mind.
14. Re:H3rb41 V14gr4? by Anonymous Coward · 2010-10-07 13:21 · Score: 2, Funny
  
  I like my hyphen hard, not soft. That's why I use H3rb41 V14gr4.
15. Re:H3rb41 V14gr4? by Anonymous Coward · 2010-10-07 14:40 · Score: 0
  
  If you can't read it, how do you know what emotion to have? And I wouldn't be proud of not having basic reasoning skills, since leet-speak is simple symbol substitution.
16. Re:H3rb41 V14gr4? by rwa2 · 2010-10-07 15:31 · Score: 1
  
  http://megatokyo.com/strip/9
  "Does anyone here speak 133+?"
  Probably MegaTokyo's finest moment, and blatantly ripped from "Airplane!" at that :P
17. Re:H3rb41 V14gr4? by kmoser · 2010-10-07 16:14 · Score: 1
  
  Scrip kiddies. Get it? Work with me here.
18. Re:H3rb41 V14gr4? by Nyder · 2010-10-07 16:51 · Score: 1
  
  I never got the leet speak in spam thing. Sure, it might get past the filter, but who can read it? Are they trying to sell drugs to script kiddies?
  I figured someone that falls for that crap, bad spelling and all, sort of deserves losing their money.
  
  --
  Be seeing you...
19. Re:H3rb41 V14gr4? by EdIII · 2010-10-07 17:14 · Score: 1
  
  I think it's worse than that. Their attempt to fool the pattern recognition algorithms on the scanners is understandable, but self-defeating for more reasons than their supposed target audience.
  Even script kiddies have a modicum of intelligence to know not to have anything to do with spam. Unless they are trying to develop skills to use in that industry.....
  In any case, normal people can read that stuff pretty easily, but at the same time it sets off alarms that it is unsafe. Ironically, the same pattern recognition abilities that humans have work against the spammers at the same time they help them evade spam/malware detection. I see that "leet speak" in an email subject and immediately believe it is some kind of spam. Why not? 99.99999% of the time it turns out to be spam. My pattern recognition is working just fine.
  This URL trick is tragically, and comically, unwise on the part of the spammers. If a browser cannot understand it, the HTML link cannot work... then how does the victim click the link to be victimized?
  Furthermore, last time I checked there were widespread spam detection algorithms that already attempt to look for "leet speak". I would have to look through the spam logs on my mail server, but I believe that detection of those character patterns adds something like 0.5-1 of weight to the spam score for a message.
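
A rough sketch of the kind of scoring rule described above, in Python. The regular expression, the weight of 0.5 per hit, and the cap of 1.0 are assumptions chosen to match the "0.5-1" figure in the comment; they are not taken from any real mail server's rule set.

```python
import re

# A digit run sandwiched between letters, as in "V14gr4" or "H3rb41".
SANDWICHED_DIGIT = re.compile(r"[A-Za-z]\d+[A-Za-z]")


def leet_score(subject: str, weight_per_hit: float = 0.5, cap: float = 1.0) -> float:
    """Add a small weight per leet-looking token, up to a cap."""
    hits = sum(1 for token in subject.split() if SANDWICHED_DIGIT.search(token))
    return min(cap, hits * weight_per_hit)


print(leet_score("H3rb41 V14gr4 for you"))           # 1.0
print(leet_score("Cheap V14gr4"))                    # 0.5
print(leet_score("Meeting at 10:30 on 2010-10-07"))  # 0.0
```

Requiring letters on both sides of the digits keeps tokens such as "MP3", dates, and times from scoring, though a heuristic this crude would still produce false positives on product names and the like.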
20. Re:H3rb41 V14gr4? by xaxa · 2010-10-07 21:23 · Score: 1
  
  Most British people should be able to read it, since substituting numbers for letters is very common on car number plates (driven by the kind of people who are willing to pay extra for this kind of thing). The pattern of letters and numbers is restricted -- if you buy a new car now, it will be __60 ___, where the blanks are letters. You might choose to pay extra for WE60 FST ('we go fast'). Yesterday I saw "MU51 CFX" -- "music fx". Pre-2000, there was a different format, so M4 TT = matt, M477 HEW = matthew etc. The easier to read the more expensive the plate.
21. Re:H3rb41 V14gr4? by Jason+Levine · 2010-10-08 00:48 · Score: 1
  
  I find it humorous and perhaps a bit ironic that spammers, in an attempt to bypass spam filters, will often render their message completely unreadable. Congratulations! You've beaten my spam filter. However, even if there was a sliver of a chance that you could fool me into giving you money, you blew it because I have no clue what your message says.
  
  --
  My sci-fi novel, Ghost Thief, is now available from Amazon.com.
22. Re:H3rb41 V14gr4? by dwinks616 · 2010-10-08 01:00 · Score: 1
  
  We can only hope that not only do they lose their money, but the fake "viagra" causes their member to shrivel and fall off too.
23. Re:H3rb41 V14gr4? by xaositects · 2010-10-08 01:12 · Score: 1
  
  I don't think actually selling a product is always the intent. I suspect some people spam for the sake of wasting time and network resources to accomplish some moral imperative unknown to us.
24. Re:H3rb41 V14gr4? by Abstrackt · 2010-10-08 01:24 · Score: 2, Informative
  
  I thought the only situation where you need Viagra is exactly human contact (in the most literal meaning of the word).
  There's the rub, so to speak. Most men using viagra don't need it, they just like using it, and nothing prevents them from enjoying it on their own.
  
  --
  They say a little knowledge is a dangerous thing, but it's not one half so bad as a lot of ignorance. - Terry Pratchett
25. Re:H3rb41 V14gr4? by vux984 · 2010-10-08 09:48 · Score: 1
  
  "I need help. I need you to get the doctor. I got some bad pain in my chest, I need my pills."
  Priceless. :p
Why by KillaGouge · 2010-10-07 09:34 · Score: 1

Why don't modern browsers render this character?

--
GENERATION 25: The first time you see this, copy it into your sig on any forum and add 1 to the generation. Social exper
1. Re:Why by Anonymous Coward · 2010-10-07 09:39 · Score: 0
  
  Why don't modern browsers render this character?
  Agreed, I just made a test case and the current Firefox doesn't render it; it just showed the 2 words I had on either side of it with nothing in between like there should have been.
2. Re:Why by TopSpin · 2010-10-07 09:40 · Score: 5, Informative
  
  Why don't modern browsers render this character?
  The character isn't supposed to be rendered. Soft hyphen indicates where to break words if necessary. The hyphens are not rendered if the word doesn't need to be broken.
  
  --
  Lurking at the bottom of the gravity well, getting old
3. Re:Why by maxwell+demon · 2010-10-07 09:43 · Score: 3, Informative
  
  Why don't modern browsers render this character?
  From Wikipedia:
  "Since it is difficult for a computer program to automatically make good decisions on when to hyphenate a word, the concept of a soft hyphen was introduced to allow manual specification of a place where a hyphenated break was allowed without forcing a line break in an inconvenient place if the text was later re-flowed."
  So a soft hyphen marks a position where you can hyphenate a word. If you don't do it, you of course shouldn't print anything at that position.
  
  --
  The Tao of math: The numbers you can count are not the real numbers.
4. Re:Why by war4peace · 2010-10-07 09:47 · Score: 1
  
  I'm a pretty IT-savvy guy, but WHAT IS that bloody character?
  I understood pretty much everything from the summary. Everything BUT the character :) - Fail. As far as the summary is concerned.
  
  --
  ...gis sdrawkcab (usually not responding to ACs; don't bother posting as AC)
5. Re:Why by KillaGouge · 2010-10-07 09:53 · Score: 1, Informative
  
  according to here the ISO 8859-1 standard calls for that specific character to be rendered.
  
  --
  GENERATION 25: The first time you see this, copy it into your sig on any forum and add 1 to the generation. Social exper
6. Re:Why by Tynin · 2010-10-07 09:54 · Score: 4, Informative
  
  Why don't modern browsers render this character?
  Two reasons, the first being that HTML 4 specs call for it to not be rendered unless it meets the criteria. Here is the full blurb:
  
  9.3.3 Hyphenation
  
  In HTML, there are two types of hyphens: the plain hyphen and the soft hyphen. The plain hyphen should be interpreted by a user agent as just another character. The soft hyphen tells the user agent where a line break can occur.
  
  Those browsers that interpret soft hyphens must observe the following semantics: If a line is broken at a soft hyphen, a hyphen character must be displayed at the end of the first line. If a line is not broken at a soft hyphen, the user agent must not display a hyphen character. For operations such as searching and sorting, the soft hyphen should always be ignored.
  
  In HTML, the plain hyphen is represented by the "-" character (&#45; or &#x2D;). The soft hyphen is represented by the character entity reference &shy; (&#173; or &#xAD;)
  
  The other reason is that the current Unicode standard basically doesn't specify when and where it should be displayed as a hyphen, and leaves that open to the interpretation of whoever is coding for it. Here is the blurb from the Unicode standard on it:
  
  Hyphenation. U+00AD soft hyphen (SHY) indicates an intraword break point, where a line break is preferred if a word must be hyphenated or otherwise broken across lines. Such break points are generally determined by an automatic hyphenator. SHY can be used with any script, but its use is generally limited to situations where users need to override the behavior of such a hyphenator. The visible rendering of a line break at an intraword break point, whether automatically determined or indicated by a SHY, depends on the surrounding characters, the rules governing the script and language used, and, at times, the meaning of the word. The precise rules are outside the scope of this standard, but see Unicode Standard Annex #14, “Unicode Line Breaking Algorithm,” for additional information. A common default rendering is to insert a hyphen before the line break, but this is insufficient or even incorrect in many situations.
  
  Contrast this usage with U+2027 hyphenation point, which is used for a visible indication of the place of hyphenation in dictionaries. For a complete list of dash characters in the Unicode Standard, including all the hyphens, see Table 6-3.
  
  The Unicode Standard includes two nonbreaking hyphen characters: U+2011 non-breaking hyphen and U+0F0C tibetan mark delimiter tsheg bstar. See Section 10.2, Tibetan, for more discussion of the Tibetan-specific line breaking behavior.
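
To make the two quoted passages concrete, here is a small Python sketch using the standard `unicodedata` module. It shows how the character is classified and what "for operations such as searching ... the soft hyphen should always be ignored" could look like in practice; the helper name `contains_ignoring_shy` is made up for the example.

```python
import unicodedata

SHY = "\u00ad"

print(unicodedata.name(SHY))      # SOFT HYPHEN
print(unicodedata.category(SHY))  # Cf, i.e. an invisible format character


def contains_ignoring_shy(haystack: str, needle: str) -> bool:
    """Substring search that ignores soft hyphens on both sides."""
    return needle.replace(SHY, "") in haystack.replace(SHY, "")


text = "Sauer" + SHY + "kraut"
print(len(text))                                  # 11: the character is really there
print("Sauerkraut" in text)                       # False: a plain search misses the word
print(contains_ignoring_shy(text, "Sauerkraut"))  # True
```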
7. Re:Why by KillaGouge · 2010-10-07 09:55 · Score: 4, Insightful
  
  please ignore my parent post. It seems that GP is correct
  
  --
  GENERATION 25: The first time you see this, copy it into your sig on any forum and add 1 to the generation. Social exper
8. Re:Why by Man+Eating+Duck · 2010-10-07 10:10 · Score: 5, Informative
  
  I'm a pretty IT-savvy guy, but WHAT IS that bloody character?
  Say you're laying out a book. You have the word Sauerkraut at a line wrap, but it is broken into Sauerk-raut because your layout software doesn't know where to break it. You then put in a soft hyphen between r and k, this indicates to your software that this word should be broken there. It turns into Sauer-kraut which is correct.
  Later you get angry with the Sauerkraut and call it "bloody Sauerkraut". Now the whole word will be at the next line, and the soft hyphen won't show because your software doesn't need to break the word. Thus you can insert these freely without fretting about words containing a hyphen later on, they'll only be rendered when used as a hint.
  HTH
  
  --
  Are you a grammar Nazi? I'm trying to improve my English; please correct my errors! :)
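
The Sauerkraut example above can be reproduced with a toy line wrapper in Python. This is a deliberately simplified greedy sketch, not how a browser or DTP package actually breaks lines: the `wrap` function is hypothetical, it only breaks at spaces and soft hyphens, and it does not re-split a leftover fragment that is itself too long.

```python
SHY = "\u00ad"


def wrap(text: str, width: int) -> list:
    """Greedy wrap; a soft hyphen becomes a visible '-' only where a line breaks."""
    lines = []
    line = ""
    for word in text.split():
        parts = word.split(SHY)
        whole = "".join(parts)
        prefix = line + " " if line else ""
        if len(prefix) + len(whole) <= width:
            line = prefix + whole  # fits: soft hyphens stay invisible
            continue
        # Word does not fit: look for the last soft hyphen that still fits.
        split_at = 0
        for i in range(1, len(parts)):
            head = "".join(parts[:i]) + "-"
            if len(prefix) + len(head) <= width:
                split_at = i
        if split_at:
            lines.append(prefix + "".join(parts[:split_at]) + "-")
            line = "".join(parts[split_at:])
        else:
            if line:
                lines.append(line)
            line = whole
    if line:
        lines.append(line)
    return lines


print(wrap("I ate Sauer" + SHY + "kraut", 12))         # ['I ate Sauer-', 'kraut']
print(wrap("I ate bloody Sauer" + SHY + "kraut", 12))  # ['I ate bloody', 'Sauerkraut']
```

In the second call the whole word moves to the next line, so the hint goes unused and nothing is displayed for it, which is the behaviour the thread keeps asking about.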
9. Re:Why by DrugCheese · 2010-10-07 10:43 · Score: 1
  
  From Wikipedia:
  "Since it is difficult for a computer program to automatically make good decisions on when to hyphenate a word, the concept of a soft hyphen was introduced to allow manual specification of a place where a hyphenated break was allowed without forcing a line break in an inconvenient place if the text was later re-flowed."
  Exactly it's purpose. It's never supposed to be shown, only to give the browser client an easy way to break the word for dynamic width.
  
  --
  *DrugCheese rants*
10. Re:Why by Anonymous Coward · 2010-10-07 11:24 · Score: 0, Funny
  
  "Exactly it's purpose. It's never supposed to be shown,"
  Sort of like the useless apostrophe you hammered into a harmless possessive pronoun? Why do so many people not get it? They can master dozens of abstruse and recondite subjects, but it's means IT IS defeats them. Why?
11. Re:Why by miro2 · 2010-10-07 11:50 · Score: 1
  
  Bad design -- there is no reason to embed display-control strings in the character set. Is there a "start-italics" character? No, of course not. Software should keep track of hyphenation positions the same way it keeps track of other formatting positions.
12. Re:Why by Anonymous Coward · 2010-10-07 11:59 · Score: 0
  
  So, in how many words do I have to add these characters because the layout software won't do it for me?
  Between spam and layout issues, computers sure do make life easier.
13. Re:Why by modecx · 2010-10-07 12:13 · Score: 3, Funny
  
  Speaking of bloody sauerkraut, I think there was some sort of hyphen-depression when the inventors of the German language decided it would be fun to glue adjectives and nouns together, i.e. when I see something like unabhaengigkeitserklaerungen, I have a nigh-irresistible urge to shout Gesundheit!
  I'm still not sure why the nazis went to all of the trouble of building cipher-machines. The language looks sufficiently jumbled from the start.
  
  --
  Constitutional rights may be respected, repealed, or modified; but they must never be ignored.
14. Re:Why by mattack2 · 2010-10-07 12:22 · Score: 1
  
  Italics isn't something that just 'happens' when laying out text. Hyphenation is. As one of the other replies said, this is used as a hint, as software doesn't know all of the syllables of words and how to break them.
  (I hadn't heard of it before this article either.)
15. Re:Why by Timmmm · 2010-10-07 12:25 · Score: 1
  
  Well clearly you have to draw the line somewhere. I agree, stuff like \a, and \b obviously don't belong in the character set. But then newlines clearly do, and even spaces are 'display-control strings'. What about tabs?
  I think you're probably right about this soft-hyphen though. It sounds like it is rarely used and creates more problems than it solves.
16. Re:Why by war4peace · 2010-10-07 12:28 · Score: 1
  
  Let me rephrase.
  You read a news entry about "the guy who committed the crime". Never in the summary do they mention the guy's name or what crime he committed, but they emphasize how dangerous the guy is and how horrible the crime was. Now let me know if that sort of approach doesn't, um, I don't know, miss something essential.
  This is not about me being lazy and not reading the article (I did), but about the summary missing some essential information (it does).
  
  --
  ...gis sdrawkcab (usually not responding to ACs; don't bother posting as AC)
17. Re:Why by WidgetGuy · 2010-10-07 13:07 · Score: 1
  
   or 
  
  --
  One "Aw, Shit!" is worth 100 "Ata boys!"
18. Re:Why by John+Hasler · 2010-10-07 13:57 · Score: 1
  
  Hyphenating a URL makes no sense. Ones containing this character should be invalid.
  
  --
  Warning: this article may contain humor, sarcasm, parody, and perhaps even irony. Read at your own risk.
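
One way to act on the suggestion above is to reject links outright rather than clean them up. The Python sketch below flags any href that contains invisible format characters (Unicode category Cf) once HTML character references are decoded; the function names and the `example.com` URLs are hypothetical, and treating every Cf character as suspicious is an assumption, not an established rule.

```python
import html
import unicodedata


def invisible_chars(href: str) -> list:
    """Return code points of invisible format characters found in an href."""
    decoded = html.unescape(href)  # resolves &shy;, &#173;, &#xAD; and friends
    return [
        "U+%04X" % ord(ch)
        for ch in decoded
        if unicodedata.category(ch) == "Cf"
    ]


def looks_valid(href: str) -> bool:
    """Treat any href containing invisible format characters as invalid."""
    return not invisible_chars(href)


print(invisible_chars("http://exam&shy;ple.com/"))  # ['U+00AD']
print(looks_valid("http://example.com/"))           # True
```

This also catches relatives of the soft hyphen, such as the zero-width space, that could be abused the same way.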
19. Re:Why by JSG · 2010-10-07 14:05 · Score: 1
  
  Beautifully put. YIDH
  Cheers
  Jon
20. Re:Why by John+Hasler · 2010-10-07 14:08 · Score: 1
  
  URLs are not words.
  
  --
  Warning: this article may contain humor, sarcasm, parody, and perhaps even irony. Read at your own risk.
21. Re:Why by JSG · 2010-10-07 14:24 · Score: 2, Insightful
  
  Where one in English might use a series of adjectives plus a noun a German would use a single agglomerative word - what is your problem?
  Deutsch is a sufficiently sophisticated language without your assistance.
  It doesn't work the same as your native tongue - get a life and stop trolling my forum - twat.
22. Re:Why by adolf · 2010-10-07 15:33 · Score: 1
  
  I don't think it's generally needed, at all. A line break might be desirous any-
  where in a text; are authors supposed to figure out approx-
  imately where they might be needed? Or should they simp-ly soft-hyphen-ate every-fuck-ing-thing so that it is actu-ally use-ful?
  A browser-side diction-ary might be more bet-ter in most cases.
  
  --
  Kid-proof tablet..
23. Re:Why by Anonymous Coward · 2010-10-07 15:58 · Score: 0
  
  Because it is a white space hinting character, that just suggests where a hyphen should go if the word needs to be split, all the browsers are correct in the way they handle it.
24. Re:Why by modecx · 2010-10-07 17:16 · Score: 1
  
  Well I was merely making jest, but since you asked, that's just the beginning of my problems with the German language--and this is not merely a failing limited to Deutsch, but in fact to most Germanic languages...thankfully, despite the relation, modern English mostly escapes this terrible behavior, compound words usually being limited to a combination of two separate words.
  Besides the fact that one can apparently string together an arbitrary number of adjectives and nouns (the order of which often seems to be immaterial), German has the grammatical gender of Latin based languages, but without the normally sensible, almost always predictable rules for applying gender; however, unlike most modernized Latin based languages, German retains three genders, again complicating issues.
  In modern German, French, Spanish, Portuguese, etc. you have the rational idea of gender agreement. Nouns have a gender class, and through the sound principle of inflection any other words referring to that noun inherit its gender. The Latin based languages are strongly fusional, and most nouns are marked strongly as to which sex they are, e.g. Abuela vs. Abuelo. Pretty tidy. English, being a much less gendered language, only deals with gender inflection on pronouns, or some special case items like ships, countries etc., and gender agreement is usually no big deal. In any case, it's pretty easy because the genitive is linked to the actual sex of the pronoun in question.
  German on the other hand? Yeah, you need the gender agreement, but there's no rhyme or reason to the gender of any given noun, and there are NO clues built into the language itself. You must memorize the gender case of each. and. every. word. For example: Schülerin (schoolgirl) is feminine, Weib (wife) is neuter. This is purely senseless and is a mark of poor language design. After only a month of studying Spanish I was confident enough to tackle El ingenioso hidalgo don Quijote de la Mancha with only a few lexicon problems, easily corrected by looking the word up. I'm still not sure I could get through a chapter of my VW's user manual writ in its original language.
  Then, you have writers who have the unfortunate habit of including far too much information into a single sentence. Similar to poor English writers who have a tendency to include entire descriptive paragraphs into (parentheses), except those helpful punctuation, like the hyphen, are also omitted.
  I can't think of a direct parallel of the next phenomenon though: Once you wade through a virtual sea of adjective on adjective on noun grabassery, you finally get to the heart of the whole structure, the verb. It's invariably at the end of the sentence, on yonder page. As far as I can discern, it must be a mechanism to keep the reader in suspense... A sort of linguistic cliffhanger if you will. Of course, these last peeves of mine are not an issue in spoken German, where it would simply be frivolous use of language. All of this is indicative of a language designed to be spoken, and not written--which makes sense, because so precious few were literate whilst it was being created.
  
  --
  Constitutional rights may be respected, repealed, or modified; but they must never be ignored.
25. Re:Why by Anonymous Coward · 2010-10-07 17:30 · Score: 1, Funny
  
  That second apostrophe is fine. It's only the first that needs to be dropped. What I don't understand is who teaches kid's to use apostrophe for use in plural's. Some friend's of mine are real idiot's. Where are kid's learning this from?
26. Re:Why by modecx · 2010-10-07 17:38 · Score: 1
  
  and stop trolling my forum - twat
  Also, what is it with you Germans and your megalomania? It wasn't bad enough that you tried to conquer Europe not once but twice, you now have to conquer the internets as well? Pish.
  
  --
  Constitutional rights may be respected, repealed, or modified; but they must never be ignored.
27. Re:Why by gizmod · 2010-10-07 18:33 · Score: 1
  
  It looks like this:
28. Re:Why by Anonymous Coward · 2010-10-07 18:37 · Score: 0
  
  Every language is a mess of half-baked and long forgotten rules, rulettes and necessary rote memorization.
  Problem is that you only notice this in other languages. I'll bet that most people don't know the rules for their own language. It just "doesn't sound right" when you do it wrong. Well, study a foreign language long and hard enough and you'll get that same level of proficiency. I agree that German is a hard language. Not in everyday life, you can get a working grasp on it in a few weeks. But mastering the subtle details takes years and years.
  However, that same effort has to go into learning English as a second language for most people! Don't think English is all that easy. There are more exceptions than you can shake a stick at, and it is one of the very few languages where you can't guess the pronunciation of a word from the spelling.
  Anyway, as a European, I'm used to having to use another language when moving a few hundred miles in any direction, so maybe it isn't as baffling for me.
29. Re:Why by Anonymous Coward · 2010-10-07 18:55 · Score: 0
  
  For comparison could it be considered the opposite of a hard space?
30. Re:Why by stjobe · 2010-10-07 19:37 · Score: 2, Insightful
  
  This is purely senseless and is a mark of poor language design.
  Languages (in general) aren't designed, they evolve. Which makes your (all too long-winded) point quite moot.
  
  --
  "Total destruction the only solution" - Bob Marley
31. Re:Why by Man+Eating+Duck · 2010-10-07 20:30 · Score: 1
  
  Software should keep track of hyphenation positions the same way it keeps track of other formatting positions.
  Yes, a good hyphenation dictionary for every language would be nice. Along with special characters in the character set that indicate which language we are currently using (quotes, book titles and so on in a separate language from the main text is common in many settings).
  Oh wait, that would never work well, it'd complicate the character set, increase application sizes by orders of magnitude in many cases, and we *don't* have good hyphenation dictionaries for a great many languages.
  In browsers it might be superfluous as you have no idea which words would be broken and need a hyphen; in many cases you'd be better off dispensing with hyphenation altogether for readability. But for text in fixed formats it seems to me that the soft hyphen is a good solution to this particular problem after all. Part of my job description is laying out books, and the soft hyphen is an elegant solution. Once in a while you need it even with the very best DTP software.
  
  --
  Are you a grammar Nazi? I'm trying to improve my English; please correct my errors! :)
32. Re:Why by srussia · 2010-10-07 21:07 · Score: 1
  
  Hui!
  
  --
  Set your phasers on "funky"!
33. Re:Why by Anonymous Coward · 2010-10-07 22:21 · Score: 0
  
  Ah, yes, the invention of the German language. If memory serves, it was Dr. Umlaut and Dr. Eszett who did most of that work, in a Hanover laboratory, ca. 1904. Before then, of course, Germans just spoke English with a hard accent. Apparently, it was Umlaut who came up with the idea of gluing words together, or Wörterzusammenbespannung, as he called it. Eszett, meanwhile, worked in the other direction, by dividing words, when he came up with the separable prefix.
  Anyway, it's easy for you to judge, let's see your language inventions, then we'll see whose language looks jumbled!
34. Re:Why by Jesus_666 · 2010-10-07 22:45 · Score: 1
  
  Of course DTP software might use macros to do it even more elegantly (like LaTeX \hyphenation{}) but that basically amounts to a global search-and-replace that replaces all occurrences of the un-soft-hyphenated word with a soft-hyphenated version. And it really doesn't work without a macro-capable markup language.
  
  --
  USE HOT GRITS WITH STATUE OF NATALIE PORTMAN (NAKED AND PETRIFIED)
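
The "global search-and-replace" view of a hyphenation exception list can be sketched in Python. The pattern syntax mimics LaTeX's `\hyphenation{Sauer-kraut}`, but the `add_soft_hyphens` function itself is hypothetical and only illustrates the idea; it is not how TeX implements it.

```python
import re

SHY = "\u00ad"


def add_soft_hyphens(text: str, patterns: list) -> str:
    """Insert U+00AD into every listed word at the positions marked by '-'."""
    table = {p.replace("-", "").lower(): p for p in patterns}

    def replace_word(match):
        word = match.group(0)
        pattern = table.get(word.lower())
        if pattern is None:
            return word  # no exception listed for this word
        out = []
        i = 0
        for ch in pattern:
            if ch == "-":
                out.append(SHY)
            else:
                out.append(word[i])  # keep the original spelling and case
                i += 1
        return "".join(out)

    # [^\W\d_]+ matches runs of letters, including non-ASCII ones.
    return re.sub(r"[^\W\d_]+", replace_word, text)


result = add_soft_hyphens("bloody sauerkraut and Sauerkraut", ["Sauer-kraut"])
print(result == "bloody sauer" + SHY + "kraut and Sauer" + SHY + "kraut")  # True
```

The output looks identical to the input when printed, which is the whole point: the hints only become visible where a line is actually broken.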
35. Re:Why by Jesus_666 · 2010-10-07 23:09 · Score: 1
  
  thankfully, despite the relation, modern English mostly escapes this terrible behavior, compound words usually being limited to a combination of two separate words.
  Unlike German, where we always use the pathological case.
  
  Besides the fact that one can apparently string together an arbitrary number of adjectives and nouns (the order of which often seems to be immaterial), German has the grammatical gender of Latin based languages, but without the normally sensible, almost always predictable rules for applying gender; however, unlike most modernized Latin based languages, German retains three genders, again complicating issues.
  German isn't a Romance language and neither is English. Both are Germanic languages, although English has undergone extensive crossbreeding with French, which has led to a mix of Germanic and Romance elements. Germanic and Romance are completely different branches of the Indo-European language family.
  
  Then, you have writers who have the unfortunate habit of including far too much information into a single sentence. Similar to poor English writers who have a tendency to include entire descriptive paragraphs into (parentheses), except those helpful punctuation, like the hyphen, are also omitted.
  Overly long sentences are not language-specific and indeed the case can be made (and easily defended) that one can write impossibly long, but entirely correct - in both syntax and semantics -, sentences in any given language that doesn't heavily restrict the way sentences work, which most languages, such as English, don't.
  
  As for punctuation being removed: I find the opposite to be true, with English being short on both hyphens (compound words are often just a string of separate words connected only by context) and commas (German tends to use commas to separate clauses; English only does so in certain cases).
  
  It's invariably at the end of the sentence, on yonder page.
  Invariably are English sentences constructed like this one. Grating it is, but since only one specific word ordering is ever allowed by all languages, nothing can ever be done about it. Or not. German does allow sentences à la "Er geht in den Wald" ("He walks into the forest") and they occur all the time.
  
  All of this is indicative of a language designed to be spoken, and not written--which makes sense, because so precious few were literate whilst it was being created.
  Unlike English, which was developed when the University of Oxford Linguistics Faculty decided to crossbreed German and French in the 1960s. No, wait, it evolved (and in some cases skipped evolution that happened to German) over centuries and much of the evolution happened in the Middle Ages when literacy was an optional skill for most.
  
  --
  USE HOT GRITS WITH STATUE OF NATALIE PORTMAN (NAKED AND PETRIFIED)
36. Re:Why by mapkinase · 2010-10-07 23:28 · Score: 1
  
  If you do not render, you should sanitize the underlying URL.
  
  --
  I do not believe in karma. "Funny"=-6. Do good and forbid evil. Yours, Oft-Offtopic Flamebaiting Troll.
37. Re:Why by Man+Eating+Duck · 2010-10-08 00:50 · Score: 1
  
  So, in how many words do I have to add these characters because the layout software won't do it for me?
  We use InDesign at our publishing company, and its hyphenation is usually quite good. It *will* miss a few words in each book, especially in languages other than English; in those cases soft hyphens are very practical. It's OK, native English speakers tend to forget that we use other languages in most of the world :)
  
  --
  Are you a grammar Nazi? I'm trying to improve my English; please correct my errors! :)
38. Re:Why by maxwell+demon · 2010-10-08 01:20 · Score: 1
  
  hyphenating-urls.can-make-sense.com.
  however.soft-hyphenating.cannot.org.
  
  --
  The Tao of math: The numbers you can count are not the real numbers.
39. Re:Why by operagost · 2010-10-08 02:08 · Score: 1
  
  Since URLs are not formatted data, browsers should render EVERY character, even if only with a special code (just as some older word processors and modern text editors show control characters). This could have been avoided. It should be considered a defect and treated as such.
  
  --
  
  Gamingmuseum.com: Give your 3D accelerator a rest.
40. Re:Why by operagost · 2010-10-08 02:11 · Score: 1
  
  agglomerative
  
  Thanks-- I'm going to try to use this word in a sentence today.
  
  --
  
  Gamingmuseum.com: Give your 3D accelerator a rest.
41. Re:Why by operagost · 2010-10-08 02:13 · Score: 1
  
  I thought the problem here wasn't rendering of text, but rendering of the URL in the address or status bars. As such, the HTML spec doesn't apply as it is only relevant to page rendering. Browsers should display the entire content of a URL, replacing control or special characters with symbols if necessary.
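  
  A minimal sketch of what "replacing control or special characters with symbols" could look like, assuming the display layer receives the URL as a Unicode string. The function name `reveal_invisible` and the `<U+XXXX>` notation are made up for illustration; no browser is claimed to work this way.

```python
import unicodedata


def reveal_invisible(url):
    """Return url with control and format characters shown as <U+XXXX> tags."""
    out = []
    for ch in url:
        # Cc = control characters, Cf = format characters.
        # U+00AD SOFT HYPHEN falls into the Cf category.
        if unicodedata.category(ch) in ("Cc", "Cf"):
            out.append(f"<U+{ord(ch):04X}>")
        else:
            out.append(ch)
    return "".join(out)


print(reveal_invisible("http://c\u00adhase.com/"))
```

  With a display like this, a soft hyphen hiding in the address bar would show up as `<U+00AD>` instead of nothing at all.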
  
  --
  
  Gamingmuseum.com: Give your 3D accelerator a rest.
42. Re:Why by ppz003 · 2010-10-08 02:21 · Score: 1
  
  And this is why I have my mail clients display emails in plain text only.
  As for web based email, well they should probably add that setting sometime.
Shy Spammers by biryokumaru · 2010-10-07 09:34 · Score: 3, Funny

Spammers are getting more shy? That's a relief!

--
When you're afraid to download music illegally in your own home, then the terrorists have won!
1. Re:Shy Spammers by QRDeNameland · 2010-10-07 14:36 · Score: 1
  
  Spammers are getting more shy? That's a relief!
  Careful what you wish for...instead of getting rickrolled you might end up being Kajagoogled.
  
  --
  Momentarily, the need for the construction of new light will no longer exist.
What is it? by iONiUM · 2010-10-07 09:34 · Score: 5, Funny

Why didn't they just put the friggin character in the summary so I didn't have to read the article?
Anyways, according to the article it's &shy;, which looks "identical to a regular hyphen." Are you happy now slashdot? I had to read TFA to find that out.
1. Re:What is it? by Anonymous Coward · 2010-10-07 09:40 · Score: 0
  
  Rendering it would depend on your screen resolution, browser window size, and things like that... Why that is will be left as an exercise for the reader.
2. Re:What is it? by mclearn · 2010-10-07 09:44 · Score: 1, Insightful
  
  And, as TFA points out, this is a valid tactic because "modern browsers" (ambiguously non-committal) do not render the character. I assume spammers are writing URLs as: http://microsoft.com/ (e.g. m-i-crosoft.com, but rendered onscreen as microsoft.com). This, of course, tricks folks into thinking that they are clicking on a valid microsoft.com URL.
3. Re:What is it? by mclearn · 2010-10-07 09:47 · Score: 1
  
  Nope. My bad. Since the SHY character is used as a way to dictate line breaks, it obviously isn't used to forge domains or anything similar. Presumably then, the SHY is used to ensure that patterns such as "Viagra" can be written as Viagra and not be caught by simple pattern matchers? TFA was light on actual examples.
4. Re:What is it? by Anonymous Coward · 2010-10-07 09:52 · Score: 0
  
  They get you to click on a link.
  You look at your address bar, and it says "chase.com"
  But, you're really at "c-h-a-s-e.com" with unrendered soft hyphens.
  Getting the picture yet?
5. Re:What is it? by PhrostyMcByte · 2010-10-07 10:07 · Score: 1
  
  If a word is wrapped to the next line, it shows a hyphen. Otherwise it's hidden. That's what a soft hyphen does.
6. Re:What is it? by maxwell+demon · 2010-10-07 10:25 · Score: 3, Insightful
  
  Are registrars accepting domain names with soft hyphens? And if so, why? It's rather obvious that such domain names would only be used for fraud.
  IMHO registrars should not accept any non-printable character in domain names.
  
  --
  The Tao of math: The numbers you can count are not the real numbers.
7. Re:What is it? by DrugCheese · 2010-10-07 10:36 · Score: 1
  
  Luckily you read it and I garnered the information from your post.
  Now we can all make informed opinions.
  
  --
  *DrugCheese rants*
8. Re:What is it? by sexconker · 2010-10-07 11:55 · Score: 3, Insightful
  
  Yes, they are. Otherwise this story wouldn't exist.
  Why? Because they like money, and don't give a fuck.
  Of course they should not accept any non-printable characters.
  Registrars are pretty much only half a step above the spammers in terms of ethics / shittiness.
9. Re:What is it? by Anonymous Coward · 2010-10-07 12:03 · Score: 0
  
  OMFG, he read the article! Children, don't look at him, keep away!
10. Re:What is it? by adtifyj · 2010-10-07 13:39 · Score: 1
  
  Do you have any evidence that registrars are accepting soft hyphens in domain names?
  Soft hyphens are supposed to be eliminated in the Name Preparation phase.
  The soft hyphen is being used by spammers to obfuscate their URLs in order to get past anti-spam rules.
  This slashdot story appears to be misinformation and a plug for Symantec.
11. Re:What is it? by SolitaryMan · 2010-10-07 19:13 · Score: 1
  
  The problem is not registrars (at least not in this case). The problem is that the security filters people have in place can warn them when they go to evilsite.com, but fail to do so when they go to ev&shy;il&shy;si&shy;te.co&shy;m. This is because the filters fail to remove this character, while browsers silently remove it and send people to evilsite.com without any warning.
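  
  A rough sketch of the missing step, assuming the filter sees the raw href text from the message and keeps a plain set of blocked hostnames. `BLOCKLIST`, `normalized_host` and `is_blocked` are hypothetical names, not taken from any real filter.

```python
import html
from urllib.parse import urlsplit

SOFT_HYPHEN = "\u00ad"
BLOCKLIST = {"evilsite.com"}  # hypothetical list of known-bad hosts


def normalized_host(raw_href):
    """Decode HTML entities, drop soft hyphens, and return the lowercased host."""
    text = html.unescape(raw_href)        # "&shy;" becomes U+00AD
    text = text.replace(SOFT_HYPHEN, "")  # mimic what the browser ends up doing
    return (urlsplit(text).hostname or "").lower()


def is_blocked(raw_href):
    """Check the normalized host, not the raw text, against the blocklist."""
    return normalized_host(raw_href) in BLOCKLIST


print(is_blocked("http://ev&shy;il&shy;si&shy;te.co&shy;m/"))
```

  The point is only that the comparison happens after the same cleanup the browser applies, so the filter and the browser agree on where the link goes.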
  
  --
  May Peace Prevail On Earth
12. Re:What is it? by arth1 · 2010-10-08 00:25 · Score: 2, Informative
  
  Yes, they are. Otherwise this story wouldn't exist.
  Who modded this insightful?
  No, domain registrars don't allow soft hyphens in domain name registrations. Give me a single example of a registered domain with a soft hyphen in it.
  As the other user said, this is used for masking URLs in e-mails, and thus trying to thwart spam filters.
  
  Get your cheap Viagra at http://www.ch&shy;eapvi&shy;agra.com/
  
  This will render as
  
  Get your cheap Viagra at http://www.cheapviagra.com/
  
  Yet many spam filters will not trigger on the words "cheap" and "Viagra", and the e-mail has a greater chance of getting through filters.
  A similar technique was used in the past for domain names and e-mail addresses, back when e-mail actually followed the standards, and no-one read their e-mail in a HTML browser.
  is a legal e-mail address that is interpreted as cheapviagra@hotmail.com, but many newer programs don't follow the standards and will barf, which is why spammers seldom do this anymore.
13. Re:What is it? by maxwell+demon · 2010-10-08 01:49 · Score: 2, Informative
  
  So the problem is browsers silently removing them.
  A browser should never modify an URL. Especially it should not remove invalid characters. It should give an error if you try to go to an invalid URL. It should not try to be "helpful" here.
  
  --
  The Tao of math: The numbers you can count are not the real numbers.
14. Re:What is it? by c++0xFF · 2010-10-08 03:28 · Score: 1
  
  The soft hyphen is not a very outgoing character. Indeed, it experiences severe apprehension when surrounded by others and may hide itself from view.
  However, like most others with acute diffidence, the shy character can be brought out when placed in a more comfortable position, such as the end of the line, instead of the middle.
15. Re:What is it? by sexconker · 2010-10-08 04:54 · Score: 1
  
  RTFA.
  And registrars are not even allowed to disallow certain characters.
16. Re:What is it? by sexconker · 2010-10-08 04:57 · Score: 1
  
  How about you RTFA?
  And registrars are not even allowed to disallow certain characters.
17. Re:What is it? by stridebird · 2010-10-08 05:33 · Score: 1
  
  MOD PARENT UP!
  Good post, thanks...I think you have nailed it. Most of these comments and TFA seem to be misleading or deluded.
  Summary:
  1 - domain names don't exist with soft hyphens in them. So you can't fake a link with hidden characters in it.
  2 - the character simply doesn't render (in a browser FF/Chrome/IE8) unless needed ... a link with a &shy in it is the same as a link without the &shy in it, as the link (href) part of the tag is never going to be displayed in text flow.
  3 - the malicious use is to evade filtering. That's all!!
  4 - also, in my testing on Thunderbird only, &shy; isn't even interpreted.
  Bit meh about this now. Over hyped.
18. Re:What is it? by arth1 · 2010-10-08 09:00 · Score: 1
  
  How about you RTFA?
  And registrars are not even allowed to disallow certain characters.
  DNS isn't allowed to disallow certain characters.
  Registrars most certainly can, and do.
  TFA doesn't give an example of a domain with a soft hyphen in the name.
  Again, give me just ONE example. That's all it takes to prove me wrong.
  This is about using soft hyphens to hide the real domain name in e-mails, not about using a real domain name that actually contains a soft hyphen. It'd be rather useless for this purpose, because the RFC for URIs doesn't allow for Unicode characters in URLs, and the RFC for SMTP doesn't allow them in e-mail addresses. Which is why we have the IDNA translation system for web URLs in the first place, which translates e.g. http://www.åbo.fi/ to the real name of http://www.xn--bo-xia.fi/
  (see RFC3454, RFC3491 and RFC3492 for more details)
  Also, see http://mct.verisign-grs.com/ and try using a soft hyphen (character code 173 or 00AD in UTF-16).
  Watch the error message you get.
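  
  For anyone who wants to try the translation locally, here is a small sketch using Python's built-in "idna" codec, which implements the IDNA 2003 rules (nameprep plus Punycode) from the RFCs above. The helper name `to_ascii` is just for illustration.

```python
def to_ascii(hostname):
    """Convert a Unicode hostname to its ASCII (IDNA 2003) form."""
    # The "idna" codec runs nameprep on each label, then Punycode if needed.
    return hostname.encode("idna").decode("ascii")


# "\u00e5" is the letter a-with-ring, as in the Finnish town name.
print(to_ascii("www.\u00e5bo.fi"))

# Nameprep maps U+00AD SOFT HYPHEN to nothing, so it simply disappears.
print(to_ascii("ex\u00adample.com"))
```

  This should print www.xn--bo-xia.fi and example.com, which matches the point that a soft hyphen never survives into the name that is actually looked up.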
So how often is it used legitimately? by JesseL · 2010-10-07 09:41 · Score: 4, Interesting

Is there any good reason not to just call the presence of soft hyphens as a reliable indicator of spam and use it as the basis of a spam filter?

--
"Prefiero morir de pie que vivir siempre arrodillado!"
1. Re:So how often is it used legitimately? by Cinder6 · 2010-10-07 09:45 · Score: 2, Funny
  
  Well, I know I've certainly never seen it!
  
  --
  If you can't convince them, convict them.
2. Re:So how often is it used legitimately? by Anonymous Coward · 2010-10-07 09:48 · Score: 4, Informative
  
  Is there any good reason not to just call the presence of soft hyphens as a reliable indicator of spam and use it as the basis of a spam filter?
  Yes, there is: languages other than English. In e.g. German, the use of soft hyphens, while not universal, is becoming more common, at least, and for a reason: longer words that can't automatically be hyphenated by the browser as necessary lead to ugly layout, especially when there's not a lot of horizontal space (e.g. on news sites, which often tend to emulate printed newspapers).
3. Re:So how often is it used legitimately? by Anonymous Coward · 2010-10-07 09:51 · Score: 0
  
  Yeah, since according to a poster above, the soft hyphen is used to indicate where to hyphenate words should they appear at the edge of the page, there's no reason to have them in a URL. Seems an easy way to block.
4. Re:So how often is it used legitimately? by Relic+of+the+Future · 2010-10-07 10:01 · Score: 1
  
  Seriously. It's not valid to use a space in a URL, why would it be valid to use a soft-hyphen? If I type "goo gle.com" in the address bar in (Firefox 3.6.10), it takes me to "google.com"; this should be handled the same way.
  
  --
  Those who fail to understand communication protocols, are doomed to repeat them over port 80.
5. Re:So how often is it used legitimately? by ceoyoyo · 2010-10-07 10:05 · Score: 2, Interesting
  
  I would think most spam filters would do that automatically as they learn.
  Symantec seems to think people still use character-for-character text matching spam filters that don't learn. Maybe Symantec products do.
6. Re:So how often is it used legitimately? by Whyte+Panther · 2010-10-07 10:11 · Score: 1
  
  Why is it possible to even register a domain with a soft hyphen? Oh wait, Domain registrars are greedy.
7. Re:So how often is it used legitimately? by AltairDusk · 2010-10-07 10:23 · Score: 2, Insightful
  
  Shouldn't be too hard for the spam filter to strip the soft hyphens then analyze the URL, I don't see this being useful to the spammers for too long unless I'm missing something.
8. Re:So how often is it used legitimately? by AvitarX · 2010-10-07 10:26 · Score: 1
  
  Fair enough.
  Just prevent visitors from going to a URL with one.
  I assume that's the problem, something like pncbank.com looking like pncbank.com.
  Simply throw up a big big warning (like that ever works) that says you are not visiting site "pncbank.com" you are visiting "p-ncbank.com".
  Or simply just block them, I see no purpose of the character in a URL.
  
  --
  Wow, sent an e-mail as suggested when clicking on "use classic" banner, and got a fast response that addressed my msg
9. Re:So how often is it used legitimately? by AvitarX · 2010-10-07 10:27 · Score: 1
  
  ah crap, ate my markup ...something like p&SHY;ncbank.com looking like pncbank.com. ...
  
  --
  Wow, sent an e-mail as suggested when clicking on "use classic" banner, and got a fast response that addressed my msg
10. Re:So how often is it used legitimately? by treeves · 2010-10-07 10:34 · Score: 3, Informative
  
  So, when I get an email with a link to www.Rindfleischetikettierungsüberwachungsaufgabenübertragungsgesetz.de, should I avoid clicking the link, or what?
  
  --
  ...the future crusty old bastards are already drinking the Kool-Aid.
11. Re:So how often is it used legitimately? by maxwell+demon · 2010-10-07 10:38 · Score: 1
  
  I think this is a mistake. "goo gle.com" should lead to an error.
  If there is anything which should be treated by the strictest rules possible, then it's URLs.
  
  --
  The Tao of math: The numbers you can count are not the real numbers.
12. Re:So how often is it used legitimately? by maxwell+demon · 2010-10-07 10:39 · Score: 1
  
  That's because it's so shy, it always hides.
  
  --
  The Tao of math: The numbers you can count are not the real numbers.
13. Re:So how often is it used legitimately? by Anonymous Coward · 2010-10-07 10:43 · Score: 0
  
  Yes but in a url?
14. Re:So how often is it used legitimately? by Pharmboy · 2010-10-07 10:48 · Score: 1
  
  I think this is a mistake. "goo gle.com" should lead to an error.
  If you use the DNS servers of most ISPs, instead of error, you end up either going to a custom search page to which they are getting paid for the ads, or an offer to buy the domain.
  
  --
  Tequila: It's not just for breakfast anymore!
15. Re:So how often is it used legitimately? by maxwell+demon · 2010-10-07 10:58 · Score: 1
  
  Given that it's not a valid domain name (as opposed to a valid, but unregistered domain name), it shouldn't even hit the DNS server. The browser should detect it as invalid, and give you an error straight away. It should not pass it on, neither literally, nor altered.
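  
  A sketch of the kind of up-front check meant here, assuming the traditional letters-digits-hyphen rule for each label and ignoring IDN input and trailing dots entirely. `is_valid_hostname` is a hypothetical helper, not what any browser actually runs.

```python
import re

# One DNS label: 1 to 63 letters, digits or hyphens, with no hyphen
# at the start or at the end.
_LABEL = re.compile(r"(?!-)[A-Za-z0-9-]{1,63}(?<!-)")


def is_valid_hostname(host):
    """Return True only if every dot-separated label passes the strict rule."""
    if not host or len(host) > 253:
        return False
    return all(_LABEL.fullmatch(label) for label in host.split("."))


for candidate in ("google.com", "goo gle.com", "goo\u00adgle.com"):
    print(repr(candidate), is_valid_hostname(candidate))
```

  Anything that fails the check gets an error straight away and is never repaired or passed on to a resolver.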
  
  --
  The Tao of math: The numbers you can count are not the real numbers.
16. Re:So how often is it used legitimately? by Anonymous Coward · 2010-10-07 11:03 · Score: 0
  
  Why is it possible someone as stupid as you can operate a computer?
  RTFA
17. Re:So how often is it used legitimately? by TheRaven64 · 2010-10-07 11:08 · Score: 1, Interesting
  
  Hyphenating long words in German is pretty easy. Long words are usually compound words and they are correctly broken at the word boundaries. Hyphenating English automatically is actually a harder problem than hyphenating German, and is made harder by the fact that English and American have different rules for when you are supposed to hyphenate.
  
  --
  I am TheRaven on Soylent News
18. Re:So how often is it used legitimately? by Pharmboy · 2010-10-07 11:23 · Score: 1
  
  I agree, but ISPs are making money off of it and not likely to give up that extra revenue, which is one more reason I just point to one of my own DNS servers at the office instead.
  
  --
  Tequila: It's not just for breakfast anymore!
19. Re:So how often is it used legitimately? by jthill · 2010-10-07 11:41 · Score: 2, Interesting
  
  DNS permits everything in domain names. You can implement any restrictions you want on names you issue on your own authority, but
  
  Implementations of the DNS protocols must not place any restrictions on the labels that can be used. In particular, DNS servers must not refuse to serve a zone because it contains labels that might not be acceptable to some DNS client programs.
  
  --
  As always, all IMO. Insert "I think" everywhere grammatically possible.
20. Re:So how often is it used legitimately? by mattack2 · 2010-10-07 12:35 · Score: 1
  
  is made harder by the fact that English and American have different rules for when you are supposed to hyphenate
  Can you explain how this is relevant to this soft hyphen issue? That is, I read the relevant part of the wikipedia article (http://en.wikipedia.org/wiki/Hyphenation), and it does mention different rules (e.g. "co-worker" in British English, but "coworker" in American English). However, that is not related to the soft hyphen issue, which is related to hyphenation for justification reasons.
21. Re:So how often is it used legitimately? by Anonymous Coward · 2010-10-07 14:33 · Score: 0
  
  Ok, once we start speaking German, I'll remove das filter.
22. Re:So how often is it used legitimately? by KingAlanI · 2010-10-07 16:08 · Score: 1
  
  that is an actual German word, says Google - their compound concoctions never cease to amaze me. :)
  
  --
  I listen to both RIAA and non-RIAA stuff if I like the music, tangential business/politics nonwithstanding.
23. Re:So how often is it used legitimately? by TheRaven64 · 2010-10-07 21:22 · Score: 1
  
  It's relevant to the point that I was replying to - that soft hyphens are more common in German because it is harder to insert hyphens automatically. Soft hyphens are just hints to an automatic hyphenation system. It will attempt to find a place to break the word. In American, the correct place to do this is based on phonetics, while in English it is based on the derivation of the word. It's quite hard to do this correctly automatically (although it's quite easy to do it almost-correctly and use soft hyphens to correct it). In German, the preferred hyphenation points are between words in compound words, and a simple wordlist can be used to find these relatively quickly. This means that hyphens are more likely to be useful in English than in German - you only need them in cases where automatic hyphenation isn't going to do the right thing.
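  
  To make the wordlist idea concrete, here is a toy sketch. It assumes a tiny hypothetical wordlist (`WORDS`) and ignores linking elements between parts, so it is nowhere near a real hyphenation engine.

```python
SOFT_HYPHEN = "\u00ad"
WORDS = {"rind", "fleisch", "etikettierung"}  # hypothetical toy wordlist


def split_compound(word, words):
    """Return a list of known parts that cover the whole word, or None."""
    lower = word.lower()
    n = len(lower)
    # best[i] holds one way to split word[:i] into known parts, or None.
    best = [None] * (n + 1)
    best[0] = []
    for i in range(1, n + 1):
        for j in range(i):
            if best[j] is not None and lower[j:i] in words:
                best[i] = best[j] + [word[j:i]]
                break
    return best[n]


def add_soft_hyphens(word, words):
    """Insert soft hyphens at compound boundaries; leave unknown words alone."""
    parts = split_compound(word, words)
    return SOFT_HYPHEN.join(parts) if parts else word


print(split_compound("Rindfleischetikettierung", WORDS))
```

  Words that cannot be fully covered by the list come back unchanged, which is where a manually placed soft hyphen would still be needed.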
  
  --
  I am TheRaven on Soylent News
24. Re:So how often is it used legitimately? by Registered+Coward+v2 · 2010-10-07 22:15 · Score: 1
  
  So, when I get an email with a link to www.Rindfleischetikettierungsüberwachungsaufgabenübertragungsgesetz.de, should I avoid clicking the link, or what?
  No, just think about what you are having for dinner and be sure you prepare and eat it within the rules.
  
  --
  I'm a consultant - I convert gibberish into cash-flow.
25. Re:So how often is it used legitimately? by Anonymous Coward · 2010-10-08 01:58 · Score: 0
  
  doesn't matter... the site appears to be down anyway.
26. Re:So how often is it used legitimately? by Nicopa · 2010-10-08 02:18 · Score: 1
  
  FWIW I've been using &shy; in webpages for years...
27. Re:So how often is it used legitimately? by Anonymous Coward · 2010-10-08 02:18 · Score: 0
  
  Long words are usually compound words and they are correctly broken at the word boundaries.
  True, but that only works for humans. A web browser, in order to do this, would have to a) identify the language of a web page and b) consult a word list to find out how to break the word into compounds; not necessarily practical.
  There ARE ways to do automatic hyphenation in German without resorting to wordlists (check out the babel package for LaTeX), but it still wouldn't be very practical for web browsers.
28. Re:So how often is it used legitimately? by treeves · 2010-10-08 07:42 · Score: 1
  
  That's because it doesn't exist (AFAIK). I just made it up, mostly as a joke. And that's also why I was surprised to get modded informative. It is a real German word, as I just googled "long german words" and found it in one of the top results. It means "Beef meat labeling transfer overview" and refers to an actual law.
  
  --
  ...the future crusty old bastards are already drinking the Kool-Aid.
Obligatory Kajagoogoo by Anonymous Coward · 2010-10-07 09:47 · Score: 0, Interesting

Tongue-tied, (I'm) short of breath, don't even try
Try a little harder
Something's wrong, you're not naive, you must be strong
Ooh, baby, try
Hey girl, move a little closer.
You're
CHORUS:
Too shy shy
Hush hush, eye to eye
Too shy shy
Hush hush, eye to eye
Too shy shy
Hush hush, eye to eye
Too shy shy
1. Re:Obligatory Kajagoogoo by Anonymous Coward · 2010-10-07 11:06 · Score: 0, Informative
  
  Good job. Came here to blame Kajagoogoo for this.
  Offtopic mod needs to hush hush.
Good News! by hardburn · 2010-10-07 09:48 · Score: 2, Interesting

So now spam filters will pick up on soft hyphens used in URIs inside emails (when was the last time you saw one used legitimately?), making the spam easier to spot.

--
Not a typewriter
shy by Anonymous Coward · 2010-10-07 09:50 · Score: 3, Funny

No good shysters.
1. Re:shy by Anonymous Coward · 2010-10-07 10:25 · Score: 0
  
  No good shysters.
  No good shylocks.
HTML 5 by Dthief · 2010-10-07 09:54 · Score: 0, Offtopic

The advent of HTML 5 within the next couple years - and browsers that support it - is expected to solve many of these problems, because that specification finally standardizes how HTML code should be parsed by Web browsers, rather than leaving it up to individual platform vendors to develop their own interpretations of how the code should be parsed.

I bet 4pple is behind the spam trying to further promote 1-1TML-5.......$t3v3 J0b$ l0v3s v14gr4

--
www.RacquetUp.org - Helping Detroit Youth
SpamAssassin is not vulnerable to this by Khopesh · 2010-10-07 10:02 · Score: 4, Informative

Just tested this in SpamAssassin with http ://exa  mple.com (spaced to evade slashdot's own obfuscation-eliminator) - Result: The URL domain (example.com) is properly extracted without the obfuscation.
That said, SA is fully capable of detecting the obfuscation attempt itself (using a rawbody rule)...

--
Use my userscript to add story images to Slashdot. There's no going back.
1. Re:SpamAssassin is not vulnerable to this by Zarel · 2010-10-08 01:19 · Score: 1
  
  Erm, I don't think you know what "properly extracted" means.
  example.com doesn't lead to example.com, it leads to xn--example-nka.com, so if you extract the former instead of the latter, you're doing it wrong.
  
  --
  Want a high quality FOSS RTS game? Try Warzone 2100!
HTML 5 will save us by rudy_wayne · 2010-10-07 10:05 · Score: 1

from the article:

The advent of HTML 5 within the next couple years is expected to solve many of these problems, because that specification finally standardizes how HTML code should be parsed by Web browsers, rather than leaving it up to individual platform vendors to develop their own interpretations of how the code should be parsed.
Note the use of the phrase "should be". I see this a lot when reading about HTML 5. Are people really that stupid and/or naive that they think all browsers will follow the HTML 5 spec exactly? (yes Microsoft I'm looking at you)
1. Re:HTML 5 will save us by blair1q · 2010-10-07 10:14 · Score: 1
  
  standards can only tell you how things should be.
  they may tell you how things will break for you if you try to do things in a non-standard way, but they have no power to force you not to try.
2. Re:HTML 5 will save us by mortonda · 2010-10-08 03:50 · Score: 1
  
  Note the use of the phrase "should be".
  
  Yes, "should be". SHOULD has a very different meaning from MUST in standards documents.
The Wrongest Part by SuperKendall · 2010-10-07 10:08 · Score: 5, Funny

The thing that really grates on the nerves is using a soft hyphen to sell Viagra.

--
"There is more worth loving than we have strength to love." - Brian Jay Stanley
1. Re:The Wrongest Part by blair1q · 2010-10-07 10:15 · Score: 1
  
  That's why we have /.
2. Re:The Wrongest Part by Anonymous Coward · 2010-10-07 10:22 · Score: 0
  
  They started out using hard hyphens, but when the hard hyphen had been around for four hours they contacted their doctor who quickly helped convert it to a soft hyphen.
3. Re:The Wrongest Part by Anonymous Coward · 2010-10-07 11:42 · Score: 0
  
  The thing that really grates on the nerves is using a soft hyphen to sell Viagra.
  You're doing it wrong; graters shouldn't go down there regardless of whether you're hard or soft...
4. Re:The Wrongest Part by amicusNYCL · 2010-10-07 11:55 · Score: 1
  
  I thought it was funny when I started receiving spams for "Viagra soft tabs". I thought that's what it was supposed to cure.
  
  --
  "Our two-party system is like a bowl of shit looking at itself in a mirror." - Lewis Black
Jumping humans, eh? by noidentity · 2010-10-07 10:11 · Score: 1

Spammers aren't shy about jumping humans flexible cognitive abilities

I'm not too worried about flexible cognitive abilities, but jumping humans do bother me too.
Journalism at its best by T+Murphy · 2010-10-07 10:18 · Score: 0, Offtopic

Let's take the summary (copy+pasted from the article) and summarize each sentence (compare to the summary if you think I exaggerate):

Spammers are using the soft hyphen. Spammers are using the soft hyphen. ...Spammers are using the soft hyphen. "Spammers are using the soft hyphen."
Yes, each sentence says a little bit more, but it still repeats the same fact over and over. I usually don't complain about slashdot summaries, but this was honestly painful to read. Just because you copy+pasted what TFA says doesn't mean it's okay.

--
My webcomic
1. Re:Journalism at its best by apoc.famine · 2010-10-07 10:45 · Score: 1
  
  That's why slashdot has editors, instead of being just a user-submitted story aggregation site...
  
  --
  Velociraptor = Distiraptor / Timeraptor
Not always by pavon · 2010-10-07 10:27 · Score: 4, Informative

It is only supposed to be rendered when the word is split across multiple lines.
For example, if your text was "super&shy;cali&shy;fragilistic&shy;expialidocious" with soft hyphens only after "supercali" and "fragilistic", then all of the following are valid renderings depending on where the renderer decides to start a new line:

supercalifragilisticexpialidocious
or

supercalifragilistic-
expialidocious
or

supercali-
fragilistic-
expialidocious
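
The three renderings above can be reproduced with a small sketch. It assumes soft hyphens after "supercali" and "fragilistic", a fixed-width font, and a simple greedy layout; `render` is a made-up helper and real layout engines are more involved.

```python
SOFT_HYPHEN = "\u00ad"


def render(word, width):
    """Lay out one word, breaking only at soft hyphens when it would overflow."""
    segments = word.split(SOFT_HYPHEN)
    lines = []
    current = ""
    for index, segment in enumerate(segments):
        is_last = index == len(segments) - 1
        # Reserve one column for a visible hyphen unless this is the last segment.
        needed = len(current) + len(segment) + (0 if is_last else 1)
        if current and needed > width:
            # The break is taken here, so the soft hyphen becomes visible.
            lines.append(current + "-")
            current = segment
        else:
            # No break needed: the soft hyphen stays invisible.
            current += segment
    lines.append(current)
    return lines


word = "supercali\u00adfragilistic\u00adexpialidocious"
for width in (40, 25, 15):
    print(width, render(word, width))
```

Widths of 40, 25 and 15 columns give the one-line, two-line and three-line versions respectively.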
Why would soft-hyphen be legal in a URL? by JSBiff · 2010-10-07 10:33 · Score: 1

I don't get how you can put a soft-hyphen in a URL and have it work? It's a formatting character, it shouldn't ever be legal to have a formatting character as part of a URL? Are they registering domain-names with soft-hyphens in the name? Or is this a case where the browser 'helpfully' replaces a soft hyphen with a regular hyphen when actually trying to connect to the web server, but for some reason does NOT render the hyphen when displaying it to the user? It seems like the browser should behave consistently - if it doesn't render a hyphen when displaying it, it shouldn't render a hyphen when making the DNS lookup.
1. Re:Why would soft-hyphen be legal in a URL? by Anonymous Coward · 2010-10-07 11:03 · Score: 0
  
  It doesn't, which is the whole point. To a filter, slash&shy;dot&shy;.com doesn't look like slashdot.com, so it won't block it. To the browser, however, your URL passes through easily.
  It seems like the simple solution is just to make filters ignore non-printable characters when looking for suspect URLs, but I don't write spam filters.
2. Re:Why would soft-hyphen be legal in a URL? by John+Hasler · 2010-10-07 11:20 · Score: 1
  
  Mod parent up. Why in the hell is such a character allowed in URLs at all?
  
  --
  Warning: this article may contain humor, sarcasm, parody, and perhaps even irony. Read at your own risk.
3. Re:Why would soft-hyphen be legal in a URL? by elronxenu · 2010-10-07 14:27 · Score: 1
  
  Indeed; it seems to be a good example why extending the DNS character set was not such a good idea. DNS should have readable domain names, and avoid using different characters with identical glyphs and non-printing characters.
My theory by Tolkien · 2010-10-07 10:37 · Score: 1

My theory of how this submission reads is as follows.

--
how is babby formed?
What is TFA talking about? by z-j-y · 2010-10-07 10:43 · Score: 1

It doesn't make any sense, probably just some nonsense to scare people into buying their product (Symantec).
By using softhyphen in IDN, a spammer can get a spoof domain that looks like an authentic domain on screen. But how can that fool any spam filters?
1. Re:What is TFA talking about? by John+Hasler · 2010-10-07 11:23 · Score: 1
  
  Maybe it fools Symantec's spam filters.
  
  --
  Warning: this article may contain humor, sarcasm, parody, and perhaps even irony. Read at your own risk.
"to solve many of these problems" - ? by Anonymous Coward · 2010-10-07 12:20 · Score: 0

The advent of HTML 5 within the next couple years - and browsers that support it - is expected to solve many of these problems, because that specification finally standardizes how HTML code should be parsed by Web browsers, rather than leaving it up to individual platform vendors to develop their own interpretations of how the code should be parsed.

In other words, security flaws will be baked in so that vendors can't fix them without breaking standards compliance?
terrible summary by Laxori666 · 2010-10-07 13:19 · Score: 3, Funny

Is it just me or is this summary terrible? Every sentence says the same thing, just slightly reworded. In the summary, it's as if each new sentence doesn't give any additional information, but it's worded as if it does. Researchers have found that this summary is repetitive. Some say this can indicate the repetitiveness of a summary.
1. Re:terrible summary by lousyd · 2010-10-08 00:02 · Score: 1
  
  Yes, thank you. I was scanning the comments to this story just hoping that somebody would say that. 5 times they repeated the same thing!
  
  --
  If aspiration is a virtue, achievement cannot be a vice.
Can we use this ourselves? by Anonymous Coward · 2010-10-07 13:37 · Score: 0

Does this work on, say, "Smart"Filter?
Would be lovely if there was a way to craft your own bookmarks to bypass the damn filters. Say, by using a Greasemonkey script or something...
I'm there by SuperKendall · 2010-10-07 14:46 · Score: 1

Or I would have been, if that had been a real URL. Who could resist that?
Especially if you're into überwachungsaufgaben. Not that I am! No sir. Just saufgaben-curious.

--
"There is more worth loving than we have strength to love." - Brian Jay Stanley
SHY characters don't get noticed... by Riktov · 2010-10-07 16:07 · Score: 1

...unless they get a lucky break.
I think I learned that in junior high.
Very easy to block by Raul+Acevedo · 2010-10-07 16:07 · Score: 1

This is so easy to defeat with a simple regular expression in your spam filter. I doubt spammers will continue with this tactic for long.
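
A minimal sketch of that kind of regular expression, in Python. The function name and the list of HTML spellings of the soft hyphen are my assumptions, not taken from any particular spam filter; the idea is only to normalize the text before the URL check runs:

```python
import re

# U+00AD is the soft hyphen itself. The other alternatives are the HTML
# spellings assumed here: the named entity, decimal, and hex references.
SHY_PATTERN = re.compile("\u00ad|&shy;|&#0*173;|&#x0*ad;", re.IGNORECASE)


def strip_soft_hyphens(text: str) -> str:
    """Return text with every soft hyphen (raw or HTML-encoded) removed."""
    return SHY_PATTERN.sub("", text)
```

Running the existing URL rules on `strip_soft_hyphens(body)` rather than on the raw body means a link such as `exam&shy;ple.com` is checked as `example.com`.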

--
In a real emergency, we would have all fled in terror, and you would not have been notified.
Better: by Anonymous Coward · 2010-10-07 17:49 · Score: 0

Soft Hymen for Viagra and Porn Spammers!
Makes it easier and less bloody Mary....
Schneier warned about it years ago... by Anonymous Coward · 2010-10-07 19:49 · Score: 0

The number of attacks made possible by allowing non-ASCII chars in URLs is huge. Bruce Schneier warned about it years ago. This ain't the first "attack" of this type, and it definitely ain't the last. Security-conscious people have known about it for years and years, if not decades. The problem is that our industry still has people who don't understand that there are a *lot* of bad guys out there who will try to wreak havoc, for fun or for profit. As long as there are decision makers who don't understand that security should be the first concern, we're fuxx0red.
David Freer symantec by ginbot462 · 2010-10-08 02:03 · Score: 1

Submitted by strelaoz on Thu, 10/07/2010 - 6:50pm.
David Freer (VP, Symantec Consumer Business Units - Norton, APJ) is a BIG LIAR! He lied to me for more than two and half years for my true feelings, time, and money. Also kept saying I am the only one in his life. Even this year on Feb. 2, he used company line to lead me to have phone sex with him. Until I found out there’s some other woman, he made up another lie and finally admitted he’s been living with her for a year. Later, I realized they were all lies. He actually has married March 2009. And now he just totally disappeared and not answering any phone calls, acting like “hit & run” irresponsible baby. Can you trust someone like this, with no ethics and integrity? The more unbelievable things are David Freer newly-wed wife - SUZY WALSHAM, she shamefully admitted she was the third person who broke up David Freer & his ex 12 years relationships, and mocking at me as the 3rd "unsuspected" person, as she agreed with his husband’s behaviors!!!!!! SHAME ON both of you, DAVID FREER & SUZY WALSHAM!!!!!!! (THEY BOTH WORK FOR SYMANTEC).
esides, do you know how hurtful it is? I've been thinking about committing suicide everyday and every second. Not to mention, crying all the times. 30 pounds loss. can hardly walk to the outside, and all the humiliation & harshness I have been taken behind the scene. I can survive till now, only because I took lots of medicines everyday. Can you understand the suffering?
And I have no ideas how he can get pastor wife's email address? through hacking into my email account? Did he sound sorry? or threatening? How dare of him to talk about God now. And what a sneaky liar! "IF I FLIRTED?" What does that mean? He said "I am hard. Are you wet? I like to use my tongue to lick you ..........
And I don't even have the chance to slap him or yell at him on the face or through the phone. They are the ones who made the mistakes, also the ones yelled at me and threatened me. There are no three of us needed to pick the pieces of the mess. ONLY ME. He totally recovered and has been doing lots of publicities. It seems like he's doing this to @#$%&! me off or to hurt me more by saying "see, I am OK. The company didn't mind. Suzy didn't mind. If you wanna die, it's not my business."
Apple is the best! Symantec shall look after their own shit business. Oh, right. Symantec is launching Norton everywhere for smartphone users. No Wonder. But what if the rumor is true that
IS IT TRUE THAT A SECURITY COMPANY IS ACTUALLY A HACKER COMPANY?
: David Freer
: RE: this is the way you say sorry?
: lily
: 2010216,,2:20
Firstly I apologise to both you and the pastors wife for having dragged you into a very large mess in my life.
I again very sincerely apologise for the hurt and damage I have done to you. I understand that your trust in men and your feelings are destroyed.
If I flirted with you on February 2nd then I did not mean to do that, I also apologise – it was never my intention to do that.
David
From:
Sent: Tuesday, February 16, 2010 2:00 PM
To: David Freer
Cc: lily
Subject: RE: this is the way you say sorry?
Have you thought from the beginning all i want is a sincere apology? but you wouldn't give it to me.
If i want a revenge, i would do it a week ago - to tell suzy, to tell janice, to tell people who work at symantec.
AND HOW DID YOU GET PASATOR WIFE’S EMAIL?
--- 10/2/16 ()David Freer
: David Freer
: RE: this is the way you say sorry?
:
: lily
: 2010216,,6:14
God talks about compassion, if you have any it is time to allow all three of us to pick up the pieces of this mess and move on with our lives.
David
From: Sent: Monday, February 15, 2010 11:34 PM
To: David Freer
Subject: this is the way you say sorry?
i don't know what you told suzy about to let her believe you & even say @#$%&! off on me. it shall

--
Atlas Shrugged : Thematic Story :: Battlefield Earth : Organized Religion
Not Rendered? by Spez · 2010-10-08 02:51 · Score: 1

Alright, I did some testing in Chrome, Firefox, Internet Explorer and Opera (all latest versions)
with a simple link that has a SHY character in it. Depending on the format of the link (with an http or without), all 4 browsers did exactly what we expected them to do: the link either showed the hyphen and linked to a hyphened page correctly (by "showed", I mean that if you mouse over the link, you see the hyphen in the task bar) or just didn't show it and didn't link to a hyphened page.
So I don't see the problem here... I call this FUD.

--
I wouldn't mind you in my head, if you weren't so clearly mad -Lews Therin Telamon
1. Re:Not Rendered? by Spez · 2010-10-08 02:56 · Score: 1
  
  Ah I re-re-read the summary. It's only to go through the Spam filtering system... forget I said anything.
  
  --
  I wouldn't mind you in my head, if you weren't so clearly mad -Lews Therin Telamon
PROOF that SpamAssassin is not vulnerable to this by Khopesh · 2010-10-08 07:09 · Score: 1

I'm a SpamAssassin developer, both on the official project and on a commercial derivative. Others on my commercial team independently verified my claim as well. I highly doubt we're all wrong.
That said, I decided to FULLY dig into the issue to see what's going on under the hood. In addition to a careful analysis of the spamassassin debug output, I spun up Wireshark to look at the actual DNS queries. Since SA knows what example.com is ([84234] dbg: uridnsbl: domain example.com in skip list), I had to use something else. I ran two tests: one on a nonexistent domain, with the SHY character placed so that the portion after it does not form an existing domain, and one on a heavily-spammed domain, with a SHY character again breaking it into a nonexistent domain.
Analysis: Debug output from SA 3.3.1 and SVN trunk (rev 1005948, build reports version as 3.4.0-r929098) displays the SHY character (which my terminal renders as a space but after a paste, my browser does not) and uses \255 in its DNS lookups (older versions display it as \173 and I didn't capture the raw lookups). In addition to looking for the domain with the SHY character, it also queries without the SHY character. My live test confirmed a hit in URIBL for the defanged domain and no hit for the obfuscated one (I didn't test a real sample of the obfuscation -- presumably, the blocklists can learn the obfuscated domain in addition to the defanged one). I see no reference to the IDN syntax you mentioned.
A sample of the debug output (tweaked to convert the SHY to a space so it is distinguishable on the web):
$ grep obinemedic ~/url.eml.output |grep -i uribl |sed 's/r.obin/r obin/'
Oct 8 15:06:55.950 [1570] dbg: dns: providing a callback for id: 64792/r obinemedic.ru.multi.uribl.com/A/IN
Oct 8 15:06:55.950 [1570] dbg: async: starting: URI-DNSBL, DNSBL:multi.uribl.com.:r obinemedic.ru (timeout 15.0s, min 3.0s)
Oct 8 15:06:55.955 [1570] dbg: dns: providing a callback for id: 40779/robinemedic.ru.multi.uribl.com/A/IN
Oct 8 15:06:55.955 [1570] dbg: async: starting: URI-DNSBL, DNSBL:multi.uribl.com.:robinemedic.ru (timeout 15.0s, min 3.0s)
Oct 8 15:06:55.985 [1570] dbg: uridnsbl: domain "robinemedic.ru" listed (URIBL_DBL_SPAM): 127.0.1.2
Oct 8 15:06:55.987 [1570] dbg: uridnsbl: domain "robinemedic.ru" listed (URIBL_AB_SURBL): 127.0.0.102
Oct 8 15:06:55.988 [1570] dbg: uridnsbl: domain "robinemedic.ru" listed (URIBL_WS_SURBL): 127.0.0.102
Oct 8 15:06:55.988 [1570] dbg: uridnsbl: domain "robinemedic.ru" listed (URIBL_JP_SURBL): 127.0.0.102
Oct 8 15:06:55.989 [1570] dbg: uridnsbl: domain "robinemedic.ru" listed (URIBL_SC_SURBL): 127.0.0.102
Oct 8 15:06:56.121 [1570] dbg: async: completed in 0.162 s: URI-DNSBL, DNSBL:multi.uribl.com.:robinemedic.ru
Oct 8 15:06:56.121 [1570] dbg: uridnsbl: domain "robinemedic.ru" listed (URIBL_BLACK): 127.0.0.2
Oct 8 15:06:56.122 [1570] dbg: async: completed in 0.167 s: URI-DNSBL, DNSBL:multi.uribl.com.:r obinemedic.ru
Oct 8 15:06:57.980 [1570] dbg: async: timing: 0.162 . DNSBL:multi.uribl.com.:robinemedic.ru
Oct 8 15:06:57.980 [1570] dbg: async: timing: 0.167 . DNSBL:multi.uribl.com.:r obinemedic.ru
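
For anyone who wants the gist without reading debug logs, here is a rough Python sketch of the double lookup described above. It is NOT SpamAssassin's code (SA is written in Perl); the function name is hypothetical, and the zone default is simply the one that appears in the log. It only illustrates querying the domain both as written and with the SHY removed:

```python
SHY = "\u00ad"  # soft hyphen, U+00AD (decimal 173)


def lookup_names(domain: str, zone: str = "multi.uribl.com") -> list[str]:
    """Build blocklist query names for a domain, as written and de-SHY'd."""
    candidates = [domain]
    stripped = domain.replace(SHY, "")
    # Only add the second lookup when stripping actually changed something.
    if stripped != domain:
        candidates.append(stripped)
    return [f"{name}.{zone}" for name in candidates]
```

A domain containing a SHY yields two query names and a clean domain yields one, which matches the pairs of lookups visible in the output.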

--
Use my userscript to add story images to Slashdot. There's no going back.