Has Google Broken JavaScript Spam Munging?
Baxil writes "For years now, Javascript munging has been a useful tool to share email addresses on the Web without exposing them to spammers. However, Google is now apparently evaluating Javascript when assembling summary text for web pages' listings, and publishing the un-munged email addresses to the world; and spammers have started to take advantage of this kind service." Anyone else seen this affecting their carefully protected email addresses?
You keep using that word. I do not think it means what you think it means.
CAn'T CompreHend SARcaSm?
Seriously, cue the "obfuscation != security" thing. If your email address is carefully protected, it is not displayed on a web page, obfuscated or not.
Really, with the development of better OCR technologies and such comes the end of e-mail security by obscurity. If you don't want spam, either A) have a decent spam filter (I don't think a single piece of spam has gotten through Gmail's filter, and I've had only one false positive) or B) don't share your e-mail address. Those are the only two ways to prevent spam that will keep working.
Taxation is legalized theft, no more, no less.
That should be the title. That is, if it were newsworthy. Which it isn't.
Error 001
Security Scan and Virus Detection do not work with your operating system.
They're also parsing hex/decimal character entity armoured e-mails in exactly the same way. While not as safe as JavaScript, these have been mostly-invulnerable to spambots as well and are used by default in some web-based applications, like the Mercurial hgweb.cgi/hgwebdir.cgi scripts.
This can easily be fixed, and should be right away. If Google is turning JavaScript into text output, they can easily parse that output (just like the spammers currently are) and see if the text contains an e-mail address. And if it does, they should omit it from search results (unless the address was originally plain text and not obfuscated, in which case they can assume the author wants it searchable).
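A minimal Python sketch of the check proposed above, assuming the indexer has both the raw HTML and the text it got after running the page's scripts. The function name `redact_generated_addresses`, the placeholder text, and the deliberately loose address regex are all hypothetical; nothing here reflects how Google actually builds its summaries.

```python
import re

# Deliberately simple pattern; real address syntax (RFC 5322) is far looser.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def redact_generated_addresses(raw_html: str, rendered_text: str) -> str:
    """Hide addresses that only appear after the page's scripts have run.

    Addresses already present as plain text in the raw HTML are treated as
    intentionally public and left alone. Anything that shows up only in the
    rendered output (JavaScript-assembled, entity-armoured, etc.) is assumed
    to be munged on purpose and is replaced with a placeholder.
    """
    plain = set(EMAIL_RE.findall(raw_html))

    def replace(match: re.Match) -> str:
        addr = match.group(0)
        return addr if addr in plain else "[address hidden]"

    return EMAIL_RE.sub(replace, rendered_text)
```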
Dear Google:
Welcome to the "Impossible to do anything right" club.
Regards,
Wal-Mart,
Microsoft,
G. W. Bush
WTF? Over?
Spammers know how to process JavaScript too. The benefits of having Google index the page as a client would see it far outweigh someone's belief that they were 'safe' from spammers.
Just because you disagree doesn't make it offtopic or flamebait.
Google's becoming a spammer's paradise. Gmail is quickly moving up the ranks as the mail service of choice for comment spammers (for account verification). You can see the top spam domains at StopForumSpam.com. I think gmail would be at the top except for others' longer history. Nearly all spammers nowadays use gmail on the forum I look after.
.. for two email addresses that have been posted (rendered through javascript) since early 2007. I am talking 100+ spams per day instead of 5-10.
Since the sites where the addresses are posted have not gone up in popularity, I was wondering what happened. This theory provides a plausible explanation.
JoeB
http://layoffsupportnetwork.com
So much content on the web these days is spat out by document.write(), I'm not surprised at all that google evaluates certain javascripts in order to get any content to index.
Ever done a "View Source" on a Google Mail or Google Maps page? The web is now JavaScript.
Nowadays, half of the pages I try to visit don't render at all without JavaScript. Sometimes the main content is missing (you just get the headline, the links that go on the sides, and the ads), sometimes it's just a blank page. It seems like all these traditional news organizations just _have_ to be "web 2.0" to appear relevant again.
Google needs to index the page, they don't have much choice.
--
Stay tuned for some shock and awe coming right up after this messages!
The spammers WILL get your email address. Be it web trawling, Google searches, or stealing email addresses off compromised computers, the spammers will get, and then resell, your email address.
Trying to keep the spammers from getting your email address is a lost cause, and not a battle worth fighting.
Test your net with Netalyzr
Your email address will almost certainly get out. If not by a spambot then through an unscrupulous merchant.
That's why spam filtering is better than email hiding. Gmail's spam filter, for example, is very good. I get spam in my Inbox about once a quarter.
Google's job is to turn human-readable pages into machine-searchable pages. So it will always seek to expand what it can read: images, Flash, JavaScript, etc.
It's best not to hide in the direction that technology is advancing.
I assume if you load your obfuscation code from script.js and put script.js in robots.txt that you will be safe, although that is sort of a pain.
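For reference, this is how a robots.txt rule like that is interpreted, checked here with Python's standard `urllib.robotparser`. The `/js/private/` path, the `crawler_may_fetch` helper, and the `example.com` host are made up for illustration. Keep in mind robots.txt is purely advisory: it may keep Googlebot from fetching the script, but a harvester that ignores it loses nothing.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt keeping well-behaved crawlers out of the directory
# that holds the de-munging script.
ROBOTS_TXT = """\
User-agent: *
Disallow: /js/private/
"""


def crawler_may_fetch(path: str, user_agent: str = "Googlebot") -> bool:
    """Return True if robots.txt allows user_agent to fetch this path."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(user_agent, "https://www.example.com" + path)
```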
What would be nice is if Google created a new attribute along the lines of rel="nofollow" that would be an inline way to keep the engine from seeing content.
When things get complex, multiply by the complex conjugate.
http://mailhide.recaptcha.net/
weinersmith
They're probably spidering the "generated source" of a page, which means any content rendered with JavaScript is now spiderable and indexible [sic, I'm sure] -- what your eyes can see, Google will index.
Google is doing a lot of new things now, like listening to audio files and changing speech to text. Complete parsing of SWF files, including media and XML files called by the SWF. They can pull text off of images as well.
If everything else seems to fail, try these convoluted, big captchas generated based on Graphviz graphs. Link : http://snowflakejoins.com/grapcha/index?text=slashdot
http://revj.sourceforge.net
Considering how many machines belong to one botnet or another, encrypting your email address in a web page doesn't protect it from a contact whose machine belongs, directly or indirectly, to one. As soon as you start using your email, the risk of ending up on some spammer's list starts to rise. That includes posting it in a web page under any encryption and then getting mail from a visitor (probably the main reason for posting the address there) whose machine is already owned.
... will it mung?
Privacy is terrorism.
A better method is to have a Contact Me form that doesn't display your e-mail address anywhere on it. Yes, you'll get spammers filling it out, but you can cut down on those with some simple techniques. For example, make a "Phone Number" field and set the CSS display attribute to none. Normal users won't see this field and won't fill it out. Spam-bots will see it and attempt to fill it out. Then, have your submission script silently fail to send to e-mail if the "Phone Number" is filled out. (If you toss an error, the spammer might figure out the trick.) No method is fool-proof, of course, but this is much better than putting your e-mail address on your webpage and hoping that someone doesn't de-mung it.
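A rough sketch of the server side of that honeypot, in Python. It assumes the form contains a hidden input such as `<input name="phone_number" style="display:none">`; the field name, the `handle_contact_form` function, and the `send_mail` callback are hypothetical stand-ins for whatever your form handler and mailer actually are.

```python
from typing import Callable, Mapping

HONEYPOT_FIELD = "phone_number"  # hypothetical name of the CSS-hidden field


def handle_contact_form(form: Mapping[str, str],
                        send_mail: Callable[[str, str], None]) -> str:
    """Forward a contact-form submission unless the honeypot was filled in.

    Humans never see the hidden field, so any value in it marks a bot. The
    same thank-you text is returned either way, so the bot gets no hint that
    its submission was silently discarded.
    """
    if not form.get(HONEYPOT_FIELD, "").strip():
        send_mail(form.get("from", ""), form.get("message", ""))
    return "Thanks, your message has been sent."
```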
My sci-fi novel, Ghost Thief, is now available from Amazon.com.
Like this is the only way to protect emails published on the web from spambots... I could list a few, but my favourite is to publish a well-done (not easily broken) captcha image on some host I have easy access to. If I want, I can just delete that image, or add an expiration timer so that after a few days the image won't show up anymore.
Google Wave may mean that web sites and blogs will be implemented as embedded Waves. The wave demo at http://wave.google.com/ shows how this would work for blog comments & galleries.
In this demo, they basically hint that because of this, Google is rethinking what embedding & javascript mean on a page because they envision a future where the content can and will live anywhere and won't be represented by static HTML.
As you point out, this is already happening, albeit to a lesser degree than I think Google anticipates.
In order to prevent SPAMbots once and for all, you should require that everyone interested in contacting you first drive to the next geohash http://www.wiki.xkcd.com/geohashing/Main_Page in the region of your choosing, wearing a lumberjack outfit and carrying a case of jolt cola.
Then, and only then, does the real quest begin...
-Taylor
Worldwide Military budgets: $2100 billion. Worldwide Space Exploration budgets: $38 billion. Really, world? Really?
When they learn to subtract pi, we're all hosed.
So, the ability to process JavaScript outside of a browser is somehow Google-specific?
Frankly, this was inevitable. If JavaScript is processed by a computer in one application, it can be processed by a computer in another application, and the latter may be more Evil(tm) than the former. So what if Google stops parsing JavaScript in their summaries? How hard is it for the spammers to get a parser of their own and not even touch Google's servers?
That's why I've never really trusted those munging hacks.
Demanding constant attention will only lead to attention.
My simple method seems to have held up pretty well: I just randomly use HTML character entities instead of the ASCII character in some spots, e.g. instead of "e", use &#101; or &#x65;.
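The entity trick can be sketched like this in Python (the `entity_encode` name is hypothetical). It also shows the weakness: the standard library's `html.unescape` restores the original address, and so does anything else that parses HTML.

```python
import html
import random


def entity_encode(address: str, rng: random.Random) -> str:
    """Randomly swap characters for decimal or hex HTML character references."""
    pieces = []
    for ch in address:
        choice = rng.randrange(3)
        if choice == 0:
            pieces.append(ch)                   # plain ASCII character
        elif choice == 1:
            pieces.append(f"&#{ord(ch)};")      # decimal, e.g. &#101; for "e"
        else:
            pieces.append(f"&#x{ord(ch):x};")   # hex, e.g. &#x65; for "e"
    return "".join(pieces)


if __name__ == "__main__":
    encoded = entity_encode("user@example.com", random.Random(42))
    print(encoded)
    # Any HTML parser undoes the armour, which is why it only stops naive bots.
    print(html.unescape(encoded))
```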
from 09 F9 11 02 9D 74 E3 5B D8 41 56 C5 63 56 88 C0
to 45 2F 6E 40 3C DF 10 71 4E 41 DF AA 25 7D 31 3F
How about "pay to email"?
I register with a pay-to-email site, and give it my actual email address. It gives me my new publicly visible email address. Anyone who wants to can send me an email through this service if they pay me an amount of money that I set. After I receive the email, I can refund the sender. The pay-to-email site takes a 10% cut on all un-refunded emails.
Sound like a winner?
Education is the silver bullet.
Do they mention these ever?
I assume that, if a human can figure out the e-mail address, a spammer can too. After all, if nothing else they'll simply hire an IT sweatshop over in Asia or Africa to scan the pages for addresses at a dollar an hour or a nickel an address. JS obfuscation doesn't even take that: if your browser can evaluate the JavaScript, then the spammer's page-scraping software can too. So I assume that the only obfuscation that'll work is one that renders a human unable to read the address, at which point why bother putting the address there at all? And if all else fails, the well-known spammer tactic of just shotgunning every possible e-mail address in a domain will find anything their other tricks didn't (just like the auto-dialers that dial every number in a given exchange will find even unlisted, unpublished, known-only-to-the-owner phone numbers).
The only viable defense is at the mail-server level. The spammers will get your address, so prepare your mail server to deal with them. Reject connections from known residential/dial-up netblocks that shouldn't be contacting your mail server directly. Apply SpamAssassin and other filtering to incoming mail. Use reliable blacklists (evaluate their policies yourself against your own tolerance for false positives, and remember that the spammers don't want you to use any blacklists because using them stops them from spamming). Use what your filters learn by blocking netblocks that generate too many filter-rejected messages. You can't stop them from sending that first SYN, but you can decide whether to answer with a SYN-ACK or an RST.
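A minimal Python sketch of two of the measures described above: refusing connections from listed netblocks, and adding a /24 to that list once it has produced too many filter rejections. The netblocks (RFC 5737 documentation ranges), the threshold of 20, and the function names are all assumptions for illustration; a real setup would do this in the MTA, a firewall, or via a DNSBL lookup rather than in application code.

```python
import ipaddress
from collections import Counter

# Hypothetical blocklist; documentation ranges stand in for real
# residential/dial-up netblocks or entries pulled from a DNSBL.
BLOCKED_NETS = [
    ipaddress.ip_network("192.0.2.0/24"),     # TEST-NET-1
    ipaddress.ip_network("198.51.100.0/24"),  # TEST-NET-2
]

REJECT_THRESHOLD = 20  # hypothetical: filter rejections before a /24 is blocked
rejections: Counter = Counter()


def accept_smtp_connection(peer_ip: str) -> bool:
    """Return False if the peer falls inside any blocked netblock."""
    addr = ipaddress.ip_address(peer_ip)
    return not any(addr in net for net in BLOCKED_NETS)


def record_filter_rejection(peer_ip: str) -> None:
    """Count a filter-rejected message; block its /24 once the threshold is hit."""
    # Assumes IPv4; a /24 is not a sensible unit for IPv6 peers.
    net = ipaddress.ip_network(f"{peer_ip}/24", strict=False)
    rejections[net] += 1
    if rejections[net] >= REJECT_THRESHOLD and net not in BLOCKED_NETS:
        BLOCKED_NETS.append(net)
```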
It's a hack. When moving technology forward, you need to pick your battles: should we hold back an improvement to this service just because it will break the hacks?
All in all, you are displaying text on a page. Google's job is to take text that humans can read and make it text that humans can find.
I agree, spam is a problem, but this kind of obfuscation will only get you so far. It's the same argument that can be made about MP3s: if you can hear it, we can steal it. Same goes for "if you can see it."
Spam stinks, but in the end, even with these tricks, you are making your address public. Public information will be harvested by mortals and robots alike.
I don't think the spammers got his email address from Google. I mean, to do that they'd have to send a fairly narrow query to Google -- something like 'chibi jesus' -- and then scrape the results ... just scraping the cached page wouldn't help -- that contains JS, not the email address. Plus, I imagine Google would notice if a bot started sending lots of search queries its way.
It's far more likely that spammer bots are now actively processing JS. As others on this thread have pointed out, it ain't hard to do.
Go somewhere random
Looks like you're already hosed.
We hope your rules and wisdom choke you / Now we are one in everlasting peace
What happened to "Do No Harm"?
I don't get any spam at my personal account. No blacklisting or bayesian filters necessary. I just don't give my personal e-mail address to companies, nor do I display it on the Internet. I also have a sneakemail address that I only give to companies, and that one actually doesn't receive spam either. Go figure.
History. I haven't updated my front page in years.
You last updated that page 8 months ago.
Like this:
www.certainkey.com/dm.
Needs some crypto computation to decrypt. User needs to click on a "Get my Email" button. Works on iphone.
I publicly list my email whenever I need to. If I want someone to email me something, I say, "Send it to itoltz@gmail.com". In fact, if HTML is allowed wherever I'm writing that, I'll even be so kind as to make it a mailto link (i.e. <a href='mailto:itoltz@gmail.com'>itoltz@gmail.com</a>).
And you know what? I almost never get spam in my inbox. I'd say a piece squeaks through Gmail's filters every few months (though when it does, I usually seem to get 2-3 similar spams over the course of a day or two).
Granted, not everyone has the option of using gmail, and for those who do not everyone is comfortable with the idea of using it. That's fine. But the point is, if gmail is that good at filtering out spam, anyone else can be too.
Isn't there a META tag that tells Google's bot not to archive your site (the Google cache) in the first place? I believe it's the noarchive directive: <meta name="robots" content="noarchive">.
Has this ever been done before: Instead of posting your email on a website, post a link to another website which stores contact information and require users to fill out a captcha before they see your email address. (I realize this is obtrusive, and time consuming. Just curious)
http://mailhide.recaptcha.net/
Problem solved, with the only remaining CAPTCHA that hasn't been automatically broken.
This only works for as long as spammers don't care about it. I think anyone who can figure out the HTML resulting from javascript, can also figure out the style of an element.
What's really funny about this problem is that we used to talk about using captchas to tell the robots apart from the meatbags, so that you could discriminate against robots. But now people want the robots to make sense of their page (so that they get referrals from Google) but they don't want the robots to make sense of their page (so that their email box doesn't get referrals from spambot). You're on the web or you're not. Choose.
"Believe me!" -- Donald Trump
I've found it rather effective so far to obfuscate my email address via an intermediate PHP or Perl script. You just have the script redirect to a "Mailto" location and the browser handles it normally. Unfortunately, the visitor will get a blank page, but bots seem confused by it. I guess it's only a matter of time until they figure this trick out, too; but so far I haven't gotten any spam in several years and have no spam blocking software. (Posting anonymously so no jerk-offs think it's funny to submit my address to spammers.)
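A sketch of such an intermediate script as a Python WSGI app. The address and the `mail_redirect_app` name are hypothetical. Note that the address still travels in the `Location` header of the response, so this only works against bots that don't follow or inspect redirects.

```python
from typing import Callable, Iterable

# Hypothetical address; it never appears in any page body a crawler sees.
CONTACT_ADDRESS = "someone@example.com"


def mail_redirect_app(environ: dict, start_response: Callable) -> Iterable[bytes]:
    """WSGI app that answers every request with a redirect to a mailto: URL.

    The browser hands the mailto: URL to the user's mail client; the page
    itself stays blank, as described above.
    """
    start_response("302 Found", [("Location", "mailto:" + CONTACT_ADDRESS)])
    return [b""]
```

To try it locally, `wsgiref.simple_server.make_server("", 8000, mail_redirect_app).serve_forever()` serves it on port 8000.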
For everyone's information: the page the author links to as the one that has javascript munging also has a noscript tag with the email out in the open. Guess what Google and spammers' email-crawlers really do? ;)
Apple has "Mac vs PC", Microsoft has "Laptop Hunters", Linux has recession
Yay, Google. Judging by the responses I've seen so far, it seems most of us think this is a step forward for the search engine. That said, why don't we use this story as an opportunity to have a productive conversation about e-mail address security in a world where JavaScript's effectiveness is dwindling? Here's one from A List Apart that uses some fancy mod_rewrite stuff. http://www.alistapart.com/articles/gracefulemailobfuscation/ I know we've got a lot of geniuses and experts in here. Don't be modest! Show off how smart you are! And yes, the next brilliant security measure will someday be pummeled by a robot that some spammer puts together, but hell if that ain't just exciting! We're helping people build better, "smarter" robots, and criminals are some of society's greatest innovators.
You can still post your email address in a monospace font with the CSS line-height attribute set to zero pixels... This has the effect of displaying your email address on screen, but making it difficult for harvesters to grab.
something like:
<div style="font-family: 'courier new', courier, monospace; line-height: 0px;">
c a f s z c @ e s a e c m <br>
&nbsp;r y i h a h n t c p . o
</div>
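The two rows are just the address split by character position: even-indexed characters on the first row, odd-indexed ones on the second, each separated by spaces. The second row needs a one-column offset so its characters fall into the gaps of the first; since HTML collapses leading whitespace, a leading `&nbsp;` is one way to get it. A small Python generator, with hypothetical function names:

```python
def split_for_overlap(address: str) -> tuple[str, str]:
    """Split an address into two rows that interleave when overlaid.

    Even-indexed characters go on the top row and odd-indexed ones on the
    bottom row; the bottom row is shifted one column right with &nbsp; so
    that, in a monospace font, its characters land in the top row's gaps.
    """
    top = " ".join(address[0::2])
    bottom = "&nbsp;" + " ".join(address[1::2])
    return top, bottom


def render_div(address: str) -> str:
    """Wrap the two rows in a zero-line-height monospace div."""
    top, bottom = split_for_overlap(address)
    return (
        '<div style="font-family: \'courier new\', courier, monospace; line-height: 0px;">\n'
        f"{top}<br>\n{bottom}\n"
        "</div>"
    )
```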
I have had my email address on every single page of 4 medium-traffic sites (about 1,500 visits a day) in both plain text and in a mailto: link. I use google apps for domains for my email and I get a spam mail about once a month.
The issue here is overblown.
Place the javascript which deobfuscates the email address in a separate file and put that or the folder it's in in the robots.txt.
Seems to me that Google only produces that nice little page summary (which here included the guy's obfuscated email address) when you haven't put a page description META tag in the header. For some reason Google will use the FOOTER of the page if there is no description. MS Bing, however, does not use the META description, but seems to take anything similar to it in the body of the page.
sudo mount --milk --sugar
SpamGourmet - I can't begin to say how awesome this is.
Da Blog
Very simple solution, prosecute Google under DMCA for running a circumvention device! :)
Just un-munge on a mouseover.
This has broken CSS munging.
It is actually really easy: http://spidering-lessons.blogspot.com/2009/06/spidering-102-how-to-write-basic-script.html
There's thus no point whatsoever in any form of address obfuscation or munging: it's a complete waste of time indulged in only by the clueless, delusional few who haven't been paying attention to what's gone on during the past decade. What's truly ironic is how many of these people are actually running Windows and thus stand a reasonably good chance of having their own system be the point at which their address(es) are harvested.
A far better point to critique Google on would be their pointless munging of addresses in Usenet news articles -- spammers have had their own Usenet feeds for MANY years and all Google's done is make the archives less useful for everyone else.
Run the script on some event that Google will not emulate.
For example: [Write me] where the link has something like href="javascript:decodeMail();"
(And at best program the web form that will submit it to you on the server side without revealing your address ;-)
Well, I've got to get back to work. When I stop rowing, the slave ship just goes in circles.
I'm more concerned that the sponsored links featured in gmail have recently been featuring generic soma and viagra substitutes. How do I report these ads as spam ?
And no, I wasn't reading a spam message at the time.
I developed this technique independently some time ago. So far none of the obscured addresses have been exposed.
Since the Googlebot doesn't appear to download referenced JavaScript files, simply put the obscuring function into another file....
The text field below is a tool to create your own Javascript email address obscuring script. Enter your email address in the box and press the "OBSCURE!" button. You can then copy the resulting script and place it anywhere on your webpages where you want your email address to appear.
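A generator of the kind described above might look like the Python sketch below. The reversed-string scheme and the `make_obscuring_script` name are assumptions chosen for illustration, not necessarily what this commenter's tool produces. Anything that executes JavaScript, whether Googlebot or a harvester, will still recover the address; keeping the decoding logic in a separate file only helps against crawlers that don't fetch it.

```python
import json


def make_obscuring_script(address: str) -> str:
    """Return a <script> snippet that writes a mailto: link at page-load time.

    The address is stored reversed, so it never appears literally in the
    HTML source; the browser reverses it back before writing the link.
    """
    reversed_literal = json.dumps(address[::-1])  # quoted, escaped JS string literal
    return (
        '<script type="text/javascript">\n'
        f"  var a = {reversed_literal}.split(\"\").reverse().join(\"\");\n"
        "  document.write('<a href=\"mai' + 'lto:' + a + '\">' + a + '</a>');\n"
        "</script>"
    )
```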
Won't fool me twice, better safe than sorry, better safe than sorry.
Disclaimer: I am not god.
We may not be created equal
But we can be treated equal.