Domain: code-is-law.org
Stories and comments across the archive that link to code-is-law.org.
Stories · 2
-
Hotmail & Yahoo Mail Using Secret Domain Blacklist
Frequent contributor Bennett Haselton writes: "Hotmail and Yahoo Mail are apparently sharing a secret blacklist of domain names such that any mention of these domains will cause a message to be bounced back to the sender as spam. I found out about this because - surprise! - some of my new proxy site domains ended up on the blacklist. Hotmail and Yahoo are stonewalling, but here's what I've dug up so far - and why you should care." Read on for much more on how Bennett figured out what's going on, and why it's a hard problem to solve.
On December 7th I sent out a normal batch of emails to the Circumventor mailing list, where I send out new proxy sites for getting around Internet filters. I registered seven new domains and sent each domain to one seventh of the list; the list contains about 420,000 addresses, so each one went to about 60,000 people. (Each new site is only sent to a random subset of the list, so that a blocking company can't just subscribe one address to the list and block all new sites as soon as they're mailed out.)
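The random split described above can be sketched in a few lines of Python. This is only an illustration of the idea, not the script actually used for the Circumventor list: the function name `split_into_batches`, the placeholder addresses, and the placeholder domain names are all made up for the example.

```python
import random

def split_into_batches(addresses, domains, seed=None):
    """Shuffle the addresses, then deal them out so each domain gets a near-equal share."""
    rng = random.Random(seed)
    shuffled = list(addresses)
    rng.shuffle(shuffled)
    batches = {domain: [] for domain in domains}
    for i, addr in enumerate(shuffled):
        # Round-robin over the shuffled list: address i goes to domain i mod 7
        batches[domains[i % len(domains)]].append(addr)
    return batches

# Placeholder data: 420,000 invented addresses and seven invented domain names
addresses = [f"user{i}@example.com" for i in range(420_000)]
domains = [f"site{n}.example" for n in range(1, 8)]

batches = split_into_batches(addresses, domains, seed=1)
print({domain: len(batch) for domain, batch in batches.items()})
```

Because every address lands in exactly one batch, a blocking company that subscribes a single address learns about only one of the seven new sites per mailing.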
The list also consists entirely of verified-opt-in addresses, meaning that a new subscriber has to reply to a confirmation message in order to be added to the list. That's considered the gold standard for responsible mailing, but major email providers keep finding new ways to block the emails as "spam," which sometimes provide interesting insights into how the filters work behind the scenes.
After the last mailing, for example, all of my newly registered domains got disabled by the registrar because two of the domains had been incorrectly blacklisted by the Spamhaus Domain Block List. It took two days to discover the problem and then several hours to trace the problem to Spamhaus, although once I found Spamhaus's automated form I was able to get the domains un-blacklisted immediately. So the registrar re-enabled the domains a few hours later, although the traffic to the domains never returned to its previous levels. Spamhaus, meanwhile, continues to claim the DBL is a "zero false-positive" list, and has yet to acknowledge the error or contact me to help get to the bottom of how it happened. Well, they know how to reach me.
At least this time around, my domains didn't get disabled. Instead, the messages rolled out for a few hours with no problem (replies from users indicated that at least some hotmail.com and yahoo.com users were receiving them), until bounces abruptly started coming in from hotmail.com and yahoo.com addresses saying:
----- Transcript of session follows -----
... while talking to mta5.am0.yahoodns.net.:
>>> DATA
<<< 550 Message Contains SPAM Content
554 5.0.0 Service unavailable
After my address had been pummeled with bounce messages (to the point where my own Gmail account started bouncing because it was getting hammered with so many bounce messages from Hotmail and Yahoo) and the dust had finally settled, I tried reproducing the error by sending test messages from my server's IP address to a test Hotmail account. It turns out that out of the seven different URLs that I had been mailing to our users, four of the domains in those URLs would generate a "550 Message Contains SPAM Content" error when sent from my IP to a Hotmail address, and the other three did not. The message didn't have to contain the banned domain in the From: address; the message would get blocked if it even mentioned the domain anywhere in the message body. (This only happened when sending from my own IP address at peacefire.org. It didn't happen if I tried sending a message from my Gmail account to a Hotmail address, even if the message contained one of the four banned domain names, so the issue probably won't reproduce if you try sending a test message yourself.)
But interestingly, Yahoo Mail started bouncing my messages at about the same time — out of the seven domain names, the same four domain names were being bounced by Yahoo Mail as by Hotmail, also with the error "550 Message Contains SPAM Content." That's far too unlikely to be a coincidence, so it looks as if Hotmail and Yahoo Mail are using a common secret blacklist of domain names that cause a message to be blocked as spam. (As it happens, the other three domains were also being bounced by Yahoo Mail with the error "Message Contains SUSPECT Content" — as opposed to "SPAM Content" — while those three domains were not blocked by Hotmail at all. That of course is aggravating, but the real clue lies in the fact that both Yahoo Mail and Hotmail were giving "SPAM Content" errors to the exact same subset of domains.)
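The comparison above (pull the rejection reason out of each bounce, then see which domains each provider flagged) can be sketched as follows. This is a rough illustration, not the procedure actually used: the helper names are invented, and the `domainN.example` names are placeholders standing in for the seven unpublished domains. Only the error strings and the sample transcript come from the bounces quoted above.

```python
import re

# Server replies in the bounce transcript are prefixed with "<<<"
REPLY = re.compile(r"^<<<\s*(\d{3})\s+(.*)$")

def rejection_reason(bounce_text):
    """Return (code, text) of the first 5xx server reply in a bounce transcript, or None."""
    for line in bounce_text.splitlines():
        match = REPLY.match(line.strip())
        if match and match.group(1).startswith("5"):
            return int(match.group(1)), match.group(2).strip()
    return None

def spam_flagged(results):
    """Domains whose test message was rejected with a 'SPAM Content' reason."""
    return {d for d, text in results.items() if text and "SPAM Content" in text}

sample = """----- Transcript of session follows -----
... while talking to mta5.am0.yahoodns.net.:
>>> DATA
<<< 550 Message Contains SPAM Content
554 5.0.0 Service unavailable"""
print(rejection_reason(sample))

# Placeholder domain names; None means the message was accepted
names = [f"domain{n}.example" for n in range(1, 8)]
SPAM = "Message Contains SPAM Content"
SUSPECT = "Message Contains SUSPECT Content"
hotmail = {d: (SPAM if i < 4 else None) for i, d in enumerate(names)}
yahoo = {d: (SPAM if i < 4 else SUSPECT) for i, d in enumerate(names)}

print(spam_flagged(hotmail) == spam_flagged(yahoo))
```

The "SUSPECT Content" rejections are deliberately left out of the comparison, since the clue is that the "SPAM Content" subsets match exactly.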
I don't want to publish the list of all seven domain names here, so as not to make it too easy for censorware companies to block them all, but one of the four blacklisted domains was 'golflanding.com.' (All of the new domains I register are nonsensical two-word combinations, since those are the only .com domains that are likely to be (1) still available and (2) easy to remember.) As soon as it seemed like Hotmail and Yahoo Mail were working off of a common blacklist, I checked to see if Spamhaus had screwed up again and listed our domains, but none of the seven domains were on Spamhaus's lists.
I looked up golflanding.com on the blacklistalert.org service, which checks against all major spam blacklists, but no hits were listed there either (except for on some defunct services which haven't been updated in years).
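For readers who want to script a check like the Spamhaus lookup described above, here is a minimal sketch. It assumes the usual convention for domain blocklists queried over DNS: append the list's zone to the domain, look up the resulting name, and treat "does not resolve" as "not listed." The helper names are invented, and the treatment of `127.0.1.x` answers as "listed" is an assumption about the DBL's return codes that should be checked against Spamhaus's own documentation. Queries sent through large public resolvers may also be refused, so a `False` result is not proof of anything.

```python
import socket

DBL_ZONE = "dbl.spamhaus.org"  # zone of the Spamhaus Domain Block List

def dbl_query_name(domain, zone=DBL_ZONE):
    """Build the DNS name to look up: the domain itself followed by the list's zone."""
    return f"{domain.strip().rstrip('.').lower()}.{zone}"

def is_listed(domain, zone=DBL_ZONE):
    """True if the lookup returns a 127.0.1.x answer (assumed to mean 'listed')."""
    try:
        answer = socket.gethostbyname(dbl_query_name(domain, zone))
    except socket.gaierror:
        # Name did not resolve; this also covers resolver failures
        return False
    return answer.startswith("127.0.1.")

print(dbl_query_name("golflanding.com"))
```

Calling `is_listed("golflanding.com")` would perform the actual DNS query; it is left out of the example so the sketch runs without network access.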
So if Hotmail and Yahoo Mail are both using the domain blacklist, perhaps it's a list compiled by one company and then licensed to the other, or perhaps it's a third-party list not widely known to the public. (Hotmail uses their own SmartScreen filter, but I've found nothing online about Yahoo using it as well.) It's conceivable that one or more of the domains might have gotten blacklisted as a result of Hotmail or Yahoo users clicking their "This is spam" button. However, Hotmail allows newsletter publishers to view data about what percent of their messages to Hotmail users are being flagged by users as "spam," and when I looked up the stats for our IP, they showed a "complaint rate" of less than 0.1% (usually the result of people hitting 'Junk Mail' to unsubscribe from the list). Assuming that the complaint rates are similar for Yahoo Mail, it's unlikely that the domains got blacklisted as a result of user complaints, unless the blacklist trigger has a ridiculously low complaint threshold.
Neither the Hotmail postmaster site nor the Yahoo postmaster site mentions anything about a list of domain names that could cause a message to be blocked for mentioning the domains in the message body. Yahoo Mail does provide a support form for newsletter publishers to send inquiries about why their mail is being blocked; I submitted that on Saturday and started a thread with email "support," although so far their response has just been to copy and paste articles from the Postmaster site, with tips like "Send email only to those that want it." Each time, I reply saying, No, this is not the problem, the problem is that the domains in the messages are getting incorrectly blacklisted, and each time, support cheerfully sends me another article. If I'm not literally talking to a bot, I might as well be.
I opened a similar ticket with Hotmail, and they sent me a form letter saying that the emails were being blocked because of SmartScreen, and that as a matter of policy, they would refuse to fix any errors being made by the SmartScreen filter. Waiting to see if I get a reply from a human next.
So why should you care? Well, for one thing, if you care about users in China and Iran being able to receive proxies to get around their Internet blockers, right now Hotmail and Yahoo are thwarting these proxies more effectively than those countries' own censors are. Yes, these are real people who really do write back to me after a mailing goes out, telling me about how they were able to use the proxies to receive banned political information, and sometimes how long the proxy lasted before the censors blocked it. This week, they had to do without.
But more importantly, this is an example of a general problem: That there are certain types of issues, like blocking of legitimate mail by spam filters, where the "free market" does not deliver the best experience to consumers, and the costs get passed on to everybody. Sometimes the problems could be solved with some effort, but the effort does not get made, because people believe that the free market will solve the problem, or that it already has.
In theory, if consumers have enough information about different companies and their services, the companies can compete to provide the best product to users. The problem is that if one type of information is systematically hidden from users — in this case, the fact that their mail provider is blocking mails from reaching them — then the "theory" falls apart. Since spam getting into your inbox is a visible problem, but missed email messages are an invisible problem, Hotmail's incentive is not to give the user the best experience, but rather to err on the side of blocking legitimate messages — even if the user might prefer to get slightly more spam, than to miss one important email that they were waiting for.
This means we're not just talking about a few messages getting caught in filters, which could happen even in an efficient marketplace. We're talking about a permanent equilibrium where the user gets a sub-par experience by default — a trade-off that causes them to miss more messages than they want to — and senders have to pay the cost of overcoming the marketplace inefficiencies. (Which means if the sender is a business you buy from or a charity you support, the costs get passed on to you.)
Pretty much the entire financial cost of sending email is attributable to the failure of the "free market" to motivate email providers to deliver non-spam emails into their users' inboxes. If a company or organization uses an email list hosting company like AWeber or Constant Contact to email their users, they pay a fee of about $1 per month for every 100 users on their list (which would run me about $4,000 per month). That fee doesn't go towards bandwidth: even a 1-million-subscriber list, emailed once a month, would use less than 3 GB per month of bandwidth, which is what GeoCities was giving away for free 10 years ago. What you're paying for is the fact that AWeber and Constant Contact have friends in the right places at Hotmail, Yahoo, and Gmail, so if your mails are getting blocked, they know the people to call to fix the problem. If you run your own list instead of paying a hosting fee to AWeber or Constant Contact, you'll end up paying other costs indirectly, through loss of income when your messages don't reach recipients, or in time and money spent trying to fix the issue. (I have to take this option anyway, since I send different URLs to different random subsets of my list, which is not supported by AWeber or Constant Contact.)
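The arithmetic behind those two figures is easy to check. The sketch below uses only the numbers quoted above (about $1 per 100 subscribers per month, a 420,000-address list, and a 3 GB monthly budget for a 1-million-subscriber list); the function names are made up for the example, and 1 GB is taken as 10^9 bytes.

```python
def monthly_hosting_fee(subscribers, dollars_per_100=1.00):
    """Hosting fee at roughly $1 per month for every 100 subscribers."""
    return subscribers / 100 * dollars_per_100

def bytes_per_message(monthly_bytes, subscribers, mailings_per_month=1):
    """Average message size that fits within a given monthly bandwidth budget."""
    return monthly_bytes / (subscribers * mailings_per_month)

fee = monthly_hosting_fee(420_000)
size = bytes_per_message(3 * 10**9, 1_000_000)
print(f"hosting fee: ${fee:,.0f} per month")
print(f"3 GB across 1,000,000 messages: {size:,.0f} bytes per message")
```

The fee comes to $4,200 per month, consistent with "about $4,000," and the 3 GB figure corresponds to an average message of about 3 KB, which is plausible for a short plain-text mailing.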
On the other hand, if the market actually "worked" — if email providers did reliably deliver non-spam messages to their users — a company or charity could run their own list for virtually zero cost, and would be able to keep all of that money. (I incur no up-front fees for running my own list; all of the costs are the time spent trying to get Yahoo, Gmail, and Hotmail to stop blocking it.) So every time you donate to a charity or buy from an online retailer, a little bit of that money goes towards the cost of that organization having to fight past marketplace failures in order to get their email to you.
I don't think there's an easy algorithmic solution, like crowdsourcing Facebook complaints or using random-sample voting on Digg. Generally, I just think we need more awareness of the fact that, under certain conditions (including those surrounding email deliverability), the "free market" is virtually guaranteed to arrive at a non-optimal solution. One manifestation of that awareness would be if Hotmail, Yahoo Mail, and Gmail created public points of contact where legitimate email publishers could find out why their emails were blocked, and had real humans responding to the messages and fixing the problems. By default, the imperfect information in the marketplace leads toward an equilibrium that errs on the side of blocking too much legitimate email, so anything that pushes the equilibrium back towards more legitimate messages getting delivered will improve the experience for users and lower costs for senders.
Besides, there's a more basic ethical issue here. If you're Hotmail and you tell your users that you're providing them with "email accounts," then those users expect those accounts to work — including having the ability to receive mails from mailing lists that they've signed up for. Helping legitimate emails get through to users is not just a matter of addressing a marketplace inefficiency, it's a matter of honesty.
Larry Lessig's book "Code and Other Laws of Cyberspace" describes how default choices built into the architecture of the Internet and other environments (the "code") can steer our behavior in ways that we might not choose otherwise. I'm making essentially the same point in saying that some problems are not fixed by market forces, because people are not aware of the problem at all. I think the evidence and the reasoning are straightforward in this case, but it's hard to convince people who have adopted it as an axiom that whatever the free market arrives at must be the solution. My favorite single sentence in Lessig's book was, "Put your Ayn Rand away." I could imagine the years of pushing against dogmatic fanaticism that led him to write that sentence, and I knew how he felt.
-
Code and Other Laws of Cyberspace
Lawrence Lessig - the name may be familiar from the Microsoft trial - has written an excellent book, which I've taken my time reviewing because I felt I had to read it twice to grasp the full import. Code and Other Laws of Cyberspace covers the real future of your liberties on the internet, and it is not a happy book.
Code and Other Laws of Cyberspace
author: Lawrence Lessig
pages: 297
publisher: Basic Books
rating: 10/10
reviewer: Michael Sims
ISBN: 0-465-03912-X
summary: A gloomy look at the forces which shape the internet.
Slashdot isn't the first to review this book. Declan McCullagh (Wired), Andy Oram, and Carl Kaplan (NY Times) have all taken a look at it; Lessig has been interviewed; there's an audio debate (mp3 format) between Lessig and McCullagh; and at least a couple of other places have mentioned it. It is, at this writing, 134 on Amazon.com's best-seller list. I was privileged enough to receive a review copy of the book some time ago, but my review has been delayed because the book is too deep to easily sum up. It's a book about law, and about policy, and about the internet, which doesn't require any grounding in any of the above, and it seems like it would be appropriate for people at almost any level of knowledge - if you know more, you'll get deeper insights, and if you know less, you'll get the basics. A fractal book, in other words. An almost philosophical work, disguised as a law book.
To start with, Lessig's book is a counter to John Perry Barlow's Declaration of the Independence of Cyberspace. Barlow had a good idea, a good goal, but he was totally and completely wrong about how to achieve it, and his declaration and the mindset it embodies have done and will do great harm to the future of civil liberties on the internet.
Cyberspace is not and has never been independent of real life, or of government. What it has been is a place where the rules of real life were hard to enforce. That doesn't mean that the rules don't exist - just that it has been hard to make people obey them. The problem for people, like me, who like this state of affairs, this lack of enforcement, is that there's no reason cyberspace has to remain in its current state.
Cyberspace wasn't designed to enforce real-world rules. Such enforcement wasn't built in to the code that runs the internet, was consciously avoided in the early internet designs, and therefore regulators have been working in an environment unfriendly to them. Copying of digital works is easy. Transmitting and receiving content, even forbidden content, is easy. Etc.
But just because it was designed that way once does not mean that it need be that way in the future. There are tremendous forces (business and government) that would prefer an internet which is friendly and cooperative to regulators. The people building the internet of tomorrow are not professors and geeks, they're CEOs and, to a lesser extent, bureaucrats. If the architecture of the internet is "adjusted" to favor regulation instead of disfavor it - and the current internet builders all have reasons to favor regulability - regulating behavior on the internet is not impossible, it's trivial. Lessig has a short chapter on "is-ism", the belief that just because something is, so must it always be. Applied to the internet, this is "We are free, and will always be so." Wrong, wrong! The internet is totally man-made, and what man has made, man can change.
It is hard for me (or Lessig) to emphasize this point too much: the people who claim that we should keep our hands off the internet are completely playing into the hands of government and business. While the net-libertarians have buried their heads in the sand, the net is being changed, constantly, to favor regulation by business and by government.
Lessig takes a look at the infrastructure of the internet and how it is changing for the worse. There's another terrible flaw in thinking about the internet, which runs roughly: "whatever restrictions are placed, someone of technical competence can get around them". This is not true, not if the architecture is designed to support those restrictions rather than oppose them.
The internet, says Lessig, is about to "flip" from "unregulable" to "totally regulable". When that occurs (neither Lessig nor I think there's an "If" involved), who will be regulating the place? Currently corporations, with guidance from government - guidance coming in the form of regulations like CALEA, which make demands not on individuals, but on the code. Once the code is altered to be conducive to regulation, regulation follows naturally.
Lessig makes a great point about open source software. Closed source code which incorporates regulation (censorware is the easiest example, but there are many others) means that the people who are regulated can't even tell exactly what regulation is occurring. When the source code is available, you can at least tell exactly what you can and cannot do, or exactly how your privacy is being infringed. Open source code is inherently less suited to enforcing regulation on users.
I can't do justice to the book without rewriting it. Lessig is deeply skeptical about the ability of the U.S. government to initiate policies which promote, rather than denigrate, the civil liberties we have come to take for granted in cyberspace. Government is busy selling off our freedom to corporations through mechanisms such as ICANN. But no one else is going to do it - and with a government actively hostile to liberties or even one that adopts a hands-off approach, freedom in cyberspace is headed downhill at a tremendous pace.
I recommend this book to almost anyone who cares about the future of the internet. It's well-written - he's a good teacher. It's got some awesome examples - like how Communist Vietnam is more effectively libertarian than the U.S., because it doesn't have the infrastructure of control that we do. It is a scholarly work, but the footnotes are pushed off to the end - they alone are worth the price of the book to a serious student, but someone looking to just read can skip them without problems. It's a deep and thus far unmatched view of what will shape the net of tomorrow, the most inspiring book I've read this year.
Some of Lessig's other papers and articles are available on his home page. The book has a promotional website as well, available at code-is-law.org or what-declan-doesnt-get.com.
Pick this book up at fatbrain.com.