Microsoft Releases 'Caller-ID For Email' Specs
gfilion writes "Microsoft has released a draft specification for Caller-ID for email, 'to address the widespread problem of domain spoofing' - the concept is similar to SPF, but is using XML. There's already a Caller-ID to SPF converter in the works. A few weeks ago, Microsoft discussed compatibility between the projects with Meng Weng Wong (SPF's project leader), but most SPF users are against using XML, so nothing has come of it thus far." We recently covered a brief article mentioning Microsoft's anti-spam work, though this is a clearer indication of their intentions. Update: 02/26 21:36 GMT by T : NewsForge is carrying a brief article with FSF counsel Eben Moglen's take on the draft; Moglen says it is "encumbered with unclear and unnecessary patent license claims."
At least this is one area where MS will have a real problem using their monopoly to enforce a closed standard. A solution that doesn't work for people that don't use MS software just isn't going to fly.
Having done work on (opt-in) HTML newsletters for clients, I know that email clients used are really varied - more varied than web browsers for instance.
What's to stop a spammer from signing up for a free email account with a false name, blasting out a few thousand messages, dropping the account (it'll be closed for abuse anyway), wiping his hands and repeating?
True, I see how this may help stop some spam, but it also means (if I understood the article correctly) that everyone can find out where I mail from... and in some instances that could be a problem too.
Everything in the world is controlled by a small, evil group to which, unfortunately, no one you know belongs.
I've had the unfortunate experience of attempting to generate XML using Microsoft's MSXML object. What a piece of crap! In an attempt to completely abstract the format, the objects are obfuscated beyond reason. Even the simplest things require ridiculous complexity: just to escape-out special characters requires instantiating a new "entity" element in the middle of the text string element.
And I still haven't figured out how to make the thing give me a CRLF at the end of each element. No, XML doesn't require the whitespace, but it would have sure made it easier for my clients to read the file!
But the worst part is that I *succeeded* in using MSXML. Now, if I wanted to go back to just writing a text file (which I do!), I can't -- my code is tangled up in the objects to the point that it would take a complete rewrite.
That's the simple reason why, every time I hear about Microsoft doing something with XML -- like this proposal to use XML as part of email identification -- I cringe in ph33r.
Stressed? Me? Of course not. Stress is what a rubber band feels before it breaks, silly.
Why not have *real* caller-ID for email authentication? Before you can get on my white-list, you have to call a phone number for some sort of challenge-response
So every person that wants to email you, now has the added burden of phoning some system and following the voice menu options? I think that most people will simply not bother and won't send the email at all.
Email is a great tool and easy to use. Even existing challenge-response systems have been found to have many problems. Let's not ruin email, by taking away the best parts of it. Any authentication needs to be seamless and the details should be hidden from end-users.
As part of an overall spam identification and scoring system, the MS standard and the Yahoo proposed standard are both interesting pieces of the puzzle. They are hardly solutions to the spam problem in and of themselves, and unilateral implementation of either protocol as an absolute requirement for acceptance of incoming communication by either Hotmail or Yahoo would likely be met with a voracious subscriber backlash that would result in the decision being reversed within hours.
Did you consider that e-mail is used outside the US? I am certainly not going to pay for a trans-Atlantic call each time I want to send an e-mail to a new guy in the US. What about people that don't speak English? What about people who don't have a phone, or don't have a number on a system that supports caller ID? With the advent of IP phones, this will become more and more common.
I guess the Joe-Jobbers will be hard at work trying to find all the ways of spoofing SPF.
Zombie writers will be in even greater demand from the spam factories.
Apart from spammers using zombified users email accounts, are there any other possible ways around SPF?
Having read the executive summary and skimmed a few pages, the general precepts make sense.
At the very least, the transitional phase of mass implementation of SASL or similar (which IMO should be mandatory for mail servers anyway) is a Good_Thing_(tm)
Granted it will take a lot of time and effort for the second phase to be reached, but anything which cuts down on spam gets my vote!
smile, it makes everyone else wonder what you're up to
Not sure if this is mentioned in the .doc, but _ep.microsoft.com already appears to be doing this:
_ep.microsoft.com. 1H IN TXT "<ep xmlns='http://ms.net/1' testing='true'><out><m>" "<mx/><a>213.199.128.160</a><a>213.199.128.145</a><a>207.46.71.29</a><a>194.121.59.20</a><a>157.60.216.10</a><a>131.107.3.116</a><a>131.107.3.117</a><a>131.107.3.100</a>" "</m></out></ep>"
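For anyone who wants to poke at a record like this, here is a minimal Python sketch that pulls the addresses out of the XML and checks a connecting IP against them. The record text is the one from the lookup above, with the quoted chunks joined and stray whitespace removed. Reading <a> as "an address allowed to send outbound mail" is my interpretation of the record, not something checked against the draft, and allowed_addresses / is_listed are made-up helper names. The <mx/> element is ignored because resolving it would need a live DNS lookup.

```python
import ipaddress
import xml.etree.ElementTree as ET

# TXT record content from the post, with the quoted strings concatenated.
RECORD = (
    "<ep xmlns='http://ms.net/1' testing='true'><out><m>"
    "<mx/><a>213.199.128.160</a><a>213.199.128.145</a>"
    "<a>207.46.71.29</a><a>194.121.59.20</a><a>157.60.216.10</a>"
    "<a>131.107.3.116</a><a>131.107.3.117</a><a>131.107.3.100</a>"
    "</m></out></ep>"
)

# Namespace taken from the xmlns attribute in the record itself.
NS = {"ep": "http://ms.net/1"}


def allowed_addresses(record: str) -> set:
    """Return the literal addresses listed in <a> elements (hypothetical helper)."""
    root = ET.fromstring(record)
    return {
        ipaddress.ip_address(a.text.strip())
        for a in root.findall("./ep:out/ep:m/ep:a", NS)
    }


def is_listed(record: str, client_ip: str) -> bool:
    """True if client_ip appears among the record's <a> entries."""
    return ipaddress.ip_address(client_ip) in allowed_addresses(record)


if __name__ == "__main__":
    print(is_listed(RECORD, "131.107.3.100"))
    print(is_listed(RECORD, "10.0.0.1"))
```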
This is a good idea, and we (tinw) have discussed this many times before, and various implementations already exist (that is - verifying the sender domain, not the specific MS implementation).
Now, what bothers me is this line:
Microsoft believes that it has patent rights (patent(s) and/or pending applications(s))
Given the latest stories on how easy it is to patent everything "over there", I am pretty sure MS is granted this patent. Now I don't know about you, but this geek ain't licensing nothing from MS.
In the license Microsoft grants implementers there is the following nasty clause:
If you distribute, license or sell a Licensed Implementation, this license is conditioned upon you requiring that the following notice be prominently displayed in all copies and derivative works of your source code and in copies of the documentation and licenses associated with your Licensed Implementation:
"This product may incorporate intellectual property owned by Microsoft Corporation. If you would like a license from Microsoft, you need to contact Microsoft directly."
Isn't this incompatible with the GPL?
Rich
I think that's a pretty expansive definition of SPAM. Does everything annoying become SPAM? I see popups as advertising (and something that mozilla effectively killed for me), and SPAM as fraud.
If you blog it...
Sorry, I don't care what tools are available, parsing a comma delimited file when the records are reasonably simple in structure will always be easier. XML is really only useful when the data resists structure.
Documents are really the only place where I can see XML adding any benefit. ( Unless more bits in the stream are considered a benefit. )
XML is the best data format; unless your data needs to be read or written by a human or a computer.
Sort of. You don't REALLY need a DTD - you only need one if you are validating the XML. XML can still be used as a generic ad-hoc hierarchical data format... of course you'd only want to do so because by now XML parsers are pretty ubiquitous and it makes it as good a choice as P-lists, or any other ad-hoc format.
Assuming you don't have a DTD, you don't have a specification of what's in the files syntactically, let alone semantically. Maybe you can reverse engineer most of this (the tag "name" is likely to contain a name, etc.) but there will always be freakish exceptions and ambiguities that even DTDs and XML-Schemas don't address.
And the overhead of using XML is enormous.. All those possible encodings, character sets, namespaces, etc. S-expressions are really much, much nicer if you just want to parse without a formal syntax specification. And they've been around "forever".
Most irksome though, are so-called "XML databases".. Argh! I suppose the people who think that's a good idea also love "CSV databases" or "XLS databases"..
SCO employee? Check out the bounty
It's a fact of life that MS Exchange lives in corporate environments but ISPs and everyone use sendmail (or a sendmail derivative) for mail routing over the Internet.
It's actually in MS's interests to work with sendmail on an open protocol to do spam filtering properly (whatever that protocol is ultimately).
Remember that TCP/IP is an open standard and MS supports TCP/IP open protocols like FTP, HTTP, POP3, SMTP, etc. already in their products so this is no different.
Gentoo Linux - another day, another USE flag.
The skinny is: while spf on its own can't prevent zombies from sending mail, if the upstream host routes port 25 through its own servers it can control this.
For example, my upstream hosts, Nildram, block all port 25 traffic outbound and inbound unless and until they have checked your (static) ip for open-relay-ness and then put you on a whitelist.
If all ISPs were like that, and spf were to become widely adopted, spam would be toast.
J.
You're only jealous cos the little penguins are talking to me.
Oh Pleeeeeze yourself.
I ain't bashing Microsoft and I don't spell it with a '$' either. I've spent the last 14 years programming using their tools and operating systems, so quit with thinking i'm an OSS zealot.
So read my comment again - i'm not bashing them, and at least they're doing something about spam. But for such a simple datastream, with the throughput needed, it seems unnecessary to bloat it (cpu and memory wise) by having to use an XML parser, regardless of which evil/non evil company designed it.
Would YOU like your mail to be delayed because some bright spark decided to go all trendy and use XML in the mail processing rather than something which just does the job?
Basically, it's a very poor re-implementation of SPF, with all of SPF's disadvantages and none of its advantages.
Under the MSFT scheme, the TXT records are verbose, likely requiring several records where SPF will probably fit in one. They have a hare-brained scheme to parse Received: headers to get around certain problems. Their scheme is absurdly complex.
And neither SPF nor MSFT's scheme does anything about spam coming from <>, cracked Windoze machines, or "valid" throwaway accounts. They also make forwarding more difficult than it should be.
Is there a website out there that tracks the different technological solutions to spam, with pro/con explanations?
Wow. I looked at MS' proposal as well as SPF's, and darn if MS didn't do much better.
First: SPF's webpage is mostly slogans about how it makes the world better, but you have to dig around a lot to find out how their scheme works. Mostly you'll just find more of the same self-hugging and no real technical info.
Secondly: MS' scheme seems simple enough, just one addition to DNS (list those mailservers allowed to send mails from your domain), and a very nice, standard-compliant way of handling the mobile-user problem:
If you're away from home and you're sending from your name12@somefreemail.com account, and you want your From: line to be your standard Me.Myself@my-own-domain.cx, whatever actual account you're sending through, then just make sure that your Sender: is name12@somefreemail.com and you're set. This is a nice alternative if you can't list your freemail ISP's mailserver in your DNS (maybe you don't know its IP address, or it's changing all the time).
Maybe SPF's scheme is similar, but they sure didn't mention any Sender: header there. Seemed to be some home-cooked up non-standard header, and a lot of talking about forwarding not working etc.
The only thing I didn't like with MS' scheme is the XML thing, why would you want to put XML in your DNS records? Nothing else in DNS is XML. Oh well.
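The Sender:/From: trick described above boils down to picking which header names the party responsible for the message. Here is a rough Python sketch of that choice, covering only the two headers mentioned in this comment (the draft's actual selection rules may well look at more). responsible_domain is a made-up name, and the sample messages just reuse the addresses from the example above.

```python
from email import message_from_string
from email.utils import parseaddr


def responsible_domain(raw_message: str) -> str:
    """Domain of Sender: if present, otherwise of From: (simplified assumption)."""
    msg = message_from_string(raw_message)
    header = msg.get("Sender") or msg.get("From") or ""
    _, address = parseaddr(header)
    # Everything after the last '@'; empty string if there is no address.
    return address.rpartition("@")[2].lower()


ROAMING = (
    "From: Me Myself <Me.Myself@my-own-domain.cx>\n"
    "Sender: name12@somefreemail.com\n"
    "Subject: hello from the road\n"
    "\n"
    "Sent through the freemail provider.\n"
)

AT_HOME = (
    "From: Me.Myself@my-own-domain.cx\n"
    "Subject: hello from home\n"
    "\n"
    "Sent through my own server.\n"
)

if __name__ == "__main__":
    print(responsible_domain(ROAMING))
    print(responsible_domain(AT_HOME))
```

The receiving server would then look up that domain's published record and compare it with the IP address of the connecting host.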
Given the effectiveness of caller-id when it comes to the spammers of the phone world, I don't think it's the best model. Basically, caller-id allows anybody who has a PBX connected with digital trunks to the network to forge whatever caller-id information they want. Most telemarketers left it blank. Lots of legit companies send the id information for their main switchboard number, no matter what actual phone line the call is travelling down.
I haven't seen a mail filter that will bounce E-Mails based on whether or not they're encrypted to your obnoxiously large PGP key that takes 30 seconds to encrypt to on a 2GHz Pentium or signed by someone on your whitelist. I suppose one could be written...
I'm trying to teach myself to set people on fire with my mind... Is it hot in here?
There is something called copyright law. Microsoft or any other company cannot just go and resell your software on their own terms.
Unless you grant them a license.
Which appears to be precisely what their license requires you to do. It's not clear to me precisely what you're licensing to them, maybe it's just any patents you hold on the techniques used, but it doesn't say that. What it says is that you grant them an unlimited license to "make, use, sell, offer to sell, import, and otherwise distribute Licensed Implementations", which certainly sounds like you're giving them permission to do what they like with your software.
I may be misreading this, but that's what the plain language seems to say. I'd want to get a legal opinion before I'd interpret it any other way.
Note to ACs: I usually delete AC replies without reading them. If you want to talk to me, log in.
And that is why Microsoft is using it I'm sure. They have a bunch of nice GUI tools that parse XML, so anything they do now has to be XML.
It's the same as the way they do email. If I switch to source edit view, my simple text message (e.g. Got It.) balloons into ten lines of generated HTML gobbledygook. Yes, I really need to specify the font for *each* line...even the ones that are blank.
I really hope that the standard is not set by MS. Something very simple (this is who can transmit for this domain) could turn into something ugly. I can write SPF declarations by hand. Chances are that their XML declarations will be twenty times as long and will need tools to create them. Yes, the XML parsing tools are ubiquitous, but a simple format doesn't require a parsing interface to feed you info. I see no reason not to make a human readable interface.
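To put a rough number on the verbosity worry, here is a small Python sketch that turns a Caller-ID style XML record into an SPF-style line and compares the lengths. The XML layout mirrors the _ep.microsoft.com record quoted elsewhere in this thread, and the 192.0.2.x addresses are documentation placeholders. Mapping <mx/> to "mx", <a> to "ip4:" and ending with "-all" is my assumption about what a converter might emit, not the behavior of the converter mentioned in the story.

```python
import xml.etree.ElementTree as ET

# ElementTree spells namespaced tags as "{namespace}localname".
NS = "{http://ms.net/1}"

XML_RECORD = (
    "<ep xmlns='http://ms.net/1'><out><m>"
    "<mx/><a>192.0.2.10</a><a>192.0.2.11</a>"
    "</m></out></ep>"
)


def to_spf(xml_record: str) -> str:
    """Translate <mx/> and <a> entries into SPF mechanisms (hypothetical mapping)."""
    root = ET.fromstring(xml_record)
    mechanisms = []
    for element in root.iter():
        if element.tag == NS + "mx":
            mechanisms.append("mx")
        elif element.tag == NS + "a" and element.text:
            mechanisms.append("ip4:" + element.text.strip())
    return " ".join(["v=spf1", *mechanisms, "-all"])


if __name__ == "__main__":
    spf = to_spf(XML_RECORD)
    print(spf)
    print(len(XML_RECORD), "characters of XML vs", len(spf), "characters of SPF")
```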
from my understanding of the licence: If I want to implement a compliant implementation, I can go right ahead. (as long as I promise not to bother MS about patents that I might own on this technology).
If I then sell or distribute the software I wrote: Fine.
You however get to pay MicroSoft to use my software.
Oh, and they've included a GPL-incompatible advertising clause.
Doesn't sendmail already have a similar feature turned on by default? You have to explicitly enable "accept_unresolvable_domains" in your sendmail.mc file or mail from servers with no reverse lookups will be rejected.
According to
bash# for x in $(antiword callerid_email.doc); do echo $x; done|wc -l
this is a thirteen thousand word document.
Can someone explain in a sentence or two what's different about what MS is proposing and what sendmail already offers?
MIME isn't damage, MIME is a hack to fix the crippled SMTP message format. Maybe you are only interested in sending ASCII text messages-- and that's very hardcore of you and all-- but the rest of the world is interested in sending pictures, documents, text in languages other than English (well over a hundred, and you're fucking well right the "Standard" should support them), etc., and your underdeveloped message format just can't properly deal on its own. Maybe you should read up on the subject.
Text itself isn't a drawback-- XML is generally represented as text-- but a message format that is defined only for transmitting text just doesn't cut it now that we're out of Green Terminal Land and into the World Where People Use Computers to Do Stuff.
And you're missing the point of my remark about XML libraries. The problem is not that parsing email is hard, but that there's no standard for an internal representation of an email message, and if there was it would probably be completely non-interoperable with the rest of the world. XML has the DOM and SAX, among others. This means a whole world of functionality, in the form of libraries and technologies that understand XML via DOM or SAX, is available to the program author. You can transform the message into another format using XSLT, access and modify the message content and headers with XPointer, find references to and merge in external resources with XInclude, extend the message format using namespaces (thereby allowing anyone who doesn't care about your extension to safely ignore it), transform the message (with XSLT) into XHTML and provide rich formatting with CSS (both of which can be found in reusable libraries), and so on and so forth.
You use XML, you get all of the above essentially for free. You go with some application-specific grammar, and you can either limit your email to plaintext or you can reinvent all of those wheels. But I know how much you reet haxorz hate usability and interoperability... maybe we can hook you all up with some nice teletypes.
Yes, I have worked with real data. Why is it that so many people on slashdot assume that if someone disagrees with them that they must be ignorant?
By moving from comma-delimited to XML you don't solve the problem, you just move it. What happens if someone includes text in a record that just happens to close your field? I know there are answers to that, but they are not very different from those with comma separated lists.
BTW: To my knowledge Microsoft is the only developer brain-dead enough to try and solve the comma-in-a-field problem with quotes around the entry. But then again they are the ones who are trying to use XML for everything now, so I guess it fits.
The correct way to do it is escape them with slashes, which is way less complicated than you make it sound.
',' becomes '/,'
'/' becomes '//'
NEWLINE becomes '/n'
That's it! Any other escape sequences would just be for added human readability, and would be needed in XML for the same purpose.
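The three rules above are easy to sketch in Python. This follows the comment literally ('/' as the escape character, '/n' for a newline). The function names are made up for illustration, and malformed input is not rejected: a dangling '/' at the end of a line is silently dropped.

```python
def escape_field(value: str) -> str:
    """Apply the three rules: '/' -> '//', ',' -> '/,', newline -> '/n'."""
    return (
        value.replace("/", "//")  # must run first, before new slashes are added
        .replace(",", "/,")
        .replace("\n", "/n")
    )


def encode_record(fields: list) -> str:
    """Join escaped fields with commas; one record per line."""
    return ",".join(escape_field(f) for f in fields)


def decode_record(line: str) -> list:
    """Split a line on unescaped commas and undo the escapes."""
    fields, current = [], []
    chars = iter(line)
    for ch in chars:
        if ch == "/":
            # Consume the escaped character; '/n' stands for a newline.
            nxt = next(chars, "")
            current.append("\n" if nxt == "n" else nxt)
        elif ch == ",":
            fields.append("".join(current))
            current = []
        else:
            current.append(ch)
    fields.append("".join(current))
    return fields


if __name__ == "__main__":
    record = ["LUSER", "12", "Biff, Jr.", "a/b", "line1\nline2"]
    line = encode_record(record)
    print(line)
    print(decode_record(line) == record)
```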
Your comments really underscore my problem with XML. It claims to fix many problems, but in fact it just makes them more opaque. (Much like OOP, but that's another matter.)
At least you stayed away from the idiotic notion that I always hear about XML providing a standard format for structuring data. In reality it is no more standard than plain text. Which of these is correct?
<LUSER><UID>12</UID><NAME>Biff</NAME></LUSER>
<LUSER UID="12"><NAME>Biff</NAME></LUSER>
<LUSER UID="12"><NOMBRE>Biff</NOMBRE></LUSER>
And Isn't this easier to read?
LUSER,12,Biff
IMHO: XML is excellent in a DocBook-like implementation where the data will not fit into a clean record structure, but for all other implementations that I have seen it is snake oil. It's more difficult for humans, more difficult for machines, and claims to fix a lot of problems that it just sweeps under the rug.
BTW: I manage a data retention system (not a relational database) that stores about 50GB/day and has to be kept on local storage for a full month. The data is replicated between two remote locations and backed up daily. If I had to move the data from comma-delimited to XML, our costs would more than double for bandwidth, storage, and labor (switching tapes). That doesn't even include the extra processing that would need to be done to reference the data. I'm not sure my boss would call that "a few bytes".
XML is the best data format; unless your data needs to be read or written by a human or a computer.
Wow, that was weird. It looked fine in preview. Let's try this again...
The difference is that XML-handling libraries all handle this automagically
(usually by encoding angle brackets within text data). Yes, it's possible
to have a library that does other escaping schemes automatically, but
there's still the issue of human-readability...
> <LUSER><UID>12</UID><NAME>Biff</NAME></LUSER>
> <LUSER UID="12"><NAME>Biff</NAME></LUSER>
These will parse out to the same thing. And yes, if the records are all this
simple, and all *the same*, XML is unnecessary. But the minute the records
get even remotely complex, especially if some of the records have bits of
information that other records don't have, the human readability gets lost
in a sea of stuff like
LUSER 7125,Johnson,Biff,G.,,Jr.,,462-3203,44833
LUSER 6784,Johnston,Maria,,Taylor,,,468-1708,44833
Then you need a better structure than CSV. Is XML the only option? No.
But XML has the advantage of being fairly intuitive and strongly resembling
something (HTML) that everyone and his dog (thinks he sort of) knows.
<luser id="7125" zip="44833">
<name last="Johnson" first="Biff" middle="G." suffix="Jr."/>
<phone>462-3203</phone></luser>
<luser id="6784" zip="44833">
<name last="Johnston" first="Maria" maiden="Taylor"/>
<phone>468-1708</phone></luser>
Yeah, it's longer. One order of magnitude longer than the CSV, not so much
longer than some of the other options. In many circumstances, the extra
length is a good tradeoff. I don't understand the desire to bash XML every
time it comes up, just because it's a buzzword. Sure, it's a buzzword, and
using XML doesn't really add inherent value, but it doesn't detract, either.
I still maintain, it's a perfectly valid choice.
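As a rough illustration of the "libraries handle it" point, here is how the two <luser> records above could be read with Python's standard library. The records are wrapped in a <lusers> root (my addition) and each <name> is written as a self-closing element so that the snippet is well-formed. load_lusers is a made-up name.

```python
import xml.etree.ElementTree as ET

DATA = """
<lusers>
  <luser id="7125" zip="44833">
    <name last="Johnson" first="Biff" middle="G." suffix="Jr."/>
    <phone>462-3203</phone></luser>
  <luser id="6784" zip="44833">
    <name last="Johnston" first="Maria" maiden="Taylor"/>
    <phone>468-1708</phone></luser>
</lusers>
"""


def load_lusers(xml_text: str) -> list:
    """Flatten each <luser> into a dict; absent attributes are simply missing keys."""
    records = []
    for luser in ET.fromstring(xml_text.strip()).findall("luser"):
        record = dict(luser.attrib)               # id, zip
        record.update(luser.find("name").attrib)  # whichever name parts exist
        record["phone"] = luser.findtext("phone")
        records.append(record)
    return records


if __name__ == "__main__":
    for record in load_lusers(DATA):
        print(record.get("last"), record.get("maiden", "(no maiden name)"))
```

The optional fields (middle, suffix, maiden) never show up as runs of empty commas; a record that lacks one just has no such key.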
> BTW: I manage [stuff]
That's nice. I'm TCG at a public library. We work daily (as does every
library) with a format called "MARC Records", and let me tell you, XML
looks mighty attractive.
Cut that out, or I will ship you to Norilsk in a box.
The difference is that XML-handling libraries all handle this automagically
How is that different? I could write a library to parse a CSV in about 10 minutes. Oh wait that is different. How long does it take to write a decent XML library? How many lines create how many bugs?
I think your points are fair, and not knowing anything about "MARC Records" I can't really comment on how XML would work for it.
I believe that there are good applications for XML, but my reaction to it comes from the fact that people try to apply it in all sorts of places where it doesn't belong. (Like in a network protocol to validate emails) Bad programming bothers me because it makes bad programs that I may be forced to use at work. If this takes hold I will end up involved in tracking down email problems, and instead of being able to use a simple split command to break down the data I'll have to deal with mountains of useless tags.
My favorite mis-application of XML was made by Cisco for a network load-balancing device. They built an XML interface for bringing servers in and out of rotation, and it was the only way to automate the process. It never worked right, and even their own tools could never do the job reliably. I don't know how many hours I spent pulling my hair out on that one. A high-school kid should have been able to write that interface in 10 minutes, but using XML it was a nightmare.
We're probably closer in our thinking than our posts let on. I still don't see a single problem that XML solves for structured data, but for documents it has no equal. In the real world I'm sure there are all sorts of places where the line between structured data and document data is blurry.
BTW: I love your "Lamejoke Generator".
XML is the best data format; unless your data needs to be read or written by a human or a computer.