Did you even bother to read the article? Yes, it has to be labeled as spam, but the label isn't defined. As a matter of fact, the label is up to the spammer to decide! The FTC is PROHIBITED by this law from defining the label. So how are you supposed to filter out mail based on an arbitrary label defined by the sender?
I'm not happy with this law either. It's not going to reduce spam. However, not to be the total pessimist, I have always had the impression that most laws sketch out basic guidelines that are then spelled out explicitly by regulations from the executive branch (FTC, FCC, etc). Does it really explicitly say the FTC is prohibited from indicating how spam must be labeled? My interpretation is that the law's intention is to mandate the executive branch to spell out how messages should be labeled. I could be wrong.
CORBA, DCOM, Java RMI, .NET remoting, and similar technologies are tightly coupled and try to hide the network boundary. In contrast, the direction SOAP is headed is loose coupling with the network boundary explicit. This was all explained clearly in a talk from Don Box at the Microsoft PDC.
If A is a service that is used by B, C, ..., Z, tight coupling means that if A is changed, then it breaks B, C, ..., Z. SOAP, when used in a smart way, means looser coupling, so that A can be updated without breaking the dependent apps. Therefore, for services that could potentially be used by a large number of applications, loose coupling is an absolute necessity.
Hidden network boundaries mean that we don't plan to take the performance hit when a request must traverse a network boundary. With explicit network boundaries, the architect takes the performance hit into consideration. Requests that cross network boundaries have a large granularity, to lessen the performance impact of the network traversal.
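To make the granularity point concrete, here is a minimal sketch. It is not from Don Box's talk, and every class and method name in it is hypothetical. It only counts simulated round trips for a chatty, fine-grained interface versus a single coarse-grained request.

```python
class FakeRemoteService:
    """Stands in for a remote service; each public call counts as one round trip."""

    def __init__(self):
        self.round_trips = 0
        self._customer = {"name": "Ada", "street": "1 Main St", "city": "Springfield"}

    # Fine-grained, object-style interface: one round trip per field.
    def get_field(self, field):
        self.round_trips += 1
        return self._customer[field]

    # Coarse-grained, message-style interface: one round trip for the whole record.
    def get_customer(self):
        self.round_trips += 1
        return dict(self._customer)


def fetch_chatty(service):
    # Looks like ordinary attribute access, but each field crosses the network.
    return {f: service.get_field(f) for f in ("name", "street", "city")}


def fetch_coarse(service):
    # The boundary is explicit: ask once, get everything needed.
    return service.get_customer()
```

Both functions return the same record; the difference is that the chatty version pays for three network traversals and the coarse version pays for one.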
Search for some of Don Box's writings on the web for a better analysis of the direction of SOAP vs. the old object-oriented RPCs.
I've even saved about 2000 spam emails to help train the filters.
I think this is backwards. The real value of trainable filters, like the Bayesian filters, is to find the mail that is valuable to you -- in other words, to avoid false positives. When filters are customized to find the good email, they become more effective against spam because the spammers cannot anticipate how the filters will treat their spam. Examples of good words that the filter should search for include the names of your family members, the names of organizations that you are involved with, words in the signatures of your family and friends, words in your own signature (which is typically sent in a reply to a message you sent), and so on.
I have never understood how sender verification is a solution to the spam problem. It seems to me that it just legitimizes spam -- or, more correctly, makes spam look more inviting to "legitimate" marketers. Imagine having to opt out a few times a day, because every business everywhere now thinks that email marketing is legitimate. After all, they are following the law, many users don't seem to mind, and they include a valid link to opt out.
I suppose a sender verification system would have some impact on bad spam. But many people, even I to some extent, think potential anonymity is a "feature" of the email system, not a bug.
Well, he obviously knows what Fortune 500 CxOs are thinking. And, he sees a real market need for a highly reliable global inter-network. Let him go out and build one. If the market need is there, it should be a sure thing, right?
It's funny how he talks about the need for innovation. In other sentences he talks about the "early" days of the Internet, as if those days are past. If there is still room for a lot of innovation, then I would say we still are in the early days, relatively speaking. When we think of a mature infrastructure, we generally think of the end of innovation. So, which is it, Mr. Sclavos? Are we still in the early stages, where innovation is possible? Or are we at a mature stage, with little innovation likely?
Just for comparison, let's look at another commercialized network: the public switched telephone network (PSTN). There's been a lot of innovation there, hasn't there? Yeah, we have 64 kbps bandwidth, yet the audio bandwidth has been limited to 3 kHz for decades.
Verisign plans to monetize their responsibility for the .com and .net domains. Their claim is that this is good for users.
Fine.
Is it possible that other caching DNS servers could also redirect traffic from the Site Finder service to their own Site-Finder-like service? Couldn't AOL, MSN, Earthlink -- any ISP for that matter -- just set their caching DNS server to redirect from Verisign's Site Finder to their own search page? Assuming Verisign restores Site Finder, I would love to see that they are unable to make money from it, because every DNS server that sits between users and the Verisign servers redirects the Site Finder IP address.
<sarcasm>Come to think of it, that would be really innovative of AOL and MSN. And it would be good for users.</sarcasm>
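For what it's worth, here is a rough sketch of what such a rewrite step in a caching resolver might look like. This is an illustration under assumptions, not how any real DNS server is configured: the wildcard address is a placeholder from the documentation range (not the real Site Finder address), and the ISP search address and function name are made up.

```python
# Placeholder from the RFC 5737 documentation range; NOT the real Site Finder address.
WILDCARD_ADDRESSES = {"192.0.2.1"}

# Hypothetical address of the ISP's own search page.
ISP_SEARCH_ADDRESS = "198.51.100.7"


def rewrite_answer(addresses, substitute=None):
    """Post-process the A records a caching resolver received from upstream.

    If any record is a known wildcard address, the name did not really exist.
    In that case return [] (treat the name as nonexistent) or, if a substitute
    is given, point the user at that address instead.
    Otherwise pass the answer through unchanged.
    """
    if any(addr in WILDCARD_ADDRESSES for addr in addresses):
        return [substitute] if substitute else []
    return list(addresses)
```

Calling `rewrite_answer(answer)` restores the old "no such domain" behavior, while `rewrite_answer(answer, ISP_SEARCH_ADDRESS)` is the AOL/MSN-style redirect described above.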
I ordered a workstation from Sun just a couple of years ago. After ample time for the shipment to arrive, I still had not received it. I called up Sun's customer service, trying to find out what happened to my workstation. I was told something to the effect that the shipping company had taken responsibility for the shipment, and that I had to take it up with that company. That was all the help I could get from Sun. I concluded after this incident that Sun really only pays attention to their biggest customers. Little guys are ignored. I really could not believe how they just claimed that they had no further responsibility in the sale. The customer service representative even told me that if I didn't receive the workstation, Sun would not refund my money or send another workstation. Instead, she told me I had to work it out with the shipping company.
"As a heavy but non-technical computer user it has been extremely frustrating for me to encounter 404 errors. Naturally, they happen at the busiest times," said Roy S. Lahet, vice president of Planning for Mercy Behavioral Health. "Alternative suggestions instead of a project-stopping 404 is a welcome and functional improvement to my use of the Web and related searches. It is difficult for me to see a downside to this user friendly enhancement."
If you type a domain name incorrectly, you don't get a 404 error. It's a lie. A 404 error comes from a web server that cannot find the page you are requesting. The Site Finder service does not eliminate 404 errors.
It makes me mad that they print these lies. They know better than that.
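The distinction shows up directly in code. The following is only a sketch, assuming the request was made with Python's `urllib.request.urlopen`; the helper name `describe_failure` is mine. A mistyped domain fails at the DNS lookup, before any web server is contacted, while a 404 is a reply from a server that was found.

```python
import socket
import urllib.error


def describe_failure(exc):
    """Say which stage failed, given an exception raised by urllib.request.urlopen."""
    # Check HTTPError first: it is a subclass of URLError.
    if isinstance(exc, urllib.error.HTTPError):
        return f"HTTP {exc.code}: a web server answered"
    # A failed name lookup arrives as URLError wrapping socket.gaierror.
    if isinstance(exc, urllib.error.URLError) and isinstance(exc.reason, socket.gaierror):
        return "DNS lookup failed: no web server was ever contacted"
    return "other failure"
```

Only the first branch can ever report a 404, and it is never reached for a domain name that does not resolve.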
But with ever-more-complex software, that limitation may become a bottleneck, making Athlon 64's ability to address a whopping terabyte (1000GB) of physical memory very attractive.
When will we ever learn to stop using terms like "whopping"? What is "whopping" today is "wimpy" in a few years. It's been that way since computers were invented. Think about a new 386 computer with a "whopping" 8 MB of memory, or a "whopping" 700 MB hard drive.
Read the sentence again -- the one about a "whopping terabyte (1000GB) of physical memory" -- and you see that the sentence has the very same meaning if the term "whopping" is removed.
Here's a cheaper idea: I tell everyone I know to start the subject line with "goat" if they want to e-mail me. Then I filter all e-mail without "goat" as the first word in the subject...
Yes, this is a very simple, and very good idea.
There are many variations on the idea, one of the best being the use of an alias that contains the password. If your email address is john.doe@example.com, then you could use the alias john.doe+goat@example.com. The nice thing about this variation is that the "password" is stored in the address book with the email addresses, without any changes to the address book.
But it would be a simple thing to include an extra field in a web form where you add your email "password." Imagine an e-tailer allowing you to optionally include a password, so that you could reliably filter their email to you. It's a good solution to the whitelisting problem.
In short, using a knowledge proof in order to send email to someone is smart. It works in many situations, like when you want to receive and filter email from friends and family. It would also work for mailing lists. The basic idea is to be a moving target to the spammers.
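A filter for the alias variation could be as small as the sketch below. The mailbox, domain, and tag values are just the examples from above, and the function name is made up. It checks only the To header; a real filter would probably look at the envelope recipient as well.

```python
from email.utils import parseaddr

# Example values from the comment above; change the tag to stay a moving target.
MAILBOX = "john.doe"
DOMAIN = "example.com"
ACCEPTED_TAGS = {"goat"}


def knows_password(to_header):
    """Return True if the recipient address carries an accepted +tag."""
    _, address = parseaddr(to_header)
    local, _, domain = address.partition("@")
    base, plus, tag = local.partition("+")
    return (
        domain.lower() == DOMAIN
        and base.lower() == MAILBOX
        and plus == "+"
        and tag.lower() in ACCEPTED_TAGS
    )
```

Mail where `knows_password(...)` is false does not have to be deleted; it can simply go through the normal spam filter instead of straight to the inbox. Keeping more than one tag in `ACCEPTED_TAGS` lets you retire an old "password" gradually.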
Source: Posted to IETF mailing list by Vernon Schryver.
you have discovered the Ultimate Final Perfect Solution To The Spam Problem (UFPSTTSP).
you are the first to think of the UFPSTTSP.
you were motivated to find the UFPSTTSP because you know it is impossible to filter more than 99% of spam with fewer than 0.1% false positives by any of several currently available mechanisms.
despite being the inventor of the UFPSTTSP, you are unfamiliar with "false positive," "false negative," "UBE," "tarpit," "teergrube," "Brightmail," "Postini," "SpamAssassin," "DNS blacklist," "HELO," "RBL," or "mail envelope."
you plan to make money by licensing the idea of the UFPSTTSP.
you are deeply offended when people do not agree that you have found the UFPSTTSP.
you cannot name several potentially fatal flaws in the UFPSTTSP.
you think all you need to do to get the UFPSTTSP implemented and deployed is to publish an RFC.
you don't recognize the difference between deploying and implementing the UFPSTTSP.
you plan to publish an RFC mandating the UFPSTTSP but have no idea that RFC 2223 or RFC 2026 exist.
you have no idea of the relevance of "consensus" or "IESG approval" to publishing RFCs.
you think all RFCs have the same standing.
you think that spammers won't ignore, subvert, or exploit the UFPSTTSP if you publish it as an RFC.
the UFPSTTSP depends on spammers or mail recipients changing their behavior without any immediate gain.
the UFPSTTSP won't be effective until it has been deployed at more than 60% of SMTP servers and you don't think that's a problem.
the UFPSTTSP is trivial to implement and deploy, but you have done neither.
you feel your job is done after having explained the UFPSTTSP, and that "programmers" will drop everything to implement it.
you think that a violation of an RFC by an SMTP client or server is good and sufficient reason to reject all mail from the system's domain.
you think that SMTP has no authentication and have never heard of SMTP-AUTH, SMTP-TLS, S/MIME, or PGP or think they're irrelevant to the lack of authentication in SMTP.
you think that the fact that most SMTP servers do not authenticate the SMTP clients of strangers is a major bug in SMTP instead of a major feature and expression of a primary design goal.
despite discovering the UFPSTTSP, you don't know the meanings of MTA, MUA, SMTP server, or SMTP client.
the UFPSTTSP requires a small number of central servers for validating email, serving as "pull servers" for bulk mail, or anything else.
the UFPSTTSP requires that anyone wanting to send mail obtain a certificate and that such certificates would be checked by all SMTP servers.
you think that useful certificates of a person's identity that certifies not only that the person has that name but has no other certified names are cheap and easy.
you think that most Internet users would willingly pay more than $5/month to avoid spam, and don't know the per-user price point for anti-virus software or data.
you don't see why a certificate that binds a name to a user is useless against spam unless it also certifies that the user has no other names.
the UFPSTTSP involves ISPs issuing certificates to users and that the same ISPs that don't terminate the accounts of spammers or don't investigate prospective customers enough to refuse service to spammers will refuse certificates to spammers.
you've never heard of RFC 2554 or RFC 2487 and the UFPSTTSP has something to do with authentication.
the UFPSTTSP involves replacing SMTP.
you routinely send single "LARTS" or reports of single examples of objectionable mail to more than two dozen addressees.
your definition of spam differs significantly from "unsolicited bulk email".
It will be standards-compliant to its specified version number. If you're compliant with HTML 4.01 today,...
You are absolutely correct. The DOCTYPE tag does actually mean something! It indicates the version of HTML.
Just wondering though. Why do people think that a DTD is required to view a document? A DTD is required for validating documents, which is a matter for authoring, not viewing.
In fact, a stylesheet is required in order to view a document. That makes sense if you consider that all HTML browsers have a default, built-in style sheet.
The other articles linked to don't seem to make a good case. They seem to argue that micropayments will succeed because content producers need them to succeed. But in a marketplace that is driven by consumers, not producers, that's not much of an argument.
Shirky is right on in his discussion of "mental transaction costs." I think another way to view the same phenomenon is as a bunch of toll booths on the Information Highway. Driving from Washington, DC to New York City comes to mind. You drive a few miles, then pay a toll. Drive a few more miles, then pay yet another toll. Etc. It's not pleasant.
However, McCloud does make a few good points. Funny how all of his good points center around the sale of music, and not what we typically think of as web content. And therein I think lies the gist of the argument in favor of micropayments. If the experience of micropayments is like driving a toll road (Shirky's view), then micropayments will definitely fail. On the other hand, if the experience of micropayments is like shopping on eBay or going to a yard sale, then micropayments will work. The key difference here is that in the latter case, the user is looking for and expecting a shopping experience.
So, then what are some situations where micropayments might just work? First, I think they might work for content that is not traditionally considered web content, such as music. Selling popular songs for $0.99 constitutes micropayments in this view. But I can think of other examples, some of which already exist: Paying for search information for investigative purposes. (You can get all kinds of information from many public databases about individuals.) Paying for live events. (Currently, you can pay $1 and get live coverage of Major League Baseball games for one day. This is in contrast to the subscription, the only option available at the beginning of the season.) Researchers can get access to archived, peer-reviewed journal articles. Ordinary computer users can get desktop themes. PowerPoint users can buy templates. There are many more instances where micropayments would work.
In summary, the situations where micropayments would work are (1) non-traditional web content and (2) in the context of a marketplace, where buyers come seeking to buy.
Other programs that they stopped further development on:
Internet Explorer
Notepad
Wordpad
Sound recorder
HyperTerminal
Paint
etc.
My point is that there are much better options for most of these applications. Surely this is better for Mozilla mail, Eudora, Mulberry, Pegasus, and other third-party email client applications.
Re:SMTP should have been replaced long ago
I have heard it said that X.400 failed because X.400 email addresses failed the "business card test." In other words, X.400 email addresses are too large to fit nicely on a business card.
I understand your point. However, it is also a well-known fact that people are very bad at estimating risk. Therefore, the users of file trading services may not make a well-calculated decision, but rather an emotional decision. With an advertising campaign on TV and radio, they just might be scared out of using file trading services.
I have seen a spam message where they used a CSS stylesheet retrieved from a CGI script to track messages. I'm not sure, but I think that technique may even work against Mozilla with image retrieval turned off.
Now, here's a funny story. I was at the FTC Spam Forum a while back. There were some of the more responsible email marketers there -- you know, the ones that send out regular newsletters for opt-in subscribers -- and they were whining and complaining because spammers have spoiled "rich" email for them. Just a few years back they had visions of eventually being able to send email with Flash, animated graphics, fancy styles, and so forth. And now they realize that people don't want to receive those kinds of emails because of spam (and to some extent viruses). So they whined about it. I guess for them email is "push" marketing, while for the rest of us, email is a way to communicate with co-workers and friends. Who needs HTML to say "wanna go get some lunch?"
It's interesting, though, to think about what would happen if SCO won.
Because of the GPL, they can't sell Linux licenses. They could ask anyone who has used Linux to pay them for past damages, which is like extortion, IMHO.
One would think that they would hope to boost sales of their Unix products as a result of Linux FUD. But this is where it gets really interesting. A good alternative to SCO's Unix products is Solaris x86. And Sun has none of the restrictions that other Unix vendors have. So, if SCO makes everyone mad, and succeeds at forcing companies to stop using Linux, those companies can use Solaris x86, or even FreeBSD. My point is that there are alternatives to SCO's Unix products, proprietary ones included, available for any CIOs who decide to go that route.
But analysts categorically disagreed with that viewpoint last week. "SCO is not trying to destroy Linux," said DiDio of the Yankee Group. "That's silly. This is about paying royalties."
This is absolutely wrong. It is about destroying Linux. Linux is under GPL. Therefore, if any company thought they could pay a royalty to SCO and continue to use Linux, they are deceived. If there is such code in Linux, then to use that code would violate the GPL. Therefore, if SCO cares at all about IP, it must tell everyone to stop using Linux. If SCO tells them to continue to use Linux, but pay a royalty, then they show that they don't care about intellectual property, unless it's their intellectual property. How hypocritical. SCO really is trying to destroy Linux.
This whole SCO vs. Linux situation is so full of FUD. The analysts saw as much as 80 lines of code that appeared to be identical. So, out of millions of lines of code, perhaps a fraction of one percent appears very similar to the Unix code. Yet, according to SCO, that must be a very critical fraction of one percent, because
SCO contends that by co-opting code from Unix, Linux has severely damaged SCO's intellectual property. According to some estimates, the company collected annual revenue of between $200 million and $250 million on Unix System 5 software before the rise of Linux. After Linux reached the mainstream, those revenue figures dropped to about $60 million a year.
What's going on here? I think SCO is trying to imply that the code that is the same in Linux and Unix is randomly sampled, meaning that we can then infer that a much larger portion of the Unix code was copied. How else can we explain it? How could stealing -- if it is in fact stealing -- a fraction of one percent of the code base result in "severely" damaging SCO's intellectual property? No, it must be much more than a fraction of one percent. The inference that a much larger portion of Unix code was copied is intended to spread FUD.
So, some analysts saw sections of as much as 80 lines of code that appear to be copied, and then conclude
"If IBM wants to cure this problem, they could start by buying all the appropriate licenses and then paying SCO a billion dollars," Claybrook said. "But SCO now says that a billion may not be enough to cover their damages."
There is a serious disconnect here. A few hundred lines of code may have been copied from Unix into Linux -- that being a small fraction of one percent -- and analysts conclude that because of that IBM should pay SCO $1 billion!? Huh?
Someone has to stand up for the rights of the thousands of developers who put in volunteer time to make their contribution to the Linux code base, agreeing to license their code under GPL. They didn't ask for SCO's Unix code to be mixed with their own code. SCO is showing no respect for the intellectual property of those developers. Rather, SCO is trying to make billions off intellectual property that SCO does not own, all the while preaching the morality of respecting intellectual property.
There are some examples of wild success with reuse... though they seem to me to be more success through definition. All of those shell scripts that are built from individual command line tools are examples of reuse, where each command line tool represents a unit of software available for reuse. But, I think we all think of reuse more at the code module level... a function, or class, or small library. And it is at this level that I think we fail miserably, and it is my contention that we fail because we can't easily find the candidates for reuse.
Software applications today are more complex than ever. They can only continue to become more complex if there is significant code reuse. In fact, there is more code reuse today than ever before. The code that is reused is not necessarily at the level of libraries, but at a much larger scale. The best examples of code reuse:
Relational database systems for data storage
Web servers as a hosting environment for server applications
Web browsers as a hosting environment for client applications
Operating systems (considered in the broad sense of a collection of APIs available to all developers)
Using a whitelist in combination with a Bayesian filter is just one thing you can do. There are plenty of other things.
You could look for the message ID of a message you sent in the header fields of received messages (specifically, the in-reply-to header field). If you find it, it means that the received message is likely to be a reply to a message you sent.
You could look for a phrase from your signature, which could indicate that someone sent a reply and included your original message.
Besides the words in your signature, you could program in certain other words that automatically trigger a classification as non-spam. Those words might include the names of trademarked products that your company sells or similar types of words. Of course, this is just overriding some of the learning that presumably would happen automatically. But if these are very important words, then you must insist that nothing else the filter does can override the classification as non-spam, and thereby avoid false positives.
In summary, I think that Bayesian classifiers, as Paul Graham proposes them, are just too naive. The addition of a few heuristics could make a big difference.
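Here is a rough sketch of how those heuristics might sit in front of a trained filter. It is my own illustration, not Paul Graham's code; the function and parameter names are hypothetical, and it only handles simple single-part text messages.

```python
from email import message_from_string


def looks_like_ham(raw_message, sent_message_ids, signature_phrases, must_keep_words):
    """Return a reason string if a heuristic marks the message as non-spam, else None."""
    msg = message_from_string(raw_message)

    # 1. The In-Reply-To field cites the Message-ID of something we sent.
    cited = msg.get("In-Reply-To", "").split()
    if any(message_id in sent_message_ids for message_id in cited):
        return "reply to a message we sent"

    # Only plain single-part bodies are examined in this sketch.
    payload = msg.get_payload()
    body = payload.lower() if isinstance(payload, str) else ""

    # 2. The body quotes a phrase from our own signature.
    if any(phrase.lower() in body for phrase in signature_phrases):
        return "quotes our signature"

    # 3. The body contains a word that must never be classified as spam.
    if any(word.lower() in body for word in must_keep_words):
        return "contains a must-keep word"

    return None
```

If `looks_like_ham` returns a reason, the message is delivered and the Bayesian score is never consulted, so nothing the filter has learned can turn it into a false positive. Only when it returns `None` does the message go on to the trained classifier.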
(BTW, I did eventually receive the workstation.)
You Might Be An Anti-Spam Kook If...
You make very good points.
Shirky is correct.
For anyone who's interested, here is a link to the voting record of the Senate on the USA PATRIOT Act of 2001:
USA PATRIOT Act voting record
How many people realize that the bill passed the Senate by a vote of 98 to 1?
All your domains are belong to us!
You don't need Google. Just post it to one of the Usenet binary groups. Use some kind of steganography to hide it in a few images.
Other programs that they stopped further development on:
My point is that there are much better options for most of these applications. Surely this is better for Mozilla mail, Eudora, Mulberry, Pegasus, and other third-party email client applications.
I have heard it said that X.400 failed because X.400 email addresses failed the "business card test." In other words, X.400 email addresses are too large to fit nicely on a business card.
I understand your point. However, it is also a well-known fact that people are very bad at estimating risk. Therefore, the users of file trading services may not make a well-calculated decision, but rather an emotional decision. With an advertising campaign on TV and radio, they just might be scared out of using file trading services.
I have seen a spam message where they used a CSS stylesheet retrieved from a CGI script to track messages. I'm not sure, but I think that technique may even work against Mozilla with image retrieval turned off.
Now, here's a funny story. I was at the FTC Spam Forum a while back. There were some of the more responsible email marketers there -- you know, the ones that send out regular newsletters for opt-in subscribers -- and they were whining and complaining because spammers have spoiled "rich" email for them. Just a few years back they had visions of eventually being able to send email with flash, animated graphics, fancy styles, and so forth. And now they realize that people don't want to receive those kinds of emails because of spam (and to some extent viruses). So they whined about it. I guess for them email is "push" marketing, while for the rest of us, email is a way to communicate with co-workers and friends. Who needs HTML to say "wanna go get some lunch?"
Because of the GPL, they can't sell Linux licenses. They could ask anyone who has used Linux to pay them for past damages, which is like extortion, IMHO.
One would think that they would hope to boost sales of their Unix products as a result of Linux FUD. But this is where it gets really interesting. A good alternative to SCO's Unix products is Solaris x86. And Sun has none of the restrictions that other Unix vendors have. So, if SCO makes everyone mad, and succeeds at forcing companies to stop using Linux, they can use Solaris x86, or even FreeBSD. My point is that there are proprietary alternatives to SCO's Unix products available for any CIOs who decide to go that route.
This is absolutely wrong. It is about destroying Linux. Linux is under the GPL. Therefore, if any company thinks it can pay a royalty to SCO and continue to use Linux, it is deceived. If there is such code in Linux, then to use that code would violate the GPL. Therefore, if SCO cares at all about IP, it must tell everyone to stop using Linux. If SCO tells them to continue to use Linux, but pay a royalty, then they show that they don't care about intellectual property, unless it's their intellectual property. How hypocritical. SCO really is trying to destroy Linux.
This whole SCO vs. Linux situation is so full of FUD. The analysts saw as much as 80 lines of code that appeared to be identical. So, out of millions of lines of code, perhaps a fraction of one percent appears very similar to the Unix code. Yet, according to SCO, that must be a very critical fraction of one percent, because
What's going on here? I think SCO is trying to imply that the code that is the same in Linux and Unix is randomly sampled, meaning that we can then infer that a much larger portion of the Unix code was copied. How else can we explain it? How could stealing -- if it is in fact stealing -- a fraction of one percent of the code base result in "severely" damaging SCO's intellectual property? No, it must be much more than a fraction of one percent. The inference that a much larger portion of Unix code was copied is intended to spread FUD.
So, some analysts saw sections of as much as 80 lines of code that appear to be copied, and then conclude
There is a serious disconnect here. A few hundred lines of code may have been copied from Unix into Linux -- that being a small fraction of one percent -- and analysts conclude that because of that IBM should pay SCO $1 billion!? Huh?
Someone has to stand up for the rights of the thousands of developers who put in volunteer time to make their contribution to the Linux code base, agreeing to license their code under the GPL. They didn't ask for SCO's Unix code to be mixed with their own code. SCO is showing no respect for the intellectual property of those developers. Rather, SCO is trying to make billions off intellectual property that SCO does not own, all the while preaching the morality of respecting intellectual property.
Software applications today are more complex than ever. They can only continue to become more complex if there is significant code reuse. In fact, there is more code reuse today than ever before. The code that is reused is not necessarily at the level of libraries, but at a much larger scale. The best examples of code reuse:
You could look for the message ID of a message you sent in the header fields of received messages (specifically, the In-Reply-To header field). If you find it, the received message is likely to be a reply to a message you sent.
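As a rough sketch of that check, the snippet below uses Python's standard `email` module. The function name `is_reply_to_sent` and the idea of keeping a set of your own sent Message-IDs are assumptions for illustration; checking the References field in addition to In-Reply-To is also my own extension, not something described above.

```python
from email import message_from_string


def is_reply_to_sent(raw_message, sent_ids):
    """Return True if the message refers to a Message-ID we sent.

    raw_message: full message text, headers included.
    sent_ids: set of Message-IDs of mail we sent (hypothetical store),
              each in the usual "<id@host>" form.
    """
    msg = message_from_string(raw_message)
    # In-Reply-To normally carries one ID; References (an added
    # assumption here) may carry the whole thread, space-separated.
    header_text = msg.get("In-Reply-To", "") + " " + msg.get("References", "")
    candidates = header_text.split()
    return any(candidate in sent_ids for candidate in candidates)
```

A filter could call this before any statistical scoring and treat a match as strong evidence that the message is not spam.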
You could look for a phrase from your signature, which could indicate that someone sent a reply and included your original message.
Besides the words in your signature, you could program in certain other words that automatically trigger a classification as non-spam. Those words might include the names of trademarked products that your company sells or similar types of words. Of course, this is just overriding some of the learning that presumably would happen automatically. But if these are very important words, then you must insist that nothing else the filter does can override the classification as non-spam, and thereby avoid false positives.
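Here is a minimal sketch of how the signature and trusted-word overrides could sit in front of a statistical filter. Everything named here is hypothetical: `classify`, the `spam_probability` argument (assumed to come from whatever Bayesian filter you already run), the 0.9 threshold, and the example phrases are all placeholders, not part of any existing filter.

```python
import re


def classify(body, spam_probability, signature_phrases, trusted_words,
             threshold=0.9):
    """Label a message "ham" or "spam", letting heuristics win.

    spam_probability: score from an existing Bayesian filter (assumed).
    signature_phrases: phrases from your own signature.
    trusted_words: words that must always mean non-spam,
                   e.g. your company's product names.
    """
    text = body.lower()

    # Heuristic 1: a quoted copy of our own signature suggests a reply.
    for phrase in signature_phrases:
        if phrase.lower() in text:
            return "ham"

    # Heuristic 2: trusted words override the learned score entirely.
    words = set(re.findall(r"[a-z0-9']+", text))
    if words & {word.lower() for word in trusted_words}:
        return "ham"

    # Otherwise fall back to the statistical score.
    return "spam" if spam_probability >= threshold else "ham"
```

The overrides are checked first and return immediately, so nothing the statistical score says can turn such a message into a false positive. The trade-off is that a spammer who learns one of your trusted words gets a free pass.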
In summary, I think that Bayesian classifiers, as Paul Graham proposes them, are just too naive. The addition of a few heuristics could make a big difference.