Hey, I'm not a huge fan of the Iraq war but I don't need to pay 10 bucks to go to a political convention, er, movie theater to see some dufus Moore tell me about it. He has been on my bad side ever since Bowling for Columbine (I'm *FROM* Littleton, Colorado, about 5 miles from Columbine High School) and his self-righteous "I won an Oscar and, d*** it, that makes me eligible to have valuable political input on international affairs and turn an enjoyable entertainment event into a political circus so that on my Bowling for Columbine DVD they can write the slogan 'From the man who defied Bush'" appearance at the Oscars.
The man had a limited interest in facts in Bowling for Columbine and an obvious agenda that Columbine was exploited to promote. I hear the same accusation was made about a previous "documentary" that I didn't see and whose name I don't remember, and it sounds like he used the same format in 9/11... clips of reality with interviews of people interspersed. That doesn't make it right or accurate and certainly doesn't mean it's fair. It presents Moore's political views, just like Bowling for Columbine did.
Moore is a yellow "journalist" that turns "the high drama of life into a cheap melodrama that leads to stories being twisted into the forms best suited for sales by the hollering newsboy." Moore looks for and exploits controversy and the hardships of others for personal gain and I, for one, do not plan on rewarding that kind of movie-making with my money. I'm sure millions of others will, though, so Moore's ego will grow even larger, his pompous attitude even more pronounced, and his general level of annoyance even higher. And I'm sure he'll get another Oscar next year and he'll probably have to make some political commentary there, too--either accusing the American public of being blind and reelecting Bush, or taking credit for Kerry being the new president. That's my prediction and come the Oscars we'll see if I'm right.
That's beside the point, anyway. We're talking about competition on the last-mile copper pair that terminates in consumer households.
True, but I'd say there are limits to what the Baby Bells can charge for the last mile when other delivery forms (cable, satellite) are there too. They can price themselves out of the market even if the alternative providers don't use their last mile of copper.
We can go 'round and 'round with opinions, though. I'm going to wait to see how the price is affected. My prediction is that the consumer will be adversely affected.
I agree, we can go round and round. We'll have to see what prices do.
In addition to the Baby Bells, many locations have a cable Internet alternative and pretty much all locations in the U.S. have a 2-way satellite Internet alternative. Competition already exists, much of which the Baby Bells don't control.
Therefore, when you exit regulation the natural reaction is to raise prices, let service fall off and enjoy your freedom.
That might (maybe) make some amount of sense if you are a monopoly, but the logic falls apart if there's competition which forces you to innovate and keeps your prices in check.
And you don't think that price decrease is because the technology is pretty much ubiquitous and widely used?
No, that has very little to do with it. Competition drives prices down, not whether or not the technology is widely used. If it were widespread usage that drove prices down Windows would be cheaper than Linux. :)
Depends what you want to use it for. GPS accuracy with commercial receivers will usually get you within about 5 feet with 90%+ confidence. This is fine for a car navigation system. It leaves something to be desired if you are 1) Wanting to use it for aircraft/landing navigation and 5 feet can put you way off centerline on approach or 2) Are using it for surveying purposes and knocking off 5 feet off the width of a plot of land that is 60 feet wide is going to make a big difference in how much land is available to the owner (and his neighbor!). That said, there are professional-grade GPS systems that I understand are significantly more accurate and significantly more expensive.
I don't know how this compares with the anticipated accuracy of Galileo. It also sounds like Galileo's additional accuracy is only available if you want to pay the European piper a subscription fee while GPS accuracy is free of charge.
That's changed. They now have some very nice rest stops on US287 (the highway from Dallas to Amarillo). And, if push comes to shove, there's a town with a McDonalds every 30 miles or so.
You know, Texas rest stops are already really nice. I can't speak to west Texas, but around Houston/Dallas/San Antonio/Austin they already tend to be clean and well maintained.
I agree. In fact, I can attest to the rest stops all the way up the I-35 corridor (Laredo, San Antonio, Austin, Waco, Ft. Worth) and then northwest on US-287 through the panhandle of Texas, northeastern New Mexico, and then up I-25 to Denver. All along the way the rest stops are excellent. They all feel clean and I certainly have never felt they were unsafe.
I was especially surprised on my last trip along that path to find a very elaborate and beautiful rest stop on US 287 about 2 hours northwest of Wichita Falls. US287 used to only have picnic areas which is not surprising since it's not an interstate... but the new rest stops were truly amazing. And one on each side of the highway even though they could have just built one and sent you crossing the highway since it has crossovers.
Umm, inflation and growth in the economy are 2 different things.
Of course they are, and both affect the national debt as expressed in real dollars and as a fraction of GDP, and neither has anything to do with Congress' fiscal responsibility or lack thereof. Just like I said.
Pretty much all the certifications are BS as far as I'm concerned. Heck, a college degree is of marginal value in this field since technology moves a heck of a lot faster than academia. I did that whole college thing. I didn't learn a thing. I could have taught most of my computer science professors (or at least been their colleagues) and I completely tested out of all the computer classes they gave me the option to test out of (still had to pay for them, of course).
But certifications? To me, a certification such as MCSE and the like is a good indication that someone feels the need to make themselves look better than they are. Take your certifications and shove them where the sun don't shine--let me see some working solutions you have created. Not just on the job, but what have you done in your own time? I'd be far more interested in hiring someone that, on their own initiative, learned some topic at home and developed something based on that knowledge which demonstrates knowledge and ability in the field than someone who has a certification which means they went through the motions to get the certification.
Experience and examples of past work are gold. Just about everything else is Monopoly funny money and checks written against empty accounts. That's not to say that everyone with certifications is an idiot, but I'm immediately skeptical of anyone that would mention such a certification prominently on their resume.
Unfortunately, analyzing past technical work and accomplishments is beyond the capability of most HR departments.
The national debt in real dollars did decrease. But that doesn't seem to explain the discrepancy, either.
That the debt briefly went down in "real dollars" while not decreasing in absolute dollars simply means that the economy briefly grew faster than the growth in the debt. As I said in another reply in this thread, that is absolutely no reflection on the fiscal responsibility of the politicians in power at the time that were still apparently spending more than they collected. It's simply a reflection of a rapidly growing economy immediately prior to its subsequent explosion.
If we were to subscribe to this logic we could just balance the budget, ignore the debt, and claim the debt is going down in "real dollars" because we haven't added to it.
Please show me where the national debt went down according to the U.S. Treasury. Your graph shows per capita debt in constant 2000 dollars. You're getting a slight downward curve based on factors that don't have anything to do with Congressional fiscal discipline and that in no way counters my assertion that neither party has demonstrated any budgetary discipline in years.
Besides, are you trying to say that Bush's *triple dip* recession was due to the bubble? What sort of weird triple-burst bubble are you claiming that we had? Ongoing wars? He started them! Afghanistan was arguably justified, but the biggest economic damage - Iraq - was solely the neocons' doing.
I think the others that replied to your post more than adequately answer this part of your troll. Get over the Bush-bashing already. It got old about 3 years ago. I'm not even a Bush fan and I'm sick of the mindless Bush-bashing.
For some reason the surplus/deficit per year and the change in the national debt don't seem to match up. Anyways, there was an $87 billion surplus in 2000, and that's taking into account that we borrow $160 billion a year from Social Security (so if you count it the other way, it was more like $240-250 billion). We may run a deficit this year of over $700 billion. That's scary.
It's smoke and mirrors, nothing more. There was never a surplus. If there had been the debt would have gone down. It didn't. There was no surplus.
Is spending into deficit the sort of policy that you like? Is that *responsible*? If it is not fiscally responsible, which is the party of fiscal responsibility?
Neither party has reduced the national debt for decades. Congress is just fiscally irresponsible, period. It doesn't matter which party it is, neither has shown any capability in reducing the debt.
Clinton got a bump near the end of his presidency since he was riding the bubble, but even when we were supposedly running a "surplus" the debt never went down a single year. It didn't even stay constant a single year. Translation: We didn't really have a surplus. Just as Clinton got a bump from riding the bubble and basically doing nothing for 8 years, Bush has gotten slammed hard by the bursting bubble and ongoing wars.
Face reality, the debt is going to go up regardless of who wins in November.
Then your filter isn't so attuned to your ham as you think. You claimed that your filter knows your email so well that an incoming email doesn't just have to be neutral, but downright good. If you can email yourself an encyclopedia entry from a new, neutral account, that's not the case. At that point, you're not recognizing ham, because you can't. You're recognizing spam, and will be hampered somewhat by dilution.
Dilution doesn't work. :)
See below.
If I'm having 80% failure on encyclopedia attacks after 2 months, that's getting close to worthless.
How many ham and spam are in your Bayesian statistics? It's obviously not the time that makes Bayesian improve in accuracy, it's the amount of data.
Me: An unknown ham will look like noise, and pure noise shouldn't be filtered by Bayesian--only spam.
You: That's contrary to what you stated earlier, and is precisely what I originally claimed. But if you start down that road, then dilution does become a problem - spams don't have to look like your ham, simply like noise, as your filter has to let noise through since it can't tell ham from noise.
Ok, either I'm missing your point or you are missing mine. I went back through the thread and I'm not sure where I contradicted myself. So let's try again. We have several possibilities:
1. Ham, which is what you know for a fact you want to see and is probably from people you've talked to before or on topics you normally discuss.
2. Spam, which is what you know for a fact you don't want to see.
3. Ham Noise, which is mail you want but it might be someone you've never heard from talking about an unusual topic that you don't usually talk about (although I would think the email you receive should either be from someone you've talked to before or on a topic you've discussed before. An unknown person emailing you out of the blue about a topic you never discuss strikes me as relatively unlikely, even for admins on the far side of the bell curve).
4. Spam with noise, which is spam which is definitely spam that you know you don't want to see, but has "noise" injected to "dilute" it.
I hope we can agree that the first two are the "extreme" cases and are easily recognized by Bayesian.
So the question is, is there a difference between "ham noise" and "spam with noise?" The answer is definitely yes.
If someone I've never heard from before sends me an email out of the blue discussing the meaning of life (which is unlikely to start with), that's ham noise. There's not going to be anything particularly innocent nor particularly damning about it. It's going to be quite neutral and a Bayesian filter is going to let that through unless the spam threshold is set aggressively low.
However, if a spammer sends a spam that's trying to sell me Viagra and is using standard spammer tricks (hiding dictionary attacks in white text, using red fonts to make their sales pitch stand out, including links to domains we've never seen before or using IP addresses instead of domains, using lots of HTML comments to break up words, etc.) and also embeds the exact same noise as the neutral message above, does that spam magically become neutral? Definitely not. Bayesian only looks at the most interesting aspects, or terms, of the message. While there wouldn't be anything particularly interesting in "ham noise" that would lead to a high spam score, a spam is going to be just as spammy with or without a bunch of neutral text. The neutral text would only "dilute" the spam score if every word is included in the spam probability calculation. I don't know of any Bayesian implementation that recommends that approach precisely for this reason.
You look at the 15 most interesting terms (at least in the Graham-advocated approach); those that are furthest from 0.50, so you're looking at only terms that are extremely spammy or extremely innocent. All that neutral text is
Me: Just because you have lots of different types of ham doesn't mean it's any harder for Bayesian to identify it.
You: Au contraire, given the way Bayes' rule works, a posteriori probabilities are intimately related to the statistical variance.
P(spam|X) = P(X|spam)P(spam)/(P(X|spam)P(spam) + P(X|ham)P(ham)) is the adapted Bayes rule as it works with spam, where P refers to conditional or overall probabilities, and X is a given mail signature. For high-variance ham, the problematic term is P(X|ham), which will result in little difference between noise and ham. Put it this way - if you can email yourself a page from an encyclopedia (without spam) and it isn't flagged as spam, then your filter can't tell ham from noise.
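To put numbers on the quoted formula, here is a minimal sketch in Python. The likelihood values and the 50/50 prior are made up purely for illustration; they are not measurements from anyone's corpus. It shows the effect being described: when P(X|ham) is about as large as P(X|spam), the posterior collapses to roughly 0.5 and the signature X tells the filter nothing either way.

```python
def p_spam_given_x(p_x_given_spam, p_x_given_ham, p_spam=0.5):
    """Posterior P(spam|X) from Bayes' rule for one mail signature X."""
    p_ham = 1.0 - p_spam
    numerator = p_x_given_spam * p_spam
    denominator = numerator + p_x_given_ham * p_ham
    return numerator / denominator


# Hypothetical numbers: X is rare in well-characterized (low-variance) ham.
tight_ham = p_spam_given_x(p_x_given_spam=0.02, p_x_given_ham=0.0001)

# Hypothetical numbers: with high-variance ham, X is as likely in ham as in spam.
fuzzy_ham = p_spam_given_x(p_x_given_spam=0.02, p_x_given_ham=0.02)

print(tight_ham)  # close to 1: X is strong evidence of spam
print(fuzzy_ham)  # 0.5: X is uninformative
```

Note that an uninformative signature is only a problem if nothing else in the message is informative, which is the point argued in the reply that follows.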
Again, that's not a problem. If you mail yourself a page from an encyclopedia with no spam then it shouldn't be flagged as spam. The purpose of the Bayesian filter isn't to differentiate ham from noise, the purpose is to differentiate ham from spam. The only question is whether the insertion of noise in spam has any significant effect on the ability of a Bayesian filter to detect spam. It shouldn't, at least once it is properly trained.
As I have already conceded, the use of random words may prolong the training period somewhat in unusual situations such as yours where you receive a lot of mail from unknown senders talking about a large number of topics. But you are definitely out of the ordinary when compared to the bulk of email users. The use of random words may prolong your training period somewhat, but it's going to have almost no effect on a more typical user of email. Certainly, the use of random words cannot achieve the spammers' ultimate goal of defeating Bayesian or making it worthless.
If we have 10 descriptors, and each is even binary, and we need at least 10 datapoints per cell to get statistics, that means we need at least 10,000 messages. That should give some idea of the problem. Less variance makes the space more dense and inherently more manageable.
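The arithmetic behind that estimate can be checked with a short sketch. The function name and parameters are hypothetical; the only inputs taken from the post are 10 binary descriptors and 10 datapoints per cell. Ten binary descriptors give 2^10 = 1,024 cells, so the "at least 10,000" figure is a round-down of 10,240.

```python
def messages_needed(num_descriptors, levels_per_descriptor=2, points_per_cell=10):
    """Messages required to put points_per_cell samples in every cell of the grid."""
    cells = levels_per_descriptor ** num_descriptors
    return cells * points_per_cell


print(messages_needed(10))  # 1,024 cells * 10 points each
print(messages_needed(15))  # the requirement grows exponentially with descriptors
```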
This is consistent with what I said earlier: The dilution caused by the spammers' use of random words may require that a new Bayesian user be patient for a longer period of time before Bayesian reaches optimum filtering levels. But I don't believe anything has contradicted my statement that a ham doesn't have to look like the rest of your ham for it to not be filtered. An unknown ham will look like noise, and pure noise shouldn't be filtered by Bayesian--only spam. So an unknown ham just has to look different than spam. And if your ham doesn't look different than spam, well, I feel for you. :)
The question is, is your filter identifying ham or spam? Also, what does a "good" message look like? If one has a diversity of "ham" relative to its population size, then it's hard to characterize them. At that point, the task of identifying spam is almost solely based on the characteristics of the spam, as ham can look like anything. If it's a reasonable assumption that ham is ill-defined, then masking can go a long way to getting spam through.
Just because you have lots of different types of ham doesn't mean it's any harder for Bayesian to identify it. In the end, it's a simple game of statistics. And it's a game that works very well. One ham doesn't have to look like all the other ham, it just has to look different than spam.
Additionally, people who receive lots of different types of ham are in the definite minority. The vast majority of email users have a relatively short list of contacts that'll eventually produce some fairly predictable ham. Those of us that receive lots of email on lots of subjects from lots of never-written-before users are in the definite minority. And a minority of spam is going to get through our filters anyway. At that point the spammers will be targeting a minority of the minority, and that minority is extremely anti-spam... sounds like a losing business model to me.
On an empirical level, we have two observations: 1) your Bayesian filter is working fine with "encyclopedia" spams, and 2) mine isn't. I've been training mine for 2 months, and it catches 100% of word salads, and maybe 20% of "encyclopedia" spams. That's a real problem. I think 2 months training should certainly be enough. The question is, why is it not working, because it's clear that it's not. We'll probably agree that the root cause is that your database is older, broader, and better characterized. I would guess that this allows your ham to be better characterized, while mine is more fuzzy. In other words, my filter may be partially handicapped compared to yours.
My Bayesian corpus was started in May 2003--just over a year ago. It actually hovered around 99.5% for the first 3 months, then was in the 99.8x% range for about 4 months, and it hasn't dipped below 99.9% for the last 5 months and has been peaking at around 99.98%. My corpus has 9518 good messages and 133,466 spams. The few spams that get through these days are actually some bounces from viruses (which I don't count as spam nor do I report them as spam which is why they still get through from time to time), one or two foreign-language spams, and a few spams that were getting through because they were using whitelisted email addresses from the same domain (I have since modified the whitelist to work on the NAME of the person rather than the email address).
I agree with you, you probably just don't have a finely-tuned Bayesian filter yet. But that's not an inherent flaw in Bayesian, it's just a matter of being patient. If you keep with it Bayesian is going to work great for you--the dilution tactics might just mean that you have to be patient in training your Bayesian filter longer than was necessary a year ago. The end result will be the same, though.
Also, while you are training the Bayesian filter, alternative filters are definitely a plus. In the filter I developed and use (see sig line), the user has the option of enabling common keyword filters that have an updated list of known spam phrases, domains, etc. This helps detect spam while the Bayesian filter is still getting up to speed. Such standard filters are a very important part of helping tune the Bayesian filter initially without having to depend entirely on the user. Once the Bayesian filter is trained, the archaic keyword filters can be disabled. At this point I don't use the keyword filters at all--I depend entirely on Bayesian.
Perhaps my Thbird filter is just too new - my old Mozilla database was huge, but I started over a few months ago.
I'm guessing that's it. Things like this will cause a much more severe reaction when the corpus is small.
Me: It doesn't matter if the encyclopedia entry "dominates" the spam text...
You: Not so sure about that. If a spam consisted of the words "Buy my viagra," that would be a spam. If those three words were interspersed through an article, I highly doubt it would be tagged as spam. So dilution should be a factor. I don't know exactly how Thbird implements it, but in standard Bayes theory, this is a problem.
It'd only be a problem if you're using some Bayesian filter that works on word pairs or context. The simple Bayesian filter proposed by Graham almost two years ago is simply based on tokens. It doesn't matter where they appear in the body of the message, just that they appear. So the words "Buy my Viagra" are going to be scored identically to having those same words spread throughout the article. Considering spammers like to try to embed words in small fonts or white-on-white color, the simple approach proposed by Graham makes much more sense than a more complicated multi-word Bayesian filter that looks for word combinations.
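A quick way to see why position doesn't matter in a token-based filter is a sketch like the following. The tokenizer regex and the sample sentences are my own assumptions for illustration, not how Thunderbird or any particular filter splits mail. Since the filter only sees the set of tokens, the two messages look the same to it.

```python
import re


def tokens(text):
    """Split text into a set of lowercase tokens; word order is discarded."""
    return set(re.findall(r"[A-Za-z0-9$'-]+", text.lower()))


# Hypothetical messages: the same three spam words, adjacent vs. scattered.
together = "Buy my Viagra. The aardvark is a nocturnal mammal native to Africa."
scattered = "The aardvark buy is a nocturnal my mammal native viagra to Africa."

print(tokens(together) == tokens(scattered))
```

A filter working on word pairs would see different features in the two messages, which is exactly the case where interspersing could start to matter.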
Me: It's not enough to be "neutral" you have to be downright good.
You: Only if you have the threshold on your filter cranked down pretty far.
I was going to say that your experience is very different than mine but, actually, I think that's just wrong.
Due to the way Bayesian works, if you have 40k of completely neutral words and, say, 5 or 6 spammy words, that's going to get tagged as spam regardless of whether you set your threshold to 90%, 50%, or 30%. Neutral words that have a spam probability of, say, 50% just aren't going to be considered for determining whether a message is spam or not. Those words lose importance in the spam decision. The best the spammer could do is try to dilute so many words that all your words were "neutral" and no words were "good" and, thus, it'd be impossible to determine spamminess since no word would be particularly good or spammy. But in reality it's not possible to dilute the value of all words, and diluting the value of the good words is particularly difficult since those same words will be getting flagged as spammy by other users who don't have those same words as "good" words. Not to mention they don't know what your good words are to start with.
Me: Unless they can send a message with headers that are close to what my friends' mails have, unless they know my friends' names, unless they know the topics I often discuss, they're just not going to be able to break through my Bayesian filter by "swamping" it with neutral text. It just doesn't make a difference.
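Here is a minimal sketch of that argument, assuming the Graham-style approach of combining only the 15 tokens furthest from 0.50. The token probabilities (0.99 for the spammy words, exactly 0.50 for the padding) and the padding size are invented for illustration, and `naive_average` is a deliberately bad strawman that counts every word, included only to show what dilution would look like if a filter worked that way.

```python
from math import prod


def graham_score(token_probs, n_interesting=15):
    """Combine only the tokens whose spam probability is furthest from 0.5."""
    interesting = sorted(token_probs, key=lambda p: abs(p - 0.5), reverse=True)
    interesting = interesting[:n_interesting]
    spammy = prod(interesting)
    hammy = prod(1.0 - p for p in interesting)
    return spammy / (spammy + hammy)


def naive_average(token_probs):
    """Strawman: average every token, so neutral padding drags the score to 0.5."""
    return sum(token_probs) / len(token_probs)


spam_tokens = [0.99] * 6          # hypothetical: a handful of very spammy words
neutral_padding = [0.5] * 5000    # hypothetical: a long, perfectly neutral article
diluted = spam_tokens + neutral_padding

print(graham_score(spam_tokens))  # spam on its own
print(graham_score(diluted))      # same spam buried in neutral text
print(naive_average(diluted))     # what "counting every word" would give
```

The first two scores come out essentially identical and very close to 1, because a token at exactly 0.50 multiplies the spammy and hammy products by the same factor. Only the every-word average gets pulled down toward 0.50.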
You: Then you've implemented your filter to approximate a whitelist, while most people implement theirs to be more like a blacklist. Particularly for those of us who need to be reachable by people who have never emailed us before, cranking down the level that far isn't an option. As such, neutral things have to be classified more as ham than spam.
Uh, no, sorry. Perhaps I misstated myself. If a spammer wants to get through, he is going to have to do the above (know my friends' names, topics I discuss, etc.) to get his spam through and will probably have to lose most of the content of the spam he wants me to see. If he wants to tell me to "Buy my Viagra" at the very least he's going to have to know some characteristics of my "good" words and even then he's going to have a hard time getting through if he's talking about Viagra, using red font color, etc. A completely neutral, non-spam message from someone I've never heard from before is going to be neutral and, as such, won't be filtered. Very, very few spams are "neutral." Even when they are trying to dilute Bayesian filters by using random words, their messages are still very
It may not increase false negatives, but it has decent chances of increasing false positives which is a much greater problem. My best guess is that spammers are hoping that once enough random words are classified as spam words, real emails with those words will start being classified as spam. If they can force enough false positives, people will start turning off bayesian filtering.
That won't work. Please review other responses regarding Bayesian and/or read some papers on Bayesian filtering. Once you understand how it works you will see why this approach can't work. If you want me to explain it to you, I will, but it would be redundant. It's been explained many times before.
It would appear, though, that they're not very bright yet -- they're not targeting the low-scoring words. I expect that'll change before too long. What'll happen to your filter when all of the lowest scoring words it knows suddenly become the highest-scoring?
How in the world are the spammers going to target my low-scoring words? Let's see, some of my low-scoring words:
1. Header "EDS". Probably because I know someone that works at EDS.
2. Header "BAY1". Who knows where that comes from, but one of my frequent contacts must have that in a header.
3. Body "ADC". Probably because I talked about A/D Converters from time to time.
4. Body "BCD". Probably because I talk about Binary Coded Decimal from time to time.
6. Body "GND". Probably talking about electrical grounds.
Anyway, that's a few of my sub-1% Bayesian tokens. How does that compare with yours? Or the low-scoring tokens of an accountant? Very little overlap I'd suspect. So how in the world is a spammer going to target low-scoring terms? If they knew them then they'd just slide their spam right past these filters. But they don't know them, they can't know them, and even if they somehow hacked into your system and got your Bayesian statistics, it won't help them get past anyone else's.
Random words and text insertion basically represent spammers kicking and flailing as they drown in the sea of Bayesian anti-spam filters.
What they do is copy an encyclopedia entry and put it at the bottom of their spam. The thing is usually a few paragraphs long, so that textually it dominates the message. The subjects are fairly random, and are occasionally educational. The problem is that the text of this doesn't trip the "too many strange words" flag that's used for word salads. My Thunderbird filter is really having trouble with these. Anyone else having trouble with these spams?
I've seen excerpts from books, the Constitution, etc. I haven't had a message like that get past my filter ever, as far as I know. Unless they got dang lucky and sent you an encyclopedia entry for a topic you often discuss it shouldn't have any significant effect. It doesn't matter if the encyclopedia entry "dominates" the spam text. If the spam is spammy and the encyclopedia text is "neutral" (which it will be unless the spammer gets lucky and picks a topic you often discuss) then all the neutral words in the world aren't going to compensate for a few good spammy words. It's not enough to be "neutral" you have to be downright good. Unless they can send a message with headers that are close to what my friends' mails have, unless they know my friends' names, unless they know the topics I often discuss, they're just not going to be able to break through my Bayesian filter by "swamping" it with neutral text. It just doesn't make a difference.
Re:One of the best things Google/GMail could do (on "Gmail Spam Filter Testing" · Score: 5, Insightful)
Spammer is trying to do two things: 1. break any Bayesian filter used on that mail server/inbox. Adding noise to the filter will allow more mail through as "questionable". This might still be tagged as spam, but not as readily as it would be without the added noise.
Except that won't work, as anyone that understands Bayesian filtering will tell you. In the case of every message with "random words" I've checked recently, the random words actually increased the spam score of that message. Why? Because it seems the random words aren't so random and either the same spammer is using the same "random words" over and over or various spammers are using sets of the same words. Over time most of the "random words" they use actually become great indicators of spam since my real email doesn't typically contain the random words they use.
In one recent analysis, 10 random words were inserted by the spammer. He got lucky and 1 of those words actually had a very low score for my Bayesian corpus. Unfortunately (for him), the other 9 words had scores of 99.99%! His use of random words literally nuked any possibility of him getting through my filter.
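To show why reused "random" words end up scoring so high, here is a simplified sketch of how a per-token spam probability can be derived from corpus counts. This is not the exact formula of any particular filter: the clamp values (0.0001 and 0.9999) are an assumption chosen to match the 99.99% scores mentioned above, and the per-token hit counts are invented. The corpus sizes are the ones from my own corpus quoted earlier (133,466 spams, 9518 good messages).

```python
def token_spam_probability(spam_hits, ham_hits, n_spam, n_ham,
                           floor=0.0001, ceiling=0.9999):
    """Estimate a token's spam probability from how often it appears in each corpus."""
    spam_freq = min(1.0, spam_hits / n_spam)
    ham_freq = min(1.0, ham_hits / n_ham)
    if spam_freq + ham_freq == 0:
        return 0.5  # never seen before: treat as neutral
    p = spam_freq / (spam_freq + ham_freq)
    return max(floor, min(ceiling, p))  # clamp so no token is absolutely certain


N_SPAM, N_HAM = 133_466, 9_518

# Hypothetical counts: a "random" word reused across many spams, never in real mail.
reused_word = token_spam_probability(400, 0, N_SPAM, N_HAM)

# Hypothetical counts: the spammer's one lucky word, common in my real mail.
lucky_word = token_spam_probability(1, 60, N_SPAM, N_HAM)

# A word the filter has never seen in either corpus.
unseen_word = token_spam_probability(0, 0, N_SPAM, N_HAM)

print(reused_word, lucky_word, unseen_word)
```

A word that shows up in hundreds of spams and zero good messages hits the ceiling, so every time the spammer reuses it he is handing the filter another strong spam indicator.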
Anyway, random words will not help spammers get through Bayesian filters. But it seems that many people (both spammers and non-spammers) think it will. But, hey, that's good for me: as long as "random words" is seen by spammers as a viable solution to Bayesian filters, my Bayesian filter will continue to work and will not have to deal with any innovative way to get around the filter (if any exists).
You know, I had this exact same idea several years ago but I figured it couldn't possibly be *that* obvious so I figured I was just wrong. Rats. :)
But if the trojans are sufficiently capable of reading an Outlook mail folder and extracting email addresses, surely they could easily look up the SMTP servers configured?
Simple. ISPs should throttle users on their SMTP servers. Say, maximum 10 messages every 15 minutes with a maximum of 500 messages in a 24 hour period. If it exceeds that, further SMTP transactions are prevented until either the customer calls in and specifically asks for a higher daily quota of SMTP transactions or until the time period expires and he can send again.
* The numbers I used are examples. You'd probably want to fine tune these numbers based on how many emails a typical user normally sends, etc. And perhaps business customers would receive a larger quota, etc. But the logic itself makes sense. If you want to avoid these limitations then get your own dedicated server somewhere for $100/month. The spammers will, but at least you won't have 80% of the spam coming from zombied residential PCs.
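The throttling logic could be sketched roughly like this. It is a sliding-window limiter using the example numbers above (10 per 15 minutes, 500 per 24 hours); the class name, method name, and the idea of keeping one instance per authenticated customer are all hypothetical, and a real mail server would hook this into its own policy mechanism rather than use this code as-is.

```python
from collections import deque


class SmtpThrottle:
    """Per-customer sliding-window limiter; timestamps are in seconds."""

    def __init__(self, short_limit=10, short_window=15 * 60,
                 daily_limit=500, daily_window=24 * 60 * 60):
        self.short_limit = short_limit
        self.short_window = short_window
        self.daily_limit = daily_limit
        self.daily_window = daily_window
        self.sent = deque()  # times of accepted messages, oldest first

    def allow(self, now):
        """Return True and record the message if the customer is under both limits."""
        # Forget messages that have aged out of the 24-hour window.
        while self.sent and now - self.sent[0] >= self.daily_window:
            self.sent.popleft()
        if len(self.sent) >= self.daily_limit:
            return False
        recent = sum(1 for t in self.sent if now - t < self.short_window)
        if recent >= self.short_limit:
            return False
        self.sent.append(now)
        return True


throttle = SmtpThrottle()
burst = [throttle.allow(second) for second in range(11)]
print(burst)  # the 11th message inside the same 15 minutes is refused
```

A larger quota for business customers is just a matter of constructing the throttle with different limits, and a zombied PC trying to pump out thousands of messages gets cut off after the first ten.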
Of course, I don't think that they realize that if I manage to find a way to get out of the way, the person in front of me, or the person in front of them, or the person in front of them, might not. (think full lanes) Usually the first thing that pops in my head when someone does that is, "Who the hell do you think *you* are?"
No kidding. I personally don't hang out in the passing lane. I use them to pass and then move right. But it's amazing in the city on major multi-lane roads (not highways) where there really isn't a "passing lane" but just a bunch of lanes that pretty much all 3 lanes will be going 40mph and there are cars in front of me in all lanes as far as the eye can see and some dufus comes up behind me, tailgating me, flashing his lights suggesting that I get out of the way. WTF? Where in the world does he think he's going to go if I get out of the way, one car-length ahead?
It's always amusing seeing idiots doing that or weaving in and out of rush-hour traffic only to pull up next to them again at the next stop light. A lot of good all that pressure, light flashing, and weaving did them.
True, but I'd say there are limits to what the Baby Bells can charge for the last mile when other delivery forms (cable, satellite) are there too. They can price themselves out of the market even if the alternative providers don't use their last mile of copper.
We can go 'round and 'round with opinions, though. I'm going to wait to see how the price is affected. My prediction is that the consumer will be adversely affected.
I agree, we can go round and round. We'll have to see what prices do.
That might (maybe) make some amount of sense if you are a monopoly, but the logic falls apart if there's competition which forces you to innovate and keeps your prices in check.
No, that has very little to do with it. Competition drives prices down, not whether or not the technology is widely used. If it were widespread usage that drove prices down Windows would be cheaper than Linux. :)
I don't know how this compares with the anticipated accuracy of Galileo. It also sounds like Galileo's additional accuracy is only available if you want to pay the European piper a subscription fee while GPS accuracy is free of charge.
I agree. In fact, I can attest to the rest stops all the way up the I-35 corridor (Laredo, San Antonio, Austin, Waco, Ft. Worth) and then northwest on US-287 through the panhandle of Texas, northeastern New Mexico, and then up I-25 to Denver. All along the way the rest stops are excellent. They all feel clean and I certainly have never felt they were unsafe.
I was especially surprised on my last trip along that path to find a very elaborate and beautiful rest stop on US 287 about 2 hours northwest of Wichita Falls. US287 used to only have picnic areas which is not surprising since it's not an interstate... but the new rest stops were truly amazing. And one on each side of the highway even though they could have just built one and sent you crossing the highway since it has crossovers.
Of course they are, and both affect the national debt as expressed in real dollars and as a fraction of GDP, and neither has anything to do with Congress' fiscal responsibility or lack thereof. Just like I said.
But certifications? To me, a certificate such as MCSE and the like is a good indication that someone feels the need to make themselves look better than they are. Take your certifications and shove them where the sun don't shine--let me see some working solutions you have created. Not just on the job, but what have you done in your own time? I'd be far more interested in hiring someone that, on their own initiative, learned some topic at home and developed something based on that knowledge which demonstrates knowledge and ability in the field than someone who has a certification which means they went through the motions to get the certification.
Experience and examples of past work are gold. Just about everything else is Monopoly funny money and checks written against empty accounts. That's not to say that everyone with certifications is an idiot, but I'm immediately skeptical of anyone that would mention such a certification prominently on their resume.
Unfortunately, analyzing past technical work and accomplishments is beyond the capability of most HR departments.
That the debt briefly went down in "real dollars" while not decreasing in absolute dollars simply means that the economy briefly grew faster than the growth in the debt. As I said in another reply in this thread, that is absolutely no reflection on the fiscal responsibility of the politicians in power at the time that were still apparently spending more than they collected. It's simply a reflection of a rapidly growing economy immediately prior to its subsequent explosion.
If we were to subscribe to this logic we could just balance the budget, ignore the debt, and claim the debt is going down in "real dollars" because we haven't added to it.
Besides, are you trying to say that Bush's *triple dip* recession was due to the bubble? What sort of weird triple-burst bubble are you claiming that we had? Ongoing wars? He started them! Afghanistan was arguably justified, but the biggest economic damage - Iraq - was solely the neocons' doing.
I think the others that replied to your post more than adequately answered this part of your troll. Get over the Bush-bashing already. It got old about 3 years ago. I'm not even a Bush fan and I'm sick of the mindless Bush-bashing.
It's smoke and mirrors, nothing more. There was never a surplus. If there had been the debt would have gone down. It didn't. There was no surplus.
Neither party has reduced the national debt for decades. Congress is just fiscally irresponsible, period. It doesn't matter which party it is, neither has shown any capability in reducing the debt.
Clinton got a bump near the end of his presidency since he was riding the bubble, but even when we were supposedly running a "surplus" the debt never went down a single year. It didn't even stay constant a single year. Translation: We didn't really have a surplus. Just as Clinton got a bump from riding the bubble and basically doing nothing for 8 years, Bush has gotten slammed hard by the bursting bubble and ongoing wars.
Face reality, the debt is going to go up regardless of who wins in November.
Dilution doesn't work. :)
See below.
If I'm having 80% failure on encyclopedia attacks after 2 months, that's getting close to worthless.
How many ham and spam are in your Bayesian statistics? It's obviously not the time that makes Bayesian improve in accuracy, it's the amount of data.
Me: An unknown ham will look like noise, and pure noise shouldn't be filtered by Bayesian--only spam.
You: That's contrary to what you stated earlier, and is precisely what I originally claimed. But if you start down that road, then dilution does become a problem - spams don't have to look like your ham, simply like noise, as your filter has to let noise through since it can't tell ham from noise.
Ok, either I'm missing your point or you are missing mine. I went back through the thread and I'm not sure where I contradicted myself. So let's try again. We have several possibilities:
1. Ham, which is what you know for a fact you want to see and is probably from people you've talked to before or on topics you normally discuss.
2. Spam, which is what you know for a fact you don't want to see.
3. Ham Noise, which is mail you want but it might be someone you've never heard from talking about an unusual topic that you don't usually talk about (although I would think the email you receive should either be from someone you've talked to before or on a topic you've discussed before. An unknown person emailing you out of the blue about a topic you never discuss strikes me as relatively unlikely, even for admins on the far side of the bell curve).
4. Spam with noise, which is spam which is definitely spam that you know you don't want to see, but has "noise" injected to "dilute" it.
I hope we can agree that the first two are the "extreme" cases and are easily recognized by Bayesian.
So the question is, is there a difference between "ham noise" and "spam with noise?" The answer is definitely yes.
If someone I've never heard from before sends me an email out of the blue discussing the meaning of life (which is unlikely to start with), that's ham noise. There's not going to be anything particularly innocent nor particularly damning about it. It's going to be quite neutral and a Bayesian filter is going to let that through unless the spam threshold is set aggressively low.
However, if a spammer sends a spam that's trying to sell me Viagra and is using standard spammer tricks (hiding dictionary attacks in white text, using red fonts to make their sales pitch stand out, including links to domains we've never seen before or using IP addresses instead of domains, using lots of HTML comments to break up words, etc.) and also embeds the exact same noise as the neutral message above, does that spam magically become neutral? Definitely not. Bayesian only looks at the most interesting aspects, or terms, of the message. While there wouldn't be anything particularly interesting in "ham noise" that would lead to a high spam score, a spam is going to be just as spammy with or without a bunch of neutral text. The neutral text would only "dilute" the spam score if every word is included in the spam probability calculation. I don't know of any Bayesian implementation that recommends that approach precisely for this reason.
You look at the 15 most interesting terms (at least in the Graham-advocated approach); those that are furthest from 0.50, so you're looking at only terms that are extremely spammy or extremely innocent. All that neutral text is
You: Au contraire, given the way Bayes' rule works, a posteriori probabilities are intimately related to the statistical variance. P(spam|X) = P(X|spam)P(spam) / (P(X|spam)P(spam) + P(X|ham)P(ham)) is the adapted Bayes rule as it works with spam, where P refers to conditional or overall probabilities, and X is a given mail signature. For high-variance ham, the problematic term is P(X|ham), which will result in little difference between noise and ham. Put it this way - if you can email yourself a page from an encyclopedia (without spam) and it isn't flagged as spam, then your filter can't tell ham from noise.
Again, that's not a problem. If you mail yourself a page from an encyclopedia with no spam then it shouldn't be flagged as spam. The purpose of the Bayesian filter isn't to differentiate ham from noise, the purpose is to differentiate ham from spam. The only question is whether the insertion of noise in spam has any significant effect on the ability of a Bayesian filter to detect spam. It shouldn't, at least once it is properly trained.
As I have already conceded, the use of random words may prolong the training period somewhat in unusual situations such as yours where you receive a lot of mail from unknown senders talking about a large number of topics. But you are definitely out of the ordinary when compared to the bulk of email users. The use of random words may prolong your training period somewhat, but it's going to have almost no effect on a more typical user of email. Certainly, the use of random words cannot achieve the spammers' ultimate goal of defeating Bayesian or making it worthless.
If we have 10 descriptors, and each is even binary, and we need at least 10 datapoints per cell to get statistics, that means we need at least 10,000 messages. That should give some idea of the problem. Less variance makes the space more dense and inherently more manageable.
This is consistent with what I said earlier: The dilution caused by the spammers' use of random words may require that a new Bayesian user be patient for a longer period of time before Bayesian reaches optimum filtering levels. But I don't believe anything has contradicted my statement that a ham doesn't have to look like the rest of your ham for it to not be filtered. An unknown ham will look like noise, and pure noise shouldn't be filtered by Bayesian--only spam. So an unknown ham just has to look different than spam. And if your ham doesn't look different than spam, well, I feel for you. :)
Just because you have lots of different types of ham doesn't mean it's any harder for Bayesian to identify it. In the end, it's a simple game of statistics. And it's a game that works very well. One ham doesn't have to look like all the other ham, it just has to look different than spam.
Additionally, people who receive lots of different types of ham are in the definite minority. The vast majority of email users have a relatively short list of contacts that'll eventually produce some fairly predictable ham. Those of us that receive lots of email on lots of subjects from lots of never-written-before users are in the definite minority. And a minority of spam is going to get through our filters anyway. At that point the spammers will be targeting a minority of the minority, and that minority is extremely anti-spam... sounds like a losing business model to me.
On an empirical level, we have two observations: 1) your Bayesian filter is working fine with "encyclopedia" spams, and 2) mine isn't. I've been training mine for 2 months, and it catches 100% of word salads, and maybe 20% of "encyclopedia" spams. That's a real problem. I think 2 months training should certainly be enough. The question is, why is it not working, because it's clear that it's not. We'll probably agree that the root cause is that your database is older, broader, and better characterized. I would guess that this allows your ham to be better characterized, while mine is more fuzzy. In other words, my filter may be partially handicapped compared to yours.
My Bayesian corpus was started in May 2003--just over a year ago. It actually hovered around 99.5% for the first 3 months, then was in the 99.8x% range for about 4 months, and it hasn't dipped below 99.9% for the last 5 months and has been peaking at around 99.98%. My corpus has 9518 good messages and 133,466 spams. The few spams that get through these days are actually some bounces from viruses (which I don't count as spam nor do I report them as spam which is why they still get through from time to time), one or two foreign-language spams, and a few spams that were getting through because they were using whitelisted email addresses from the same domain (I have since modified the whitelist to work on the NAME of the person rather than the email address).
I agree with you, you probably just don't have a finely-tuned Bayesian filter yet. But that's not an inherent flaw in Bayesian, it's just a matter of being patient. If you keep with it Bayesian is going to work great for you--the dilution tactics might just mean that you have to be patient in training your Bayesian filter longer than was necessary a year ago. The end result will be the same, though.
Also, while you are training the Bayesian filter, alternative filters are definitely a plus. In the filter I developed and use (see sig line), the user has the option of enabling common keyword filters that have an updated list of known spam phrases, domains, etc. This helps detect spam while the Bayesian filter is still getting up to speed. Such standard filters are a very important part of helping tune the Bayesian filter initially without having to depend entirely on the user. Once the Bayesian filter is trained, the archaic keyword filters can be disabled. At this point I don't use the keyword filters at all--I depend entirely on Bayesian.
I'm guessing that's it. Things like this will cause a much more severe reaction when the corpus is small.
Me: It doesn't matter if the encyclopedia entry "dominates" the spam text...
You: Not so sure about that. If a spam consisted of the words "Buy my viagra," that would be a spam. If those three words were interspersed through an article, I highly doubt it would be tagged as spam. So dilution should be a factor. I don't know exactly how Thbird implements it, but in standard Bayes theory, this is a problem.
It'd only be a problem if you're using some Bayesian filter that works on word pairs or context. The simple Bayesian filter proposed by Graham almost two years ago is simply based on tokens. It doesn't matter where they appear in the body of the message, just that they appear. So the phrase "Buy my Viagra" is going to score identically to having those same words spread throughout the article. Considering spammers like to try to embed words in small fonts or white-on-white color, the simple approach proposed by Graham makes much more sense than a more complicated multi-word Bayesian filter that looks for word combinations.
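A quick sketch of why position doesn't matter in a token-based filter. The tokenizer below is a simplification I'm assuming for illustration (lowercased runs of letters, digits, dashes, apostrophes and dollar signs); it is not the exact tokenizer of any particular filter, and the sample article text is invented.

```python
import re


def tokens(text):
    """Return the set of distinct tokens in a message; order and position are discarded."""
    return set(re.findall(r"[A-Za-z0-9$'-]+", text.lower()))


# The same three words, once as a phrase and once scattered through filler text.
pitch = "Buy my Viagra"
article = ("The aardvark is a burrowing mammal. Buy a field guide to learn more. "
           "In my experience they are nocturnal. Viagra is not part of their diet.")

# Every token of the pitch is also present in the padded article.
print(tokens(pitch) <= tokens(article))
```

Since the filter only ever sees the set of tokens, the spammy tokens are still there to be scored no matter how much text is wedged between them.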
Me: It's not enough to be "neutral" you have to be downright good.
You: Only if you have the threshold on your filter cranked down pretty far.
I think that's wrong. I was going to say that your experience is very different than mine but, actually, I think that's wrong.
Due to the way Bayesian works, if you have 40k of completely neutral words and, say, 5 or 6 spammy words, that's going to get tagged as spam regardless of whether you set your threshold to 90%, 50%, or 30%. Neutral words that have a spam probability of, say, 50% just aren't going to be considered for determining whether a message is spam or not. Those words lose importance in the spam decision. The best the spammer could do is try to dilute so many words that all your words were "neutral" and no words were "good" and, thus, it'd be impossible to determine spamminess since no word would be particularly good or spammy. But in reality it's not possible to dilute the value of all words, and diluting the value of the good words is particularly difficult since those same words will be getting flagged as spammy by other users who don't have those same words as "good" words. Not to mention they don't know what your good words are to start with.
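Here is a minimal sketch of that scoring step, assuming the Graham-style approach of keeping only the 15 tokens furthest from 0.50 and combining them with the usual naive Bayes product. The function name and the per-token probabilities are invented for the example; a real filter would look the probabilities up in its trained corpus.

```python
from math import prod


def spam_probability(token_probs, n_interesting=15):
    """Combine per-token spam probabilities using only the most interesting tokens."""
    # "Interesting" means furthest from the neutral value 0.5, in either direction.
    interesting = sorted(token_probs, key=lambda p: abs(p - 0.5), reverse=True)[:n_interesting]
    spammy = prod(interesting)
    innocent = prod(1.0 - p for p in interesting)
    return spammy / (spammy + innocent)


spam_tokens = [0.99] * 6        # a handful of strongly spammy tokens (made-up values)
neutral_padding = [0.5] * 4000  # a big pile of perfectly neutral filler words

bare = spam_probability(spam_tokens)
padded = spam_probability(spam_tokens + neutral_padding)
print(bare, padded)
```

With these numbers both scores come out essentially identical and far above any sane threshold: the neutral filler either never makes it into the top 15 or contributes equally to both products and cancels out.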
Me: Unless they can send a message with headers that are close to what my friends' mails have, unless they know my friends' names, unless they know the topics I often discuss, they're just not going to be able to break through my Bayesian filter by "swamping" it with neutral text. It just doesn't make a difference.
You: Then you've implemented your filter to approximate a whitelist, while most people implement theirs to be more like a blacklist. Particularly for those of us who need to be reachable by people who have never emailed us before, cranking down the level that far isn't an option. As such, neutral things have to be classified more as ham than spam.
Uh, no, sorry. Perhaps I misstated myself. If a spammer wants to get through, he is going to have to do the above (know my friends' names, the topics I discuss, etc.) to get his spam through, and he will probably have to lose most of the content of the spam he wants me to see. If he wants to tell me to "Buy my Viagra" at the very least he's going to have to know some characteristics of my "good" words and even then he's going to have a hard time getting through if he's talking about Viagra, using red font color, etc. A completely neutral, non-spam message from someone I've never heard from before is going to be neutral and, as such, won't be filtered. Very, very few spams are "neutral." Even when they are trying to dilute Bayesian filters by using random words, their messages are still very
That won't work. Please review other responses regarding Bayesian and/or read some papers on Bayesian filtering. Once you understand how it works you will see why this approach can't work. If you want me to explain it to you, I will, but it would be redundant. It's been explained many times before.
How in the world are the spammers going to target my low-scoring words? Let's see, some of my low-scoring words:
1. Header "EDS". Probably because I know someone that works at EDS.
2. Header "BAY1". Who knows where that comes from, but one of my frequent contacts must have that in a header.
3. Body "ADC". Probably because I talked about A/D Converters from time to time.
4. Body "BCD". Probably because I talk about Binary Coded Decimal from time to time.
6. Body "GND". Probably talking about electrical grounds.
Anyway, that's a few of my sub-1% Bayesian tokens. How does that compare with yours? Or the low-scoring tokens of an accountant? Very little overlap I'd suspect. So how in the world is a spammer going to target low-scoring terms? If they knew them then they'd just slide their spam right past these filters. But they don't know them, they can't know them, and even if they somehow hacked into your system and got your Bayesian statistics, it won't help them get past anyone else's.
Random words and text insertion basically represent spammers kicking and flailing as they drown in the sea of Bayesian anti-spam filters.
I've seen excerpts from books, the Constitution, etc. I haven't had a message like that get past my filter ever, as far as I know. Unless they got dang lucky and sent you an encyclopedia entry for a topic you often discuss it shouldn't have any significant effect. It doesn't matter if the encyclopedia entry "dominates" the spam text. If the spam is spammy and the encyclopedia text is "neutral" (which it will be unless the spammer gets lucky and picks a topic you often discuss) then all the neutral words in the world aren't going to compensate for a few good spammy words. It's not enough to be "neutral" you have to be downright good. Unless they can send a message with headers that are close to what my friends' mails have, unless they know my friends' names, unless they know the topics I often discuss, they're just not going to be able to break through my Bayesian filter by "swamping" it with neutral text. It just doesn't make a difference.
Except that won't work, as anyone that understands Bayesian filtering will tell you. In the case of every message with "random words" I've checked recently, the random words actually increased the spam score of that message. Why? Because it seems the random words aren't so random and either the same spammer is using the same "random words" over and over or various spammers are using sets of the same words. Over time most of the "random words" they use actually become great indicators of spam since my real email doesn't typically contain the random words they use.
In one recent analysis, 10 random words were inserted by the spammer. He got lucky and 1 of those words actually had a very low score for my Bayesian corpus. Unfortunately (for him), the other 9 words had scores of 99.99%! His use of random words literally nuked any possibility of him getting through my filter.
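For anyone wondering how a "random" word ends up with a score that high, here is a rough sketch of the per-token calculation along the lines Graham described in "A Plan for Spam". The corpus sizes are the ones I quoted for my own corpus; the word counts are hypothetical, and the 0.01/0.99 clamps are Graham's published defaults, which evidently differ from whatever my own filter uses since it reports 99.99%.

```python
def token_spam_probability(good_count, bad_count, n_good, n_bad):
    """Graham-style spam probability for one token, or None if the token is too rare."""
    g = 2 * good_count  # ham occurrences are doubled to bias against false positives
    b = bad_count
    if g + b < 5:
        return None     # not enough data to say anything about this token
    good_ratio = min(1.0, g / n_good)
    bad_ratio = min(1.0, b / n_bad)
    p = bad_ratio / (good_ratio + bad_ratio)
    return max(0.01, min(0.99, p))


N_GOOD, N_BAD = 9518, 133466  # corpus sizes quoted in this thread

# Hypothetical counts: a recycled "random" word seen in 300 spams and no ham.
print(token_spam_probability(0, 300, N_GOOD, N_BAD))
# Hypothetical counts: a work-related term seen in 40 hams and no spam.
print(token_spam_probability(40, 0, N_GOOD, N_BAD))
```

The point is that a word the spammers keep reusing, and that never shows up in my real mail, gets pinned to the spammy end of the scale after only a handful of sightings. It stops being noise and becomes evidence.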
Anyway, random words will not help spammers get through Bayesian filters. But it seems that many people (both spammers and non-spammers) think it will. But, hey, that's good for me: as long as "random words" is seen by spammers as a viable solution to Bayesian filters, my Bayesian filter will continue to work and will not have to deal with any innovative way to get around the filter (if any exists).