I worked at an R&D lab, and our policy was that any system (laptops mainly) that could be expected to leave the physical security of the building had to have all data encrypted. We used a program that encrypted the entire hard drive and then required a passkey in order to decrypt at boot. At the time I left they had not yet gotten as far as instituting such a policy for flash drives, though I expect they have by now.
This won't protect against a malicious employee or a determined attacker, but it should fix the problem of data accidentally left lying around.
This law will be a lot of fun when someone writes a virus or worm that goes around looking for open shares and uploads movies or parts of movies to them.
I'm sure there are still some Windows security flaws left that would let a worm run some code to open a share and download a movie from one of the worm's previous victims, so a victim wouldn't even need to leave shares open for this to happen.
It's just a matter of time, really, since it would be so nasty. Worse than deleting all your files, worse than calling 911 on your modem and getting you in trouble. Can you imagine the panic when the computer virus that lands you three years in jail hits the mainstream news?
I'm sure there's some bored and skilled loser out there who would want the infamy of doing this.
Well, if every car had one of these devices, the traffic lights could be programmed to switch intelligently based on the approaching traffic.
I hate it late at night when the light is green while no one is going through, and then just as a few cars get to the light it turns red even though there are no cars waiting to go the other way, and then when a car finally approaches, the light turns back. What a waste. Some lights have pressure sensors, but they can only tell if cars are currently waiting. With something that could tell the light when traffic was approaching, how far back it was, and how heavy it was, we could have much better traffic lights.
But don't you think this is an attempt at intimidation rather than a real lawsuit?
That's possible, but I think it may actually be an attempt to muddy the story a bit. The story as it's been told in the media has been pretty simple, and the supposed security is so incredibly simple that everyone sees it as a sham.
I think they are hoping to turn a corner in both press coverage and public perception, and turn this from an 'asinine copy protection so monumentally stupid a toddler could bypass it' story into an 'evil hacker circumvented our technology illegally' story.
Monumental stupidity doesn't cut it on Wall Street anymore like it did during the bubble. My only guess is that the company knew how weak its product was and had hoped to pull one over on the public.
Note how they are not suing Microsoft for designing this 'circumvention mechanism' into Windows, but are suing the person who blew the whistle on their technological weakness.
You are correct. However, a Segway is not a battery. It is much more than that, and it has the capacity on board to handle low-battery conditions in a safe way. This is, in fact, what the firmware upgrade that Segway is putting in the recalled devices does.
If the Segway doesn't have enough charge to operate safely, it will stop operating until charged.
Many electronic devices work this way: they refuse to operate below a certain voltage rather than try to "limp along" and possibly cause damage.
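For what it's worth, the "refuse to operate below a certain voltage" idea can be sketched in a few lines. This is only an illustration and not Segway's actual firmware: the class name `LowBatteryGuard`, the separate resume threshold, and the voltage numbers are all made up for the example.

```python
class LowBatteryGuard:
    """Hypothetical lockout: stop below a cutoff, stay stopped until recharged."""

    def __init__(self, cutoff_volts, resume_volts):
        if resume_volts < cutoff_volts:
            raise ValueError("resume_volts must be >= cutoff_volts")
        self.cutoff_volts = cutoff_volts
        self.resume_volts = resume_volts
        self.locked_out = False

    def may_operate(self, battery_volts):
        # Below the cutoff: latch the lockout instead of limping along.
        if battery_volts < self.cutoff_volts:
            self.locked_out = True
        # Only a reading at or above the resume level clears the lockout.
        elif battery_volts >= self.resume_volts:
            self.locked_out = False
        return not self.locked_out


# Illustrative thresholds and readings only.
guard = LowBatteryGuard(cutoff_volts=60.0, resume_volts=66.0)
readings = [72.0, 61.0, 59.5, 62.0, 67.0]
decisions = [guard.may_operate(v) for v in readings]
print(decisions)
```

The point of the latch is that a battery that recovers slightly after the load is removed (the 62.0 reading) does not switch the device back on until it has actually been charged.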
Guess what: machine runs out of fuel? It can't do its job. Duh...
You are, of course, correct, but I think the issue here is graceful failure. It's one thing if my car stops moving when it runs out of gas. It's another thing if my car flips over on its roof and bursts into flames when it runs out of gas.
I think the issue here is that if the Segway doesn't have enough juice to operate safely, it should stop moving completely.
When your car gets low on gas it doesn't deactivate the airbags, unfasten the seatbelts, and turn off all the safety features to get that extra 600 feet.
Who is going to lose their job because telemarketers won't be allowed to call people who aren't going to buy their products anyway?
I was confused by this too. I would have thought that the telemarketing industry would be thrilled to have a list of people who are not interested in their calls and will not buy from them. This way they can concentrate their efforts on calls where there is a chance of success and not waste their time on people who at best will just hang up on them, and possibly intentionally waste their time or abuse them.
However, the telemarketing industry has realized that a substantial part of its revenue comes not from offering people a product that they want and filling a need, but from tricking people into buying things that they don't want or need. They worry that the best 'marks' for their services are people who know they are too weak-willed to refuse to buy, and who will therefore sign up for the list.
Basically, the industry wants to make sure it retains access to these people so it can continue to rip them off. It makes sense, really: if a person were really interested in a product, they'd probably go out and buy it themselves. The whole telemarketing / salesman "hard sell" is about selling to those people who don't want or need the product in order to move more units.
Ironically, the people who sign up for the 'don't call' list may actually be a more fertile group for telemarketing activities than those who do not. (At least that seems to be the industry's worry.)
You're replying directly to a posting that doesn't understand how fragmented words make the Bayesian filter's work easier, until you get down to one-letter fragments.
I wrote that parent post, and your comment leads me to believe that I was unclear in my implication.
None of the HTML techniques listed will get past the filter forever. Given the exact same message twice, the filter will gobble it up the second time, once it has been marked as spam the first time.
If the word 'penis' appears in spam constantly, the filter catches that. If the word PExNIS occurs constantly, the filter catches that too.
But if the 'x' is replaced with a different substring in every spam, so the same token never appears in multiple spams, the standard tokenizers, whether n-tuplet based or word based, will fail.
Even if I have 1000 spams containing 'Penis' in my spam box, a new email containing PcarEcatNcabIcanS provides no information to that filter if we are using tokens or n-tuplets as the features entered into the filter. Of course, once that string has been seen once, it is learned.
Basically, if the stakes become large enough, spammers can adapt just as fast as your Bayesian filter does. The filter can never adapt to find spamminess in a manner that cannot be expressed by the features being input. It is not the n-tuplets in a message which make spam spam. It's the meaning and the human understanding generated in the reader of that message. The n-tuplets / words / tokens, etc. are simply an approximation so we have something to feed the filter. To make a perfect spam filter, we'd need a feature extraction algorithm which produces the same features/qualities that the reader sees in the message.
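Here is a minimal sketch of that argument, assuming a plain word-based tokenizer. The names `tokenize`, `obfuscate` and `TokenCounts` are invented for the example, and it only tracks which tokens have been seen rather than computing real Bayesian probabilities, since the point is just that an unseen token carries no evidence.

```python
import re
from collections import Counter


def tokenize(text):
    """Word-based tokenizer: lowercase alphabetic runs."""
    return re.findall(r"[a-z]+", text.lower())


def obfuscate(word, fillers):
    """Interleave filler strings between the letters of word (the spammer's trick)."""
    letters = list(word.upper())
    out = []
    for i, letter in enumerate(letters):
        out.append(letter)
        if i < len(fillers) and i < len(letters) - 1:
            out.append(fillers[i])
    return "".join(out)


class TokenCounts:
    """Stand-in for the filter's training data: token counts per class."""

    def __init__(self):
        self.spam = Counter()
        self.ham = Counter()

    def train(self, text, is_spam):
        target = self.spam if is_spam else self.ham
        target.update(set(tokenize(text)))

    def evidence(self, text):
        """Tokens in text the filter has seen before; anything else is uninformative."""
        return [t for t in tokenize(text) if t in self.spam or t in self.ham]


counts = TokenCounts()
counts.train("cheap penis pills", is_spam=True)
counts.train("PExNIS pills", is_spam=True)

first = obfuscate("penis", ["car", "cat", "cab", "can"])   # PcarEcatNcabIcanS
second = obfuscate("penis", ["dog", "dot", "dim", "den"])  # fresh fillers next time

seen_plain = counts.evidence("penis")     # known token
seen_fixed = counts.evidence("PExNIS")    # constant obfuscation is learned too
seen_first = counts.evidence(first)       # never seen: no evidence at all
counts.train(first, is_spam=True)
seen_again = counts.evidence(first)       # the exact same string is now learned
seen_second = counts.evidence(second)     # but the next variant is new again
```

A filter built on these features can only ever catch up with the previous message; each fresh set of fillers starts from zero.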
Using text classification techniques in a spam filter is overall a good idea. (Bayesian systems are only one system for text classification, but they seem to be getting all the attention when it comes to spam.)
The problem, though, is that they don't work on raw text. The text must first be 'featurized', using either a Feature Selection or Feature Extraction algorithm.
The 'Bayesian' part of anti-spam filters is pretty robust, and should theoretically be able to handle almost all tricks spammers throw at it, but the current state of Feature Selection is pretty embryonic.
All of the tricks in the article fool the tokenizers currently used into producing features that are inconsistent between spams. No consistency == no classifier. The problem is that an email is not a 'bag of words', but we classify emails as if they were.
What we need is to extract features which are more similar to the types of features a human looking at the message would use to make the spam / not-spam determination.
There is a lot of ongoing research in this general area, but to the best of my knowledge, nothing has made it into spam filters yet.
In the meantime, a lot can be gained by running the Feature Selector / Bayesian filter on the email after it's been rendered. Ideally, the filter needs to see exactly what the user will. Anything less is a disconnect between the two that lets spammers craft messages which reach the user yet get past the filters.
One good feature that could be extracted from an email and fed to a filter would be the results of a statistical analysis of rendered vs. non-rendered text in the email. Look at the amount, type, and distribution of non-rendered text, etc. in spam vs. ham.
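As a rough sketch of what such a feature could look like, the following uses Python's standard `html.parser` as a crude stand-in for real rendering. It is an assumption-laden illustration: the class `VisibleTextExtractor`, the function `render_features`, and the choice of which tags count as hidden are all made up here, and a real renderer would also have to deal with CSS, tiny fonts, white-on-white text and so on.

```python
from html.parser import HTMLParser


class VisibleTextExtractor(HTMLParser):
    """Collects text a reader would plausibly see; not a real renderer."""

    HIDDEN_TAGS = {"script", "style", "head", "title"}

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.visible = []
        self.comment_chars = 0
        self._hidden_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.HIDDEN_TAGS:
            self._hidden_depth += 1

    def handle_endtag(self, tag):
        if tag in self.HIDDEN_TAGS and self._hidden_depth > 0:
            self._hidden_depth -= 1

    def handle_data(self, data):
        # Keep text only when we are not inside a hidden element.
        if self._hidden_depth == 0:
            self.visible.append(data)

    def handle_comment(self, data):
        # Comments never reach the reader; count them as non-rendered text.
        self.comment_chars += len(data)


def render_features(raw_html):
    """Return the visible text plus simple rendered-vs-hidden statistics."""
    parser = VisibleTextExtractor()
    parser.feed(raw_html)
    parser.close()
    visible_text = "".join(parser.visible)
    total = len(raw_html)
    hidden = total - len(visible_text)
    return {
        "visible_text": visible_text,
        "hidden_ratio": hidden / total if total else 0.0,
        "comment_chars": parser.comment_chars,
    }


# The same word split up with HTML comments instead of visible fillers.
sample = "P<!--car-->E<!--cat-->N<!--cab-->I<!--can-->S"
features = render_features(sample)
plain = render_features("Meeting at noon")
```

The visible text can be handed to the existing tokenizer, while `hidden_ratio` and `comment_chars` become extra features; a message that is mostly invisible markup looks very different from ordinary mail on those numbers.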
Adding extra modifiers such as -prices or +review will definitely help the precision of your search, finding a higher percentage of non-shopping sites, but it will also hurt your recall. What if the best, most in-depth review online for a particular product happens to contain the word "prices"?
Perhaps there's a line in the review like "I've compared prices with similar products, and this one comes out on top."
You really do have to try all kinds of combinations, and scan through pages and pages of unrelated results to be sure not to miss something good.
This isn't a limitation of Google, however; it's really a limitation of keyword-based searching.
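The precision/recall tradeoff can be shown with a toy example. The five "pages" below are invented for illustration, as are the helper names `search` and `precision_recall`; the only thing taken from the discussion above is the review sentence that happens to contain the word "prices".

```python
import re


def words(text):
    """Lowercase alphabetic words in a page."""
    return set(re.findall(r"[a-z]+", text.lower()))


def search(pages, required, excluded=()):
    """Keyword search: every required word present, no excluded word present."""
    hits = set()
    for name, text in pages.items():
        page_words = words(text)
        if all(w in page_words for w in required) and not any(
            w in page_words for w in excluded
        ):
            hits.add(name)
    return hits


def precision_recall(returned, relevant):
    """Precision = hits / returned, recall = hits / relevant."""
    hits = returned & relevant
    precision = len(hits) / len(returned) if returned else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall


# Made-up corpus: two real reviews, three shopping pages.
pages = {
    "review_a": "widget review: I've compared prices with similar products, "
                "and this one comes out on top",
    "review_b": "hands-on widget review after six months of use",
    "shop_a": "widget best prices buy now",
    "shop_b": "widget prices free shipping",
    "shop_c": "widget discount store",
}
relevant = {"review_a", "review_b"}

plain = search(pages, required=["widget"])
filtered = search(pages, required=["widget"], excluded=["prices"])

print(precision_recall(plain, relevant))     # all five pages come back
print(precision_recall(filtered, relevant))  # fewer shops, but review_a is lost
```

In this toy corpus, excluding "prices" raises precision from 0.4 to 0.5 but drops recall from 1.0 to 0.5, because the in-depth review mentions prices in passing.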
If one of our servers were going around biting people, I'm not sure our sysadmins would know what to do with it either. Probably try to stay away from it and send in some disposable MCSEs to feed it.
Re: Not interested in being acquired? (on Darl McBride Interview; Score: 5, Interesting)
"You go back to SCO's brand in the 1990s and it was Unix on Intel. SCO was primed to seize the multibillion-dollar server market of Unix on Intel that hit in the early 2000s that has in fact shifted over to Red Hat."
He's falling for a logical fallacy here. 'Unix on Intel' caught on largely because of Linux and its liberal licensing. No proprietary Unix vendor ever made substantial inroads in this area, and I doubt any would have. Whenever people are asked about the benefits of Unix-on-Intel, the answers they give are the general openness of the platform and its being less expensive than a proprietary solution. This is not compatible with a license-fee-extortion scheme.
It's no different than saying "We just sold a million foobars for a dollar each. If only we had charged a million dollars each, we'd be gazillionaires!"
Who cares about the general case? If 99.999% of zip codes resolved to an entire state, but mine resolved to my couch, then everyone else might have their privacy, but I'd be selling my TiVo ASAP.
I found it was always more effective to actually take in what was being said to fully understand it, and not just transcribe it to paper.
I had the same experience. Taking notes in class is very similar to highlighting in textbooks as you read. It's like I'm saying to myself, "This is important, come back and learn it later." If it's really so important, why not just learn it now? If you aren't frantically scrambling to write down everything the professor says, or to memorize it, you can actually internalize the concepts enough that the details come easily to you when you need them.
For example, rather than writing down step-by-step directions for solving a particular type of math problem, if you learn and understand why the solution works and what it entails, you can re-create the problem-solving steps.
I used to take a lot of notes when I first started college, with separate notebooks for each course, full of scribblings, but by my senior year, one 70-page notebook would get me through all my courses for a semester, and my grades got better, not worse.
It may be better now, but back around 1997-1998 I used to hang out on a Linux-related IRC channel and would sometimes answer people's questions about how to set up httpd or configure the system, etc.
On at least half a dozen occasions, some newbie would ask "how do i do foo", and invariably the answer would come back from someone or other:
"su to root, cd to / and do rm -rf *"
The first few times I saw this, I thought it was just a joke and that everyone knew better than to do it, but after a few minutes the question asker would be off the channel and never heard from again.
I started warning people against this when I saw it, but usually it would be too late by the time I saw what was going on.
You can't work on the assumption that all people are doing is browsing the web.
I think many providers have set up their business models implicitly making this very assumption. When it proves to be wrong, rather than change their assumptions, they try to change the world to meet them.
"Residential users are only going to surf the web." What? They're using VPN? Ban it! What? They're downloading movies? Cap their bandwidth! What? They're running servers? Ban them!
Obviously, they can't give out terabytes of bandwidth for $40 a month; the economics don't work out. But they really should have figured this out before advertising their service as unlimited. I don't think anyone would be upset if they bought a 30 gig/month service and only got 30 gigs/month.
Yes, now I'm just waiting for Disney to try the same game and get Minix renamed for sounding a little too much like "Minnie" as in "Minnie Mouse". After that, isn't there a cartoon character from Peanuts named Linus? Well, there goes linux.org. It's only one letter off! They must be typo-squatting!
Yes, but suppose he had written details about his project in an off-line diary. Would it still result in the same consequences? I don't know the law on this, but common sense would lead me to think not.
I think a lot of people treat their weblogs as diaries. I mean really, who reads the things anyway? And anyone who does stumble across it probably doesn't know you, so it doesn't feel real.
The really interesting question is this: Is placing a diary on a computer where protocols exist to access it publishing? The courts have said yes, but philosophically/ethically people have their own opinions.
Conceptually, what's the difference between leaving my diary on an HTTP server and leaving it in the photocopy room at my work? Yes, people *can* access it and make copies of it for themselves, but it's not like I went to a printing press and made a thousand copies myself. Same case with a web server. I have a copy of a document on my hard drive.
Does the existence or non-existence of an httpd daemon running on a machine determine whether a document is "published" or "private"?
That's silly. If I were using Windows and accidentally set up an open share, how does that make me a publisher?
Wait, didn't NEC do the same thing six years ago? http://www.physorg.com/news1344.html