The patent covers any method at all like Paul Graham's method. He discusses the patent here.
The patent claims boil down to using a probabilistic classifier to recognize spam. There are many claims, but they're mostly trivial elaborations. Probabilistic classifiers aren't new, and there's no claim they invented them. And it doesn't look like they had to solve any real technical hurdles to apply it. It's one of the most egregiously obvious patents I've seen in a while.
I say there's only one way to test whether an idea is obvious to people skilled in the field, and that's to pose the problem to people skilled in the field and see if they can find the solution. Anything less is a joke.
Not to diss Horvitz and Heckerman -- they're big names in Bayesian inference and Bayes nets. They've been behind a bunch of solid research.
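For anyone who hasn't seen one, here is roughly how little it takes to point a probabilistic classifier at spam. This is a toy naive Bayes filter with add-one smoothing, not the method in the patent and not Graham's. The function names and the four training messages are made up for illustration, and it assumes at least one spam and one non-spam training message.

```python
import math
from collections import Counter

def train(messages):
    """messages: iterable of (text, is_spam) pairs."""
    counts = {True: Counter(), False: Counter()}   # word counts per class
    totals = {True: 0, False: 0}                   # message counts per class
    for text, is_spam in messages:
        counts[is_spam].update(text.lower().split())
        totals[is_spam] += 1
    return counts, totals

def spam_probability(text, counts, totals):
    """Naive Bayes estimate of P(spam | words in text)."""
    vocab_size = len(set(counts[True]) | set(counts[False]))
    spam_tokens = sum(counts[True].values())
    ham_tokens = sum(counts[False].values())
    # Start from the prior odds, then add one log-likelihood ratio per word.
    log_odds = math.log(totals[True] / totals[False])
    for word in text.lower().split():
        p_word_spam = (counts[True][word] + 1) / (spam_tokens + vocab_size)
        p_word_ham = (counts[False][word] + 1) / (ham_tokens + vocab_size)
        log_odds += math.log(p_word_spam / p_word_ham)
    return 1 / (1 + math.exp(-log_odds))

# Hypothetical training data, just enough to exercise the code.
training = [
    ("cheap pills buy now", True),
    ("buy cheap watches now", True),
    ("meeting notes attached", False),
    ("lunch at noon tomorrow", False),
]
counts, totals = train(training)
print(spam_probability("buy cheap pills", counts, totals))
print(spam_probability("meeting at noon", counts, totals))
```

A real filter needs better tokenizing and far more training data, but the core idea is no bigger than this.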
I did exactly that. The techs couldn't get it to see either the network card I had in it (which worked fine under Linux) or the network card they brought. I didn't care, so eventually they gave up and recorded their work as a "self install".
According to the Symantec info, the worm code isn't in the broadcast emails. The emails just contain the social-engineering text and a link, and the victims download the code themselves.
The link is http://www.friendgreetings.com/pickup/pickup.aspx
But guess what? The page isn't available, and the machine doesn't answer pings. Either someone has DoS'd them, or they've been slashdotted by their own worms.
If someone can throw windows up on your X server, they can do worse than that. They can grab a screenshot (with xwd -root) or sniff keystrokes with xkey or xspy. Nothing shows up on your screen at all.
Anyone running with xhost access control is asking for trouble. If you're security conscious, tunnel your X session over ssh.
It's not 1980s technology, it's 1860s technology. Thomas Edison's first patent was for an "Electrographic Vote-Recorder" system to do this. Congress rejected it, but most state legislatures have been using something like this for a century.
I suspect that Congress likes the current system. Voice votes let members hide their votes, and roll-call voting gives them more opportunities to play games.
As far as I know, there isn't any deity in any religion named "God"
Using it like a proper name, capitalized and with no articles, implies that it is a reference to the god of a monotheistic religion. Nobody uses it this way in contemporary American English unless they accept -- if only for purposes of argument -- the existence of the Judeo-Christian God.
When it's used as a common noun, e.g. "Athena is a god of the ancient Greek religion", you're right.
Having a good language model is critical to this approach. With no language model, it's basically hunt-and-peck on a moving, one-dimensional keyboard (about 4.7 bits/character). With a good, but practical, language model, you'll need to choose among 5-7 likely options (2.3-2.8 bits/character). If you really get it right -- i.e. if the computer knows the language as well as a typical college grad -- there will be only two or three reasonably likely options on average.
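The bits-per-character figures are just base-2 logarithms of the number of options. Here's a quick check, under the simplifying assumption that the options at each step are equally likely (a real language model would weight them unevenly). The 27-symbol alphabet, 26 letters plus space, is my guess at what "no language model" means here.

```python
import math

def bits_per_choice(n_options):
    """Bits needed to pick one of n equally likely options."""
    return math.log2(n_options)

cases = [
    ("no model, 27 symbols", 27),
    ("practical model, 5 options", 5),
    ("practical model, 7 options", 7),
    ("very good model, 2 options", 2),
    ("very good model, 3 options", 3),
]
for label, n in cases:
    print(f"{label}: {bits_per_choice(n):.2f} bits/character")
```

So two or three likely options works out to somewhere between 1 and 1.6 bits per character.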
I find that it works much faster if, before I start, I tell it to add training files (File menu). I've been using /usr/dict/words.
I find it completely intuitive. But then, I had this same idea six or eight years ago. I wondered how well it worked. I'm glad someone coded it so I can try it.
Whether or not you use this much separate hardware, I strongly suggest you plan for the day when you don't work for your current employer(s). Draw clear lines between your data and the company's data. If you're going to keep personal data at work, make sure you label it in a way that will make it clear to someone else.
Yeah, wasting an exorbitant amount of tax dollars, sure. Like the internet.
Yes, DARPA had one really great hit -- about 34 years ago.
Be cynical as you want, but DARPA is the one government agency which is really flexible and has a vision. With the rise of corporate dependency on innovation, even in the academic world, DARPA is one of the last bastions of basic research.
I can be awfully cynical about DARPA. My former employer's bread and butter was DARPA research. Which is to say that our primary products were proposals and billable hours. Many of those billable hours were spent documenting our activities -- presentations, review meetings, progress reports, final reports. Sometimes we had time for actual research, the direction of which changed with the whims of the DARPA program manager and was at best loosely correlated with the work proposed in the proposal. I'm not accusing my former employer of wrongdoing; that's the flavor of pointy hair induced by DARPA policies.
By the way, DARPA doesn't do basic research. In basic research, most of which is still done in universities, you pay lip service to a vague area of application, but the real goal is understanding. DARPA's research goals are always applied -- i.e. the goal is always to produce something useful, not simply to understand the world. But it's "early R&D", farther from being applicable than most R&D, and too much of a long shot for most R&D organizations. The rule of thumb is that if nobody else in the Dept. of Defense thinks they know how to solve the problem, DARPA works on it. (This translator work seems to be an exception.)
So most of DARPA's work is in the gap between basic research and typical R&D. Ideas seem to get stuck in this gap for decades, which is why DARPA was created. But there's been too much pressure for short-term results for too long, so the agency is badly broken.
In the DARPA world, PowerPoint isn't just for executive types. It's the primary mode of communication. For contractors, the quarterly PPT presentations are the most important contract deliverables. They're second only to proposals in their importance to the contracting firms. (Proposals get you the contract, presentations help you keep it.) Research results are tertiary.
The Miller article's analysis of the first sale doctrine is that it applies, and should continue to apply, to particular copies embodied in physical objects -- i.e. the physical books and CDs that a library lends. They also say first sale doesn't apply in the purely digital realm.
First sale doctrine is what lets libraries operate today, and it sounds to me like they'd say libraries could continue to lend the physical embodiments as they always have.
But if the physical embodiment has a form which can be copied easily, then what? If I burn a copy of a library CD, hasn't a distribution happened? And, if the library is open to the general public, isn't it a public distribution? I'd say a distribution happened either when I made the copy or when I returned the original to the library, but I'm not sure whether it was a public distribution made by the library or a private distribution made by me.
I like the general idea that making the copy is no longer the place to draw the line -- I've had the same idea myself. And distribution is the only replacement I can come up with. But I'm not sure I can see what the implications would be.
Oregon charges a fee to have your number on the no-call list. I've always found this outrageous, since "not being harassed" isn't my idea of a service I should be charged for. But it's been this way since at least 1996.
OK, so it actually prints "hello polyglots", but still, all it does is output a fixed string. Quelin's program does an actual computation, admittedly a relatively minor one.
Moreover, all of polyglot's languages are languages that people have actually used to write real code. I'm not saying they are all reasonable languages, but one can at least semi-plausibly claim that they were written to be useful. Befunge and BrainF*ck are both toy languages written expressly to be perverse in some way (Befunge to be uncompilable, and BrainF*ck to be absurdly minimalist.)
That said, I was disappointed at how separate the languages' code blocks were in Quelin's program. C and Perl share most of the same code, but there are three completely separate code blocks and the work is mostly in getting each language to ignore the others' code. It's probably the only way it can be done, but it's really a short quadrilingual wrapper around three separate programs, one of which is bilingual.
The lights could have been reprogrammed to show everybody green lights all the time. Or to act almost normally, but to occasionally turn more lights green than there should be.
Or maybe that wouldn't have been possible -- it's not clear from the article whether the computers controlled the signals in detail or just sent sync signals to otherwise autonomous lights.
The article's hypothetical drive is 120TB -- 400 times as much space as you complain about filling up. I know video editing takes a lot of space, but would you keep 7 years of video? A movie a day for your whole life? Or 30,000 copies of the one you're working on? I doubt it.
I used to think that my files would expand to fill all available space, but not anymore. Different tasks, different tools, and different personalities mean different thresholds, but I think everyone has a threshold above which they won't keep their disks full. For me, my disks stopped being full around 10 gigs. My wife's antique PC (running MS-DOS 2.0 and used strictly as a text editor) had a 15 meg drive and never went above 10% full. Obviously your threshold is higher, but I'm sure it exists.
Even after they hit their thresholds, most people's use will grow over time, but slowly. We'll also start writing things in ways that don't try to conserve disk space. Compression will be used almost exclusively for data transmission. Future filesystems will probably keep every version of every file ever saved. (Hopefully with an option to delete the occasional residue of an indiscretion or accidental copy of /dev/hda.) But even these things won't increase our use by more than a factor of 10 or so. If we really do get 120TB drives, we won't talk about buying new ones very often.
The only encryption that is unbreakable is one that cannot be decrypted. You can just do a brute force attack, applying every type of decryption technique with every key to the data until it is decrypted.
In principle it is possible to do a brute force attack like that, and it will produce a correct decryption. But it will also produce many incorrect outputs, and it will give you no information about which output is the correct one. So you still won't know the plaintext for the message.
If you restrict yourself to trying one-time pads, the output will be every string of bits with the same length as the input. I think it's fair to say that by generating this list you haven't decrypted the message, since you could generate the same list of candidate outputs by exhaustive enumeration, without referring to the ciphertext at all.
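The one-time pad point is easy to check for yourself. In this small sketch (the messages and function names are mine, purely for illustration), for any candidate plaintext of the right length there is a key that "decrypts" the ciphertext to exactly that candidate, so trying every key tells you nothing.

```python
import secrets

def xor_bytes(a, b):
    """XOR two equal-length byte strings (one-time pad encrypt or decrypt)."""
    if len(a) != len(b):
        raise ValueError("inputs must have the same length")
    return bytes(x ^ y for x, y in zip(a, b))

def key_that_yields(ciphertext, candidate):
    """The pad under which ciphertext decrypts to candidate."""
    return xor_bytes(ciphertext, candidate)

real_plaintext = b"ATTACK AT DAWN"
real_key = secrets.token_bytes(len(real_plaintext))
ciphertext = xor_bytes(real_plaintext, real_key)

# A brute-force search over keys will hit this key too, and nothing
# distinguishes it from the real one.
decoy = b"RETREAT AT TEN"
decoy_key = key_that_yields(ciphertext, decoy)
print(xor_bytes(ciphertext, real_key))
print(xor_bytes(ciphertext, decoy_key))
```

Both outputs are equally valid decryptions as far as an attacker without the key can tell.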
algorithm example: long division (on "Deep Algorithms?")
An algorithm is a process for doing something. Perhaps the best example for non-programmers is long division, the way you learned to divide multi-digit numbers with pencil and paper. The basic mathematical definition of division is that it's the inverse of multiplication, which is a fine definition of an operation but doesn't tell you how to divide two numbers. Long division is a procedure which gives you the answer. Other procedures are possible.
This may sound like I'm describing a program, or a piece of code. A piece of code can implement an algorithm, but many different pieces of code can implement the same algorithm. An algorithm has a specific mathematical context in which it works -- e.g. the divisor is not zero. The piece of code is specific to a computational context -- written in C, divisor is in variable x, dividend is in variable y, quotient ends up in z, all of which are single precision floating point.
What's a deep algorithm? That's subjective, but I'd say it has to either be non-obvious or become obvious only after you learn a nontrivial piece of theory. There's probably an aesthetic component as well.
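To make the algorithm-versus-code distinction concrete, here is one possible piece of code for the pencil-and-paper procedure. It is only one of many implementations of the same algorithm, it is restricted to non-negative integer dividends and positive integer divisors, and the function name is my own.

```python
def long_division(dividend, divisor):
    """Schoolbook long division; returns (quotient, remainder)."""
    if divisor <= 0 or dividend < 0:
        raise ValueError("need dividend >= 0 and divisor > 0")
    quotient_digits = []
    remainder = 0
    for digit in str(dividend):
        # "Bring down" the next digit of the dividend.
        remainder = remainder * 10 + int(digit)
        # Count how many times the divisor fits; always a single digit 0-9.
        q = 0
        while remainder >= divisor:
            remainder -= divisor
            q += 1
        quotient_digits.append(str(q))
    return int("".join(quotient_digits)), remainder

print(long_division(7485, 6))
```

Python's built-in divmod gives the same answers by a different procedure, which is the point: same operation, different algorithms.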
Try dxpc, the differential X protocol compressor. It removes much of the redundancy from X protocol network traffic. The early versions I used years ago made the difference between "almost usable" and "not snappy, but fine".
I'm not saying the dxpc approach is optimal, or good design. I don't know much about X event streams, but it does seem like higher-level operations are appropriate. The thing that X gets right is the assumption that all transactions will be over a network.
Whether or not it protects people from anthrax, I say it's a good idea. We know that mail to elected officials isn't really read -- at most it's skimmed to determine which form letter to reply with. Refusing to accept it in the first place is more honest than what they've been doing, so I'm for it.
No publisher could afford to make the investment in printing something that was quickly copied by everybody else and sold for a fraction of the cost. The end result was an increase in the number of tabloid rags at the cost of real literature...
This is an excellent argument for copyright in the era of printing presses and manual typesetting. Then, publishers needed to recoup the large fixed cost of typesetting, and they couldn't do it on a small number of copies. Typesetting was the biggest bottleneck, and the whole industry was organized around it.
Today, for digital distribution, setup and distribution costs aren't much of an obstacle, and people can do it all for themselves. So what makes you think the comparison is valid?
That wasn't a genetically engineered cashew, mind you...it wasn't even a salted one! So your argument is, you might have to be wary of new foods...my reply is, you need to be wary now!
He wasn't talking about eating new foods. He was talking about old foods which have been changed.
Put yourself in your cousin's position -- you know you're allergic to cashews, but you eat wheat bread daily with no trouble. Then people start making varieties of wheat with genes from cashews which express the protein or whatever that you're allergic to.
If all the wheat on the market is these varieties, wheat just moved to the "don't eat" column. Yeah, it might only take one bad experience for you to learn. But you've been forced to make a major change in your diet.
Or say only 1%, or 0.1%, of the wheat has these genes. Now you either have to avoid wheat bread, even though almost all of it is OK, or you gamble every time you eat it. Unless, of course, the information is on the label, in which case it's only an inconvenience.
The flip side, though, is that we could probably make varieties of cashews without those key genes, and you could eat them. But the market probably isn't big enough to be worth it.
If you stretch a torus in the right direction, you get something that looks a lot like a section of pipe. I don't know how far you can stretch it and still have a practical bottle.
Closing the ends would be another problem.
Congratulations, you've just reinvented the Great Firewall of China.
According to Viant, a Boston-based market-research firm, 400,000 to 600,000 films are illegally downloaded from the Internet each day.
How many broadband users are there worldwide? I believe I've heard numbers around 10 million. Does the typical broadband user download a full-length movie every 3 weeks? Or are most movie downloads in a very-low-quality format that is plausible to download at 56K?
I suspect BS.
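Here's the back-of-the-envelope arithmetic behind my skepticism. The 10 million user count is my half-remembered figure from above, and the 700 MB movie size (one CD's worth) is an assumption of mine, not something from the article.

```python
def days_between_downloads(users, downloads_per_day):
    """Average days between downloads per user, if every download is by one of these users."""
    return users / downloads_per_day

def download_hours(size_bytes, bits_per_second):
    """Hours to transfer size_bytes at a sustained line rate, ignoring overhead."""
    return size_bytes * 8 / bits_per_second / 3600

broadband_users = 10_000_000          # assumed worldwide total
for per_day in (400_000, 600_000):    # Viant's claimed range
    days = days_between_downloads(broadband_users, per_day)
    print(f"{per_day} downloads/day -> one movie per user every {days:.1f} days")

movie_bytes = 700_000_000             # assumed: a CD-sized rip
print(f"56K modem: {download_hours(movie_bytes, 56_000):.1f} hours per movie")
```

So the claim needs every broadband user on the planet pulling down a movie every two to four weeks, or modem users tying up their lines for more than a day per film.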
Remember Transformers, and other cartoons of that era? They were essentially program-length commercials for toys.
From the FAQ:
How much does it cost to place my telephone number(s) on the "No Call" List?
It costs $6.50 for each telephone number per year. Annual renewals are $3.00 for each telephone number per year.
http://www.vigor.nu/dxpc/