Interesting work, but I notice that you are only examining trigrams, and you are using an even weight factor. To improve selection you probably at least need to use variable weights (a fuzzy logic neural network rather than binary logic) and train the network with more sample spam.
They aren't trying to answer the question "should this particular piece of e-mail be considered spam," but rather "is this particular piece of mail identical (to within some factor) to one that some human considers spam." So they don't need to train anything, they just store the hash-signatures of the spam that is currently making the rounds.
Even if someone mistakenly identifies a piece of mail as spam, it won't hurt anything; the odds are very low that it will ever match another piece of mail in the entire history of the cosmos.
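For anyone who wants to see how little machinery that takes, here is a rough sketch. It assumes "identical to within some factor" means nothing fancier than ignoring case and whitespace; the names `signature` and `SpamCatalog` are made up for illustration and are not taken from the project being discussed.

```python
import hashlib
import re


def signature(message: str) -> str:
    """Hash a normalized copy of the message so trivial edits still collide."""
    # Assumption: collapsing whitespace and lowercasing stands in for the
    # "to within some factor" part; a real system would be fuzzier.
    normalized = re.sub(r"\s+", " ", message).strip().lower()
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


class SpamCatalog:
    """Stores signatures of mail that some human has reported as spam."""

    def __init__(self):
        self._known = set()

    def report(self, message: str) -> None:
        self._known.add(signature(message))

    def is_known_spam(self, message: str) -> bool:
        # No training and no weights: just a set lookup.
        return signature(message) in self._known


catalog = SpamCatalog()
catalog.report("BUY   NOW!!!\nCheap pills")
print(catalog.is_known_spam("buy now!!! cheap pills"))  # True
print(catalog.is_known_spam("Lunch on Friday?"))        # False
```

A mistaken report only adds one more hash to the set, which is why it is harmless unless the same text shows up again.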
I don't see why everyone thinks that devices that act like parasites would be cool to wear. If they are driven by the wind going by as you run, they will make it harder to run. If they are powered by the heat of your body, they will make it harder for you body to regulate its temperature (unless you live somewhere real cold). If they are powered by motion, they will make it harder to move. In short, you will have to work harder when you wear them, just as if they had a hand cranked generator.
I fail to see why this doesn't sound like a royal pain in the end.
The language biosystem is overpopulated, and mindshare starvation can be fatal to a new tool.
I couldn't disagree more. Mindshare may be needed for products but too much mindshare can kill ideas faster than anything. Memes that "everybody knows" are essentially dead. The language biosphere only seems overpopulated if you're used to living in the BASIC* & C* wasteland we had a few years back.
To bring an academic success to commercial fruition requires one, as Olin Shivers puts it, "to become Larry Wall for a year" - to take care of all the gritty implementation details...
First John Malkovich, then Andrew Plotkin, now this. Aren't things getting a little out of hand?
If you want to experience the Turing tarpit (where anything is possible, but nothing is easy enough to actually do) firsthand, try the Brainfuck [muppetlabs.com] language, based closely on the Turing machine. The language has 8 instructions, and only one of them (input) has any arguments beyond an implicit current location. The compiler is 240 bytes!
There's an x86 compiler for bf at ~170 bytes, but isn't the smallest bf compiler written in bf well over a gigabyte?
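To give a feel for how small the language is, here is a sketch of an interpreter for the eight instructions. This is not the 240-byte compiler mentioned above. The function name `run_bf`, the 30000-cell tape, 8-bit wrapping cells, and "end of input reads as 0" are all assumptions chosen for illustration, and the program is assumed to have balanced brackets.

```python
def run_bf(program: str, input_text: str = "") -> str:
    """Interpret a Brainfuck program and return whatever it prints."""
    # Pre-compute matching bracket positions (assumes brackets are balanced).
    stack, jumps = [], {}
    for pos, ch in enumerate(program):
        if ch == "[":
            stack.append(pos)
        elif ch == "]":
            start = stack.pop()
            jumps[start], jumps[pos] = pos, start

    tape = [0] * 30000          # assumed tape size
    ptr = pc = 0                # data pointer and program counter
    inputs = iter(input_text)
    out = []

    while pc < len(program):
        ch = program[pc]
        if ch == ">":
            ptr += 1
        elif ch == "<":
            ptr -= 1
        elif ch == "+":
            tape[ptr] = (tape[ptr] + 1) % 256
        elif ch == "-":
            tape[ptr] = (tape[ptr] - 1) % 256
        elif ch == ".":
            out.append(chr(tape[ptr]))
        elif ch == ",":
            # Assumed convention: reading past the end of input stores 0.
            tape[ptr] = ord(next(inputs, "\0")) % 256
        elif ch == "[" and tape[ptr] == 0:
            pc = jumps[pc]      # skip the loop body
        elif ch == "]" and tape[ptr] != 0:
            pc = jumps[pc]      # go back to the loop start
        pc += 1                 # any other character is a comment
    return "".join(out)


# 8 * 8 = 64, plus 1, is ASCII 65.
print(run_bf("++++++++[>++++++++<-]>+."))  # A
```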
Bf is a lot of fun, but not light in the sense of perl or scheme. Since the article didn't define light language, I'll give it a shot. Looking at their choice of languages, light appears to mean: easy to program, easy to understand, but powerful, interpreted languages. Bf is none of the above. About all you can say for it is Turing-complete!
No, it also has the advantage that, if some poor sap goes to all the trouble to write a daemon in it, you get to smile and say "bfd!"
-- MarkusQ
P.S. It's also occasionally useful to drive a spike in "you can't do Y in language X" debates that have gotten out of hand.
Also, although it's probably been written 20 times by the time I write this, Asimov is often credited with inventing the term "robot".
If so, he is credited incorrectly. For the term "robot," try Čapek instead. Asimov is known for "the three laws of robotics" which, IIRC, were actually devised by Campbell.
There was never the slightest chance of small-scale startups having a go...So I declined to participate: I switched to the Phone Cooperative [phonecoop.org.uk]: I save big money off my BT bill, and all the profits go back into the co-op to make it better for us, the members.
Looks like you already found your small-scale startup, even though you still don't believe it exists. So I stand by my post; evolution isn't something you can opt out of, even if you're BT. Or MS.
So much for the magic wand of privatisation, curing all those horrible nationalised industries.
Not at all. Just give it time. The process is supposed to work something like this:
1) The big state-run monopolies go private
2) They are just as bad as they ever were, except now
3a) They can't just drain the treasury for more money and
3b) Startups can compete with them. This leads to
4) Small scale startups get a foot in the door
5) The old state-run idiots finally go under, and
6) The service they used to provide now comes from a real company.
The DMCA aims not only to protect companies who use crappy encryption
I think you may have hit upon a key step in fighting the DMCA: we need to point out that, stripped of all the falderal, it is intended to let manufacturers pass shoddy goods off on us poor consumers.
If only some brave defender of the consumer/voter/masses would come forward to defend us from these cads (say, leading up to an election)...I'll bet the press would love it.
Remember, lobbyists may give money, but they can be sold down the river in a heartbeat if someone comes along offering votes.
You're missing the idea that there are multiple levels of classification here... take a look at my example for the detail on deciding about a manufacturer or body-type as the base class.
You can have as many levels as you wish, and even store things that don't fall into nice "levels." Just use the OODB to store objects and then have members in the objects that refer to other objects (not enumerations) for your classifications. Thus you don't need an instance of SUV for each manufacturer that makes one...because SUV is a (single) object. In the same way, Datsun and Delorean etc. are all objects.
The class of a particular instance of make (say, Model-T or Bug) wouldn't be a manufacturer or a body-type, it would be the class make. And, like all instances of make, it would have (as members) both a manufacturer and a body-type.
The problem isn't with using an OODB, but with using an enumeration (or a collection of strings) when what you want is an object.
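Here is a sketch of what "members that refer to other objects" looks like, using plain Python objects in place of an OODB. The class names follow the comment above; the two make names are placeholders, not real models.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Manufacturer:
    name: str


@dataclass(frozen=True)
class BodyType:
    name: str


@dataclass(frozen=True)
class Make:
    name: str
    manufacturer: Manufacturer   # a reference to an object, not a string
    body_type: BodyType          # likewise, not an enumeration value


# One SUV object, shared by every make that has that body type.
suv = BodyType("SUV")
datsun = Manufacturer("Datsun")
delorean = Manufacturer("Delorean")

# Hypothetical makes, named only for the example.
make_a = Make("hypothetical-A", datsun, suv)
make_b = Make("hypothetical-B", delorean, suv)

print(make_a.body_type is make_b.body_type)  # True
```

Both makes point at the same `suv` object, so no per-manufacturer copy of SUV is ever needed.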
Because each car also has a body-type (compact, sedan, SUV, truck, van, etc...) - which in a relational database would simply be another lookup table, but in an OODBMS poses data management issues.
The OO way to answer this is that body-type is a class and compact, sedan, SUV, etc... are instances of it. Each car would have some instance of body-type as a member. I've implemented this sort of thing in a roll-your-own OODB (in Ruby) and in an OODB-on-SQL (in Delphi); in both cases it was painless. The only thing that is remotely tricky is to avoid infinite loops in your low-level serialization code, by doing lazy streaming or by having a serialization flag, or stack, etc. just in case some later person creates a body-type (e.g. batmobile) that somehow refers back to one or more instances of car.
Sorry you got modded down for being right. If I had real points I'd give you one, but at present the only mod points I have are these fake ones I mint for my own private mod system.
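The "serialization flag" idea can be sketched in a few lines. This is not the Ruby or Delphi code mentioned above, just an illustration in Python; `Node`, `serialize`, and the record layout are invented for the example.

```python
class Node:
    """A stand-in for any stored object: a kind, a name, and object references."""

    def __init__(self, kind, name):
        self.kind = kind
        self.name = name
        self.refs = {}  # member name -> another Node


def serialize(root):
    """Flatten an object graph into records, writing each object exactly once."""
    ids = {}       # id(obj) -> record number; doubles as the "already seen" flag
    records = []

    def visit(obj):
        key = id(obj)
        if key in ids:              # seen before: emit a reference, don't recurse
            return ids[key]
        ids[key] = len(records)     # flag the object BEFORE following its members
        record = {"kind": obj.kind, "name": obj.name, "refs": {}}
        records.append(record)
        for member, target in obj.refs.items():
            record["refs"][member] = visit(target)
        return ids[key]

    visit(root)
    return records


# A body-type that refers back to a car, which is the troublesome case.
batmobile = Node("body-type", "batmobile")
car = Node("car", "example car")
car.refs["body_type"] = batmobile
batmobile.refs["prototype"] = car   # the cycle

records = serialize(car)
print(len(records))                  # 2
print(records[1]["refs"])            # {'prototype': 0}
```

The important detail is flagging the object before descending into its members; flag it afterwards and the cycle recurses forever.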
Moderators, you might want to mark down messages like the parent that miss the point entirely.
Now back to the point, which is discussing how we can write scripts to be run on environments with bourne shell only when we're developing on linux, which doesn't have a fully compatible bourne shell.
I beg to differ. Before you recommend that someone be modded down, you might want to actually read the article. If you do so, it's clear that the_code_poet isn't developing on linux; he/she is wanting to run scripts developed under sh/*NX under bash/linux. Thus my point.
I think that there is some confusion here. The_code_poet clearly asked about compatibility going from sh to bash (i.e., he wants to write standard sh scripts and have them work on linux, and therefore bash). But the incompatibilities everyone is discussing (e.g. initialized export syntax) are all going the other way; things that you can write under bash that won't work under sh. So what?
The things that matter are first, the things sh has that bash does not (from the FAQ):
* uses variable SHACCT to do shell accounting
* includes `stop' builtin (bash can use alias stop='kill -s STOP')
* `newgrp' builtin
* turns on job control if called as `jsh'
* $TIMEOUT (like bash $TMOUT)
* `^' is a synonym for `|'
* new SVR4.2 sh builtins: mldmode, priv
...and the implementation differences:
* redirection to/from compound commands causes sh to create a subshell
* bash does not allow unbalanced quotes; sh silently inserts them at EOF
* bash does not mess with signal 11
* sh sets (euid, egid) to (uid, gid) if -p not supplied and uid < 100
* bash splits only the results of expansions on IFS, using POSIX.2 field splitting rules; sh splits all words on IFS
* sh does not allow MAILCHECK to be unset (?)
* sh does not allow traps on SIGALRM or SIGCHLD
* bash allows multiple option arguments when invoked (e.g. -x -v); sh allows only a single option argument (`sh -x -v' attempts to open a file named `-v', and, on SunOS 4.1.4, dumps core. On Solaris 2.4 and earlier versions, sh goes into an infinite loop.)
* sh exits a script if any builtin fails; bash exits only if one of the POSIX.2 `special' builtins fails
None of these seem to me to be show-stoppers if you are writing the script from scratch (or even porting a reasonably written one)--I mean really, are you counting on it to dump core if you use multiple option arguments? Is there some reason you can't balance your quotes? So my question to the_code_poet is, what exactly are you trying to do in sh that won't work in bash?
But precision isn't too important in the real world. You can solve any NP-complete problem in polynomial time, if you define success as being off the optimal solution by at most a factor of two.
I'm sure you can solve the halting problem for any application within 4 standard deviations without waiting too long.
While I agree with most of your post, I have to point out that coming within a factor of two is not very impressive for the halting problem. Since a given program on a given input will in fact either halt in finite time or it won't, your statement boils down to the assertion that it is possible to say either the word "true" or the word "false" (at random) in polynomial time. If you do this, you will either be right-within-a-factor-of-two (what most people would call wrong) or you will be exactly right. In the industry, this algorithm is called "guessing" and can be proven to be within a factor of two of correct on all binary choices.
Since an observatory is only looking at a small percentage of neutrinos on a relatively thin path between here and sun's core, I don't think you could establish any day/night difference, even with years of observations.
To quote a teacher I once had, I would agree with you if not for the fact that you're wrong. The day/night neutrino detection cycle isn't a theoretical effect they are trying to observe, it's an experimental effect they are trying to explain. The present best explanation is that they switch between two families (muon & tau) with different masses.
So the assertion (or hypothesis) is that the amount of neutrinos emitted from the sun's core is different during night than day??
No, the same number are emitted, but if they have to travel through the bulk of the earth before reaching the detector, it will affect how many you detect. That's true of photons too (you see a lot more of them during the day, even though the sun emits at a ~constant rate), but here it is even more interesting; the neutrinos aren't being absorbed by the earth, they are being converted between two forms, one of which is easier for a particular detector to detect. So you can wind up detecting more at night!
Your scenario is an excellent outcome. If it is anything like "global warming", you have scientists representing "special interests" presenting sensationalist and logically flawed ideas, and politicians ignoring the silliness and listening instead to better-informed scientists.
I cannot understand why so many people continue to be taken in by the global warming/cooling scam. My only supposition is that, faced with the realization that man has essentially no impact on the universe at large, it becomes vitally important for them to believe that at least they are having a major impact on the earth. The fact that we are still minor players in the biosphere (there are, for example, well over a metric ton of termites per human, and termites are minor players among the insects. The insects, in turn, are dwarfed by the plant kingdom,...) is evidently so scary to them that they simply can't accept it.
*laugh* Good thing it wasn't the Basterdino's super-sym partner (the Basterdon). Last I heard, it was suspected to mass about as much as a Mastodon (within a factor of Pi times some magic number).
"for you body" --> "for your body"
"real cold" --> "really cold"
And perhaps others. *sigh*
Typing with my eyes closed...
--MarkusQ
Thanks for catching it.
-- MarkusQ
Is this a result of the GPL?
*laugh* Cute. If I had real mod points I'd give you one.
-- MarkusQ
Not at all. Just give it time. The process is supposed to work something like this:
1) The big state-run monopolies go private
2) They are just as bad as they ever were, except now
3a) They can't just drain the treasury for more money and
3b) Startups can compete with them. This leads to
4) Small scale startups get a foot in the door
5) The old state-run idiots finally go under, and
6) The service they used to provide now comes from a real company.
I'd say you were somewhere around step 4.
-- MarkusQ
*laugh* As usual, I have no mod points.
-- MarkusQ