"A GA is only truly effective if you let it exhaustively search the search space"
I don't think that's true. What do you consider effective? I consider the algorithm to be effective if it finds a better solution, which this does.
Effective: Sure, any gain is welcome. That is correct. But for the GA to be effective at what it does, it needs to be able to search the search-space. If the GA starts with 20 predetermined tuning parameter sets and mates and mutates those, then all we have is a fancy hill-climber. In order to find novel Maxima in the search-space, the GA should not be constrained - it must have the ability to exhaustively search, even though it does not (of course) do such a thing.
The first reason is that what actually happens in the real world in one iteration may not be representative of usual behaviour. You don't want to optimise your whole system based upon measurements from an out-of-memory condition.
I am simply stating that running a kernel with random tuning parameters some percentage of the time (which is basically required in a non-simulation GA) will produce ugly results.
The second reason is that there is no guarantee that any given chromosome will give acceptable behaviour - in fact it's virtually certain that some will be completely broken. The idea behind genetic algorithms is that the population as a whole produces some improvements from generation to generation, not that each individual is an improvement.
Actually, the point behind a genetic algorithm is to adapt without human intervention, to evolve solutions that are "good enough", often novel and difficult to create in other ways. You DO NOT WANT to make use of your weak solutions, however. Populations should be diverse and that in no way guarantees you high population fitness. In fact, you will get better results in a GA if you always generate some random solutions. The average fitness over all solutions in a population, in a case like this (when taking random + highly evolved solutions and giving them time-slices) is likely to produce worse system performance and at best will give only modest gains with sporadic somewhat unpredictable performance.
If hill-climbing worked, then people wouldn't bother with the more complex genetic algorithms, would they? Hill-climbing algorithms tend to get stuck at local maxima rather than find the optimal solution.
GA finds the hill. Hill climbing gets to the top of it. Mutation in GAs (if written correctly) often perform some amount of hill-climbing. My point was not to eliminate GAs. My point in the original post was to use them correctly and sparingly and let Hill-climbing do some of the work. You want to minimize random behavior in the kernel as well as tuning overhead.
But I think GA is an excellent idea for tuning the kernel.
If you have the resources to exhastively search the space... you don't need a GA.
Ahh, semantics. "Allow the GA to exhaustively search vs. The GA searches exhaustively"
Of course the GA will not do an exhaustive search, but it must have the ability to do so (e.g. explore all the really bad solutions as well as the good ones - the solution space should not be artificially constrained by eliminating random solutions from testing).
It is pretty obvious that I was trying to say this if you actually look at my post: "First thing: A GA is only truly effective if you let it exhaustively search the search space - which is why GAs are run against simulations rather than in operational systems. Imagine trying to tune a kernel at runtime by occassionally switching to random tuning parameters. I think this is extremely non-optimal."
The point is simple: If you free the GA to do what it is supposed to do, you will often be running random (very suboptimal) tuning parameters which will wash out the performance gains that are achieved.
I guess: Listen to what I mean, not what I say;-)
And in finding this solution, which is "grown", not engineered, it's much easier for unintended wierdnesses to creep in. A GA might solve problem X by ruining performance on problem Y, something that you, as a software engineer, would never even consider viable, and hence you forgot to force the fitness function to check problem Y along with problem X.
Of course, this can be true. That is why debugging the fitness test is the most important activity in implementing a GA. We agree.
Moment preezz, I think the both of you are missing something. I have had some experience writing commercial software that collects performance stats for things like capacity planning, tuning, etc. As you can imagine it would be bad press for an application like that to be a performance hog, but (in my experience) when used to "collect all" the machine will take ~7% performance hit (many competitors were worse for less data).
Actually, that is precisely what I am talking about. The fitness test that produces fitness values is responsible for data collection and analysis. It is precisely that activity that I am stating must be minimized in order for adaptive tuning to be useful.
That being said, I do believe there is promise to this technique. The solution must minimize it's data-collection and analysis. Perhaps the other poster's assertion that "critical events" would initiate further tuning is along the right track.
Virtual machines (such as Sun's HotSpot for Java) have shown us that run-time tuning and optimization is a powerful technique. Making use of it in the Linux Kernel is a quite good idea. Doing it correctly and minimally is the challenge.
That solution would mean only tuning the performance for whatever the system happens to be doing during that 1 minute. Perhaps a better solution would be to keep track of the current performance (via the fitness function) and begin genetic tuning only when it falls below a certain benchmark (ie: the average of the last 50 fitness function evaluations).
Yes, you are right about that. I suppose I am imagining that this would be most useful in a data-center rather than the desktop. The desktop scenario is so varied as to pose significant problems for the GA. However, let's examine the general case a bit (in a sec).
Your suggestion of flipping the GA back on when current fitness drops below a critical threshold has some inherent assumptions that need to be examined. First, it assumes that we are continually calculating the fitness of the currently running parameter set - an operation that degrades performance. Second, it assumes that we have a fitness test that is temporally monotonically consistent (e.g. over time, the fitness score has the same meaning in an absolute rather than relative sense). Third, it assumes that the fitness score will drop below a critical value (what if it doesn't?). All of these assumptions are suspect, I believe.
So, for example, you're running a game server for a couple hours so everyone on your LAN can play some UT2k4 or whatever. It tunes your settings to optimize these conditions. Then it stops and just observes, so as to maximize the benefits; "The performance penalty for searching the space will overwhelm the performance gains from ocassionally running with highly tuned solutions."
But it is still calculating fitness, which is most likely to be the most expensive part of running the GA.
Next, you quit the game and start copying some files to your laptop, a totally different usage pattern. The fitness function returns less-than-running-mean numbers so the GA kicks on and starts testing new settings.
Here is where the question of temporal monotonicity must be examined. Under completely different usage patterns, are the fitness values meaningful when compared to the previous set? Or are they only meaningful when compared to other fitness values calculated under the same load? Coming up with a fitness test that has the characteristic of temporal monotonicity would be very challenging, I believe.
It is just as likely that while other "tuning solutions" are now better under the new load, the fitness value that is calculated under the new load is even better than it was when the UT server was running (when it was the best solution). You see the problem?
Within seconds, everything is tuned up and it stops again, leaving the settings in place and just observing passively.
Observing actively, since it is still calculating fitness. But I see where you were trying to go with this.
Here is a different solution that may be more effective:
Allocate to every solution (tuning parameter) in the current population a statistically relevent number of timeslices to run, based on it's fitness. So the most fit solution is allocated the highest number of timeslices and the least fit solution is allocated the lowest number of timeslices. The statistical curve for this distribution may also be tuned using the same GA.
For each timeslice, randomly (or following some predetermined pattern) select a tuning set (solution) according to our statistical distribution. Calculate performance metrics.
Compute next generation and repeat.
One variation would be to only compute performance metrics the first time a solution is run in a generation - it would be less accurate but also lower overhead.
The point of this (or my previous suggestion of 1 min / hour, or anything similar) would be to compare apples to apples, since I don't believe we can assume temporal monotonicity.
Moving the genetic algorithm processing to another machine may be warranted. If you had a good idea of what you were going to be doing (heavy database work for instance), a dedicated machine could be used to find an optimal scheduling solution and then that could be implemented on the production machine.
Ahh.. interesting idea..
If I am running in a cluster environment, I could dedicate one or more machines in the cluster to evolve tuning parameters. That machine could publish "discoveries" to the other machines in the cluster as they come up. Of course, the dedicated machine(s) that are using GA in the kernel would be handling a standard workload just like all the other machines in the cluster, but the penalities for searching the searchspace would be restricted to the machines that are tuning.
Generally, such clusters exist because they are performance bound in some way. So the benefits that come out of adaptive tuning for such a cluster would be quite welcome.
Now if you have killed off the GA and are using a straight forward hill climber (a la Simplex) you won't find the optimal solution. Either you need the GA to continue or your need a really good classical minimizer to move from your pre-defined seed points.
I agree. However, we have a problem: Running a GA on an operational system will (MUST) test more highly sub-optimal solutions than it will near-optimal solutions. The performance penalty for searching the space will overwhelm the performance gains from ocassionally running with highly tuned solutions.
This may be solved, perhaps, by only rarely searching - allow the system to run the most fit solution in acquiesence most of the time (perhaps only letting the GA run for 1 minute every hour or something similar). Combine that with a highly conservative mutator (that resembles a hill climber) and a modestly conservative recombination algorithm for solution mating and perhaps this solution could have legs.
As I point out in another post, we still have the problem of performance metric acquisition. It is the applications that are running that we need to optimize performance for and those metrics are not available to the kernel. It would be interesting to standardize some form of adaptive performance metric publishing for applications to get that data to the kernel.
The main problem with this or any other adaptive tuning mechanism is actually acquiring performance metrics.
What is the system using to decide if a new parameter set is better than a previous? What is the fitness test?
Some applications are disk-bound, others are CPU-bound, others are network bound. The performance dance is often non-obvious (e.g. some applications will achieve generally higher performance by allowing I/O higher priority than context switching, while others that appear to perform in a similar manner will achive higher performance by reversing that order).
The kernel does not have any mechanism to determine if a particular application is performing better or worse, it can only really get a guage of throughput and load. While this MAY be enough to get small systemwide performance gains, in order to really acheive significant application-specific performance gains, I think that applications would need to explicitly add support for adaptive tuning by logging relevant performance metrics for the kernel to tune around.
Now your talking. Adaptive tuning is definitely the future. While I disagree that a general purpose GA is useful here (you should not let a GA hit an operational system, you need to let it hit a simulation first to build up a certain amount of fitness in it's solution space), many adaptive techniques would be useful and could eliminate the need to hand tune in many environments.
First thing: A GA is only truly effective if you let it exhaustively search the search space - which is why GAs are run against simulations rather than in operational systems. Imagine trying to tune a kernel at runtime by occassionally switching to random tuning parameters. I think this is extremely non-optimal. Of course, if most of the heavy lifting is done before-hand and the GA is simply examining pre-defined parameter sets on your machine, it could work. But it's not really much of a GA anymore.
As an alternative, perhaps using some form of pseduo-GA that tries to find pre-tuned parameters that most closely match your operating environment and then letting a Hill-Climbing algorithm hit it would be a better solution.
Hill climbing can also be used in a GA type manner by letting the GA determine witch parameters to climb and in what order. The climbing itself is pretty straightforward, allow vectors to interact with individual parameters. If the result is worse, reverse the vectors or switch to new parameters. Rinse, repeat.
Yes, GA can produce odd bugs and potholes. Yes, it is the fitness test that determines if that will be true. But a good GA will generally find solutions that are as good or better than hand tuning for search spaces that are very complex. Overall, this is a good idea but is probably more complex than advertised.
took that comment as Steve Ballmer saying more digital music is pirated then not. Does everyone on this board actually disagree with that?
This isn't about music piracy. This is about Ballmer taking a shot at Apple because they have a product which is user focused, whereas MS have a product which is RIAA focused.
God, who cares? Ballmer is making a tactical error. The people who listen to him are thinking, "really? I can put music for free on an iPod? I gotta get me one of those."
Even the RIAA schmucks have pirated music on their iPods... Instinctively, we all know that information is free.
iTunes sells songs because it is convenient - it sells convenience, not songs (should I spend $1 for that track or go hunt for a decent copy of it... hell, I'll just spend the buck.) Ballmer just doesn't get it.
The article is actually about chips that IBM allows sensitive information to be securely stored into and only retrieved by applications that are digitally signed or somesuch.
So your credit-card number would never be on your hard-drive or in the memory of an application that was not approved to fetch it.
We may not like trusted environments, but the alternative (of the infinitely hackable, trojanable, virus infested sesspool that we have now) is not very appealing.
It does not mean the systems will not be hackable - there will still be exploits in signed software, but they will be much more rare. Presumably the hardware will refuse to run blocks of code that were not signed. But sign a language interpreter (VM) and you don't have to sign the code that runs in it, etc...
This will give us access to content and economics that have never been made available to us. Being able to spend (and profit!) will become easier and the stuff we can get access to will be higher quality as a result.
The real problem is not that you won't be able to burn that cd, but that you may have to have Microsoft approve your product before it will be signed. That is kinda scary...
If this is done right, there will be trusted "virtual computers" that can run alongside (e.g. as another user) from untrusted "virtual computers". Running trusted software in untrusted environments might result in a lack of features or content, but there is no reason that the software would have to stop running.
If this is done right (and it probably won't be), consumers will have the right to use their money to make decisions on how much sandboxing is acceptable and how much is not.
1) You are redefining what a "vote" means from the existence of the assertion of confidence to the presence and value of a variable (in this case binary 0=no 1=yes). That is not so subtle a change as you might suspect. If we do this, we may as well make it multivalued logic (at least 0=no 1/2=undecided 1=yes)... But please understand that the variable that you are implicitly positing is not the same as the traditional definition of a vote.
2) You are making the assumption that just because a ballot can be verified (e.g. all 10 candidates have a variable value assigned to them) that security of the counting process can be ensured.
My point is simple: There is more OPPORTUNITY for manipulation in the proposed system than exists in the existing system, not just at the ballot stage but more importantly, at the various data aggregation and analysis stages. For example, it is very simple in the one voter, one vote system to trace data and do statistical analysis on the results. In the proposed system, manipulation of the variables would be much more difficult to detect by analyzing the data that is produced by an election.
I also agree that if the geeks on slashdot don't get it, the average American has no hope (yes, the average American is much more clueless than the average slashdotter, regardless of the funny jokes we like to tell to each other about our own ignorance...)
Don't dismiss what I am saying just becuase you have an instinct that this could eliminate the hated 2 party mess that we have in America and generate world peace or whatever. Opportunities for exploitation by corrupt power-mongers are rarely overlooked, as our present situation so rudely is ensuring us.
You mark who you want, and who you don't want. How hard is that?
An awful lot of room for funny business when 200 million voters could produce 350 million votes... Err, sorry, 389 million. RECOUNT! Oops 410 million...
I think the idea is to have a yes and no bubble after every name. I don't know what it means to skip a name. Permision for poll officer to vote for you?
That does not address the problem. It only ensures that a fair re-count has the opportunity to be fair. However, when all the numbers are added up, you still have an arbitrary number that has nothing to do with the number of voters and therefore lacks a general credibility. The modification to this scheme proposed as economy voting in a previous post is an example of how to make approval voting add up right. Give the voter 5 votes that he must spend on the candidates in any way they like. Something like that.
If you must spend all votes, then the system becomes non-repudiable, which as I mentioned in another post, is a very serious problem with approval voting.
In straight approval voting, what stops the guys that take your ballot from marking their candidate of choice on your ballot?
The biggest problem that I can see with systems such as approval voting is that it is not non-repudiable. In other words, it would be impossible to verify that election results were not changed. A recount would not be able to detect changes made after a voter made his/her marks.
With a one voter, one vote system, it is easy to count the number of voters and the number of votes and ensure that the results were not modified.
I believe that this is a pretty important characteristic and I am a bit skeptical about who is pushing approval voting.
I'm posting this anonymously for reasons that will soon be obvious.
And what is it that is obvious?
New startup that specializes in products
Our legal advisors felt that it might be wise for us to settle
but we decided to press them further for details of the infringement and refused to yield
we feel that this is a scary precedent
Have any other entrepreneurs around here experienced anything similar?
to go after unwitting small startups who cannot fight back?
Personally, this was a nightmare for our company
we are not aware of any patents that we may be violating.
This reads like it was written by a psychologist to scare us out of starting an open-source company (note the use of language to make as many people identify with the text as possible combined with the implicit transfer of fear). That is what is obvious.
Purchase Equifax Identity Theft Protection. Not only will they notify you by email any time your credit report changes (e.g. new credit being taken out, etc), but they will insure you in case something happens.
I recommend that everyone does this these days. Your information is out there and easily collected by those that want it. Your information IS NOT safe.
Correct. You need to realize that "using fuel" is a synonym for "increasing entropy". Until you do, you don't understand entropy.
I think there is something missing from this assertion of equivilance. But I am not going to argue this particular point. The original point, that in the presence of life, entropy decreases, is the more important one.
It ultimately boils down to semantics and constrained theoretical models. If we discard the old fashioned "order" definition of entropy and instead examine entropy purely from the point of view of the movement and distribution of energy and heat then I believe you are correct. Entropy rules.
The fundamental problem in the argument that we are having is the assumption that running out of fuel leaves the system with more entropy.
If we assume that the universe was at it's most ordered state at the point in time prior to the big bang (e.g. no fuel?) then we have a contradiction...
Not a bad definition of life, in my opinion. But always remember life can do this because life is never a closed system. The second law always wins in the end.
Ahh, so perhaps life timeshifts entropy? You have anti-entropic effects over a period of time that encompasses the existance of the life (which is never infinite, right?)...
By the way, I am not really sure that you could say that going from hydrogen to helium is much of a gain in entropy, but if you like, we could start with hydrogen and anti-hydrogen and use the antimatter collision to create energy. Over any fixed period of time, the order of the system will be increasing due to the organizing effects of life (and it's proliferation and evolution). The antimatter generation of power is an interesting one to look at thermodynamically... Don't you think?
But absolutely, this closed system is antientropic. Until the fuel runs out and everything dies... And even then, it could be argued that the order within the closed system is higher than it was when the whole expirement began.
If you consider the universe as a closed system and further assume that we have neither a big crunch or enless expansion in the distant future (the equilibrium state), and assume through the proliferation of life in the galaxy, we are approaching the "omega point", then I think you could argue that the presence of life makes the entire universe anti-entropic.
Evolution: For this refute, we will use as our basis for study genetic algorithms and/or genetic programming to evolve artificial life.
It must be observable
In every conceivable manner, the evolution of species in the simulation is observable. From the genetic code of individuals in any generation to the clear increase in fitness of populations over time.
It must be testable
The simulation itself is the test. Running the simulation satisfies this requirement.
It must be repeatable
/home/george/project-> evolve -generations 300 > results /home/george/project-> evolve -generations 300 > results
Yep, very repeatable. But evolution is an interesting bugger - it makes different stuff each time. But it sure as hell does make it!
If you want to speculate on creationism in any way that makes any sense at all (i.e. something you can sink your teath into), then you first must define god in some reasonable way.
There is a way to do this that most scientifically minded people can sink their teeth into. All you have to do is start with, "Any technological civilization that is significantly more evolved than ours (especially during biblical times) would be [as a whole] indistinguishable from god."
If you start that way, we may have some traction. Supernatural, non-measurable forces or beings that are not bound by and don't obey the laws of the universe are not fit for speculation.
If the earth's surface was on the inside of a hollow sphere (i.e. we are creating a closed system) and power was provided by fusion of hydrogen (insert your desired sci-fi about how this works here) and life emerged and evolved within this sphere, does your assertion of local order at the expense of global disorder remain true?
Personally, I agree completely that life is anti-entropic. In the presence of life, the disordered becomes ordered.
"A GA is only truly effective if you let it exhaustively search the search space"
I don't think that's true. What do you consider effective? I consider the algorithm to be effective if it finds a better solution, which this does.
Effective: Sure, any gain is welcome. That is correct. But for the GA to be effective at what it does, it needs to be able to search the search-space. If the GA starts with 20 predetermined tuning parameter sets and mates and mutates those, then all we have is a fancy hill-climber. In order to find novel Maxima in the search-space, the GA should not be constrained - it must have the ability to exhaustively search, even though it does not (of course) do such a thing.
The first reason is that what actually happens in the real world in one iteration may not be representative of usual behaviour. You don't want to optimise your whole system based upon measurements from an out-of-memory condition.
I am simply stating that running a kernel with random tuning parameters some percentage of the time (which is basically required in a non-simulation GA) will produce ugly results.
The second reason is that there is no guarantee that any given chromosome will give acceptable behaviour - in fact it's virtually certain that some will be completely broken. The idea behind genetic algorithms is that the population as a whole produces some improvements from generation to generation, not that each individual is an improvement.
Actually, the point behind a genetic algorithm is to adapt without human intervention, to evolve solutions that are "good enough", often novel and difficult to create in other ways. You DO NOT WANT to make use of your weak solutions, however. Populations should be diverse and that in no way guarantees you high population fitness. In fact, you will get better results in a GA if you always generate some random solutions. The average fitness over all solutions in a population, in a case like this (when taking random + highly evolved solutions and giving them time-slices) is likely to produce worse system performance and at best will give only modest gains with sporadic somewhat unpredictable performance.
If hill-climbing worked, then people wouldn't bother with the more complex genetic algorithms, would they? Hill-climbing algorithms tend to get stuck at local maxima rather than find the optimal solution.
GA finds the hill. Hill climbing gets to the top of it. Mutation in GAs (if written correctly) often perform some amount of hill-climbing. My point was not to eliminate GAs. My point in the original post was to use them correctly and sparingly and let Hill-climbing do some of the work. You want to minimize random behavior in the kernel as well as tuning overhead.
But I think GA is an excellent idea for tuning the kernel.
If you have the resources to exhastively search the space... you don't need a GA.
;-)
Ahh, semantics. "Allow the GA to exhaustively search vs. The GA searches exhaustively"
Of course the GA will not do an exhaustive search, but it must have the ability to do so (e.g. explore all the really bad solutions as well as the good ones - the solution space should not be artificially constrained by eliminating random solutions from testing).
It is pretty obvious that I was trying to say this if you actually look at my post: "First thing: A GA is only truly effective if you let it exhaustively search the search space - which is why GAs are run against simulations rather than in operational systems. Imagine trying to tune a kernel at runtime by occassionally switching to random tuning parameters. I think this is extremely non-optimal."
The point is simple: If you free the GA to do what it is supposed to do, you will often be running random (very suboptimal) tuning parameters which will wash out the performance gains that are achieved.
I guess: Listen to what I mean, not what I say
And in finding this solution, which is "grown", not engineered, it's much easier for unintended wierdnesses to creep in. A GA might solve problem X by ruining performance on problem Y, something that you, as a software engineer, would never even consider viable, and hence you forgot to force the fitness function to check problem Y along with problem X.
Of course, this can be true. That is why debugging the fitness test is the most important activity in implementing a GA. We agree.
Moment preezz, I think the both of you are missing something. I have had some experience writing commercial software that collects performance stats for things like capacity planning, tuning, etc. As you can imagine it would be bad press for an application like that to be a performance hog, but (in my experience) when used to "collect all" the machine will take ~7% performance hit (many competitors were worse for less data).
Actually, that is precisely what I am talking about. The fitness test that produces fitness values is responsible for data collection and analysis. It is precisely that activity that I am stating must be minimized in order for adaptive tuning to be useful.
That being said, I do believe there is promise to this technique. The solution must minimize it's data-collection and analysis. Perhaps the other poster's assertion that "critical events" would initiate further tuning is along the right track.
Virtual machines (such as Sun's HotSpot for Java) have shown us that run-time tuning and optimization is a powerful technique. Making use of it in the Linux Kernel is a quite good idea. Doing it correctly and minimally is the challenge.
IMHO
That solution would mean only tuning the performance for whatever the system happens to be doing during that 1 minute. Perhaps a better solution would be to keep track of the current performance (via the fitness function) and begin genetic tuning only when it falls below a certain benchmark (ie: the average of the last 50 fitness function evaluations).
Yes, you are right about that. I suppose I am imagining that this would be most useful in a data-center rather than the desktop. The desktop scenario is so varied as to pose significant problems for the GA. However, let's examine the general case a bit (in a sec).
Your suggestion of flipping the GA back on when current fitness drops below a critical threshold has some inherent assumptions that need to be examined. First, it assumes that we are continually calculating the fitness of the currently running parameter set - an operation that degrades performance. Second, it assumes that we have a fitness test that is temporally monotonically consistent (e.g. over time, the fitness score has the same meaning in an absolute rather than relative sense). Third, it assumes that the fitness score will drop below a critical value (what if it doesn't?). All of these assumptions are suspect, I believe.
So, for example, you're running a game server for a couple hours so everyone on your LAN can play some UT2k4 or whatever. It tunes your settings to optimize these conditions. Then it stops and just observes, so as to maximize the benefits; "The performance penalty for searching the space will overwhelm the performance gains from ocassionally running with highly tuned solutions."
But it is still calculating fitness, which is most likely to be the most expensive part of running the GA.
Next, you quit the game and start copying some files to your laptop, a totally different usage pattern. The fitness function returns less-than-running-mean numbers so the GA kicks on and starts testing new settings.
Here is where the question of temporal monotonicity must be examined. Under completely different usage patterns, are the fitness values meaningful when compared to the previous set? Or are they only meaningful when compared to other fitness values calculated under the same load? Coming up with a fitness test that has the characteristic of temporal monotonicity would be very challenging, I believe.
It is just as likely that while other "tuning solutions" are now better under the new load, the fitness value that is calculated under the new load is even better than it was when the UT server was running (when it was the best solution). You see the problem?
Within seconds, everything is tuned up and it stops again, leaving the settings in place and just observing passively.
Observing actively, since it is still calculating fitness. But I see where you were trying to go with this.
Here is a different solution that may be more effective:
Allocate to every solution (tuning parameter) in the current population a statistically relevent number of timeslices to run, based on it's fitness. So the most fit solution is allocated the highest number of timeslices and the least fit solution is allocated the lowest number of timeslices. The statistical curve for this distribution may also be tuned using the same GA.
For each timeslice, randomly (or following some predetermined pattern) select a tuning set (solution) according to our statistical distribution. Calculate performance metrics.
Compute next generation and repeat.
One variation would be to only compute performance metrics the first time a solution is run in a generation - it would be less accurate but also lower overhead.
The point of this (or my previous suggestion of 1 min / hour, or anything similar) would be to compare apples to apples, since I don't believe we can assume temporal monotonicity.
Thoughts?
Moving the genetic algorithm processing to another machine may be warranted. If you had a good idea of what you were going to be doing (heavy database work for instance), a dedicated machine could be used to find an optimal scheduling solution and then that could be implemented on the production machine.
Ahh.. interesting idea..
If I am running in a cluster environment, I could dedicate one or more machines in the cluster to evolve tuning parameters. That machine could publish "discoveries" to the other machines in the cluster as they come up. Of course, the dedicated machine(s) that are using GA in the kernel would be handling a standard workload just like all the other machines in the cluster, but the penalities for searching the searchspace would be restricted to the machines that are tuning.
Generally, such clusters exist because they are performance bound in some way. So the benefits that come out of adaptive tuning for such a cluster would be quite welcome.
It's a damned good idea.
Now if you have killed off the GA and are using a straight forward hill climber (a la Simplex) you won't find the optimal solution. Either you need the GA to continue or your need a really good classical minimizer to move from your pre-defined seed points.
I agree. However, we have a problem: Running a GA on an operational system will (MUST) test more highly sub-optimal solutions than it will near-optimal solutions. The performance penalty for searching the space will overwhelm the performance gains from ocassionally running with highly tuned solutions.
This may be solved, perhaps, by only rarely searching - allow the system to run the most fit solution in acquiesence most of the time (perhaps only letting the GA run for 1 minute every hour or something similar). Combine that with a highly conservative mutator (that resembles a hill climber) and a modestly conservative recombination algorithm for solution mating and perhaps this solution could have legs.
As I point out in another post, we still have the problem of performance metric acquisition. It is the applications that are running that we need to optimize performance for and those metrics are not available to the kernel. It would be interesting to standardize some form of adaptive performance metric publishing for applications to get that data to the kernel.
The main problem with this or any other adaptive tuning mechanism is actually acquiring performance metrics.
What is the system using to decide if a new parameter set is better than a previous? What is the fitness test?
Some applications are disk-bound, others are CPU-bound, others are network bound. The performance dance is often non-obvious (e.g. some applications will achieve generally higher performance by allowing I/O higher priority than context switching, while others that appear to perform in a similar manner will achive higher performance by reversing that order).
The kernel does not have any mechanism to determine if a particular application is performing better or worse, it can only really get a guage of throughput and load. While this MAY be enough to get small systemwide performance gains, in order to really acheive significant application-specific performance gains, I think that applications would need to explicitly add support for adaptive tuning by logging relevant performance metrics for the kernel to tune around.
Thoughts?
Now your talking. Adaptive tuning is definitely the future. While I disagree that a general purpose GA is useful here (you should not let a GA hit an operational system, you need to let it hit a simulation first to build up a certain amount of fitness in it's solution space), many adaptive techniques would be useful and could eliminate the need to hand tune in many environments.
First thing: A GA is only truly effective if you let it exhaustively search the search space - which is why GAs are run against simulations rather than in operational systems. Imagine trying to tune a kernel at runtime by occassionally switching to random tuning parameters. I think this is extremely non-optimal. Of course, if most of the heavy lifting is done before-hand and the GA is simply examining pre-defined parameter sets on your machine, it could work. But it's not really much of a GA anymore.
As an alternative, perhaps using some form of pseduo-GA that tries to find pre-tuned parameters that most closely match your operating environment and then letting a Hill-Climbing algorithm hit it would be a better solution.
Hill climbing can also be used in a GA type manner by letting the GA determine witch parameters to climb and in what order. The climbing itself is pretty straightforward, allow vectors to interact with individual parameters. If the result is worse, reverse the vectors or switch to new parameters. Rinse, repeat.
Yes, GA can produce odd bugs and potholes. Yes, it is the fitness test that determines if that will be true. But a good GA will generally find solutions that are as good or better than hand tuning for search spaces that are very complex. Overall, this is a good idea but is probably more complex than advertised.
took that comment as Steve Ballmer saying more digital music is pirated then not. Does everyone on this board actually disagree with that?
This isn't about music piracy. This is about Ballmer taking a shot at Apple because they have a product which is user focused, whereas MS have a product which is RIAA focused.
God, who cares? Ballmer is making a tactical error. The people who listen to him are thinking, "really? I can put music for free on an iPod? I gotta get me one of those."
Even the RIAA schmucks have pirated music on their iPods... Instinctively, we all know that information is free.
iTunes sells songs because it is convenient - it sells convenience, not songs (should I spend $1 for that track or go hunt for a decent copy of it... hell, I'll just spend the buck.) Ballmer just doesn't get it.
The article is actually about chips that IBM allows sensitive information to be securely stored into and only retrieved by applications that are digitally signed or somesuch.
So your credit-card number would never be on your hard-drive or in the memory of an application that was not approved to fetch it.
This is a good idea.
We may not like trusted environments, but the alternative (of the infinitely hackable, trojanable, virus infested sesspool that we have now) is not very appealing.
Trusted computing = secure hardware + code signing
It does not mean the systems will not be hackable - there will still be exploits in signed software, but they will be much more rare. Presumably the hardware will refuse to run blocks of code that were not signed. But sign a language interpreter (VM) and you don't have to sign the code that runs in it, etc...
This will give us access to content and economics that have never been made available to us. Being able to spend (and profit!) will become easier and the stuff we can get access to will be higher quality as a result.
The real problem is not that you won't be able to burn that cd, but that you may have to have Microsoft approve your product before it will be signed. That is kinda scary...
If this is done right, there will be trusted "virtual computers" that can run alongside (e.g. as another user) from untrusted "virtual computers". Running trusted software in untrusted environments might result in a lack of features or content, but there is no reason that the software would have to stop running.
If this is done right (and it probably won't be), consumers will have the right to use their money to make decisions on how much sandboxing is acceptable and how much is not.
1) You are redefining what a "vote" means from the existence of the assertion of confidence to the presence and value of a variable (in this case binary 0=no 1=yes). That is not so subtle a change as you might suspect. If we do this, we may as well make it multivalued logic (at least 0=no 1/2=undecided 1=yes)... But please understand that the variable that you are implicitly positing is not the same as the traditional definition of a vote.
2) You are making the assumption that just because a ballot can be verified (e.g. all 10 candidates have a variable value assigned to them) that security of the counting process can be ensured.
My point is simple: There is more OPPORTUNITY for manipulation in the proposed system than exists in the existing system, not just at the ballot stage but more importantly, at the various data aggregation and analysis stages. For example, it is very simple in the one voter, one vote system to trace data and do statistical analysis on the results. In the proposed system, manipulation of the variables would be much more difficult to detect by analyzing the data that is produced by an election.
I also agree that if the geeks on slashdot don't get it, the average American has no hope (yes, the average American is much more clueless than the average slashdotter, regardless of the funny jokes we like to tell to each other about our own ignorance...)
Don't dismiss what I am saying just becuase you have an instinct that this could eliminate the hated 2 party mess that we have in America and generate world peace or whatever. Opportunities for exploitation by corrupt power-mongers are rarely overlooked, as our present situation so rudely is ensuring us.
You mark who you want, and who you don't want. How hard is that?
An awful lot of room for funny business when 200 million voters could produce 350 million votes... Err, sorry, 389 million. RECOUNT! Oops 410 million...
Make sense?
I think the idea is to have a yes and no bubble after every name. I don't know what it means to skip a name. Permision for poll officer to vote for you?
That does not address the problem. It only ensures that a fair re-count has the opportunity to be fair. However, when all the numbers are added up, you still have an arbitrary number that has nothing to do with the number of voters and therefore lacks a general credibility. The modification to this scheme proposed as economy voting in a previous post is an example of how to make approval voting add up right. Give the voter 5 votes that he must spend on the candidates in any way they like. Something like that.
If you must spend all votes, then the system becomes non-repudiable, which as I mentioned in another post, is a very serious problem with approval voting.
In straight approval voting, what stops the guys that take your ballot from marking their candidate of choice on your ballot?
The biggest problem that I can see with systems such as approval voting is that it is not non-repudiable. In other words, it would be impossible to verify that election results were not changed. A recount would not be able to detect changes made after a voter made his/her marks.
With a one voter, one vote system, it is easy to count the number of voters and the number of votes and ensure that the results were not modified.
I believe that this is a pretty important characteristic and I am a bit skeptical about who is pushing approval voting.
And what is it that is obvious?
New startup that specializes in products
Our legal advisors felt that it might be wise for us to settle
but we decided to press them further for details of the infringement and refused to yield
we feel that this is a scary precedent
Have any other entrepreneurs around here experienced anything similar?
to go after unwitting small startups who cannot fight back?
Personally, this was a nightmare for our company
we are not aware of any patents that we may be violating.
This reads like it was written by a psychologist to scare us out of starting an open-source company (note the use of language to make as many people identify with the text as possible combined with the implicit transfer of fear). That is what is obvious.
Purchase Equifax Identity Theft Protection. Not only will they notify you by email any time your credit report changes (e.g. new credit being taken out, etc), but they will insure you in case something happens.
I recommend that everyone does this these days. Your information is out there and easily collected by those that want it. Your information IS NOT safe.
It ultimately boils down to semantics and constrained theoretical models. If we discard the old fashioned "order" definition of entropy and instead examine entropy purely from the point of view of the movement and distribution of energy and heat then I believe you are correct. Entropy rules.
Just have to be careful with that word "order".
The fundamental problem in the argument that we are having is the assumption that running out of fuel leaves the system with more entropy.
If we assume that the universe was at it's most ordered state at the point in time prior to the big bang (e.g. no fuel?) then we have a contradiction...
By the way, I am not really sure that you could say that going from hydrogen to helium is much of a gain in entropy, but if you like, we could start with hydrogen and anti-hydrogen and use the antimatter collision to create energy. Over any fixed period of time, the order of the system will be increasing due to the organizing effects of life (and it's proliferation and evolution). The antimatter generation of power is an interesting one to look at thermodynamically... Don't you think?
But absolutely, this closed system is antientropic. Until the fuel runs out and everything dies... And even then, it could be argued that the order within the closed system is higher than it was when the whole expirement began.
If you consider the universe as a closed system and further assume that we have neither a big crunch or enless expansion in the distant future (the equilibrium state), and assume through the proliferation of life in the galaxy, we are approaching the "omega point", then I think you could argue that the presence of life makes the entire universe anti-entropic.
Yep, very repeatable. But evolution is an interesting bugger - it makes different stuff each time. But it sure as hell does make it!
If you want to speculate on creationism in any way that makes any sense at all (i.e. something you can sink your teath into), then you first must define god in some reasonable way.
There is a way to do this that most scientifically minded people can sink their teeth into. All you have to do is start with, "Any technological civilization that is significantly more evolved than ours (especially during biblical times) would be [as a whole] indistinguishable from god."
If you start that way, we may have some traction. Supernatural, non-measurable forces or beings that are not bound by and don't obey the laws of the universe are not fit for speculation.
If the earth's surface was on the inside of a hollow sphere (i.e. we are creating a closed system) and power was provided by fusion of hydrogen (insert your desired sci-fi about how this works here) and life emerged and evolved within this sphere, does your assertion of local order at the expense of global disorder remain true?
Personally, I agree completely that life is anti-entropic. In the presence of life, the disordered becomes ordered.