New Algorithm Provides Huge Speedups For Optimization Problems (mit.edu)

← Back to Stories (view on slashdot.org)

New Algorithm Provides Huge Speedups For Optimization Problems (mit.edu)

Posted by Soulskill on Saturday October 24, 2015 @03:07PM from the need-for-speed dept.

An anonymous reader writes: MIT graduate students have developed a new "cutting-plane" algorithm, a general-purpose algorithm for solving optimization problems. They've also developed a new way to apply their algorithm to specific problems, yielding orders-of-magnitude efficiency gains. Optimization problems look to find the best set of values for a group of disparate parameters. For example, the cost function around designing a new smartphone would reward battery life, speed, and durability while penalizing thickness, cost, and overheating. Finding the optimal arrangement of values is a difficult problem, but the new algorithm shaves a significant amount of operations (PDF) off those calculations. Satoru Iwata, professor of mathematical informatics at the University of Tokyo, said, "This is indeed an astonishing paper. For this problem, the running time bounds derived with the aid of discrete geometry and combinatorial techniques are by far better than what I could imagine."

17 of 129 comments (clear)

Min score:

Reason:

Sort:

Re:Whoah.. by pushing-robot · 2015-10-24 15:23 · Score: 4, Funny

You never know; I'm sure he had plenty of 1UPs.

--
How can I believe you when you tell me what I don't want to hear?
Bad news for them by goombah99 · 2015-10-24 15:29 · Score: 4, Interesting

I guess they never heard of the "No Free Lunch" theorem for optimization, which believe it or not is the name of a proven theorem by David Wolpert that says the following is rigrorously true.
Averaged over all optimization problems every possible algorithm (that does not repeat a previous move) takes exactly the same amount of time to find the global minimum.
The collorary for this is that: If you show your algorithm outperforms other algorithms on a test set, then you have just proven it performs slower on all other problems.
That is the better it works on a finite class of problem, the worse it is on most problems.
it is even true that a hill climbing algorithm blunders into the minimum in the same average time as a hill descending algorithm. (yes that is true!).
Look it up. It's the bane of people who don't understand optimization.
The only things that distinguishes one optimization algorithm from another is:
1) if your problems are restricted to a subspace with special properties that your algorithm takes advantage of
2) your performance metric is different then the average time it takes to find the global minimum. for example, something that limits the worst case time to get within epsilon of the minimum.

--
Some drink at the fountain of knowledge. Others just gargle.
1. Re:Bad news for them by goombah99 · 2015-10-24 15:39 · Score: 4, Interesting
  
  To be fair, the actual article doesn't claim to violate the no free lunch theorem because it only applies to "some" problems. It's the headline that violates the no free lunch theorem. But one aspect of the no-free-lunch theorem that usually proves to be insanely diabolical is that it turns out that it's usually VERY hard to define which problems an alogorithm will do best on in general. Often you can see a specific set for which it is obvious. But as you complicate things the obviousness part goes out the window. FOr example, if you know your surface is convex and has a very simple shape like a multidimensional parabola, then it's trivial to hill descend to the bottom. But once you start putting in multiple minima and arbitrary discontinuities it becomes hard. Interestingly in high dimensional spaces know something is smooth isn't very useful as smoothness only removes a low number of degrees of freedom--leaving behind a smaller but still high dimensional space.
  Thus if there is a breakthrough here it's not the algorithm, it would be the ability to specify what kinds of problems this will work well on!
  
  --
  Some drink at the fountain of knowledge. Others just gargle.
2. Re:Bad news for them by goombah99 · 2015-10-24 16:19 · Score: 3, Insightful
  
  I totally agree with you. I indeed left out the issue of incompressible optimization functions and their infeasibility in memory. More generally, I also believe that the NFL threorem does not imply despair but rather that the problems that interest humans are a tiny subspace of all possible problems. On this subspace better average performance is entirely plausible. The rub, is that to make that claim you have to be able to specify what the subspace of interest to humans in. I think you might be able to define that in terms of algorithmic complexity. But that alone doesn't tell you if your algorithm must be better than another one on all problems of a specified maximum complexity. Thus any time someone says their algorithms outperform without telling us in what subspace they achieve this there is no proof of superior performance. Since the paper actually does make this claim it's confusing to sort out how they are limiting the problem. Likely this is my own failing at understanding their mathermatical definitions.
  to be more complete. the NFL theorem ignores the computational cost of deciding the next move in the search as well. so really the NFL is counting the average number of guesses required not the time.
  Your statement that NFL relies on all problems being equally likely is very close to equvalent to my statement that the subspace of problems of interest to humans is smaller than the subspace of all problems. So I think were in agreement.
  The NFL does not imply optimization is hopeless. It just says a claim of improved speed is dubious till you tell us what subspace of problem you improved it on. This is what I stated to begin with. If an algorithm is faster on some problems then it's slower on most other problems.
  
  --
  Some drink at the fountain of knowledge. Others just gargle.
3. Re:Bad news for them by goombah99 · 2015-10-24 16:29 · Score: 5, Insightful
  
  Right. I don't disagree. The sublty of the NFL theorem is not in its crass conclusion that brute force is just as good as anyhting else. it's in the realization first that algorithms only work on subspaces well. And that the real trick is not in creating the algorithm but being able to say what subspace it works well on.
  TO see that consider the following thought experiment. imagine we divide the space of all problems into many subspaces and on each of these subspaces we have a very fast algorithm. Surely then this defeats the NFL theorem? No. The time it takes to decide what subspace solver to use, plus the time it takes to use it would be the same as brute force.
  Thus anytime one says an algorithm is better, if you can't say when it's better, one hasn't quite solved the issue.
  The escape clause is that it's very likely the class of problems humans care about is a very small subspace of all problems. But defining what that meansis hard.
  
  --
  Some drink at the fountain of knowledge. Others just gargle.
Re:Oh... by Pseudonym · 2015-10-24 15:45 · Score: 4, Funny

Math is hard.
Or, more accurately, math is believed to be hard.

--
sub f{($f)=@_;print"$f(q{$f});";}f(q{sub f{($f)=@_;print"$f(q{$f});";}f});
Re:Oh... by Anonymous Coward · 2015-10-24 15:52 · Score: 5, Informative

To give a brief and hopefully laymen-friendly explanation:
It isn't an equation per se; P and NP are classes of computational problems. Roughly speaking, P is the set of problems that can be solved deterministically in polynomial time with respect to some input parameter, and NP is the set of problems that can be solved non-deterministically in polynomial time with respect to some input parameter (hence the "N"). A more approachable definition of NP for laymen is that it comprises problems for which a solution can be verified (or rejected) in polynomial time. For instance, prime factorization of an integer is in NP because, given an input integer and a list of prime factors, you can verify that the list is actually the prime factorization of the integer in polynomial time (by multiplying them together).
The significance, aside from pure theoretical interest, is that a lot of very interesting practical problems are known to be in NP, and knowing that those problems could be solved in polynomial time would be very useful.
Re:No it's not a special case by goombah99 · 2015-10-24 16:08 · Score: 3, Insightful

Averaged over all problems, the time it takes to figure out which special case one has plus the time to solve that special case has to be the same as any other algorithm. The escape clause is if when you say "cutting plane" you are intrisically restricting the problem to a special case. In that case alrgorithmic speedups are plausible. Likewise is cutting plane is a class of algorithm and you are comparing variations on that particular class of algorithm then speed ups are again plausible. But if all possible problems are addressed as cutting plane problems then No it is not possible for there to be any average speed up.
But please realize that all is not lost. It is very likely that all problems of interest to humans is a tiny set of the space of all possible problems. It may well be that we are only interested in a very special subspace. I'd even bet on it. But defining what that subspace is will be hard.

--
Some drink at the fountain of knowledge. Others just gargle.
I don't understand. by Art3x · 2015-10-24 16:19 · Score: 3, Funny

I read the summary. I didn't understand it. Then I read the article. I didn't understand it either.
1. Re:I don't understand. by Kwyj1b0 · 2015-10-24 17:11 · Score: 3, Informative
  
  Consider minimizing x^2 when x can take values in -10 to 10 (we know the answer is 0, since we only consider real valued numbers). If we wanted to solve this problem, there are several approaches; some example approaches are: randomly try a lot of different values, set the derivative to zero, or try a cutting plane algorithm. In general, we might not know the analytical expression for the function we are trying to minimize (or it might be too complex) so we can't really find the derivative efficiently. Derivatives can also be computationally expensive to compute, so let's ignore that approach.
  What we can do is to say let me find a solution for which the function is less than some threshold t, and then keep reducing t till I can't find any more solutions; this is what the article meant by finding a smaller circle inside a larger one (for each value of t, I try to find solutions that are smaller than t).
  What cutting planes do is chop up the original function in to pieces - in some pieces, I know the value will be much larger than my threshold (so I don't have to search in those pieces), and in others it might be smaller - I focus on searching these pieces to find a value that is smaller than my threshold (after which I can reduce the threshold and try again). This is what (in a simplistic sense) cutting plane algorithms do; they chop up my search space.
  How we select the points for chopping is crucial - bad choices (say choosing one point after another at random, or trying points very close to each other), I spend a lot of time chopping unnecessarily (or not benefiting from chopping by much). We also want to make sure our cuts really do divide the problem in to pieces which I can discard searching, and those pieces I discard should (ideally) be quite large. Until this work, the time taken to decide where to chop was n^3.373; they brought down the time to n^3 (where n is the number of variables that the function I am trying to minimize takes as inputs).
  They also said that for special classes of functions, they can really improve the total computation time significantly (from n^6 to n^2).
  I'm glossing over (and am certain I've got some details wrong) many issues to give a taste of the big picture idea of cutting plane approaches in general; there has been decades of work on these problems that you can read (I recommend anything written by Prof. Stephen Boyd as an introduction to some of this research).
Re:Oh... by KGIII · 2015-10-24 16:27 · Score: 3, Interesting

Why, I oughtta!
Anyhow, I'm checking out the paper. Well, I will be after snuggle-time. Yay! A use-case for a tablet. I've a bit of work in this area though, specifically, modeling traffic - traffic optimization *is* hard. Throughput, where humans are concerned, is nearing an attempt at modeling chaos (and for those who think perfection is possible, I welcome their insights).
I've yet to read it but I suspect it's not a huge advance so much as it is working on prior modal work. (Not all work gets published, for better or worse, and some remains a trade-secret.) My guess is that (and this is just from a quick look, absolutely not to be taken as an authoritative statement) this is just further optimization, refinement, and it appears like it may be a good step forward.
I'm not sure that I agree with the phone analogy but, well, okay... We're not children.
It looks like, if you've got highway traffic going to A, B, and C and you have exit/merge nodes 1, 2, and 3 then you can take the current values and include the street traffic - as well as time of day data, up to and including the myriad traffic uses, that you can then more accurately predict if new merge 1.5 will increase traffic for traffic going from A to B while decreasing overall traffic (of increasing it) from B to C and if it will actually reduce surface street traffic. It also looks like you can use this to predict if an exit at 1.5 will decrease overall time for, example, delivery vehicles that need to reach surface streets or if they're better shoveled off to a new exit at 2.5 or left to exit on 3. You can also deduce predictions (never certainties) for throughput from A to C, A to B, and which exits or mergers are more likely to congest or reduce traffic on the highway as well as optimize for certain functions such as the aforementioned delivery trucks.
Basically, in programmer terms, the more subroutines you can have the more accurately you can refine your answers and the better the results *can* be. Of course this adds complexity and computational difficulty. This algorithm optimizes existing work by allowing you to use more subroutines more efficiently - perhaps think of it like calling a library or OOP (I guess?). As you were a teacher, it's akin to being able to (more easily) tailor an individualized lesson plan for each student who may, or may not, excel and then, at the end, they'd all be able to pass their test which may, or may not, be also tailored for them. Basically, you'd be able to blab a whole bunch of data at them and they'd then get the appropriate information, tailored to their needs, and may output the correct answers on the test.
This doesn't do anything (from what I see so far) to actually ensure you're inserting good data (of course). I am obviously stretching the analogies quite a bit but math is hard. Well, no... It's not hard so much but, rather, it is that most people learned via rote and not conceptually. Some idiot decided to make word problems, not a bad idea. Some idiot came along after them and told them to take the word problems apart and to ignore the words. Nobody actually showed them the concepts of the maths themselves and why the answers are the way they are. I could type a novella on the subject... Complete with examples! Nobody would listen/read it and I don't blame them.
In short, this is "just" improving on existing models and appears to be doing so by making various inputs more accessible or, more accurately, more streamlined. Again, a very cursory scan was done and I could be way off. Another quick look makes me think I'm correct. I really need to read it to be certain.
'Snot hard. *nods* Looks like good work from a quick glimpse. It'd be a shame to see it wasted on optimizing phones. ;-) Another quick peek does, indeed, note that they're claiming improvements on current models. Note: This was written in a few sporadic posts and the paragraphs are probably horrifically out of order. If they don't make sense then swap 'em around a bit until they

--
"So long and thanks for all the fish."
Re:Oh... by pushing-robot · 2015-10-24 16:39 · Score: 4, Informative

Think of a huge jigsaw puzzle. You might have thousands of pieces that could each go in thousands of possible places, and while you may know tricks to speed up the process, solving it would surely take a lot of trial and error and time. When you did finally solve the puzzle, though, it would only take you a moment to check the picture against the box and prove your solution was correct. That makes it a NP problem.
Another jigsaw puzzle has every piece labeled with column and row, so you can put the whole thing together in one sweep—no trial and error needed. Both solving the puzzle and verifying the solution would be simple. That makes it a P problem.
If P=NP, it would indicate the first jigsaw puzzle has labels on its pieces too, just harder to see. If you can find them, solving the puzzle becomes trivial.
There are a great number of NP problems in the world; if P=NP we could find the solutions to many intractable math problems within our grasp.
.
Unfortunately, one of those is encryption. Many encryption methods use a NP math problem as a lock and its solution as a key. We assume it would take an impractical amount of time to solve the math problem, so the only way in is by knowing the solution. But if P = NP, breaking that lock (by solving the math) could become as fast as using the key. Oops!

--
How can I believe you when you tell me what I don't want to hear?
The two key contributions by Kwyj1b0 · 2015-10-24 16:39 · Score: 3, Interesting

Some very interesting results, but the two key contributions are (almost verbatim from the article):
1) With the best general-purpose cutting-plane method, the time required to select each new point to test was proportional to the number of elements raised to the power 3.373. Sidford, Lee, and Wong get that down to 3.
2) And in many of those cases (submodular minimization, submodular flow, matroid intersection, and semidefinite programming), they report dramatic improvements in efficiency, from running times that scale with the fifth or sixth power of the number of variables down to the second or third power.
So this seems to be a better cutting plane approach that improves the cutting process by reducing the time to find the next test point (in general), and for certain structured problems (like SDPs) this approach reduces the computation time significantly.
This does raise some interesting questions, such as: is there a limit to how fast you can (in a general problem, not using a specific problem's structure) find the next test point? Even if we don't know what algorithm gets us there, it would be useful to have a bound for future research.
Re:Oh... by KGIII · 2015-10-24 16:59 · Score: 3, Informative

Great, in theory. Then you end up with some drunk guy, driving backwards, on a one way street. As someone who made a career of modeling traffic, well, if we had intelligent drivers then I'd not have been able to retire at 50 after selling my company. Some dumb ass will ram the pylons and actually increase the overall time needed because emergency crews will be on-scene and the vehicle will be disabled. YOU might obey the rules but, well, you know how far that gets you.
Also, in an ideal world? We'd not need one single signal light nor stop sign. Yup. It could be done with ease. The only flaw in my world-overthrowing-plan is that humans are inherently stupid and more so when they're behind the wheel of an automobile. Trust me on this - I dare say that this is the one subject where I can speak authoritatively. Coming up with, and believing in, an ideal system that relies on the intelligence of the operators is as doomed as any political ideology that follows the same principles - meaning you're gonna need to be really draconian and authoritarian.
In short, we could just take the cars away. That'd make about as much sense as any other solution. Which, by the way, is me agreeing with you. It won't be perfect. No, it won't. However, your idea may actually mean that the overall throughput slows down and doesn't return to normal or result in an increased rate. It takes, on average, as long as THREE YEARS for them to acclimate to a new traffic pattern. Some, like the 'Magic Roundabouts' in the UK are just skipped by the locals who simply avoid them rather than adapt.
For every model of improvements you make, we'll invent a newer and dumber human. So, no... It won't be perfect. The pylons might be a stretch too far. Some places have actually opted to remove the tire puncture strips for those who go the wrong way on a one way route. Why? Idiots ended up holding up traffic. Sure, it had the desired result but there are more idiots. See the UK where they put in the pylons that rise up from the street, for example, and then lower only for certain vehicles with an attached sensor. People do exactly what you think they'll do - on a *very* regular basis, some of them even picking up speed to try to beat it. They don't. They do it again. And again... And again... And again...
Pylons probably aren't the solution - drivers education and mandatory re-licensing with strict adherence to testing standards might go somewhere but that's politically infeasible in my country.

--
"So long and thanks for all the fish."
Re: Oh... by KGIII · 2015-10-24 18:37 · Score: 3, Funny

I am an inconsiderate clod you... Oh, wait...
In Soviet Russia inconsiderate clods you?

--
"So long and thanks for all the fish."
Re:Oh... by Aereus · 2015-10-24 19:43 · Score: 3, Insightful

The biggest problem with that IMHO, at least around where I live, is people cannot for the life of them zipper merge properly. The average person is a stupid shitty driver is the biggest modeling issue... They wait until the last moment to merge in, they don't equalize speed to traffic flow so they force the people on the highway to break heavily as they enter, the people on the highway don't watch the ramp and try to alleviate issues by maintaining or altering speed slightly, etc. But I'm sure you already knew that. :)
Re:Oh... by mitkey · 2015-10-24 22:20 · Score: 4, Insightful

A nice explanation, but I can't resist a nitpick: Typical jigsaw puzzles are in P, because for every pair of pieces you can decide whether they fit together or not without having to consider other pieces; therefore solving this amounts to traversing an unoriented graph. An NP jigsaw must have the possibility of multiple pieces fitting one piece in general.