Exactly. The degree of parallelism (i.e., the number of independent compute cores) is much higher in a GPU. Having optimized code on a CPU has nothing to do with it. That said, GPUs are extremely limited devices, and they only work well for parallel jobs that operate in lockstep, so if you need asynchrony, traditional parallelism on CPUs is the way to go.
The NSA doesn't automatically 'know' something just because they have data that could be used to deduce that bit of information. In fact, learning these facts is still very hard. I'm sure that the NSA has access to some smart people and some resources that other people do not, but it does not change the fact that the number of 'facts' that can be inferred from a collection of data is very, very large. "Big data" implies "big time", so I think it's more likely that the organization spends most of its time looking for specific facts.
And maybe that specific fact is "what is the likelihood that somebody would buy a coffee at this location on the 1st, 17th, and 23rd of each month by chance?" But the point is, the NSA doesn't automatically know all of the facts about you, because to know them they would have had to ask all of the questions. And doing that is impossible.
Fortunately for us, significance has a very precise meaning. The scientific term is meaningless, though, without a hypothesis as a frame of reference.
I commend you on your proper Maine dialect, particularly the spelling of 'fok'.
Funny story: when I was a sixth-grader, I made it all the way to the Maine state spelling bee, which was hosted at UMaine Orono. I was living in Castine at the time, so it was a big deal to go to the "big city" (Bangor... oh the irony). The winner got a college scholarship. Anyway, they made us draw straws to determine the order of the spelling bee lineup. I got #1.
So, we're standing there on stage, before the curtain opens and they decide to throw us a practice round. I get the word 'banana'. Piece of cake. B-A-N-A-N-A. After the practice round, they whisk open the curtains, say some things to the crowd, and then we're off. Again, I get the first word. The judge says "The word... is 'biggert'."
"'Biggert'?" I ask.
"Yes," say the judges.
OK, I've never heard this one before, but... here we go...
B-I-G-G-E-R-T
"Wrong. The correct spelling of 'biggert' is B-I-G-O-T."
I was crushed, and humiliated, because I was out on the first word in the first round. My mistake was twofold:
1. I should have asked for the word in a sentence, and
2. The Law of Conservation of R's means that New Englanders take the R's out of some words, but they always end up putting them back in somewhere. For example, "Law and Order" is pronounced "Lohr and Ohdah".
Sure there is. Go to the issues for a project, and click "Add Filter". I just did the search you mentioned above, and it works fine. Maybe you're using an old version? One feature it doesn't seem to have is to search the body of comments, but this is probably pretty easy to add if you know some Rails. Rails has built-in functions for finding with joins and conditions (much easier than the equivalent SQL), so it ought to be pretty easy to add what you want.
By that logic universities are money sinks as well. Unless you consider various externalities like, say, generating new ideas or building a community of researchers. Microsoft can be legitimately criticized for many things, but failing to make money on their research investments just sounds like sour grapes to me.
As I said to another commenter-- this was poorly-worded. What I meant was that F# was developed by MSR, and that F# is based on ML. Not that MS is in any way responsible for ML itself.
I find it ironic that I am being accused of MS-spin, given my post history here. Of course, you can't be expected to go back and read my old posts. But it highlights how the anonymity of this community cranks up the paranoia.
I worked in industry for eight years after doing my undergrad, and then went back to graduate school. If you want certain jobs (industry researcher), you pretty much need to have a PhD. Yes, I am older than most interns. However, I also took a huge pay cut to go back to grad school (actually, MSR pays about as well as my old job), but I see this as an opportunity cost.
You misunderstood my ML comment. I mean that F# is basically ML, but that F# was developed at MSR.
I am well aware that I can talk about things that are published. When I say that there are 'interesting things on the horizon', I mean that there is interesting unpublished research, which I really can't talk about yet, but may be able to in the not-too-distant future.
That's not true. The Kinect camera hardware was developed by someone else, but the software (the real brains) was developed by Microsoft Research and then moved into a product group. Kinect-like technology is a big research focus for MSR.
I am currently doing an internship at Microsoft Research. There are a huge number of very innovative things on the horizon (which, sadly, I can't talk about), and Microsoft has gathered one of the most talented groups of people I have ever had the pleasure to work with. Note that I have never been a fan of Microsoft-- I conscientiously avoided their software for a long time. I've been a BSD/Linux person for more than a decade and a Mac person since the late 1980s, and I prefer to write code in more traditionally UNIX languages: Ruby, C, Scala, etc. But I've had the pleasure of working with F# (basically ML, also developed by MSR) on top of the .NET CLR while I've been at MSR, and I am quite impressed. It's a shame that Microsoft doesn't develop this stuff for UNIX.
I don't think Ballmer is blowing smoke, because from my standpoint, there's a lot going on here. While it's true that many of the things developed don't become products, the technology is very often integrated into existing products, without fanfare. The Windows fault-tolerant heap, for example, was developed at MSR for Linux, rejected by the Linux community (because it was not "incremental"), and then eventually ported to Windows. Many improvements that make Visual Studio a pleasure to use come from MSR. And, whether you think this is worthwhile or not-- MSR generates a huge number of very good research papers. Apple produces zero, although it does share some code (e.g., WebKit and LLVM). Google produces a handful and shares very little code (e.g., MapReduce and FlumeJava were never released, although they were reverse-engineered by people at Yahoo).
As someone who is currently studying probabilistic modelling-- you're wrong about these systems needing databases of phrases. While they could use that approach, it is not clear that it would help, and searching that database would likely be very inefficient.
Instead, speech/text/image recognition systems typically use some kind of probabilistic graphical model. A simple example of one is called a "Markov chain"-- simple enough that Markov was able to compute conditional probabilities by hand using his model. The basic idea works like this: when you scan a sentence to determine which word you are looking at, or which part of speech a word belongs to, you condition the probability of the current word on the previous word. For instance, in the phrase "the cat [unknown word]", "jumped" is far more likely than "dog".
You're right that these models require a great deal of processing power, but they can be very accurate, especially when given a lot of training data. IIRC, Dragon requires you to train the recognizer yourself; modern recognizers often don't need to be trained by the user, which makes them much more useful, and accessible to casual users.
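To make the "condition on the previous word" idea concrete, here is a minimal sketch of a bigram (first-order Markov) model. The toy corpus and the function names `train_bigrams` and `next_word_probability` are made up for illustration, and there is no smoothing; a real recognizer would be far more elaborate.

```python
from collections import Counter, defaultdict


def train_bigrams(sentences):
    """Count how often each word follows each previous word."""
    counts = defaultdict(Counter)
    for sentence in sentences:
        words = sentence.lower().split()
        # Pair every word with its successor: (w0, w1), (w1, w2), ...
        for prev, cur in zip(words, words[1:]):
            counts[prev][cur] += 1
    return counts


def next_word_probability(counts, prev, cur):
    """Estimate P(cur | prev) from raw counts (no smoothing)."""
    total = sum(counts[prev].values())
    if total == 0:
        return 0.0  # 'prev' was never seen with a successor
    return counts[prev][cur] / total


# Toy corpus, invented purely for illustration.
corpus = [
    "the cat jumped over the fence",
    "the cat jumped on the bed",
    "the cat slept",
    "the dog jumped",
]

model = train_bigrams(corpus)
p_jumped = next_word_probability(model, "cat", "jumped")
p_dog = next_word_probability(model, "cat", "dog")
print(f"P(jumped | cat) = {p_jumped:.2f}, P(dog | cat) = {p_dog:.2f}")
```

On this corpus "jumped" follows "cat" in two of three cases and "dog" never does, so the model prefers "jumped". No database of whole phrases is involved, just a table of pairwise counts.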
There is, of course, the possibility of multiple equilibria. Daisyworld is too simple an example to capture this. Given the complexity of climate, I'd be surprised if we weren't simply at some local optimum. In any case, this is an unresolved question-- conducting the experiment may have some unpleasant consequences ;)
Also, while NP-hard problems in their full generality are computationally infeasible, many instances of these problems are easily solvable. In some cases, most of the instances you care about are tractable. E.g., WalkSAT can solve for thousands (or is it millions?) of variables in a reasonable amount of time. Other randomized techniques from AI like simulated annealing are also very fast for these kinds of problems.
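For anyone who hasn't seen it, here is a rough sketch of the WalkSAT idea: pick an unsatisfied clause, then flip one of its variables, sometimes at random and sometimes greedily. This is a simplified variant (the greedy step minimizes the total number of unsatisfied clauses rather than the usual break count), and the clause encoding, the parameter names and the tiny example formula are my own assumptions. It is not tuned for thousands of variables.

```python
import random


def is_satisfied(clause, assignment):
    """A clause holds if any literal agrees with the assignment.

    Literals are nonzero ints: k means variable k is true, -k means false.
    """
    return any(assignment[abs(lit)] == (lit > 0) for lit in clause)


def walksat(clauses, num_vars, max_flips=10000, noise=0.5, seed=0):
    """Return a satisfying assignment (dict var -> bool), or None if none was found."""
    rng = random.Random(seed)
    assignment = {v: rng.choice([True, False]) for v in range(1, num_vars + 1)}

    def unsat_count_after_flip(var):
        # Temporarily flip 'var', count unsatisfied clauses, then undo the flip.
        assignment[var] = not assignment[var]
        count = sum(1 for c in clauses if not is_satisfied(c, assignment))
        assignment[var] = not assignment[var]
        return count

    for _ in range(max_flips):
        unsatisfied = [c for c in clauses if not is_satisfied(c, assignment)]
        if not unsatisfied:
            return assignment
        clause = rng.choice(unsatisfied)
        if rng.random() < noise:
            # Random-walk step: flip any variable from the chosen clause.
            var = abs(rng.choice(clause))
        else:
            # Greedy step: flip the variable that leaves the fewest unsatisfied clauses.
            var = min((abs(lit) for lit in clause), key=unsat_count_after_flip)
        assignment[var] = not assignment[var]
    return None  # gave up; this does NOT prove the formula is unsatisfiable


# Small satisfiable example: (x1 or x2) and (not x1 or x3) and (not x2 or not x3) and (x1 or not x3)
example = [[1, 2], [-1, 3], [-2, -3], [1, -3]]
solution = walksat(example, num_vars=3)
print(solution)
```

Note the asymmetry: a returned assignment is a proof of satisfiability, but `None` only means the search ran out of flips.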
Are we supposed to conduct our lives in such a way as not to hurt other people's feelings? The fact is that Target used information learned by a shopper visiting their store. Their store. They have every right to that information. If you have a problem with that, don't shop at Target.
I have mixed feelings about data-mining. On the one hand, yeah-- I don't want people inferring things about my personal life. But on the other hand, if I walk into a store, and that store offers things I want, and doesn't bother trying to sell me something I don't want, isn't that a win for both of us? As someone who works with a lot of data regularly (compsci grad student... in fact I just finished my graphical probability models homework), I regularly find that technologies can almost always cut both ways. As another poster said, it boils down to the intent of the person using the tool.
The one big thing going for you is that unsupervised inference is really, really hard to do right, and given that the volume of data is growing faster than our ability to process it, you're sort of awash in anonymity.
I'm sorry-- this is idiotic. Capitalism's only value is profit. But that does NOT imply that workers get the shaft. Quality control is a very important part of manufacturing, and it is a FACT that workers who care about their work do a better job. This is why the Toyota Production System works. It works in America. You can hardly argue that Toyota is not capitalist.
Waking people up in the middle of the night out of company dorms so they can fix your design errors ain't flexibility-- it's slavery. Arguing that it is "just capitalism" is disingenuous, because capitalism is entirely compatible with happy and prosperous workers.
I stand corrected re: original license of KHTML.
Have you read the BSD license? It is incredibly short and says nothing of the sort re: re-licensing. In fact, it implies precisely the opposite. I'll give you the sprawling 3-clause version:
------
Copyright (c) YEAR, OWNER
All rights reserved.
Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met:
1. Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer.
2. Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution.
3. Neither the name of the ORGANIZATION nor the names of its contributors may be used to endorse or promote products derived from this software without specific prior written permission.
THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
Wrong on many counts. Firstly, you can only change a license if you own the copyright. I just checked out a copy of WebKit from their svn repo. Copyrights are all over the place-- some belong to Apple, some belong to Google, and many little pieces here and there belong to private individuals. Getting everyone to reassign their copyrights would be a nightmare. This is why many GNU projects require copyright assignment statements before you contribute patches.
Secondly, Apple put all of their original contributions under the (L)GPL. The original codebase, KHTML, was contributed by Qt, and that was BSD-licensed.
I'd like to see "no consecutive terms". That would go a long way toward ensuring that candidates spend their time in Washington actually doing their jobs. There would be no reason to spend their term campaigning.
There are plenty of applications that don't need to be able to address 64 bits worth of memory. Think webapps. Lots of cores with fast I/O are what you want. Core speed itself is less important since you're usually I/O bound.
And frankly, it's pretty obvious that we're not going to get them until China, India, and other developing countries play ball.
That sounds like a perfect application of the age-old art of coercion Americans are so good at: diplomacy.
Sure. Poor choice of words.
Faster than blind search!
Actually, Vertex is offering the drug free of charge to anyone who is uninsured and who makes less than $150,000/yr.
Given the cost of development, and the small market, this sounds pretty reasonable to me.
More here:
http://programmers.stackexchange.com/questions/75436/relicense-bsd-2-3-clause-code-to-gpl
I suppose we should add "messing with my sonar buoys" to the list.