There are a few PET isotopes that do require on-site production: 11C (20-minute half-life) needs a cyclotron nearby, and 82Rb (76-second half-life) has to be made on the spot, in practice from a strontium-82 generator rather than a cyclotron. The vast majority of clinical studies are done using 18F (about 110-minute half-life), so generally speaking, you are right: you don't need the cyclotron too nearby. But for some applications, it is in fact necessary.
That said, I agree, it's very hard to see an application of laser-wakefield acceleration of electrons to PET.
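To put rough numbers on the half-life argument: the fraction of activity left after t minutes is 2^(-t/T½). Here's a minimal Python sketch. The half-life values are approximate textbook figures (109.8 min for 18F, 20.4 min for 11C, about 76 s for 82Rb), and the one-hour trip is just an illustrative assumption:

```python
# Approximate physical half-lives, in minutes.
HALF_LIFE_MIN = {
    "F-18": 109.8,
    "C-11": 20.4,
    "Rb-82": 76 / 60,  # about 76 seconds
}


def fraction_remaining(isotope, minutes):
    """Fraction of the starting activity left after `minutes`: 2 ** (-t / T_half)."""
    return 0.5 ** (minutes / HALF_LIFE_MIN[isotope])


for isotope in HALF_LIFE_MIN:
    print(f"{isotope}: {fraction_remaining(isotope, 60):.1%} left after a one-hour trip")
```

With those assumptions, roughly two-thirds of the 18F survives a one-hour trip, about an eighth of the 11C survives, and essentially none of the 82Rb does.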
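Comparing pictures of this KEK Blue Gene with pictures of Lawrence Livermore's Blue Gene/L, what strikes me is... one of them appears to be upside-down. Any logic to this? Or is the idea that when they add another gazillion processors to each one, they will be able to meet up nicely somewhere over the Pacific?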
Consistent formatting, consistent information. As a simple example, I have been looking for Ph.D.-level research positions. Of course every offering has a different title ("Research Scientist", "Scientist", "Algorithm Developer"...), and that's not surprising. But it's a minor annoyance, an annoyance all the same, when companies variously advertise "Ph.D.", "PhD", "doctorate", "doctoral", or the rare but ever-difficult "Ph. D." and "Ph D". Some job boards have education-level fields, but many companies don't use them, so you have to search through the text. Finally, companies like Google tend to have blurbs in their text about how they were founded by Ph.D. students or employ lots of Ph.D.s, so they always show up in my searches, regardless of what the job is.
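For the spelling half of the problem, a single case-insensitive regular expression covers the variants listed above. This is a minimal sketch. The pattern and the `mentions_phd` helper are my own hypothetical names, not anything a job board provides. Note that it does nothing about the Google-style blurbs, since "founded by Ph.D. students" matches just as well as a real requirement.

```python
import re

# Hypothetical pattern: "Ph.D.", "PhD", "Ph. D.", "Ph D" (any case),
# plus the words "doctorate" and "doctoral".
PHD_PATTERN = re.compile(r"\bPh\.?\s?D\b\.?|\bdoctora(?:te|l)\b", re.IGNORECASE)


def mentions_phd(text):
    """True if the text contains any of the doctoral-degree spellings."""
    return PHD_PATTERN.search(text) is not None


print(mentions_phd("Requires a Ph. D. in physics"))  # True
print(mentions_phd("Photography experience a plus"))  # False
```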
So, as a whole, if we could force companies to stop copying and pasting big blobs of text into a dozen different job boards, and instead have them fill out fields of relevant information ("Company history", "Educational requirements", "Programming languages", etc.) under some simple rules, then life would be easier.
On the flip side... I of course mean easier for the job searcher, not the company. I suppose it's much easier for a company HR worker to copy and paste onto a dozen different boards than to try to comply with a dozen different posting standards. I also suppose you'd be wary of hiring a Ph.D. scientist who's not bright enough to search for both PhD and Ph.D....
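Here's roughly what the structured alternative could look like. This is a sketch under my own assumptions: the `JobPosting` class, its field names (borrowed from the examples above), and the three-item education vocabulary are all hypothetical. Because the search looks only at the education field, a company-history blurb can't produce a false hit. The "simple rule" is enforced by rejecting spellings outside the fixed vocabulary.

```python
from dataclasses import dataclass, field

# Hypothetical fixed vocabulary: one spelling per degree, no "Ph. D." vs "PhD".
EDUCATION_LEVELS = {"BS", "MS", "PhD"}


@dataclass
class JobPosting:
    title: str
    company_history: str = ""
    educational_requirements: list[str] = field(default_factory=list)
    programming_languages: list[str] = field(default_factory=list)

    def __post_init__(self):
        # Reject anything outside the agreed vocabulary at posting time.
        unknown = set(self.educational_requirements) - EDUCATION_LEVELS
        if unknown:
            raise ValueError(f"unknown education level(s): {sorted(unknown)}")


def phd_postings(postings):
    """Match on the education field only, ignoring free-text blurbs."""
    return [p for p in postings if "PhD" in p.educational_requirements]


postings = [
    JobPosting("Research Scientist", educational_requirements=["PhD"]),
    JobPosting("Sales Associate", company_history="Founded by Ph.D. students."),
]
print([p.title for p in phd_postings(postings)])  # ['Research Scientist']
```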
Two types of biomedical research that have this "needle in a haystack" problem are functional magnetic resonance imaging (fMRI) and computational neuroanatomy.
In fMRI, very basically, you image the brain while the test subject is performing a task (looking at something, actively listening, tapping a finger, etc.) and while they are not, and use the change in local blood oxygenation to infer brain activity. Since this is a tiny signal, you repeat the task many times. The simplest way to determine where the activity is would be to do a t-test at each pixel against the background or against an assumption of no change. However, with tens of thousands to millions of pixels, you'll get lots of false positives, or have to use a really, really low p-value threshold. Through the magic of spatial correlations and fancy math tricks, one can make reasonable interpretations of the data, but again, it's that sort of "needle in a haystack" problem.
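To see the scale of the false-positive problem, here's a back-of-the-envelope Python sketch using a Bonferroni correction. Bonferroni is the bluntest possible fix; the spatial-correlation methods mentioned above exist largely because it is so conservative. The 100,000-voxel (3D pixel) count is an illustrative assumption, and the z cutoff assumes a normal null distribution:

```python
from statistics import NormalDist


def expected_false_positives(n_voxels, alpha):
    """If nothing is truly active, each voxel's test still fires with probability alpha."""
    return n_voxels * alpha


def bonferroni_threshold(n_voxels, alpha=0.05):
    """Per-voxel p-value cutoff keeping the chance of *any* false positive <= alpha."""
    return alpha / n_voxels


def two_sided_z_cutoff(p):
    """How large |z| must be for a two-sided p-value of p under a normal null."""
    return NormalDist().inv_cdf(1 - p / 2)


n = 100_000  # illustrative voxel count
print(expected_false_positives(n, 0.05))
print(bonferroni_threshold(n))
print(two_sided_z_cutoff(bonferroni_threshold(n)))
```

At 100,000 voxels, an uncorrected p < 0.05 threshold flags about 5,000 voxels on pure noise. Bonferroni instead pushes the per-voxel cutoff down to 5e-7, which works out to roughly |z| > 5.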
In computational neuroanatomy, you scan lots of brains of healthy folks and lots of brains of folks with neurodegenerative diseases, say, Alzheimer's (or younger elderly people versus older elderly people, that sort of thing). You perform some complex warping to map these brains onto a template brain (a real person, the younger version of the same person, or some synthetic template... all are used), then study the warpings that were needed. What you want to see is how the various lobes of the brain are eroding over time as the disease progresses. Again, we can do standard statistics, but we are hurt by the massive number of data points we are dealing with (again, it's pixel by pixel), so we have to use fancier math to get around it. In this case, the theories of Hotelling and Adler (referenced in the article from the original post) are very useful.
As the amount of scientific data we have grows, we are starting to draw on what was once pure abstract mathematics to get meaningful statistics out. I can't pretend to even begin to understand the PDF article, but it's neat to see the same problems in lots of very different fields!
In pattern recognition circles, there's a very simple algorithm known as the k-nearest neighbors classifier. First you start with a big database of information. Let's say you have a database of a million past bank loan applicants, and for each one of those applicants, you have several variables: say, income, age, amount of loan, length of loan, total household income, etc. You also know whether each applicant ended up defaulting on their loan. Now a new applicant comes in, and you want to predict: are they a safe bet? The k-nearest neighbors algorithm says: just compute the distance (you have a lot of freedom in how you define distance) between this new applicant and each of the million people in your database. For example, you can just take the sum of squared differences across all the variables. Now take those million distances you've computed, sort them from nearest to farthest, take the k closest instances, and basically have them vote on whether or not this new person will default. If k = 500, and 453 of those 500 nearest neighbors defaulted, then you suspect the new person will also default.
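Here's a minimal Python sketch of the classifier exactly as described: compute every distance, sort them all, keep the k closest, and vote. The toy loan data (income and loan amount, both in thousands of dollars) are made up for illustration:

```python
from collections import Counter


def squared_distance(a, b):
    """Sum of squared differences across all variables."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def knn_predict(train_x, train_y, query, k):
    """Majority vote among the k training points nearest to `query`."""
    # Compute every distance, then fully sort them (the brute-force approach).
    ranked = sorted(
        ((squared_distance(x, query), label) for x, label in zip(train_x, train_y)),
        key=lambda pair: pair[0],
    )
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Made-up applicants: (income, loan amount), both in $1000s.
train_x = [(30, 20), (32, 25), (28, 22), (90, 10), (85, 12), (95, 8)]
train_y = ["default", "default", "default", "repaid", "repaid", "repaid"]

print(knn_predict(train_x, train_y, (31, 21), k=3))  # default
print(knn_predict(train_x, train_y, (88, 11), k=3))  # repaid
```

With raw units, whichever variable has the biggest numbers dominates the distance. That is why the scaling discussed in the next paragraph matters.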
Sorting a million entries is not hard, but doing it quickly gets a bit difficult. If you want to get even fancier, you can scale the units along each dimension before you compute the distance, to give more weight to certain variables (feature weighting, or wrapper bias, in pattern recognition parlance). To do that really well, you have to run an optimization. Because the problem is fraught with local minima, it requires either a lot of restarts or stochastic algorithms. Either way, you end up evaluating the classifier, and doing the million-entry sort, many hundreds or thousands of times.
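Here's a sketch of the weighting step. Two choices are mine for illustration and are not specified in the comment: leave-one-out accuracy as the objective, and plain random search (with log-uniform weights) as the stochastic optimizer. Every candidate weight vector triggers a fresh distance computation and sort for every point, which is where the "many hundreds or thousands" of sorts come from:

```python
import random
from collections import Counter


def weighted_sq_distance(a, b, weights):
    """Sum of squared differences, each variable scaled by its own weight."""
    return sum(w * (x - y) ** 2 for x, y, w in zip(a, b, weights))


def loo_accuracy(xs, ys, weights, k):
    """Leave-one-out accuracy: classify each point by k-NN vote over all the others."""
    correct = 0
    for i, query in enumerate(xs):
        neighbors = sorted(
            (
                (weighted_sq_distance(x, query, weights), y)
                for j, (x, y) in enumerate(zip(xs, ys))
                if j != i
            ),
            key=lambda pair: pair[0],
        )
        vote = Counter(y for _, y in neighbors[:k]).most_common(1)[0][0]
        correct += vote == ys[i]
    return correct / len(xs)


def random_search_weights(xs, ys, k, n_trials=200, seed=0):
    """Stochastic search over feature weights; every trial redoes all the sorts."""
    rng = random.Random(seed)
    n_features = len(xs[0])
    best_w = [1.0] * n_features  # start from the unweighted distance
    best_acc = loo_accuracy(xs, ys, best_w, k)
    for _ in range(n_trials):
        # Sample each weight log-uniformly between 0.001 and 1.
        w = [10 ** rng.uniform(-3, 0) for _ in range(n_features)]
        acc = loo_accuracy(xs, ys, w, k)
        if acc > best_acc:
            best_w, best_acc = w, acc
    return best_w, best_acc
```

Random search is the crudest stochastic option. Restarted local searches or simulated annealing would slot into the same loop.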
Now, there are really fast ways of finding the k nearest neighbors (KD-trees, for instance), and there are ways of pruning those million training points down, but this is just one example of a problem that involves colossal sorting. This algorithm is considered basic in that it's very easy to understand and implement, and it often performs very well regardless of the structure of the data. However, just because it's "basic" doesn't mean it isn't used very heavily in real-life data mining situations.
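You don't even need the full sort to get the k nearest: a bounded heap will do. Here's a sketch using Python's standard `heapq.nsmallest`. This is a simpler trick than the KD-trees mentioned above, and it still has to look at every training point, but it avoids sorting all million distances:

```python
import heapq


def k_nearest(train_x, query, k):
    """Indices of the k training points nearest to `query`, closest first.

    heapq.nsmallest keeps a size-k heap while it scans, roughly O(n log k)
    work instead of the O(n log n) of sorting all n distances.
    """
    def sq_dist(i):
        return sum((a - b) ** 2 for a, b in zip(train_x[i], query))

    return heapq.nsmallest(k, range(len(train_x)), key=sq_dist)


points = [(0, 0), (5, 5), (1, 1), (2, 2), (10, 0)]
print(k_nearest(points, (0, 0), 3))  # [0, 2, 3]
```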
Hope that sheds some light on the subject! Anyone else have another giant sorting application to share?
I do research in medical image analysis, and I regularly use both Google Scholar and PubMed. I think there's a big stylistic difference in how different people approach these searches. Having gone to college and grad school from the mid-to-late '90s into the 2000s, I grew up (academically, at least) with the idea that I should be able to just type a few words into a search bar and a bunch of related stuff would come up, without having to think too much about where in the document the words were located, whether they were keywords, or whether I was searching for the institution or publisher or whatever.
Older scientists grew up searching those big bound hardback science citation indices, where you had to think very hard about keywords and publishers and such. Even the abstract was more critical then, because you couldn't just grab articles willy-nilly onto your desktop and sort them out later.
I think of it like the difference between my parents and myself when searching for stuff on the web... my parents like to go to Yahoo and descend through the well-organized categories until they get what they want, whereas I just type a bunch of phrases into Google. I'm not saying one way is better than the other... it's just a different style.
That being said, Google Scholar does need a bit more polishing, but I still use it a lot. However, until you can export citation info into EndNote or BibTeX, I don't see it replacing anything soon.
I suppose part of it is the difficulty of benchmarking. It's a hard enough task when you've got all your processors sitting in front of you. With a distributed computing system, you can't very well ask everyone not to touch their computer for a few hours on Wednesday at noon. Additionally, it takes a long time to distribute data across a distributed computing system, so your timing would be subject to everything that affects network speed around the world. In the end, you'd probably find that for some of the only modestly gigantic problems used to benchmark supercomputers, you'd be better off not using the whole grid, but only a few thousand of your best clients (people on high-speed networks using very powerful, possibly multiprocessor machines).
That said, as has come up in every discussion about supercomputing and distributed computing, the set of problems traditional supercomputers are designed for is typically very different from distributed computing problems. Perhaps some altogether different measure of computing capacity would be more appropriate for distributed computing.
You can watch the load on the National Center for Supercomputing Applications online system monitor.
I can't tell you what all the little widgets mean, though. All I know is that while they may be informative, they can't beat the cool blinky lights of the old Connection Machine.
After years and years in postgraduate academia, reviewing and publishing papers, I've never once heard of this "common practice". Many journals are at least nominally blinded during review, and a huge chunk of journal papers are first-authored by people with BS and MS degrees (i.e., grad students). Additionally, many journals don't attach degrees to author names and list the degree field as "optional" when submitting. So why would someone feel the need to lie like that?
Sure, everything can be understood without school. But not everything can be done without school.
To be more specific, there are things that require a formal institution to be possible. Why would the government give someone $5 million to study cancer, for example? Well, you write a grant with a good idea. Okay, lots of people with little education have good ideas. Look at all those annoying futurists we have postings about on /. But you also need proof that you're able to do what you say. Part of that proof is, in fact, your degree. Part of it is a demonstration of previous work. How do you get said previous work? By having "apprenticed" (read: PhD, postdoc, or previous grant... in the case of the last, return to the top of the paragraph and repeat). How did your mentor have money to do this? Same deal... a good idea, and proof of ability to deliver.
In other words, there is a lot of science that isn't cheap. And it isn't small. You can't (or at least shouldn't) do high-energy physics in your basement. You can't do experimental astrophysics by buying stuff off the web. You can't even do "smaller" science, like microbiology, without the right facilities, and that stuff costs lots of money. And for someone to give you that money, they want to be darn sure that you know what you're doing. And I'm sorry to say that in the real world, that requires education.
So yes, you can learn anything you want however you want. But there is a lot you will not be able to do without a degree. Fair or not, it's just how it is.
Steve Jobs (and Bill Gates) are wildly successful people, and perhaps, to some degree, being "unhindered" by a conventional college degree contributed to this.
On the other hand, I'd like to know how many thousands or millions of PhD-hours were spent developing the technology that goes into every Apple computer (Intel chips, MS products, etc.). Yes, I know, "Apple did not invent x, y, or z, blah blah blah". But at some point, someone who went to university and graduate school and a postdoc, etc., had to make the major breakthroughs in semiconductors, display technology, storage technology, networking technology, computer theory, etc. that make any of this possible.
I don't believe Steve Jobs was implying that people should not go to universities or that he has no use for people with conventional educations... and reading through the rest of his speech, I thought it was very insightful. But I think there is tremendous arrogance in the people who are saying this, as it ignores the contributions of everyone else whose hard work and education make your job so darn easy.
As a disclaimer, I went to a "name-brand" type of university, and grad school, and did a postdoc (not in CS, though). I learned a tremendous amount in all my work there, even in the literature and history classes that are completely irrelevant to my day-to-day life. That being said, I suppose my imagination is so completely squashed by my conventional education that you shouldn't be surprised that I have a pro-university bias...
On the plus side of Win2K, it would only be fair to note the millions of MS Word documents (yes, you may look down your noses at them, but believe it or not, most people do not use StarOffice or vi+TeX to write their documents) that have been created by people using Win2K. And the millions of Excel spreadsheets, and millions of presentations, etc.
Now, I suppose if you define failure as not being perfect, then yes, of course it was a failure. But did it do what Microsoft wanted (make oodles of money and get MS products everywhere in the business world)? Yes. And did it do what all those people who DIDN'T experience any security problems wanted (office productivity)? Yes.
Win2K was like a 1990 Taurus. They were everywhere, and people got billions of miles out of them, but they had no airbags. Ponder that, and don't try to look up whether or not the Taurus had airbags, since I didn't ;)
As already mentioned, the biggest problem with FPGAs is the difficulty/time in writing the logic. While that's not necessarily a big problem for a major supercomputing center or a CS research center, it (along with cost) is a problem that prevents FPGAs from being routinely adopted by end-users such as people in the applied research community.
One idea to get around this has been advanced by, among others, Stretch, Inc. In summary, their compiler analyzes your C code, decides what can be more effectively rewritten as new instructions for their chip, and sets those instructions up on the fly. You never get the ultra-low-level control (or performance) of FPGA programming, but in principle you get more performance than before.
Their primary applications have been basically as programmable replacements for DSPs, but they really want to push workstation applications for their products.
That being said, I neither work for them nor have I ever used any of their products, but it certainly sounds interesting!