Re: The problem behind the problem (on Biohackathon) · Score: 1
Ultimately, the question is whether it is more efficient to teach a computer science student biology or teach programming to a biology student.
That's why my current bioinformatics grant application contains a position each for a biology postdoc and a research programmer, plus myself, a biologist with a decade of computing under my belt. The postdoc will explore data analysis, prototype new applications and remain focused on the biological questions; the programmer will generalize and componentize the prototyped applications, write new ones from scratch, and make sure that we store and treat the data correctly. As for me, I'll bridge the gap between bench scientists and our team, try to keep our sights on the forest rather than the trees, and align our efforts with other similar teams.
Efforts such as these require multidisciplinary teams. There's simply no way that an individual can cover all aspects adequately, even if we try hard. We need to make sure, however, that all team members are on the same page, understand what is going on and are working toward the same goal.
Any program that you intend to run for more than a day or two should checkpoint its intermediate results to disk, even if this adds 100% to the run time.
Amen! And it probably won't add 100% to the runtime, more like a few %. Plus, even apps that run for a few hours at a time could benefit.
In the long run you'll save yourself so much lost compute time that you'll be glad you did it.
Yeah, maybe it's easier said than done, but if you CAN do it at all it is going to be faster, more reliable and more elegant than OS-supplied checkpoint/restart.
Given source code, you most likely know exactly what your app needs to save at any given moment to restart. The OS, however, knows nothing and needs to save EVERYTHING remotely related to your app.
I have experience with checkpointing under SGI IRIX, and while it is nice to have, it sucks bigtime compared to those apps where built-in checkpoint/restart is available.
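To make the point about application-level checkpointing concrete, here is a minimal Python sketch. Everything in it is an assumption for illustration: the function names, the JSON state layout, and the sum-of-squares loop standing in for real work. It only shows the idea that the app itself knows the few values it needs in order to restart.

```python
import json
import os
import tempfile


def save_checkpoint(path, state):
    """Write the state to a temp file first, then rename it over the old checkpoint."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    with os.fdopen(fd, "w") as handle:
        json.dump(state, handle)
        handle.flush()
        os.fsync(handle.fileno())
    # A crash during the write leaves the previous checkpoint untouched.
    os.replace(tmp_path, path)


def load_checkpoint(path):
    """Return the saved state, or a fresh state if no checkpoint exists yet."""
    if os.path.exists(path):
        with open(path) as handle:
            return json.load(handle)
    return {"next_step": 0, "total": 0}


def run(path, n_steps, checkpoint_every=100, stop_after=None):
    """Sum of squares as a stand-in for real work; resumes from the checkpoint."""
    state = load_checkpoint(path)
    for step in range(state["next_step"], n_steps):
        if stop_after is not None and step >= stop_after:
            return None  # hypothetical hook that simulates a crash
        state["total"] += step * step
        state["next_step"] = step + 1
        if state["next_step"] % checkpoint_every == 0:
            save_checkpoint(path, state)
    save_checkpoint(path, state)
    return state["total"]
```

The state here is two integers, which is why the overhead stays small: only what is needed to resume gets written, not the whole process image.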
Size does matter, yes, but the researchers involved covered the fault tolerance issue first before going ultra-small. See the Teramac project for details.
In a nutshell, the guys at HP labs have worked with a system made up of 864 faulty chips and found ways to detect and route around defects, even while the system operates and performs actual work. They anticipated back then that techniques such as theirs would be crucial for the operation of molecular computers.
The new work with actual molecular computers, especially in their first generation, drives home the point, because no two chips can be made identical. That's in the nature of the manufacturing process. Brownian motion alone will make precise and repeatable placement of gates and wires very hard, not to mention the difficulties with steering the actual chemistry of the manufacturing process. What's needed is a change in how we think about chips. They now resemble biological systems, where everything is imperfect, but manages to function most of the time.
This project seems to be a follow on to the original Teramac project, in which they linked 864 faulty processors together to form a functional and powerful computer. See here.
The real breakthrough then was coping with the defects of the processors and making the whole thing function reliably. It can even detect new faults and route around them (literally). The authors of the paper, chief among them Phil Kuekes, stated back then that this was fundamental technology for eventual molecular computers, which by their very nature would be made of faulty parts.
Now the molecular chips are 'real', and as anticipated, no two of these nanochips are the same. We'll have to rethink our assumptions about machines, QA and such, and take a cue from biology, where everything is less than perfect but can function perfectly nonetheless.
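As a toy illustration of "routing around" faulty parts, here is a short Python sketch. It is not Teramac's actual method, just a breadth-first search on a hypothetical grid of cells in which some cells are marked defective; the function name and the grid model are assumptions.

```python
from collections import deque


def route(width, height, start, goal, defective):
    """Shortest path on a grid that avoids cells marked defective (BFS)."""
    if start in defective or goal in defective:
        return None
    previous = {start: None}
    queue = deque([start])
    while queue:
        cell = queue.popleft()
        if cell == goal:
            # Walk the predecessor links back to the start.
            path = []
            while cell is not None:
                path.append(cell)
                cell = previous[cell]
            return path[::-1]
        x, y = cell
        for nxt in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
            nx, ny = nxt
            inside = 0 <= nx < width and 0 <= ny < height
            if inside and nxt not in defective and nxt not in previous:
                previous[nxt] = cell
                queue.append(nxt)
    return None  # the defects cut the goal off completely
```

When a new fault is detected, adding it to `defective` and calling `route` again gives a detour, as long as one still exists.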
that according to their own article CSFB does not admit any wrongdoing in their letters of acceptance to the SEC and NASD (as is usual in such settlements). Further down in their own article, however, they state that they have fired, fined, suspended, redeployed or otherwise disciplined employees involved in this IPO thingy. If that is not an admission of guilt, then what is???
Corporations have such weird ways of doing things...
Gigabit ethernet will make a significant difference in at least one important area even if it makes no difference in the performance of your other applications: backup & restore.
Granted. But frankly, how much backup do you need for cluster nodes where the local disk serves up OS and scratch space only? (Assuming you're running your cluster the way most supercomputer installations do.) I can't see the cost justification for a fast backup pipe here.
And of course, if your job is one of those really embarrassingly parallel ones (say, rendering frames) you don't even need a cluster, but can simply use desktops overnight or in the background. Saves even more dollars.
GigE won't help much because you're still stuck with ethernet's awful latency. Last time we shopped for supercomputers, cluster solutions lost out because of this, even with pricey Myrinet, Via or other high-end interconnects.
You're mostly off the mark, I'm afraid. Most software that uses a cluster runs through MPI or simply through scripts. Both mechanisms allow for easy adjustment in the number of nodes/CPUs you use.
Many large compute problems are embarrassingly parallel, i.e. the same calculation needs to be repeated with slightly different input parameters. There's basically no interprocess communication, just a little forethought about file naming conventions, total disk and memory usage, etc.
Execution of such tasks reduces essentially to a simple loop:
    foreach parameter_set
        rsh nodeN myprog outfileM
    end
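The same loop can be sketched in Python. This is only an illustration under assumptions: the node names, the `outfile<index>` naming scheme and passing the parameters as command-line arguments are all made up here, and the sketch only builds the `rsh` command lines rather than running them.

```python
from itertools import cycle


def build_commands(parameter_sets, nodes, program="myprog"):
    """Pair each parameter set with a node (round-robin) and its own output file."""
    commands = []
    for index, (params, node) in enumerate(zip(parameter_sets, cycle(nodes))):
        outfile = f"outfile{index}"  # hypothetical naming convention
        commands.append(["rsh", node, program, *map(str, params), outfile])
    return commands


if __name__ == "__main__":
    jobs = build_commands([(0.1, 10), (0.2, 10), (0.3, 20)], ["node1", "node2"])
    for cmd in jobs:
        print(" ".join(cmd))
```

Changing the number of nodes is just a matter of passing a longer or shorter `nodes` list; each command could then be handed to `subprocess.run` or a batch queue.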
For those programs that actually run a single instance of the code on several CPUs, you have to be acutely aware of how many nodes you use. Your code has its own limits on how well it scales to multiple CPUs, and your cluster imposes limits on how well (in terms of latency and bandwidth) nodes can communicate. Very few codes in this world scale well beyond 64 CPUs, especially not on run-of-the-mill clusters with plain ethernet interconnects. Fortunately, it is trivial to readjust the number of nodes used for each invocation of the code.
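One common way to reason about such scaling limits is Amdahl's law, which the comment above does not name; it is brought in here as a standard textbook model. The sketch assumes a fixed fraction of the run time is parallelizable and ignores interconnect latency and bandwidth entirely, so real clusters will do worse than it predicts.

```python
def amdahl_speedup(parallel_fraction, n_cpus):
    """Ideal speedup when only part of the run time can be parallelized."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if n_cpus < 1:
        raise ValueError("n_cpus must be at least 1")
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / n_cpus)


if __name__ == "__main__":
    # The 95% parallel fraction is an arbitrary example value.
    for cpus in (1, 8, 64, 512):
        print(cpus, round(amdahl_speedup(0.95, cpus), 2))
```

With 95% of the work parallelizable, 64 CPUs give a speedup of only about 15, and no CPU count can push it past 20, which is one reason to measure with your own code before buying more nodes.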
Lastly, virtual nodes cannot easily simulate the behavior of real nodes. Again, it's the interconnect latency and bandwidth. When it comes to supercomputing, only trust what you have run and measured on a real life setup with your own code and input data.
You can overclock all you want, but to have an all around fast system you need the appropriate data channels to feed data to this smoking hot CPU. Although bus standards and real, available PC motherboards have gotten a lot better in the past few years, a PC still tends to slow down terribly when given a huge data load to crunch on.
Personally, I still prefer purpose-built well balanced Unix workstations, despite their higher price tag. But then, I am a scientist and not a gamer.
Having worked in a research lab, I can tell you that who paid for what is often very, very unclear.
If anything, the above is an understatement. The petition, as it stands right now, will open a Pandora's box of problems that won't ever be resolved to everyone's satisfaction.
Not that I am against the petition; I have signed it myself, actually. But it will be extremely hard, or even impossible, to apply it as is. What I see as the petition's real advantage is that it helps stem the tide of publicly funded software being licensed to private companies and then sold back to publicly funded researchers for top dollar. Commercial scientific code is mostly not in the sub-$1000 league, and thus the scientific community loses twice in the privatization process!
Yeah, but the Philips spokesperson also said that they won't go to court over this, because their patents on CD-DA are running out this year and next. Hence a court battle wouldn't make sense.
I have to agree with them from a timeline perspective, but this immediately weakens their stance and is therefore not optimal/perfect.
I've always been very intrigued by the various partitioning options which you can get from commercial Unixes. Personally, I think Solaris is light-years ahead of the rest, but all of the available solutions look interesting.
Partitioning, especially the dynamic variety, lets you take maximum advantage of a large multiprocessor machine. Can you say, 'OS upgrade without downtime'? From testing to gradual rollout, to full deployment, and if needed roll back, all without having to bring the machine down. Really cool!
I realize that atlas only envisages static partitioning for now. But can dynamic partitioning be far behind?
You're full of it.
Consider this: Asperger's syndrome was described by a German doctor, observed on European patients. Psychologists in the US had a hard time getting used to Asperger's idea. That sounds like there are regional differences in wanting to accept new ideas and/or facts. Not higher prevalence of some syndrome.
What the US does have is an insatiable need to put labels on anything, and an amazing tendency towards collective paranoia. These are societal trends, and explain why there may be more diagnosed cases of ADD and autism in the US than elsewhere. Note that it's diagnosed cases, not absolute numbers. We basically have no idea how many people suffer from ADD or autism in any given society or region, because no one has even undertaken such a great systematic study. Nor should they.
And it's not just the parents who work too long and too hard. We know a family with a stay-home mom and pastor dad whose kid's unchecked bad behavior is getting out of hand. The mother is simply unwilling to really deal with the kid (and even admits to this), and was relieved when the behavior was finally labelled as ADHD by some willing psycho-flack. Now there's a condition and she can resist taking responsibility even more.
I am not saying that the kid is perfectly normal, but taking care of him, including firmly and kindly guiding him towards acceptable forms of behavior, would have alleviated half the trouble. It just takes time, resolve and strength.
Yeah, and don't discount food, either. I have heard from a school psychologist friend that cutting down phosphates and phosphites (less cured meat) helps hyperactive kids a lot. Likewise, there is a ton of other junk in our food that affects our moods and behavior.
I think you hit the nail on the head. As far as ADD, ADHD and other labels go, I have been of your opinion for quite some time. And yes, it exonerates parents, teachers and others from shouldering the responsibility to deal with the peculiarity of a certain child's behavior and find a SOLUTION. That's the worst of it.
When it comes to autism, I feel that there is a continuum of severity. As stated elsewhere in this discussion, Asperger's has long been neglected in English-speaking psychology circles, and now that it is being considered, all of a sudden more cases are being diagnosed, some of them far less severe than 'traditional' autism. The world is not black and white; it's not that you either have it or you don't. There are many shades of grey. Maybe there aren't even any truly normal people?
I consider myself slightly autistic, and a therapist even once mentioned as much, but it does not prevent me from functioning well in this world. I have a beautiful wife, kids, work in groups, take on leadership, etc.
In the end it is what you DO ABOUT IT, how you cope with your own deficiencies, and how hard you work to make up for it.
Likewise, I think most people who are labelled with some diagnosis, be it currently in fashion or not, can overcome much of it, if they and their families CHOOSE to deal with it.
You are wrong. Even if flash comes bundled with browser X and OS Y, it still is a separate piece of software that must be loaded whenever a flash animation is encountered. And, as someone else mentioned, flash files are large. Both not good.
Why designers make my browser load a plugin AND download a large file if there isn't much, if any, gain in information content on the page is beyond me.
I have visited numerous websites of bands that are flash-only. Nothing to look at for those who don't have flash or choose not to enable it. Losers all! I have never even bothered to stick around, despite the fact that we have a big, fat pipe to the internet backbone.
that this was taken up and followed through in court. I hope it goes all the way to the Supreme Court, and that all judges will be similarly enlightened. We need clear and hopefully pro-consumer verdicts to clarify all this licensing muddle. When did you last fully read AND understand all license terms by which you are supposedly bound (other than perhaps some of the OS licenses)?
Christoph
It's not like they started to work on this thing AFTER the economic downturn started. We were briefed on it under NDA almost two years ago, and it didn't sound like they were just starting with it. Projects like that take a bit of time. Too much time in fact for us - we ended up buying SGI gear which was ready and proven at the time.
Probably not. In science there's sometimes money that needs to be spent by a certain deadline. More often than not the principal investigator will then suddenly remember that proposal by some group member to spend big bucks on some obscure (for him/her anyway) technical thing and get on the bandwagon. At that point things need to move fast...
Been there, done that. Planning has not much to do with it, but getting a busy and distracted person's ear does.
Remember: Computing per se is not the primary focus of most compute power consumers.
Also, on paper, a white background with black text is very much accepted as being more readable than the other way around.
That, of course, is a consequence of the two facts that 1) light is reflected from white paper, and 2) the text is printed with black ink. Reflected light is much easier on the eyes than emitted light, such as from a CRT. And printing black characters precisely is far easier than leaving out white ones. Ever looked at a mostly black page in a cheap mag? Check those fuzzy outlines along the black borders, and note the (mis)alignment of cyan, magenta and yellow ink passes...
A CRT is basically a fancy light source. You really don't want to stare at a light source for an extended period, however low its energy might be. Hence the recommendations to use dark backgrounds with bright text.
I can personally attest that white and yellow on dark blue works great. I was introduced to this by SGI's factory defaults for their terminal windows and have stuck with it ever since (8 years). Thanks SGI!
Users should apply some common sense!
In the offline world, would you accept a low-cost, flashy-looking safe from some no-name company claiming it was absolutely safe and fireproof? And when you want to see some data on what sort of materials went into the construction of the box, they refuse to divulge anything. Would you still take it and use it? Thought so.
It is so easy to click and accept whatever looks good and flashy and costs little (if anything), simply because one does not have to look any further. But then you get what you paid for. Switch off your brain, spend zero thought and all the sleazebags will gladly come and start to suck time and energy from you.
Don't run Windows and you're completely safe from this.
There's something to be said for not running an OS with 90% market share. Many of the issues that plague most people just go away. If the marketers have to cater to more than half a dozen different OSes and platforms, it gets much more complex and expensive. That will put some brakes on such abusive tactics.