Oftentimes it doesn't matter how much you profile your Java application. The bottleneck will often be hidden away, deep in some class library that either you can't modify (due to licensing restrictions or unavailable source code), or that you don't want to modify. Other times, the bottleneck will be in the inherently slow execution model of Java. Profile all you want, and change your algorithms all you want. Many times the only solution is to move away from Java in order to get the performance or scalability boost you may need.
Last week I was profiling my code that connected to a database and retrieved data over and over. Originally, I'd bumped up the memory size on my VM to let the code run, because it was analysis code and didn't need to be super efficient until we moved it over to where the public could access it.
The memory leak turned out to be in Oracle's JDBC driver. TTCItem instances were not being garbage collected, and a HUGE amount of memory (think everything sent over the wire from the database) was piling up. So, as you said, I had a bottleneck hidden away in a class library, and there was nothing I could do about it.
Then I downloaded the latest Oracle ojdbc14.jar, after reading on some forums that the memory leak had been patched, and the problem went away. The problem was solved, the VM was running in 6 MB of memory, and I was happy.
If I hadn't had a profiling tool, I NEVER would have found that leak and fixed it. Who knew some versions of the Oracle driver have a memory leak in their prepared statement handling? The solution isn't to walk away from Java, but to learn how to use the appropriate tools and programming techniques.
Of course, there's no way to determine how many lives were actually saved by the presence of guns in the homes. Either a potential robber is an acquaintance of the home and doesn't want to rob where there are guns, or there's a posted "I have a gun" sign so a stranger is deterred, or there's just the general fact that criminals know that home invasion in the U.S. is like Russian roulette.
If this is true, then the answer is to post "I have a gun" signs up as a deterrent, and not have a gun. You deter people who would be deterred by the presence of a gun, and the people who would rob you regardless will rob you in either case.
PS: Anyone involved with statistics loves the phrase "If you do the math". Generally, we do, and we're aware of things like correlations and missing data. You look at 200,000,000 guns in the US. You might want to compare the number of people who have access to guns to the number of people that have cars, unless you think that owning 10 guns makes you 10 times as likely to be involved in a gun accident.
Cooking statistics is fun and easy, but it's not very convincing.
It's not hard (if a little expensive) to get a lens that gets good shots at 500+ metres. How do you deal with that?
2 things:
1: Block line of sight. If you have to be at least X distance to see your target directly, then you've blocked those far-away people. Try getting LOS in a city without a good position high up, and a lot of pre-planning.
2: Look for the guy with the tripod. You're going to be using one for a lens that shoots 400+ metres.
NPR (or Quirks and Quarks, or the Nature podcast, I forget) covered this.
This material is only stable at half a million PSI (or atmospheres, I forget which).
They are trying to combine it with silica to form a stable compound that is harder than regular glass, but that's a long way off, and I'm not sure how hard the resulting material will be - you'd assume that it would be harder than glass, but softer than this new substance.
More important than all of that, of course, is the fact that while the Arctic ice pack sits on water, the Antarctic one sits largely on land... and that Greenland also supports a significant ice pack. Since these are supported by the land (not buoyant force), when they melt, they would significantly raise the water level globally.
In other words: That's only true if all the ice was in the water.
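For anyone who wants the displacement argument as arithmetic rather than a kitchen experiment, here is a minimal sketch. It assumes the meltwater and the surrounding water have the same density (so salinity and thermal expansion are ignored). The function name, the density constant and the example mass are my own illustrative choices, not figures from this thread.

```python
# Archimedes' principle applied to melting ice (illustrative sketch).
# Assumption: meltwater and the surrounding water share one density,
# so salinity and thermal expansion are ignored.

WATER_DENSITY = 1000.0  # kg per cubic metre, assumed


def added_water_volume(ice_mass_kg: float, floating: bool) -> float:
    """Return the net volume (m^3) added to the water body when the ice melts."""
    meltwater_volume = ice_mass_kg / WATER_DENSITY
    if floating:
        # Floating ice already displaces water whose weight equals the ice's
        # weight, which is exactly the volume the meltwater will occupy.
        displaced_volume = ice_mass_kg / WATER_DENSITY
        return meltwater_volume - displaced_volume
    # Ice resting on land displaces nothing until it melts and runs off.
    return meltwater_volume


if __name__ == "__main__":
    mass = 1.0e6  # kg, arbitrary example mass
    print("floating ice adds:", added_water_volume(mass, floating=True), "m^3")
    print("grounded ice adds:", added_water_volume(mass, floating=False), "m^3")
```

Under those assumptions the floating case nets out to zero and the grounded case adds the full meltwater volume, which is the whole point about Antarctica and Greenland.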
Likewise, you can teach someone how to type things into Mathematica and they'll shortly realize that the answers it gives are usually correct -- but in both cases you could easily spend a semester explaining how the machine gets the answers for them.
And then, even though the 'answer is correct' as you say, it's still utter nonsense. Just because you plug numbers into a formula and get an answer that's mathematically correct doesn't mean you applied the correct test.
For example, apply a chi-square test to a very small sample size. Sure, you get an answer, but it's not valid, because you made assumptions about the underlying distribution that you shouldn't have. What you really wanted was Fisher's exact test. But, not understanding anything, you'd have no idea and blissfully go on with your calculated answer.
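To make the chi-square vs. Fisher point concrete, here is a rough sketch in plain Python that runs both tests on the same tiny 2x2 table. The table values and function names are made up for illustration, the chi-square version uses no continuity correction, and the code assumes none of the row or column totals is zero. In real work you would reach for a statistics package rather than this.

```python
from math import comb, erfc, sqrt


def chi_square_p(a, b, c, d):
    """Pearson chi-square p-value (1 df, no continuity correction) for a 2x2 table."""
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    observed = (a, b, c, d)
    expected = (row1 * col1 / n, row1 * col2 / n, row2 * col1 / n, row2 * col2 / n)
    stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Survival function of a chi-square distribution with 1 degree of freedom.
    return erfc(sqrt(stat / 2))


def fisher_exact_p(a, b, c, d):
    """Two-sided Fisher's exact p-value for a 2x2 table."""
    row1, row2, col1 = a + b, c + d, a + c
    total = comb(row1 + row2, col1)

    def prob(k):
        # Hypergeometric probability of k in the top-left cell, margins fixed.
        return comb(row1, k) * comb(row2, col1 - k) / total

    p_observed = prob(a)
    lowest, highest = max(0, col1 - row2), min(row1, col1)
    # Add up every table that is no more likely than the one observed.
    return sum(
        prob(k)
        for k in range(lowest, highest + 1)
        if prob(k) <= p_observed * (1 + 1e-9)
    )


if __name__ == "__main__":
    table = (3, 1, 1, 3)  # eight observations in total, invented for the example
    print("chi-square p:", round(chi_square_p(*table), 3))
    print("Fisher p:   ", round(fisher_exact_p(*table), 3))
```

With only eight observations the two tests disagree badly (roughly 0.16 against 0.49 here). Every expected cell count is 2, well under the usual rule of thumb of 5, so the chi-square approximation has no business being used.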
This is one of the many reasons why I'm really nervous about 'automated learning' with raw data sets. Machine learning requires that the data be clean and normalized before you push it in. When you look at lots of different sets of data (even with the same basic measurements taken), each is valid within its own context (a single experiment), but you can't just randomly smash them together. You have to carefully make sure the data is comparable, and that might require normalization. You need all sorts of metadata (like units) to do this correctly. There are also all sorts of processes that are run on data (it may be raw, or normalized, etc.) that you need to take into account.
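As a toy illustration of why the metadata matters, here is a sketch that refuses to pool measurements unless each one carries a unit it knows how to convert. The record layout, the choice of mg/dL as the common unit and the function names are all assumptions for the example; a real pipeline would also have to track whether values are raw or already processed.

```python
# Sketch: convert every measurement to one unit before pooling experiments.
# Assumption: each record carries a "unit" field; the factors below are an
# illustrative table, not a complete one.

TO_MG_PER_DL = {"mg/dL": 1.0, "g/L": 100.0, "mg/L": 0.1}


def normalize(record):
    """Return the record's value in mg/dL, or raise if the unit is missing or unknown."""
    unit = record.get("unit")
    if unit not in TO_MG_PER_DL:
        raise ValueError(f"cannot compare record without a known unit: {record!r}")
    return record["value"] * TO_MG_PER_DL[unit]


def pool(*experiments):
    """Merge several experiments into one list of comparable mg/dL values."""
    return [normalize(record) for experiment in experiments for record in experiment]


if __name__ == "__main__":
    lab_a = [{"value": 90.0, "unit": "mg/dL"}]
    lab_b = [{"value": 2.0, "unit": "g/L"}]
    print(pool(lab_a, lab_b))
```

Failing loudly on a missing unit is deliberate: silently treating 2.0 g/L as 2.0 mg/dL is exactly the "randomly smash them together" problem.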
I'm working on this with a group, and while I'm hopeful that we can build ways for users to do this sort of work in a less painful way, it's difficult to eliminate all the pain.
One of my current projects at the Broad Institute is working on a similar problem.
Our goal is to link and work with many kinds of biological data:
Association studies
Linkage data
Expression data
Small molecule interactions
Model organism data
etc
I've created a way to 'navigate' between various types of data (e.g. a SNP in an association study links to a set of genes that link to model organism homologs which link to their expression probe tests). After that, users store REAL experimental data, and the system unifies data sets (This worm gene is the same as this human expression data). The goal is to find supporting evidence for any particular starting point you are at (I have a fat worm because of this gene, what data in humans supports this hypothesis), or hypothesis generation (I have 5 interesting experiments, what do they tell me about fat regulation).
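The 'navigate' idea can be pictured as a walk over a graph of linked records. What follows is only a sketch of that picture, not the actual system: the identifiers, the prefix naming scheme and the find_path function are all invented for the example.

```python
from collections import deque

# Hypothetical links between records; every identifier here is invented.
LINKS = {
    "snp:rs_example": ["gene:HUMAN_GENE_A"],
    "gene:HUMAN_GENE_A": ["homolog:worm_gene_a"],
    "homolog:worm_gene_a": ["probe:worm_probe_1"],
    "probe:worm_probe_1": [],
}


def find_path(links, start, goal_prefix):
    """Breadth-first search from start to the first record whose id begins with goal_prefix."""
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node.startswith(goal_prefix):
            return path
        for neighbour in links.get(node, []):
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append(path + [neighbour])
    return None  # no chain of links reaches that kind of record


if __name__ == "__main__":
    print(find_path(LINKS, "snp:rs_example", "probe:"))
```

Starting from the SNP, the search walks gene, then homolog, then expression probe, which mirrors the chain described above. A start point with no outgoing chain simply returns None.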
What this project does *not* do is work with the actual experimental data, just the stated hypothesis and conclusion. So, if that paper was in error (and ~50% of papers are, according to a recent study that we'll pretend is not in error), the hypothesis is unreliable. But, if you have the data to work with, then you can perform your own analysis and meta-analysis of the work.
I suppose there's a trade-off between the two technologies, but then I don't expect to draw a lot of conclusions about genetics from high energy physics.
As an aside, a co-worker on the project is also attempting to model the data in OWL, and using MIT's Haystack project @ http://simile.mit.edu/hayloft/ as a first round GUI.
The problem with some of these approaches is scalability. I'm a bioinformatician, and I've seen a few talks on this sort of technique. One project I saw demonstrated was a pathway project (e.g. gene a turns on gene b, which regulates gene d), and it worked well for up to 5-6 items in the pathway. After that, because of the way the algorithms scale, you get into serious problems. The guy presenting stated that "at some point, we'll all have terahertz computers on our desktops, and this will not be a big deal."
Wake me up when that happens (or when quantum/DNA computers that are good at solving massively parallel problems are running).
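To give a feel for the scaling wall, here is a back-of-the-envelope sketch. The talk did not spell out the algorithm, so this ASSUMES a brute-force search over every possible directed network among the genes (each ordered pair of genes either has a regulatory link or doesn't). The function name is made up.

```python
def candidate_networks(gene_count):
    """Number of directed networks on gene_count genes (each ordered pair is linked or not)."""
    ordered_pairs = gene_count * (gene_count - 1)
    return 2 ** ordered_pairs


if __name__ == "__main__":
    for genes in (3, 5, 6, 10):
        print(genes, "genes ->", candidate_networks(genes), "candidate networks")
```

Under that assumption, 5 genes means about a million candidates and 6 genes about a billion, which fits the "worked well up to 5-6 items" experience. A faster clock buys you one or two more genes, not a pathway of realistic size.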
No, you couldn't find out any predisposition to disease.
Two issues:
1) We don't know what most of the markers for disease are yet.
2) There are 20 or so markers used for a fingerprint. There are 10,000,000 markers that vary in the human genome at an appreciable frequency. Any of those could be 'the marker'. Even then, they usually indicate an increased/decreased risk for a disease, such as being "30% more likely to have a heart attack."
In short, I wouldn't worry about that kind of use. They don't point at the right places, and we don't know where the right places are. If we figured out all human disease and could test it in people, we'd probably be looking at thousands of markers.
Note that this summer some of the biggest labs are testing 500,000 markers (which, with some smart data analysis techniques, account for 95% of the human genome) in thousands of individuals for some common diseases. The number of mutations that we know about should (hopefully) take a huge jump in the next 6-12 months.
I agree that this data isn't something I want gathered (because trolling for criminals will be too easy). However, as a minor nitpick: you don't resequence the human genome for each individual. You test a relatively small number of single nucleotide polymorphisms (SNPs) or microsatellite markers. The number of markers needed to establish uniqueness is very small, and the cost per person is pretty low (it'll cost more to extract the blood and purify the DNA than to run the genotyping). Financially and technically this is very doable, but I don't think it SHOULD be done.
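For a rough sense of why so few markers are enough, here is a simplified calculation. It ASSUMES independent two-allele SNPs in Hardy-Weinberg proportions and unrelated individuals. The function names, the allele frequencies and the population figure of 6.5 billion are illustrative choices of mine, and real forensic panels (typically microsatellites with many alleles) behave differently.

```python
from math import ceil, log


def genotype_match_probability(minor_allele_freq):
    """Chance two unrelated people share a genotype at one biallelic SNP (Hardy-Weinberg)."""
    p = minor_allele_freq
    q = 1.0 - p
    genotype_freqs = (p * p, 2 * p * q, q * q)
    # Both people must land on the same genotype, so square and sum.
    return sum(f * f for f in genotype_freqs)


def markers_needed(population, minor_allele_freq=0.5):
    """Smallest count of independent SNPs whose combined match chance is at most 1/population."""
    per_marker = genotype_match_probability(minor_allele_freq)
    return ceil(log(population) / -log(per_marker))


if __name__ == "__main__":
    print("SNPs at 50% allele frequency:", markers_needed(6.5e9))
    print("SNPs at 10% allele frequency:", markers_needed(6.5e9, 0.1))
```

With evenly split alleles the count comes out in the low dozens, a tiny fraction of the genome. Rarer alleles carry less information per marker, so you need more of them.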
1) Do we eat chicken eggs? Their resolution of the argument seems based on the fact that the first genetic chicken was assembled as an egg before growing into a pecking, clucking creature capable of reproduction. But aren't the eggs that we eat unfertilized and unable to grow into chickens? If their definition of "chicken egg" is that which can grow into a chicken, then we apparently eat omelet eggs, cake eggs, and key lime pie eggs.
We eat chicken eggs, as well as duck eggs, etc. I'm not sure how you decided that an egg's fertilization status changes its makeup. Regardless of its potential, it's still an egg from a chicken. If it makes you feel better, people also eat fertilized chicken eggs.
2) What was the first entity in the adult/egg cycle? Before the first chicken egg, there were ever-so-chickenlike adults with mutated strands of DNA in their unfertilized egg or sperm. It's hard to say that their offspring was 100% chicken while they were 0% chicken. So chickenness gradually evolved from the first entity capable of adult/egg reproduction, and that entity was certainly not very chickenlike at all. But it did start the cycle rolling. Since the creatures before this entity did not lay eggs, I posit that the egg-laying gene mutated within an adult creature. Therefore the chicken, metaphorically, came first.
Somatic mutations are not passed on to offspring. The mutation would have to be in the germ line (passed on to the next generation) to be heritable. Lamarckian genetics didn't work out, contrary to your education regarding genetics.
In all seriousness, I suggest you hop off the semantic soap box, as that's all you've got going in your post.
Let's see how long Skilling and Lay go to jail for. If they get 15+ year sentences, it gives me just a little bit of hope - they spent $38M on their defense, and were still found guilty.
Actually, I base my 'extra weight' on the fact that I've seen larger presentations and heard more discussion on the topic than what was in the Nature paper. I've heard the arguments go back and forth between experts on exactly how this data was collected and how to interpret it.
It's not that I'm friends with the author. I happen to have a much greater exposure to the study as it was in progress than anyone outside of the institute who attended all the talks.
Where I work, there's almost no management. There are leaders/visionaries, but nobody has to motivate us to work. We're what you'd call 'self motivated'. It doesn't hurt that we do what we do for love (science) instead of money, so we're more likely to do our jobs because we gain satisfaction from it.
We're also in a very friendly environment, but you gain respect by doing good work. Everyone here wants to earn the respect of their peers.
Maybe you should run your organization more like ours, and you wouldn't need as many babysitters.
Holy huge set of logical fallacies, Batman!
Yours is a delicious troll, chock full o' mod points.
If only we could do that.
Hundreds of millions of people will be affected by the rising sea levels. On top of that, precipitation patterns are changing, which will affect people inland, as well as those on the coasts. This doesn't even go into the problems caused by changes to the ocean current conveyor belt.
http://science.nasa.gov/headlines/y2004/05mar_arc
This is just a passing thought:
If you blocked the light of the sun, but had a way to filter it by frequency, then perhaps you could let in the wavelengths that plants like and block the rest. After all, photosynthesis mostly uses light at ~400-500 nm and again at 600-700 nm. You could block 500-600 nm, plus everything above 700 nm. You might even be able to tighten that up a bit.
Details on photosynthesis:
http://faculty.clintoncc.suny.edu/faculty/Michael
That's only true if all the ice was in the water (to displace it). What about if it's above the water? That ice will contribute to sea levels.
If you need a little experiment to try at home, let me know.
Wait, you didn't propose a number of alternative hypotheses and reject them with data?
I wish I'd gone to your school for my PhD.
(OK, I wish I'd gotten a PhD, but that doesn't stop me from being published in Nature Genetics and Nature.)
Nice. I like your new taxation idea called "Fuck the Poor", and I wish to subscribe to your newsletter.
Better, let's see if they can input a large corpus, then do any real reasoning with it.
Heck, take 30 papers of some kind, and produce anything we don't already know.
My co-worker is playing with OWL/etc. I'm still skeptical about it, but we'll see...
I think TES:Oblivion is another good example.
Lots of addons have come out for that, and many seem to be either junk or very small chunks of content that are fairly useless.
I had Google Desktop installed by default on my new ThinkPad T60p.
It was one of the first things I removed.
You, my friend, deserve a bucket of mod points for that answer. Why post as an Anonymous Coward?
I would assume that Zeno, Achilles, and the turtle got closer and closer to Hell, but never got there.