Sorry for the excessive waking-up-slowly attempt at humorous snarkiness. You do make good points. I'm coming from having encountered a lot of people who know just enough statistics to be dangerous but don't realize many of the issues you have to deal with. You wouldn't believe how many times I get "well, why don't you just do this or that," and there are a thousand reasons why not, but none of them can be explained in 10 minutes. So when I object, they think I'm being too nit-picky.
Sorry about the "hahaha"... that was a brain fart. Statistically speaking, they happen.
...you always reach a point where there's this one number that is completely made up...
This is true, but your proposed methods do not eliminate this. Yes, a sensitivity analysis can help. But the only advantage of prior distributions over parameters is that they encourage one to put all their assumptions on the table, whereas frequentist statistical methods (fixing parameters) tend to hide things. Other than that, you will always be subject to your modeling assumptions.
***Try a sensitivity analysis using Monte Carlo techniques. That sounds hard but it isn't.
Yes, it's easy to just "do"; doing it correctly, in a way that never gives you false information and does give you accurate confidence bounds, can be extremely difficult. Not that it doesn't work a lot of the time, but there are dozens of gotchas that can make the answer complete rubbish, and no one would know without a lot of very careful math and analysis. Yes, it can be better than other methods, but only if used properly.
Take the parameter that you have doubts about and give it a distribution (Gaussian or rectangular or something) with the mean at your best guess and the std deviation chosen to be big enough to cover the range it might reasonably vary over.
Um, yes, and those parameters have a big influence on your results... For example, if you center your prior on the value you think the parameter has, it is generally NOT true that the mean of the output will be even close to what you get by just plugging in that value. Most real data in industry and finance is not subject to the natural processes that seem to turn most things Gaussian.
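To make the point concrete: when the model is nonlinear, pushing a Gaussian-distributed parameter through it does not, in general, give an output whose mean equals the plug-in value (this is Jensen's inequality). Below is a minimal Python sketch of the quoted procedure plus that comparison, assuming a made-up toy model `growth(rate) = exp(rate * 10)`, a best guess of 0.05 and a standard deviation of 0.05. The model, numbers and helper names are hypothetical illustrations, not anything from the discussion.

```python
import math
import random
import statistics

def growth(rate, years=10.0):
    # Hypothetical toy model: continuously compounded growth factor.
    return math.exp(rate * years)

def propagate(model, best_guess, sd, n=100_000, seed=1):
    # Monte Carlo sensitivity analysis as described in the quote:
    # give the doubtful parameter a Gaussian centred on the best guess,
    # push every draw through the model, and summarise the outputs.
    rng = random.Random(seed)
    outputs = sorted(model(rng.gauss(best_guess, sd)) for _ in range(n))
    return {
        "plug_in": model(best_guess),          # just plugging in the guess
        "mc_mean": statistics.fmean(outputs),  # mean of the simulated outputs
        "p05": outputs[int(0.05 * n)],         # empirical 5th percentile
        "p95": outputs[int(0.95 * n)],         # empirical 95th percentile
    }

if __name__ == "__main__":
    summary = propagate(growth, best_guess=0.05, sd=0.05)
    for key, value in summary.items():
        print(f"{key:8s} {value:.4f}")
```

For this toy model the exact output mean is exp(0.5 + 0.5²/2) ≈ 1.87, about 13% above the plug-in value exp(0.5) ≈ 1.65, so centering the prior on the best guess does not center the answer on the plug-in result.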
***Use Gaussian if you have an idea of what the parameter probably is but aren't exactly sure, rectangular if you really have no idea. A rectangular distribution says "I have no idea, the parameter could be anywhere within this particular range."
HaHAHAHA. Can I quote that next time I teach? A bounded uniform prior, when used with Monte Carlo, often encodes MUCH stronger assumptions than a Gaussian or t-distribution does. It basically says, "there is no chance whatsoever that the parameter is outside this range." So, you say, make your rectangle large enough. Well, that only works in undergrad stats courses, not in most of the models I've worked with or dealt with. It also breaks down phenomenally fast in higher dimensions. It may work as a hack, but I would NEVER trust such results unless there is a good reason to use a bounded uniform.
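Two properties behind this objection can be checked numerically. A bounded uniform prior puts exactly zero probability outside its box, whereas a Gaussian always leaves a small but nonzero tail. And if you widen the box so the truth is "surely" inside, the share of prior mass near your best guess shrinks geometrically with the number of parameters. The Python sketch below is my own illustration with hypothetical helper names, assuming each parameter gets an independent uniform prior on [-1, 1] and asking how much mass lands in the central half [-0.5, 0.5] of every range.

```python
import math
import random

def gaussian_tail_prob(k):
    # P(|Z| > k) for a standard normal: small, but never exactly zero.
    return math.erfc(k / math.sqrt(2.0))

def uniform_tail_prob(k, half_width=1.0):
    # P(|X| > k) for X ~ Uniform(-half_width, half_width), k >= 0:
    # identically zero once k reaches the edge of the box.
    if k >= half_width:
        return 0.0
    return 1.0 - k / half_width

def central_mass_exact(d):
    # Each coordinate independently lands in its central half with prob 1/2.
    return 0.5 ** d

def central_mass_mc(d, n=200_000, seed=0):
    # Monte Carlo check of the same quantity for a d-dimensional box prior.
    rng = random.Random(seed)
    hits = sum(
        all(abs(rng.uniform(-1.0, 1.0)) <= 0.5 for _ in range(d))
        for _ in range(n)
    )
    return hits / n

if __name__ == "__main__":
    for d in (1, 2, 5, 10):
        print(d, central_mass_exact(d), central_mass_mc(d))
```

With one parameter the central half holds 50% of the prior; with ten independent parameters it holds 0.5^10, roughly 0.1%.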
****Monte Carlo is great for those of us who don't care to learn the arcane minutiae of stat math. If you have a working model it takes an hour or two to extend it so you get stochastic results. Note that it's no harder to give a distribution to all your input parameters not just one. In which case you will be doing the kind of work that people who make 500 grand a year do.
If you don't go through the math and simply treat it as a black box, you WILL mess Monte Carlo up and get false results. We need more people in that sector who really know statistics, which means mastering statistical math (and I'm curious why you think it's arcane), not people who just think they do and plow blindly through minefields of gotchas you never learn about in undergrad stats courses. Yes, MC is a great tool, and it may be a step up, but I would never trust it without good theoretical justification that it works. For the models used in the financial industry, that is much more difficult than you might think.
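One concrete example of such a gotcha (my choice of illustration, not necessarily the one meant here): the usual Monte Carlo error bar, mean ± 1.96·s/√n, is only meaningful if the quantity being averaged has finite variance. With heavy-tailed inputs, which are common in finance, the bar can look reassuring while the estimate keeps jumping around. A minimal Python sketch, using a standard Cauchy draw (generated as tan(π(U - ½))) as a deliberately extreme stand-in for a heavy-tailed input; the function names are hypothetical.

```python
import math
import random
import statistics

def mc_estimate(samples):
    # Sample mean with the textbook CLT-based 95% half-width.
    # The half-width is only valid if the underlying variance is finite.
    mean = statistics.fmean(samples)
    half_width = 1.96 * statistics.stdev(samples) / math.sqrt(len(samples))
    return mean, half_width

def cauchy_draws(n, rng):
    # Standard Cauchy via the inverse CDF: no finite mean or variance.
    return [math.tan(math.pi * (rng.random() - 0.5)) for _ in range(n)]

def gaussian_draws(n, rng):
    # Well-behaved comparison case: mean 0, variance 1.
    return [rng.gauss(0.0, 1.0) for _ in range(n)]

if __name__ == "__main__":
    rng = random.Random(42)
    for name, draw in (("gaussian", gaussian_draws), ("cauchy", cauchy_draws)):
        for n in (1_000, 10_000, 100_000):
            mean, hw = mc_estimate(draw(n, rng))
            print(f"{name:8s} n={n:>7} mean={mean:+.3f} +/- {hw:.3f}")
```

On the Gaussian rows you should typically see the intervals shrink and keep covering 0. On the Cauchy rows the estimate has no mean to converge to, so it does not settle down as n grows, yet the printed "±" still looks like a confidence bound.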
Sounds to me like anyone trying to use cocaine by drinking Red Bull Cola would die of excess water consumption (anything is a poison given a high enough dose) before they'd feel the cocaine at all.
[Citation needed]
I suspect it'd be because they flew off into outer space and suffocated.
I once interned with a high-energy physics research group of about 40 people that had a policy that all physics-related code had to be written in Fortran 77. The reasoning was that it had to be fast, and that everyone could read it because everyone knew it. That was 2003. So yeah, it's not going to die.
OTOH, that was the main reason I left physics and went into computer science -- I kept thinking "There has to be something better out there..." -- and I don't regret that decision.
Given that half the stuff people supposedly don't have to worry about are just things taken over by the kernel, I'm guessing she didn't poll many of them...
As a current Ph.D. student in statistics, I would say YES!!! The errors do not come from the statistics itself (the theory is very rigorously worked out from its assumptions) or from the decision to use statistical modeling in a given situation (the problems almost always come when people incorrectly think a process is non-random), but from people applying models incorrectly. In other words, what we need are more people who understand the complexity and assumptions behind these models, not more statistical monkeys who treat statistical modeling like a tweakable black box spitting out answers.
In my view, the problems with the current financial process were not surprising at all. We have a saying that "all models are wrong; some models are useful," and the problem is that people had no idea how useful their models were or where they could go wrong, given the assumptions they worked under.
Yeah, well, there's no reason to be troubled there. Monte Carlo is great, and great in this situation, because it expands the possible models that people can work with, and if done with any intelligence will give reliable answers and error bounds on those answers. Throw Monte Carlo out the window, and you're back to conjugate distributions and low dimensional models that have far more restrictive and unrealistic assumptions.
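To illustrate the trade-off: with a conjugate pair such as a Beta(a, b) prior and binomial data (k successes in n trials), the posterior is available in closed form as Beta(a + k, b + n - k), so its mean is (a + k)/(a + b + n). Monte Carlo can reproduce that answer without relying on conjugacy, which is what lets it extend to models where no closed form exists. The Python sketch below uses a deliberately simple sampler (self-normalised importance sampling from the prior, not a recommendation for real high-dimensional models), and the counts 7 out of 10 and the function names are made up for illustration.

```python
import random

def conjugate_posterior_mean(a, b, successes, trials):
    # Beta(a, b) prior + binomial likelihood -> Beta(a + k, b + n - k) posterior.
    return (a + successes) / (a + b + trials)

def importance_posterior_mean(a, b, successes, trials, n=200_000, seed=0):
    # Self-normalised importance sampling: draw p from the prior and weight
    # each draw by the binomial likelihood (the binomial coefficient cancels).
    rng = random.Random(seed)
    num = den = 0.0
    for _ in range(n):
        p = rng.betavariate(a, b)
        w = p ** successes * (1.0 - p) ** (trials - successes)
        num += w * p
        den += w
    return num / den

if __name__ == "__main__":
    exact = conjugate_posterior_mean(1, 1, successes=7, trials=10)
    approx = importance_posterior_mean(1, 1, successes=7, trials=10)
    print(f"closed form: {exact:.4f}  Monte Carlo: {approx:.4f}")
```

Here the closed form gives 8/12 ≈ 0.667; the Monte Carlo estimate should agree to a couple of decimal places. The point is that the sampling version never used the fact that the prior was conjugate, so swapping in a non-conjugate prior or likelihood changes one line rather than the whole analysis.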
Good point. The intent of my question is to gauge how well it replies to currently unanswerable questions. If it said: "I don't know, as a proof of the Riemann Hypothesis is necessarily non-constructive", then I would be quite impressed. If it says "I don't know what you mean; please rephrase", I'd be less impressed.
If he spent so much effort gathering lizard byproducts, why didn't he record information about where each sample was found: GPS coords, time of year, time of day, surroundings, etc.? It sounds like it was just an unmarked 30 kg bag of poop that the university thought was leftover fertilizer or something. If so, I can understand it. However, a big bag of individually labeled samples in plastic bags, each with detailed information, would scream "research project" even if the big bag didn't have a sign saying "lizard poop for research project."
http://www.sagemath.org/. Sage rocks. It's Python-based, brings together many of the useful libraries already mentioned (numpy, scipy, matplotlib, etc.), and has a very active mailing list. Can't recommend it enough.
http://xkcd.com/519/. Says it all.
Seriously, the most influential part of my early education was that my parents only allowed me to play computer games I had programmed myself.
Are you suggesting that Enron's problem was that they didn't use vi?
Thanks, my bookshelf fell down, and I was unable to read it myself.
does the knob go up to 11D?
True -- but one could ask "easier to do what?" and then almost everyone would be happier.
I thought the caps lock key made things easier? http://www.bash.org/?835030
See sig!!!
Despite all the memos, it is still not the Year of Linux in the Dumpster.
Apparently, their RAID goes up to 11.
--
Those of us who clean up after you think differently.
I've now changed my password from Thomas to ThomasX, where X is a digit that I'm not telling.
... and how much for an animated paper clip?
The first question I'll ask it: "Is the Riemann Hypothesis true?" The answer would probably be a good indicator of how useful the system will be.
Well, your user name is close to the verb "retch". This event must be pretty burned in your memory, so that's understandable.
which could easily be removed in an open-source Windows. Don't think Microsoft could get around that one, even if it wanted to...
I'm a member of PETA, you insensitive clod!
Oh, it's not people eating tasty animals?
Fine, fine, carry on then.
Good point. Gnome should change "move to trash" to "compost file."