I did this! New company doesn't want to pay for Matlab. :(
Matlab user here: it is EXPENSIVE.
Leaving your company means leaving your language.
You should not have to leave your language behind just because you leave your people.
R is great until you look under the hood at the package implementations.
"A good implementation beats a good design"
Honest question: what do you do when you need Math / ML support? Java is astonishingly vacant -- only Weka and Mahout -- but no packages provide logistic regression, variation measures, distance functions, distribution functions, etc.
How do you do MATH with Java?
A complete answer would be Python and C++, because numpy/scipy can't do everything and Python is still very slow for number-crunching.
The problem with using the mix (when you actually write the C++ code yourself) is that debugging it is a major pain in the ass.
+1 "I just spent three days chasing down a build error in code that uses numpy/scipy"
I have a friend who works for a company that does gene sequencing and other genetic research and, from what he's told me, the whole industry mostly uses Python.
I work at a gene sequencing company and the current debate is Python vs Clojure vs Scala:
Python unquestionably has the best bioinformatics support. Period. No debate. But the lack of language features has irked many programmers, myself included.
Clojure is gaining attention as a functional language with an R-like environment for large-scale machine learning tasks. (Bioinformatics is machine learning applied to biomedicine.) This makes porting Matlab/Octave/R code much easier, or so the thinking goes.
Scala has its backers. Java is nearly invisible in the bioinformatics world, but the JVM is hard to ignore. Scala has excellent support for machine learning but terrible support for "biological and medical applications". Hat tip: you can hire Scala programmers or teach Scala to Java programmers in a short time.
BIG DATA is a bigger problem for us today than previously:
"genome-wide arrays" used to mean all ~25,000 gene transcripts or 500,000 single-nucleotide DNA changes (SNPs).
These revolutionary technologies are already considered "old".
The rise in performance and drop in cost of DNA sequencing are much faster than the commodification of CPUs during the internet dot-com race.
Recall when AMD first had 64-bit support and Intel was still doing *EMULATED* 32-bit. AMD was hands down the best option. Either AMD biz dev was bad, or Intel biz dev was good (illegal?), or both. I suspect it was both. Having one CPU maker is bad for everyone. Long live AMD, I guess.
How would they detect any shared properties?
Perhaps covariance analysis (mutual information, Pearson correlation, SVD, etc.)?
Input: 3 data streams A, B, C, none of which is truly random
Correlate: A&B, A&C, B&C
Subproblems: given A&B predict C
Subproblems: given A&C predict B
Subproblems: given B&C predict A
NSA has mathematicians better than me, for sure. My guess is that if an agency has altered an RNG, then they have done so in a way that is systematic. These kinds of problems are common in the analysis of complex systems -- given three non-random variables with characteristic variance, can you predict the output variable from the covariance? I don't think I could, but it seems feasible to me that the NSA could.
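To make the A/B/C idea concrete, here is a rough toy sketch in Python (stdlib only). Everything in it is my own assumption for illustration: the "systematic" alteration is modeled as a hidden shared Gaussian component added to each stream, and the names `make_streams`, `pearson`, `fit_two_predictors` and `r_squared` are made up. It is not a claim about how any real RNG was or could be weakened. It just shows the pairwise correlation step and the "given A&B predict C" subproblem as an ordinary least-squares fit.

```python
import random


def mean(xs):
    return sum(xs) / len(xs)


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5


def make_streams(n, seed=0):
    """Three toy streams that share a hidden component (assumed model)."""
    rng = random.Random(seed)
    a, b, c = [], [], []
    for _ in range(n):
        shared = rng.gauss(0, 1)            # hypothetical systematic bias
        a.append(shared + rng.gauss(0, 1))  # each stream adds its own noise
        b.append(shared + rng.gauss(0, 1))
        c.append(shared + rng.gauss(0, 1))
    return a, b, c


def fit_two_predictors(x1, x2, y):
    """Least squares for y ~ b0 + b1*x1 + b2*x2 via the 2x2 normal equations."""
    m1, m2, my = mean(x1), mean(x2), mean(y)
    d1 = [v - m1 for v in x1]
    d2 = [v - m2 for v in x2]
    dy = [v - my for v in y]
    s11 = sum(p * p for p in d1)
    s22 = sum(q * q for q in d2)
    s12 = sum(p * q for p, q in zip(d1, d2))
    s1y = sum(p * r for p, r in zip(d1, dy))
    s2y = sum(q * r for q, r in zip(d2, dy))
    det = s11 * s22 - s12 * s12
    b1 = (s1y * s22 - s2y * s12) / det
    b2 = (s2y * s11 - s1y * s12) / det
    b0 = my - b1 * m1 - b2 * m2
    return b0, b1, b2


def r_squared(x1, x2, y, coeffs):
    """Fraction of the variance of y explained by the fitted model."""
    b0, b1, b2 = coeffs
    my = mean(y)
    ss_res = sum((yi - (b0 + b1 * p + b2 * q)) ** 2
                 for p, q, yi in zip(x1, x2, y))
    ss_tot = sum((yi - my) ** 2 for yi in y)
    return 1 - ss_res / ss_tot


if __name__ == "__main__":
    a, b, c = make_streams(5000, seed=1)
    print("corr(A,B):", pearson(a, b))
    print("corr(A,C):", pearson(a, c))
    print("corr(B,C):", pearson(b, c))
    coeffs = fit_two_predictors(a, b, c)
    print("R^2 predicting C from A and B:", r_squared(a, b, c, coeffs))
```

For truly independent streams both the pairwise correlations and the R^2 should hover near zero; anything consistently above that is the kind of shared structure an analyst would look for. A linear fit is only the simplest possible detector, so a null result here proves nothing.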
NSA employs more mathematicians than anyone.
NSA uses Linux for their data farms.
NSA is a code breaking agency.
NSA has worked with many tech companies, from Wintel to Google, Stellar Wind to TIA, etc.
Occam's razor: which is more likely?
A) NSA worked with Intel to provide a known hardware key OR
B) NSA did not work with Intel and chose instead to work with Microsoft, Google, Yahoo, Verizon, ISPs, etc, etc, etc
A is more likely.
Entropy + mutual distrust = security
FASCINATING proposal!
I would have never thought
MSS+KGB+NSA = privacy IFF every agency provides a mutually distrusted hardware key
But is this mathematically true? (honest question)
Shannon entropy also underlies mutual information, which is formally defined in terms of joint entropy, i.e. "detecting when values change together".
If these same agencies were able to detect any shared properties (such as joint entropy), the encryption would be EASIER to break, not harder.
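For anyone who wants to poke at the question, here is a small Python sketch (stdlib only) of the quantities involved. It uses the standard identity I(X;Y) = H(X) + H(Y) - H(X,Y) with plain frequency counts as probability estimates. The helper names and the `noisy_copy` toy model (one key stream leaking into another by copying its bits with occasional flips) are my own assumptions for illustration, not a model of any real hardware key.

```python
import math
import random
from collections import Counter


def entropy(symbols):
    """Empirical Shannon entropy in bits of a sequence of hashable symbols."""
    n = len(symbols)
    counts = Counter(symbols)
    return -sum((k / n) * math.log2(k / n) for k in counts.values())


def joint_entropy(xs, ys):
    """Empirical joint entropy H(X,Y) of two aligned sequences."""
    return entropy(list(zip(xs, ys)))


def mutual_information(xs, ys):
    """I(X;Y) = H(X) + H(Y) - H(X,Y); zero when the streams are independent."""
    return entropy(xs) + entropy(ys) - joint_entropy(xs, ys)


def noisy_copy(bits, flip_prob, rng):
    """Hypothetical leaky stream: copies `bits`, flipping each with flip_prob."""
    return [b ^ 1 if rng.random() < flip_prob else b for b in bits]


if __name__ == "__main__":
    rng = random.Random(0)
    key_a = [rng.getrandbits(1) for _ in range(10000)]
    key_b = [rng.getrandbits(1) for _ in range(10000)]   # independent of key_a
    key_c = noisy_copy(key_a, 0.1, rng)                  # shares structure with key_a
    print("I(A;B):", mutual_information(key_a, key_b))
    print("I(A;C):", mutual_information(key_a, key_c))
```

If two of the supposedly independent keys show mutual information well above zero, an attacker who knows one of them has fewer bits left to guess for the other, which is the "EASIER to break" worry. Per-bit counts like this only catch the crudest dependence, though; structure spread across many bits would need a different test.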
It is true: most of the population went along with the Third Reich. Michael Hayden is in the class of "Doers" who turned an idea into a project plan.
Previous court cases were thrown out because no one could prove they were being spied on. PERFECT! Either:
A) FBI can install equipment by citing authorizations, which gives ISPs and customers grounds to sue... or...
B) FBI cannot install equipment because they have cited no authorization to do so.
So which is it, FBI?
Are you, or are you not installing equipment?
** Fuck sports. GO TEAM EFF !!!! **
PostgreSQL can do the vast majority of what Oracle can do at no cost.
And PG year after year is much, MUCH easier to install, back up, and maintain.
Candidate Obama debates President Obama on Government Surveillance
www.youtube.com/watch?v=7BmdovYztH8&feature=youtu.be
This is more about democratic process than privacy.
Tice (NSA whistleblower to NYT in 2004) claims that the NSA is wiretapping members of Congress, federal judges, FISA judges, appropriations committees, etc.
Let's hope this is all one elaborate lie, or we have a KGB in America.
Not sure what to make of this. Any thoughts other than hopelessness?
As an NLP Bioinformatics guy, I believe the real crime Aaron Swartz committed was being in the news.
He isn't the first to have that dataset and he won't be the last.
We write papers using massive NLP scans of publications rather routinely.
Most of the time, the papers are downloaded from PubMed (publicly funded) so they can't even complain about bandwidth costs, etc.
For anyone who didn't know already, most subscription publishers don't **DO** anything.
They are only slightly better than patent trolls, and in some cases, worse.
After their comeback, they will ___________. Their focus is on ___________.
After their comeback, they will HIRE Tim Berners-Lee. Their focus is on the Semantic Web.
Why would the inventor of the WWW work for Yahoo?
Because you give him ALL the resources to make his dream a coherent reality.
Long shot for sure, but it's what I would do.
"nobody wants hierarchical web directories" Tim Berners-Lee sure does ('inventor' of WWW) -- to the degree that he is spending all his political capital on the Semantic Web.
First interracial kiss. First nerd cult classic en masse. Agreed.
That is because both Abrams and Bay can't seem to write any character
CAN'T or WON'T? The variable is named $box_office and has a default value of $MAX_PROFIT.
This pissed me off so much I just registered FactCheckScience.org. Currently finishing PhD thesis writing (bioinformatics) so I don't have time to invest in a domain. If anyone has a stellar idea, I'm an Open Source supporter and happy to share the domain. This has to stop.
Long-time Slashdot reader. This is the most insightful comment I have ever seen.
The "Too Big To Jail" story is the greatest threat to democracy and world stability. If we come crashing down, it will be because of banks, not terrorists. Honestly, I consider banks to be more hostile to me than terrorists. Truly -- I mean that. Terrorists bomb you when you spend a decade in their country installing puppets. Banks do it to their own neighbors for the thrill of success. That's much more hostile, IMHO.
"Any scientist these days is going to have to be proficient with computers and analyzing data" IAMA PhD bioinformatics person with a CS background and a love for math. Biological problems increasingly require graduate-level math and computer science training, for example gene network analysis, biological structure and binding prediction, and Bayesian analyses, to name a few. While the biology is obviously not simple, it can be more easily learned "on the fly" (though this is still very difficult). Why? Because biology is more QUALITATIVE and computer science/math is more QUANTITATIVE. FWIW, 1 opinion + 1 more = 1.