Pinnacle Project is sort of a prototype for an idea some of the upper level informatics people at the Broad have: Google for data. In other words, start at any particular data you have, and link institute-wide to other projects that have 'related' data.
Oh, and I've used the KS stat in a few of my papers (and we're using it and some other similar stats for Pinnacle). It's useful in that it tells you there's a difference between two distributions, but if you have differences at both tails, you don't get to see that (well, you see the effect of the larger one). And it doesn't really help you find out or rank how the individual elements are acting. I used it for my CNC paper, and while it was useful and cool to be able to say that conserved non-coding regions are under selection as a whole class, it would be far better to be able to point to *particular* CNCs that were contributing to the signal, and not just point to the class of features. My guess is that most of the class was just noise, and you didn't get to see the particularly interesting ones via KS.
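To make the "one number for the whole class" point concrete, here's a minimal sketch of the two-sample KS statistic in plain Java. The class name, method name and the sample values are all made up for illustration, and this is not the code from the paper. It just shows that the test boils two distributions down to a single D (the largest gap between the two empirical CDFs), with nothing in the result that says which individual elements drove it.

```java
import java.util.Arrays;

final class KsSketch {
    /** Two-sample KS statistic: the largest gap between the two empirical CDFs. */
    static double ksStatistic(double[] a, double[] b) {
        double[] x = a.clone();
        double[] y = b.clone();
        Arrays.sort(x);
        Arrays.sort(y);
        int i = 0, j = 0;
        double d = 0.0;
        while (i < x.length && j < y.length) {
            // Step both CDFs past the next smallest value (handles ties too).
            double v = Math.min(x[i], y[j]);
            while (i < x.length && x[i] <= v) i++;
            while (j < y.length && y[j] <= v) j++;
            double gap = Math.abs((double) i / x.length - (double) j / y.length);
            d = Math.max(d, gap);
        }
        return d;
    }

    public static void main(String[] args) {
        // Hypothetical values standing in for "in a CNC" vs "not in a CNC".
        double[] inCnc = {0.05, 0.10, 0.12, 0.20};
        double[] outside = {0.15, 0.30, 0.35, 0.40};
        System.out.println("D = " + ksStatistic(inCnc, outside));
    }
}
```

Note that D only records the single largest gap, so a second, smaller difference at the other tail never shows up in the result.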
I work at Children's Hospital and the Broad Institute. I'm currently working on a few projects. My pet project of the moment is the "Pinnacle Project". If you search on that and "Broad", you'll come up with a bio, but it's the PI's, not mine. I've done all the db architecture work, as well as some really nifty software for the system.
Basically, the project's job is to take lots of different kinds of data (SNP association, expression, model organism, small molecule, etc) and map lots of different data sets to each other. So, you could take some expression profile for diabetics, and "map" the data onto SNP association data for diabetes. You could then determine if the two data sets point towards the same sort of story - do certain genes have strong results in both data sets?
All of that happens 'automagically' at this point, and there's a really handy database to support lots of different kinds of experimental data, as well as a decision support system to figure out how all this data should come together.
In the past, I've worked on Conserved Non-Coding regions in the human genome, using the HapMap data to look for skews in allele frequency indicating signatures of selection in the genome (partitioning the scans by whether or not the SNP is in a CNC). That one got me a (shared) first-author Nature Genetics paper. :)
So much for my anonymity. If my boss is reading, I guess you realize I'm contemplating quitting...:)
Please, for the love of god, stop spamming the MBTA. Have you seen the Red Line lately?
I and a bunch of other people who do 'real computer science' as you call it (bioinformatics analysis) were quite amused by the "we'll overwork you like a startup, but we'll make sure our CEO gets rich" ad campaign.
I figured based on the amount of spam you have, you must be really desperate to hire people. The only ad spam I've seen that's worse on the train has been alcohol advertisements (beer in particular), and everyone knows you really have to shovel the ads to sell Coors Light.
They are the people who, if they weren't doing it as a job, would be writing code in the evenings because they love it.
Every time I hear this, I almost feel a little sad. If I didn't have a job writing code, I don't know that I'd write code at home. Frankly, the problems and the meaning behind the code I write for work will get me to work past the 'expected' hours often - I'm writing code to analyze biological data and change how people treat common human disease. I don't think I'm going to have a 'pet project' at home that matches that. At work, I have lots of resources (large data sets, big machines, etc) to work on. At home, what could I do? Put up a dinky website? Maybe make an app to solve some problem everyone else already has?
Maybe I'd get involved in an open source project, but my gut feeling is that if I'm unemployed, I'm going to NOT code for a bit, take a break, get refreshed, then get another job. I like coding, but there's just so much more to life than that. (see: sexy girlfriend, mountain biking, cooking, music, reading, games, friends, etc)
Does this make me not a superstar? Maybe. On the other hand, I've been talking to some people lately about my current job and how well it's (not) going. I'm getting job offers left and right from lots of different people for very nice positions in both academia and industry. So, I must not suck - but I don't buy this notion that someone who's a good problem solver would always be solving problems, and in one particular area (coding, when there are a lot of other areas you can apply your same understanding of logic, math, statistics, process, etc to).
More like, it's an issue that has been CREATED so that politicians don't have to face the tough questions of our day, like what the heck we're still doing in Iraq. If you can't tell that this (along with immigration) is an issue that was cooked up as a distraction, then it's working on you.
I study diabetes and obesity for a living as a bioinformatician at a genetics lab. We do use BMI. However, we also use things like waist-hip ratio, which can help with the issue you quote. Unfortunately, your issue is rare, as there aren't a lot of people who have high BMIs and are also healthy.
My cash is on CodonDevices. George Church has an incredible new sequencing technology, and he's making it open source. I know some of the peeps there writing software, and between the tech and the IT team, they'll be able to generate and handle the data - the big sequencing companies ought to be scared...
Or, as I've mentioned in other posts, don't use SQL.
Use an ORM solution instead. I haven't written SQL in over a year.
One of my current projects is a generic way to query fairly complex type/value data out of a set of tables without any database knowledge by the user.
Imagine you have a bunch of people with traits, like eye color, weight, etc.
You want to say: Show me all people that have blue eyes and weigh more than 50 lb, and do not smoke (or put an "OR" query in there instead).
That sort of query actually maps onto multiple tables in the database (I think ~5 or so).
I've cranked out a nice solution to that where all you have to do is configure where your tables exist, and where your operators/operands are mapped. The framework then generates objects (in this case, Criteria objects from Hibernate) that create the query for you.
And, since the query generated is 'rooted' in the data type you want to retrieve (like Individual), you can run the framework to do this filtering, then add your own restrictions or joins to the query. You can even associate this result with other entities that you didn't configure for, just by configuring how you'd join the two objects: you simply reference the property of the object that defines the join.
Again, there's no SQL here. It's an OO query language where you pass in objects to represent joins and restrictions.
Java Persistence has some similar functionality, but Hibernate has a LOT of power, once you get over the learning curve of how to use it.
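To give a feel for "pass in objects to represent restrictions" without dragging in Hibernate, here's a rough plain-Java analogy. It is NOT the Hibernate Criteria API and not my framework: it filters an in-memory list with java.util.function.Predicate, and the Individual class, its fields and the helper names are all hypothetical. The point is only that each restriction is an object, and the blue eyes / over 50 lb / non-smoker query is built by combining them.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;

final class TraitFilterSketch {
    // Hypothetical entity; field names are illustrative only.
    static final class Individual {
        final String name;
        final String eyeColor;
        final double weightLb;
        final boolean smoker;

        Individual(String name, String eyeColor, double weightLb, boolean smoker) {
            this.name = name;
            this.eyeColor = eyeColor;
            this.weightLb = weightLb;
            this.smoker = smoker;
        }
    }

    // Each helper returns one restriction object.
    static Predicate<Individual> eyeColor(String color) {
        return p -> p.eyeColor.equals(color);
    }

    static Predicate<Individual> heavierThan(double lb) {
        return p -> p.weightLb > lb;
    }

    static Predicate<Individual> smokes() {
        return p -> p.smoker;
    }

    // Apply a combined restriction and return the matching names.
    static List<String> names(List<Individual> people, Predicate<Individual> filter) {
        List<String> out = new ArrayList<>();
        for (Individual p : people) {
            if (filter.test(p)) {
                out.add(p.name);
            }
        }
        return out;
    }

    static List<Individual> samplePeople() {
        return Arrays.asList(
                new Individual("Ann", "blue", 120, false),
                new Individual("Bob", "blue", 40, false),
                new Individual("Cat", "brown", 150, false),
                new Individual("Dan", "blue", 180, true));
    }

    public static void main(String[] args) {
        // Blue eyes AND more than 50 lb AND does not smoke.
        Predicate<Individual> filter =
                eyeColor("blue").and(heavierThan(50)).and(smokes().negate());
        System.out.println(names(samplePeople(), filter));
    }
}
```

Swapping an and(...) for an or(...) gives you the "OR" version of the query. In the real thing the restriction objects get turned into a database query instead of being run over a list.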
You should have seen what PM was promising for Dungeon Keeper 1 a year before release. It was going to be like a raid-style MMORPG. You'd get a group of 6 people as the 'heroes', and one dungeon keeper. The 6 players have to defeat the one player.
Not exactly what came out of Dungeon Keeper, was it?
Compared to the hype, that game was a piece of shit too.
I'd have to fall back to Magic Carpet and Syndicate (and maybe Hi-Octane) to give PM's crew much credit, and he probably just rode someone else's coat-tails there.
Can you please send me a one time pad, and guarantee that it isn't intercepted?
AFAICT, that's what quantum encryption is all about - getting that one time pad over the wire, and knowing if someone has attempted a man in the middle attack.
How about this: define an interface that all of your processes are going to implement.
Create a map of your names to concrete classes. As each name comes in, check the map and run the concrete class on the data. If the name doesn't exist, then throw an exception saying that the process hasn't been implemented yet.
You can 'inject' that map very easily, and you can configure it either using Spring Framework, or by setting it up in some properties file (or XML).
I do this stuff all the time for scientific workflow pipelines.
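Something like the following is what I have in mind, as a bare-bones sketch. The Step interface, the ProcessRegistry class and the "upper"/"reverse" names are invented for the example. The map is passed in through the constructor, which is the part you'd normally hand off to Spring or build from a properties file.

```java
import java.util.HashMap;
import java.util.Map;

final class ProcessRegistry {
    /** The one interface every process implements (hypothetical name). */
    interface Step {
        String run(String data);
    }

    private final Map<String, Step> stepsByName;

    // The map is injected; a container or a config loader could build it.
    ProcessRegistry(Map<String, Step> stepsByName) {
        this.stepsByName = new HashMap<>(stepsByName);
    }

    /** Look the name up and run the matching implementation on the data. */
    String run(String name, String data) {
        Step step = stepsByName.get(name);
        if (step == null) {
            throw new UnsupportedOperationException(
                    "Process not implemented yet: " + name);
        }
        return step.run(data);
    }

    public static void main(String[] args) {
        Map<String, Step> steps = new HashMap<>();
        steps.put("upper", String::toUpperCase);
        steps.put("reverse", s -> new StringBuilder(s).reverse().toString());

        ProcessRegistry registry = new ProcessRegistry(steps);
        System.out.println(registry.run("upper", "acgt"));
        System.out.println(registry.run("reverse", "acgt"));
    }
}
```

Adding a new process is then just one more implementation plus one more map entry, with no if/else chain to touch.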
As an ex-EJB dev (oh, 1.0 was just the right time to adopt...ugh!), I'm kinda hopeful about 3.0. It has adopted the Spring Framework aspects I like the most (injection-style programming), I can use Hibernate for my ORM (which is stronger than just Java Persistence), and I get JMX/JNDI/JTA/etc services pretty easily.
As it stands, I've read the books, and the next project I start (in a few weeks) will be EJB 3.0, just to give a quick back-to-front application a shot.
I hear where you're coming from, though, as I've also just finished reading a lot of "Faster, Lighter Java" style books where all EJB services are replaced with lightweight components (like Spring).
Consider: As I understand it, the wiring in the Golden Gate Bridge, if laid end-to-end, would stretch around the globe three times over. Considering the circumference of the earth is something like 40,000km, that would mean that we've already built bridge structures that incorporate over 100,000km of cabling. Granted, the design of the space elevator is completely novel; but this stuff is based on modern engineering understanding.
I'd guess that 95+% of game users get to play games without major show stopping bugs on average for a game. Actually, I'd guess that the number is higher than that.
Also, if you're doing a 'home brew' of your system, then it's kinda your responsibility to test and make sure your system works, not the game manufacturer's. I know it can be complicated (tech changes all the time, and if you don't keep up, then homebrew can be difficult). Still, for around $600, you can buy a system that 'just works', and does more than play games.
Given all that, I'm still heavily considering a Wii, so don't think I'm anti-console. I'm just anti "I tried it 9 years ago, and it didn't work, so things must be THE EXACT SAME now".:)
I had an unmodified GT7900 from eVGA. It froze up when I ran 3DMark 2006 on it (the deep freeze scene in particular hard locked the system). The new card has different memory, and a better cooling system. I'm not OC'ing the card, as I like quiet performance over speed.
Needless to say, if you read the eVGA forums (or other forums for PC builders) this was a huge problem for people running at stock speeds.
I'd mod you up, but I spent too much time telling other people to get a clue.
PS: try Spring Framework or Hibernate, if you use Java and don't already...
With many Java APIs, you can name parameters (Hibernate, say; a plain JDBC PreparedStatement only takes positional ? markers):
"Select * from customer where customer.name = :name";
setParameter("name", "Bob");
Now, is it still so hard?
First: you want an ORM layer, not SQL.
Second, when did parameterized calls = stored procs? I have plenty of the first, and none of the second in my current project.
Also, if you want separation of concerns (business logic != data retrieval), then what's wrong with coding up your parameterized queries appropriately? Just check for nulls on the fields you put in, and don't join on those entities. It ain't tough.
Or, use an ORM (Hibernate, others), and you don't have to worry about that at all...
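For the "check for nulls" route, a minimal sketch might look like this. The customer table and its name column come from the example query elsewhere in this thread, but the city column and the class itself are made up. It only builds the SQL text and the ordered parameter list, and never concatenates a user value into the SQL.

```java
import java.util.ArrayList;
import java.util.List;

final class CustomerQuerySketch {
    final String sql;
    final List<Object> params;

    private CustomerQuerySketch(String sql, List<Object> params) {
        this.sql = sql;
        this.params = params;
    }

    /** Adds a condition (and a bound parameter) only for the non-null fields. */
    static CustomerQuerySketch build(String name, String city) {
        StringBuilder sql = new StringBuilder("select * from customer where 1 = 1");
        List<Object> params = new ArrayList<>();
        if (name != null) {
            sql.append(" and name = ?");
            params.add(name);
        }
        if (city != null) { // 'city' is a hypothetical column
            sql.append(" and city = ?");
            params.add(city);
        }
        return new CustomerQuerySketch(sql.toString(), params);
    }

    public static void main(String[] args) {
        CustomerQuerySketch q = build("Bob", null);
        System.out.println(q.sql);
        System.out.println(q.params);
        // With a real connection you would bind these in order, e.g.
        // preparedStatement.setObject(i + 1, q.params.get(i)).
    }
}
```

The same null check would decide whether to add a join at all, which is the bit an ORM's criteria objects handle for you.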
What confuses me is that the new MS player isn't compatible with PlaysForSure?
If so, that's really going to undermine the "PlaysForSure" brand...
On September 11th, around 11AM, there was a giant stack of rubble that incorporated all the building materials of two large buildings. That's the same as having the buildings, right?
Oh wait, it's not.
Some problems are more difficult to solve than others.
Can we have a competition for inane comments?
Given that the GP can't spell 'their' correctly, I doubt it.
And, as far as I know, 'chemicals' aren't heritable. The genetics behind them, you could make an argument for.
You (or Dawkins) forget that proximity of one gene to another gene has a huge effect. If a gene X is beneficial, then lots of genetic material around that gene is carried along with it (see the term 'selective sweep'). So, if you have a nearby gene that causes some level of problems, but not enough to outweigh the 'good' gene, it can be preserved. (I have no idea exactly how this works out. We mostly have the same genes, but there are mutations in the genes that change the proteins that are produced, or the levels of expression of the gene, or the speed of degradation of the protein, etc.)
The problem with 'old' theories is that we're learning an incredible amount of information about how the genome works - I'd say that it's an almost exponential learning process right now, given how fast data generation has increased in speed in the last 10 years, and the amount of raw data available to mine.
Actually, as an aside, that's one of the coolest things about bioinformatics now: I can download lots of datasets from different groups, and think of new ways to join them together and analyse them, and find completely new concepts/features about the genome. Now, think about how bioinformatics didn't exist 10 years ago at the university level, and how there are tons of students earning PhDs in informatics now - and there's just an explosion of research.
I'm officially a 'brie eating Cambridge, MA liberal'. Seriously, I live in Cambridge, MA (and I prefer St. Andre to regular brie. At least make it a triple cream, ok?)
Yet, when I listen to Air America (or a right wing talk show), I hear the same basic tactics. There's 14 kinds of logical fallacies, and neither side takes a reasonable approach. I've heard people on Air America (Janeane Garofalo, shut the Fuck up!) who are just embarrassing.
AA is fighting the right wing talk radio attitude with the same type of bullshit as the right wing, but they don't do it as well. They come off as just another group of idiots saying the exact opposite of the right wing, but don't sound a heck of a lot more reasonable.
I'd love for them to use dispassionate arguments, cite facts and statistics, and speak to the common man. Instead, they preach only to the converted.
I'm a dem, but they just depress me.
I don't know why I wanted to double check this. I must be bored/obsessed with numbers?
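1,000,000 K / minute = 1,000,000 / 60 seconds = 16666 KByte/second
4.5 Mbps * 1024 (K/M) = 4608 K/second
If that 4.5 is megabytes, the gap is roughly 4 fold, and I'd have said "Um...compression, if it's text?" If it's megabits, which is what Mbps normally means, that's only 576 KByte/second and the gap is closer to 29 fold.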
Unless I did my math wrong....:)
You said "I'd still be constantly saving in Monkey Island 3"
Monkey Island 3 is 1997. http://en.wikipedia.org/wiki/The_Curse_of_Monkey_Island
:)
1997 called. It wants its 9 year old video games back.
Times have changed since 1997, both for PCs and consoles.