Interesting. I view this from a completely different perspective: if DNA sequencing really is outpacing Moore's Law, that just means that the results become disposable. You use them for your initial analysis and store whatever summarized results you want from this sequence, then delete the original data.
If you need the raw data again, you can just resequence the sample.
The only problem with this approach, of course, is that samples are consumable; eventually there wouldn't be any more material left to sequence. So this wouldn't be appropriate in every situation.
I assume you're talking about incoming data, not the final DNA sequence. As I understand it, the final result is 2 bits per base pair and about 3 billion base pairs, so about a CD's worth of data per human. And if you were talking about a genetic database, I guess 99%+ is common, so you could just store a "reference human" and diffs against that. So at 750 MB for the first person and 7.5 MB for each additional person, I guess you could store 200,000 to 300,000 full genetic profiles on a 2 TB disk. Probably the whole human race in less than 100 TB.
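For anyone who wants to check the arithmetic in the comment above, here is a rough sketch. It uses only the figures quoted there (2 bits per base, 3 billion base pairs, 99% shared with a reference). The decimal units, the function names, and the 7 billion head count are assumptions made for illustration.

```python
# Back-of-the-envelope genome storage, using the figures quoted above.
BITS_PER_BASE = 2               # A, C, G, T fit in 2 bits
BASE_PAIRS = 3_000_000_000      # roughly 3 billion base pairs per human
DIFF_PERCENT = 1                # assumes 99% is shared with the reference
MB = 10**6                      # decimal units (assumption)
TB = 10**12
PB = 10**15


def genome_bytes() -> int:
    """Size of one fully packed genome in bytes."""
    return BITS_PER_BASE * BASE_PAIRS // 8


def diff_bytes() -> int:
    """Size of one person's diff against the reference genome."""
    return genome_bytes() * DIFF_PERCENT // 100


def profiles_on_disk(disk_bytes: int) -> int:
    """One full reference genome, then as many diffs as fit."""
    return 1 + (disk_bytes - genome_bytes()) // diff_bytes()


def population_bytes(people: int) -> int:
    """One reference plus a naive, uncompressed diff for everyone else."""
    return genome_bytes() + (people - 1) * diff_bytes()


if __name__ == "__main__":
    print(genome_bytes() / MB, "MB per genome")
    print(diff_bytes() / MB, "MB per diff")
    print(profiles_on_disk(2 * TB), "profiles on a 2 TB disk")
    # 7 billion people is an assumed round number, not a figure from the thread.
    print(population_bytes(7_000_000_000) / PB, "PB for 7 billion people")
```

Under these assumptions a 2 TB disk holds about 266,000 profiles, which agrees with the 200,000 to 300,000 estimate. The same per-person diff scaled to 7 billion people comes to roughly 50 PB, so the 100 TB guess would only hold if the diffs compressed far better than 1% each.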
The incoming data is image-based, so yes, it will be huge. Regarding the sequence data: yes, in its most condensed format it could be stored in 750 MB. There are a couple of issues that you're overlooking, however:
1. The reads aren't of uniform quality, and methods of analysis that don't consider the quality score of a read are quickly coming to be viewed as antiquated. So each two-bit "call" also carries a few more bits representing the confidence in that call.
2. This technology is based on redundant reads. In order to get to an acceptable level of quality, you want at least ~20 (+/- 10) reads at each exonic locus.
So that 750 MB you mention for a human genome grows by a factor of 20, then by another factor of 2 or 3, depending on how you store the quality scores.
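To put numbers on that growth, here is a small sketch. The 750 MB baseline, the coverage of 20 (give or take 10), and the quality-score factor of 2 or 3 come from the points above. The function name and the decimal units are assumptions, and it applies the coverage to the whole genome the same way the paragraph above does.

```python
# Rough size of the stored reads once coverage and quality scores are included.
GENOME_BYTES = 750 * 10**6   # fully packed genome, 2 bits per base
GB = 10**9                   # decimal gigabytes (assumption)


def read_data_bytes(coverage: int, quality_factor: int) -> int:
    """Packed genome size times read depth times quality-score overhead."""
    return GENOME_BYTES * coverage * quality_factor


if __name__ == "__main__":
    for coverage in (10, 20, 30):
        for quality_factor in (2, 3):
            size_gb = read_data_bytes(coverage, quality_factor) / GB
            print(f"{coverage}x coverage, quality factor {quality_factor}: {size_gb} GB")
```

At 20x coverage that works out to 30 to 45 GB per genome, and the full range for 10x to 30x runs from 15 GB up to 67.5 GB.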
Your suggestion of deduplicating the experiments could work, but definitely not as well as you think because of all the "noise" that's inherent in the above two steps.
If you really just wanted the unique portions of a sample, you could use a SNP array, which reads the sample only at specific locations that are known to differ between individuals. Even with the advances in the technology, the cost of sequencing a genome still isn't negligible. For most labs, it's still cheaper to store the original data for reanalysis later.
I often stumble across some product on Wikipedia that I'm interested in buying (album, book, etc.). I actually would find it very convenient if such pages had a "Purchase this Item" link.
I'm sure Amazon would kick in a few million for that privilege, or you could use their pre-existing referral program. I think most users would view those links as added value to Wikipedia.
And in light of this, why are we assuming GPS? I can't find GPS satellites through the metal in my car roof, let alone through my entire car. Isn't it more likely that they're just tracking the cellular connections?
A big-ass Oracle or IBM-DB2 can do the job if you pay enough for tuning.
Why is it that, ever since Key-Value DBs came into vogue, relational databases instantly got perceived as so neanderthal?
A normal-ass Oracle database would surely be just fine for storing a no-fly list which, by necessity, has orders of magnitude fewer than 6.whatever billion names; I'm guessing it would do so without much tuning, also.
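As a rough illustration of the point, here is a sketch using SQLite from the Python standard library as a stand-in for a "normal" relational database. The table layout, the column names, the index, and the placeholder rows are all hypothetical; nothing here reflects how an actual no-fly list is stored.

```python
import sqlite3


def build_list(rows):
    """Create an in-memory table with an index on (full_name, birth_date)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE no_fly ("
        " id INTEGER PRIMARY KEY,"
        " full_name TEXT NOT NULL,"
        " birth_date TEXT NOT NULL)"
    )
    conn.execute("CREATE INDEX idx_no_fly_name ON no_fly (full_name, birth_date)")
    conn.executemany(
        "INSERT INTO no_fly (full_name, birth_date) VALUES (?, ?)", rows
    )
    conn.commit()
    return conn


def is_listed(conn, full_name, birth_date) -> bool:
    """Exact-match lookup on the indexed columns."""
    row = conn.execute(
        "SELECT 1 FROM no_fly WHERE full_name = ? AND birth_date = ? LIMIT 1",
        (full_name, birth_date),
    ).fetchone()
    return row is not None


# Synthetic placeholder entries, not real data.
conn = build_list((f"PERSON {i}", "1970-01-01") for i in range(100_000))

if __name__ == "__main__":
    print(is_listed(conn, "PERSON 4242", "1970-01-01"))
    print(is_listed(conn, "NOBODY", "1970-01-01"))
```

A real deployment would need fuzzy name matching and a lot more besides, but the basic lookup is an ordinary indexed query.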
I'm not familiar with the intricacies of the Torrent protocols, but it seems like this group would need to do one of two things:
A.) Connect to a swarm as a "spectator," not uploading or downloading any data.
B.) Connect to the swarm and actively upload/download.
If A., it seems like it would be hard to prove that any IP logged as participating in the swarm is actively engaged in any malicious behavior. If B., aren't they (the group) guilty of the same crimes of which they're accusing these other people?
I guess I just don't see how they could assure the courts of a crime being committed without having to participate in the exact same action in order to prove it.
I guess most people aren't worried about incontinence AND sperm production, but I would not want a wireless transmitter hosted that close to my reproductive factories...
I think you're committing the same sin of which you're accusing the author, just on the opposite side of the pendulum.
Saying that all RDBMSs won't "cut it" for modern applications in any domain is pretty narrow-minded. It seems like a simple rule of thumb to me: put relational data in an RDBMS, put key-value data in a Key-Value DB...
As an aside, having worked on the Information Retrieval side of bioinformatics for the past few years, I've found that the complex side of bioinformatics is generally in the computation, not the retrieval. I've been well served by a single RDBMS server up to this point, though I have played around with MemCached for a couple of web apps.
Regarding the visibility of typing patterns, any in-browser chat/forum will likely use JavaScript on the client side. JS provides access to the keydown and keyup events in text boxes; adding time stamps to those events is trivial.
And I agree that being a 40-year-old man doesn't make you a pedophile, but I think, probabilistically, that being a 40-year-old man in a "Teenz Only!!1!" chatroom may rightfully raise a flag or two.
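Once the key events carry time stamps, turning them into a typing pattern is not much work either. Below is a minimal sketch of that step, assuming the page has already sent a list of timestamped key events to the server. The KeyEvent type, its field names, and the two measurements shown (hold time per key, gap between key presses) are illustrative assumptions, not a description of any particular product.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class KeyEvent:
    key: str      # which key, e.g. "h"
    kind: str     # "down" or "up"
    t_ms: float   # time stamp in milliseconds, attached on the client


def dwell_times(events):
    """How long each key was held: key-up time minus the matching key-down time."""
    pending = {}
    held = []
    for event in sorted(events, key=lambda e: e.t_ms):
        if event.kind == "down":
            # Keep the first key-down only, so auto-repeat events are ignored.
            pending.setdefault(event.key, event.t_ms)
        elif event.kind == "up" and event.key in pending:
            held.append((event.key, event.t_ms - pending.pop(event.key)))
    return held


def flight_times(events):
    """Gaps between consecutive key-down events."""
    downs = sorted(e.t_ms for e in events if e.kind == "down")
    return [later - earlier for earlier, later in zip(downs, downs[1:])]


# Made-up sample of someone typing "hi".
sample = [
    KeyEvent("h", "down", 0),
    KeyEvent("h", "up", 80),
    KeyEvent("i", "down", 150),
    KeyEvent("i", "up", 210),
]

if __name__ == "__main__":
    print(dwell_times(sample))
    print(flight_times(sample))
```

Those two lists are the kind of raw material a typing-pattern classifier could be fed; what it does with them is a separate question.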
I've been waiting for years for someone to get Compositing working on >2 monitors. Unfortunately, all of the solutions mentioned involving Compiz are off-limits to anyone with more than two monitors.
But fishing line is a big problem in protecting marine animals; it seems like intentionally stranding hundreds of yards of the stuff might have some negative impacts on the surrounding aquatic life.
Ironically, they also sell a safe into which you can put your super-durable DVDs. They list as one of the justifications:
"Fire. The DVD can withstand temperatures as high as 300 degrees. Unless you can make sure that your house doesn't burn down at more than 300 degrees, you need the DVD Vault."
Agreed. I was shocked they were going to sequence that many genomes.
The article is also tagged "gene expression." This research has nothing to do with sequencing or gene expression analysis; just analyzing one-nucleotide mutations in the genome.
"Every man speaks of public opinion and means by public opinion, public opinion minus his opinion. Every man makes his contribution negative under the erroneous impression that the next man's contribution is positive. Every man surrenders his fancy to a general tone which is itself a surrender."
G.K. Chesterton.
Actually, that's how this keyboard works. They have the rubber dome, just like MOST other keyboards, with a carbon contact on the bottom. This allows varying degrees of current to flow through when the key is entirely depressed (but none before the dome has buckled).
As more of the surface of the carbon plate comes in contact with the underlying sensors, the pressure value goes up.
This thing looks like "futuristic" technology from a 1980s movie: picture.
(And the FCC ID is the same as the one in a mobile credit card terminal)...
I guess it's comforting to see that, in this instance, the government isn't decades ahead of the rest of us...
Sense vehicular motion (including vibration) and shut down the texting function while in motion.
Passengers in cars (and boats, and trains) may object to that one...
How exactly can the PR and marketing department assist a mile underwater?
Use their bodies to plug up the well?
Honestly it's the best use for marketing and PR people....
Only on /. would this be rated as "Insightful" instead of "Funny"...
I wonder what it would mean to the RIAA (or any IP-based litigation) to have multiple ISP customers consistently NAT'ted to the same IP.
... Maybe this won't be so bad after all!
I bet you could help those people quite a bit by selling the computer you're using and donating the proceeds.
We might appreciate it, too...
Agreed. I thought the black hole would envelop Earth much more quickly than this.
"Seemingly from the dawn of man all nations have had governments; and all nations have been ashamed of them."
- G.K. Chesterton
Even in JavaScript you have the opportunity to embed timing data from the key press events into the typing data.
'Gives a whole new meaning to "Freudian slip."
SRWare Iron is a solution to your Chrome privacy concerns - http://www.srware.net/en/software_srware_iron.php
It's a build of Chrome without all the privacy-infringing "features."
Their site states "300 degrees" (F?).