Celera Genomics, using modern technology (and a lot of financial backing, and the fact they are a subsidiary of the people who make sequencing machines) completed the genome in a matter of months.
Just to correct a few inaccuracies. Firstly, it hasn't been completed yet. Celera Genomics decided to claim they'd finished it, which spurred the public project to claim the same. In reality both had only 90-95% of the consensus sequence, and even then only at "draft" quality.
Secondly, I feel it's wrong to claim that Celera were the people to "complete" it. Celera used their own data in conjunction with the public data, and yet their results are (more or less) comparable to the public project's in terms of coverage, number of contigs, quality and so on. Personally I feel that this is like admitting that their own work doesn't add anything new to the public effort - i.e. they failed.
The bottleneck with sequencing at the moment is in the "finishing" process - tidying up the results to produce highly accurate answers. This is largely due to the randomness of the shotgun sequencing approach. As it's effectively solving a jigsaw puzzle from lots of randomly cut pieces of DNA, in some places you'll get lots of pieces stacking up and in others you'll find none. It is unrealistic (not to mention expensive) to keep increasing the coverage until everywhere gets covered by the random shotgun process.
Instead a fixed depth (only 3- or 4-fold in the draft sequences) is used, followed by directed sequencing where the user, or an automatic program, analyses the data set and chooses experiments to perform (typically primer walking).
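To put numbers on the jigsaw problem, here's a quick Python simulation of random shotgun coverage. It's only a sketch under simplifying assumptions of my own (uniformly random read positions, fixed-length reads, no repeats or sequencing errors, and a made-up 1 Mb target). Under those assumptions the covered fraction should come out close to the standard Poisson (Lander-Waterman) estimate of 1 - e^-c for c-fold coverage, i.e. roughly 95% at 3-fold and 98% at 4-fold, so random data alone always leaves gaps to finish.

```python
import math
import random


def covered_fraction(genome_len: int, read_len: int, coverage: float, seed: int = 1) -> float:
    """Fraction of bases hit by at least one randomly placed read.

    Assumes uniform read start positions and a fixed read length; the sizes
    used below are illustrative, not taken from any real sequencing project.
    """
    rng = random.Random(seed)
    n_reads = int(coverage * genome_len / read_len)
    # Difference array: +1 where a read starts, -1 just past where it ends.
    diff = [0] * (genome_len + 1)
    for _ in range(n_reads):
        start = rng.randrange(0, genome_len - read_len + 1)
        diff[start] += 1
        diff[start + read_len] -= 1
    # Running sum gives the read depth at each base; count bases with depth > 0.
    depth = 0
    covered = 0
    for pos in range(genome_len):
        depth += diff[pos]
        if depth > 0:
            covered += 1
    return covered / genome_len


if __name__ == "__main__":
    for c in (1, 2, 3, 4, 8):
        sim = covered_fraction(genome_len=1_000_000, read_len=500, coverage=c)
        print(f"{c}-fold: simulated {sim:.3f}, 1 - e^-c = {1 - math.exp(-c):.3f}")
```

Going from 4- to 8-fold roughly doubles the sequencing effort but only recovers the last couple of percent, which is why brute-force depth is such an expensive way to finish.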
The graphs I've seen of draft vs finished data show quite well how the finishing is lagging behind.
This new strategy, however, not being random, will greatly reduce the amount of finishing needed. Note, though, that at present they are using it for probes rather than full-scale sequencing. It has great potential, but looks to be years away from replacing the current work.
I was watching a rather jerky webcast with RealPlayer. It got to the point of total eclipse and I was rather impressed that it managed to adjust the gain correctly so that you could see the corona. Not quite the same as the real thing (which I've experienced only once).
Anyway, one minute into this, RealPlayer crashed in a heap and then I couldn't reconnect.
Ok, you have some very good points. My SGI admin knowledge _IS_ rather old. However, I should point out that I did try to put Irix 6.5 on a (rather old) system. I tried an upgrade instead of a full install, which was clearly a bad move. Firstly I was told that inst was too old and it offered to upgrade that - fair enough, I have no complaints. However, the upgrade of inst then bombed out because it claimed inst was already running. Err, well of course it is. The result? A system without inst, that now needs a total reinstall (or at least a pilfering of the broken components from another system).
I didn't see Gnome, CDE or NFS bundled either, but I get my SGI stuff from the local varsity plan, which seems rather poorly organised (with the CDs turning up addressed to any one of three people - I suspect some things have got lost; no fault of SGI's directly though). I've found obtaining licences for varsity stuff a nightmare, and their tech support weren't helpful either. Again this is local trouble - we have varsity, but no varsity number (and SGI cannot provide it to us, despite the fact that we're clearly on their books as they send us stuff).
So, that said, indeed my knowledge is from older systems. I'll reinstall our SGI systems once we need to make a new release of our software, and naturally I'll use 6.5. (Until recently this wasn't an option, though, as we need to maintain compatibility with our users. They should all be on 6.5 by now.)
It's good to see that they have addressed most of my complaints, so maybe they will survive after all. Certainly my experience with 6.2 and 6.3 indicated that they didn't even deserve to survive.
Anyway, some specific points I want to reply to:
Security - SGI _has_ had security flaws this year, but Bugtraq seems to discuss mainly Solaris and Linux (as they have many more users than SGI). Lots of the bugs discussed also affect IRIX, though, e.g. the BIND vulnerabilities, and the (I think) IRIX-specific rpc.espd. Admittedly it isn't as many as the recent spate of Solaris holes.
Admin - I know how to admin systems and I greatly prefer modifying files to using system-specific GUIs. What I'm really complaining about is the policy of adopting a GUI that any user can run, rather than using existing UNIX concepts such as "su". It just stinks of bad design and no defensive programming (which is important if you take security seriously).
Installation - rpm also tells you the dependencies, but I was really thinking about Debian's apt. Anyway, I often seem to get into cases of impossible conflicts which cannot be resolved. Maybe this is just due to upgrading systems rather than reinstalling from scratch.
The bottom line? You've changed my views slightly - I'm pleased that things are changing for the better, but I suspect it's FAR too late. We provide precompiled binaries for our software on Suns, Alphas, x86 Linux and SGIs. Over the last few years the call for SGI binaries has dropped considerably.
It seems to me (and, having read the other comments, to many others too) that SGI are really struggling. I take this announcement simply as another attempt to stay in the market.
Contrary to popular belief they haven't ditched Irix. They plan to keep going with it, but to use Linux to increase their market share. I doubt it'll work. Linux users are likely to buy a cheaper PC anyway. They need to concentrate on making Irix actually work. I've administered Irix systems and I know just how miserable it is! I strongly disagree with the "stable" statement that someone else brought up. Stable relative to Windoze maybe, but not when compared to other Unix-based OSes.
Security - laughable. SGI's notion of security is to make all sysadmin tools graphical, make them setuid root, and then ask for a password. No concept of keeping the security-critical code in a nice, small, compact "su" program. Result - virtually EVERY SGI admin tool has been hacked, often by many means. SGI also used to ship systems with "+ +" in hosts.equiv.
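If you inherit one of those machines, checking for that wildcard is easy to script. A minimal Python sketch, assuming the traditional hosts.equiv layout of "hostname [username]" per line, where a bare "+" in the host field trusts every host (so "+ +" trusts every user on every host). It's a quick audit aid only: skipping '#' lines is my own assumption, and netgroup entries such as "+@group" are deliberately ignored.

```python
from typing import List


def wildcard_trust_entries(text: str) -> List[str]:
    """Return hosts.equiv lines whose host field is the bare "+" wildcard.

    Assumes the traditional "host [user]" format. Blank lines and lines
    starting with '#' are skipped. "+ +" (any host, any user) and a lone "+"
    (any host) are flagged; netgroups such as "+@trusted" are not.
    """
    flagged = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split()
        if fields[0] == "+":
            flagged.append(line)
    return flagged


if __name__ == "__main__":
    sample = "# example only\ntrusted.example.org\n+ +\n"
    print(wildcard_trust_entries(sample))  # ['+ +']
```

On a real system you'd pass in the file contents, e.g. `wildcard_trust_entries(open("/etc/hosts.equiv").read())`.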
Ease of use - again laughable. It's getting better slowly, but for a long time you couldn't administer an SGI remotely (except by knowing what goes on underneath) unless you were also sitting at another SGI machine. Their desktop is hideous too.
Ease of installation is hideous too. There are umpteen dependencies to (manually) resolve for doing the most trivial of things. Nothing seems to come by default (including NFS), and, for example, which compiler do I want - is it the "Ansi C compiler", the "C compiler (ANSI)" or the "C compiler"? (Ok, so that's paraphrased, but you get the picture.)
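Working out that order by hand is exactly what a tool should do for you. As a rough Python sketch (Python 3.9+ for the standard graphlib module; the subsystem names are hypothetical and this has nothing to do with how inst actually works), a topological sort gives an install order in which everything follows its prerequisites, and reports a cycle when no such order exists.

```python
from graphlib import CycleError, TopologicalSorter
from typing import Dict, List, Set


def install_order(deps: Dict[str, Set[str]]) -> List[str]:
    """Return an order in which every package comes after its dependencies.

    `deps` maps a package name to the set of packages it needs installed first.
    Raises graphlib.CycleError if the dependencies are circular.
    """
    return list(TopologicalSorter(deps).static_order())


if __name__ == "__main__":
    # Hypothetical subsystem names, purely for illustration.
    deps = {
        "c_compiler": {"c_headers", "linker"},
        "c_headers": set(),
        "linker": set(),
        "nfs_client": {"network_base"},
        "network_base": set(),
    }
    print(install_order(deps))
    try:
        install_order({"a": {"b"}, "b": {"a"}})
    except CycleError as err:
        # The second element of args holds the nodes forming the cycle.
        print("impossible to satisfy:", err.args[1])
```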
Compatibility - ugly. We tried connecting several SCSI CD-ROM drives to our SGI and all failed. We couldn't load the installation CDs remotely from another system as they use an SGI-specific format (non-ISO-9660). They also refused to provide CDE as an optional desktop. Ok, so CDE is hideous, but it's almost as if they _want_ to be out on a limb!
Maintainability - improving slowly. In the past we've had hideous problems with supporting software on multiple OS releases. They're not even consecutive, with Irix 6.3 and 6.4 both being splits from 6.2, and only merged back in again at 6.5.
Support - patchy. Sometimes it's good, but other times it is downright hideous. We found a large bug in their Fortran compiler. We provided them with a 10-line source example, but they refused to fix the compiler (or even acknowledge the bug). One year later (give or take) I mention this to a large pharma company, who use many, many SGIs and wanted our software to run (which it didn't - due to the bug). The very next day SGI release a patch. Right - so I don't count because there's only one of me, despite paying for support?
Ahh, I feel better for that whinge!
Anyway, as far as I'm concerned the sooner SGI curl up and die the better. It'll certainly make my life easier!
So now that it's actually for sale, I'm assuming that the Intel NDAs people had to sign still hold for those who signed them, but new customers don't have to sign such things.
We used an i860-based system here for a while. It was an Alliant FX2800 with (IIRC) 26 i860s for processing power and a further 2 dedicated purely to I/O. It was pretty efficient and could handle multi-processor vector optimisations pretty well.
All well and good, except that the OS (Concentrix) was hideously buggy and prone to crashes. In the end it was junked in favour of a 4-processor Alpha box which was faster, had more memory, and cost the same amount as the yearly hardware support for the FX2800 :)
I guess the comment was referring to quantum computing as a means of cracking RSA by massively parallel computation.
However, your comment is amusing - would the US government want to do its own research on quantum computing if it knew that the same technology used to break existing codes could be used to make new ones impossible to break?
Perhaps they'd simply be better off sticking to what we know now - bigger and more computers are better :)
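For anyone wondering why factoring is the whole game: the RSA private exponent falls straight out of the two prime factors of the public modulus. A toy Python sketch using the usual textbook numbers (p = 61, q = 53, e = 17); trial division here merely stands in for whatever factoring method, classical or quantum, an attacker has, and real moduli are far too large for it.

```python
from typing import Tuple


def trial_factor(n: int) -> Tuple[int, int]:
    """Return (p, q) with p * q == n, by trial division (toy sizes only)."""
    f = 2
    while f * f <= n:
        if n % f == 0:
            return f, n // f
        f += 1
    raise ValueError("n is prime")


def recover_private_exponent(n: int, e: int) -> int:
    """Derive the RSA private exponent d from the public key (n, e) by factoring n."""
    p, q = trial_factor(n)
    phi = (p - 1) * (q - 1)
    return pow(e, -1, phi)  # modular inverse, Python 3.8+


if __name__ == "__main__":
    n, e = 61 * 53, 17  # public key: n = 3233
    message = 65
    ciphertext = pow(message, e, n)
    d = recover_private_exponent(n, e)
    print(d, pow(ciphertext, d, n))  # 2753 65
```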
Somehow I can't imagine the same considerations applying to sharing technology with China under the current administration ;-)
I'm surprised that all this stuff is being made public so soon. We're talking about 1997 for some of this stuff, so I can't believe it's just being released under the Freedom of Information Act. I don't know what the US rules are, but the UK keeps things secret for umpteen years! Even when it becomes public they can refuse to acknowledge it; e.g. consider public-key cryptography and the subsequent RSA patents, despite the fact that a chap at GCHQ (in England) had already discovered it years before. What's the point of keeping it secret then?
Anyway, I find this stuff rather suspect. If they're really declassified documents, then where is the link to the real stuff? Do they exist on the web, or are they simply in some dingy office where access is granted by appointment only (like most of the "public" EU information)?
The article also states that there were a further 12 documents which were not declassified "in the interest of national defense or foreign relations". Don't you just love conspiracy theories? :-)
I'm inclined to agree with much of this; corporate users of MS products already upgrade more often. Given that probably the most widely used MS component is Office (ignoring the OS for now), hardly anyone is likely to be doing serious business with an old copy of Office - simply because of the (deliberately designed) continual format changes.
However, the key issue is price, and what you get for your money.
For example, should buying software by subscription automatically mean that you're entitled to new updates? Clearly when the three years are up and you renew, you'll get new versions, but what if a new version is released part way through your 3-year licence? Are you automatically entitled to obtain that (for free)?
If not, then people will inevitably need to upgrade anyway (due to deliberate incompatibilities), so the 3-year term is just a maximum, with a typical life-span being much shorter.
Ultimately though, I'd like a choice of purchasing options. This is yet one more area where control is wrested from the user.
You're basically making the same mistake as many Americans are accused of: Assuming that the USA means the world.
Now that sounds like a familiar statement ;-)
However, there are certain (official or unofficial?) agreements between governments, so that patents filed in (some) countries will be honoured elsewhere. This surprises me, as I'd have thought that the patent lawyers would just love it if more work was produced by requiring patents to be filed in _every_ country.
Personally I don't completely buy Peren's statement that the patents held by large companies just cancel each other out. They may happen to cancel each other out, but they also prevent new smaller companies from joining the "big boys", which is clearly also to the big boys' benefit.
However, I do feel, perhaps controversially, that software patents should exist. What I disagree with is the standard length of a patent - it's simply too long for such a fast-moving field. (The same applies to many other fields, such as genetics.)
With a short-term patent, say 3 years at most, people would be able to protect their design (and investment) from others for a short period, allowing them to bring a new product to market. If they fail in that aim then other people should be allowed to take up the challenge, instead of the patent languishing for years to come and preventing further work.
If we abolish patents completely then many ideas will simply become secret technology; no published articles from commercial orgs. It may also reduce the amount of R&D done in such places, which would be a bad thing.
Re:What about the speed at which this happened? (on Genetic Stone Soup)
I agree - extra funding was supplied to HGP because of Celera.
However, a key point to take into account is that it's not yet finished. The actual expected date of "finished" sequence is, as far as I know, not too far off the original estimate. The key thing that has changed has been going from "churn it out as high-quality finished data" mode to "churn out a rough draft and tidy up later" mode. The second strategy was employed mainly to prevent patents.
I think you overestimate the influence of any single person in the HGP. The Nature paper listed nearly 3000 authors (and I don't consider that list to be complete either).
The total task of sequencing goes all the way from mapping into sets of BAC clones, preparing the physical samples, the ABI instruments doing the sequencing itself, base calling, vector clipping, assembly, joining, editing, and finally producing consensus sequences for each clone.
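As a flavour of that last step, here's a deliberately naive Python sketch of consensus calling: a column-by-column majority vote over reads that are already aligned. The conventions ('.' for positions a read doesn't cover, '-' for an alignment gap, 'N' for ties or empty columns) are assumptions for illustration. Real finishing tools typically weigh per-base quality values rather than simply counting votes.

```python
from collections import Counter
from typing import List


def majority_consensus(aligned_reads: List[str], pad: str = ".") -> str:
    """Column-wise majority vote over reads that are already aligned.

    `pad` marks positions a read does not cover (reads shorter than the
    alignment are treated as padded). An aligned gap '-' can win a column,
    in which case that column is left out of the consensus. Ties, and
    columns no read covers, become 'N'.
    """
    width = max(len(r) for r in aligned_reads)
    consensus = []
    for col in range(width):
        votes = Counter(r[col] for r in aligned_reads if col < len(r) and r[col] != pad)
        ranked = votes.most_common(2)
        if not ranked or (len(ranked) == 2 and ranked[0][1] == ranked[1][1]):
            consensus.append("N")
        elif ranked[0][0] != "-":
            consensus.append(ranked[0][0])
    return "".join(consensus)


if __name__ == "__main__":
    reads = [
        "ACGTAC....",
        "..GTTCGA..",  # the T at column 4 plays the part of a sequencing error
        "...TACGATT",
    ]
    print(majority_consensus(reads))  # ACGTACGATT
```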
GigAssembler comes into play (as far as I know) in assembling the "clones". Each of these is typically 150 kb or so in size and contains many thousands of overlapping separate sequences (each a few hundred base pairs long), which have themselves already been assembled using different algorithms (e.g. Alewife at the Whitehead Institute, Phrap at Sanger and St. Louis, and our stuff at MRC for incremental additions and "finishing" work).
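To give a feel for the assembly step, here's a toy greedy assembler in Python: repeatedly merge the pair of pieces with the longest suffix/prefix overlap until nothing overlaps. This is emphatically not how Alewife, Phrap or GigAssembler work (they have to cope with sequencing errors, both strands, quality values and repeats). It assumes error-free reads from one strand and an arbitrary minimum overlap of 3 bases. Whatever is left unmerged at the end corresponds to separate contigs with gaps between them.

```python
from typing import List


def overlap(a: str, b: str, min_len: int) -> int:
    """Longest suffix of `a` equal to a prefix of `b`, if at least min_len long; else 0."""
    for length in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:length]):
            return length
    return 0


def greedy_assemble(reads: List[str], min_len: int = 3) -> List[str]:
    """Greedily merge reads by best overlap and return the resulting contigs."""
    unique = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    # Reads wholly inside another read add no new sequence, so drop them too.
    contigs = [r for r in unique if not any(r != s and r in s for s in unique)]
    while True:
        best_len, best_i, best_j = 0, -1, -1
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    olen = overlap(a, b, min_len)
                    if olen > best_len:
                        best_len, best_i, best_j = olen, i, j
        if best_len == 0:
            # Nothing overlaps any more: each remaining piece is its own contig.
            return contigs
        merged = contigs[best_i] + contigs[best_j][best_len:]
        contigs = [c for k, c in enumerate(contigs) if k not in (best_i, best_j)]
        contigs.append(merged)


if __name__ == "__main__":
    reads = ["ATTAGACC", "GACCTGCC", "TGCCGGAA"]
    print(greedy_assemble(reads))  # ['ATTAGACCTGCCGGAA']
```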
I'm not knocking GigAssembler - it sounds like a fantastic achievement, but just putting it all into perspective. This is a HUGE project and it's still nowhere near finished. This is just the start!
I assume he doesn't expect you to write code in a similar fashion? :-)
A friend of mine was asked to fix a bug in his boss's code, which turned out to be a BASIC program. The program was only 10 lines long, but it contained 13 GOTO statements! After extensive analysis it turned out that 3 of the lines could no longer even be reached! It's frightening that it was so well obfuscated, by accident.
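Spotting lines like that is just a reachability problem on the control flow. Here is a minimal Python sketch using a made-up representation rather than a parser for any real BASIC dialect: each line number maps to whether execution can fall through to the next line, plus its GOTO targets. The sketch walks every path from the first line and reports what's never visited.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical model: line number -> (can fall through to next line?, GOTO targets)
Program = Dict[int, Tuple[bool, List[int]]]


def unreachable_lines(program: Program) -> Set[int]:
    """Return line numbers no execution path from the first line can reach.

    Assumes a non-empty program; GOTOs to non-existent lines are ignored.
    """
    order = sorted(program)
    next_line = {a: b for a, b in zip(order, order[1:])}
    seen: Set[int] = set()
    stack = [order[0]]
    while stack:  # depth-first walk over the control-flow graph
        line = stack.pop()
        if line in seen or line not in program:
            continue
        seen.add(line)
        falls_through, targets = program[line]
        stack.extend(targets)
        if falls_through and line in next_line:
            stack.append(next_line[line])
    return set(program) - seen


if __name__ == "__main__":
    # 10 IF X THEN GOTO 40 / 20 GOTO 60 / 30 PRINT "never"
    # 40 GOTO 20 / 50 PRINT "never either" / 60 END
    prog = {
        10: (True, [40]),
        20: (False, [60]),
        30: (True, []),
        40: (False, [20]),
        50: (True, []),
        60: (False, []),
    }
    print(sorted(unreachable_lines(prog)))  # [30, 50]
```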
Anyway, one minute into this, RealPlayer crashed in a heap and then I couldn't reconnect.
Conclusion? The feed was solar powered :-)
Ok, so they're possibly biased, but compared against all the other biased benchmarks (i.e. other vendor-generated SPEC marks at http://www.spec.org) this looks pretty fast!
E.g. it considerably beats the similarly clocked Alpha chip (21264B) on FP, but is a little slower for integer. The FP performance is a huge leap forward though, as this is usually an Intel weakness.
Let's hope the prices come down quick :-)
So,... anyone got any benchmarks?
However, there are certain (official or unofficial?) agreements between governments, so that patents filed in (some) countries will be honoured elsewhere. This surprises me, as I'd have thought that the patent lawyers would just love it if more work was produced by requiring patents to be filed in _every_ country.
Me? Lawyer bashing? No, 'course not gov :)
I couldn't disagree more. I won in 1991 (and later - eg 2000 ;-)), just about the time that I was applying for a job. I put 'winning the IOCCC' in my CV, which may sound like suicide to some.
However, consider this. Would you like to work for a boss who feels that winning the IOCCC is something to be ashamed of? Or would you rather work for a boss who feels it's a decent bit of fun?
I later found out that after whittling the job candidates down there were just two suitable people left - I was one of them. My boss told me that winning the IOCCC was one of the factors that helped him pick me over the other candidate.
So in conclusion - it can actually _help_ your job prospects!
(Yes I know it's a troll, but who cares.)