Software Emulates Organism's Entire Lifespan
An anonymous reader
"Scientists have developed a software simulation, running on 128 computers, of an entire organism, a step toward carrying out full experiments without traditional instruments (abstract). 'For their computer simulation, the researchers had the advantage of extensive scientific literature on the bacterium. They were able to use data taken from more than 900 scientific papers to validate the accuracy of their software model. Still, they said that the model of the simplest biological system was pushing the limits of their computers. "Right now, running a simulation for a single cell to divide only one time takes around 10 hours and generates half a gigabyte of data," Dr. Covert wrote. "I find this fact completely fascinating, because I don’t know that anyone has ever asked how much data a living thing truly holds. We often think of the DNA as the storage medium, but clearly there is more to it than that." In designing their model, the scientists chose an approach that parallels the design of modern software systems, known as object-oriented programming. Software designers organize their programs in modules, which communicate with one another by passing data and instructions back and forth. Similarly, the simulated bacterium is a series of modules that mimic the different functions of the cell.'"
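To make the "series of modules" idea in the summary concrete, here is a minimal sketch in Python. It is not the researchers' model: the module names (Metabolism, Growth, Division), the shared CellState fields, and every rate are invented for illustration. It only shows the general pattern of independent modules taking turns reading and updating a common state once per time step.

```python
# Toy sketch only: module names, state fields, and rates are hypothetical,
# not taken from the published model.
from dataclasses import dataclass


@dataclass
class CellState:
    """Shared state that every module reads from and writes to."""
    nutrients: float = 100.0
    energy: float = 0.0
    biomass: float = 1.0
    divisions: int = 0


class Metabolism:
    def step(self, state: CellState) -> None:
        # Convert up to 5 units of nutrient into energy per time step.
        uptake = min(5.0, state.nutrients)
        state.nutrients -= uptake
        state.energy += uptake


class Growth:
    def step(self, state: CellState) -> None:
        # Spend up to 1 unit of energy per step; each unit yields 0.25 biomass.
        spend = min(1.0, state.energy)
        state.energy -= spend
        state.biomass += 0.25 * spend


class Division:
    def step(self, state: CellState) -> None:
        # Divide once biomass has doubled and follow one daughter cell.
        if state.biomass >= 2.0:
            state.biomass /= 2.0
            state.divisions += 1


def simulate(steps: int) -> CellState:
    """Run every module once per time step against the shared state."""
    state = CellState()
    modules = [Metabolism(), Growth(), Division()]
    for _ in range(steps):
        for module in modules:
            module.step(state)
    return state


if __name__ == "__main__":
    print(simulate(8))
```

The modules never call each other directly; they only communicate through the shared state, which is one simple way to keep them loosely coupled.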
We often think of the DNA as the storage medium...
You might, but I'm betting physicists think differently. It all depends on the information to which you're referring.
I always figured OOP had some usefulness when used properly :)
So how much extra data do we need for a parrot?
Well I, for one, welcome our uploaded lobster simulations, and the following Vile Offspring overlords :)
Thank God this is Slashdot, where submissions contain poor explanations of what OOP is, yet leave out interesting information such as which bacterium.
For a moment, I read that as Software Emulates Orgasim's Entire Lifespan.
Life is not for the lazy.
I wonder how naturally an object oriented design worked out, given that molecular pathways are extremely complex and there are causal links between almost any pairs of phenomena. While OO is OK for CAD and man-made things, nature was much less restrained about high cohesion, low coupling, encapsulation and other heuristics. So the details would be interesting about inheritance, state representation, graph complexity, time-varying behavior etc.
Software Emulates Orgasms in Japan
The Internet King? I wonder if he could provide faster nudity.
I think the important thing to remember is that DNA is destroyed and replaced constantly, whereas the experimental model is likely being documented at every possible point. There is no realistic comparison between the memory usage of a computer-generated model and that of biology. I can honestly say nobody remembers what that skin cell on my pinkie toe was doing yesterday. Somewhere in there the neighboring cells recognized it died and the DNA got copied into a new cell.
How are they calculating protein conformations? I can't believe this is being calculated in real time on only 128 processors.
And so it begins.
Mycoplasma genitalium. No jokes, please. This is Science.
"No fear. No envy. No meanness." Liam Clancy
Giga Pets were awful things. Can you imagine the Facebook tie-in to growing and nurturing your new organism? Organismville.
It's just a model of gene-expression and metabolism. Not exactly what I would call emulation. They haven't generated any hypotheses with it which have been found to be true, so of course, there is no reason to think it has anything to do with anything.
Some colleagues of mine did just this thing on a smaller scale in a different system last year. It didn't go too well: it generated absurd predictions that were so assumption-heavy as to be interesting only as a theoretical exercise.
It's not like they can tell you what will happen if you treated the bacteria with a drug by running a simulation.
What? You mean the dividing cell doesn't just call fork() ?
Software designers organize their programs in modules, which communicate with one another by passing data and instructions back and forth.
You communicate by passing data? Really? I thought it communicates using the magic tooth fairy.
I thought we didn't know exactly how a cell divides. Something about thousands of little strands inside a cell that help guide things to where they need to be. How do those strands get there and how do they know where they belong? I'm really fuzzy on it. Does anyone know what I'm talking about?
"Right now, running a simulation for a single cell to divide only one time takes around 10 hours and generates half a gigabyte of data," Dr. Covert wrote. "I find this fact completely fascinating, because I don’t know that anyone has ever asked how much data a living thing truly holds.
Wrong paradigm.. I can create a 100k program that generates that much data. DNA is storage and instructions, but it creates more than it holds from that small data set.
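The point about a small program producing far more data than it contains can be illustrated with a short sketch. This is only an analogy, not a claim about how the cell model or DNA works: a few lines of code plus a one-integer seed expand into as many bytes as you ask for. The function name `generate` is made up here, and the constants are the familiar textbook linear congruential generator parameters.

```python
def generate(n_bytes: int, seed: int = 1) -> bytes:
    """Expand a single integer seed into n_bytes of deterministic output."""
    out = bytearray()
    x = seed
    for _ in range(n_bytes):
        # Linear congruential update; the whole generator state is one integer.
        x = (1103515245 * x + 12345) % 2**31
        out.append((x >> 16) & 0xFF)
    return bytes(out)


if __name__ == "__main__":
    data = generate(1_000_000)
    print(len(data), "bytes generated from one seed")
```

The output is large but carries no more information than the program and seed that produced it, which is the distinction the comment is drawing.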
Silence is a state of mime.
At what level are they simulating the organism? Is it at a gross molecular level, or are they simulating each individual electron transport path through the Krebs Cycle, or somewhere in between?
Some mornings it's hardly worth chewing through the restraints to get out of bed.
Is this some ploy by the united front of Object-Oriented Programmers to convince the world that multiple inheritance isn't as bad as everyone thinks it is? I'm sure the global union of Object-Oriented Programmers is already planning a seminar on how Interfaces are appropriate substitutes for multiple inheritance.
Not 3D printing, not space colonies. LIFE and how it works!
It would have been a lot easier to create a virus lifespan analogue with OO technique, especially if their dev platform was Windows. ;->
But I guess these guys are most likely stuck doing things the old-fashioned way on an xterm.
> "generates half a gigabyte of data"
> "I find this fact completely fascinating, because I don’t know that anyone has ever asked how much data a living thing truly holds."
Someone should tell them to set the log level to FATAL.
It's a distributed processing system with interprocess (regardless of the node on which the process resides) message passing.
While some, or even all, of the process modules MAY have been written in an object-oriented language AND style, this sort of processing predates all of the OOP languages and nearly all of the literature.
If I wanted to get there quickly and scalably, I'd use the distributed systems created for weather or nuclear simulations as a starting point, since intracellular activity has no small amount of chaos (for example, due to Brownian motion, the collisions and binding of various transmitters are not directed to a specific site on a specific RNA strand; a transmitter may attach to any compatible site along its trajectory).
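The Brownian-motion remark can be sketched as a toy random walk. Everything here is an assumption made for illustration: a one-dimensional strand of discrete positions, unit steps left or right, and a walker that binds to the first compatible site it lands on. Real intracellular diffusion is three-dimensional and far messier; the function names are hypothetical.

```python
import random


def first_binding_site(sites, length, start, rng, max_steps=100_000):
    """Walk randomly along a 1-D strand until the walker lands on a compatible site."""
    pos = start
    for _ in range(max_steps):
        if pos in sites:
            return pos
        pos += rng.choice((-1, 1))
        # Keep the walker on the strand.
        pos = max(0, min(length - 1, pos))
    return None  # never bound within max_steps


def binding_histogram(sites, length, start, trials, seed=0):
    """Count how often each compatible site is the one that ends up bound."""
    rng = random.Random(seed)
    hist = {site: 0 for site in sites}
    for _ in range(trials):
        site = first_binding_site(sites, length, start, rng)
        if site is not None:
            hist[site] += 1
    return hist


if __name__ == "__main__":
    # Two compatible sites, walker released halfway between them.
    print(binding_histogram({10, 30}, length=41, start=20, trials=200))
```

Identical starting conditions end at different sites from run to run, which is the kind of undirected behavior the comment describes.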
Long time ago :)
http://www.bitstorm.org/gameoflife/
Ok, so maybe it's not as sophisticated as this version...
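For anyone who hasn't seen it, one generation of Conway's Game of Life fits in a few lines. This is a minimal sketch using the standard rules (a dead cell with exactly three live neighbours is born; a live cell with two or three survives) on an unbounded grid stored as a set of live coordinates. The function name `life_step` is just a label chosen here.

```python
from collections import Counter


def life_step(live):
    """Return the next generation, given a set of live (x, y) cells."""
    # Count, for every cell adjacent to a live cell, how many live neighbours it has.
    counts = Counter(
        (x + dx, y + dy)
        for (x, y) in live
        for dx in (-1, 0, 1)
        for dy in (-1, 0, 1)
        if (dx, dy) != (0, 0)
    )
    # Birth on exactly 3 neighbours; survival on 2 or 3.
    return {cell for cell, n in counts.items() if n == 3 or (n == 2 and cell in live)}


if __name__ == "__main__":
    blinker = {(0, 1), (1, 1), (2, 1)}
    print(sorted(life_step(blinker)))
```

The horizontal blinker above flips to a vertical one and back again every two generations.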
Visit the Arcade Restoration Workshop @ http://www.arcaderestoration.com
601.
Don't get me wrong (I think that's really cool and stuff)...
But I wonder what it will take to simulate a few amoebas in a petri dish, which is not exactly the apex of biological complexity.
Obviously multicellular organisms are out of the question for some time still.
From the summary:
In designing their model, the scientists chose an approach that parallels the design of modern software systems, known as object-oriented programming.
No wonder the program is enormous!
and I thought "Damn, that must be a pretty short program."
while(1)
{ fork(); }
Done.
"I find this fact completely fascinating, because I don’t know that anyone has ever asked how much data a living thing truly holds." To posit that information came from ignorance (vacuum, nothing) is astonishing. To posit that it arose within a mere 13,700,000,000 years boggles the mind at the imagination of some people.
Cranky educator.
This is cool, but as I read it here (and someone correct me if I'm wrong), it's no substitute for doing a real experiment. I'm going to launch into a long explanatory diatribe - models like this one can be VERY useful for hypothesis generation, or to try and understand seemingly disconnected results that (very often) arise in a biological experiment. They are especially useful when you have some hypothesis/theory of how a complex system is governed and you need to generate some prediction which you can experimentally test based on your theory.
But not a substitute for the real experiment, no way no how. Why? Because living things aren't designed, and they don't respect your modularity, abstract data typing, etc. etc.
For example, suppose your bacterium starts making some huge amount of a membrane protein (a common thing you do in the lab, for reasons outside the scope of this example). What's going to happen?
Well, that protein is going to try and fold up in the membrane, but as you make more and more of it, the protein is going to fail to get there. Other proteins destined for the membrane are going to experience the same problem. Are you going to update every single module that contains something membrane bound, to reflect this? As they accumulate in the membrane, the membrane curvature is going to change, and this in turn is going to change the relative concentrations of various lipids on each leaf of the membrane, which alters the chemistry of everything that interacts with the membrane in any way (a whole bunch more modules.)
Even if you have those effects covered, they're going to have indirect (and non-linear) effects on the concentration of various ions in the cytosol (all of which, just for starters, interact with the inner membrane with different affinities), the excess protein is going to start accumulating in inclusion bodies which are going to start taking up physical space inside the cell. These two changes alter the likelihood of interaction and the energy of interaction of every single other thing going on in the cell (!). So good luck with that.
That's just one example. The same thing would happen if you sheared the DNA, or heat shocked the cell, or put the cell in an environment of rapidly changing nutrient concentrations. To put all that in CS terms - the actual cell isn't object oriented, there's all sorts of cross-talk between the different components (because they're physical objects in a little tiny soap bubble, they're bumping into each other) and no abstraction layer or anything of that kind.
To be quite honest, I am of the opinion that a living cell is an irreducible system, and the only way you'd get a real substitute for experiments on actual cells would be JUST MAYBE if you ran a molecular dynamics simulation on all 10^14 or so atoms; and if you did so with a much better physics engine than we have now.
The good and new comes from no quarter where it is looked for, and is always something different from what is expected.
These so-called constraint-based flux balance analysis (FBA) models are a big fraud. Perhaps most people in this field do not even realize this, but I think such models are of little direct biological value.
Some time ago, I switched from experimental physiology/mol biology to computational biology. As part of my job, I was overseeing two students starting on such FBA models. What these models generally do is ask a genome or expression database for a list of all expressed genes. Then filter out things that have a biochemical annotation (from www.genome.jp/kegg for example) to make a network of the biochemistry.
Because kinetic info on the enzymes is usually sparse, it is ignored. Instead, steady state is assumed, and the input/output fluxes and the maximum flux of every reaction are guesstimated. This creates a linear problem that is constrained by the network topology and the max/min rates for every reaction (which are mostly guessed). The solution space is a couple of million (depending on the number of reactions) options.
Although methods exist to analyse the solution space, it obviously does not tell you that much (most of the solutions are bullshit and, for example, thermodynamically impossible). A great innovation was the definition of an "objective function", which assumes that the organism is perfectly optimized for something, usually growth rate. Thus, you optimize your system of reactions to yield the maximum biomass. This gives only 1 solution.
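Written out, the optimization described above is a linear program. The notation below is an assumption, since the comment gives none: S is the stoichiometric matrix built from the reaction network (one row per metabolite, one column per reaction), v is the vector of reaction fluxes, c picks out the biomass (growth) reaction, and the bounds are the guessed minimum and maximum rates.

```latex
\begin{aligned}
\max_{v}\quad & c^{\mathsf{T}} v \\
\text{subject to}\quad & S\,v = 0, \\
& v^{\min}_j \le v_j \le v^{\max}_j \quad \text{for every reaction } j
\end{aligned}
```

The constraint S v = 0 is the steady-state assumption: every metabolite is produced exactly as fast as it is consumed. Strictly speaking, the maximum value of the objective is unique, but more than one flux vector v can reach it.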
My students analyzed a couple of published papers. In one of them (on a photosynthetic organism), the optimization caused the respiratory cycle to run backwards (effectively fixing carbon). It is perfectly valid from a model perspective, but it completely contradicts our current biological understanding, and it shows how silly this "objective function" is. However, the authors of such papers only look at individual pathways or reactions where the model is correct; they do not highlight problems. It is impossible to talk about every reaction in a paper anyway, so nobody notices (unless you go over the model yourself for a couple of weeks). Other problems I noticed include incorrect annotations and incorrect reaction stoichiometry. There are many, many errors in a lot of the published works.
There are more problems. Regulated exchange between compartments in eukaryotic organisms (from cytosol to mitochondria and such) is often ignored. In multicellular organisms the method does not even make sense, because only a single cell type is modelled (and assumed to aim for max growth).
I tried to contact the authors of such obviously erroneous works. Without much luck. Most of them are CS graduates. They have little understanding of the biology, and do not care. The models work and they publish in high IF journals. So why bother that a proton is missing from a reaction?
Saying that this models the complete organism, as in the FA, is wrong. It makes many assumptions, mostly focusses on the biochemistry (although some regulation can be included), and is generally steady-state. It is not predictive (the models are usually tweaked to whatever experiments are shown in the same papers). For bacteria, it might work to some extent, but although it has been claimed that it has led to some discoveries, I think this has been mostly retrospective prediction....
So far it is more of an assembler, but it includes all the "header files" for basic life functions like cell_wall.h, DNA_replication.h, ribosome.h, etc. Each of the header files describes the DNA code for all the needed proteins with all the switches needed.
It is called YADA.jar (Yet Another DNA Assembler)
Right now, the "printer" to get a real organism is cumbersome, but you can run the whole life form as a simulation. If you are Google, you may even grow real humans, and give them all kinds of fun functionality.
don't cut it off www.mgmbill.org
...the most complicated and pricey Tamagotchi I've ever heard of.
Maxis SimLife
The idea of imitating a simple bacterial organism as a piece of software has crossed my mind a couple of times in the past. Although it's really interesting, I think there is a fundamental problem with it. How deep does the simulation have to go? To properly simulate a living organism, you would have to simulate the more fundamental layers underneath, like the chemistry of the substances involved or the physics at the molecular level. As far as I know, there are still parts of these fields that have not been explored or discovered yet. It's pretty much impossible to simulate something that you don't fully understand.
Also, regarding the data contained in a cell: it's pretty obvious there is much more than just DNA. One example may be the millions of different states that the cell can be in at any particular time throughout its lifespan.
xkcd
Artificial Intelligence bachelors at Sussex on a genetic algorithms module. I did almost exactly this only using one machine, a Sun Ray. I wasn't the only one either, it was a fairly popular project. Mine was a cow called PolyCow... it and its virtual offspring were represented as polygons. This is over a decade ago now... give me a break.
This is not new, it's just a bit bigger.