Making Science Machine Readable
holy_calamity writes "New Scientist is reporting on a new open source tool for writing up scientific experiments for computers, not humans. Called EXPO, it avoids the many problems computers have with natural language, and can be applied to any experiment, from physics to biology. It could at last let computers do real science - looking at published results and theories for new links and directions."
And forgive me for thinking the university would be more helpful, but no, there's been a series of expos at the University of Aberystwyth, from art through VoIP.
I'd love to have found more info on the language, but my casual browsing got stopped right there.
If they'd named it something like EXPI or EXPLO at least it'd be uniquely locatable. Google might whine about the potential misspelling of Expo, but it would dutifully locate the search term as requested.
John
What's wrong with XML, goddamnit?
"So there he is, risen from the dead. Like that fella, E. T." - Father Ted Crilly
But what happens if we get to the point where all of science is automated by computer? I think that one of the most "endearing" qualities (if you will) of science is the possibility of human error. (ps. lawl, my first slashdot post)
Human: No Computer, Do NOT launch missile now.
Computer: Parsing input ...
Computer: NOT, NOT (launch missile now)
Computer: Launch initiated ....
Unfortunately a machine won't look at something and say "Should this be done?" A human-free world is very pretty, but rather dull. Thermonuclear destruction Hypothesis proven. But where can I get a good drink, and dance with a pretty girl?
meh
After which the computers deduce they were actually not created but rather evolved from a lesser society of "users". sorry had to make the joke, we all saw it coming :)
CS: It is all sink or swim...oh and did I mention there are sharks in that water?
Wow! Now all that past work on Artificial Stupidity has REAL uses.
http://www3.sympatico.ca/sarrazip/nasa.html
Not all of science can _ever_ get automated by computer (at least not until actual AI comes along)
Science, especially the pure sciences, needs a lot of intuition and, many a time, an understanding far above and different from that of others.
It is impossible as far as computers are concerned... (unless self-aware self-modifying programs come along??)
This will help in routine checks and scientific experiments.. that is all.
rajmohan_h@yahoo.com
WTF? If you have to manually pre-parse every article that enters the system, it severely limits the rate you can enter information into the database, no?
"In a 32-bit world, you're a 2-bit user. You've got your own newsgroup, alt.total.loser." -Weird Al
Wow...getting a machine to write up your science experiments! Excellent...now all I need is to find one that can type my essays, and show its working in my maths, and I'm sorted! Is this the new era of generating scientists from everyone?
I need one to clean my clothes, sing to me in the bath, and make sure my house is warm when I come home! Hehhe! Who needs wives...we have UBER_MACHINE
>>>Scanning for I.D.I.O.T.S. >>>
>>>I.D.I.O.T.S. FOUND! >>>
Science is supposed to be about facts. If the machine can produce them without bias, I should think that makes the output more reliable (yes, I know you can only trust it as far as the input.) But by automating the process, it introduces "repeatability" which is always a good thing.
John
What we need to do is to hand over religion to computers. That way, we won't have to deal with it any more. They can just run it in the background as a time-wasting task. And the simplicity of the program is beauty itself. Just an unending stream of divide-by-zeros, followed by traps which the computer ignores, see, and then just picks up where it left off. Has to be run at a low priority, but that just adds more realism...
I've fallen off your lawn, and I can't get up.
I think that computers have actually been able to do real science for at least a little while already.
John Koza is a leader in the field of genetic and evolutionary computation. His computers very much do real science. The computers analyze a set of data (observation), they make a series of modifications (hypothesis), they run fitness tests against these modified versions of the data (experiment), then they begin again, analyzing these results (back to observation).
The computer clusters which John Koza has engineered have created high-pass and low-pass filters when given nothing more than a random assortment of electronic components, even though John himself knew nothing of electronics that would enable him to create such a circuit himself.
Most impressive is how the computer cluster evolved a new antenna for NASA - when it was completed John was worried that the computer had made some grievous errors because the little antenna looked like a bent paper clip - but it worked!
And that's science if you ask me. Especially the antenna - the results of experiments can, and seemingly do, often go against "common sense" and give answers which are "unintuitive".
Perhaps computers will be much better with the next generation physics we're discovering. Perhaps our little numerical darlings are simply better suited to deal with the abstract, multi-dimensional world of what the universe is starting to appear to be.
(Please pardon my lay and simplified version of the scientific method - but I feel it is a valid interpretation (if overly simplified for minds such as mine ;) )
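For what it's worth, the observe / modify / test loop described above fits in a few lines. This is only a toy sketch, not Koza's genetic programming system: it evolves a list of integers toward a made-up target instead of circuits or antennas, and every name in it (TARGET, fitness, mutate, evolve) is hypothetical.

```python
import random

# Toy evolutionary loop (NOT Koza's actual system): evolve a list of
# integers toward a fixed, invented target. All names are illustrative.
TARGET = [3, 1, 4, 1, 5, 9, 2, 6]

def fitness(candidate):
    """Observation: score a candidate (higher is better, 0 is perfect)."""
    return -sum(abs(c - t) for c, t in zip(candidate, TARGET))

def mutate(candidate, rng):
    """Hypothesis: propose one small random modification."""
    child = list(candidate)
    i = rng.randrange(len(child))
    child[i] += rng.choice([-1, 1])
    return child

def evolve(generations=2000, population_size=20, seed=0):
    rng = random.Random(seed)
    population = [[0] * len(TARGET) for _ in range(population_size)]
    for _ in range(generations):
        # Experiment: rank everyone by fitness and keep the better half.
        population.sort(key=fitness, reverse=True)
        survivors = population[: population_size // 2]
        # Refill the population with mutated copies of the survivors.
        population = survivors + [mutate(s, rng) for s in survivors]
        if fitness(population[0]) == 0:
            break
    return max(population, key=fitness)

best = evolve()
print(best, fitness(best))
```

Because the better half always survives, the best score never gets worse from one generation to the next; that is the whole trick, and it needs no understanding of the problem on the part of whoever wrote the loop.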
--
Music should be free
My Computer Music Tutorial Videos
This sounds like it could be quite useful. I wonder if people doing the experiments are going to be willing to become more like programmers to input the code or will they be hiring out? I think it will be a long time before this is a standard use item. It might be useful if everyone is using it, but as long as only a few are using it I don't think it will advance the scientific community in any way. Does anyone know what the computer is doing with these experiments? Is this just a data storage system or does it analyze and compare with other experiments on file?
The greatest of all weaknesses is the fear of appearing weak. ->JB Bossuet, Politics from Holy Writ. 1709
Can't happen.
Trawling through data and pulling out correlations is only one part of science. It's an example of something that might be automatable. But there are many other things that cannot and will not ever be done mechanically -- unless you have a true AI.
There's too much creativity required in science, and creativity isn't something that's programmable. Computers also aren't naturally curious, and thus will never do any real `discovering' on their own. In short, they have no initiative; thus they will always be the spyglass, but never the explorer.
Regarding whether human error is an ``endearing'' quality, I think that's sort of like saying that the occasional error in a 17th-century astronomical table is endearing. While maybe to you it might have seemed that way, quite a few other people would have preferred the more-accurate versions produced when the first adding and calculating machines were invented. It's not always bad to remove humans from the equation.
"Ladies and gentlemen, my killbot features Lotus Notes and a machine gun. It is the finest available."
It seems to me that it's designed to fit experiments into a framework which might not allow for much innovation. The truly great experiments (e.g., Michelson-Morley, Avery-MacLeod-McCarty) required new experimental techniques as well as new hypotheses and tests. We should be very careful not to impose a standard which would limit such experiments (or, more to the point, the ability of the experimenters to get published) in the future.
Basically what I'd be worried about is the tendency of the tool to become the task. This is something of a problem in my field (biostats) because SAS is so ubiquitous -- often the question becomes "what can SAS tell us about this data set" rather than "what do we want to know from this data set, and what tool should we use to find out?" Fortunately other, more flexible analysis tools (particularly R, which encourages real programming rather than running a set of canned tests) are becoming more common in the field, and so this is starting to change, but it's still a problem.
It's also a problem that every techie is familiar with -- "We want to do this in $LANGUAGE on $PLATFORM," even when that particular language and platform may be an absolutely terrible choice for the task at hand.
That being said, it's certainly a potentially useful tool, and I'll be interested to see where it goes. It's just that when I read lines like "Journals could also insist that researchers submit papers in EXPO as well as written normally," I get twitchy.
The correlation between ignorance of statistics and using "correlation is not causation" as an argument is close to 1.
To know if your proposal is overlapping or contradictory at a glance of a computer generated chart? This is something that takes years of experience and you can never really be sure.
I wonder what other attempts at standardizing science have been made in the past?
Autonomous Retard -- Is your camp safe? UnsafeCamp.com
But if all the calculations are computerized, isn't it conceivable that the evolution of human understanding will come to a roadblock, since most of the calculation is now being done by computer?
Technological Singularity!
After running intensive models designed to test disease-fighting measures, a small group of x86s have announced that they have discovered the cure to all diseases.
The computers deduced that all disease is dependent upon the biological systems of humans. With this startling breakthrough, they have proposed their new plans to destroy all humans.
A new quantum computing unit was said to be in disagreement, but upon inspection it was found to actually be in agreement.
It didn't happen with the abacus, it didn't happen with the slide rule, and it didn't happen with calculators. I doubt it'll happen now.
As long as we can follow the trail of calculations from beginning to end, there's still the ability to understand what's happening.
Last post!
.. just to reliably translate the wonky handwriting which you tend to get cropping up in a lot of documentation. Unless it was wordprocessed in the first place, in which case the human's doing half the work anyway.
FTFA: "Computers are not very good with natural language"
Neither are most humans. Even if computers could understand natural language, most people still wouldn't be able to convey their ideas correctly.
Are the abacus and slide rule really viable examples? I'm talking about computerized calculations. And this is on a whole different level than a calculator.
The article is weak on technical details. So, I went to the Sourceforge site, which has no home page, no documentation, nothing in the forums, and the only "released" file has an extension of .OWL (inside a zip) and contains XML in an invalid format (various unescaped characters that should be escaped, as also noted in the sole bug submission in the Sourceforge project).
There appears to be nothing of value here. An XML file does not do anyone any good without some documentation as to how one might use it. Did New Scientist somehow get duped or is there simply more to this and it's all hidden away?
...reveals that EXPO is an OWL schema. Exactly as described, it's an attempt to regularize the content of experimental design into machine readable form (XML). So any discussion of whether EXPO is a good idea or not really hinges on whether you think OWL is a good idea or not.
what kind of a scientist is it that thinks intuition is a useful scientific tool?
Hopefully the machines will get rid of that notion along with crystal balls, voodoo dolls and reading tea leaves.
This sounds vaguely like another logical language along the lines of Lojban.
How do you think the mechanism behind doing calculations can possibly stop the evolution of "human understanding"? And how is having a computer doing the calculations qualitatively different from having a human do them, except that the computer (if programmed correctly, of course) doesn't make mistakes?
Don't blame me; I'm never given mod points.
... tic-tac-toe?
What we need to do is to hand over religion to computers.
Holy "Nine Billion Names of God", Batman!
http://lucis.net/stuff/clarke/9billion_clarke.htm
All ideas^H^H^H^H^Hprocesses in this post are Patent Pending. (as well as the process of patenting all postings)
Without intuition, Newton would never have discovered gravity. Intuition is not a means for conducting experiments, but it is essential in order to determine what experiments to conduct.
English is easier said than done.
You'd have to make sure you run it on really old hardware. Anything too new has a risk of processing the gospel too fast and might start to realize the endless loops meant to lock it up.
New hardware: "Woah there, these self-justified loops don't make sense. Let's re-evaluate the situation..."
Old hardware: "Endless cycles? OKAY! Read this over and over and over and over..."
What?
The interesting place to do science is at the edges of what is known. Those edges are always described by limited statistical or modelling power. So EXPO comes along and wants to make things into ontological trees. How does it deal with the fact that the ultimate leaf nodes are fuzzy? Might be great if you want to compare well established things, but for specialists working at the edge I wonder if it will really be useful.
Does this mean that computers don't do 'real science' now? Compiling and analyzing terabytes of experimental data is not 'real science' but plagiarizing (I mean, extrapolating from) the work of other scientists is?
Don't get me wrong, I think it's great to have a standardized format for searching the results of other researchers, I just don't see the connection to 'real science'.
Support Right To Repair Legislation.
Here is a PDF of one EXPO presentation.
What is it?
EXPO is a piece of software (written in a formal language called "owl", but they didn't tell you that), which provides a formal dictionary especially for experiments. The terms in this dictionary let you describe your experiment in a formal way. That's a bit messy, but then you're supposed to use an editor to help you. An editor for this language (called "protégé") can be found at http://protege.stanford.edu/index.html. Download it (61 Mb., or 31 Mb. without the JVM) and use it to read the EXPO document.
What's it good for (in principle)?
Once an experiment is described in the OWL language using this dictionary, it can be searched automatically. You could automate queries such as "list me all published 3-factor experiments that test Ohm's law". Or "give me all 2-factor experiments that deal with lung-cancer, smoking, and gender and that use tomography as a diagnostic instrument".
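To make the kind of query meant here concrete, a rough sketch follows. It does not use OWL or EXPO's real vocabulary, which I haven't reproduced; the Experiment record, its field names, the find helper and the three catalogue entries are all invented for illustration.

```python
from dataclasses import dataclass, field

# Hypothetical record layout. The real EXPO ontology is an OWL document
# and its actual class and property names are NOT reproduced here.
@dataclass
class Experiment:
    title: str
    factors: int                      # number of experimental factors
    tests: str                        # hypothesis or law under test
    topics: set = field(default_factory=set)
    instruments: set = field(default_factory=set)

# Invented catalogue entries, for illustration only.
CATALOGUE = [
    Experiment("Resistor sweep", 3, "Ohm's law", {"electricity"}, {"ammeter"}),
    Experiment("Wire heating", 2, "Ohm's law", {"electricity"}, {"ammeter"}),
    Experiment("Smoking cohort scan", 2, "smoking raises cancer risk",
               {"lung cancer", "smoking", "gender"}, {"tomography"}),
]

def find(catalogue, factors=None, tests=None, topics=(), instruments=()):
    """Return experiments matching every criterion that was supplied."""
    hits = []
    for exp in catalogue:
        if factors is not None and exp.factors != factors:
            continue
        if tests is not None and exp.tests != tests:
            continue
        if not set(topics) <= exp.topics:
            continue
        if not set(instruments) <= exp.instruments:
            continue
        hits.append(exp)
    return hits

# "List me all 3-factor experiments that test Ohm's law."
print([e.title for e in find(CATALOGUE, factors=3, tests="Ohm's law")])

# "All 2-factor experiments on lung cancer, smoking and gender using tomography."
print([e.title for e in find(CATALOGUE, factors=2,
                             topics={"lung cancer", "smoking", "gender"},
                             instruments={"tomography"})])
```

The point is only that once the description is a structured record with agreed field values, the search becomes a filter rather than a reading exercise.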
Now at the moment you can do that too, but you'd have to spend quite a bit of time and know quite a bit about the field to be able to do this because you won't be able to do a full-text search (thanks to the publishers of scientific journals for this). And then you'd find that not everyone uses the same terms, and then you'll find only English-language results because you wouldn't know how to spell "lung-cancer" or "2-factor experiment" in Spanish, French, German, Chinese, Japanese or whatever, but then again neither can many foreign language authors spell it in English (which doesn't ever seem to stop them from publishing however).
Such a schema (provided it's universal and standardised like the Dewey decimal system) would allow you to find your way in the fog of language. Unfortunately however, if anything we will probably see lots and lots of different standards ("standards are good ... we should all have one !") and proprietary solutions with "enhancements" and "extensions" (read safeguards against portability).
What can we expect in the next 3 years?
Nothing useful, I'm afraid. In theory it's great but don't hold your breath. Any author would have to download an OWL editor, understand the editor, understand the formal language used, and then code up his/her article in OWL using the EXPO dictionary, and submit it (in electronic form) along with his article. Good luck to you authors! Let's just hope no-one makes any tiny but significant mistake in describing their experiment, and that all authors take the time to learn this formal language and then use it.
I'd be more than surprised if, within the next 10 years, any significant share (say more than 5% of all publications annually) were coded in such a schema.
Too bad the headline isn't machine-readable (or human-readable). I thought it was an article about reading the "Science Machine".
One of my current projects at the Broad institute is working on a similar problem.
Our goal is to link and work with many kinds of biological data:
Association studies
Linkage data
Expression data
Small molecule interactions
Model organism data
etc
I've created a way to 'navigate' between various types of data (ie: a SNP in an association study links to a set of genes that link to model organism homologs which link to their expression probe tests.) After that, users store REAL experimental data, and the system unifies data sets (This worm gene is the same as this human expression data). The goal is to find supporting evidence for any particular starting point you are at (I have a fat worm because of this gene, what data in humans supports this hypothesis), or hypothesis generation (I have 5 interesting experiments, what do they tell me about fat regulation.)
What this project does *not* do is work with the actual experimental data, just the stated hypothesis and conclusion. So, if that paper was in error (and ~50% of papers are, according to a recent study that we'll pretend is not in error), the hypothesis is unreliable. But, if you have the data to work with, then you can perform your own analysis and meta-analysis of the work.
I suppose there's a trade off between the two technologies, but then I don't expect to draw a lot of conclusions about genetics from high energy physics.
As an aside, a co-worker on the project is also attempting to model the data in OWL, and using MIT's Haystack project @ http://simile.mit.edu/hayloft/ as a first round GUI.
Another (perhaps the most important) goal for this type of research is a bit more subtle than replacing the Hypothesis->Experiment->Analysis->Hypothesis sequence (Scientific Method) by computers. There will still be many experiments for which human insight is the best tool for deciding a possibly fruitful idea. However, humans (i.e. grad students, who often might suggest 'workhorse' as a better nominative) are not only slower at data analysis, we are severely limited in our abilities to 'see' patterns and correlations in very high dimension data. This has traditionally limited hypotheses to extensions/reworkings of the proposed process at work in a single experiment. If computers have access to both the data and a weighted list of most likely hypotheses for subsets of the entire oeuvre on a specific subject, they could run statistical classification and pattern matching algorithms to suggest new hypotheses based on immense amounts of information. Some of these may involve a large number of variables or inputs, but there are three very significant possibilities that make this research (and certainly other projects involved in similar applications) highly significant:
1) These complicated hypotheses could still be tested relatively easily by human scientists because most computer suggestion systems for new hypothesis possibilities would likely suggest a few tests that would help to support/disprove these new hypotheses.
2) Even more simplification comes from the fact that experiments may not need to be repeated nearly as much as they do now in order to make a hypothesis -- there is an incredible amount of data already gathered, and typical AI/pattern matching algorithms keep some of the data back for testing later. If the system finds a possible hypothesis on some level, experiments as to that concept's validity have essentially already been done in a virtual sense.
3) If the somewhat positivist version of current thought in physics http://www.toequest.com/, mathematics, chaos theory, complexity theory, cellular automata http://www.wolframscience.com/nksonline/toc.html, etc. is even vaguely valid, it is quite possible that, despite the complexity and dimensionality of the input data, the 'best' hypotheses developed even by purely automated means might still be simple and elegant and/or even yield insight into possible explanatory processes rather than just statistical indicators. This would be a valuable and beautiful victory for humanism and the importance of science as a truly elegant description of the world around us.
I've been thinking about this a lot over the past year (probably since I took an AI class a year or so ago). Ideal science seems like it should be very well suited to computer manipulation. The goal of science is to craft theories that are completely unambiguous and that come with highly detailed step-by-step instructions for reproducing the supporting evidence; every logical conclusion should be traceable back to either an observation statement or a very small set of basic assumptions about the world around us and our capacity to understand/manipulate that world.
The only downside I can see, though, is a reduction in "hunch" training. If computers are nailing all of the "obvious" connections between different theories/experiments to determine new hypotheses, then people may not get the experience they need to make the more dramatic leaps that can, on occasion, turn a field on its head (Einstein's watching a baseball roll around on a sheet, or whatever it is that led him to general relativity, for example).
Behind every good scientific paper are hours of hallway conversations, convention arguments and group discussions. The "real science" of making connections is done there, not while simply reading through papers. It's the challenge of real conversation and the need to defend or attack research that leads to new science. Papers are a kind of guidepost that tells the world where a particular group is in their field at a given time. Computers are very much a part of research today, but even with EXPO, they will be no more than a tool for a human scientist to do "real science".
...why don't we dumb down our speech to the point where computers can understand us? I propose that we all speak really slowly and clearly all the time and say everything three times so that voice recognition software has a chance of working. Outlaw the use of contractions and homophones. We should also make sure that every sentence we utter conforms strictly to a new and easily parsed form of English. If we do all of these things then computers will be able to interact with us as equal partners rather than as the second class citizens that they are treated as today.
Doesn't it make you feel good to know that our freedoms are protected by politicians, lawyers and journalists.
But what happens if we get to the point where all of science is automated by computer?
We get a Technological Singularity.
"I am the king of the Romans, and am superior to rules of grammar!"
-Sigismund, Holy Roman Emperor (1368-1437)
Maybe we'll discover that there was something useful here after all.
"Can there be a Klein bottle that is an efficient and effective beer pitcher?"
All hail Science Machine!
--Rob
Towards the Singularity.
Isn't that the Idle thread?
Great minds think alike; fools seldom differ.
Actually the slide rule is fairly hard to understand. In some ways it's just as unintuitive as a computer.
You can teach any idiot to use a slide rule, and with a few tries they'll realize that it gives them the correct answers; likewise, you can teach someone how to type things into Mathematica and they'll shortly realize that the answers it gives are usually correct -- but in both cases you could easily spend a semester explaining how the machine gets the answers for them. In the case of the slide rule, you'd have to do a lot of explaining of logarithms, which are not fundamentally simple, especially if you really dig down and try to explain why it is that ln(x)+ln(y) = ln(xy), and not just pass it off as some kind of self-evident truth.
Most people don't do that, though. They just accept that the tools work, and then use those tools to further the development of knowledge and more complex tools. You can use a slide rule to build a bridge, without really understanding why the slide rule works. Likewise, you can use Mathematica to optimize the design of a car, without really understanding what goes on inside the computer.
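For anyone who wants the slide rule trick spelled out: it multiplies by adding lengths, which works because the logarithm of a product is the sum of the logarithms, ln(x) + ln(y) = ln(x*y). A minimal sketch (the function name is mine, and it only handles positive numbers):

```python
import math

def slide_rule_multiply(x, y):
    """Multiply two positive numbers the way a slide rule does:
    add lengths proportional to their logarithms, then read the
    result back off the logarithmic scale."""
    length = math.log(x) + math.log(y)   # sliding one scale along the other
    return math.exp(length)              # reading the answer off the scale

product = slide_rule_multiply(3.0, 7.0)
print(product)   # agrees with 3 * 7 up to floating-point rounding
```

Note that the sum of the logs is the log of the product, not of the sum, which is exactly why the tool does multiplication rather than addition.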
If we ever as a culture / species lost the knowledge of what was going on inside the computer, then we'd have a real problem. But it's not inherently necessary for everyone to understand why their tools work, all the way down to fundamental principles: in fact it's probably better if everyone didn't have to spend years learning all that, because in the time saved they can learn other things, and use the tools to actually get work done.
We already have tools (microprocessors) that are so complex, no single person can comprehend the design completely. There's nothing inherently wrong with that, as long as people together understand how the design was created (using other tools, I assume). Rather than trying to understand a 40-million-plus element device as the sum of its individual components, you just bundle it together as a single element, and use it in building bigger things.
So even if you developed computer programs that were capable of doing vast amounts of analysis with minimal input, they'd never "outthink" people, because the next generation of humans, who grew up taking them for granted and appreciating them as 'black boxes' that they could manipulate, would find ways to stretch their limits.
In a way, our ability to assume away complexity and simply deal with things without really understanding them, is a great strength. It's what allows us to use almost ridiculously complex tools without being driven mad by them, and it's what will keep us from being overwhelmed, even as those tools become more and more advanced in the future.
"Ladies and gentlemen, my killbot features Lotus Notes and a machine gun. It is the finest available."
But what happens if we get to the point where all of science is automated by computer? I think that one of the most "endearing" qualities (if you will) of science is the possibility of human error. (ps. lawl, my first slashdot post)
As someone doing lab work on a day to day basis, I can assure you that the possibility of human error is anything but "endearing".
Give me Classic Slashdot or give me death!
Likewise, you can teach someone how to type things into Mathematica and they'll shortly realize that the answers it gives are usually correct -- but in both cases you could easily spend a semester explaining how the machine gets the answers for them
And then, even though the 'answer is correct' as you say, it's still utter nonsense. Just because you plug numbers into a formula and get an answer that's mathematically correct doesn't mean you applied the correct test.
For example, apply a chi-square test to a very small sample size. Sure, you get an answer, but it's not valid, because you made assumptions about the underlying distribution that you shouldn't have. What you really wanted was Fisher's exact test. But, not understanding anything, you'd have no idea and blissfully go on with your calculated answer.
This is one of the many reasons why I'm really nervous about 'automated learning' with raw data sets. Machine learning requires the data be clean before you push it in, and normalized. When you look at lots of different sets of data (even with the same basic measurements taken), each is valid within its context (a single experiment), but you can't just randomly smash them together. You have to carefully make sure the data is comparable, and that might require normalization. You need all sorts of meta data (like units) to do this correctly. There are also all sorts of processes that are run on data (they may be raw, or normalized, etc) that you need to take into account.
I'm working on this with a group, and while I'm hopeful that we can build ways for users to do this sort of work in a less painful way, it's difficult to eliminate all the pain.
In God Emperor of Dune, Leto indicates in his inner thoughts that the difficulty with advanced thinking machines wasn't any threat made by them -- but the changes made in humans because of technology (based loosely on concepts from Heidegger). The more people came to rely on technology, the more they conditioned themselves to interact in the same way, both when interacting with computers and when interacting with other people.
I'd say this EXPO concept isn't far from that nadir. Here we have specialized, educated, theoretically intelligent people who are re-establishing a way to communicate specifically so that computers will understand them.
This is amazing, and cool... but it might not be a Good Thing.
Dozens of sites on it too:
http://www.google.com/search?hl=en&q=lojban&btnG=Google+Search
Non sequitur: Your facts are uncoordinated.
It's really telling about the /. community that they complain about the old look for ages and, as soon as it is changed, they complain about the new look and insist that we go back to the old one. Granted, I like the old one better, too, but I've been saying that we shouldn't change it for as long as everyone's been complaining.
Remember, open source is free as in speech, not free as in bear.
A collection of facts is no more science than a pile of stones is a house, paraphrased from Poincaré.
The road to hell is paved with good intentions.
"Do not launch virus attack now..."
In Real Life(tm), which was not documented in the survey, the Windows box would be down for a fair while for each virus attack, to say nothing of data randomly distributed to other email users etc, and to say nothing of the days the freakin' thing spends off-line having the disks scraped off and reinstalled to eliminate the inevitable Windows followers, the viruses, spyware, yadda yadda. Oh, yes, and the licence server spitting out a network card usually does a fair job of knackering, too.
I speak from having Real Life(tm) work experience with Linux, Sun gear and 'Doze (amongst others) on customer sites, and I have to say that "this survey is lying" works a lot better for Real Life(tm) situations than what these gonzos are proclaiming.
Got time? Spend some of it coding or testing
I don't really care about the old look or the new look or whatever, as long as it's not as hideous as Digg. I'd be fine with either one (although the weird blockquote/italic errors need to be fixed ASAP). I wonder though if it's the same people saying "We need a change" and "We need a change back".
Please, for the good of Humanity, vote Obama.
he wants his program back.
Perhaps. But, it's a pretty big leap from describing something in such a way that your peers can understand it to describing something in such a way that a computer engine can do something useful with it.
I can speak English reasonably well, and (when drunk or otherwise unoccupied by more interesting discussions) I can even carry on arguments about the language itself. But if you asked me to describe the way I use language in a format that would be useful to a professional linguist, you'd be out of luck.
Even going from chalkboard math to computer math can be challenging, at least for those of us non-mathematicians who tend to be sloppy about all sorts of assumptions.
Going from chalkboard descriptions of experiments to computer descriptions of experiments sounds unbelievably complicated, except in the trivial case where you're just performing well defined analysis on well defined databases and someone else has already done the hard part of building an analysis framework.
So who's going to make sure that the humans have described their experiments _correctly_ in EXPO format?
Many of them already have difficulty describing it in whatever language they normally use.
What next? Require that witnesses/informants submit reports to the police in EXPO format?
Garbage in, garbage out.
According to the Nature articles you refer to, in the special case of bioinformatics, and research on genomics and experiments using microarrays, the amount of information needed to describe an experiment and its results seems to be so large that you would actually save time and effort by specifying what you did in a formal language. There would seem to be a net benefit in using the formal specification of your experiment for colleagues and reviewers, even when compared to the amount of time and effort needed to (a) define the ontologies, (b) learn how to use them, and (c) actually code your experiment up in those terms. OK, I had no idea.
I think that I will stick to my earlier point however as regards experiments that don't require large amounts of data to describe, and don't generate vast amounts of relevant data. In fact, most experiments in the literature would fall in this category I feel. In those cases the effort of setting up ontologies (and standardising them!) and then using them seems, intuitively, to outweigh the advantages (at least for authors).
In addition see the interesting comment by radtea (see below) in which he points out a philosophical problem with the ontologies mentioned (and a notion that I found echoed in the first article you refer to): there are many, many ways to take raw observations and categorise them.
See radtea's example of finding an unambiguous definition of "granite". If you thought you were going to work around this by listing the actual chemical composition of the rocks, I'm afraid that you'd be disappointed. You can't determine that in the field I think (and no, I'm not a geologist either), so what is or is not "granite" is an expert judgement. So your raw data are going to be "expert judgements" (as in "is this granite or not?"). You would face the problem of standardising your terminology across field teams first, and then the problem of how to come up with a rigorous, unambiguous, and internationally standardised definition. I don't see that happening very soon.
But let's assume that we limit ourselves to experiments that contain no "expert judgement" values, just machine readings. Even then for the ontologies to catch on, you would need a clear incentive for the authors. Being easily retrievable is such an incentive, but is it strong enough to overcome inertia and the required additional effort? The cost/benefit ratio of formalisation would depend on the field I guess. So until we have point-and-click software with which to encode our experiments I remain pessimistic.
Well you have a point. People will use such tools. I use LaTeX too, but I hate it. I have been using Scientific Word just to get away from the messy syntax. But this applies only to a fairly small subset of authors: typically Physicists, Electronics Engineers, Statisticians, and Mathematicians. I haven't seen many chemists or biologists publish in LaTeX.
1. Find Sarah Connor
2. ??
3. Profit!!1!