Making Science Machine Readable
holy_calamity writes "New Scientist is reporting on a new open source tool for writing up scientific experiments for computers, not humans. Called EXPO, it avoids the many problems computers have with natural language, and can be applied to any experiment, from physics to biology. It could at last let computers do real science - looking at published results and theories for new links and directions."
And forgive me for thinking the university would be more helpful, but no, there's been a series of expos at the University of Aberystwyth, from art through VoIP.
I'd love to have found more info on the language, but my casual browsing got stopped right there.
If they'd named it something like EXPI or EXPLO at least it'd be uniquely locatable. Google might whine about the potential misspelling of Expo, but it would dutifully locate the search term as requested.
John
What's wrong with XML, goddamnit?
"So there he is, risen from the dead. Like that fella, E. T." - Father Ted Crilly
WTF? If you have to manually pre-parse every article that enters the system, it severely limits the rate you can enter information into the database, no?
"In a 32-bit world, you're a 2-bit user. You've got your own newsgroup, alt.total.loser." -Weird Al
Just the spot for an observation - I think the problem with 'double negatives' has to do with emotional versus logical thinkers. Emotional, or romantic types, see an extra negative as a cumulative emphasis - using a negative twice means a more forceful 'no' than just one. Logical, or classical types, see it as canceling like a mathematical operation. Of course it's not always that clear cut with lots of exceptions, as even an emotional type will read 'not false' as 'true', etc.
try { do() || do_not(); } catch (JediException err) { yoda(err); }
It seems to me that it's designed to fit experiments into a framework which might not allow for much innovation. The truly great experiments (e.g., Michelson-Morley, Avery-MacLeod-McCarty) required new experimental techniques as well as new hypotheses and tests. We should be very careful not to impose a standard which would limit such experiments (or, more to the point, the ability of the experimenters to get published) in the future.
Basically what I'd be worried about is the tendency of the tool to become the task. This is something of a problem in my field (biostats) because SAS is so ubiquitous -- often the question becomes "what can SAS tell us about this data set" rather than "what do we want to know from this data set, and what tool should we use to find out?" Fortunately other, more flexible analysis tools (particularly R, which encourages real programming rather than running a set of canned tests) are becoming more common in the field, and so this is starting to change, but it's still a problem.
It's also a problem that every techie is familiar with -- "We want to do this in $LANGUAGE on $PLATFORM," even when that particular language and platform may be an absolutely terrible choice for the task at hand.
That being said, it's certainly a potentially useful tool, and I'll be interested to see where it goes. It's just that when I read lines like "Journals could also insist that researchers submit papers in EXPO as well as written normally," I get twitchy.
The correlation between ignorance of statistics and using "correlation is not causation" as an argument is close to 1.
It also depends heavily on language. In many languages, repeated negatives are explicitly used to emphasize the negative nature of the phrase. Negatives were even used this way in English until its modernization.
====
Crudely Drawn Games
How do you think the mechanism behind doing calculations can possibly stop the evolution of "human understanding"? And how is having a computer do the calculations qualitatively different from having a human do them, except that the computer (if programmed correctly, of course) doesn't make mistakes?
Don't blame me; I'm never given mod points.
Without intuition, Newton would never have discovered gravity. Intuition is not a means for conducting experiments, but it is essential in order to determine what experiments to conduct.
English is easier said than done.
The interesting place to do science is at the edges of what is known. Those edges are always described by limited statistical or modelling power. So EXPO comes along and wants to make things into ontological trees. How does it deal with the fact that the ultimate leaf nodes are fuzzy? Might be great if you want to compare well established things, but for specialists working at the edge I wonder if it will really be useful.
That's impressive. But it is engineering, not science. When computers start proposing new experiments which will help us understand things unknown, then they will be doing science!
Another (perhaps the most important) goal for this type of research is a bit more subtle than replacing the Hypothesis->Experiment->Analysis->Hypothesis sequence (Scientific Method) by computers. There will still be many experiments for which human insight is the best tool for deciding a possibly fruitful idea. However, humans (i.e. grad students, who often might suggest 'workhorse' as a better nominative) are not only slower at data analysis, we are severely limited in our abilities to 'see' patterns and correlations in very high dimension data. This has traditionally limited hypotheses to extensions/reworkings of the proposed process at work in a single experiment. If computers have access to both the data and a weighted list of most likely hypotheses for subsets of the entire oeuvre on a specific subject, they could run statistical classification and pattern matching algorithms to suggest new hypotheses based on immense amounts of information. Some of these may involve a large number of variables or inputs, but there are three possibilities that make this research (and certainly other projects involved in similar applications) highly significant:
1) These complicated hypotheses could still be tested relatively easily by human scientists because most computer suggestion systems for new hypothesis possibilities would likely suggest a few tests that would help to support/disprove these new hypotheses.
2) Even more simplification comes from the fact that experiments may not need to be repeated nearly as much as they do now in order to make a hypothesis -- there is an incredible amount of data already gathered, and typical AI/pattern matching algorithms keep some of the data back for testing later. If the system finds a possible hypothesis on some level, experiments as to that concept's validity have essentially already been done in a virtual sense.
3) If the somewhat positivist version of current thought in physics http://www.toequest.com/, mathematics, chaos theory, complexity theory, cellular automata http://www.wolframscience.com/nksonline/toc.html, etc. is even vaguely valid, it is quite possible that, despite the complexity and dimensionality of the input data, the 'best' hypotheses developed even by purely automated means might still be simple and elegant and/or even yield insight into possible explanatory processes rather than just statistical indicators. This would be a valuable and beautiful victory for humanism and the importance of science as a truly elegant description of the world around us.
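To make point 2 concrete, here is a rough sketch of the "keep some of the data back for testing later" idea. Everything in it is made up for illustration: the data is synthetic, the "hypothesis" is just a straight line, and the helper names (`split_holdout`, `fit_line`, `mean_squared_error`) are hypothetical, not anything from EXPO.

```python
import random


def split_holdout(records, holdout_fraction=0.25, seed=0):
    """Shuffle the records and hold a fraction back for later testing."""
    rng = random.Random(seed)
    shuffled = records[:]
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * (1 - holdout_fraction))
    return shuffled[:cut], shuffled[cut:]


def fit_line(points):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def mean_squared_error(points, slope, intercept):
    """Average squared gap between the hypothesis and the measurements."""
    return sum((y - (slope * x + intercept)) ** 2 for x, y in points) / len(points)


# Synthetic stand-in for "already gathered" measurements: y = 2x + 1 plus noise.
noise = random.Random(42)
data = [(x, 2 * x + 1 + noise.gauss(0, 0.1)) for x in range(40)]

# Form the hypothesis on one part of the data only.
train, held_back = split_holdout(data)
slope, intercept = fit_line(train)

# The held-back measurements act as an experiment that has already been run.
error_on_held_back = mean_squared_error(held_back, slope, intercept)
print(f"slope={slope:.3f} intercept={intercept:.3f} held-back MSE={error_on_held_back:.4f}")
```

If the error on the held-back part is about as small as on the part used for fitting, the hypothesis has survived a test it never saw, which is the "virtual" experiment described above. A real system would of course need far more care about how the data was collected in the first place.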