Slashdot Mirror


Making Science Machine Readable

holy_calamity writes "New Scientist is reporting on a new open source tool for writing up scientific experiments for computers, not humans. Called EXPO, it avoids the many problems computers have with natural language, and can be applied to any experiment, from physics to biology. It could at last let computers do real science - looking at published results and theories for new links and directions."

25 of 135 comments

  1. EXPO has a serious naming problem by plover · · Score: 4, Insightful
    It's virtually hopeless to try to find information about EXPO on Google. You've got the Home Depot Expo site, you've got E3, Macworld Expo, Linuxworld Expo, Book Expo; expositions seem to be coming out of your ears, and if you try to qualify it with helpful keywords such as science and/or language, it seems that every elementary school is hawking their science expos, in addition to documents from historical expos going back to the 1970s and possibly even earlier!

    And forgive me for thinking the university would be more helpful, but no, there's been a series of expos at the University of Aberystwyth, from art through VoIP.

    I'd love to have found more info on the language, but my casual browsing got stopped right there.

    If they'd named it something like EXPI or EXPLO at least it'd be uniquely locatable. Google might whine about the potential misspelling of Expo, but it would dutifully locate the search term as requested.

    --
    John
    1. Re:EXPO has a serious naming problem by mapkinase · · Score: 3, Informative

      Just look by the name of the authors: Ross King and Larisa Soldatova.

      I personally knew Ross from his time in Mike Sternberg's lab, and have only high praise for his intellectual abilities.

      --
      I do not believe in karma. "Funny"=-6. Do good and forbid evil. Yours, Oft-Offtopic Flamebaiting Troll.
    2. Re:EXPO has a serious naming problem by dfedfe · · Score: 3, Insightful

      'Tis a good point. But a search for 'expo science ontology' (without the single quotes) brings up a little bit. Here is a pdf of a presentation on EXPO that explains a bit more than TFA.

    3. Re:EXPO has a serious naming problem by marekrud · · Score: 2, Funny

      Have you tried to search for `LaTeX'? ;)

  2. ok .... by icepick72 · · Score: 4, Funny
    Let's look at one simple human English-speaking scenario:

    Human: No Computer, Do NOT launch missile now.

    Computer: Parsing input ...
    Computer: NOT, NOT (launch missile now)

    Computer: Launch initiated ....

    1. Re:ok .... by ch-chuck · · Score: 2, Insightful

      Just the spot for an observation - I think the problem with 'double negatives' has to do with emotional versus logical thinkers. Emotional, or romantic types, see an extra negative as a cumulative emphasis - using a negative twice means a more forceful 'no' than just one. Logical, or classical types, see it as canceling like a mathematical operation. Of course it's not always that clear cut with lots of exceptions, as even an emotional type will read 'not false' as 'true', etc.

      --
      try { do() || do_not(); } catch (JediException err) { yoda(err); }
    2. Re:ok .... by Valar · · Score: 2, Insightful

      It also depends heavily on language. In many languages, repeated negatives are explicitly used to emphasize the negative nature of the phrase. Negatives were even used this way in English until its modernization.

  3. deduction by COMON$ · · Score: 2, Funny

    After which the computers deduce they were actually not created but rather evolved from a lesser society of "users". Sorry, had to make the joke; we all saw it coming :)

    --
    CS: It is all sink or swim...oh and did I mention there are sharks in that water?
  4. Artificial Stupidity now has a use by Mr+Pippin · · Score: 2, Funny

    Wow! Now all that past work on Artificial Stupidity has REAL uses.

    http://www3.sympatico.ca/sarrazip/nasa.html

  5. Wait, what does it do? by Jonboy+X · · Score: 4, Insightful
    The article is kind of unclear. What exactly does EXPO do? At first it seemed to me that the system helped translate the more-or-less natural language format of your average scientific experiment writeup into some other more machine-parsable format, but then I saw this at the bottom of the article:

    King admits that for the moment using EXPO is time-consuming because experimental write-ups must be translated by hand.


    WTF? If you have to manually pre-parse every article that enters the system, it severely limits the rate you can enter information into the database, no?
    --

    "In a 32-bit world, you're a 2-bit user. You've got your own newsgroup, alt.total.loser." -Weird Al
    1. Re:Wait, what does it do? by IDontAgreeWithYou · · Score: 2, Insightful

      Try and keep up. The whole point of EXPO is that computers can't parse a scientific article written in human language. If you could write a piece of software that could parse the original article there would be no point in having EXPO. If everyone starts using EXPO, both for new papers and going back through old ones, you will quickly develop a database that can be used to help streamline future research.

      --
      Finding other idiots on /. that agree with your opinion doesn't make it any less stupid.
    2. Re:Wait, what does it do? by mapkinase · · Score: 2, Informative

      Just read the sentences that follow the quote. It is the same idea that lies behind RSS: the author is responsible for providing results in an EXPO format.

      For automatic data mining from scientific papers, check the leading software on that matter (disclaimer: it is a plug):

      http://ariadnegenomics.com/technology/medscan/

      Currently works for biology, but it is expandable.

      --
      I do not believe in karma. "Funny"=-6. Do good and forbid evil. Yours, Oft-Offtopic Flamebaiting Troll.
  6. Ooo Machine Readable! by mcai8rw2 · · Score: 2, Funny

    Wow...getting a machine to write up your science experiments! Excellent...now all I need is to find one that can type my essays, and show its working in my maths, and I'm sorted! Is this the new era of generating scientists from everyone?


    I need one to clean my clothes, sing to me in the bath, and make sure my house is warm when I come home! Hehhe! Who needs wives...we have UBER_MACHINE

    --
    >>>Scanning for I.D.I.O.T.S. >>>
    >>>I.D.I.O.T.S. FOUND! >>>
  7. "At last" do real science? by w33t · · Score: 5, Informative

    I think that computers have actually been able to do real science for at least a little while already.
    John Koza is a leader in the field of genetic and evolutionary computation. His computers very much do real science. The computers analyze a set of data (observation), they make a series of modifications (hypothesis), they run fitness tests against these modified versions of the data (experiment), then they begin again analyzing these results (back to observation).
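
    As a rough illustration of that observe/modify/test loop, here is a minimal sketch in Python. It is not Koza's genetic programming system (which evolves whole programs and circuits); it is a much simpler mutate-and-select loop over lists of numbers, and the function names, the target values and all parameters are assumptions made up for the example.

```python
import random


def fitness(candidate, target):
    # Higher is better: negative squared error against the target behaviour.
    return -sum((c - t) ** 2 for c, t in zip(candidate, target))


def mutate(candidate, rng, step=0.5):
    # "Hypothesis": a small random modification of an existing candidate.
    child = list(candidate)
    i = rng.randrange(len(child))
    child[i] += rng.uniform(-step, step)
    return child


def evolve(target, population_size=20, generations=200, seed=0):
    rng = random.Random(seed)
    population = [[rng.uniform(-5, 5) for _ in target]
                  for _ in range(population_size)]
    history = []  # best fitness seen at the start of each generation
    for _ in range(generations):
        # "Experiment" + "observation": score every candidate and rank them.
        population.sort(key=lambda c: fitness(c, target), reverse=True)
        history.append(fitness(population[0], target))
        # Keep the better half unchanged, refill with mutated copies of it.
        survivors = population[: population_size // 2]
        children = [mutate(rng.choice(survivors), rng)
                    for _ in range(population_size - len(survivors))]
        population = survivors + children
    population.sort(key=lambda c: fitness(c, target), reverse=True)
    return population[0], history


best, history = evolve([1.0, -2.0, 3.0])
print(best, history[0], history[-1])
```

    Because the better half always survives unchanged, the best fitness can never get worse from one generation to the next, which is the property that lets a blind loop like this creep towards a working design.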

    The computer clusters which John Koza has engineered have created high-pass and low-pass filters when given nothing more than a random assortment of electronic components; even while John himself knew nothing of electronics that would enable him to create such a circuit himself.

    Most impressive is how the computer cluster evolved a new antenna for NASA - when it was completed John was worried that the computer had made some grievous errors because the little antenna looked like a bent paper clip - but it worked!

    And that's science if you ask me. Especially the antenna - the results of experiments can, and seemingly do, often go against "common sense" and give answers which are "unintuitive".

    Perhaps computers will be much better with the next generation physics we're discovering. Perhaps our little numerical darlings are simply better suited to deal with the abstract, multi-dimensional world of what the universe is starting to appear to be.

    (Please pardon my lay and simplified version of the scientific method - but I feel it is a valid interpretation (if overly simplified for minds such as mine ;) )
    --
    Music should be free

    1. Re:"At last" do real science? by miro2 · · Score: 2, Insightful
      And that's science if you ask me. Especially the antenna - the results of experiments can, and seemingly do, often go against "common sense" and give answers which are "unintuitive".


      That's impressive. But it is engineering, not science. When computers start proposing new experiments which will help us understand things unknown, then they will be doing science!

  8. Hmmm by Daniel+Dvorkin · · Score: 3, Insightful

    It seems to me that it's designed to fit experiments into a framework which might not allow for much innovation. The truly great experiments (e.g., Michelson-Morley, Avery-MacLeod-McCarty) required new experimental techniques as well as new hypotheses and tests. We should be very careful not to impose a standard which would limit such experiments (or, more to the point, the ability of the experimenters to get published) in the future.

    Basically what I'd be worried about is the tendency of the tool to become the task. This is something of a problem in my field (biostats) because SAS is so ubiquitous -- often the question becomes "what can SAS tell us about this data set" rather than "what do we want to know from this data set, and what tool should we use to find out?" Fortunately other, more flexible analysis tools (particularly R, which encourages real programming rather than running a set of canned tests) are becoming more common in the field, and so this is starting to change, but it's still a problem.

    It's also a problem that every techie is familiar with -- "We want to do this in $LANGUAGE on $PLATFORM," even when that particular language and platform may be an absolutely terrible choice for the task at hand.

    That being said, it's certainly a potentially useful tool, and I'll be interested to see where it goes. It's just that when I read lines like "Journals could also insist that researchers submit papers in EXPO as well as written normally," I get twitchy.

    --
    The correlation between ignorance of statistics and using "correlation is not causation" as an argument is close to 1.
  9. We've got one thing in common. by ABoerma · · Score: 3, Funny

    FTFA: "Computers are not very good with natural language"

    Neither are most humans. Even if computers could understand natural language, most people still wouldn't be able to convey their ideas correctly.

  10. A quick peek at the SourceForge download... by frankie · · Score: 4, Informative

    ...reveals that EXPO is an OWL schema. Exactly as described, it's an attempt to regularize the content of experimental design into machine readable form (XML). So any discussion of whether EXPO is a good idea or not really hinges on whether you think OWL is a good idea or not.
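
    For anyone who hasn't looked inside an OWL file: it is ordinary RDF/XML, so even a stock XML parser can pull out the class hierarchy. A minimal sketch in Python follows. The three namespace URIs are the standard W3C ones, but the class names (Experiment, FactorialExperiment, Hypothesis) are made up for the example and are not taken from the actual EXPO download.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
RDFS = "http://www.w3.org/2000/01/rdf-schema#"
OWL = "http://www.w3.org/2002/07/owl#"

# Hypothetical ontology fragment; the class names are illustrative only.
DOC = f"""<rdf:RDF xmlns:rdf="{RDF}" xmlns:rdfs="{RDFS}" xmlns:owl="{OWL}">
  <owl:Class rdf:ID="Experiment"/>
  <owl:Class rdf:ID="FactorialExperiment">
    <rdfs:subClassOf rdf:resource="#Experiment"/>
  </owl:Class>
  <owl:Class rdf:ID="Hypothesis"/>
</rdf:RDF>"""


def class_parents(xml_text):
    """Map each top-level owl:Class to its declared parent class (or None)."""
    root = ET.fromstring(xml_text)
    parents = {}
    for cls in root.findall(f"{{{OWL}}}Class"):
        name = cls.get(f"{{{RDF}}}ID")
        sup = cls.find(f"{{{RDFS}}}subClassOf")
        if sup is None:
            parents[name] = None
        else:
            # rdf:resource="#Experiment" -> "Experiment"
            parents[name] = sup.get(f"{{{RDF}}}resource").lstrip("#")
    return parents


print(class_parents(DOC))
```

    A real OWL toolchain does far more than this (reasoning, property restrictions, imports), so treat the snippet as a peek at the file format rather than a way to work with the ontology.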

  11. Re:I don't mean to sound like a conspiricy theoris by geoffspear · · Score: 2, Insightful

    How do you think the mechanism behind doing calculations can possibly stop the evolution of "human understanding"? And how is having a computer doing the calculations qualitatively different from having a human do them, except that the computer (if programmed correctly, of course) doesn't make mistakes?

    --
    Don't blame me; I'm never given mod points.
  12. Re:I don't mean to sound like a conspiricy theoris by hunterx11 · · Score: 2, Insightful

    Without intuition, Newton would never have discovered gravity. Intuition is not a means for conducting experiments, but it is essential in order to determine what experiments to conduct.

    --
    English is easier said than done.
  13. What's going on? by golodh · · Score: 5, Informative
    The New Scientist article was clear enough but a little short on technical detail. Note: I didn't know any of this until I read the article, so my comments are based on nothing more than a few minutes of experience.

    What is it?

    EXPO is a piece of software (written in a formal language called "owl", but they didn't tell you that), which provides a formal dictionary especially for experiments. The terms in this dictionary let you describe your experiment in a formal way. That's a bit messy, but then you're supposed to use an editor to help you. An editor for this language (called "protégé") can be found at http://protege.stanford.edu/index.html. Download it (61 Mb., or 31 Mb. without the JVM) and use it to read the EXPO document.

    What's it good for (in principle)?

    Once an experiment is described in the OWL language using this dictionary, it can be searched automatically. You could automate queries such as "list me all published 3-factor experiments that test Ohm's law". Or "give me all 2-factor experiments that deal with lung-cancer, smoking, and gender and that use tomography as a diagnostic instrument".
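
    To make the idea concrete, here is a minimal sketch in Python of what such a query looks like once the descriptions are structured. The record fields (n_factors, topics, instruments) and the sample records are invented for the example; they are not EXPO's actual vocabulary, and a real system would query the OWL data itself rather than a Python list.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ExperimentRecord:
    # Hypothetical fields standing in for terms from a formal dictionary.
    title: str
    n_factors: int
    topics: frozenset
    instruments: frozenset = frozenset()


def find(records, n_factors=None, topics=(), instruments=()):
    """Return records matching the factor count and containing every
    requested topic and instrument."""
    wanted_topics = set(topics)
    wanted_instruments = set(instruments)
    return [
        r for r in records
        if (n_factors is None or r.n_factors == n_factors)
        and wanted_topics <= r.topics
        and wanted_instruments <= r.instruments
    ]


# Made-up records, purely for illustration.
RECORDS = [
    ExperimentRecord("record A", 3, frozenset({"ohms-law"})),
    ExperimentRecord("record B", 2, frozenset({"lung-cancer", "smoking", "gender"}),
                     frozenset({"tomography"})),
    ExperimentRecord("record C", 2, frozenset({"lung-cancer", "smoking"}),
                     frozenset({"x-ray"})),
]

print([r.title for r in find(RECORDS, n_factors=3, topics={"ohms-law"})])
print([r.title for r in find(RECORDS, n_factors=2,
                             topics={"lung-cancer", "smoking", "gender"},
                             instruments={"tomography"})])
```

    The matching is exact on shared terms, which is the whole benefit: nobody has to guess which synonym or which language the author used.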

    Now at the moment you can do that too, but you'd have to spend quite a bit of time and know quite a bit about the field to be able to do this because you won't be able to do a full-text search (thanks to the publishers of scientific journals for this). And then you'd find that not everyone uses the same terms, and then you'll find only English-language results because you wouldn't know how to spell "lung-cancer" or "2-factor experiment" in Spanish, French, German, Chinese, Japanese or whatever, but then again neither can many foreign language authors spell it in English (which doesn't ever seem to stop them from publishing however).

    Such a schema (provided it's universal and standardised like the Dewey decimal system) would allow you to find your way in the fog of language. Unfortunately however, if anything we will probably see lots and lots of different standards ("standards are good ... we should all have one !") and proprietary solutions with "enhancements" and "extensions" (read safeguards against portability).

    What can we expect in the next 3 years?

    Nothing useful, I'm afraid. In theory it's great but don't hold your breath. Any author would have to download an OWL editor, understand the editor, understand the formal language used, and then code up his/her article in OWL using the EXPO dictionary, and submit it (in electronic form) along with his article. Good luck to you authors! Let's just hope no-one makes any tiny but significant mistake in describing their experiment, and that all authors take the time to learn this formal language and then use it.

    If within the next 10 years any significant amount (say more than 5% of all publications) annually will be coded in such a schema I'd be more than surprised.

  14. Key Aim by pr0f3550rcha05 · · Score: 3, Insightful

    Another (perhaps the most important) goal for this type of research is a bit more subtle than replacing the Hypothesis->Experiment->Analysis->Hypothesis sequence (Scientific Method) by computers. There will still be many experiments for which human insight is the best tool for deciding a possibly fruitful idea. However, humans (i.e. grad students, who often might suggest 'workhorse' as a better nominative) are not only slower at data analysis, we are severely limited in our abilities to 'see' patterns and correlations in very high dimension data. This has traditionally limited hypotheses to extensions/reworkings of the proposed process at work in a single experiment. If computers have access to both the data and a weighted list of most likely hypotheses for subsets of the entire oeuvre on a specific subject, they could run statistical classification and pattern matching algorithms to suggest new hypotheses based on immense amounts of information. Some of these may involve a large number of variables or inputs, but there are three very significant possibilities that make this research (and certainly other projects involved in similar applications) highly significant:
    1) These complicated hypotheses could still be tested relatively easily by human scientists because most computer suggestion systems for new hypothesis possibilities would likely suggest a few tests that would help to support/disprove these new hypotheses.
    2) Even more simplification comes from the fact that experiments may not need to be repeated nearly as much as they do now in order to make a hypothesis -- there is an incredible amount of data already gathered, and typical AI/pattern matching algorithms keep some of the data back for testing later. If the system finds a possible hypothesis on some level, experiments as to that concept's validity have essentially already been done in a virtual sense.
    3) If the somewhat positivist version of current thought in physics http://www.toequest.com/, mathematics, chaos theory, complexity theory, cellular automata http://www.wolframscience.com/nksonline/toc.html, etc. is even vaguely valid, it is quite possible that, despite the complexity and dimensionality of the input data, the 'best' hypotheses developed even by purely automated means might still be simple and elegant and/or even yield insight into possible explanatory processes rather than just statistical indicators. This would be a valuable and beautiful victory for humanism and the importance of science as a truly elegant description of the world around us.
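
    The "keep some of the data back" idea from point 2 can be sketched in a few lines of Python. This is a toy under stated assumptions: the "hypothesis" is just a straight line fitted by least squares, the data are synthetic (generated from y = 2x + 1 plus a little noise), and the function names and the 25% hold-out fraction are choices made for the example.

```python
import random


def fit_line(points):
    # Ordinary least squares for y = slope * x + intercept.
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def mean_squared_error(points, slope, intercept):
    return sum((y - (slope * x + intercept)) ** 2 for x, y in points) / len(points)


def holdout_check(points, holdout_fraction=0.25, seed=0):
    """Fit on one part of the data, then score the fit on the part kept back."""
    rng = random.Random(seed)
    shuffled = list(points)
    rng.shuffle(shuffled)
    n_holdout = max(1, int(len(shuffled) * holdout_fraction))
    held_out, training = shuffled[:n_holdout], shuffled[n_holdout:]
    slope, intercept = fit_line(training)
    return slope, intercept, mean_squared_error(held_out, slope, intercept)


# Synthetic "already gathered" data: y = 2x + 1 with small uniform noise.
noise = random.Random(1)
data = [(float(x), 2.0 * x + 1.0 + noise.uniform(-0.1, 0.1)) for x in range(40)]
slope, intercept, error = holdout_check(data)
print(slope, intercept, error)
```

    The held-out points play the role of the experiment that has "essentially already been done": the hypothesis never saw them while it was being formed, so a small error on them counts as a genuine (if modest) test.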

  15. Re:I don't mean to sound like a conspiricy theoris by vertinox · · Score: 2, Informative

    But what happens if we get to the point where all of science is automated by computer?

    We get a Technological Singularity.

    --
    "I am the king of the Romans, and am superior to rules of grammar!"
    -Sigismund, Holy Roman Emperor (1368-1437)
  16. Re:the edge is always fuzzy by uid7306m · · Score: 2, Interesting

    Darn right! The universe does not fall into
    hard-edged classes, at least not often.
    Some good classes like "protons" and
    "neutron stars" exist, of course, but
    concepts like "words" and "species" are
    intrinsically fuzzy if you think about them
    long enough.

    Same with experiments. Let's take a Linguistic
    example: deciding whether or not a sentence is
    grammatically correct. You can do this experiment
    in several ways:

    1) Give the person a sentence, a library, and
    some paper. Let them take as long as they want.

    2) Or, we can make it more like a conversation:
    read them the sentence, and put a time limit on
    it. In real speech, you have about a second
    to understand a sentence, so we only accept
    a "yes" or "no" if it happens within a second.

    3) Make it into a reaction-time experiment.
    Get them to hit a yes button or a no button
    and measure how long it takes.

    The point is, you can do dozens of variants of any
    experiment, and any ontology will lump together
    some things that are different in some important
    way, or (alternatively) will split apart some
    experiments that have critical similarities.

    Likewise for data analysis.

    Personally, I feel that Linguistics has been held
    back for about two decades by linguists' expectation
    that everything falls into nice categories.
    I'd hate for the same thing to happen to other fields.

    Just think of the Dewey Decimal system: that's an
    ontology, and like all ontologies, it puts the
    dividing lines in the wrong place.

  17. Science Machine! by autophile · · Score: 3, Funny
    I agree completely! Science Machine should be totally readable. If it isn't readable, where will we get our daily fix of Science? Not from Science Machine, that's for sure!

    All hail Science Machine!

    --Rob

    --
    Towards the Singularity.