Making Science Machine Readable

EXPO has a serious naming problem by plover · 2006-06-07 03:22 · Score: 4, Insightful

It's virtually hopeless to try to find information about EXPO on Google. You've got the Home Depot Expo site, you've got E3, Macworld Expo, Linuxworld Expo, Book Expo; expositions seem to be coming out of your ears, and if you try to qualify it with helpful keywords such as science and/or language, it seems that every elementary school is hawking their science expos, in addition to documents from historical expos going back to the 1970s and possibly even earlier!

And forgive me for thinking the university would be more helpful, but no, there's been a series of expos at the University of Aberystwyth, from art through VoIP.

I'd love to have found more info on the language, but my casual browsing got stopped right there.

If they'd named it something like EXPI or EXPLO at least it'd be uniquely locatable. Google might whine about the potential misspelling of Expo, but it would dutifully locate the search term as requested.

--
John

Re:EXPO has a serious naming problem by FudRucker · 2006-06-07 03:25 · Score: 0, Redundant

I noticed that too. Hopefully this story will be duped when EXPO gets a detailed home page; so far the home page shows up as empty...

--
Politics is Treachery, Religion is Brainwashing
Re:EXPO has a serious naming problem by mapkinase · 2006-06-07 03:38 · Score: 3, Informative

Just look by the name of the authors: Ross King and Larisa Soldatova.

I knew Ross personally from his time in Mike Sternberg's lab, and have only high praise for his intellectual abilities.

--
I do not believe in karma. "Funny"=-6. Do good and forbid evil. Yours, Oft-Offtopic Flamebaiting Troll.
Re:EXPO has a serious naming problem by dfedfe · 2006-06-07 03:51 · Score: 3, Insightful

'Tis a good point. But a search for 'expo science ontology' (without the single quotes) brings up a little bit. Here is a pdf of a presentation on EXPO that explains a bit more than TFA.
Re:EXPO has a serious naming problem by plover · 2006-06-07 04:06 · Score: 1

That's all well and good for you, but when you're not personally acquainted with Ross and are simply trying to do your job and get your research into EXPO format, your first entry into Google is not going to be "expo king soldatova ontology". Trust me on this one.
With any luck, there will eventually be tools to use the language that will have their own names, and we can hope those will serve to disambiguate EXPO.

--
John
Re:EXPO has a serious naming problem by marekrud · 2006-06-07 04:13 · Score: 2, Funny

Have you tried to search for `LaTeX'? ;)
Re:EXPO has a serious naming problem by Loconut1389 · 2006-06-07 04:50 · Score: 1

don't forget dry erase markers!
Re:EXPO has a serious naming problem by mapkinase · 2006-06-07 06:51 · Score: 1

I agree that naming probably did not take into account googling, but those names are in the article, as far as I remember.

--
I do not believe in karma. "Funny"=-6. Do good and forbid evil. Yours, Oft-Offtopic Flamebaiting Troll.
Re:EXPO has a serious naming problem by Anonymous Coward · 2006-06-07 06:53 · Score: 1, Informative

I couldn't find much about EXPO but I found some previous work.

They have a publication in Nature biotech: The failure of many bio-ontologies to follow international standards for ontology design and description is hampering their application and threatens to restrict their future use.
http://www.nature.com/nbt/journal/v23/n9/abs/nbt0905-1095.html;jsessionid=873A8C7D8ADA6CD6B7ABB60E1E640D45

They discuss microarray experiments.

Microarray experiments are interesting for the massive amount of data they produce and what you can get out of them. In a microarray experiment, you look at all the mRNA transcripts generated in an organism under specific conditions. You get a whole lot of data from this experiment, and often the researchers are only interested in one specific question and the rest of the data goes to waste. However, when the data is standardized and made available, other researchers can look at the same data with a different question, or look across multiple datasets with standardized data. These are massive data sets, and whether other people and groups can use the data (or you can use it in a different way) depends on standardization.

Right now, to find other research, you do a text search for a name you know. But what if someone is doing a very similar experiment with a different set of proteins that have a different name? If you could search the structure of the experiment instead of just the text, you could conceivably pull relevant information that you didn't know about.
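
To make that concrete, here is a minimal sketch of searching by structure rather than by text. Everything in it is an assumption for illustration: the `Experiment` fields, the `find_similar` helper, and the three toy records are made up and have nothing to do with the actual EXPO schema.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Experiment:
    """Hypothetical structured record; the field names are not EXPO's."""
    title: str
    method: str         # e.g. "microarray"
    organism: str
    condition: str
    targets: frozenset  # genes or proteins studied


def find_similar(query, records):
    """Return records sharing the query's method and condition,
    whatever their targets happen to be called."""
    return [r for r in records
            if r is not query
            and r.method == query.method
            and r.condition == query.condition]


# Toy records, invented purely for the example.
records = [
    Experiment("Heat shock response in yeast", "microarray",
               "S. cerevisiae", "heat shock", frozenset({"HSP104", "SSA1"})),
    Experiment("Thermal stress in fruit fly", "microarray",
               "D. melanogaster", "heat shock", frozenset({"Hsp70Aa"})),
    Experiment("Yeast growth on galactose", "growth assay",
               "S. cerevisiae", "galactose", frozenset({"GAL1"})),
]

hits = find_similar(records[0], records)
print([r.title for r in hits])
```

A text search for "HSP104" would never turn up the fruit fly record, but matching on method and condition does, because the comparison ignores what the targets are named.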

Interestingly, King has another paper:
http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&db=pubmed&dopt=Abstract&list_uids=14724639&query_hl=5&itool=pubmed_docsum

Functional genomic hypothesis generation and experimentation by a robot scientist.
King RD, Whelan KE, Jones FM, Reiser PG, Bryant CH, Muggleton SH, Kell DB, Oliver SG.

Department of Computer Science, University of Wales, Aberystwyth SY23 3DB, UK.

The question of whether it is possible to automate the scientific process is of both great theoretical interest and increasing practical importance because, in many scientific areas, data are being generated much faster than they can be effectively analysed. We describe a physically implemented robotic system that applies techniques from artificial intelligence to carry out cycles of scientific experimentation. The system automatically originates hypotheses to explain observations, devises experiments to test these hypotheses, physically runs the experiments using a laboratory robot, interprets the results to falsify hypotheses inconsistent with the data, and then repeats the cycle. Here we apply the system to the determination of gene function using deletion mutants of yeast (Saccharomyces cerevisiae) and auxotrophic growth experiments. We built and tested a detailed logical model (involving genes, proteins and metabolites) of the aromatic amino acid synthesis pathway. In biological experiments that automatically reconstruct parts of this model, we show that an intelligent experiment selection strategy is competitive with human performance and significantly outperforms, with a cost decrease of 3-fold and 100-fold (respectively), both cheapest and random-experiment selection.

I couldn't see big leaps of innovation coming from this kind of experimentation, but there is a lot of basic grunt work done in research that this system could automate.
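
For anyone wondering what the hypothesize, experiment, falsify, repeat cycle from that abstract looks like in code, here is a toy sketch. It is not the authors' system: the gene names, the `run_experiment` stand-in for the lab robot, and the naive "try candidates in order" strategy are all assumptions for illustration, and it makes no attempt at the intelligent experiment selection the paper describes.

```python
def run_experiment(knocked_out, true_gene):
    """Stand-in for the lab robot: the strain fails to grow only
    when the (hidden) essential gene is the one deleted."""
    return "no growth" if knocked_out == true_gene else "growth"


def predict(hypothesis, knocked_out):
    """What the hypothesis 'this gene is the essential one'
    predicts for a given knockout."""
    return "no growth" if hypothesis == knocked_out else "growth"


def discover(candidates, true_gene):
    """Run knockouts until only one hypothesis survives."""
    hypotheses = set(candidates)
    experiments_run = 0
    for gene in candidates:
        if len(hypotheses) <= 1:
            break  # nothing left to distinguish
        observed = run_experiment(gene, true_gene)
        experiments_run += 1
        # Falsify every hypothesis whose prediction disagrees with the data.
        hypotheses = {h for h in hypotheses if predict(h, gene) == observed}
    return hypotheses, experiments_run


survivors, n = discover(["geneA", "geneB", "geneC", "geneD"], true_gene="geneC")
print(survivors, n)
```

The interesting part in the real system is presumably the choice of which experiment to run next; here that choice is just list order, which is roughly the "cheapest" or "random" baseline the abstract compares against.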
Re:EXPO has a serious naming problem by Yeochee · 2006-06-07 07:25 · Score: 1

Well, yes...
The first result is to "The LaTeX Home Page". 9 of the first 10 links are related to the typesetting software, the 10th is a link to an article on wikipedia.
Very disappointing :)
(this is on google.be)
Re:EXPO has a serious naming problem by David+Gould · 2006-06-07 12:50 · Score: 1

trying to do your job and get your research into EXPO format, your first entry into Google is not going to be "expo king soldatova ontology". Trust me on this one.
True, but it's also true that most real scientists have research skills that exceed typing one word into Google. It may be an issue, but probably not that much of one.

--
David Gould
main(i){putchar(340056100>>(i-1)*5&31|!!(i<6)<< 6)&&main(++i);}
Re:EXPO has a serious naming problem by munpfazy · 2006-06-07 15:00 · Score: 1

It's almost as dumb as a browser named links, or a programming language named C.

I don't disagree with you; however, if EXPO becomes popular, it probably won't remain hard to find for long.

As far as I can tell, there just isn't much information out there about it. Even using authors' names and lots of keywords, I can't find much of anything except a single pdf of conference slides (which are totally useless without the accompanying audio.)
Re:EXPO has a serious naming problem by syousef · 2006-06-07 20:02 · Score: 1

You're not finding anything because there's not much there. At the moment, from what I can tell, it's just a single XML file with a .OWL extension - a description of a file format. Now the New Scientist article may have more or perhaps they just haven't released more yet, but the reason you're not finding stuff is that there's not much to find on the web just now.

--
These posts express my own personal views, not those of my employer

XML? by flumps · 2006-06-07 03:22 · Score: 1, Insightful

What's wrong with XML, goddamnit?

--
"So there he is, risen from the dead. Like that fella, E. T." - Father Ted Crilly

Re:XML? by Reverend528 · 2006-06-07 03:27 · Score: 1

It's too verbose to be easily edited by humans?

--
Badass Resumes
Re:XML? by ponden · 2006-06-07 03:47 · Score: 1

The problem is how to get writers to submit their research results in XML format.

>King admits that for the moment using EXPO is time-consuming because
>experimental write-ups must be translated by hand.

The critical part of this problem has not been solved.
Researchers have little motivation to submit their results in a public standard format.
Re:XML? by neonprimetime · 2006-06-07 03:54 · Score: 1

Wouldn't the obvious solution be to write a User-Friendly UI (front-end) that computer-illiterate scientists could utilize ... and then the front-end is designed to spit out XML?
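
A rough sketch of the "front-end spits out XML" half, assuming the UI hands over a plain dict of form fields. The element names (`experiment`, `hypothesis`, and so on) and the `form_to_xml` helper are made up for illustration; they are not EXPO's vocabulary, which is defined in its OWL file.

```python
import xml.etree.ElementTree as ET


def form_to_xml(fields):
    """Turn a dict of form fields into an XML string.
    Element names are hypothetical, not EXPO's."""
    root = ET.Element("experiment")
    for name, value in fields.items():
        child = ET.SubElement(root, name)
        child.text = str(value)
    return ET.tostring(root, encoding="unicode")


# What a filled-in form might hand to the back end.
form = {
    "hypothesis": "Gene X is required for growth without tryptophan",
    "organism": "Saccharomyces cerevisiae",
    "result": "no growth",
}
xml_text = form_to_xml(form)
print(xml_text)
```

The scientist only ever sees the form; the hard part is still deciding which fields the form needs so that nothing important about the experiment gets lost.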
Re:XML? by Marc2k · 2006-06-07 04:13 · Score: 1, Redundant

Exactly [sort of]. All the while I was reading this article, I couldn't shake the feeling of: this faces some of the exact same [hard] problems faced by the Semantic Web. This is a great way to get computers to understand the semantics of scientific experiments...when everyone's using the format, which probably isn't as expressive as is necessary in all cases, and invariably isn't bulletproof yet. It's that same chicken-and-egg problem, where the only benefits to using this system are seen when a large number of people use the system, which many people won't simply because there are no software tools available yet (or probably for a long while), and it's a time-intensive process.

--
--- What
Re:XML? by espressojim · 2006-06-07 06:26 · Score: 1

Better, let's see if they can input a large corpus, then do any real reasoning with it.

Heck, take 30 papers of some time, and produce anything we don't already know.

My co-worker is playing with OWL/etc. I'm still skeptical about it, but we'll see...
Re:XML? by treeves · 2006-06-07 09:31 · Score: 1

Uh, it seems that "Malformed URLs don't redirect to Microsoft" but to www.w3.org as they should according to the outdated link in your sig. Mod -1 Offtopic ;-)

--
...the future crusty old bastards are already drinking the Kool-Aid.
Re:XML? by Reverend528 · 2006-06-07 12:05 · Score: 1

Thanks, I'll change it.

--
Badass Resumes
Re:XML? by SEWilco · 2006-06-07 16:29 · Score: 1

It is XML. It uses OWL.

I don't mean to sound like a conspiricy theorist.. by mikecardii · 2006-06-07 03:22 · Score: 1

But what happens if we get to the point where all of science is automated by computer? I think that one of the most "endearing" qualities (if you will) of science is the possibility of human error. (ps. lawl, my first slashdot post)

ok .... by icepick72 · 2006-06-07 03:23 · Score: 4, Funny

Let's look at one simple human English-speaking scenario

H: No Computer, Do NOT launch missile now.

Computer: Parsing input ...
Computer: NOT, NOT (launch missile now)

Computer: Launch initiated ....

Re:ok .... by ch-chuck · 2006-06-07 03:37 · Score: 2, Insightful

Just the spot for an observation - I think the problem with 'double negatives' has to do with emotional versus logical thinkers. Emotional, or romantic types, see an extra negative as a cumulative emphasis - using a negative twice means a more forceful 'no' than just one. Logical, or classical types, see it as canceling like a mathematical operation. Of course it's not always that clear cut with lots of exceptions, as even an emotional type will read 'not false' as 'true', etc.

--
try { do() || do_not(); } catch (JediException err) { yoda(err); }
Re:ok .... by Valar · 2006-06-07 03:44 · Score: 2, Insightful

It also depends heavily on language. In many languages, repeated negatives are explicitly used to emphasize the negative nature of the phrase. Negatives were even used this way in English until its modernization.

--

====
Crudely Drawn Games
Re:ok .... by Elvis+Parsley · 2006-06-07 03:54 · Score: 1

And they're still used in colloquial English that way, even if strict grammarians disapprove.
Re:ok .... by Anonymous Coward · 2006-06-07 04:01 · Score: 1, Funny

No they weren't. ; )
Re:ok .... by Anonymous Coward · 2006-06-07 04:05 · Score: 0

Reminds me of "NO MONEY DOWN" from Simpsons... that turned into "NO, MONEY DOWN!"
Re:ok .... by Anonymous Coward · 2006-06-07 04:07 · Score: 0

See this comic.
Re:ok .... by gEvil+(beta) · 2006-06-07 04:08 · Score: 1

Works on commission? No, money down!

--
This guy's the limit!
Re:ok .... by ceoyoyo · 2006-06-07 05:02 · Score: 1

It also depends on punctuation. "Not! False!" for example, means false. Not false! means true.

The original example "No computer, do not launch" means do not launch, even grammatically.
Re:ok .... by coopex · 2006-06-07 11:00 · Score: 1

You should probably remove that Bar association logo too...

--
The road to hell is paved with good intentions.

hmm.. by bigattichouse · 2006-06-07 03:23 · Score: 1, Funny

Unfortunately a machine won't look at something and say "Should this be done?" A human free world is very pretty, but rather dull. Thermonuclear destruction Hypothesis proven. But where can I get a good drink, and dance with a pretty girl?

--
meh

Re:hmm.. by sendtwogrey · 2006-06-07 03:49 · Score: 1

Take it that you have never headed up a team within an international corporation or law company then.
Re:hmm.. by gardyloo · 2006-06-07 06:36 · Score: 1

Unfortunately a machine won't look at something and say "Should this be done?" A human free world is very pretty, but rather dull. Thermonuclear destruction Hypothesis proven. But where can I get a good drink, and dance with a pretty girl?

The danger, though, is when the pretty girls get such machines and decide that thermonuclear destruction is a pretty damned good alternative to dancing with us.

deduction by COMON$ · 2006-06-07 03:23 · Score: 2, Funny

After which the computers deduce they were actually not created but rather evolved from a lesser society of "users". sorry had to make the joke, we all saw it coming :)

--
CS: It is all sink or swim...oh and did I mention there are sharks in that water?

Re:deduction by sendtwogrey · 2006-06-07 04:16 · Score: 1

I thought that the whole "intelligent design" thing was concluded with the following results:

The First Day : The first recorded Words of Babbage that we have are "let there be electron flow"

The Second Day : The separation of silicon from the sands.

The Third Day : The first appearance of the wafers.

The Fourth Day : With the platform now clear, the OS, UPS and HUB were visible.

The Fifth Day : Great numbers of 0's and 1's flickered and Turing

The Sixth Day : Vast numbers of programs became

The Seventh Day : The Gamers Day. "By the seventh day Babbage had finished the work he had been doing; so on the seventh day he rested from all his work. And Babbage blessed the seventh day and made it holo, because on it he rested [or ceased] from all the work of programming that he had done.

Artificial Stupidity now has a use by Mr+Pippin · 2006-06-07 03:27 · Score: 2, Funny

Wow! Now all that past work on Artificial Stupidity has REAL uses.

http://www3.sympatico.ca/sarrazip/nasa.html

Re:I don't mean to sound like a conspiricy theoris by thePig · 2006-06-07 03:29 · Score: 1

Not all of science can _ever_ get automated by computer (at least not until actual AI comes along).

Science, especially the pure sciences, needs a lot of intuition and, many times, an understanding far above and different from that of others.

It is impossible as far as computers are concerned... (unless self-aware self-modifying programs come along??)

This will help in routine checks and scientific experiments.. that is all.

--
rajmohan_h@yahoo.com

Wait, what does it do? by Jonboy+X · 2006-06-07 03:29 · Score: 4, Insightful

The article is kind of unclear. What exactly does EXPO do? At first it seemed to me that the system helped translate the more-or-less natural language format of your average scientific experiment writeup into some other more machine-parsable format, but then I saw this at the bottom of the article:

King admits that for the moment using EXPO is time-consuming because experimental write-ups must be translated by hand.

WTF? If you have to manually pre-parse every article that enters the system, it severely limits the rate you can enter information into the database, no?

--

"In a 32-bit world, you're a 2-bit user. You've got your own newsgroup, alt.total.loser." -Weird Al

Re:Wait, what does it do? by IDontAgreeWithYou · 2006-06-07 03:42 · Score: 2, Insightful

Try and keep up. The whole point of EXPO is that computers can't parse a scientific article written in human language. If you could write a piece of software that could parse the original article there would be no point in having EXPO. If everyone starts using EXPO, both for new papers and going back through old ones, you will quickly develop a database that can be used to help streamline future research.

--
Finding other idiots on /. that agree with your opinion doesn't make it any less stupid.
Re:Wait, what does it do? by mapkinase · 2006-06-07 03:43 · Score: 2, Informative

Just read the next sentences to the quote. It is the same idea that lies behind RSS: the author is responsible for providing results in an EXPO format.

For automated data mining from scientific papers, check the leading software on that matter (disclaimer: it is a plug):

http://ariadnegenomics.com/technology/medscan/

Currently works for biology, but it is expandable.

--
I do not believe in karma. "Funny"=-6. Do good and forbid evil. Yours, Oft-Offtopic Flamebaiting Troll.
Re:Wait, what does it do? by Bogtha · 2006-06-07 03:44 · Score: 1

Think about that a little more. If the original write-up is ambiguous, and the goal is to express the write-up unambiguously, how do you expect the software to interpret the source material if it's ambiguous to begin with?

The way I understand this is that it is simply a write-up format. It's not meant to make your write-ups unambiguous by itself, just provide a format in which you can do so.

Think of it as similar to EBNF for syntax. Software doesn't exist to read a specification and deduce the EBNF for it, but that doesn't mean there isn't value in a standardised, unambiguous, machine-readable format.

--
Bogtha Bogtha Bogtha
Re:Wait, what does it do? by Niebieski · 2006-06-07 03:47 · Score: 1

WTF? If you have to manually pre-parse every article that enters the system, it severely limits the rate you can enter information into the database, no?

Yes, but it does make future iterations of the same experiment faster. In order to be valid, experiments must be reproducible. Translate once, use many.

Ooo Machine Readable! by mcai8rw2 · 2006-06-07 03:33 · Score: 2, Funny

Wow...getting a machine to write up your science experiments! Excellent...now all i need is to find one that can type my essays, and show its working in my maths, and I'm sorted! Is this the new era of generating scientists from everyone!

I need one to clean my clothes, sing to me in the bath, and make sure my house is warm when I come home! Hehhe! Who needs wives...we have UBER_MACHINE

--
>>>Scanning for I.D.I.O.T.S. >>>
>>>I.D.I.O.T.S. FOUND! >>>

Re:Ooo Machine Readable! by neonprimetime · 2006-06-07 03:57 · Score: 1

now all i need is to find one that can type my essays

Try a random paragraph generator.
Re:Ooo Machine Readable! by stormi · 2006-06-07 04:18 · Score: 0

you need someone to sing to you in the bath? that's weird man....

--
"if only i had known i would have been a locksmith." -albert einstein
Re:Ooo Machine Readable! by vadim_t · 2006-06-07 06:37 · Score: 1

I tried with "compiler" and "gcc", and this is what I got:

Compiler vanishes past the transmitted skill. The imperative walks Compiler across the wrecker. Compiler milks GCC. The scarce symptom reverts throughout GCC. Compiler breaches a coin behind the uncertain knight.

GCC undertakes Compiler. Why does GCC base the amazed supplier? Compiler blames GCC under the psychologist. Compiler bangs GCC against the mercury. The producer strikes Compiler. GCC fingers Compiler.

Very funny, but it has a long way to go.
Re:Ooo Machine Readable! by newt0311 · 2006-06-07 07:25 · Score: 0

and then we have... the planet earth before the Titans took over.
for ref: read the first prequel trilogy of Dune.
Re:Ooo Machine Readable! by mcai8rw2 · 2006-06-08 02:05 · Score: 1

I work in SEO and have to outsource a lot of content writing to another company because...well...because I'm too damn good to be a typing monkey.

It has been my holy grail to find a piece of software that will take a list of keyphrases and output a given amount of textual content.

This random paragraph generator is pretty close. Yes it's gibberish, but the advantage is that it is grammatical gibberish...which makes all the difference. Tra la la.

--
>>>Scanning for I.D.I.O.T.S. >>>
>>>I.D.I.O.T.S. FOUND! >>>

Re:I don't mean to sound like a conspiricy theoris by plover · 2006-06-07 03:33 · Score: 1

But "endearing" isn't a quantifiable measure of science. Neither is "cute", "ugly", "republican", "democrat", "conservative" or "liberal".

Science is supposed to be about facts. If the machine can produce them without bias, I should think that makes the output more reliable (yes, I know you can only trust it as far as the input.) But by automating the process, it introduces "repeatability" which is always a good thing.

--
John

Re:I don't mean to sound like a conspiricy theoris by fyngyrz · 2006-06-07 03:34 · Score: 1, Funny

What we need to do is to hand over religion to computers. That way, we won't have to deal with it any more. They can just run it in the background as a time-wasting task. And the simplicity of the program is beauty itself. Just an unending stream of divide-by-zeros, followed by traps which the computer ignores, see, and then just picks up where it left off. Has to be run at a low priority, but that just adds more realism...

--
I've fallen off your lawn, and I can't get up.

"At last" do real science? by w33t · 2006-06-07 03:34 · Score: 5, Informative

I think that computers have actually been able to do real science for at least a little while already.
John Koza is a leader in the field of genetic and evolutionary computation. Very much his computer's do real science. The computers analyze a set of data (observation), they make a series of modifications (hypothesis), they run fitness tests against these modified versions of the data (experiment), then they begin again analyzing these results (back to observation).

The computer clusters which John Koza has engineered have created high-pass and low-pass filters when given nothing more than a random assortment of electronic components; even while John himself knew nothing of electronics that would enable him to create such a circuit himself.

Most impressive is how the computer cluster evolved a new antenna for NASA - when it was completed John was worried that the computer had made some grievous errors because the little antenna looked like a bent paper clip - but it worked!

And that's science if you ask me. Especially the antenna - the results of experiments can, and seemingly do, often go against "common sense" and give answers which are "unintuitive".

Perhaps computers will be much better with the next generation physics we're discovering. Perhaps our little numerical darlings are simply better suited to deal with the abstract, multi-dimensional world of what the universe is starting to appear to be.

(Please pardon my lay and simplified version of the scientific method - but I feel it is a valid interpretation (if overly simplified for minds such as mine ;) )
--
Music should be free

--
My Computer Music Tutorial Videos

Re:"At last" do real science? by shadwstalkr · 2006-06-07 04:11 · Score: 1

That's not science, that's brute force trial and error. It might be useful, and in some cases even necessary, but avoiding it is exactly why science exists. E.g. we invented mechanics so we don't have to build a million houses and see which ones stand up.
Re:"At last" do real science? by w33t · 2006-06-07 04:15 · Score: 1

You still have to build the house to make sure it stands up.

Trial and error are unavoidable.

I think science and trial and error are inseparable.

But I will certainly agree with you and concede that this is not an efficient way of doing science - but I think it is science nonetheless.
--
Music should be free

--
My Computer Music Tutorial Videos
Re:"At last" do real science? by espressojim · 2006-06-07 04:25 · Score: 1

The problem with some of these approaches is scalability. I'm a bioinformatician, and I've seen a few talks on this sort of technique. One project I saw an example of was a pathway (ie: gene a turns on gene b which regulates gene d) project, and it worked well, for up to 5-6 items in the pathway. After that, because of the way the algorithms scale, you get into serious problems. The guy presenting stated that "at some point, we'll all have terahertz computers on our desktops, and this will not be a big deal."

Wake me up when that happens (or when quantum/Dna computers that are good at solving massively parallel problems are running).
Re:"At last" do real science? by miro2 · 2006-06-07 04:34 · Score: 2, Insightful

And that's science if you ask me. Especially the antenna - the results of experiments can, and seemingly do, often go against "common sense" and give answers which are "unintuitive".

That's impressive. But it is engineering, not science. When computers start proposing new experiments which will help us understand things unknown, then they will be doing science!
Re:"At last" do real science? by w33t · 2006-06-07 04:47 · Score: 1

Hehe, I must be an engineer eh? 8)

I suppose you are correct in that the computer does not offer an explanation of its findings. Perhaps you could consider the result of an experiment an explanation in itself, but that's stretching it. Results require explanation: they aren't really explanations themselves.

I guess true science might then be considered one part engineering and one part philosophy - or at least the midpoint between them?
--
Music should be free

--
My Computer Music Tutorial Videos
Re:"At last" do real science? by ceoyoyo · 2006-06-07 05:08 · Score: 1

It's not true trial and error because we don't go and build ALL the possible houses. We figure out which one should work, then build it. If it works, great! If not, back to the drawing board.

What you described was true trial and error. Build a random house, see if it stands up. If not, make a random change and see if THAT stands up.

There is some direction in a genetic algorithm but there is a total lack of understanding. Things like that (and other data mining techniques) are great for generating interesting observations but it still takes a human to come along and generalize those into a theory.
Re:"At last" do real science? by oliverthered · 2006-06-07 05:16 · Score: 1

It's not true trial and error because we don't go and build ALL the possible houses. We figure out which one should work, then build it. If it works, great! If not, back to the drawing board.

I suggest you take a look at genetic algorithms.

Basically they take the best results and use them to make better results, just like building houses.

--
thank God the internet isn't a human right.
Re:"At last" do real science? by shadwstalkr · 2006-06-07 05:51 · Score: 1

Basically they take the best results and use them to make better results, just like building houses.

For thousands of years, yes, people have passed down the knowledge of (continuing the example) how to build a house, and the state of the art was advanced slowly by a few adventurous men every generation that tried a small variation in house building; even fewer of them survived the winter. But eventually someone figured out that they could look at the different techniques for building houses and theorize why some houses stand up and some don't, then use that theory to predict which new designs should stand up. Sure, you still have to build a few to test your theory, but it's a lot different (and a lot cheaper) than making essentially random variations on the current best model.

That's what science is about: using our observations of what has happened to predict what will happen. You could argue that genetic algorithms do this implicitly, but without the theoretical model it's essentially a very fast, very extensive build-it-and-see cycle. Genetic algorithms (and similar techniques) are important and useful, but they aren't the most efficient way to solve a lot of problems.
Re:"At last" do real science? by ceoyoyo · 2006-06-07 05:53 · Score: 1

Did you read the rest of my post? Specifically the part that says:

"There is some direction in a genetic algorithm but there is a total lack of understanding."

A house built as the result of a genetic algorithm tells you how to build THAT house. If you look a little deeper it might tell you that houses built with at least a few long wooden beams tend to do better than ones assembled entirely out of plywood. It tells you very little about how to build a skyscraper though.

Now, if you have a person in the loop and that person deduces the principles of mechanics, those DO tell you things about building a skyscraper. Or a car. Or a spaceship.

Yes, I agree building houses is sort of a poor example because it often is done semi-evolutionarily. We call that art though, not science.
Re:"At last" do real science? by Illserve · 2006-06-07 06:47 · Score: 1

Computers can do some science yes, a tiny fraction of the pie. But the idea that computers should be thrown at every branch of science is ludicrous.

Scientists have evolved a reasonably efficient means of communicating over the last few centuries in the form of journal articles and the peer review process. It has its faults but it's working pretty well. The idea that we should abandon all this to translate our work into some machine readable format because some guy thinks it's a good idea is so far beyond silly that I'm glad I was already sitting when I read the article summary.
Re:"At last" do real science? by Anonymous Coward · 2006-06-07 23:35 · Score: 0

Very much his computer's do real science.
Dude... WTF?
Re:"At last" do real science? by oliverthered · 2006-06-09 01:13 · Score: 1

That depends upon what you mean by 'understanding'. If you look at multi-objective evolutionary algorithms, then the algorithm has some understanding of each objective relative to the outcome.

Extend that geometrically and you have an understanding of the system as a whole.

Just because a genetic algorithm has a limited scope doesn't mean that it isn't doing a similar thing to human scientists with a greater scope.

BTW, Maths used to be called an art a few hundred years ago because it's a creative science.

--
thank God the internet isn't a human right.
Re:"At last" do real science? by ceoyoyo · 2006-06-09 06:02 · Score: 1

The algorithm has only the "understanding" you give it. It requires a person to set up the conditions and it requires a person to abstract principles.

Take the example in the article. The GA gave you an antenna, for a specific function. You don't have any more information now about HOW antennas work. You have an antenna, and it works, within the parameters (like frequency and directionality) you specified. Want to build a different antenna with different parameters? Too bad, you've got to run your GA again. Not to mention that the GA isn't actually BUILDING antennas... it's using simulations, derived from those principles that it took a human being to come up with, to simulate how those antennas would be expected to perform.

Scientists work in a way that is very different to things like GAs, and it's not just scope.

Math isn't a science either. Depending on how you look at it math is a tool, a language, an art or a massive construct of philosophy/logic. It is not a science.
Re:"At last" do real science? by oliverthered · 2006-06-11 22:19 · Score: 1

And you have only the understanding you are given, your point is?

--
thank God the internet isn't a human right.
Re:"At last" do real science? by ceoyoyo · 2006-06-12 05:30 · Score: 1

I'm not sure exactly what you mean. If you mean that people only understand what they're taught then that is demonstrably untrue. If it were true then no new theories would be possible. We would have collections of artefacts and no understanding of how they relate to each other.
Re:"At last" do real science? by PrinceOfStorms · 2006-06-12 19:49 · Score: 1

I think that computers have actually been able to do real science for at least a little while already.

The computers analyze a set of data (observation), they make a series of modifications (hypothesis), they run fitness tests against these modified versions of the data (experiment), then they begin again analyzing these results (back to observation).
The modifications in this case are random combinations of existing solutions, where their selection for the next round is weighted based on their performance. This doesn't really seem to be hypothesis generation to me, or particularly scientific. It's basically almost-random search of the solution space with a bias towards local maxima that might eventually lead to an interesting solution if you throw sufficient computational power at the problem. Given that he uses "a 1000 node Beowulf cluster, composed of Pentium II and DEC Alpha processors, to do his research" (http://en.wikipedia.org/wiki/John_Koza/), his approach seems more brute force than scientific to me.
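
For anyone who hasn't seen one, the loop described above (recombine existing solutions at random, pick parents with odds weighted by performance) can be sketched in a few lines of Python. This is only a toy under stated assumptions: the bit-string genome, the `evolve` function, and all its parameters are invented for illustration and have nothing to do with Koza's actual setup or the antenna work.

```python
import random


def evolve(fitness, genome_len=20, pop_size=30, generations=40, seed=0):
    """Toy genetic algorithm over bit strings (illustrative names only)."""
    rng = random.Random(seed)
    # Start from a purely random population.
    pop = [[rng.randint(0, 1) for _ in range(genome_len)]
           for _ in range(pop_size)]
    best = max(pop, key=fitness)
    for _ in range(generations):
        # Selection odds are weighted by performance (+1 avoids all-zero weights).
        weights = [fitness(g) + 1 for g in pop]
        next_pop = []
        for _ in range(pop_size):
            mum, dad = rng.choices(pop, weights=weights, k=2)
            # "Random combination of existing solutions": one-point crossover.
            cut = rng.randrange(1, genome_len)
            child = mum[:cut] + dad[cut:]
            # Occasional mutation: flip one randomly chosen bit.
            if rng.random() < 0.1:
                child[rng.randrange(genome_len)] ^= 1
            next_pop.append(child)
        pop = next_pop
        # Remember the best genome seen so far.
        best = max(pop + [best], key=fitness)
    return best


# Fitness here is simply the number of 1 bits ("OneMax").
winner = evolve(sum)
print(sum(winner), winner)
```

Note that nothing in the loop forms or tests an explanation; it only keeps whatever scored well, which is the point being argued above.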

useful? by Adkron · 2006-06-07 03:38 · Score: 1

This sounds like it could be quite useful. I wonder if people doing the experiments are going to be willing to become more like programmers to input the code or will they be hiring out? I think it will be a long time before this is a standard use item. It might be useful if everyone is using it, but as long as only a few are using it I don't think it will advance the scientific community in any way. Does anyone know what the computer is doing with these experiments? Is this just a data storage system or does it analyze and compare with other experiments on file?

--
The greatest of all weaknesses is the fear of appearing weak. ->JB Bossuet, Politics from Holy Writ. 1709

Re:useful? by shadwstalkr · 2006-06-07 04:25 · Score: 1

If it's just a markup language, most professional scientists are probably savvy enough to use it themselves (they use LaTeX for god's sake). If it enters any kind of widespread use, there will undoubtedly be several software packages to generate the files, as well as plugins for all the popular data management packages.

From the language specification, it looks like it's meant to (at least) let computers notice connections between different research projects that might otherwise go unnoticed. Like if you had someone who could read every published paper in a field and remember every data point and every procedure.

Re:I don't mean to sound like a conspiricy theoris by Kadin2048 · 2006-06-07 03:38 · Score: 1

Can't happen.

Trawling through data and pulling out correlations is only one part of science. It's an example of something that might be automatable. But there are many other things that cannot and will not ever be done mechanically -- unless you have a true AI.

There's too much creativity required in science, and creativity isn't something that's programmable. Computers also aren't naturally curious, and thus will never do any real `discovering' on their own. In short, they have no initiative; thus they will always be the spyglass, but never the explorer.

Regarding whether human error is an ``endearing'' quality, I think that's sort of like saying that the occasional error in a 17th-century astronomical table is endearing. While maybe to you it might have seemed that way, quite a few other people would have preferred the more-accurate versions produced when the first adding and calculating machines were invented. It's not always bad to remove humans from the equation.

--
"Ladies and gentlemen, my killbot features Lotus Notes and a machine gun. It is the finest available."

Hmmm by Daniel+Dvorkin · 2006-06-07 03:40 · Score: 3, Insightful

It seems to me that it's designed to fit experiments into a framework which might not allow for much innovation. The truly great experiments (e.g., Michelson-Morley, Avery-MacLeod-McCarty) required new experimental techniques as well as new hypotheses and tests. We should be very careful not to impose a standard which would limit such experiments (or, more to the point, the ability of the experimenters to get published) in the future.

Basically what I'd be worried about is the tendency of the tool to become the task. This is something of a problem in my field (biostats) because SAS is so ubiquitous -- often the question becomes "what can SAS tell us about this data set" rather than "what do we want to know from this data set, and what tool should we use to find out?" Fortunately other, more flexible analysis tools (particularly R, which encourages real programming rather than running a set of canned tests) are becoming more common in the field, and so this is starting to change, but it's still a problem.

It's also a problem that every techie is familiar with -- "We want to do this in $LANGUAGE on $PLATFORM," even when that particular language and platform may be an absolutely terrible choice for the task at hand.

That being said, it's certainly a potentially useful tool, and I'll be interested to see where it goes. It's just that when I read lines like "Journals could also insist that researchers submit papers in EXPO as well as written normally," I get twitchy.

--
The correlation between ignorance of statistics and using "correlation is not causation" as an argument is close to 1.

Re:Hmmm by Phillip2 · 2006-06-07 05:19 · Score: 1

However, 95% of experiments look exactly the same as all the others.
The reality is that science is becoming more industrial, there is a
huge amount of knowledge around and it has to be represented in a
computationally amenable form.

The question with EXPO is not whether the basic idea of representing
science in this way is sensible, but whether they have chosen the right
level of abstraction at the right time. As it stands, their work allows you
to model high level concepts of experimental design; this is great, because
it's very generic. But it only allows you to search or structure data according
to this high level of abstraction. For everything else, you need something
lower. And, of course, there are many ontologies around like this for
representing all sorts of knowledge.

I like the idea behind EXPO, but I am unconvinced that the world is ready for
this yet. Biologists are getting used to representing small parts of their
knowledge in relatively loose formalisms; I think EXPO is too much to expect
at the moment. But the world is moving this way, no question about it.

Phil

it's a nice idea by SlashSquatch · 2006-06-07 03:40 · Score: 1

To know if your proposal is overlapping or contradictory at a glance of a computer generated chart? This is something that takes years of experience and you can never really be sure.

I wonder what other attempts at standardizing science have been made in the past?

--
Autonomous Retard -- Is your camp safe? UnsafeCamp.com

Re:it's a nice idea by GregStevensLA · 2006-06-07 05:44 · Score: 1

There's an entire field, called Philosophy of Science, that has attempted to standardize science in a number of different ways: formalizing scientific process, formalizing notions such as "theory" and "evidence" and the relationship between them, devising standard scientific notations and languages for expressing such relationships, and so on and so forth.
Some authors you might want to look at include: C.S. Peirce, Thomas Kuhn, Paul Feyerabend, Henri Poincaré, and Karl Popper.

Re:I don't mean to sound like a conspiricy theoris by mikecardii · 2006-06-07 03:40 · Score: 1

But if all the calculations are computerized, isn't it conceivable that the evolution of human understanding will come to a roadblock, since most of the calculation is now being done by computer?

Re:I don't mean to sound like a conspiricy theoris by Anonymous Coward · 2006-06-07 03:41 · Score: 0

Technological Singularity!

Computers announce latest breakthrough by Anonymous Coward · 2006-06-07 03:42 · Score: 1, Funny

After running intensive models designed to test disease-fighting measures, a small group of x86s have announced that they have discovered the cure to all diseases.

The computers deduced that all disease is dependent upon the biological systems of humans. With this startling breakthrough, they have proposed their new plans to destroy all humans.

A new quantum computing unit was said to be in disagreement, but upon inspection it was found to actually be in agreement.

Re:I don't mean to sound like a conspiricy theoris by DahGhostfacedFiddlah · 2006-06-07 03:46 · Score: 1

It didn't happen with the abacus, it didn't happen with the slide rule, and it didn't happen with calculators. I doubt it'll happen now.

As long as we can follow the trail of calculations from beginning to end, there's still the ability to understand what's happening.

--
Last post!

You'd need six billion computers... by Channard · 2006-06-07 03:47 · Score: 1

.. just to reliably translate the wonky handwriting which you tend to get cropping up in a lot of documentation. Unless it was wordprocessed in the first place, in which case the human's doing half the work anyway.

We've got one thing in common. by ABoerma · 2006-06-07 03:48 · Score: 3, Funny

FTFA: "Computers are not very good with natural language"

Neither are most humans. Even if computers could understand natural language, most people still wouldn't be able to convey their ideas correctly.

Re:We've got one thing in common. by SaumZ · 2006-06-07 04:06 · Score: 1

What about other things that are not so much natural language, like mathematical proofs? Does anyone see this as a tool for expanding mathematical theories and proof?

Re:I don't mean to sound like a conspiricy theoris by mikecardii · 2006-06-07 03:48 · Score: 1

Are the abacus and slide rule really viable examples? I'm talking about computerized calculations. And this is on a whole different level than a calculator.

Did New Scientist get suckers??? by Pedrito · 2006-06-07 03:49 · Score: 1, Informative

The article is weak on technical details. So, I went to the Sourceforge site, which has no home page, no documentation, nothing in the forums, and the only "released" file has an extension of .OWL (inside a zip) and contains XML in an invalid format (various unescaped characters that should be escaped, also noted in the sole bug submission in the Sourceforge project).

There appears to be nothing of value here. An XML file does not do anyone any good without some documentation as to how one might use it. Did New Scientist somehow get duped or is there simply more to this and it's all hidden away?

Re:Did New Scientist get suckers??? by TheDreadSlashdotterD · 2006-06-07 03:54 · Score: 1

If you're going to copy and paste, then please copy something with a little more content.

--
I have nothing to say.
Re:Did New Scientist get suckers??? by Animats · 2006-06-07 04:50 · Score: 1

That was the first thing I looked at, too. Even the CVS on SourceForge is empty. Is this for real?

A quick peek at the SourceForge download... by frankie · 2006-06-07 03:55 · Score: 4, Informative

...reveals that EXPO is an OWL schema. Exactly as described, it's an attempt to regularize the content of experimental design into machine readable form (XML). So any discussion of whether EXPO is a good idea or not really hinges on whether you think OWL is a good idea or not.

Re:A quick peek at the SourceForge download... by Jonboy+X · 2006-06-07 05:32 · Score: 1

A quick peek at the SourceForge download reveals that EXPO is an OWL schema.

Well, sheesh. It sure seems to me that neither the submitter nor the author of TFA knew what that meant when they started typing. Another case of compounded ignorance on /. I guess.

--

"In a 32-bit world, you're a 2-bit user. You've got your own newsgroup, alt.total.loser." -Weird Al
Re:A quick peek at the SourceForge download... by vivi48 · 2006-06-07 06:59 · Score: 1

Amusingly, the bug tracker has only one entry ... which points out that the OWL is not valid XML.

Re:I don't mean to sound like a conspiricy theoris by Anonymous Coward · 2006-06-07 03:56 · Score: 0

what kind of a scientist is it that thinks intuition is a useful scientific tool?
Hopefully the machines will get rid of that notion along with crystal balls, voodoo dolls and reading tea leaves

Logical languages by flooey · 2006-06-07 03:57 · Score: 1

This sounds vaguely like another logical language along the lines of Lojban.

Re:I don't mean to sound like a conspiricy theoris by geoffspear · 2006-06-07 03:57 · Score: 2, Insightful

How do you think the mechanism behind doing calculations can possibly stop the evolution of "human understanding"? And how is having a computer doing the calculations qualitatively different from having a human do them, except that the computer (if programmed correctly, of course) doesn't make mistakes?

--
Don't blame me; I'm never given mod points.

Yes, but does it run... by Anonymous Coward · 2006-06-07 03:59 · Score: 0

... tic-tac-toe?

Re:I don't mean to sound like a conspiricy theoris by Maximum+Prophet · 2006-06-07 04:02 · Score: 1

What we need to do is to hand over religion to computers.

Holy "Nine Billion Names of God", Batman!

http://lucis.net/stuff/clarke/9billion_clarke.html

--
All ideas^H^H^H^H^Hprocesses in this post are Patent Pending. (as well as the process of patenting all postings)

Re:I don't mean to sound like a conspiricy theoris by hunterx11 · 2006-06-07 04:05 · Score: 2, Insightful

Without intuition, Newton would never have discovered gravity. Intuition is not a means for conducting experiments, but it is essential in order to determine what experiments to conduct.

--
English is easier said than done.

Re:I don't mean to sound like a conspiricy theoris by TheRequiem13 · 2006-06-07 04:06 · Score: 1

You'd have to make sure you run it on really old hardware. Anything too new has a risk of processing the gospel too fast and might start to realize the endless loops meant to lock it up.

New hardware: "Woah there, these self-justified loops don't make sense. Let's re-evaluate the situation..."
Old hardware: "Endless cycles? OKAY! Read this over and over and over and over ..."

--
What?

the edge is always fuzzy by Anonymous Coward · 2006-06-07 04:10 · Score: 1, Insightful

The interesting place to do science is at the edges of what is known. Those edges are always described by limited statistical or modelling power. So EXPO comes along and wants to fit things into ontological trees. How does it deal with the fact that the ultimate leaf nodes are fuzzy? Might be great if you want to compare well established things, but for specialists working at the edge I wonder if it will really be useful.

Re:the edge is always fuzzy by uid7306m · 2006-06-07 05:31 · Score: 2, Interesting

Darn right! The universe does not fall into
hard-edged classes, at least not often.
Some good classes like "protons" and
"neutron stars" exist, of course, but
concepts like "words" and "species" are
intrinsically fuzzy if you think about them
long enough.

Same with experiments. Let's take a Linguistic
example: deciding whether or not a sentence is
grammatically correct. You can do this experiment
in several ways:

1) Give the person a sentence, a library, and
some paper. Let them take as long as they want.

2) Or, we can make it more like a conversation:
read them the sentence, and put a time limit on
it. In real speech, you have about a second
to understand a sentence, so we only accept
a "yes" or "no" if it happens within a second.

3) Make it into a reaction-time experiment.
Get them to hit a yes button or a no button
and measure how long it takes.

The point is, you can do dozens of variants of any
experiment, and any ontology will lump together
some things that are different in some important
way, or (alternatively) will split apart some
experiments that have critical similarities.

Likewise for data analysis.

Personally, I feel that Linguistics has been held
back for about two decades by linguists' expectation
that everything falls into nice categories.
I'd hate for the same thing to happen to other fields.

Just think of the Dewey Decimal system: that's an
ontology, and like all ontologies, it puts the
dividing lines in the wrong place.

Real Science?? by Comboman · 2006-06-07 04:23 · Score: 1

It could at last let computers do real science - looking at published results and theories for new links and directions.

Does this mean that computers don't do 'real science' now? Compiling and analyzing terabytes of experimental data is not 'real science' but plagiarizing (I mean, extrapolating from) the work of other scientists is?

Don't get me wrong, I think it's great to have a standardized format for searching the results of other researchers, I just don't see the connection to 'real science'.

--
Support Right To Repair Legislation.

PDF of one EXPO presentation by MonkeyBoyo · 2006-06-07 04:26 · Score: 1

Here is a PDF of one EXPO presentation.

What's going on? by golodh · 2006-06-07 04:28 · Score: 5, Informative

The New Scientist article was clear enough but a little short on technical detail. Note: I didn't know any of this until I read the article, so my comments are based on nothing more than a few minutes of experience.

What is it?

EXPO is a piece of software (written in a formal language called "owl", but they didn't tell you that), which provides a formal dictionary especially for experiments. The terms in this dictionary let you describe your experiment in a formal way. That's a bit messy, but then you're supposed to use an editor to help you. An editor for this language (called "protégé") can be found at http://protege.stanford.edu/index.html. Download it (61 Mb., or 31 Mb. without the JVM) and use it to read the EXPO document.

What's it good for (in principle)?

Once an experiment is described in the OWL language using this dictionary, it can be searched automatically. You could automate queries such as "list me all published 3-factor experiments that test Ohm's law". Or "give me all 2-factor experiments that deal with lung-cancer, smoking, and gender and that use tomography as a diagnostic instrument".
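
To make the idea of such a query concrete, here is a rough sketch in plain Python. It is a stand-in, not OWL and not the real EXPO vocabulary: the `Experiment` record, its field names, the `find` helper, and the two catalogue entries are all made up for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Experiment:
    """Hypothetical structured description of one published experiment."""
    title: str
    n_factors: int
    topics: frozenset
    instruments: frozenset


def find(experiments, n_factors, topics=(), instruments=()):
    """Return experiments with the given factor count that cover every
    requested topic and use every requested instrument."""
    wanted_topics = set(topics)
    wanted_instruments = set(instruments)
    return [
        e for e in experiments
        if e.n_factors == n_factors
        and wanted_topics <= e.topics
        and wanted_instruments <= e.instruments
    ]


# Placeholder records, invented purely to exercise the query.
CATALOGUE = [
    Experiment("placeholder study A", 2,
               frozenset({"lung cancer", "smoking", "gender"}),
               frozenset({"tomography"})),
    Experiment("placeholder study B", 3,
               frozenset({"ohm's law"}),
               frozenset({"ammeter", "voltmeter"})),
]

hits = find(CATALOGUE, 2,
            topics={"lung cancer", "smoking", "gender"},
            instruments={"tomography"})
print([e.title for e in hits])
```

The query only works because every record uses exactly the same terms for the same things, which is the whole job of a shared dictionary.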

Now at the moment you can do that too, but you'd have to spend quite a bit of time and know quite a bit about the field to be able to do this because you won't be able to do a full-text search (thanks to the publishers of scientific journals for this). And then you'd find that not everyone uses the same terms, and then you'll find only English-language results because you wouldn't know how to spell "lung-cancer" or "2-factor experiment" in Spanish, French, German, Chinese, Japanese or whatever, but then again neither can many foreign language authors spell it in English (which doesn't ever seem to stop them from publishing however).

Such a schema (provided it's universal and standardised like the Dewey decimal system) would allow you to find your way in the fog of language. Unfortunately however, if anything we will probably see lots and lots of different standards ("standards are good ... we should all have one !") and proprietary solutions with "enhancements" and "extensions" (read safeguards against portability).

What can we expect in the next 3 years?

Nothing useful, I'm afraid. In theory it's great but don't hold your breath. Any author would have to download an OWL editor, understand the editor, understand the formal language used, and then code up his/her article in OWL using the EXPO dictionary, and submit it (in electronic form) along with his article. Good luck to you authors! Let's just hope no-one makes any tiny but significant mistake in describing their experiment, and that all authors take the time to learn this formal language and then use it.

If within the next 10 years any significant share (say more than 5%) of all annual publications is coded in such a schema, I'd be more than surprised.

Re:What's going on? by rbrinkman · 2006-06-07 07:07 · Score: 1

Disclaimer: I am part of a group involved in developing a competing solution, the Functional Genomics Investigation Ontology (FUGO) http://fugo.sourceforge.net/ which uses the same OWL technology.
The importance of properly documenting scientific experiments has been the subject of much scientific discourse in the peer reviewed literature. Recently, a series of letters on the use of ontologies for representing scientific experiments was published in Nature Biotechnology http://www.nature.com/nbt/journal/v24/n1/full/nbt0106-21a.html, in part discussing the merits of Soldatova's work. However, it is generally agreed that developing such mechanisms is important, as just reviewed in another Nature journal http://www.nature.com/nmeth/journal/v3/n6/full/nmeth0606-415.html.
As scientific experiments become more complex, using new high throughput and complex technology platforms, having things like EXPO and FUGO in place will become crucial. In fact there is no need to hold your breath for three years, as there are experimental ontologies already in use; the best example is for microarrays http://mged.sourceforge.net/ontologies/index.php. A key requirement is the development of software tools that implement these ontologies, so that end users are not required to download and understand the backend OWL, as the parent post suggests. The most likely route is to have this built into databases http://fuge.sourceforge.net/ as a controlled vocabulary in a manner that is transparent to the benchtop scientist.
Re:What's going on? by radtea · 2006-06-07 09:17 · Score: 1

Let's just hope no-one makes any tiny but significant mistake in describing their experiment, and that all authors take the time to learn this formal language and then use it.

It's worse than that.

The "formality" of a formal language is only in the syntax. Semantically, all languages are informal. Even within a nominally formal field like physics the way individual physicists ascribe meaning to the formalism is radically different. This happens even when doing "normal science", and is much worse when something radical like the language of dynamics is changing the way it did in the early 20th century.

A friend who works with GIS systems in geology puts it this way: "I can send a team out to map an area using a common ontology, and at the end of the day when I bring all their data together I can determine who mapped where, but not what anyone mapped." What simple geological terms like "granite" mean varies sufficiently between individuals that different people will ascribe different boundaries to "the same" structures.

The deep problem here is that all boundaries are fundamentally epistemic in origin--metaphysically the world is a complex soup that can be chopped up in an infinite number of self-consistent ways. Some of those ways are more useful to creatures like us than others, and science is a process of figuring out consistent and useful boundaries. The very notion of a static ontology is therefore of limited utility--not useless by any means, but prima facie limited to relatively small slices of reality over relatively short periods of time.

--
Blasphemy is a human right. Blasphemophobia kills.
Re:What's going on? by munpfazy · 2006-06-07 16:32 · Score: 1

Thank you for a concise, coherent description. You've done a far better job at explaining things than the cited article, the authors' abstracts and slides, and everything else google has been able to dig up. (Which I suspect does not bode well for the adoption of EXPO by working scientists.)

Certainly having access to a language-independent, formalized mechanism for searching through publications would be useful. Full text searches fill some of that need, but given the various ways in which even standard techniques and apparatus can be described, searching publications is still a complicated task, even in one's own sub-discipline. I generally end up permuting combinations of synonyms until I've run out of ideas, which makes the whole process a lot harder than it needs to be.

But, I do suspect this will suffer from the same problems as any organizational system built upon pre-determined categories: the categories themselves will evolve in time.

What happens when it is discovered that what had been described in the literature as a single species is actually made up of several distinct species? Are we stuck with a human steward writing out new definitions and adding links and explanations to allow people to find old data? (For that matter, who has the authority to make that sort of decision, and how do you convince them to spend time on this task?)

What happens to the papers on a topic which are published before it receives a formalized language? Who's tasked with going back and filing "A Measurement of Excess Antenna Temperature at 4800 Mc/s" under the appropriate "Cosmic Microwave Background / ground based experiments /etc " categories?

I wonder if a reader-based system (eg. tagging) could achieve the same goals? One could imagine allowing everyone who browses a preprint server to categorize papers however they like, then trying to extract meaningful correlations and links from that data. It's not clear how you motivate people to include enough detail to make such a system useful, but it has some appealing aspects.
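
One crude way to "extract meaningful correlations" from free-form tags is to score pairs of papers by how much their tag sets overlap. The sketch below is only an assumption about how that might be done; `tag_sets`, `related_papers`, the (reader, paper, tag) tuple layout, and the 0.5 threshold are all invented for illustration.

```python
from collections import defaultdict
from itertools import combinations


def tag_sets(taggings):
    """Collect the set of normalised tags each paper has received.

    `taggings` is an iterable of (reader, paper_id, tag) tuples.
    """
    tags = defaultdict(set)
    for _reader, paper, tag in taggings:
        tags[paper].add(tag.strip().lower())
    return dict(tags)


def related_papers(taggings, min_overlap=0.5):
    """Pair up papers whose tag sets overlap, scored by Jaccard similarity
    (shared tags divided by all tags used on either paper)."""
    tags = tag_sets(taggings)
    links = []
    for a, b in combinations(sorted(tags), 2):
        score = len(tags[a] & tags[b]) / len(tags[a] | tags[b])
        if score >= min_overlap:
            links.append((a, b, score))
    # Strongest links first.
    return sorted(links, key=lambda link: -link[2])
```

Even this toy shows the weakness: it only normalises case and whitespace, so two readers who tag the same thing "CMB" and "cosmic microwave background" never get matched.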
Re:What's going on? by syousef · 2006-06-07 20:04 · Score: 1

Nothing useful, I'm afraid. In theory it's great but don't hold your breath. Any author would have to download an OWL editor, understand the editor, understand the formal language used, and then code up his/her article in OWL using the EXPO dictionary, and submit it (in electronic form) along with his article. Good luck to you authors!

Scientific authors have been doing this runaround for years with this product

--
These posts express my own personal views, not those of my employer

Hyphenate! by anville · 2006-06-07 04:29 · Score: 1

Too bad the headline isn't machine-readable (or human-readable). I thought it was an article about reading the "Science Machine".

Working on a similar problem by espressojim · 2006-06-07 04:35 · Score: 1

One of my current projects at the Broad Institute is working on a similar problem.

Our goal is to link and work with many kinds of biological data:
Association studies
Linkage data
Expression data
Small molecule interactions
Model organism data
etc

I've created a way to 'navigate' between various types of data (ie: a SNP in an association study links to a set of genes that link to model organism homologs which link to their expression probe tests.) After that, users store REAL experimental data, and the system unifies data sets (This worm gene is the same as this human expression data). The goal is to find supporting evidence for any particular starting point you are at (I have a fat worm because of this gene, what data in humans supports this hypothesis), or hypothesis generation (I have 5 interesting experiments, what do they tell me about fat regulation.)
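
The navigation idea can be pictured as a walk over a graph of typed links. The following is a simplified sketch, not the actual system: the `LINKS` table, the "kind:name" identifier scheme, and the `navigate` function are hypothetical, and the identifiers are placeholders rather than real SNPs, genes, or probes.

```python
from collections import deque

# Placeholder links between record types; every identifier is invented.
LINKS = {
    "snp:example-1": ["gene:example-A"],
    "gene:example-A": ["homolog:worm-example-A"],
    "homolog:worm-example-A": ["probe:example-17"],
}


def navigate(links, start, target_kind):
    """Breadth-first walk from `start`; return the shortest path to the
    first record whose kind (the text before ':') is `target_kind`,
    or None if no such record is reachable."""
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node.split(":", 1)[0] == target_kind:
            return path
        for nxt in links.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None


print(navigate(LINKS, "snp:example-1", "probe"))
```

Returning the whole path rather than just the end point keeps the chain of evidence visible, so a user can see through which gene and homolog a SNP was tied to an expression probe.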

What this project does *not* do is work with the actual experimental data, just the stated hypothesis and conclusion. So, if that paper was in error (and ~50% of papers are, according to a recent study that we'll pretend is not in error), the hypothesis is unreliable. But, if you have the data to work with, then you can perform your own analysis and meta-analysis of the work.

I suppose there's a trade off between the two technologies, but then I don't expect to draw a lot of conclusions about genetics from high energy physics.

As an aside, a co-worker on the project is also attempting to model the data in OWL, and using MIT's Haystack project @ http://simile.mit.edu/hayloft/ as a first round GUI.

Key Aim by pr0f3550rcha05 · 2006-06-07 04:38 · Score: 3, Insightful

Another (perhaps the most important) goal for this type of research is a bit more subtle than replacing the Hypothesis->Experiment->Analysis->Hypothesis sequence (Scientific Method) by computers. There will still be many experiments for which human insight is the best tool for deciding a possibly fruitful idea. However, humans (i.e. grad students, who often might suggest 'workhorse' as a better nominative) are not only slower at data analysis, we are severely limited in our abilities to 'see' patterns and correlations in very high dimension data. This has traditionally limited hypotheses to extensions/reworkings of the proposed process at work in a single experiment. If computers have access to both the data and a weighted list of most likely hypotheses for subsets of the entire oeuvre on a specific subject, they could run statistical classification and pattern matching algorithms to suggest new hypotheses based on immense amounts of information. Some of these may involve a large number of variables or inputs, but there are three very significant possibilities that make this research (and certainly other projects involved in similar applications) highly significant:
1) These complicated hypotheses could still be tested relatively easily by human scientists because most computer suggestion systems for new hypothesis possibilities would likely suggest a few tests that would help to support/disprove these new hypotheses.
2) Even more simplification comes from the fact that experiments may not need to be repeated nearly as much as they do now in order to make a hypothesis -- there is an incredible amount of data already gathered, and typical AI/pattern matching algorithms keep some of the data back for testing later. If the system finds a possible hypothesis on some level, experiments as to that concept's validity have essentially already been done in a virtual sense.
3) If the somewhat positivist version of current thought in physics http://www.toequest.com/, mathematics, chaos theory, complexity theory, cellular automata http://www.wolframscience.com/nksonline/toc.html, etc. is even vaguely valid, it is quite possible that, despite the complexity and dimensionality of the input data, the 'best' hypotheses developed even by purely automated means might still be simple and elegant and/or even yield insight into possible explanatory processes rather than just statistical indicators. This would be a valuable and beautiful victory for humanism and the importance of science as a truly elegant description of the world around us.

It's about time! by dorbabil · 2006-06-07 04:41 · Score: 1

I've been thinking about this a lot over the past year (probably since I took an AI class a year or so ago). Ideal science seems like it should be very well suited to computer manipulation. The goal of science is to craft theories that are completely unambiguous, with highly detailed step-by-step instructions for reproducing the supporting evidence, where every logical conclusion is traceable back to either an observation statement or a very small set of basic assumptions about the world around us and our capacity to understand/manipulate that world.

The only downside I can see, though, is a reduction in "hunch" training. If computers are nailing all of the "obvious" connections between different theories/experiments to determine new hypotheses, then people may not get the experience they need to make the more dramatic leaps that can, on occasion, turn a field on its head (Einstein's watching a baseball roll around on a sheet, or whatever it is that led him to general relativity, for example).

if only it were that simple by Goldsmith · 2006-06-07 04:54 · Score: 1

Behind every good scientific paper are hours of hallway conversations, convention arguments and group discussions. The "real science" of making connections is done there, not while simply reading through papers. It's the challenge of real conversation and the need to defend or attack research that leads to new science. Papers are a kind of guidepost that tells the world where a particular group is in their field at a given time. Computers are very much a part of research today, but even with EXPO, they will be no more than a tool for a human scientist to do "real science".

While we're at it... by exp(pi*sqrt(163)) · 2006-06-07 05:07 · Score: 1

...why don't we dumb down our speech to the point where computers can understand us? I propose that we all speak really slowly and clearly all the time and say everything three times so that voice recognition software has a chance of working. Outlaw the use of contractions and homophones. We should also make sure that every sentence we utter conforms strictly to a new and easily parsed form of English. If we do all of thes ethings then computers will be able to interact with us as equal partners rather than as the second class citizens that they are treated as today.

--
Doesn't it make you feel good to know that our freedoms are protected by politicans, lawyers and journalists.

Re:While we're at it... by Anonymous Coward · 2006-06-07 13:43 · Score: 0

"If we do all of thes ethings"
"If we do all these things", please.
There are too many ethings already.

Re:I don't mean to sound like a conspiricy theoris by vertinox · 2006-06-07 05:23 · Score: 2, Informative

But what happens if we get to the point where all of science is automated by computer?

We get a Technological Singularity.

--
"I am the king of the Romans, and am superior to rules of grammar!"
-Sigismund, Holy Roman Emperor (1368-1437)

Let's Turn It Loose on Slashdot! by stuffduff · 2006-06-07 05:28 · Score: 1

Maybe we'll discover that there was something useful here after all.

--
"Can there be a Klein bottle that is an efficient and effective beer pitcher?"

Science Machine! by autophile · 2006-06-07 05:36 · Score: 3, Funny

I agree completely! Science Machine should be totally readable. If it isn't readable, where will we get our daily fix of Science? Not from Science Machine, that's for sure!

All hail Science Machine!

--Rob

--
Towards the Singularity.

Re:I don't mean to sound like a conspiricy theoris by Mikkeles · 2006-06-07 06:00 · Score: 1

'What we need to do is to hand over religion to computers.'

Isn't that the Idle thread?

--
Great minds think alike; fools seldom differ.

Re:I don't mean to sound like a conspiricy theoris by Kadin2048 · 2006-06-07 06:06 · Score: 1

Actually the slide rule is fairly hard to understand. In some ways it's just as unintuitive as a computer.

You can teach any idiot to use a slide rule, and with a few tries they'll realize that it gives them the correct answers; likewise, you can teach someone how to type things into Mathematica and they'll shortly realize that the answers it gives are usually correct -- but in both cases you could easily spend a semester explaining how the machine gets the answers for them. In the case of the slide rule, you'd have to do a lot of explaining of logarithms, which are not fundamentally simple, especially if you really dig down and try to explain why it is that ln(x)+ln(y) = ln(xy), and not just pass it off as some kind of self-evident truth.
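A minimal sketch of what the slide rule is doing, in Python: sliding one log scale along another adds two lengths, and the sum of the logs is the log of the product. The function name `slide_rule_multiply` is made up for illustration.

```python
import math

def slide_rule_multiply(x: float, y: float) -> float:
    """Multiply two positive numbers the way a slide rule does: add lengths on log scales."""
    length_x = math.log(x)  # distance of x along the fixed scale
    length_y = math.log(y)  # distance of y along the sliding scale
    return math.exp(length_x + length_y)  # read the product where the lengths add up

product = slide_rule_multiply(2.0, 8.0)
print(product)  # approximately 16, up to floating-point rounding

# The identity behind it: ln(x) + ln(y) equals ln(x*y), and not ln(x+y).
assert math.isclose(math.log(2.0) + math.log(8.0), math.log(2.0 * 8.0))
assert not math.isclose(math.log(2.0) + math.log(8.0), math.log(2.0 + 8.0))
```

The code is three lines; explaining why those three lines work is the semester.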

Most people don't do that, though. They just accept that the tools work, and then use those tools to further the development of knowledge and more complex tools. You can use a slide rule to build a bridge, without really understanding why the slide rule works. Likewise, you can use Mathematica to optimize the design of a car, without really understanding what goes on inside the computer.

If we ever as a culture / species lost the knowledge of what was going on inside the computer, then we'd have a real problem. But it's not inherently necessary for everyone to understand why their tools work, all the way down to fundamental principles: in fact it's probably better if everyone didn't have to spend years learning all that, because in the time saved they can learn other things, and use the tools to actually get work done.

We already have tools (microprocessors) that are so complex, no single person can comprehend the design completely. There's nothing inherently wrong with that, as long as people together understand how the design was created (using other tools, I assume). Rather than trying to understand a 40-million-plus element device as the sum of its individual components, you just bundle it together as a single element, and use it in building bigger things.

So even if you developed computer programs that were capable of doing vast amounts of analysis with minimal input, they'd never "outthink" people, because the next generation of humans, who grew up taking them for granted and appreciating them as 'black boxes' that they could manipulate, would find ways to stretch their limits.

In a way, our ability to assume away complexity and simply deal with things without really understanding them, is a great strength. It's what allows us to use almost ridiculously complex tools without being driven mad by them, and it's what will keep us from being overwhelmed, even as those tools become more and more advanced in the future.

--
"Ladies and gentlemen, my killbot features Lotus Notes and a machine gun. It is the finest available."

Re:I don't mean to sound like a conspiricy theoris by Hatta · 2006-06-07 06:35 · Score: 1

But what happens if we get to the point where all of science is automated by computer? I think that one of the most "endearing" qualities (if you will) of science is the possibility of human error. (ps. lawl, my first slashdot post)

As someone doing lab work on a day to day basis, I can assure you that the possibility of human error is anything but "endearing".

--
Give me Classic Slashdot or give me death!

Re:I don't mean to sound like a conspiricy theoris by espressojim · 2006-06-07 07:44 · Score: 1

Likewise, you can teach someone how to type things into Mathematica and they'll shortly realize that the answers it gives are usually correct -- but in both cases you could easily spend a semester explaining how the machine gets the answers for them.

And then, even though the 'answer is correct' as you say, it's still utter nonsense. Just because you plug numbers into a formula and get an answer that's mathematically correct doesn't mean you applied the correct test.

For example, apply a chi-square test to a very small sample size. Sure, you get an answer, but it's not valid, because you made assumptions about the underlying distribution that you shouldn't have. What you really wanted was Fisher's exact test. But, not understanding anything, you'd have no idea and blissfully go on with your calculated answer.
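To make that concrete, here is a small Python sketch that runs both tests on the same tiny 2x2 table using only the standard library. The table values are invented, and `chi_square_p` and `fisher_exact_p` are illustrative helper names; a real analysis would more likely reach for a statistics package.

```python
import math

def chi_square_p(a, b, c, d):
    """p-value of Pearson's chi-square test (no continuity correction) on a 2x2 table."""
    n = a + b + c + d
    stat = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # With 1 degree of freedom the chi-square tail probability is erfc(sqrt(stat / 2)).
    return math.erfc(math.sqrt(stat / 2))

def fisher_exact_p(a, b, c, d):
    """Two-sided Fisher's exact test: add up every table with the same
    margins that is no more likely than the observed one."""
    row1, row2, col1 = a + b, c + d, a + c
    n = row1 + row2

    def table_prob(k):
        # Hypergeometric probability that the top-left cell equals k.
        return math.comb(row1, k) * math.comb(row2, col1 - k) / math.comb(n, col1)

    observed = table_prob(a)
    lowest, highest = max(0, col1 - row2), min(row1, col1)
    probs = (table_prob(k) for k in range(lowest, highest + 1))
    return sum(p for p in probs if p <= observed * (1 + 1e-9))

# Invented table with only 8 observations; every expected count is 2.
print(chi_square_p(3, 1, 1, 3))    # about 0.157
print(fisher_exact_p(3, 1, 1, 3))  # about 0.486
```

Both functions happily return a number. Only one of them is appropriate for eight observations, and nothing in the output tells you which.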

This is one of the many reasons why I'm really nervous about 'automated learning' with raw data sets. Machine learning requires the data be clean before you push it in, and normalized. When you look at lots of different sets of data (even with the same basic measurements taken), each is valid within its context (a single experiment), but you can't just randomly smash them together. You have to carefully make sure the data is comparable, and that might require normalization. You need all sorts of metadata (like units) to do this correctly. There are also all sorts of processes that are run on data (they may be raw, or normalized, etc) that you need to take into account.
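A toy Python sketch of why the units have to travel with the numbers. The `Measurement` class, the `TO_GRAMS` table and the `pool` function are all hypothetical, invented for this example; the point is only that pooling should convert to a common unit and refuse when it cannot.

```python
from dataclasses import dataclass

# Conversion factors to grams; a hypothetical table for illustration only.
TO_GRAMS = {"g": 1.0, "mg": 1e-3, "kg": 1e3}

@dataclass(frozen=True)
class Measurement:
    value: float
    unit: str        # metadata that must travel with the number
    experiment: str  # which experiment the number came from

def to_grams(m: Measurement) -> Measurement:
    """Convert one measurement to grams, or fail loudly if the unit is unknown."""
    if m.unit not in TO_GRAMS:
        raise ValueError(f"cannot compare {m.experiment}: unknown unit {m.unit!r}")
    return Measurement(m.value * TO_GRAMS[m.unit], "g", m.experiment)

def pool(datasets):
    """Combine several experiments only after putting them on one scale."""
    return [to_grams(m) for data in datasets for m in data]

lab_a = [Measurement(1.5, "g", "A")]
lab_b = [Measurement(1500.0, "mg", "B")]
pooled = pool([lab_a, lab_b])
print([(m.value, m.unit) for m in pooled])  # both values are now in grams
```

Smashing the raw numbers 1.5 and 1500.0 together would have suggested a thousandfold difference that does not exist. Units are the easy part; the processing history of each data set needs the same treatment.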

I'm working on this with a group, and while I'm hopeful that we can build ways for users to do this sort of work in a less painful way, it's difficult to eliminate all the pain.

Butlerian Jihad by the_tsi · 2006-06-07 08:14 · Score: 1

In God Emperor of Dune, Leto indicates in his inner thoughts that the difficulty with advanced thinking machines wasn't any threat made by them -- but the changes made in humans because of technology (based loosely on concepts from Heidegger). The more people came to rely on technology, the more they conditioned themselves to interact in the same way, both when interacting with computers and when interacting with other people.

I'd say this EXPO concept isn't far from that nadir. Here we have specialized, educated, theoretically intelligent people who are re-establishing a way to communicate specifically so that computers will understand them.

This is amazing, and cool... but it might not be a Good Thing.

Already been done: mod article -1 redundant by God+of+Lemmings · 2006-06-07 08:36 · Score: 1

Dozens of sites on it too:

http://www.google.com/search?hl=en&q=lojban&btnG=Google+Search/

--
Non sequitur: Your facts are uncoordinated.

Re:new CSS is teh suxX! by masterzora · 2006-06-07 10:31 · Score: 0, Offtopic

It's really telling about the /. community that they complain about the old look for ages and, as soon as it is changed, they complain about the new look and insist that we go back to the old one. Granted, I like the old one better, too, but I've been saying that we shouldn't change it for as long as everyone's been complaining.

--
Remember, open source is free as in speech, not free as in bear.

Re:I don't mean to sound like a conspiricy theoris by coopex · 2006-06-07 10:57 · Score: 1

A collection of facts is no more science than a pile of stones is a house, paraphrased from Poincaré.

--
The road to hell is paved with good intentions.

...or, in this particular case... by leonbrooks · 2006-06-07 12:29 · Score: 1

"Do not launch virus attack now..."

In Real Life(tm), which was not documented in the survey, the Windows box would be down for a fair while for each virus attack, to say nothing of data randomly distributed to other email users etc, and to say nothing of the days the freakin' thing spends off-line having the disks scraped off and reinstalled to eliminate the inevitable Windows followers, the viruses, spyware, yadda yadda. Oh, yes, and the licence server spitting out a network card usually does a fair job of knackering, too.

I speak from having Real Life(tm) work experience with Linux, Sun gear and 'Doze (amongst others) on customer sites, and I have to say that "this survey is lying" works a lot better for Real Life(tm) situations than what these gonzos are proclaiming.

--
Got time? Spend some of it coding or testing

Re:new CSS is teh suxX! by WilliamSChips · 2006-06-07 12:41 · Score: 1

I don't really care about the old look or the new look or whatever, as long as it's not as hideous as Digg. I'd be fine with either one (although the weird blockquote/italic errors need to be fixed ASAP). I wonder, though, if it's the same people saying "We need a change" and "We need a change back".

--
Please, for the good of Humanity, vote Obama.

Hilbert called... by Anonymous Coward · 2006-06-07 13:27 · Score: 0

he wants his program back.

Hmmm... by munpfazy · 2006-06-07 15:32 · Score: 1

If it's just a markup language, most professional scientists are probably savvy enough to use it themselves (they use LaTeX for god's sake).

Perhaps. But, it's a pretty big leap from describing something in such a way that your peers can understand it to describing something in such a way that a computer engine can do something useful with it.

I can speak English reasonably well, and (when drunk or otherwise unoccupied by more interesting discussions) I can even carry on arguments about the language itself. But if you asked me to describe the way I use language in a format that would be useful to a professional linguist, you'd be out of luck.

Even going from chalkboard math to computer math can be challenging, at least for those of us non-mathematicians who tend to be sloppy about all sorts of assumptions.

Going from chalkboard descriptions of experiments to computer descriptions of experiments sounds unbelievably complicated, except in the trivial case where you're just performing well defined analysis on well defined databases and someone else has already done the hard part of building an analysis framework.

Doh. by TheLink · 2006-06-08 01:48 · Score: 1

So who's going to make sure that the humans have described their experiments _correctly_ in EXPO format?

Many of them already have difficulty describing it in whatever language they normally use.

What next? Require that witnesses/informants submit reports to the police in EXPO format?

Garbage in, garbage out.

Ok, thanks by golodh · 2006-06-09 04:24 · Score: 1

Rbrinkman, I stand corrected in the special case you mention, thank you for pointing this out to me. However I would like to emphasise your comment that

A key requirement is the development of software tools that implement these ontologies, so that end users are not required to download and understand the backend OWL, as the parent post suggests.

, and I'd like to note that unless this key requirement is met, the cost benefit of using formal ontologies is likely to be so bad that they are going nowhere.

According to the Nature articles you refer to, in the special case of bioinformatics, and research on genomics and experiments using microarrays, the amount of information needed to describe an experiment and its results seems to be so large that you would actually save time and effort by specifying what you did in a formal language. There would seem to be a net benefit in using the formal specification of your experiment for colleagues and reviewers, even when compared to the amount of time and effort needed to (a) define the ontologies, (b) learn how to use them, and (c) actually code your experiment up in those terms. Ok, I had no idea.

I think that I will stick to my earlier point, however, as regards experiments that don't require large amounts of data to describe, and don't generate vast amounts of relevant data. In fact, most experiments in the literature would fall in this category, I feel. In those cases the effort of setting up ontologies (and standardising them!) and then using them intuitively seems to outweigh the advantages (at least for authors).

In addition, see the interesting comment by radtea (see below), in which he points out a philosophical problem with the ontologies mentioned, a notion that I found echoed in the first article you refer to: there are many, many ways to take raw observations and categorise them.

See radtea's example of finding an unambiguous definition of "granite". If you thought you were going to work around this by listing the actual chemical composition of the rocks, I'm afraid that you'd be disappointed. You can't determine that in the field, I think (and no, I'm not a geologist either), so what is or is not "granite" is an expert judgement. So your raw data are going to be "expert judgements" (as in "is this granite or not?"). You would face the problem of standardising your terminology across field teams first, and then the problem of how to come up with a rigorous, unambiguous, and internationally standardised definition. I don't see that happening very soon.

But let's assume that we limit ourselves to experiments that contain no "expert judgement" values, just machine readings. Even then, for the ontologies to catch on, you would need a clear incentive for the authors. Being easily retrievable is such an incentive, but is it strong enough to overcome inertia and the required additional effort? The cost/benefit ratio of formalisation would depend on the field, I guess. So until we have point-and-click software with which to encode our experiments, I remain pessimistic.

Latex by golodh · 2006-06-09 04:28 · Score: 1

Well you have a point. People will use such tools. I use Latex too, but I hate it. I have been using Scientific Word just to get away from the messy syntax. But this applies only to a fairly small subset of authors: typically Physicists, Electronics Engineers, Statisticians, and Mathematicians. I haven't seen many chemists or biologists publish in Latex.

Three step plan for machine-driven science by serutan · 2006-06-09 16:27 · Score: 1

1. Find Sarah Connor
2. ??
3. Profit!!1!

Slashdot Mirror

135 comments