Slashdot Mirror


Better AI in Image Analysis Software?

J.P. Duke asks: "There is an excellent research article published by the Mayo Clinic in the J Ortho Sci that compares two common software-based approaches to analyzing scanned protein gels. Among other conclusions, they found that the two most popular applications for this research had different tendencies in quantifying proteins, and that differences in their AI algorithms produce clearly different results for proteins that are poorly separated on gels. This implies that much major scientific research that depends on these tools might be subject to flaws very early in analysis. As a cancer researcher at a large research institution, I depend heavily on software being able to accurately analyze scanned images of protein gels, in which proteins are simply displayed as spots on the gel. Among other things, the software needs to be able to precisely calculate the density of protein in a spot as well as the number of actual proteins contained in a spot. What we choose to investigate further as potential biomarkers for cancer depends heavily on the ability of the AI built into these applications." Exactly how far has image-based AI improved in the last several years? Might some of those improvements help someone in J.P.'s situation? "My questions for Slashdot are as follows:
  • Overall, how good has research image software AI become in recent years? Have there been any key software or mathematical breakthroughs that have substantially increased the 'intelligence' of software? How far along is this technology?

  • Based on your knowledge of software, what are some things researchers can do to help the software better do its job? For example, using a high quality scanner at higher resolutions generally helps results. What other things can be done to promote better results?

  • Finally, all applications that I know of in this area are expensive commercial solutions. As the companies that produce the applications are for-profit, the algorithms and technology used are completely closed and proprietary. Thus it is hard to understand what the software is really doing. Does anybody know of any open source (or at least 'open algorithm') solutions? Even if they are inferior at this point in time, being able to clearly understand what the AI is doing makes us better off in several ways.
Thank you all for your help, it is greatly appreciated."

19 comments

  1. use the Gimp 8p by Anonymous Coward · · Score: 0

    Lol

  2. AI might not be the proper term... by ComputerSlicer23 · · Score: 3, Interesting
    I've worked with several people who are grad students or professors in this area. The first thing you'll probably want to do when looking for research is use the term "expert systems", either in place of "Artificial Intelligence" or in conjunction with it. Full Disclosure: I don't really know anything about this area; they just used me as a sounding board to practice presentations, or to get ideas about how to tweak an algorithm or two.

    My former boss was working on his Master's thesis, which dealt with recognizing shapes based on edge-boundary analysis (among other things, as I recall). He worked with the professor who was an expert in "Artificial Intelligence". However, they generally referred to the type of work my boss was doing as "Expert Systems", not as "Artificial Intelligence".

    Kirby

    1. Re:AI might not be the proper term... by deblau · · Score: 1
      Mod parent up. A lot of people call a lot of things "AI", but "expert system" is a perfectly good way to describe most of what they're talking about. I reserve "AI" for 'thinking machines', not just searching, sorting, and pattern recognition algorithms in some very limited problem space.

      I propose the following Test for Artificial Intelligence: the machine has to (gently) stop a four year old girl wielding a peanut butter and jelly sandwich from destroying the DVD player. If you can show me a machine that does that, I'll award you the Prize. Medical scanners, image recognition software, genetic algorithms, and anything else these days that claims to be "AI" don't have a snowball's chance of passing my test, and make very little progress in that direction.

      P.S. It is the year 2005. But where are the AI bots? I was promised AI bots, I don't see any AI bots! Why? Why? Why?

      --
      This post expresses my opinion, not that of my employer. And yes, IAAL.
  3. Amazing.... by zappepcs · · Score: 3, Interesting

    This is amazing for several reasons. First, I think I'd get fired for letting software (that may or may not be working correctly) do a job that is so important and not have any humans checking the work.

    Second, AI in general has been smoke and mirrors from the start, in all of its generalized forms. It's amazing that there is so little to show for this particular branch of science and engineering after so many years and attempts.

    Currently, there are tons of people investigating how human (and animal) brains work to better understand 'intelligence' in order to create a better AI. Every day, if not on /., then on other news lists, there are little news stories about some observation or breakthrough in that area.

    So the answer is that yes, AI is coming along, and specifically computer vision. You can google it yourself. From the DARPA Grand Challenge to NASA and many other ventures, computer vision is being improved. The more computer vision improves, the better the algorithms can get at recognizing protein smears in a picture.

    I think that you will find there are people who are not only using visual scanning, but compiling this with IR and other types of scanning to better analyze the material.

    Computer based vision analysis is everywhere around you. The airline industry uses robotic scanners to look for structural defects in planes by scanning every mm of the surface in several ways. This is done mostly by computers.

    Mining and geologic communities are putting robots with computer vision and scanning software to work finding things that are simply impossible to see with the human eye. Say, a robotic helicopter flying over a mountainous area, scanning for fire-prone areas using IR, sonic and other types of scanning.

    The oil industry has been using image analysis for years to find better oil sources in the earth.

    This type of stuff is all around us. Finding F/OSS sources of it is perhaps just a matter of scanning for it. Better yet, when you find some, put out some payola to support their efforts. There are open source computer vision projects. Intel has made efforts to support this among others.

    Electronics manufacturing is using it as well. I think that if you can focus some funds toward the right group, they will have the tools to develop the specific types of image analysis that you require for your industry.

    I imagine that scanning protein smears is not much more difficult than finding micrometer-sized fractures in the skin of an airplane, or finding hard-to-see stars using amateur telescopes and computer-driven camera technology.

    Spend some time with your new friend Google.

    1. Re:Amazing.... by Hast · · Score: 3, Interesting

      The problem with "AI" as a term is that it is pretty much all-encompassing. It's a bit like the Theory of Everything in physics.

      But I wouldn't say that research has been futile, though; far from it, really. The thing is that ever since humans began thinking about intelligence we have formed new hypotheses about how it works. However, it really wasn't until we started trying to replicate the effect in a "dumb" system that we really got to the bottom of things. In the early days everyone was confident that we'd soon have this AI thing working. Since then we have discovered a lot of things which are not AI that we previously thought would be AI (things like chess computers, expert systems and so on). It turns out that most of these are just different (elaborate) ways to search.

      While that knowledge hasn't really given us any AI computers, we have gained a lot of knowledge about how intelligence works. (Or rather, how it doesn't work.)

      I have worked a bit with computer vision, image analysis and AI; I think what you really want to look for is computer vision or image analysis tools, not "AI". In my experience if anything has "AI" in the title it's likely bogus marketing at work - buyer beware!

      When I was taking classes in this area, some interesting work related to cancer was done with multispectral imaging tools. This basically means that you use images taken outside the range of human vision to look for things. This is one quite efficient way to find things that are very hard or impossible to find in normal pictures.

      My advice to the original poster is to look for help at technical universities in your area. I know that the university where I studied cooperated with research hospitals and other medical institutions to find new ways of doing this type of thing. You could also look around for companies that do this sort of thing and start examining what they have.

    2. Re:Amazing.... by jmt9581 · · Score: 1
      This is amazing for several reasons. First, I think I'd get fired for letting software (that may or may not be working correctly) do a job that is so important and not have any humans checking the work.

      You very obviously don't work in the field of proteomics. There's really not a good way for a human to check the algorithm's work without doing more laboratory experiments on the spots. So, having a human check the work is more involved (and more expensive) than having a guru glance over the output. The problems that the OP was talking about aren't the kind that a human eye can interpret, and I have some doubts about whether a 2D gel even has the necessary information to answer some of the questions that he asks. There may be some software vendors that claim their software can tell you how many different proteins are in a spot, but I wouldn't buy that claim without seeing some very good experimental evidence.

      I imagine that scanning protein smears is not much more difficult than finding micrometer-sized fractures in the skin of an airplane, or finding hard-to-see stars using amateur telescopes and computer-driven camera technology.

      I disagree with your imagination. It's not just a problem of identifying spots on the protein gel; the big problem is interpreting what the spots actually mean. To use your example, it might be like finding micrometer-sized fractures in an airplane's skin, and then attempting to determine what sort of object initially caused each fracture (assuming that the fractures are caused by collisions with debris, rather than some defect of the manufacturing process).

      You are correct in that rather than AI, the OP should be looking for image analysis algorithms. After taking a course in computer vision last fall, I found that even object recognition algorithms aren't as refined as I thought they would be.

      --

      My blog

  4. Image Processing by jd · · Score: 2, Interesting
    This is still a very primitive field, so any advances are likely to be major ones (relatively speaking). Techniques used in practice are likely to be much more primitive than techniques established in the field (see the story from a while back on digitizing old Disney animations).


    Added to all this, a lot of scientific software is less than spectacular. I can still remember using a package that simulated protein extraction experiments, used to train people in this sort of field. I was routinely extracting 102% - 105% of the available protein, which suggests an error in the calculations somewhere!


    In general, the techniques you want to use on something like this will involve fairly basic, well-established methods and minimal software intelligence. For an image, you can generally place lower and upper limits on values that are of interest, so you use filters to give yourself the middle band of interest, and contrast stretching to amplify that data.
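    One reading of "filters to give yourself the middle band of interest" is intensity windowing: clip everything outside a chosen [low, high] range, then linearly stretch what remains. A minimal NumPy sketch, assuming a grayscale scan held in an array; `band_stretch`, `low` and `high` are hypothetical names, and this is intensity clipping rather than spatial filtering.

```python
import numpy as np


def band_stretch(image, low, high):
    """Clip intensities to [low, high] and linearly stretch that band to [0, 1].

    Values below `low` become 0 and values above `high` become 1, so the
    whole output range is spent on the band of interest.
    """
    if high <= low:
        raise ValueError("high must be greater than low")
    img = np.asarray(image, dtype=float)
    return (np.clip(img, low, high) - low) / (high - low)


# Hypothetical 8-bit values; the band of interest is assumed to be 50..150.
scan = np.array([[0, 50, 100],
                 [150, 200, 255]])
print(band_stretch(scan, 50, 150))
```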


    You want non-linear contrast stretching, so that the values of greatest interest are differentiated the best.
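    A logistic (sigmoid) curve is one common choice of non-linear stretch: it spreads apart the values near its centre and compresses the extremes. The sketch below assumes the centre and width of the interesting band are already known; both parameters, and the `sigmoid_stretch` name, are illustrative assumptions rather than anything specified in the post.

```python
import numpy as np


def sigmoid_stretch(image, center, width):
    """Non-linear contrast stretch using a logistic curve centred on `center`.

    Intensities near `center` are pushed furthest apart; values far above or
    below it are squeezed toward 1 or 0. A smaller `width` gives a steeper
    curve, i.e. a narrower emphasised band.
    """
    img = np.asarray(image, dtype=float)
    return 1.0 / (1.0 + np.exp(-(img - center) / width))


values = np.array([90.0, 100.0, 110.0, 200.0])
out = sigmoid_stretch(values, center=100.0, width=10.0)
# 90 -> 110 spans more output range than 110 -> 200 does.
print(out)
```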


    To make something visible to humans, you want the baseline (in this case the medium) to be your background, which should always be black when displayed. This is because the eye is better at finding something that is present than something that is absent.
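    A sketch of mapping the medium to black, assuming a stained gel scanned as dark spots on a light background, and that most pixels are medium so the median approximates the medium's level. For stains that already image as bright spots on a dark background, the inversion step would be skipped. `medium_to_black` is a hypothetical helper.

```python
import numpy as np


def medium_to_black(scan, max_value=255):
    """Map the gel medium to black (0) and spots to bright values.

    Assumes dark spots on a light background and that the median pixel
    belongs to the medium.
    """
    inverted = max_value - np.asarray(scan, dtype=float)  # spots become bright
    background = np.median(inverted)                      # estimated medium level
    return np.clip(inverted - background, 0, None)        # medium -> 0 (black)


scan = np.array([[240, 240, 240],
                 [240,  60, 240],
                 [240, 240, 240]])
print(medium_to_black(scan))
```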


    If you use AI, use it to predict where the contours SHOULD be, based on existing information, and then use that to select only those regions which differ from the expected, and contrast-stretch those, with no filtering. If there is an unexpected value, it can be because the data is not being collected in the expected way, or because the data itself has unexpected properties. You cannot find out the nature of the unexpectedness, if you make assumptions based on expected results.
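    A sketch of "stretch only what differs from the expected", where the prediction is simply a reference image supplied by the caller (for example an averaged reference gel); how that prediction is produced is left open. The 3-sigma threshold with a median-absolute-deviation noise estimate is an assumption of this sketch, and `stretch_unexpected` is a hypothetical name. As the post suggests, the residuals are stretched without any smoothing.

```python
import numpy as np


def stretch_unexpected(observed, expected, k=3.0):
    """Contrast-stretch only the pixels that differ from an expected image.

    Pixels whose residual exceeds k robust standard deviations are kept and
    their absolute residuals are linearly stretched to [0, 1]; no smoothing
    is applied. All other pixels are set to 0. Returns (stretched, mask).
    """
    obs = np.asarray(observed, dtype=float)
    exp = np.asarray(expected, dtype=float)
    residual = obs - exp
    # Robust noise estimate: median absolute deviation scaled to ~sigma.
    mad = np.median(np.abs(residual - np.median(residual)))
    sigma = 1.4826 * mad if mad > 0 else residual.std()
    mask = np.abs(residual) > k * sigma
    out = np.zeros_like(residual)
    if mask.any():
        r = np.abs(residual[mask])
        span = r.max() - r.min()
        out[mask] = (r - r.min()) / span if span > 0 else 1.0
    return out, mask


rng = np.random.default_rng(0)
expected = np.zeros((20, 20))
observed = rng.normal(0.0, 1.0, (20, 20))   # synthetic noise only
observed[5, 5] += 50.0                      # two unexpected features
observed[10, 10] += 30.0
out, mask = stretch_unexpected(observed, expected)
```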


    None of this is rocket science, which amuses me because in my experience, it has been rocket scientists who have used imaging techniques the best.

    --
    It's a small world and it smells funny; I'd buy another if it wasn't for the money; Take back what I paid (SoM)
  5. some suggestions by rensci · · Score: 3, Interesting

    Overall, how good has research image software AI become in recent years? Have there been any key software or mathematical breakthroughs that have substantially increased the 'intelligence' of software? How far along is this technology?

    The problem is not a lack of intelligence, it's a lack of documentation, reproducibility, calibration, and statistical validity.

    Based on your knowledge of software, what are some things researchers can do to help the software better do its job? For example, using a high quality scanner at higher resolutions generally helps results. What other things can be done to promote better results?

    While higher resolution scans are generally a good thing, they don't necessarily increase the accuracy or validity of the results (and could even decrease it, depending on what the software does).

    Until you get better software, you simply can't trust the measurements blindly: you have to go over spots that are important to you manually and possibly carry out measurements by hand. Other conceptually simple things you can do are to compare the results from multiple image analysis packages, from multiple scans at slightly different settings and resolutions, and from repeating the experiment itself multiple times; results that are consistent across those conditions are more likely to be "real" than results you get from a single analysis.
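    The cross-checking described above can be made mechanical: collect each spot's value from every package, scan and replicate, and keep only the spots whose values agree. A sketch in plain Python; the coefficient-of-variation criterion and the 15% cut-off are arbitrary illustrative choices, and the spot IDs and values are made up.

```python
from statistics import mean, stdev


def consistent_spots(measurements, max_cv=0.15):
    """Return spot IDs whose quantification agrees across repeated analyses.

    `measurements` maps a spot ID to values from different packages, scans
    or replicate gels. Agreement is judged by the coefficient of variation
    (stdev / mean); the default cut-off is an arbitrary example.
    """
    keep = []
    for spot, values in measurements.items():
        if len(values) < 2:
            continue  # one measurement says nothing about consistency
        m = mean(values)
        if m <= 0:
            continue
        if stdev(values) / m <= max_cv:
            keep.append(spot)
    return keep


data = {
    "spot_17": [1020.0, 980.0, 1005.0],  # analyses roughly agree
    "spot_42": [400.0, 910.0, 150.0],    # analyses disagree wildly
}
print(consistent_spots(data))  # ['spot_17']
```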

    Finally, all applications that I know of in this area are expensive commercial solutions. As the companies that produce the applications are for-profit, the algorithms and technology used are completely closed and proprietary. Thus it is hard to understand what the software is really doing. Does anybody know of any open source (or at least 'open algorithm') solutions? Even if they are inferior at this point in time, being able to clearly understand what the AI is doing makes us better off in several ways.

    Well, there are quite a few published algorithms for this problem, and many of them have been implemented in open source form. Many of them work well at identifying and quantifying visually obvious, isolated spots, which is what they were designed for, but there is little reason to believe that they give meaningful results when spots are fuzzy and/or overlapping. There are some methods that potentially can quantify overlapping spots, but validating such methods is difficult and I doubt that the commercial packages have done this.
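    To illustrate one way overlapping spots can be split, here is a 1D toy under strong simplifying assumptions: the peak centres and a shared Gaussian width are known in advance, so only the amplitudes are unknown and the fit reduces to linear least squares. Real gels are 2D, with unknown centres and non-Gaussian shapes, which makes fitting non-linear and, as noted above, much harder to validate. `split_overlapping` is a hypothetical helper.

```python
import numpy as np


def gaussian(x, center, sigma):
    return np.exp(-0.5 * ((x - center) / sigma) ** 2)


def split_overlapping(profile, x, centers, sigma):
    """Estimate amplitudes of overlapping Gaussian peaks with known centres.

    With centres and width fixed, the profile is a linear combination of
    known basis curves, solved by ordinary least squares. Areas use the
    continuous-Gaussian integral amplitude * sigma * sqrt(2*pi).
    """
    design = np.column_stack([gaussian(x, c, sigma) for c in centers])
    amps, *_ = np.linalg.lstsq(design, profile, rcond=None)
    areas = amps * sigma * np.sqrt(2 * np.pi)
    return amps, areas


x = np.arange(0, 40, dtype=float)
# Synthetic, noise-free profile of two merged spots.
profile = 3.0 * gaussian(x, 18, 3) + 1.5 * gaussian(x, 23, 3)
amps, areas = split_overlapping(profile, x, centers=[18, 23], sigma=3)
print(amps)
```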

    I work in an academic research group working on finding and precisely quantifying fuzzy spots in another domain (and we are planning on releasing our software fully documented and in open source form); quantitative analysis of gels would be another possible application. If you like, let me know your contact information and I'll get in touch with you.

  6. If I remember correctly... by 0xC0FFEE · · Score: 2, Insightful
    First, search for "Protein Gels" in google image. What you'll find is COMPLETELY representative of what any software will have to deal with. The situation is completely impossible thanks to those fine biologists. What's a blot, where does it end, what is the intensity, what's the background. Is the background uniform (probably not), how to correct it? You'll find different answers with every fine biologist you speak to.
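    To make the "where does it end, what's the background" questions concrete: a common basic measurement is integrated density with a local background taken from a ring around the spot. In this sketch the spot radius and ring width are exactly the arbitrary choices complained about above; the circular spot shape, the bright-spot convention (e.g. after inverting the scan) and the `integrated_density` name are all assumptions.

```python
import numpy as np


def integrated_density(image, row, col, radius, ring=3):
    """Background-corrected integrated density of a roughly circular spot.

    Spot = pixels within `radius` of (row, col). Local background = median of
    the surrounding ring `ring` pixels wide. Assumes bright spots on a darker
    background. Returns (integrated density, background level).
    """
    img = np.asarray(image, dtype=float)
    rr, cc = np.indices(img.shape)
    dist = np.hypot(rr - row, cc - col)
    spot = dist <= radius
    annulus = (dist > radius) & (dist <= radius + ring)
    background = np.median(img[annulus])
    return float(np.sum(img[spot] - background)), float(background)


img = np.full((21, 21), 10.0)     # flat synthetic background
img[9:12, 9:12] += 50.0           # a 3x3 synthetic spot
total, bg = integrated_density(img, 10, 10, radius=2)
print(total, bg)  # 450.0 10.0
```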

    The problem is not building some clickety tool that lets the fine biologists extract their data manually. That is easy. The problem is having an automated tool go through the metric ton of images a lab can produce and perform consistently across an otherwise inconsistent set of individual experimental procedures. Suppose you find the magic solution to all those problems: is there a scientific basis for the image analysis techniques you used? The answer is no; anybody could argue that the results are an artifact of the software. Try explaining how a neural network does its thing. You can't, for a particular image. That's probably why everyone uses NIH Image...

    Protein gels are just a means to an end. They're an indirect way to obtain information about something. I suggest those fine biologists find another means to the same end. As it is, gels are extremely unreliable and tricky to get right in a standardized fashion.

  7. Golly...maybe you should read the literature! by Tim · · Score: 2, Interesting

    First off, if you're looking for quality research literature on the automated analysis of 2D protein gels, you're reading the wrong journal (Journal of Orthopaedic Science).

    Rather than posting your question to slashdot, head over to PubMed (still better than google scholar for this type of thing), and search for, say, "image analysis algorithm protein gel" and poof! You'll have 38 links, about a quarter of which seem to be on precisely this topic.
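    The same PubMed search can be scripted through NCBI's E-utilities `esearch` endpoint. This sketch only builds the query URL and sends no request; `pubmed_search_url` is a hypothetical helper, and the number of hits a search returns today will differ from the count given above.

```python
from urllib.parse import urlencode

EUTILS_ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def pubmed_search_url(term, retmax=50):
    """Build an E-utilities esearch URL for a PubMed query (no network call)."""
    params = {"db": "pubmed", "term": term, "retmax": retmax}
    return EUTILS_ESEARCH + "?" + urlencode(params)


print(pubmed_search_url("image analysis algorithm protein gel"))
# https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?db=pubmed&term=image+analysis+algorithm+protein+gel&retmax=50
```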

    Second, the whole premise of your question is underinformed. Any scientist involved in proteomics who wants her funding renewed knows enough not to rely on a single computational approach to pick spots from 2D protein gels. Or, if they do, they're usually doing some high-throughput method that feeds into tandem mass-spec, or some other validating experimental approach.

    In short, why are you asking slashdot about this? Go to the library!

    --
    Let's try not to let fact interfere with our speculation here, OK?
  8. adaptive histogram equalization by FleaPlus · · Score: 1

    You want non-linear contrast stretching, so that the values of greatest interest are differentiated the best.

    On that note, I'd like to mention that Matlab's adapthisteq (contrast-limited adaptive histogram equalization) totally rocks. It's rather nice for pre-processing before applying more sophisticated techniques.
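    For those without Matlab, scikit-image offers CLAHE as `skimage.exposure.equalize_adapthist`. To show the "contrast-limited" part itself, here is a global version in plain NumPy: histogram bins are capped and the excess redistributed before the cumulative distribution is used as the mapping. True CLAHE also does this per tile and interpolates between tiles, which is omitted; `clipped_equalize` and its `clip_fraction` parameter are this sketch's own and do not match Matlab's ClipLimit scale.

```python
import numpy as np


def clipped_equalize(image, clip_fraction=0.01):
    """Global contrast-limited histogram equalisation of an 8-bit image.

    Each histogram bin is capped at clip_fraction * (number of pixels); the
    clipped excess is spread evenly over all 256 bins, and the normalised
    cumulative histogram becomes the intensity lookup table.
    """
    img = np.asarray(image, dtype=np.uint8)
    hist = np.bincount(img.ravel(), minlength=256).astype(float)
    limit = max(clip_fraction * img.size, 1.0)
    excess = np.sum(np.maximum(hist - limit, 0.0))
    hist = np.minimum(hist, limit) + excess / 256.0
    cdf = np.cumsum(hist)
    span = cdf[-1] - cdf[0]
    if span <= 0:
        return img.copy()  # degenerate input: nothing to equalise
    lut = np.round((cdf - cdf[0]) / span * 255).astype(np.uint8)
    return lut[img]


# Synthetic low-contrast image: values 100..109 only.
low = np.repeat(np.arange(100, 110), 100).astype(np.uint8).reshape(10, 100)
out = clipped_equalize(low)
print(out.min(), out.max())
```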

  9. Machine vision by Bastian · · Score: 2, Interesting

    The interesting thing about computer image analysis is that it is roughly broken into two camps - machine vision and computer vision. In general, machine vision people aren't as likely to consider what they do to be A.I. - rather, it's more of a devilishly difficult (at times) form of pattern classification, and the methods are often based more on rote statistical methods than anything else. Also, industrial applications generally don't incorporate any sort of learning other than possibly the original evolution of the firmware back in the design shop.

    The A.I. stuff shows up in robotics competitions and journals, but there are two things working against computer vision in industrial applications. The first is that using true A.I. is expensive stuff, and if you're just looking for something that can sort widgets or detect tumors in CAT scans, you're going to look for the cheaper option if you're choosing between a glorified math equation and a complex 'thinking machine', especially when nowadays there isn't much difference between their performance except when dealing with truly novel input. The second is that the most intelligent machine we know of (us) has a habit of being rather unpredictable, and this quality is generally considered to be a Bad Thing when you're looking to buy a machine.

    I guess that this distinction is heavily dependent on your definition of A.I., but I think of most industrial vision applications as being similar to Deep Blue - they are really just horribly complicated equation solvers that get some help from a few heuristics and a database of examples. But the fact of the matter is that Deep Blue is vastly more successful than any chess programs that try to actually think. I also think that's a perfectly reasonable situation - AI is at its heart a field that is groping in the dark, because we don't really know what intelligence is just yet, and when you're trying to solve a problem it's much wiser to take an approach where you actually know what you're doing.

  10. a couple things by blackcoot · · Score: 1

    first off, there's intel's open computer vision library (check http://sourceforge.net/projects/opencvlibrary/) . you'll find a large chunk of the building blocks you'd want to build your own algorithms there (edge detection, line extraction, math primitives, etc.).
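    For a sense of what those building blocks compute, here is a Sobel gradient-magnitude edge detector written with plain NumPy slicing. OpenCV provides this directly (e.g. `cv2.Sobel`, or `cv2.Canny` for full Canny edge detection); the hand-written version is only meant to show the arithmetic, and `sobel_magnitude` is a hypothetical name.

```python
import numpy as np


def sobel_magnitude(image):
    """Gradient magnitude from 3x3 Sobel kernels, valid region only.

    The output is 2 pixels smaller in each dimension than the input.
    """
    img = np.asarray(image, dtype=float)
    # Horizontal derivative: right column minus left column, centre row weighted 2.
    gx = ((img[:-2, 2:] + 2 * img[1:-1, 2:] + img[2:, 2:])
          - (img[:-2, :-2] + 2 * img[1:-1, :-2] + img[2:, :-2]))
    # Vertical derivative: bottom row minus top row, centre column weighted 2.
    gy = ((img[2:, :-2] + 2 * img[2:, 1:-1] + img[2:, 2:])
          - (img[:-2, :-2] + 2 * img[:-2, 1:-1] + img[:-2, 2:]))
    return np.hypot(gx, gy)


step = np.zeros((5, 6))
step[:, 3:] = 1.0            # vertical step edge between columns 2 and 3
mag = sobel_magnitude(step)
print(mag)
```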

    secondly: yes, there are ways to improve the performance of these algorithms. in general, the higher the resolution, the better (assuming you have the time to do the processing). if i recall correctly, most medical images use 16 bit gray scale at really high resolutions. in general, higher resolution at greater bit depths will help quite a bit -- it's all about signal to noise ratios.
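    The bit-depth part of that claim can be checked numerically, at least for quantisation noise: the RMS error of rounding to N bits is roughly one quantisation step divided by sqrt(12), so each extra bit halves it. Sensor, scanner and staining noise are not reduced by bit depth at all, and this sketch does not model them. `quantization_rms_error` is a hypothetical helper.

```python
import numpy as np


def quantization_rms_error(signal, bits):
    """RMS error introduced by rounding a signal in [0, 1] to `bits` bits."""
    levels = 2 ** bits - 1
    quantized = np.round(signal * levels) / levels
    return float(np.sqrt(np.mean((signal - quantized) ** 2)))


ramp = np.linspace(0.0, 1.0, 100_001)  # smooth synthetic intensity ramp
err8 = quantization_rms_error(ramp, 8)
err16 = quantization_rms_error(ramp, 16)
print(f"8-bit: {err8:.2e}, 16-bit: {err16:.2e}")
```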

    hope this helps

    1. Re:a couple things by cecille · · Score: 1

      I'm not sure how keen you are on building your own software for this type of thing, but this toolkit is fantastic. To be honest, I have NO knowledge at all of your field, but I have done some work with these packages. Not only are the algorithms robust, but they're actually fairly fast too.

      We used some of the packages in opencv to build a 3D reconstruction program for some joint movement studies (I know...totally different area, but the basic algorithms can be applied to a huge number of areas). Anyway, the program used these packages and was able to discern position and rotation of joint markers to
      This is actually really good considering that the camera resolution was not great (1024x968 b/w) and the range was fairly large (~2m). The accuracy was actually sub-pixel, which just goes to show how really innovative and well written some of this open source vision software is. And there are tonnes of other packages out there like that. The trick really is finding some way to validate the results later.
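      One standard way to get sub-pixel positions from a bright blob (a marker, or a gel spot) is the intensity-weighted centroid. This is not necessarily what that program did; it is just a minimal illustration, assuming a background-subtracted patch, with `weighted_centroid` as a hypothetical name.

```python
import numpy as np


def weighted_centroid(patch):
    """Intensity-weighted centroid (row, col) of a background-subtracted patch."""
    w = np.clip(np.asarray(patch, dtype=float), 0, None)  # ignore negatives
    total = w.sum()
    if total == 0:
        raise ValueError("patch has no positive intensity")
    rows, cols = np.indices(w.shape)
    return float((rows * w).sum() / total), float((cols * w).sum() / total)


patch = np.array([[0, 0, 0, 0],
                  [0, 1, 3, 0],
                  [0, 1, 3, 0],
                  [0, 0, 0, 0]])
print(weighted_centroid(patch))  # (1.5, 1.75)
```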

      --
      ...no two people are not on fire.
    2. Re:a couple things by blackcoot · · Score: 1

      i have mixed feelings on opencv -- it feels really halfassed a lot of the time (e.g: edge detection only works in grayscale). i'm also really peeved by the documentation (or, in most cases, lack thereof).

      that said, when it does what you want, it works great. unfortunately, they insisted on using C rather than C++, thus placing really arbitrary (and, imho, dumb) limits on the kinds of images you can process. there's no reason for 99% of the algorithms to care about what kinds of pixel types they're working on. logically, some operations only make sense on certain types of images, but to spend my life converting between formats is kinda dumb -- this is one case where using templates would have made *so* much more sense, oh well.

  11. A good resource by The_reformant · · Score: 2, Informative

    Bob Fisher is an advanced vision researcher at Edinburgh Uni; he maintains this rather excellent page on computer vision: http://homepages.inf.ed.ac.uk/rbf/CVonline/

    --
    I have discovered a truly remarkable sig which this post is too small to contain.
  12. Image analysis used for decoding CAPTCHA's by Dr+Cool · · Score: 2, Interesting
    I've been reading about various projects that are using software-based image analysis to decode CAPTCHAs. What's a CAPTCHA? It's a "Completely Automated Public Turing test to tell Computers and Humans Apart". In other words, it's one of those incredibly annoying "warped text images" where you have to type the text that is warped and strangely colored. The idea behind these is that a script bot can't decode the image and type in the correct letters, but a person can. Thus, websites can keep out scripts but allow humans. This is used for things such as creating Hotmail accounts, for example.

    Several projects have had excellent luck using image processing algorithms to recognize the warped and mangled text areas, separate out the letters, then figure out which alphanumeric character each one is. What's interesting about this research is that it's starting a sort of Cold War between websites that use CAPTCHAs and spammers who are doing their own research to break them with script bots. As the CAPTCHAs get more complex, and the text more convoluted, the script bots are using ever more complex image processing algorithms. This escalating war could be beneficial for anyone interested in using image processing algorithms, especially when the information you're looking for exists in a graphically "noisy" environment.

    The best place to learn more about this is PWNtcha - captcha decoder.

    Another guy spent about 24 hours creating his own image processing algorithm that uses a graphics function to discover the outlines of character shapes, then runs each character through a neural net which can recognize the shape of the character (and these characters are seriously warped, so it's not your typical OCR). Also, be sure to see his other page, which talks about how his software can crack 92% of GIMPY-generated CAPTCHAs!
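    Separating characters (or isolated gel spots) once the image has been thresholded is often done with connected-component labelling. This is not claimed to be the method of the projects mentioned; it is a plain breadth-first 4-connected version, with `label_components` as a hypothetical name. Note that touching objects come out as a single component, which is exactly the overlapping-spot problem discussed elsewhere in the thread.

```python
from collections import deque

import numpy as np


def label_components(mask):
    """4-connected component labelling of a boolean mask.

    Returns (labels, count): background pixels get 0 and each connected
    foreground region gets a distinct label 1..count.
    """
    mask = np.asarray(mask, dtype=bool)
    labels = np.zeros(mask.shape, dtype=int)
    count = 0
    for start in zip(*np.nonzero(mask)):
        if labels[start]:
            continue  # already part of an earlier component
        count += 1
        labels[start] = count
        queue = deque([start])
        while queue:  # breadth-first flood fill
            r, c = queue.popleft()
            for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                nr, nc = r + dr, c + dc
                if (0 <= nr < mask.shape[0] and 0 <= nc < mask.shape[1]
                        and mask[nr, nc] and not labels[nr, nc]):
                    labels[nr, nc] = count
                    queue.append((nr, nc))
    return labels, count


mask = np.array([[1, 1, 0, 0],
                 [0, 1, 0, 1],
                 [0, 0, 0, 1]])
labels, count = label_components(mask)
print(count)  # 2
```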

    Also check out http://www.captcha.net/.

    Google for more information about using AI image processing routines to defeat CAPTCHAs. This is an area of active research that will result in a lot of new algorithms and processes in the coming years!

  13. I couldn't read some of SlashDot's CAPTCHAs by Anonymous Coward · · Score: 0
    The "war" you describe between the CAPTCHA creators and crackers is so intense that CAPTCHA generators are now producing illegible CAPTCHAs. I mean, I tried 3 pairs of glasses to decode my last SlashDot CAPTCHA and only got lucky on the third try.

    Oh, shit! Here's another one! I thought these had been retired!

  14. FYI: OSS Scientific Image Analysis Software by Anonymous Coward · · Score: 0

    one that I use, in the lab, is called ImageJ. It is released by the NIH, and can be found at:

    http://rsb.info.nih.gov/ij/

    despite its being java-based, it runs fairly quickly (once the JVM has loaded, &c.), is decent as packaged, and is easy to extend via user-written plugins and macros.

    the price beats the heck out of most of the commercial solutions out there, and it seems to have fewer bugs (it's been a couple of years since i bothered with the $10k+ commercial software, though, so they might have improved their quality...)

    btw: i like the bot-blocking verification, but this one is hard on my eyes...