Slashdot Mirror


IBM Open Sources UIMA

psykocrime writes "Line56 is reporting that IBM has open-sourced core components of their UIMA project. UIMA is (or was) an IBM Research Project for managing "unstructured information" such as free-text. IBM describes UIMA as: 'an open, industrial-strength, scalable and extensible platform for creating, integrating and deploying unstructured information management solutions from combinations of semantic analysis and search components. [...] and makes the core Java framework available as open source software to provide a common foundation for industry and academia to collaborate and accelerate the world-wide development of technologies critical for discovering the vital knowledge present in the fastest growing sources of information today.' The newly opened source is available at SourceForge."

23 comments

  1. Translation... by IAAP · · Score: 2, Interesting
    an open, industrial-strength, scalable and extensible platform for creating, integrating and deploying unstructured information management solutions from combinations of semantic analysis and search components.

    FTFA: UIMA is about text analytics, which encompasses functions like search and pattern analysis. "It's able to create sophisticated solutions that extract insight from a large amount of unstructured data stored in computers," says Dr. Nelson Mattos, IBM VP of Information and Interaction. "The goal is to create an open standard for analytics."

    In other words, it's now an open source way of picking out themes in text. So I guess you (or the company you work for) can have your own version of Carnivore.

    1. Re:Translation... by Eightyford · · Score: 2, Interesting

      In other words, it's now an open source way of picking out themes in text. So I guess you (or the company you work for) can have your own version of Carnivore.

      And so can China. Woohoo...

    2. Re:Translation... by ZachPruckowski · · Score: 2, Interesting

      "semantic analysis" and "themes in text"? I see a great context based spell checker.

  2. I admit it. by jd · · Score: 3, Insightful

    This is one of the very few times I'm actually more confused by TFA than I am by the Slashdot summary. So it's a text object with a semantic grep function? A collection of objects that can be used in a semantic wiki? A variant of Xanadu? A Java version of "Ask Jeeves"?

    --
    It's a small world and it smells funny; I'd buy another if it wasn't for the money; Take back what I paid (SoM)
    1. Re:I admit it. by blamanj · · Score: 2, Informative

      It's a framework for running custom analyses. For example, you could write a piece of code that looks for items like "VP" or "Director" or "CEO" next to people's names and then tags those as "Corporate executives". Running several different sets of analyses on your text increases the semantic information. Ultimately, you can then search not just for text (a la Google) but also on these advanced semantics, e.g., find all Corporate executives that occur near "convicted".
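
      A rough sketch of the kind of analysis described above, in plain Python rather than the UIMA API. The title list, the "Name, Title" pattern, and the function names are all assumptions made up for illustration, and "near" is simplified to "within a fixed number of characters after the name".

```python
import re

# Hypothetical title list; a real analysis would use a much larger lexicon.
TITLES = ("VP", "Director", "CEO")

# Assumed pattern: a capitalized first and last name, a comma, then a title.
PATTERN = re.compile(
    r"(?P<name>[A-Z][a-z]+ [A-Z][a-z]+), (?P<title>" + "|".join(TITLES) + r")\b"
)


def tag_executives(text):
    """Return one annotation dict per 'Name, Title' match, with character offsets."""
    return [
        {
            "type": "CorporateExecutive",
            "begin": m.start("name"),
            "end": m.end("name"),
            "name": m.group("name"),
            "title": m.group("title"),
        }
        for m in PATTERN.finditer(text)
    ]


def executives_near(text, keyword, window=40):
    """Names of tagged executives with `keyword` in the `window` characters after the name."""
    hits = []
    for ann in tag_executives(text):
        context = text[ann["end"]: ann["end"] + window]
        if keyword in context:
            hits.append(ann["name"])
    return hits
```

      The point is the two-step shape: first tag spans of text with a semantic type, then query on the tags instead of on the raw text.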

    2. Re:I admit it. by jdray · · Score: 1

      I suppose you could also use it to create a system log miner to figure out what happened and what it's related to. Possibly, enough intelligence could be put into it so that the thing only paged you when something was ACTUALLY wrong rather than whenever something SEEMS to be wrong.

      --
      The Spoon
      Updated 6/28/2011
    3. Re:I admit it. by hlee · · Score: 1

      Their 340 page user manual (just under 4MB) is actually quite readable. Chapter 2 gives you a conceptual summary, and the rest of it is focused on actually using it.

      What is it in a nutshell? Unstructured data -> UIMA -> Structured Data. It's a means of converting unstructured, or more likely semi-structured data into what appears to be relational tables (with indices).

      UIMA is really a collection of analysis engines, which you can write yourself, and each one tends to specialize in some kind of knowledge extraction, for example identifying people and their phone numbers. Another analysis engine could look for persons and where they live. What makes UIMA special is that it has unified the meaning of its analysis output, so all the results from different engines can be aggregated - now we know where this person lives AND their phone number.
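
      To make the aggregation idea concrete, here is a minimal sketch in plain Python, not UIMA code. The two toy "engines", their regular expressions, and the (person, field, value) output shape are assumptions chosen for illustration; the only thing taken from the description above is that both engines emit results in one agreed form so they can be merged.

```python
import re
from collections import defaultdict


def phone_engine(text):
    """Toy engine: yields (person, 'phone', number) for "X's number is NNN-NNNN"."""
    for m in re.finditer(r"([A-Z][a-z]+)'s number is (\d{3}-\d{4})", text):
        yield (m.group(1), "phone", m.group(2))


def residence_engine(text):
    """Toy engine: yields (person, 'city', city) for "X lives in Y"."""
    for m in re.finditer(r"([A-Z][a-z]+) lives in ([A-Z][a-z]+)", text):
        yield (m.group(1), "city", m.group(2))


def aggregate(text, engines):
    """Run every engine over the same text and merge the results per person."""
    records = defaultdict(dict)
    for engine in engines:
        for person, field_name, value in engine(text):
            records[person][field_name] = value
    return dict(records)
```

      Because both engines agree on the output shape, `aggregate` needs no knowledge of what either engine actually looks for.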

  3. Translation for common people: by Spy+der+Mann · · Score: 1

    IBM is open sourcing all its abandoned garage projects.

    Is it good? Is it bad? Who knows.

    1. Re:Translation for common people: by TheRaven64 · · Score: 4, Insightful

      If they're abandoned, then it can only be good. Either someone will pick them up and do something useful with them, or they will be ignored. If they remain closed, then only the latter of these is an option.

      --
      I am TheRaven on Soylent News
    2. Re:Translation for common people: by taskforce · · Score: 1

      I think it's a good thing. One of the great strengths of open source is doing cool things with products which aren't really commercially viable at first, because it opens up the code to an audience where there is a lack of profit motive.

      --
      My 3D Texturing Skinning work (under construction)
    3. Re:Translation for common people: by ostiguy · · Score: 1

      IBM sells servers, DB2, and WebSphere (among other products). If more apps use this framework, it would probably help them sell more of all three.

    4. Re:Translation for common people: by browncs · · Score: 2, Interesting
      IBM is open sourcing all its abandoned garage projects.

      If you stop and think about it for a moment, this would be stupid for IBM to do, and it's obviously not what they are doing.

      Paradoxically, it costs money to open source something (at least for a big company like IBM). Getting the package ready for the light of day, getting all the approvals and sign-offs, and staffing the resource to keep working on it, all take money.

      IBM wouldn't do that for something it doesn't consider valuable in some way to its business.

      This is actually a perfect example of the kind of thing IBM likes to open source. It's a framework that (hopefully) will actually make unstructured analysis components work together in an enterprise (at least as developers start adopting it). In order to get to "critical mass", open sourcing the framework lets developers know that they can count on it (a) being there and (b) being free. It'll (again hopefully) make the market space larger for all players, allowing IBM to compete on its merits, rather than having to try to displace entrenched proprietary solutions.

  4. When will they... by Eightyford · · Score: 0, Offtopic

    When will IBM opensource UFIA?

    1. Re:When will they... by twilightzero · · Score: 1

      Hmm...for a moment I thought you were wondering when IBM was going to OUTsource UFIA...

      Interesting thought, eh? ;)

      --

      "Christ what a design! I could eat a handful of iron filings and PUKE a better emergency pump than that!"
  5. Sign me up, man! by IAAP · · Score: 3, Insightful
    Because, I can't make head nor tail of all this marketing speak. That's what I got out of reading TFA.

    If you have to rely on buzz words and jargon when communicating to the popular press, then you are saying nothing, or you don't understand what the fuck you're talking about.

    1. Re:Sign me up, man! by drinkypoo · · Score: 2, Interesting

      That, or maybe you're just trying to get the word out to developers. Since the mainstream press will basically just print anyone's press releases as articles without revision, you have a pretty good chance of getting your word out there verbatim these days. It doesn't matter if they can understand it if they don't even try to understand it.

      --
      "You're right," Fisheye says. "I should have set it on 'whip' or 'chop.'"
    2. Re:Sign me up, man! by stonecypher · · Score: 1

      If you have to rely on buzz words and jargon when communicating to the popular press, then you are saying nothing, or you don't understand what the fuck you're talking about.

      Did it occur to you that it might be you, rather than the IBM researcher, who doesn't know what the IBM researcher is talking about? Pardon my being blunt, but what he was saying wasn't buzz word driven at all, and when you deal with topics this far removed from natural language, it's either communication in jargon or no communication at all.

      High horses aren't the status symbol they once were.

      --
      StoneCypher is Full of BS
  6. From the IBM UIMA SDK by elviscious · · Score: 2, Interesting
    From the IBM UIMA SDK Page (http://www.alphaworks.ibm.com/tech/uima/)

    Unstructured information management (UIM) applications are software systems that analyze unstructured information (text, audio, video, images, etc.) to discover, organize, and deliver relevant knowledge to the user. In analyzing unstructured information, UIM applications make use of a variety of analysis technologies, including statistical and rule-based Natural Language Processing (NLP), Information Retrieval (IR), machine learning, and ontologies. IBM's UIMA is an architectural and software framework that supports creation, discovery, composition, and deployment of a broad range of analysis capabilities and the linking of them to structured information services, such as databases or search engines. The UIMA framework provides a run-time environment in which developers can plug in and run their UIMA component implementations, along with other independently-developed components, and with which they can build and deploy UIM applications. The framework is not specific to any IDE or platform.

    Whatever the hell that means...
  7. What is the UIMA framework? by Coruscater · · Score: 5, Informative

    I am a UIMA user within IBM (but not an official spokesperson), so I can take an unofficial stab at answering that question. The UIMA framework (now on Sourceforge) is primarily a Software Developers Kit. It is not really intended to "do" anything out of the box; it does come with a handful of illustrative example applications but does not really provide direct end-user functionality (any more than a Java SDK does). Instead it provides a variety of interfaces and framework code to enable programmers to build applications that analyze unstructured information such as text (or images, audio, video, etc.).

    Some key capabilities provided by the UIMA framework include:

    1) A data structure, known as the Common Analysis Structure (CAS), that holds the results of analysis. It is essentially a big bag structure that has a lot of convenient methods for using that bag to hold a "subject of analysis" (e.g., a text string) and information about that subject of analysis (e.g., annotations over character spans in a text string). Also included are methods for serializing and deserializing the CAS into an XML format.

    2) A set of interfaces for components that manipulate CASes. For example, an "Annotator" takes a CAS that contains an existing subject of analysis and adds more information about that subject of analysis into that CAS. Developers are expected to implement those interfaces to build analysis capabilities.

    3) An XML format for providing metadata about those components and aggregates of those components.

    4) A "collection processing manager" that takes a metadata specification of an aggregate component and executes those components, either locally or remotely, via SOAP or some other protocol.
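
    The four pieces above can be sketched roughly as follows. This is a toy model in plain Python, not the UIMA API: the class and function names (`Cas`, `Annotation`, `run_pipeline`, the two annotators) and the XML layout are all invented for illustration, and the list of annotators passed to `run_pipeline` merely stands in for an aggregate descriptor.

```python
import re
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field


@dataclass
class Annotation:
    type_name: str
    begin: int  # character offset where the span starts
    end: int    # character offset just past the span


@dataclass
class Cas:
    """Toy stand-in for a CAS: the text under analysis plus annotations over it."""
    text: str
    annotations: list = field(default_factory=list)

    def add(self, type_name, begin, end):
        self.annotations.append(Annotation(type_name, begin, end))

    def covered_text(self, ann):
        return self.text[ann.begin:ann.end]

    def to_xml(self):
        # Invented XML layout, not the format UIMA actually uses.
        root = ET.Element("cas")
        ET.SubElement(root, "text").text = self.text
        for a in self.annotations:
            ET.SubElement(root, "annotation", {
                "type": a.type_name, "begin": str(a.begin), "end": str(a.end),
            })
        return ET.tostring(root, encoding="unicode")


def token_annotator(cas):
    """Adds a Token annotation for every run of word characters."""
    for m in re.finditer(r"\w+", cas.text):
        cas.add("Token", m.start(), m.end())


def capitalized_annotator(cas):
    """Builds on existing Token annotations: marks tokens starting with an uppercase letter."""
    for tok in [a for a in cas.annotations if a.type_name == "Token"]:
        if cas.covered_text(tok)[0].isupper():
            cas.add("Capitalized", tok.begin, tok.end)


def run_pipeline(text, annotators):
    """Runs the annotators in order over one shared Cas and returns it."""
    cas = Cas(text)
    for annotate in annotators:
        annotate(cas)
    return cas
```

    Each annotator only reads from and writes to the shared structure, which is what lets independently written components be chained in one pipeline.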

    So why use UIMA? If you are building some specialized analysis component, the UIMA framework provides common structures and interfaces for building and deploying these components. If you are building an end user application, basing the application on the UIMA framework allows you to draw on whatever "best of breed" UIMA components are available for providing the various elements of functionality that you want.

    The range of applications that can potentially benefit from a combination of analysis components is virtually limitless. The parent post mentions a few (e.g., semantic grep, natural-language question answering). There are many others such as machine translation, summarization, etc. Many of these applications can benefit from common primitive subcomponents such as being able to identify the subject and verb of an English sentence. A common framework like UIMA can help developers to build, share, and reuse these components.

    1. Re:What is the UIMA framework? by Anonymous Coward · · Score: 0

      I'm sorry, but that sounds just like the dozen+ other examples which exist around the net of what happens when you let NLP scientists work on "basic research" without an applied concrete problem to solve. Is there any evidence that the cost of adopting this framework is anywhere near the payoff?

  8. Translating the translation... by argent · · Score: 1

    So I guess you (or the company you work for) can have your own version of Carnivore.

    Or Google.

    1. Re:Translating the translation... by mrchaotica · · Score: 1

      Is there really a difference? ; )

      --

      "[Regarding the 'cloud,'] ownership was what made America different than Russia." -- Woz

  9. This should have been in the summary by psykocrime · · Score: 1

    This should have been in the summary, but I didn't put it in, because I did not realize this had been discussed a few months ago:

      http://slashdot.org/article.pl?sid=05/08/08/2337236&from=rss

    Not exactly a dupe, since at the time of that story, the code wasn't yet available as open source. So consider this a follow-up to the above story.

    --
    // TODO: Insert Cool Sig