IBM Open Sources UIMA
psykocrime writes "Line56 is reporting that IBM has open-sourced core components of their UIMA project. UIMA is (or was) an IBM Research Project for managing "unstructured information" such as free-text. IBM describes UIMA as: 'an open, industrial-strength, scalable and extensible platform for creating, integrating and deploying unstructured information management solutions from combinations of semantic analysis and search components. [...] and makes the core Java framework available as open source software to provide a common foundation for industry and academia to collaborate and accelerate the world-wide development of technologies critical for discovering the vital knowledge present in the fastest growing sources of information today.' The newly opened source is available at SourceForge."
FTFA: UIMA is about text analytics, which encompasses functions like search and pattern analysis. "It's able to create sophisticated solutions that extract insight from a large amount of unstructured data stored in computers," says Dr. Nelson Mattos, IBM VP of Information and Interaction. "The goal is to create an open standard for analytics."
In other words, it's now an open-source way of picking out themes in text. So, I guess you can (or the company you work for) have a version of Carnivore.
This is one of the very few times I'm actually more confused by TFA than I am by the Slashdot summary. So it's a text object with a semantic grep function? A collection of objects that can be used in a semantic wiki? A variant of Xanadu? A Java version of "Ask Jeeves"?
It's a small world and it smells funny; I'd buy another if it wasn't for the money; Take back what I paid (SoM)
IBM is open sourcing all its abandoned garage projects.
Is it good? Is it bad? Who knows.
When will IBM opensource UFIA?
Religion for nerds. Stuff that really matters
If you have to rely on buzz words and jargon when communicating to the popular press, then you are saying nothing, or you don't understand what the fuck you're talking about.
Whatever the hell that means...
I am a UIMA user within IBM (but not an official spokesperson), so I can take an unofficial stab at answering that question. The UIMA framework (now on SourceForge) is primarily a software development kit (SDK). It is not really intended to "do" anything out of the box; it comes with a handful of illustrative example applications but does not provide direct end-user functionality (any more than a Java SDK does). Instead, it provides a variety of interfaces and framework code that let programmers build applications that analyze unstructured information such as text (or images, audio, video, etc.).
Some key capabilities provided by the UIMA framework include:
1) A data structure, known as the Common Analysis Structure (CAS), that holds the results of analysis. It is essentially a big bag structure with a lot of convenient methods for using that bag to hold a "subject of analysis" (e.g., a text string) and information about that subject of analysis (e.g., annotations over character spans in a text string). Also included are methods for serializing and deserializing the CAS into an XML format.
2) A set of interfaces for components that manipulate CASes. For example, an "Annotator" takes a CAS that contains an existing subject of analysis and adds more information about that subject of analysis to that CAS. Developers are expected to implement those interfaces to build analysis capabilities.
3) An XML format for providing metadata about those components and aggregates of those components.
4) A "collection processing manager" that takes a metadata specification of an aggregate component and executes those components, either locally or remotely, via SOAP or some other protocol.
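To make the four pieces above a bit more concrete, here is a minimal sketch in Python. It is not the UIMA API (the real framework is Java), and every name in it (Cas, Annotation, TokenAnnotator, CapitalizedAnnotator, run_pipeline) is made up for illustration. It only assumes what is described above: a container that holds a text plus annotations over character spans, components that each add annotations to it, an XML round trip, and something that runs a list of components in order. The XML layout is also invented here and is not UIMA's format.

```python
import re
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field


@dataclass
class Annotation:
    """A typed label over the character span [begin, end) of the subject text."""
    type: str
    begin: int
    end: int


@dataclass
class Cas:
    """Toy stand-in for a CAS: the subject of analysis plus the results so far."""
    text: str
    annotations: list = field(default_factory=list)

    def covered_text(self, ann):
        """Return the piece of the subject text that an annotation spans."""
        return self.text[ann.begin:ann.end]

    def select(self, type_name):
        """Return all annotations of one type, in insertion order."""
        return [a for a in self.annotations if a.type == type_name]

    def to_xml(self):
        """Serialize to a made-up XML layout (hypothetical, not UIMA's)."""
        root = ET.Element("cas")
        ET.SubElement(root, "text").text = self.text
        for a in self.annotations:
            ET.SubElement(root, "annotation", type=a.type,
                          begin=str(a.begin), end=str(a.end))
        return ET.tostring(root, encoding="unicode")

    @classmethod
    def from_xml(cls, payload):
        """Rebuild a Cas from the layout written by to_xml."""
        root = ET.fromstring(payload)
        cas = cls(root.findtext("text", default=""))
        for e in root.findall("annotation"):
            cas.annotations.append(
                Annotation(e.get("type"), int(e.get("begin")), int(e.get("end"))))
        return cas


class TokenAnnotator:
    """Adds a Token annotation for every run of word characters."""
    def process(self, cas):
        for m in re.finditer(r"\w+", cas.text):
            cas.annotations.append(Annotation("Token", m.start(), m.end()))


class CapitalizedAnnotator:
    """Builds on earlier results: marks tokens that start with an upper-case letter."""
    def process(self, cas):
        for tok in cas.select("Token"):
            if cas.covered_text(tok)[0].isupper():
                cas.annotations.append(Annotation("Capitalized", tok.begin, tok.end))


def run_pipeline(cas, annotators):
    """Run each component over the same Cas, in the order given."""
    for annotator in annotators:
        annotator.process(cas)
    return cas


cas = run_pipeline(Cas("IBM released UIMA on SourceForge"),
                   [TokenAnnotator(), CapitalizedAnnotator()])
print([cas.covered_text(a) for a in cas.select("Capitalized")])
# prints ['IBM', 'UIMA', 'SourceForge']
```

The point of the shape is that the second component never looks at the raw text on its own; it reads what the first one left in the shared structure. That is what lets independently written components be chained, and the XML round trip is what would let one of them run in another process.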
So why use UIMA? If you are building some specialized analysis component, the UIMA framework provides common structures and interfaces for building and deploying these components. If you are building an end user application, basing the application on the UIMA framework allows you to draw on whatever "best of breed" UIMA components are available for providing the various elements of functionality that you want.
The range of applications that can potentially benefit from a combination of analysis components is virtually limitless. The parent post mentions a few (e.g., semantic grep, natural-language question answering). There are many others such as machine translation, summarization, etc. Many of these applications can benefit from common primitive subcomponents such as being able to identify the subject and verb of an English sentence. A common framework like UIMA can help developers to build, share, and reuse these components.
So, I guess you can (or the company you work for) have a version of Carnivore
Or Google.
This should have been in the summary, but I didn't put it in, because I did not realize this had been discussed a few months ago:
http://slashdot.org/article.pl?sid=05/08/08/2337236&from=rss
Not exactly a dupe, since at the time of that story, the code wasn't yet available as open source. So consider this a follow-up to the above story.
// TODO: Insert Cool Sig