IBM's "Deep Computing"
ZDNet is reporting that IBM is forming
a "$29M research institute". The assigned scientists will focus on using supercomputers to model
real-life scenarios. Apparently, they plan to release some software for visual
data representation in the near future under an open source license. Update: 05/25 09:40 by J : IBM is forming an alliance
with Pacific HiTech where PHT will support IBM's DB2 under Linux. It also seems DB2
will ship with a special version of PHT in the future.
If they're pouring US$29M into an open source project, isn't that demonstrating that good software *does* need big $$$?
is that post at the bottom of the page.
Wouldn't be the one mentioned yesterday on slashdot, would it? :-). :P. :)
I think it's good that IBM are still willing to invest money in such projects after they've released the source, rather than just throwing out old products and saying "here's a worm for you, little fishy"
Of course, this particular piece of software is very drool-worthy, imho
Commodore 64, Loading up the dance floor!
Increasingly over the past few years, scientists have turned to computationally intensive solutions to all kinds of problems. As a student who has changed institutions in the past few years I have worked on two markedly different projects that involve computational power that was unheard of several years ago.
The real news in this announcement is that IBM is opening up supercomputers to researchers. Processing power of this magnitude was previously very, very difficult to come by: either you worked at a national lab or you got a grant from NCSA. Otherwise you chugged along on whatever hardware your funding would buy, perhaps a four-processor SGI or so, and you could be guaranteed to wait on the order of a week for any meaningful results.
That said, this software IBM is creating doesn't seem to be terribly necessary to me. As I said earlier, people have been using computationally intensive methods to solve problems for over five years now. Only in fairly rare cases can a scientist not find code written by colleagues that will perform the necessary computations with little modification.
In fact the whole idea seems a little bit misguided. In the article they talk about using supercomputers to model precise weather patterns. Lorenz showed a while back that precise weather patterns are chaotic, thus rendering most modeling quite useless! Unless this data explorer tool is a big improvement over already available packages it will fall by the wayside. The supercomputing time, however, is one of the best gifts a computer company has given to science to date.
I honestly believe that most large companies such as IBM are just using open-source as a method to gain publicity and free development out of the open-source community.
The only companies truly doing open-source development, in my opinion, would be SGI and Redhat. They are giving things to the community and not asking for control over the source like Netscape has.
IBM is not 'pouring' $29 million into an open-source project. The $29 million is going into this research institute in general; the fact that one of the tools which this institute plans to use is being open-sourced does not indicate any sort of silly subsidizing of the software project; remember, it's already completed and in use in mission-critical situations, which were mentioned in the article. Although the software does look interesting, what IBM is really doing here is supporting free (or open-source, if you do insist) software in the ways it was always intended to be used; that is, this IBM-supported organization is a -research- institute, and the tools and findings of their research will be freely available to anyone. Of course, most of us would be hard-pressed to recreate their research, given our general lack of access to hardware as powerful as theirs. The point is that this is neither a commercial (esr, 'open-source') nor moralist (rms, 'free software') venture; the software is open because it is the most appropriate avenue of distribution for the software, given its intended use.
Perhaps so. But isn't this a perfectly acceptable way to get good press? Even if the license is not perfect, lots of people could potentially benefit from this.
Options:
Option 2 is still a good thing. Even if they retain some measure of control, it's still a good gesture - they didn't have to do this. If someone tweaks their code to make their project fly better, and then returns the patch to IBM, then others will still benefit from the patch.
I would like to hereby encourage other software companies to seek publicity by giving me cool software to use.
There are some places where really good weather prediction on the scale of only a few hours would be absolutely wonderful:
Hurricane tracking, i.e. being able to take an enormous amount of ground and satellite data and do the 'deep processing' IBM is talking about. Even if one can only see an hour ahead, that extra hour to prepare could save lives and millions in losses.
Tornado tracking, being able to predict the path of a tornado, or even predicting its formation. If a deep processing unit were permanently stationed in each state in the tornado belt and fed a constant stream of satellite and ground data, even being able to predict a tornado 5 minutes ahead would be of great benefit, and being able to predict its path to within even a couple hundred meters would be a godsend.
Weather for satellite and shuttle launch. If we can see the weather for the next week with reasonable accuracy with today's methods, deep computing may allow us a reasonable view for up to a month or so. Precise weather is not necessary, just as long as we can avoid storms and heavy cloud cover!
-AS
*Pikachu*
US$29M is more than likely going into hardware as well... We are talking world-class supercomputers, and they have the luxury of giving the source away because there probably aren't more than several hundred machines that could do anything useful with this software, I think...
-AS
*Pikachu*
anybody can think of another application for a homemade supercomputing cluster? I think not.
Sure, homebrew movie people who want to render CG special effects, or crunch through SETI data in their spare computing cycles, etc.
I'd love to be able to run 3 or 4 'cheap' Alpha machines for a render farm, and do some movie making!
-AS
*Pikachu*
Same trouble here, this has been going on for a while too (= cpl of weeks). Specifically, the left part of the green separators won't load here. The browser keeps trying to load stuff from flotsam.slashdot.org port 81 and isn't getting a reply. Strangely enough, it doesn't time out either.
Message on our company Intranet:
"You have a sticker in your private area"
beauty is only a light switch away
Looks like IBM is trying to find practical applications
to keep the supercomputer market going -- sales have been down after
the cold war. One way of achieving this is through free-software.
Here is the software, but do you have the machine?
This could kill off specialist Viz companies like AVS. Unless they refocus around consulting and services. I'll definitely be downloading DX tomorrow.
Processing power of this magnitude was previously very, very difficult to come by
On the one hand, this is quite true. On the other hand, if you need massive processing power and (a very important 'and') your problem can be parallelized in some fashion, then distributed computing (think Beowulf) looks rather attractive compared to the traditional big iron. Granted, some problems do need a massive single machine, but these, thank gods, tend to be rare (YMMV, of course).
Only in fairly rare cases can a scientist not find code written by colleagues that will perform the necessary computations with little modification.
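To put a rough number on that "very important 'and'", here is a minimal sketch using Amdahl's law. That law is not mentioned in the discussion above and is brought in purely as an illustration; the function name `amdahl_speedup` and the sample fractions are made up for the example, and the formula ignores communication overhead between nodes.

```python
def amdahl_speedup(parallel_fraction, nodes):
    """Idealized speedup from Amdahl's law.

    parallel_fraction: share of the run time that can be spread across nodes (0..1).
    nodes: number of machines in the (hypothetical) cluster.
    """
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if nodes < 1:
        raise ValueError("nodes must be at least 1")
    serial_fraction = 1.0 - parallel_fraction
    # The serial part runs at full cost; the parallel part is divided among nodes.
    return 1.0 / (serial_fraction + parallel_fraction / nodes)


if __name__ == "__main__":
    # Illustrative fractions only, not measurements of any real workload.
    for fraction in (0.5, 0.9, 0.99):
        print(f"parallel fraction {fraction}: "
              f"16 nodes give about {amdahl_speedup(fraction, 16):.2f}x")
```

Under these assumptions, a job that is only half parallelizable never reaches a 2x speedup no matter how many boxes you add, which is why the cluster route only pays off for problems that split well.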
Ahem. Doesn't that entirely depend on the questions you are asking? The code to do standard things is widespread, but as soon as you get somewhat close to the bleeding edge (and that's where you *want* to be, right?) the tools that you can find lying around tend to develop severe problems. And in many cases it's easier to write code from scratch than to modify, debug, and finally rewrite anyway some grad student's project.
"You can tell you've been pushing a new frontier when all available tools are inappropriate" -- Steven K. Roberts, http://microship.com
In fact the whole idea seems a little bit misguided. In the article they talk about using supercomputers to model precise weather patterns. Lorenz showed a while back that precise weather patterns are chaotic, thus rendering most modeling quite useless!
"A little bit misguided" seems to apply to your statement more. It seems that you mean that weather systems are highly sensitive to initial conditions (the butterfly effect), thus you cannot ever model them. Sigh. First, chaotic systems can be modelled and are modelled. Second, you may not care about *precise* weather patterns as long as you get a usable forecast. Third, the time scale is crucial here. Forecasting a month ahead is hard, but forecasting a couple of hours ahead is not that hard. How about a day? Two days? Three? Fourth, just because right now we see weather as chaotic doesn't mean that at some point we will not develop a better theory which *will* be able to explain and predict weather better than we do now.
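The time-scale point can be illustrated with a toy model. The sketch below integrates the classic three-variable Lorenz system (a textbook chaotic system, not a real weather model) from two starting points that differ by one part in a billion, and records how far apart the runs drift. The parameter values, step size, starting point, and function names are all choices made for this illustration, not anything taken from the article.

```python
import math


def lorenz(state, sigma=10.0, rho=28.0, beta=8.0 / 3.0):
    """Right-hand side of the Lorenz equations with the usual textbook parameters."""
    x, y, z = state
    return (sigma * (y - x), x * (rho - z) - y, x * y - beta * z)


def rk4_step(state, dt):
    """Advance the state by one classical fourth-order Runge-Kutta step."""
    def shifted(base, deriv, scale):
        return tuple(b + scale * d for b, d in zip(base, deriv))

    k1 = lorenz(state)
    k2 = lorenz(shifted(state, k1, dt / 2))
    k3 = lorenz(shifted(state, k2, dt / 2))
    k4 = lorenz(shifted(state, k3, dt))
    return tuple(
        s + dt / 6 * (a + 2 * b + 2 * c + d)
        for s, a, b, c, d in zip(state, k1, k2, k3, k4)
    )


def separation_history(start=(1.0, 1.0, 1.0), eps=1e-9, dt=0.01, steps=5000):
    """Distance between two runs whose initial x values differ by eps, after each step."""
    a = start
    b = (start[0] + eps, start[1], start[2])
    history = []
    for _ in range(steps):
        a = rk4_step(a, dt)
        b = rk4_step(b, dt)
        history.append(math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b))))
    return history


if __name__ == "__main__":
    history = separation_history()
    for step in (100, 1000, 2000, 3000, 4000):
        print(f"t = {step * 0.01:5.1f}: separation = {history[step - 1]:.3e}")
```

In this toy setting the two runs stay practically identical over short horizons and only later diverge to a distance comparable to the size of the attractor itself, so "chaotic" limits how far ahead you can forecast rather than ruling out forecasting altogether.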
Kaa
Kaa's Law: In any sufficiently large group of people most are idiots.
Here is my take on what is going on, having worked at IBM (doing contract work, not as an IBMer), having worked with larger-scale IBM hardware, and having talked to a fair number of IBMers.
IBM is supporting the needs of their customers.
That's it. If you are in a position to pay for their services (and IBM support is not cheap), you are important. If you can help make IBM's image more palatable to the people who buy IBM support (this includes research centers and universities), you are somewhat important. (their viewpoint)
Consider that IBM recently built a Beowulf in a day specifically to demonstrate how powerful their Intel-based hardware was. Consider that IBM is happy to support Java and Emacs on S/390s to make their paying customers keep buying services. Consider that the vast majority of IBM's new revenue is coming from custom e-commerce solutions as hardware, software, and support packages.
IBM stands for big iron and reliable, powerful solutions. Not raging I/O (though the mainframes are good at that), not integrated-everything (though OS/2 acts kind of like that), but relentless, dependable solutions that will be supported ten years down the road. There are a great many systems in service at IBM that are at least that old.
Anything that makes IBM more attractive to their core customers will be supported. DX is not the crown jewel of IBM Research -- they provide algorithm design services to monster companies like Monsanto, and IBM has been the only company AFAIK to turn a steady profit selling supercomputers. (The SP/2 is basically a Beowulf with a much higher-performance switch.)
There is no hidden agenda here -- IBM wants you to give them your money and be happy to keep doing so, especially if you are a large business or government institution. If supporting Linux and opening up the source to esoteric supercomputing tools or next-generation compilers makes more customers choose IBM, that's what gets done.
Opening the source will make DX a better product, increase demand for hardware (which IBM conveniently provides), and make IBM look like the anti-Microsoft in some people's eyes. So they do it. A Beowulf built out of, say, Netfinity boxes is easier to maintain because of the hardware-diagnosing features (LightPath for example), so they exhibit the power of a Beowulf.
One day someone from IBM was wondering why the Java-Apache project "Cocoon" wasn't using the IBM XML4J parser anymore. Stefano (Mazzocchi, the guy who started the whole Java-Apache thing, and an Apache core developer) replied that it was because open-source tools do not gain momentum without a feeling of participation (e.g. "I built this from SCRATCH"). For simple or generally useful tools I think he's right, but for stuff like DX -- which as an internal IBM Research project was put together over the course of 18 months by some of the smarter people at IBM, without interference from marketing, and with the direct support of a company VP -- I can't see how starting over would help. If the whole codebase is released, there is potential for so many cool tie-ins and hacks that it's unreal. (If not... well, I think a lot of people will be quite disappointed.) The word that comes to mind when describing DX's data model is "profound" -- it is built on the notion of expressing data filtering in terms of the mathematics of fiber bundles, independent of the content flowing through the fibers.
There is also a distributed version of DX, which (who knows?) might be Just Right for making Beowulf useful to more than just scientists. Visualization tools (like DX, or AVS) can offer insights into huge datasets that simple reports cannot -- clustering of sales data, or hit rates for candidate drugs as a function of ethnicity.
Anyways, this isn't some version of corporate insanity. IBM wants to sell you stuff and will do whatever it takes to make that happen. If a ton of positive publicity is generated along the way, so much the better!
Remember that what's inside of you doesn't matter because nobody can see it.
In the interest of full disclosure, I'm employed by SGI, and have been for eight years. There's a rumor that when you scratch me, I bleed purple, but that is yet to be confirmed.
I see this initiative as aimed at SGI/Cray. The article mentioned that IBM has over 100 of the top 400 supercomputing sites. SGI with both Cray products and Origin 2000's has probably 250 of the rest, not to mention the installations that we own that don't count, since, by the rules of the list, we can't have more than ~3 entries on the list.
So, IBM has a small piece of a shrinking pie, and they are losing to a company that has a pretty compelling scientific visualization story. So what do they do? They give away some computers that they're having a hard time selling. (Don't get me wrong, they're still useful. I'd take one if it were free.) This, they hope, will increase their mindshare and presence. Then they give away some visualization software to try to undermine the SGI visualization market power.
"I see great things in baseball" - Walt Whitman
Both SGI and IBM are smart in their own ways. The limitation nowadays is not hardware (constrained by physics and economics) or technology (can always be bought out) but human talent. Given the increasing software complexity, it can take years for people to learn and master sophisticated software packages and their APIs. Moore's Law dictates that whatever you buy now can be bought cheaper next year. Hence anything developed at the high end will eventually filter down to the PC level. Unfortunately human learning curves move slower than hardware evolution cycles. The power of visualisation packages such as AVS is in the module libraries that the community develops and contributes, as once the critical mass is there, it is very hard to change habits or programming practices. In a way, releasing open source is like force-growing a new ecosystem niche with the hope that enough talent will feed on the transplanted energy and expand the overall food chain (always a problem in small specialised fields), so that the colony becomes self-sustaining (creating future demand for similar food). The path to growth for both IBM and SGI is to move their "proprietary" IP into the mainstream as quickly as possible to extract maximum benefit from their R&D dollars. For this, they need to win over developers and new power users, for whom academia is a unique training ground.
I can see the day when companies attempt to buy out emerging graduates, as without a constant source of talent any company will eventually wither under the winds of global competition.
LL
Somebody doesn't understand the difference between a research institute and a `software project', and it isn't just the poster. These are doubtless the same people who state wildly that people only get an MS in CS because they can't find a job.
The license is now available for your perusal at www.research.ibm.com/dx
Not that I am trying to put down TurboLinux (I've never used it, personally), but does anyone have any insight as to why IBM chose to align themselves w/ PHT, which, afaik, is primarily marketed in the Asian markets, vs the other distros? Are they aiming at a specific market (i.e. Asian), or is there something else? Didn't IBM earlier invest some dough in RedHat? Why did they not stick w/ RedHat? Before the flamers get started, I am _not_ touting RedHat as the end-all be-all of Linux; I use Slackware 4.0b3 at home, RH 6.0 on the laptop, and probably Debian or TurboLinux on the upcoming beater 486. I am just curious as to the selection of PHT.
Monte Milanuk