Automated Migration From Cobol To Java On Linux
Didier DURAND writes "Just published an article about our 100% automated migration from IBM mainframe with Cobol to Linux Java: we could convert our own application (4 million lines of code) through the tools that we developed. Those tools are open-sourced under GPL for other companies to benefit from them. We save 3 million euros / year after this migration!"
Sounds like it could turn out like WYSIWYG HTML Editor code. Where every word you bold has the bold tags, etc.
Dreamweaver, Word, etc all make some dang ugly HTML.
As though a few hundred cobol coders cried out in terror, and were suddenly obsolete...
BSG humor is mandatory whenever Cobol comes up...
------ The best brain training is now totally free : )
I'll say what they don't buy you: The ability to throw away the old language.
If changes need to be made - and they will - you will want to change the original language, not some intermediate that is stilted and hard to read at best and a candidate for an obfuscated insert-language-here contest at worst.
What transcoders do buy you:
The ability to compile code on a platform that doesn't have a compiler for your flavor of your language.
Knowledge is how to play a game, intelligence is how to win, wisdom is knowing what game to play.
If you are right there may be a cascade of converters to run and a lucrative converter market full of consultants. If things go really ugly, the more conversions are done, the more skills will be needed to find out why converted code malfunctions. These skills could possibly even include COBOL, just in case one has to look at the original to find out how it worked when it did :)
All it appears to be doing is mapping COBOL line of code to Java Line of code, per Slide 25.
This is more about being able to find someone who can read and write java. The code remains procedural, the COBOL programmers do the same stuff, just in Java now.
Here's an example of the code that was spit out:
sql("SELECT * FROM RS0403 WHERE CDPROJ=#1").into(vrs0403a.dvrs0403a).param(1, tuazone.tua_I_Cdproj)
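For anyone wondering how a line like that can be valid Java at all, here is a minimal sketch of a fluent (method-chaining) builder. This is not the runtime library the tool actually generates against: the `SqlSketch` class, its `render()` method and the way `#1` is substituted are assumptions made up purely for illustration.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical fluent builder, invented for illustration.
// It is NOT the runtime library that the conversion tool actually ships.
class SqlSketch {
    private final String text;                                       // statement text with #n placeholders
    private final TreeMap<Integer, Object> params = new TreeMap<>(); // placeholder number -> bound value
    private Object target;                                           // stand-in for the record that receives the row

    private SqlSketch(String text) { this.text = text; }

    // Entry point, mirroring the sql("...") call in the generated line.
    static SqlSketch sql(String text) { return new SqlSketch(text); }

    // Remember where the result row should go; returning this allows chaining.
    SqlSketch into(Object target) { this.target = target; return this; }

    // Bind a value to placeholder #index.
    SqlSketch param(int index, Object value) { params.put(index, value); return this; }

    Object target() { return target; }

    // Show the statement with values filled in, for display only.
    // Real code should bind values through PreparedStatement instead of string substitution.
    String render() {
        String out = text;
        // Highest placeholder number first, so "#1" does not clobber "#10".
        for (Map.Entry<Integer, Object> e : params.descendingMap().entrySet()) {
            out = out.replace("#" + e.getKey(), "'" + e.getValue() + "'");
        }
        return out;
    }

    public static void main(String[] args) {
        Object record = new Object(); // placeholder for something like vrs0403a.dvrs0403a
        System.out.println(sql("SELECT * FROM RS0403 WHERE CDPROJ=#1").into(record).param(1, "P42").render());
    }
}
```

Each method returns the builder itself, which is why the generated statement reads left to right much like the embedded SQL it replaced.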
Not to dissuade, but in some ways, they avoided doing a rewrite at all costs.
Great if you want to get off legacy systems, but it's not going to magically improve your code base. GIGO rules still apply.
import system.cool.Sig;
Though I could think of a ton of jokes - and have already seen a few - my first question is, "why."
I can see the possible benefits of no longer relying on aging COBOL programmers. I am often dealing with just this issue as I migrate '70s and '80s era systems off ADABAS and COBOL. However, why would one want to make a one for one class creation of existing mainframe applications? I honestly remember a few programmers I knew doing this right before they retired back in '05. They took a COBOL/IMS application running on an AS390 and turned it into a HTML/ASP.NET application written in C# with IMS and SQL Server on a z890 in virtual MVS and SLES environments. The screens - web based now - were one for one matching with the previous mainframe screens.
My question then too, was why bother?
I just finished a second project in taking an '80s era mainframe application - this one to track the purchase of vital (birth death marriage) records - from mainframe into an n-tier model. Instead of simply copying the mainframe screens we spent time deciding what worked on the mainframe and what didn't. Some of the mainframe concepts - particularly in the public lookup - were fine. They stayed and became web-based applications. Other items were thrown out the window and completely re-worked into a user-friendly and efficient system. (In this case, we used MS .NET 2005 and C#, but we could have just as easily used Java. I'm not trying to say anything about the choice of language or underlying platform.)
:)
Having done a similar project for real property records in '07, we learned many lessons and were able to reuse assemblies in the new application. In fact, the entire UI, security, printing, data encapsulation, image import (there are over 160M TIFF files in our system), reporting and cashiering/finance/cash handling subsystems are identical and shared among both applications.
I can see possibly wanting to utilize some classes for back end work but wouldn't it be better to review these individually and decide what is best?
Oh, and we're saving roughly $3M/year in mainframe costs.
(Okay, post finished now to wait for someone to mod me as a troll...)
The Kai's Semi-Updated Website Thingy
I agree with your objections and have seen these problems so many times over the last decade that it is getting hard to believe that someone can't write a decent translator.
Java is usually very easy to refactor (smart editors).
It seems like a two or even three stage pass would work.
Stage one, COBOL to raw java.
Stage two, raw java to better formatted java.
Stage three, better formatted java to even better formatted java.
I wrote an RPG3 to RPG4 converter back in 2000.
It used 5 passes-- each a small simple program.
The result was actually fairly close to procedural java-- if we had decided to go with java, I could have written a 6th program to do that conversion.
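To make the multi-pass idea concrete, here is a minimal sketch where every pass is a small String-to-String rewrite and the converter just runs them in order. The `PassPipeline` class and its four toy passes are invented for illustration; a real converter would parse the source rather than pattern-match a single statement.

```java
import java.util.List;
import java.util.function.UnaryOperator;

// Hypothetical multi-pass converter: each pass is a tiny, separately testable rewrite.
class PassPipeline {
    // Toy passes, assumed for this example only.
    static final List<UnaryOperator<String>> PASSES = List.of(
        // Pass 1: normalize whitespace.
        s -> s.trim().replaceAll("\\s+", " "),
        // Pass 2: drop the trailing period that ends a COBOL sentence.
        s -> s.endsWith(".") ? s.substring(0, s.length() - 1) : s,
        // Pass 3: rewrite "ADD x TO y GIVING z" as an assignment.
        s -> s.replaceAll("^ADD (\\w+) TO (\\w+) GIVING (\\w+)$", "$3 = $1 + $2;"),
        // Pass 4: lower-case the identifiers to look more like Java.
        s -> s.toLowerCase()
    );

    // Feed the output of each pass into the next one.
    static String convert(String source) {
        String current = source;
        for (UnaryOperator<String> pass : PASSES) {
            current = pass.apply(current);
        }
        return current;
    }

    public static void main(String[] args) {
        System.out.println(convert("  ADD A  TO B GIVING RESULT. "));
    }
}
```

Keeping each pass this small is what makes a later stage (for example, one that tidies the raw output) cheap to add: it is just one more entry in the list.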
She was like chocolate when she drank... semi-sweet at first and then increasingly bitter.
Make that ten, although I've only coded COBOL at home for fun. Yeah, I'm probably some kind of masochist but I really wanted to give the language a spin and see if it was as horrible as people say it is.
/Mikael
Greylisting is to SMTP as NAT is to IPv4
I hacked into their source code, and found something a little odd:
Table-ized A.I.
My question is, "Do you also convert the CICS calls embedded in the code (and possibly 3270 terminal commands?!?) or is there a Java library to interface with CICS?" My experience with converters is that they follow the 90-10 rule, where they do great with 90% of the code, but that's the easiest, and could almost be done with global Find/Replace. The remaining 10% is why the conversion wasn't already done.
Later . . . Jim
I know there is a certification program to check that a commercial COBOL compiler processes the whole language and produces output code that performs correctly. (Can't recall the name at the moment though I think it was in the US government - perhaps in the DoD.) I'm wondering if this tool has been submitted to that and, if so, whether it passed.
I'd occasionally thought it would be a useful thing to do something similar to this (but with ANSI V2 C++, rather than JAVA, as the target language) - and then get the tool certified. With such a certified tool IT administrators could, with confidence, transcode a COBOL application base into a language with multiple commercial and open compilers and a long expected support lifetime, generating native code for virtually all possible targets (from PC clusters to current and future mainframes). If the transcoded output doesn't become excessively opaque and class-dependent it could later be warped into a more native form, should that be desirable.
Perhaps this project will be able to actually do it.
Bantam Dominique roosters crow a four-note song. Once you've heard it as "Happy BIRTHday" you can't NOT hear it that way
CobolEngine cobol = new CobolEngine();
cobol.AddLine("ADD A TO B GIVING RESULT");
cobol.AddLine("PRINT RESULT.");
cobol.Compile();
cobol.Execute();
Do you even lift?
These aren't the 'roids you're looking for.
The JIT code is actually pretty fast. (especially when Hotspot is running in -server mode, which does some impressive optimization)
It just consumes way too much memory, and starts damn slow. (though still faster than C#, in my experience)
A couple times now I've taken base classes and rewritten them in Java to speed them up. ArrayList? Much too slow for my needs, even with a wrapper class ensuring it allocates in large enough chunks. By rewriting it in Java using Object arrays, I saw 60-100% speedups in the get/add methods, translating into roughly a 2-4% overall speedup. (mileage may vary depending on workload)
It was about 20000% faster than a default ArrayList - mind you, by default those things allocate in 6-index chunks, so every 6 objects you add it copies the whole ArrayList to a new one, with 6 more spots available. @_@
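A minimal sketch of the kind of Object-array-backed list described above, assuming a fixed growth chunk chosen by the caller. The `ChunkedList` name and its API are hypothetical, and whether it beats `java.util.ArrayList` depends on the JDK and the workload; `ArrayList` also accepts an initial capacity in its constructor and has `ensureCapacity`, which may already be enough.

```java
import java.util.Arrays;

// Hypothetical growable list backed by a plain Object[]; not a drop-in ArrayList replacement.
class ChunkedList<T> {
    private final int chunk;   // slots added on each growth step (assumed tunable)
    private Object[] items;    // backing storage
    private int size;          // number of slots actually in use

    ChunkedList(int chunk) {
        if (chunk <= 0) throw new IllegalArgumentException("chunk must be positive");
        this.chunk = chunk;
        this.items = new Object[chunk];
    }

    // Append a value, growing the backing array by one chunk when it is full.
    void add(T value) {
        if (size == items.length) {
            items = Arrays.copyOf(items, items.length + chunk);
        }
        items[size++] = value;
    }

    // Bounds-checked read; the cast is safe because add() only stores T values.
    @SuppressWarnings("unchecked")
    T get(int index) {
        if (index < 0 || index >= size) {
            throw new IndexOutOfBoundsException("index " + index + ", size " + size);
        }
        return (T) items[index];
    }

    int size() { return size; }
}
```

With a large chunk the array is copied rarely, which is the effect the wrapper was after; the cost is unused slots at the tail.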
There's a reason Java got the sluggish reputation, but it's not because the JIT code is slow. It's because the developers can get by with less of an understanding of what goes on behind the scenes, which never turns out good...
I propose a fix:
I don't therefore I'm not.
Ha, you think you're joking, don't you? A friend of mine used to work for the Air Force in one of their less high-tech establishments, a place that did inventory. The software, all locally developed, was written in Fortran. However, changing the source and recompiling was extremely tedious since it meant punching new cards, running them through to compile, then running the result of the first run through to link, then running the linked code, then transferring the output to another machine that could print. So, instead of changing the source, they edited the machine code. (And you wonder why they lost the aliens at Area 51...)
I guess it just goes to show that Paul Graham doesn't know very many great programmers. Either that, or his definition of 'great' is screwed up.
Great programmers use the right tool for the job. Whether that tool is Lisp, Python, Java, or even COBOL doesn't matter. The way I see it, Mr. Graham's constant advocacy of Lisp as the be-all and end-all of programming environments is no better than any other language zealotry. Yes, Lisp has its place. So does Java. A truly great programmer would have the task dictate the language, not the other way around.
We all know what to do, but we don't know how to get re-elected once we have done it
JIT code is slow? Or JITing the code is slow? When I run benchmarks, the average benchmark speed continually gets faster and faster as I increase the number of iterations. This is because the JIT continuously re-assesses basic blocks to determine if coalescing is worth the expense. This means:
A) code is NOT JIT'd at first, even with -server.. ONLY after a min-number of passes (why JIT initialization code that'll never run a second time?)
B) book-keeping to determine what to JIT and what not to JIT adds extra overhead for rarely used code
C) Even JIT'd code is continuously evaluated to determine if larger execution-flow-patterns are possible.. So the same code might be JIT'd 50 times, each time being aggregated into larger bundles of contiguous raw assembly
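A rough way to watch this warm-up yourself is to time the same method over a growing number of rounds. This is only a sketch, not a rigorous benchmark (a harness such as JMH is the usual tool for that); the `WarmupDemo` class and its `work` method are made up for the example, and the numbers will differ by machine and JVM.

```java
// Rough warm-up demonstration; timings are indicative only.
class WarmupDemo {
    // Some arithmetic-heavy work for the JIT to optimize.
    static long work(int n) {
        long sum = 0;
        for (int i = 0; i < n; i++) {
            sum += (long) i * i % 7;
        }
        return sum;
    }

    // Average nanoseconds per call of work(n) over the given number of rounds.
    static double averageNanos(int rounds, int n) {
        long sink = 0;
        long start = System.nanoTime();
        for (int r = 0; r < rounds; r++) {
            sink += work(n);
        }
        long elapsed = System.nanoTime() - start;
        // Use the result so the work cannot be discarded as dead code.
        if (sink == Long.MIN_VALUE) System.out.println("unreachable in practice");
        return (double) elapsed / rounds;
    }

    public static void main(String[] args) {
        for (int rounds : new int[] {10, 100, 1_000, 10_000}) {
            System.out.printf("%6d rounds: %.0f ns/call%n", rounds, averageNanos(rounds, 10_000));
        }
    }
}
```

On a typical HotSpot JVM the per-call average tends to drop as the round count grows, which fits points A to C, though nothing here proves which optimization is responsible.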
Thus, for short-lived applications, the JIT only provides marginal benefit if not a detriment (which is why the google android's JVM does ZERO JIT, and instead uses an alternate byte-code which is more efficient to cold-start).
But, for servers (web-servers and mainframe code which will batch process all night long), you're talking about relatively small chunks of code (out of the overall loaded library) that are highly reused. It's entirely possible that an entire web-page can be inlined into a single chunk of assembly (though it's a stretch). Not even PHP can do this (at present).
Basically typical code is complex because of optional flow-paths (if-statements, etc).. But in highly repetitive batch/server processing, there are a handful of paths which make up maybe 99% of execution, and thus can be rewritten in a way that in C would be horrendous (though certainly possible).
Doing code-analysis to prove that an array index will never be out of bounds is expensive.. But once done, you can remove a lot of implicit assembly-code. So general-purpose 3rd party code can be almost fully optimized away. You get the best of both worlds.. High-degree of defensive coding + optimized execution. The cost is the ramp-up-time (and an immodest but reasonable memory foot-print).
For single-app-servers, throwing 2 gig of RAM is NOTHING ($100 vs. $5,000 ???), so I'm frustrated when I hear people bitch about memory consumption.. Memory use is bounded, if not, then you have a coded memory-leak and that's not the JVM's fault (usually due to misconfigured-as-unlimited in-mem caching - usually a disk-spill-over-cache that unknowingly (to the coder) retains headers in-mem).
-Michael
"It just consumes way too much memory, and starts damn slow."
Sigh. I've said this 100 times.. server apps (as relevant to this COBOL discussion) don't have startup time. They are up for months at a time (years even). Please distinguish between client-side apps and server-side apps.. Java programming only exists in client-side apps these days as server-interfaces (for fast-to-build, yet bug-free coding of mission-critical-apps) and cell-phone-apps (for hardware-portability). For this class of client-side apps, yes you can bitch about memory usage and startup time. Though hopefully memory usage shouldn't exceed 150Meg (a typical firefox executable). Server-apps, on the other hand, have completely different requirements.
A 'server' that runs java costs between $900 and $10,000. What is the cost of 2Gig of memory? $100? $200? Give me a break. Further, I usually recommend only using the 32bit JVM and thus <= 2Gig of JVM memory (typically capping out at 1Gig). Thus the worst stop-the-world GC-collection time is on the order of 1 second. This pause sacrifice gives you better overall throughput. There are server-app types that require 16, 32, 64 or 128Gig of RAM however, and thus you need the 64bit JVM (64bit pointers increase mem-overhead and cache-coherence) and can't risk the nearly-a-minute pause-times, so you need incremental collectors (with an additional 15% performance loss). But for this, I usually say that such large memory foot-prints are best handled by consolidated network-APIs. This allows you to cluster the apps on smaller, cheaper hardware, and then have a pair of larger-memory hardware on the back-end. But at this point, is the back-end really any different than a traditional large-mem database (such as mysql-INNODB or mysql-cluster/NDB or memcached)?
And as for:
"There's a reason Java got the sluggish reputation, but it's not because the JIT code is slow. It's because the developers can get by with less of an understanding of what goes on behind the scenes, which never turns out good..."
If you're writing server-code (for targeted $10k hardware clusters), I should hope you're not paying somebody straight out of high-school. Or at least have coding standards and code reviews with senior staff.
Otherwise, code to google-apps and run a castrated java-api that doesn't really let you waste resources and doesn't have startup-time issues. Actually, it's a pretty good segue for server-freshmen.
-Michael