Automated Migration From Cobol To Java On Linux
Didier DURAND writes "Just published an article about our 100% automated migration from IBM mainframe with Cobol to Linux Java: we could convert our own application (4 million lines of code) through the tools that we developed. Those tools are open-sourced under GPL for other companies to benefit from them. We save 3 million euros / year after this migration!"
Sounds like it could turn out like WYSIWYG HTML Editor code. Where every word you bold has the bold tags, etc.
Dreamweaver, Word, etc all make some dang ugly HTML.
As though a few hundred cobol coders cried out in terror, and were suddenly obsolete...
BSG humor is mandatory whenever Cobol comes up...
------ The best brain training is now totally free : )
Actually that disturbance was all 9 of the world's still-living Cobol coders.
Sure, you can use the magic invert flag "/usr/bin//cobol2java -i [...]".
Just asking.
Knowledge is how to play a game, intelligence is how to win, wisdom is knowing what game to play.
I suspect that someone will still need to fine-tune the Java, and that will require an understanding of the original Cobol. Given the undeserved disparaging comments I hear around here about Cobol, Fortran, even C, I suspect the average modern developer feels overworked if they have to deal with anything more complex than Python (not saying anything bad about Python) or, even worse, anything that does not fit into their preferred IDE. I find that if you have a basis in the original computer coding methods, all the new stuff is just a simple walk in the park.
"She's a scientist and a lesbian. She's not going to let it slide." Orphan Black
I'll say what they don't buy you: The ability to throw away the old language.
If changes need to be made - and they will - you will want to change the original language, not some intermediate that is stilted and hard to read at best and a candidate for an obfuscated insert-language-here contest at worst.
What transcoders do buy you:
The ability to compile code on a platform that doesn't have a compiler for your flavor of your language.
Knowledge is how to play a game, intelligence is how to win, wisdom is knowing what game to play.
If you are right, there may be a cascade of converters to run and a lucrative converter market full of consultants. If things go really ugly, the more conversions are done, the more skills will be needed to find out why the converted code malfunctions. These skills could possibly even include Cobol, just in case one has to look at the original to find out how it worked when it did :)
All it appears to be doing is mapping each line of COBOL code to a line of Java code, per Slide 25.
This is more about being able to find someone who can read and write java. The code remains procedural, the COBOL programmers do the same stuff, just in Java now.
Here's an example of the code that was spit out:
sql("SELECT * FROM RS0403 WHERE CDPROJ=#1").into(vrs0403a.dvrs0403a).param(1, tuazone.tua_I_Cdproj)
Not to dissuade, but in some ways, they avoided doing a rewrite at all costs.
Great if you want to get off legacy systems, but it's not going to magically improve your code base. GIGO rules still apply.
import system.cool.Sig;
I generally agree. Perhaps the translated result will "run" at first, but the subject of maintenance cannot be ignored because it's usually the largest cost factor. The generated code is likely to be very verbose and filled with ugly translation artifacts. Machine translators are very literal in their technique to ensure compatibility.
A human translator may use knowledge of the domain or common sense to dump certain idioms or artifacts that are not likely to be necessary any more in the new language. They may take small but rational risks in order to toss some ugly nitty gritty code, for example. Machine translators are not smart enough to evaluate such risks, doing it the long way to "make sure".
One must find a way to read all that machine-generated fluff to make any changes or fixes. This makes maintenance costly and error-prone.
It would be more effective to gradually manually convert one program or module at a time.
Plus, as somebody else pointed out, Java may become a "dead" language also. I've seen about 4 language fad eras in my years in IT. The chance of Java surviving as a non-legacy language is not very big based on past patterns. Thus, one is mostly just converting one legacy language to another.
And Java has some really annoying features that I feel future languages will avoid, including putting type declarations on the left side of statements, keeping C's ugly switch/case syntax, lack of stand-alone functions, and others.
Table-ized A.I.
Though I could think of a ton of jokes - and have already seen a few - my first question is, "why."
:)
I can see the possible benefits of no longer relying on aging Cobol programmers. I am often dealing with just this issue as I migrate '70s and '80s era systems off ADABAS and COBOL. However, why would one want to make a one-for-one class re-creation of existing mainframe applications? I honestly remember a few programmers I knew doing this right before they retired back in '05. They took a COBOL/IMS application running on an AS390 and turned it into an HTML/ASP.NET application written in C# with IMS and SQL Server on a z890 in virtual MVS and SLES environments. The screens - web based now - were a one-for-one match with the previous mainframe screens.
My question then too, was why bother?
I just finished a second project in taking an '80s era mainframe application - this one to track the purchase of vital (birth death marriage) records - from mainframe into an n-tier model. Instead of simply copying the mainframe screens we spent time deciding what worked on the mainframe and what didn't. Some of the mainframe concepts - particularly in the public lookup - were fine. They stayed and became web-based applications. Other items were thrown out the window and completely re-worked into a user-friendly and efficient system. (In this case, we used MS .NET 2005 and C#, but we could have just as easily used Java. I'm not trying to say anything about the choice of language or underlying platform.)
Having done a similar project for real property records in '07, we learned many lessons and were able to reuse assemblies in the new application. In fact, the entire UI, security, printing, data encapsulation, image import (there are over 160M TIFF files in our system), reporting and cashiering/finance/cash handling subsystems are identical and shared among both applications.
I can see possibly wanting to utilize some classes for back end work but wouldn't it be better to review these individually and decide what is best?
Oh, and we're saving roughly $3M/year in mainframe costs.
(Okay, post finished now to wait for someone to mod me as a troll...)
The Kai's Semi-Updated Website Thingy
What happens to the commenting? Won't this turn into an unreadable turd?
But now what will the poor grey-beards do for a living?
Won't someone please think of the grey-beards?
Comment forecast: Bits of genius surrounded by a sea of mediocrity.
Make that ten, although I've only coded COBOL at home for fun. Yeah, I'm probably some kind of masochist but I really wanted to give the language a spin and see if it was as horrible as people say it is.
/Mikael
Greylisting is to SMTP as NAT is to IPv4
Then convert.
While you are at it, benchmark.
TODO: create/find/steal funny sig.
I hacked into their source code, and found something a little odd:
Table-ized A.I.
...that Java is the New COBOL.
This will just carry forward any old bugs and give them new life in a new execution environment, where no one knows what will happen. Seems like the makings of several DailyWTFs to me...
Slashdot's rate-of-post filter: Preventing you from posting too many great ideas at once.
My question is, "Do you also convert the CICS calls embedded in the code (and possibly 3270 terminal commands?!?) or is there a Java library to interface with CICS?" My experience with converters is that they follow the 90-10 rule, where they do great with 90% of the code, but that's the easiest, and could almost be done with global Find/Replace. The remaining 10% is why the conversion wasn't already done.
Later . . . Jim
This looks like a cute toy that will auto-generate some bloated code. No way big iron financial systems are moving to Java, especially auto-generated Java that will perform like crap and be harder to maintain than the COBOL it replaced.
Run and catch, run and catch, the lamb is caught in the blackberry patch.
Every person involved can now go out and be a consultant to other companies that want to migrate off their old Cobol codebase.
I know there is a certification program to check that a commercial COBOL compiler processes the whole language and produces output code that performs correctly. (Can't recall the name at the moment though I think it was in the US government - perhaps in the DoD.) I'm wondering if this tool has been submitted to that and, if so, whether it passed.
I'd occasionally thought it would be a useful thing to do something similar to this (but with ANSI V2 C++, rather than JAVA, as the target language) - and then get the tool certified. With such a certified tool, IT administrators could, with confidence, transcode a COBOL application base into a language with multiple commercial and open compilers and a long expected support lifetime, generating native code for virtually all possible targets (from PC clusters to current and future mainframes). If the transcoded output doesn't become excessively opaque and class-dependent, it could later be warped into a more native form, should that be desirable.
Perhaps this project will be able to actually do it.
Bantam Dominique roosters crow a four-note song. Once you've heard it as "Happy BIRTHday" you can't NOT hear it that way
The 1990s called... they want their "automated convert Xxxx to Java" idea back.
Run and catch, run and catch, the lamb is caught in the blackberry patch.
Now I just have to train my staff to read and write machine code, and it's bye bye COBOL forever!
CobolEngine cobol = new CobolEngine();
cobol.AddLine("ADD A TO B GIVING RESULT");
cobol.AddLine("PRINT RESULT.");
cobol.Compile();
cobol.Execute();
Do you even lift?
These aren't the 'roids you're looking for.
Cobol apps from mainframes run easily on Linux when you simply recompile using the free, open source, GPL'ed OpenCOBOL compiler. I've moved a bunch of old IBM mainframe Cobol stuff to Linux with OpenCOBOL and so far, I've run into very few issues, all of which were solvable with a minimal amount of code changes.
OpenCOBOL also works great if you have any Oracle Cobol apps (that used the Oracle Pro*Cobol precompiler). I'm in the middle of moving a bunch of Cobol-on-Oracle stuff from an old RS/6000 AIX box to 64-bit Linux right now. I'm using Oracle's free "Oracle Enterprise Linux", which is basically a repackaged RHEL distro, and you can even buy a formal support contract from Oracle for OEL. So far everything is working out great. The commodity hardware (HP Proliant DL580) server costs a mere fraction of what a new AIX box of comparable power would've cost, and I'm also benefiting from Oracle's free Virtual Server stuff (Xen-based), so I've got the functional equivalent of IBM's LPAR virtual machine technology going on commodity hardware with 10 times the speed and 1/10th the hardware cost.
3 million euros a year can hire some good Java programmers. Hey Strawberryfrog, you doing anything right now?
And was it?
I don't even think I've seen any COBOL. Is it more obtuse than Erlang looks?
I'd have to say that I've seen worse languages (x86 asm, I'm looking at you) but it sure wasn't pretty.
/Mikael
Greylisting is to SMTP as NAT is to IPv4
Nope, we do not fear the dark side for we have seen the light. ( of course that light came from a 3270 in a dark basement 20 years ago... )
---- Booth was a patriot ----
And what cut of these savings have you given your programmers who have made it all possible?
"It's the height of ridiculousness to say for those 9 lines you get hundreds of millions."
And they will get what they pay for..
I just hope it's not my bank that tries to go that route.
---- Booth was a patriot ----
An AC writing like an old codger. I think I bought my first Java book in 1995... By this standard of "fad," C was a fad when Windows NT came out.
Don't blame me, I voted for Baltar.
We used p2c to migrate a bunch of Pascal code to C many years ago. It was not perfect, but not that bad. You got pretty good at figuring out the likely places where it screwed up. Also, save for the with statements in Pascal being translated to accesses to structures, it made relatively readable C.
The JIT code is actually pretty fast. (especially when Hotspot is running in -server mode, which does some impressive optimization)
It just consumes way too much memory, and starts damn slow. (though still faster than C#, in my experience)
A couple times now I've taken base classes and rewritten them in Java to speed them up. ArrayList? Much too slow for my needs, even with a wrapper class ensurance it allocates in large enough chunks. By rewriting it in Java using Object arrays, I saw 60-100% speedups in the get/add methods, translating into roughly a ~2-4% speedup. (mileage may vary depending on workload)
It was about 20000% faster than a default ArrayList - mind you, by default those things allocate in 6-index chunks, so every 6 objects you add it copies the whole ArrayList to a new one, with 6 more spots available. @_@
There's a reason Java got the sluggish reputation, but it's not because the JIT code is slow. It's because the developers can get by with less of an understanding of what goes on behind the scenes, which never turns out good...
I would contend that well-written Java that has been JITed is probably a bit slower than well-written code in a native compiled language, but there isn't that much in it, and it's hard to compare because the "best" coding methods in the languages are not the same.
However, it does have high memory use because JITed code can't be shared between instances, the standard library is heavily bloated, efficient data structures are of limited availability (all object types are object references), and the Java garbage collectors don't release memory back to the OS. This means that if a Java app gets swapped out, it takes a lot longer to swap back in. It also takes a while to start and get up to full performance, especially if it's the first time a Java app has been run this OS session.
In summary, on the desktop (where startup and swap-in times matter) Java is slow; on the server, not so much.
note: I'm known as plugwash most places, but I screwed up registering that here somehow in the past and now can't register
ensuring it allocates in large enough chunks.*
Yet again I have great desire for an edit button!
I know you were joking, but have you used Java's decimal library before?
Not quite as bad as your fake code, but the difference is... BigDecimal is actually a real class that exists in the Java standard library.
The same code in C#:
(provided you don't overflow the decimal value)
GLaDOS for President 2016! "Well here we are again. It's always such a pleasure." -- GLaDOS, 2011
The reason you haven't seen COBOL is that most of it masquerades as JAVA or C# these days. Such are programmers' skills (or 5kilz?) these days.
putting the 'B' in LGBTQ+
Why thank you for that insinuation that I'm perfect!
Implying that I would catch all spelling mistakes in the Preview (if I could just find it) is quite the compliment! I'm glad to be so far above all you mere mortals. :)
One of the design goals for COBOL was to make it possible for non-programmers, such as supervisors, managers and users, to read and understand COBOL code. I wonder if the converter generates comments from the code ....
Holy cow, exactly what was the environment they had that cost 3M euros/year more than the handful of Linux boxes they replaced it with?
They could have had a collection of mainframes for 3M euros; they were vastly under-used if they had that much in savings locked up in that environment.
Their code matches line for line, so they have the most basic of Java code in place of the COBOL code (it likely looks very similar to COBOL code) I guess.
As for the transcoding effort, they could also have fired up any one of several alternative environments that could support the COBOL natively without making some Frankenstein-like Java code...
Ken
No, it's the other way round. Now we can get rid of all these useless Java programmers, use our Cobol programmers who know what they are doing, and then convert their code to Java so that the PHBs who've been sold on Java can be happy.
Australian running a company that does C# / C++ / Java / SQL / Python / Mathematica
I've never understood why the bloated standard library needs to make Java so slow to load.
You need to import every class you use -- it should only be loading the classes (or packages depending on how you import) that you need.
1984 was not supposed to be an instruction manual.
Now you have a situation where you need someone who understands not only COBOL but Java as well. Most of the code will likely translate pretty easily, but there are going to be some things that won't work the way they figured. I remember a little bit of COBOL I was involved in while on a work term at university, where someone had used an odd assignment to prune the first few characters from a long string. I warned them at the time that the assignment was iffy and a different compiler could produce a result they wouldn't expect, but they dismissed it. Now, what will happen when these sorts of odd assignments are passed through a COBOL to Java translator? That's where you end up needing someone who can work with both languages.
I propose a fix:
I don't therefore I'm not.
Hey, I based my future plans on learning Cobol and translating it to some of the more modern languages I know.
Yes. Automated tools work iff
a) there are no external library dependencies
b) programmers stuck to a good style.
c) the languages are well-suited to be translated into each other (otherwise the code may be unmaintainable)
and a rewrite of your program every 30 years may save even more future maintenance, because I do not believe that the software will take the old code *and* split it into classes/sort it into patterns in the same way an (intelligent?) human programmer would. So what you may get is Java code, but very different from the normal Java style. I cite from the linked article:
-------
strongly object-oriented architecture of resulting Java objects in order to maximize the effect of all controls done by compiler. As example, each old COBOL programs becomes a Java class whose existence is checked at compile-time by rather than at runtime. Very useful when your application is 4 millions lines of code like ours and when you want to track down every typing mistake in a continuous integration architecture like ours
------
A whole old program is converted into a class? Sounds directly like from a design pattern book.
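To make the article's compile-time point a bit more concrete, here is a minimal sketch of the difference between calling a program through a class reference and looking it up by name. The class Rs0403Program and its run() method are hypothetical stand-ins of mine, not taken from the project's generated code; only Class.forName is a real JDK call.

```java
public class ProgramCallSketch {
    // Hypothetical stand-in for one transcoded COBOL program.
    static class Rs0403Program {
        String run() {
            return "RS0403 done";
        }
    }

    // Compile-time checked: misspelling the class name here breaks the build.
    static String callChecked() {
        return new Rs0403Program().run();
    }

    // Name-based lookup: a misspelled name is only noticed when this runs.
    static boolean existsAtRuntime(String className) {
        try {
            Class.forName(className);
            return true;
        } catch (ClassNotFoundException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(callChecked());
        System.out.println(existsAtRuntime("no.such.Program"));
    }
}
```

With four million lines, a continuous-integration build that fails on a bad program name is presumably the benefit the article is after; whether one class per program counts as "strongly object-oriented" is a separate question.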
-----
pre-allocation of all program variable structures (COMMAREA of COBOL) to further improve performances but also to minimize garbage collection that freezes the system while running.
-----
This sounds like a really funny way of saying: sorry guys, we *had* some performance problems, which we fixed with a workaround to get it working.... No, we don't have any memory leaks/allocation times. We just allocate everything the program may ever need.
----
many levels of cache to maximize performances of the new Java version of the old application. Through them, our Java-transcoded transactions and batches have better performances than their Cobol ancestors used to have on mainframe.
----
Do they want to say that the performance was unacceptable when turning caches off?
Even better: that "import" statement just tells the compiler how to resolve names: All names in the output .class file are fully qualified, and only the classes actually referenced by the code need be loaded. If you have an import statement that isn't used, or that pulls too much; it becomes irrelevant at runtime. Also, you don't actually have to import any classes; you may refer to them by their fully qualified names, and the exact same .class file is output either way.
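A minimal sketch of that point, for anyone who wants to try it. The class and method names are made up for illustration; one method relies on import lines and the other spells out every type in full, and both end up using the very same java.util.ArrayList class.

```java
import java.util.ArrayList;
import java.util.List;

public class ImportSketch {
    // Uses the short names made available by the import lines above.
    static List<String> viaImport() {
        List<String> names = new ArrayList<String>();
        names.add("cobol");
        return names;
    }

    // Needs no import at all: every type is written fully qualified.
    static java.util.List<String> viaQualifiedName() {
        java.util.List<String> names = new java.util.ArrayList<String>();
        names.add("cobol");
        return names;
    }

    public static void main(String[] args) {
        // Both lines print true: same runtime class, equal contents.
        System.out.println(viaImport().getClass() == viaQualifiedName().getClass());
        System.out.println(viaImport().equals(viaQualifiedName()));
    }
}
```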
A snippet from Wikipedia COBOL article:
MULTIPLY B BY B GIVING B-SQUARED.
MULTIPLY 4 BY A GIVING FOUR-A.
MULTIPLY FOUR-A BY C GIVING FOUR-A-C.
SUBTRACT FOUR-A-C FROM B-SQUARED GIVING RESULT-1.
COMPUTE RESULT-2 = RESULT-1 ** .5.
SUBTRACT B FROM RESULT-2 GIVING NUMERATOR.
MULTIPLY 2 BY A GIVING DENOMINATOR.
DIVIDE NUMERATOR BY DENOMINATOR GIVING X.
Not pretty.
English is not my first language. Corrections and suggestions are welcome.
Well, with all due respect to wikipedia, in your snippet much of the ugliness is due to (probably intentionally) stupidly named variables. No intelligent developer, of which there are still many in the world, would do anything like naming a variable FOUR-A-C.
In python: this_var = that_var * 5
in COBOL: COMPUTE THIS_VAR = THAT_VAR * 5.
It's verbose, but it's not hugely different, though less feature rich than any modern OOPy language with rich object libs. I work with a bunch of folks for whom COBOL is one of the languages we work with, which also include java, python, perl, C#, etc. When you're a pro, you work with what the environment calls for. If the environment is mainframe, COBOL is one of the tools you need to use. As is that other paradigm of programming excellence, JCL.
I was taught to respect my elders. The trouble is, it's getting harder and harder to find some.
At least I'm not a coward!
</petty_insult>
Hey, this guy needs an edit button too!
It's also quite straightforward and simple. And it does pretty much exactly what it looks like it does.
And it doesn't suffer from rounding issues, dropping pennies from a twenty page business ledger due to the inherent issues in representing fractions in binary.
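For what it's worth, the penny-dropping problem is easy to reproduce in Java. This is only a sketch with made-up names (PennySketch, sumAsDouble, sumAsDecimal): it adds ten cents repeatedly, once with binary doubles and once with java.math.BigDecimal, which is the closest standard-library analogue to COBOL's decimal arithmetic.

```java
import java.math.BigDecimal;

public class PennySketch {
    // Adds 0.10 n times using binary floating point.
    static double sumAsDouble(int n) {
        double total = 0.0;
        for (int i = 0; i < n; i++) {
            total += 0.10;
        }
        return total;
    }

    // Adds 0.10 n times using exact decimal arithmetic.
    static BigDecimal sumAsDecimal(int n) {
        BigDecimal dime = new BigDecimal("0.10");
        BigDecimal total = BigDecimal.ZERO;
        for (int i = 0; i < n; i++) {
            total = total.add(dime);
        }
        return total;
    }

    public static void main(String[] args) {
        // The double total is not exactly 1.0; the decimal total is.
        System.out.println(sumAsDouble(10) == 1.0);
        System.out.println(sumAsDecimal(10).compareTo(BigDecimal.ONE) == 0);
    }
}
```

Note that the BigDecimal is built from the string "0.10" on purpose; building it from the double literal 0.10 would carry the binary rounding error along.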
Disclaimer - I don't actually code in the language, and I will deny knowing about the language if anybody actually tries to force me to program in it.
Glonoinha the MebiByte Slayer
Java is a perfectly acceptable programming language in many circumstances.
Such as when your programmers aren't really great :p
From http://www.paulgraham.com/gh.html
Of all the great programmers I can think of, I know of only one who would voluntarily program in Java. And of all the great programmers I can think of who don't work for Sun, on Java, I know of zero.
mind you, by default those things allocate in 6-index chunks, so every 6 objects you add it copies the whole ArrayList to a new one, with 6 more spots available. @_@
There's a reason Java got the sluggish reputation, but it's not because the JIT code is slow. It's because the developers can get by with less of an understanding of what goes on behind the scenes, which never turns out good...
Huh. You might well be the one who doesn't understand what goes on behind the scenes. Just read the source code:
public void ensureCapacity(int minCapacity) {
    // ...
    int newCapacity = (oldCapacity * 3)/2 + 1;
    // ...
}
It thus allocates half the current size of the list each time the capacity is reached, and not just 6 slots.
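To see what that rule means in practice, here is a small sketch that only replays the quoted formula; it does not touch ArrayList's internals. The class and method names are hypothetical, and the starting capacity of 10 is an assumption (it is the documented default for a no-argument ArrayList in the JDK versions that used this formula).

```java
import java.util.ArrayList;
import java.util.List;

public class GrowthSketch {
    // The growth rule quoted above.
    static int nextCapacity(int oldCapacity) {
        return (oldCapacity * 3) / 2 + 1;
    }

    // Capacities visited while growing from `initial` until `target` elements fit.
    static List<Integer> capacitiesUpTo(int initial, int target) {
        List<Integer> steps = new ArrayList<Integer>();
        int capacity = initial;
        steps.add(capacity);
        while (capacity < target) {
            capacity = nextCapacity(capacity);
            steps.add(capacity);
        }
        return steps;
    }

    public static void main(String[] args) {
        // Assumed starting capacity of 10, growing until 1000 elements fit.
        System.out.println(capacitiesUpTo(10, 1000));
    }
}
```

Under these assumptions the list is copied 11 times on the way to 1000 elements, whereas growing by a fixed 6 slots would take 165 copies, so the difference between the two claims is not small.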
Yes, but in that same article, just above that, it has the same code written as:
COMPUTE X1 = (-B + SQRT(B ** 2 - (4 * A * C))) / (2 * A)
COMPUTE X2 = (-B - SQRT(B ** 2 - (4 * A * C))) / (2 * A)
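For comparison, a rough Java equivalent of those two COMPUTE statements might look like the sketch below. The names are mine, and it uses plain doubles, so a negative discriminant yields NaN rather than an error.

```java
public class QuadraticSketch {
    // Returns {x1, x2} for a*x^2 + b*x + c = 0, mirroring the two COMPUTE lines.
    static double[] roots(double a, double b, double c) {
        double root = Math.sqrt(b * b - 4 * a * c);
        double x1 = (-b + root) / (2 * a);
        double x2 = (-b - root) / (2 * a);
        return new double[] { x1, x2 };
    }

    public static void main(String[] args) {
        // x^2 - 3x + 2 = 0 has roots 2 and 1.
        double[] r = roots(1, -3, 2);
        System.out.println(r[0] + " " + r[1]);
    }
}
```

Apart from the keywords and the trailing period, the COMPUTE form and the Java form read almost the same, which rather supports the point being made.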
"There is no professional-grade COBOL available for Linux so that they must convert to another language"
I think Microfocus might disagree with you there. It's not cheap, but it's definitely used at enterprise level.
-Never argue with an idiot. They drag you down to their level, then beat you with experience-
Some people have expressed scepticism about the readability and maintainability of the generated Java code. That's a simple concern to deal with. Just run the generated Java code through a Java-to-Perl translator. Then there won't be any question at all about its level of readability and maintainability.
JIT code is slow? Or JITing the code is slow? When I run benchmarks, the average benchmark speed gets continually faster as I increase the number of iterations. This is because the JIT continuously re-assesses basic blocks to determine if coalescing is worth the expense. This means:
A) code is NOT JIT'd at first, even with -server.. ONLY after a min-number of passes (why JIT initialization code that'll never run a second time?)
B) book-keeping to determine what to JIT and what not to JIT adds extra overhead for rarely used code
C) Even JIT'd code is continuously evaluated to determine if larger execution-flow-patterns are possible.. So the same code might be JIT'd 50 times, each time being aggregated into larger bundles of contiguous raw assembly
Thus, for short-lived applications, the JIT only provides marginal benefit if not a detriment (which is why the google android's JVM does ZERO JIT, and instead uses an alternate byte-code which is more efficient to cold-start).
But, for servers (web-servers and mainframe code which will batch process all night long), you're talking about relatively small chunks of code (out of the overall loaded library) that are highly reused. It's entirely possible that an entire web-page can be inlined into a single chunk of assembly (though it's a stretch). Not even PHP can do this (at present).
Basically typical code is complex because of optional flow-paths (if-statements, etc).. But in highly repetitive batch/server processing, there are a handful of paths which make up maybe 99% of execution, and thus can be rewritten in a way that in C would be horrendous (though certainly possible).
Doing code analysis to prove that an array index will never be out of bounds is expensive.. But once done, you can remove a lot of implicit assembly code. So general-purpose 3rd-party code can be almost fully optimized away. You get the best of both worlds.. High-degree of defensive coding + optimized execution. The cost is the ramp-up time (and an immodest but reasonable memory footprint).
For single-app-servers, throwing 2 gig of RAM is NOTHING ($100 v.s. $5,000 ???), so I'm frustrated when I hear people bitch about memory consumption.. Memory use is bounded, if not, then you have a coded memory-leak and that's not the JVM's fault (usually due to misconfigured-as-unlimited in-mem caching - usually a disk-spill-over-caches that unknowingly (to the coder) retains headers in-mem).
-Michael
Proving yet again that you can write COBOL in any language.
Seriously tho, the resulting code can't be all that great for a true Java programmer to maintain after the conversion - at its heart it would still be organized in a non-OOP (procedural) manner, which would certainly require some cross-thinking.
}#q NO CARRIER
"It just consumes way too much memory, and starts damn slow."
Sigh. I've said this 100 times.. server apps (as relevant to this COBOL discussion) don't have startup time. They are up for months at a time (years even). Please distinguish between client-side apps and server-side apps.. As java-programming only exists in client-side-apps these days as server-interfaces (for fast-to-build, yet bug-free coding of mission-critical-apps) and cell-phone-apps (for hardware-portability). For this class of client-side apps, yes you can bitch about memory usage and startup time. Though hopefully memory usage shouldn't exceed 150Meg (a typical firefox executable). Server-apps, on the other hand, have completely different requirements.
A 'server' that runs java costs between $900 and $10,000. What is the cost of 2Gig of memory? $100? $200? Give me a break. Further, I usually recommend only using the 32bit JVM and thus <= 2Gig of JVM memory (typically capping out at 1Gig). Thus the worst stop-the-world GC-collection time is on the order of 1 second. This pause sacrifice gives you better overall throughput. There are server-app types that require 16, 32, 64 or 128Gig of RAM however, and thus you need the 64bit JVM (64bit pointers increase mem-overhead and cache-coherence) and can't risk the nearly-a-minute pause-times, so you need incremental collectors (with an additional 15% performance loss). But for this, I usually say that such large memory foot-prints are best handled by consolidated network-APIs. This allows you to cluster the apps on smaller, cheaper hardware, and then have a pair of larger-memory hardware on the back-end. But at this point, is the back-end really any different than a traditional large-mem database (such as mysql-INNODB or mysql-cluster/NDB or memcached)?
And as for:
"There's a reason Java got the sluggish reputation, but it's not because the JIT code is slow. It's because the developers can get by with less of an understanding of what goes on behind the scenes, which never turns out good..."
If you're writing server-code (for targeted $10k hardware clusters), I should hope you're not paying somebody straight out of high-school. Or at least have coding standards and code reviews with senior staff.
Otherwise, code to google-apps and run a castrated java-api that doesn't really let you waste resources and doesn't have startup-time issues. Actually, it's a pretty good segue for server-freshmen.
-Michael
COBOL shops tend to have odd coding conventions including, from what I've seen, no variable names less than 40 characters.
Confucius say, "Find worm in apple - bad. Find half a worm - worse."
Yeah, they do. It stems from the essentially primitive dev environments, where you don't have an IDE to help you figure out what's going on with variables. Most mature organizations created variable naming standards to help with maintainability.
I have to say, though, based on the Java and .NET shops I've worked with, lengthy variable names are hardly limited to COBOL. Eg.,
result += positivesAtCurrentScore * negativesBelowCurrentScore
from a code snippet at my current employer.
I was taught to respect my elders. The trouble is, it's getting harder and harder to find some.