[DISCUSS] IBM using LLMs to convert COBOL to Java

bahmanm@lemmy.ml · edited 1 year ago

halfempty@kbin.social · 1 year ago

That’s a lot of effort to go from one horrible programming language to another horrible programming language.

Juja@lemmy.world · 1 year ago

What would your language of choice have been? And why is Java horrible for this scenario? It sounds like a reasonably good choice to me.

AttackPanda@programming.dev · 1 year ago

I’m thinking Go or Rust would be the logical next step. They probably won’t want an interpreted language so Python is out.

Juja@lemmy.world · 1 year ago

Just curious, what about Go or Rust makes them the logical next choice and not Java? What do Go or Rust do better that Java doesn’t?

PuppyOSAndCoffee@lemmy.ml · edited 1 year ago

Java is an Oracle honey pot, a royal sustainment PIA, and a massive security liability; it clutters up systems with its nonsense and is slow as shit.

“dear diary, despite running on a system with 1TB of RAM, a routine security patch reset the Java max memory quota and now every Java process stops after 256MB of object allocation. All four threads ran out of memory with 999GB RAM free. Thank you for this wonderful and blessed gift of computational ineptitude, amen.”

Bene7rddso@feddit.de · 1 year ago

If they don’t want an interpreted language, Java is out too.

PuppyOSAndCoffee@lemmy.ml · 1 year ago

Rust - absolutely.

infamous_trade@lemmy.ml · 1 year ago

javascript

cow@lemmy.world · 1 year ago

JavaScript is worse than COBOL.

AttackPanda@programming.dev · 1 year ago

I’m assuming there is an implied /s here

ImpossibleRubiksCube@programming.dev · edited 1 year ago

deleted by creator

Taco@lemmy.zip · 1 year ago

JavaScript is actually really nice as a beginner programming language because of how quickly and visually you can see your results, and how easily you can debug with console output. Yeah it’s horribly unoptimized but it’s not for big things. It’s for little things. It’s baby’s first programming language.

PuppyOSAndCoffee@lemmy.ml · edited 1 year ago

It actually is pretty quick. Don’t sleep on JavaScript’s capabilities. However, it is dynamically typed. You wouldn’t want the date you wrote your check to become the amount of your check, for example.

TypeScript does a nice job there but all in all at that point might as well go all in on a typed language.

pastermil@sh.itjust.works · 1 year ago

Python, anyone?

orangeboats@lemmy.world · 1 year ago

Not nearly as performant as either Java or COBOL.

PuppyOSAndCoffee@lemmy.ml · 1 year ago

What

Java is a POS and mainframes run faster emulated on a Raspberry Pi vs the actual hw.

The real answer is you want a typed language with financial transactions. But even Python would be better than Java.

ImpossibleRubiksCube@programming.dev · edited 1 year ago

deleted by creator

PuppyOSAndCoffee@lemmy.ml · 1 year ago

Yes. Leave it to IBM to take a terrible idea and make it worse.

ImpossibleRubiksCube@programming.dev · edited 1 year ago

deleted by creator

PeterPoopshit@lemmy.world · edited 1 year ago

But then at least by the time they get it working, they’ll have enough practice to make a new LLM to convert their Java code to a useful programming language.

Java is definitely a programming language but good luck actually getting it to compile on anyone else’s machine besides the person who wrote the project.

ImpossibleRubiksCube@programming.dev · edited 1 year ago

deleted by creator

loutr@sh.itjust.works · 1 year ago

Well that’s a new one, in most cases modern Java projects are built by simply running “./mvnw package”, on every platform.

argv_minus_one@beehaw.org · 1 year ago

If even highly skilled humans couldn’t do that, artificial pseudointelligence doesn’t stand a chance in hell.

There’s nothing of substance here. Just suits chasing buzzwords. Nothing will actually happen, just like nothing actually happened every other time some fancy new programming language or methodology came along and tried to replace COBOL, including Java.

duncesplayed@lemmy.one · 1 year ago

This is what I don’t get. Rewriting COBOL code into Java code is dead easy. You could teach a junior dev COBOL (assuming this hasn’t been banned under the Geneva Convention yet) and have them spitting out Java code in weeks for a lot cheaper.

The problem isn’t converting COBOL code to Java code. The problem is converting COBOL code to Java code so that it cannot ever possibly have even the most minute difference or bug under any possible circumstances ever. Even the tiniest tiniest little “oh well that’s just a silly little thing” bug could cost billions of dollars in the financial world. That’s why you need to pay COBOL experts millions of dollars to manage your COBOL code.
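To make the "most minute difference" concrete, here is a minimal sketch in Python (picked only for brevity, since the thread has no code of its own). It assumes the COBOL side keeps money in fixed-point decimal fields and rounds halves away from zero, which is typical but not guaranteed for any given program. The function names are hypothetical. A naive port that switches to binary floating point rounds the very same amount differently:

```python
from decimal import Decimal, ROUND_HALF_UP


def round_cents_float(amount: float) -> float:
    """Naive port: binary floating point, built-in rounding."""
    return round(amount, 2)


def round_cents_decimal(amount: str) -> Decimal:
    """Fixed-point style: exact decimal digits, halves rounded away from zero."""
    return Decimal(amount).quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)


# 2.675 has no exact binary representation; the stored value is slightly
# below 2.675, so the float version rounds down while the decimal version
# rounds up.
float_result = round_cents_float(2.675)
decimal_result = round_cents_decimal("2.675")
```

Here `float_result` is 2.67 and `decimal_result` is 2.68: a one-cent disagreement that neither compiler will ever warn about.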

I don’t understand what person looked at this problem and said “You know what never does anything wrong or makes any mistake ever? Generative AI.”

PuppyOSAndCoffee@lemmy.ml · 1 year ago

Ooh good point

What if IBM had a product that did the COBOL->Java conversion (no what if tbh, believe it exists), then just changed the marketing material to make it seem flashy?

So like, you think it’s AI but really it’s the same grammar translation functions that have been around forever.

IHeartBadCode@kbin.social · 1 year ago

This sounds no different than the static analysis tools we’ve had for COBOL for some time now.

The problem isn’t a conversion of what may or may not be complex code, it’s taking the time to prove out a new solution.

I can take any old service program on one of our IBM i machines and convert it out to Java no problem. The issue arises if some other subsystem that relies on that gets stalled out because the activation group is transient and spin up of the JVM is the stalling part.

Now suddenly, I need named activation and that means I need to take lifetimes into account. Static values are now suddenly living between requests when procedures don’t initialize them. And all of that is a great way to start leaking data all over the place. And when you suddenly start putting other people’s phone numbers on 15 year contracts that have serious legal ramifications, legal doesn’t tend to like that.
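A rough sketch of that failure mode, in Python rather than anything IBM i specific. All names here are made up for illustration. The "static" field was harmless while the process was torn down after every call, but in a long-lived process it carries the previous customer's data into the next request:

```python
class LeakyHandler:
    # Class-level ("static") field: shared by every request this process serves.
    last_phone = ""

    @classmethod
    def handle(cls, request: dict) -> str:
        # Bug: the field is only assigned when the request carries a phone
        # number, so a request without one reuses the previous caller's value.
        if "phone" in request:
            cls.last_phone = request["phone"]
        return f"{request['name']}: {cls.last_phone}"


def safe_handle(request: dict) -> str:
    # Fix: per-request state lives in a local that is initialized every time.
    phone = request.get("phone", "")
    return f"{request['name']}: {phone}"
```

Calling `LeakyHandler.handle` for Alice (with a phone number) and then for Bob (without one) prints Alice's number next to Bob's name, which is exactly the kind of thing legal does not like.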

It isn’t enough to just convert COBOL 1:1 to Java. You have to have an understanding of what the program is trying to get done. And just looking at the code isn’t going to make that obvious. Another example: this module locks a data area down because we need this other module to hit an error condition. The restart condition for the module reloads it into a different mode that’s appropriate for the process which sends a message to the guest module to unlock the data area.

Yes, I shit you not. There is a program out there doing critical work where the expected execution path is to cause an error on purpose so that some part of the recovery code gets run. How many of you think an AI is going to pick up that context?

The tools back then were limited and so programmers did all kinds of hacky things to get particular things done. We’ve got tools now to fix that, it’s just that so much has already been layered on top of the way things work right now. Pair that with the whole “we cannot buy a second machine to build a new system” situation, and any new program must work 99.999% right out of the gate.

COBOL is just a language, it’s not the biggest problem. The biggest problem is the expectation. These systems run absolutely critical functions that just simply cannot fail. Trying to foray into Java or whatever language means we have to build a system that runs perfectly without the benefit of 45 years’ worth of testing. It’s just not a realistic expectation.

aksdb@feddit.de · 1 year ago

What pisses me off about many such endeavors is that these companies always want big-bang solutions. Those are excessively hard to plan out due to the complexity of these systems, so it’s hard to put a financial number on the project, and they typically end up with hundreds of people involved during “planning” just to be sacked before any meaningful progress could be made.

Instead they could simply take the engineers they need for maintenance anyway, and give them the freedom to rework the system in the time they are assigned to the project. Those systems are - in my opinion - basically microservice systems. Thousands of more or less small modules inter-connected by JCL scripts and batch processes. So instead of doing it big bang, you could tackle module by module. The module doesn’t care what language the other side is written in, as long as it is still able to work with the same data structure(s).
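As a rough illustration of "working with the same data structure", here is a sketch that reads and writes a fixed-width record so that a rewritten module stays byte-compatible with its neighbours. The layout, field names and widths are invented for this example (loosely in the spirit of a COBOL copybook) and do not come from any real system; Python is used only to keep it short.

```python
from dataclasses import dataclass
from decimal import Decimal

# Hypothetical 29-character layout:
#   ACCOUNT-ID  PIC X(8)      columns 0-7
#   NAME        PIC X(12)     columns 8-19
#   BALANCE     PIC 9(7)V99   columns 20-28, implied decimal point
RECORD_LENGTH = 29


@dataclass(frozen=True)
class AccountRecord:
    account_id: str
    name: str
    balance: Decimal


def parse_record(line: str) -> AccountRecord:
    """Decode one fixed-width record into a typed object."""
    if len(line) != RECORD_LENGTH:
        raise ValueError(f"expected {RECORD_LENGTH} characters, got {len(line)}")
    return AccountRecord(
        account_id=line[0:8].strip(),
        name=line[8:20].rstrip(),
        # Stored as whole cents; the decimal point is implied by the layout.
        balance=Decimal(line[20:29]) / 100,
    )


def format_record(rec: AccountRecord) -> str:
    """Encode the object back into the same fixed-width layout."""
    cents = int(rec.balance * 100)
    return f"{rec.account_id:<8}{rec.name:<12}{cents:09d}"
```

A round trip (`format_record(parse_record(line)) == line`) is a cheap first check that the new module still speaks the old format. The sketch ignores signed amounts and over-long names, which a real layout would have to handle.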

Pick a module, understand it, write tests if they are missing, and then rewrite it.

After some years of doing that, all modules will be in a modern language (Java, Go, Rust, whatever) and you will have test coverage and hopefully even documentation. Then you can start refactoring the architecture.

But I guess that would be too easy and not enterprisy enough.

gedhrel@lemmy.ml · 1 year ago

I think you vastly overestimate the separability of these systems.

Picture 10,000 lines of code in one method, with a history of multiple decades.

Now picture that that method has buried in it complex interactions with another method of similar size, which is triggered via an obscure side-effect.

Picture whole teams of developers adding to this on a daily basis in realtime.

There is no “meaningful progress” to be made here. It may offend your aesthetic sense, but it’s just the reality of doing business.

aksdb@feddit.de · 1 year ago

What’s the alternative in your opinion?

Not doing anything and continuing to fiddle around in this mess for the next 20 years?

Continue trying to capture this problem big-bang, which means not only dealing with one such unmaintainable module but all of them at once?

Will every module be a piece of cake? Hell no. But if you never start anywhere, it doesn’t get better on its own.

gedhrel@lemmy.ml · edited 1 year ago

The alternative is to continue with a process that’s been demonstrably successful, despite it offending your sensibilities.

Banks are prepared to pay for it. People are prepared to do it. It meets the business needs. Change is massively high-risk in a hugely conservative industry.

aksdb@feddit.de · 1 year ago

And what is that successful process?

Kerfuffle@sh.itjust.works · 1 year ago

> This sounds no different than the static analysis tools we’ve had for COBOL for some time now.

One difference is people might kind of understand how the static analysis tools we’ve had for some time now actually work. LLMs are basically a black box. You also can’t easily debug/fix a specific problem. The LLM produces wrong code in one particular case, what do you do? You can try performing fine tuning training with examples of the problem and what it should be but there’s no guarantee that won’t just change other stuff subtly and add a new issue for you to discover at a future time.

simple@lemm.ee · 1 year ago

I have my doubts that this works well, every LLM we’ve seen that translates/writes code often makes mistakes and outputs garbage.

Jomn@jlai.lu · 1 year ago

Yes, and among the mistakes, it will probably introduce some hard to find bugs/vulnerabilities.

Vlyn@lemmy.zip · 1 year ago

Just ask it to also write tests, duh /s

Steeve@lemmy.ca · 1 year ago

You don’t need it to be perfect, there will still be human intervention.

DaPorkchop_@lemmy.ml · 1 year ago

deleted by creator

Steeve@lemmy.ca · 1 year ago

I’m obviously saying that humans can fix the issues, not that you should be landing broken code…

4stringscooter@lemmy.ml · 1 year ago

So the fintech companies who rely on that tested (though unliked) lump of iron from IBM running an OS, language, and architecture built to do fast, high-throughput transactional work should trust AI to turn it into Java code to run on hardware and infrastructure of their own choosing without having architected the whole migration from the ground up?

Don’t get me wrong, I want to see the world move away from COBOL and ancient big blue hardware, but there are safer ways to do this and the investment cost would likely be worth it.

Can you tell I work in fintech?

Puzzle_Sluts_4Ever@lemmy.world · edited 1 year ago

Removed by mod

socsa@lemmy.ml · 1 year ago

What a terrible day to be literate

eyy@lemm.ee · 1 year ago

Not a COBOL professional, but I know companies that have tried (and failed) to migrate from COBOL to Java because of the enormously high stakes involved (usually financial).

LLMs can speed up the process, but ultimately nobody is going to just say “yes, let’s accept all suggested changes the LLM makes”. The risk appetite of companies won’t change because of LLMs.

Kache@lemm.ee · 1 year ago

Wonder what makes it so difficult. “Cobol to Java” doesn’t sound like an impossible task since transpilers exist. Maybe they can’t get similar performance characteristics in the auto-transpiled code?

Margot Robbie@lemm.ee · 1 year ago

Why Java instead of C# or Go though?

quicken@aussie.zone · 1 year ago

Because IBM doesn’t want to tie themselves to Google or Microsoft. They already have their own builds of OpenJDK.

PuppyOSAndCoffee@lemmy.ml · 1 year ago

No. Because IBM sells WebSphere, so Java it is, so they can upsell you on more contract labor.

loutr@sh.itjust.works · 1 year ago

Because Cobol is mainly used in an enterprise environment, where they most likely already run Java software which interfaces with the old Cobol software. Plus modern Java is a pretty good language, it’s not 2005 anymore.

PuppyOSAndCoffee@lemmy.ml · 1 year ago

Java is a POS and that’s before log4j.

Treczoks@lemm.ee · 1 year ago

“all those COBOL developer jobs” nowadays probably fit in one bus. That’s why every company that can afford it moves away from COBOL.

ArbitraryValue@sh.itjust.works · edited 1 year ago

> according to a 2022 survey, there’s over 800 billion lines of COBOL in use on production systems, up from an estimated 220 billion in 2017

That doesn’t sound right at all. How could the amount of COBOL code in use quadruple at a time when everyone is trying to phase it out?

gravitas_deficiency@sh.itjust.works · 1 year ago

Because it’s not actually getting phased out in reality

ArbitraryValue@sh.itjust.works · 1 year ago

But it isn’t getting quadrupled either, at least because there aren’t enough COBOL programmers in the world to write that much new code that quickly.
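For scale, here is the arithmetic implied by the two quoted figures, as a small Python sketch. It assumes both surveys measured the same thing, which the thread itself doubts; the function name is made up for this example.

```python
from typing import Tuple


def implied_growth(start_lines: int, end_lines: int, years: int) -> Tuple[float, int]:
    """Return (growth factor, net new lines per year) between two estimates."""
    factor = end_lines / start_lines
    new_per_year = (end_lines - start_lines) // years
    return factor, new_per_year


# Figures quoted from the article: 220 billion lines (2017), 800 billion (2022).
factor, per_year = implied_growth(220_000_000_000, 800_000_000_000, 2022 - 2017)
```

That is a factor of about 3.6 rather than a full quadrupling, and it would require roughly 116 billion net new lines of COBOL per year.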

RobotDrZaius@kbin.social · 1 year ago

It doesn’t say unique lines of code.

kitonthenet@kbin.social · 1 year ago

It could mean anything: the same code used in production in new ways, slightly modified code, newly discovered COBOL where the original language was a mystery, new requirements for old systems. Seriously, it could be too many things for that to be a useful metric with no context.

eyy@lemm.ee · 1 year ago

> That doesn’t sound right at all. How could the amount of COBOL code in use quadruple at a time when everyone is trying to phase it out?

Because while they’re trying, they need to keep adding business logic to it constantly. Spaghetti code on top of spaghetti code.

RickyRigatoni@lemmy.ml · 1 year ago

Maybe some production systems were replicated at some point and they’re adding those as unique lines?

JWBananas@startrek.website · 1 year ago

The 2022 survey accounted for code that the 2017 survey missed?

ArbitraryValue@sh.itjust.works · 1 year ago

I think it’s more likely that one survey or the other (or both) are simply nonsense.

FoxBJK@midwest.social · 1 year ago

Converting ancient code to a more modern language seems like a great use for AI, in all honesty. Not a lot of COBOL devs out there but once it’s Java the number of coders available to fix/improve whatever ChatGPT spits out jumps exponentially!

gravitas_deficiency@sh.itjust.works · 1 year ago

The fact that you say that tells me that you don’t know very much about software engineering. This whole thing is a terrible idea, and has the potential to introduce tons of incredibly subtle bugs and security flaws. ML + LLM is not ready to be used for stuff like this at the moment in anything outside of an experimental context. Engineers are generally - and with very good reason - deeply wary of “too much magic” and this stuff falls squarely into that category.

FoxBJK@midwest.social · 1 year ago

All of that is mentioned in the article. Given how much it cost last time a company tried to convert from COBOL, don’t be surprised when you see more businesses opt for this cheaper path. Even if it only converts half of the codebase, that’s still a huge improvement.

Doing this manually is a tall order…

sugar_in_your_tea@sh.itjust.works · 1 year ago

And doing it manually is probably cheaper in the long run, especially considering that COBOL tends to power some very mission critical tasks, like financial systems.

The process should be:

1. set up a way to have part of your codebase in your new language
2. write tests for the code you’re about to port
3. port the code
4. go to 2 until it’s done

If you already have a robust test suite, step 2 becomes much easier.
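The "write tests, then port" loop can be sketched as a characterization test: run the old and the new implementation on the same inputs and report every disagreement. This is only an illustration in Python; `legacy_fee` and `ported_fee` are hypothetical stand-ins, and the port deliberately contains a subtle rounding change so the harness has something to catch.

```python
from decimal import Decimal, ROUND_HALF_EVEN, ROUND_HALF_UP
from typing import Callable, Iterable, List, Tuple

CENT = Decimal("0.01")
RATE = Decimal("0.015")  # hypothetical 1.5% fee


def legacy_fee(amount: Decimal) -> Decimal:
    """Stand-in for the old code: halves round away from zero."""
    return (amount * RATE).quantize(CENT, rounding=ROUND_HALF_UP)


def ported_fee(amount: Decimal) -> Decimal:
    """Stand-in for the port: accidentally uses banker's rounding."""
    return (amount * RATE).quantize(CENT, rounding=ROUND_HALF_EVEN)


def find_mismatches(
    old: Callable[[Decimal], Decimal],
    new: Callable[[Decimal], Decimal],
    inputs: Iterable[Decimal],
) -> List[Tuple[Decimal, Decimal, Decimal]]:
    """Return (input, old result, new result) for every disagreement."""
    return [(x, old(x), new(x)) for x in inputs if old(x) != new(x)]


samples = [Decimal("1.00"), Decimal("3.00"), Decimal("100.00")]
mismatches = find_mismatches(legacy_fee, ported_fee, samples)
```

With these samples only the 3.00 case differs (0.05 from the old code, 0.04 from the port). In a real migration the inputs would come from recorded production data rather than a hand-picked list.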

We’re doing this process on a simpler task of going from Flow (JavaScript with types) to TypeScript, but I did a larger transition from JavaScript to Go and Ruby to Python using the same strategy and I’ve seen lots of success stories with other changes (e.g. C to Rust).

If AI is involved, I would personally use it only for step 2 because writing tests is tedious and usually pretty easy to review. However, I would never use it for both step 2 and 3 because of the risk of introducing subtle bugs. LLMs don’t understand the code, they merely spot patterns and that’s absolutely not what you want.

Kerfuffle@sh.itjust.works · 1 year ago

> Even if it only converts half of the codebase, that’s still a huge improvement.

The problem is it’ll convert 100% of the code base but (you hope) 50% of it will actually be correct. Which 50%? That’s left as an exercise to the reader. There’s also no human, no plan, and not necessarily any logic behind how it was converted, so it can be very difficult to understand code like that, and you can’t ask the person who wrote it why stuff is a certain way.

Understanding large, complex codebases one didn’t write is a difficult task even under pretty ideal conditions.

PuppyOSAndCoffee@lemmy.ml · edited 1 year ago

First, odds are only half the code is used, and in that half, 20% has bugs that the system design obscures. It’s that 20% that tends to take the lion’s share of modernization effort.

It wasn’t a bug then, though it was there, but it is a bug now.

FoxBJK@midwest.social · 1 year ago

> The problem is it’ll convert 100% of the code base

Please go read the article. They specifically say they aren’t doing this.

Kerfuffle@sh.itjust.works · 1 year ago

I was speaking generally. In other words, the LLM will convert 100% of what you tell it to but only part of the result will be correct. That’s the problem.

FoxBJK@midwest.social · 1 year ago

And in this case they’re not doing that:

“IBM built the Code Assistant for IBM Z to be able to mix and match COBOL and Java services,” Puri said. “If the ‘understand’ and ‘refactor’ capabilities of the system recommend that a given sub-service of the application needs to stay in COBOL, it’ll be kept that way, and the other sub-services will be transformed into Java.”

So you might feed it your COBOL code and find it only converts 40%.

Kerfuffle@sh.itjust.works · 1 year ago

> So you might feed it your COBOL code and find it only converts 40%.

I’m afraid you’re completely missing my point.

The system gives you a recommendation: that has a 50% chance of being correct.

Let’s say the system recommends converting 40% of the code base.

The system converts 40% of the code base. 50% of the converted result is correct.

50% is a random number picked out of thin air. The point is that what you end up with has a good chance of being incorrect and all the problems I mentioned originally apply.

gravitas_deficiency@sh.itjust.works · edited 1 year ago

Yeah, I read the article.

They’re MASSIVELY handwaving a lot of detail away. Moreover, they’re taking the “we’ll fix it in post” approach by suggesting “we can just run an armful of security analysis software on the code after the system spits something out”. While that’s a great sentiment, you (and everyone considering this approach) need to consider that complex systems are pretty much NEVER perfect. There WILL be misses. Add this to the fact that a ton of organizations that still use COBOL are banks, which are generally considered fairly critical to the day-to-day operation of our society, and you can see why I am incredibly skeptical of this whole line of thinking.

I’m sure the IBM engineers who made the thing are extremely good at what they do, but at the same time, I have a lot less faith in the organizations that will actually employ the system. In fact, I wouldn’t be terribly shocked to find that banks would assign an inappropriately junior engineer to the task - perhaps even an intern - because “it’s as simple as invoking a processing pipeline”. This puts a truly hilarious amount of trust into what’s effectively a black box.

Additionally, for a good engineer, learning any given programming language isn’t actually that hard. And if these transition efforts are done in what I would consider to be the right way, you’d also have a team of engineers who know both the input and output languages such that they can go over (at the very, very least) critical and logically complex areas of the code to ensure accuracy. But since this is all about saving money, I’d bet that step simply won’t be done.

IHeartBadCode@kbin.social · 1 year ago

For those who have never worked on legacy systems: anyone who suggests “we’ll fix it in post” is asking you to do something that just CANNOT happen.

The systems I code for, if something breaks, we’re going to court over it. Not “oh no, let’s patch it real quick”; it’s “your ass is going to be cross-examined on why the eff your system just wrote thousands of legal contracts that cannot be upheld as valid”.

Yeah, the “fix it in post” shit that any article suggests, especially the one that’s linked, should be considered trash from someone who has no remote idea how deep in shit you can be if you start getting wild hairs up your ass about changing out parts of a critical system.

gravitas_deficiency@sh.itjust.works · 1 year ago

And that’s precisely the point I’m making. The systems we’re talking about here are almost exclusively banking systems. If you don’t think there will be some Fucking Huge Lawsuits over any and all serious bugs introduced by this - and there will be bugs introduced by this - you straight up do not understand what it’s like to develop software for mission-critical applications.

PuppyOSAndCoffee@lemmy.ml · 1 year ago

Trusting IBM engineers, perhaps…sales/marketing? Oooh now I am skeptical.

HellAwaits@lemm.ee · edited 1 year ago

deleted by creator

socsa@lemmy.ml · 1 year ago

I’m more alarmed at the conversation in this thread about migrating these COBOL apps to Java. Maybe I am the one who is out of touch, but what the actual fuck? Is it just because of the large Java hiring pool? If you are effectively starting from scratch why in the ever loving fuck would you pick Java?

NightAuthor@beehaw.org · 1 year ago

Java is the new COBOL, all the enterprises love it.

AutoTL;DR@lemmings.world · 1 year ago

This is the best summary I could come up with:

For large organizations, it tends to be a complex and costly proposition, given the small number of COBOL experts in the world.

When the Commonwealth Bank of Australia replaced its core COBOL platform in 2012, it took five years and cost over $700 million.

Running locally in an on-premises configuration or in the cloud as a managed service, Code Assistant is powered by a code-generating model, CodeNet, that can understand not only COBOL and Java but around 80 different programming languages.

A recent Stanford study finds that software engineers who use code-generating AI systems similar to it are more likely to cause vulnerabilities in the apps they develop.

“Like any AI system, there might be unique usage patterns of an enterprise’s COBOL application that Code Assistant for IBM Z may not have mastered yet,” Puri said.

IBM sees a future in broader code-generating AI tools, as well — intent on competing with apps like GitHub Copilot and Amazon CodeWhisperer.

The original article contains 734 words, the summary contains 159 words. Saved 78%. I’m a bot and I’m open source!

dontcarebear@lemmy.ml · edited 1 year ago

deleted by creator

Duamerthrax@lemmy.ml · 1 year ago

> Though previously all such cases used to happen gradually, giving most people enough time to adapt to the changes.

The Luddites would like to have a word with you.

IBM taps AI to translate COBOL code to Java | TechCrunch