A Song of Fire and AIs 

Jan 23, 2026 9 min

A year ago, Tuomas Honkala didn’t expect he’d be having AI models critique each other’s code while he referees. He didn’t set out to become an AI pioneer. But here he is: building tools that solve years-long problems, refereeing technical debates between language models, and occasionally making them write in iambic pentameter. As a Senior Technical Architect at RELEX Solutions, exploring the frontier comes with the territory.

At RELEX, AI isn’t new ground; it’s built into what we do as a company. RELEX’s AI platform helps retailers and manufacturers forecast demand more accurately and optimize their entire supply chain, from production to store shelves. Built-in AI and machine learning enable smarter planning and faster, more confident decisions.

So, when it came time to explore whether AI could solve one of RELEX’s biggest internal challenges, the question wasn’t whether the team could handle it. The question was: if AI can help our customers transform their operations, what could it do for us?

Tuomas is part of a Senior Technical Architect team at RELEX. They sit between technical delivery and product/technology, and their job is to explore the frontier. When new platforms, tools, or approaches emerge, they’re the ones who figure out if they’re mature enough for the consulting organization to use. That means solving problems that don’t have obvious owners yet, and doing it with a lot of autonomy. You need to be self-driven, because nobody’s going to tell you exactly what to work on. One day it’s unsticking a complex customer implementation; the next, it’s experimenting with AI tools to automate work that would otherwise take consultants thousands of hours.

This is a story of one of those days. 

The Migration Marathon Nobody Wanted 

RELEX is migrating from an older integration platform to a newer one. Nothing unusual there: technology moves on and platforms evolve. But here’s what made this interesting: we have hundreds of customers, each with dozens of custom integrations written in Java that transform their specific data formats. These aren’t cookie-cutter interfaces. Each customer’s systems are different, and every piece of legacy code has its own quirks.

The traditional migration path? Manually rewrite each one, customer by customer, integration by integration. The technical consulting hours required to convert all of this by hand would be immense: multiple people and years of work that doesn’t create new value for customers. It’s just keeping the lights on during a platform upgrade.

The Experiment Nobody Asked For (But Everyone Needed) 

Nobody told Tuomas to solve this with AI. He just had the autonomy to spend time exploring whether it was even possible. 

“I started poking at the problem with Copilot, just seeing if AI could understand this type of code,” he explains. “Then I switched to Claude, which seemed better suited for the structured reasoning this needed.” 

“In a lot of organizations, exploratory work like this gets killed before it produces results, either because someone wants immediate ROI or because ‘we’ve never done it that way before.'”

His team lead, Jouko Lehtonen, encouraged the experiment, and as the prototype started working, other people dealing with the migration marathon ahead started paying attention. Nobody told him he was wasting time on a pipe dream, and he had the freedom to ask, “what if?” and the time to actually find out. 

Building the Pipeline 

The prototype tool Tuomas built works like this: 

Input: Java program files that define a customer’s legacy integrations, typically 50-500 lines of transformation logic that takes their raw data formats and converts them into something the system can use.

Process: A five-phase AI pipeline using Claude’s API (sketched below). Phase 1 uses Claude Opus for deep semantic analysis (what does this Java code actually do?). Phase 2 designs the logical structure for the new platform. Phase 3 configures input data schemas. Phase 4 generates the complete configuration for the new codeless integration platform. Phase 5 handles assembly and validation. Each phase can iterate if needed, and there are validation checks throughout to catch when the AI hallucinates features that don’t exist.

Output: A ready-to-import configuration file for RELEX’s new integration platform. The goal is to produce configurations that require as little manual cleanup as possible, and for many legacy integrations, they’re achieving that. When the team runs unit tests on AI-converted interfaces, a significant portion imports and runs successfully without human intervention. 
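To make the shape of that pipeline concrete, here’s a minimal sketch of how the phase structure could be orchestrated. Everything in it (the phase names, the runPhase and validatePhaseOutput helpers, the retry limit) is illustrative rather than the tool’s actual code; only the five phases and their Opus/Sonnet split come from the description above.

```typescript
// Illustrative sketch only: helpers and names are hypothetical, but the flow
// mirrors the five-phase pipeline described above.
type ModelTier = "opus" | "sonnet";

interface Phase {
  name: string;
  model: ModelTier;
}

const phases: Phase[] = [
  { name: "semantic-analysis", model: "opus" },     // Phase 1: what does the Java code actually do?
  { name: "logical-design", model: "sonnet" },      // Phase 2: logical structure for the new platform
  { name: "input-schemas", model: "sonnet" },       // Phase 3: configure input data schemas
  { name: "config-generation", model: "opus" },     // Phase 4: full configuration for the codeless platform
  { name: "assembly-validation", model: "sonnet" }, // Phase 5: assembly and validation
];

// Placeholder for a Claude API call; a real implementation would prompt the
// chosen model with the previous phase's artifact.
async function runPhase(phase: Phase, input: string): Promise<string> {
  return `${phase.name}(${phase.model}): ${input.slice(0, 40)}...`;
}

// Placeholder check; the real tool runs many validation types to catch hallucinated features.
async function validatePhaseOutput(phase: Phase, output: string): Promise<boolean> {
  return output.length > 0;
}

async function convertIntegration(legacyJavaSource: string): Promise<string> {
  let artifact = legacyJavaSource;
  for (const phase of phases) {
    // Each phase may iterate a few times until its checks pass.
    for (let attempt = 0; attempt < 3; attempt++) {
      artifact = await runPhase(phase, artifact);
      if (await validatePhaseOutput(phase, artifact)) break;
    }
  }
  return artifact; // ready-to-import configuration for the new platform
}
```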

But the standard is high. These configurations go into production-level customer environments, so “these are mostly OK” isn’t good enough. 

The key architectural choice: they do this one customer at a time, one integration at a time. You can’t convert thousands of integrations in one batch and then discover they’re all broken. Each conversion needs to be testable and valid before moving to the next one. That means the tool needs to produce production-ready output, not “80% there with someone finishing it manually” output.

The Daily Reality of Working With AI

The work is 95% Claude. Tuomas does a lot of model switching. Sometimes it’s more convenient to move quickly with Sonnet (the middle-of-the-road model), sometimes a problem requires Opus (the heavier model) for deeper reasoning. The application’s architecture reflects this: Phase 1 and Phase 4 use Opus because that’s where complex semantic analysis happens, understanding what legacy Java code really does, then generating new configurations that capture all that logic. The other phases use Sonnet because they’re more straightforward and speed matters.

“The model switching isn’t just about cost. It’s about matching cognitive load to capability.”

“Day-to-day, it’s a mix,” Tuomas says. “I’m reading AI-generated code, questioning architectural choices, and figuring out validation systems to catch when the AI confidently produces nonsense. Copilot makes an occasional appearance for IDE autocomplete, but the heavy lifting is all Claude API calls.”

You don’t need Opus to format JSON output, but you absolutely need it when you’re trying to figure out whether a HashMap lookup in Java translates to a left-outer join or an inner join in SQL.
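As a rough illustration of what that routing can look like in practice, here’s a hedged sketch using Anthropic’s TypeScript SDK. The model ids, prompt wording, and the deepReasoning flag are placeholders, not the tool’s actual interface.

```typescript
import Anthropic from "@anthropic-ai/sdk";

const client = new Anthropic(); // reads ANTHROPIC_API_KEY from the environment

// Route each request to the model that matches its cognitive load. The model ids
// and prompt wording below are placeholders, not the tool's actual configuration.
async function describeLegacyCode(javaSource: string, deepReasoning: boolean): Promise<string> {
  const response = await client.messages.create({
    model: deepReasoning ? "claude-opus-4-1" : "claude-sonnet-4-5", // heavy vs. middle-of-the-road
    max_tokens: 4096,
    messages: [
      {
        role: "user",
        content:
          "Explain step by step what this legacy integration does, " +
          "including joins, filters and output format:\n\n" + javaSource,
      },
    ],
  });
  const first = response.content[0];
  return first.type === "text" ? first.text : "";
}
```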

Keeping AI Honest 

Tuomas tries to keep up with new sections of code as they’re introduced. The codebase has grown large enough that end-to-end readings aren’t practical anymore, but he reads every new piece as it comes in. The tool has gone through hundreds of revisions at this point. Working with Claude is very rapid, iterative work, and between each revision, he tests whether the change had the desired impact. There’s been a lot of testing. 

“Honestly, more than code reading, it’s this constant testing that’s given me the clearest picture of what the application can and cannot do,” he explains. “You feed it a legacy integration, see what comes out, check if it imports successfully into the platform, run the unit tests, and see where it breaks. Then you trace back through the phases to understand why the AI made that choice. Sometimes you really need to put things on pause and start reading the legacy integration source code to understand why the AI is not seeing the forest for the trees.” 

That’s why he built a comprehensive validation system (over a dozen different validation types) into the tool itself. It catches when the AI hallucinates features the platform doesn’t support, generates syntactically correct but semantically wrong configurations, or confidently produces parameter names that don’t exist. Prevention through testing beats trying to understand every line of generated code.
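To give a flavour of what one of those checks might look like, here’s a sketch of catching parameter names the target platform doesn’t support. The allowlist and field shapes are invented for illustration; the real tool runs its checks against the actual platform.

```typescript
// Hypothetical sketch of a single validation type: reject configurations that
// reference parameters the target platform does not actually support.
const SUPPORTED_PARAMETERS = new Set([
  // Illustrative allowlist; the real platform's parameter catalogue would go here.
  "source", "target", "joinType", "filterExpression", "columnMapping",
]);

type ValidationIssue = { node: string; parameter: string };

function findUnknownParameters(config: Record<string, Record<string, unknown>>): ValidationIssue[] {
  const issues: ValidationIssue[] = [];
  for (const [node, params] of Object.entries(config)) {
    for (const parameter of Object.keys(params)) {
      if (!SUPPORTED_PARAMETERS.has(parameter)) {
        issues.push({ node, parameter }); // likely a hallucinated feature; flag it for review
      }
    }
  }
  return issues;
}
```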

When Models Disagree 

There was a particularly frustrating bug where the prompt engineering baked into the code started failing due to some obscure JavaScript transpilation (the process of converting source code from one high-level programming language to another) happening behind the scenes. Tuomas was troubleshooting this between Sonnet and Opus. Both offered all sorts of increasingly exotic solutions, none of which worked.

Finally, Copilot suggested using a parts.push() pattern to sidestep the transpilation issue entirely. Sonnet was immediately critical of this suggestion. It seemed inelegant, not the “proper” way to solve the problem. But Tuomas tried it anyway, and it worked.
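The original code isn’t shown here, but a parts.push() workaround of that kind might look roughly like the sketch below: instead of assembling the prompt in one large template literal that the transpiler was mangling, the segments are pushed into an array and joined at the end. The prompt text and function name are invented; only the pattern itself comes from the story.

```typescript
// Hypothetical reconstruction of the "boring" workaround: build the prompt from
// an array of parts rather than one big template literal. Plain, unglamorous,
// and it sidesteps the transpilation problem entirely.
function buildPrompt(javaSource: string, platformNotes: string): string {
  const parts: string[] = [];
  parts.push("You are converting a legacy Java integration to the new platform.");
  parts.push("Describe exactly what the code does before proposing a configuration.");
  parts.push("Legacy source:");
  parts.push(javaSource);
  parts.push("Platform constraints:");
  parts.push(platformNotes);
  return parts.join("\n\n");
}
```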

“Models have bias toward solutions that sound sophisticated or architecturally clean. Sometimes the right answer is the boring one that just works.”

“That moment clarified something,” he reflects. “Having Copilot suggest something simple, then watching Sonnet criticize it, forced me to evaluate the tradeoffs rather than deferring to whichever model sounded more confident. The disagreement was more valuable than either model’s individual suggestion.”

It’s essentially rubber duck debugging with multiple ducks that argue with each other. Their blind spots don’t overlap perfectly, which means you catch more problems, as long as you’re listening to the disagreements instead of just picking the answer you like best.

The Back-of-the-Envelope Math

Hundreds of customers, dozens of integrations each. Conservatively, RELEX is looking at 2,000+ individual integrations that need migration. Manual conversion for the most complex integrations could take up to ten hours per interface, though simpler ones might be doable in an hour or two. That’s several thousand hours of consulting work if done entirely by hand.
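A quick sanity check on those figures; the per-integration hour estimates below are illustrative averages rather than RELEX’s actual numbers.

```typescript
// Back-of-the-envelope only; the hour estimates are illustrative averages.
const integrations = 2000;   // "2,000+ individual integrations"
const hoursSimple = 1.5;     // simpler interfaces: an hour or two
const hoursComplex = 10;     // the most complex interfaces: up to ten hours

const lowEstimate = integrations * hoursSimple;   // 3,000 hours if everything were simple
const highEstimate = integrations * hoursComplex; // 20,000 hours if everything were complex

console.log(`${lowEstimate} to ${highEstimate} consulting hours if done entirely by hand`);
```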

“That means thousands of consulting hours redirected from migration work toward things that improve customer outcomes.”

“Currently, it looks like the tool should be able to handle the vast majority of cases accurately, with either perfect or near perfect results, the latter needing some human fine-tuning for the AI-generated configuration,” Tuomas explains. “And importantly, the tool should help us maintain a consistent level of quality across all these migrations, identical principles applied across the board, rather than each consultant interpreting the conversion rules slightly differently.”

But he’s not fooling himself into claiming everything can be outsourced to AI here. The tool is exactly that: a tool, not an autonomous integration golem you can leave running unsupervised. The AI-conversion results need to be tested and validated because these go into production-grade customer environments. The threshold for “production ready” is straightforward: unit tests need to pass after the conversion is done. The tool outputs a confidence percentage to help prioritize where human review is most needed, but there’s no shortcut around verification.
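Expressed as a sketch, that gate could look something like the snippet below. The field names and the 80% cut-off are assumptions; the article only says that the tool emits a confidence percentage and that unit tests must pass.

```typescript
// Illustrative prioritization logic; the threshold and field names are assumptions.
type ConversionResult = {
  unitTestsPassed: boolean; // hard requirement for "production ready"
  confidence: number;       // 0-100, emitted by the tool to steer human review
};

function reviewPriority(result: ConversionResult): "high" | "normal" {
  if (!result.unitTestsPassed) return "high"; // failing conversions go to the front of the queue
  // Passing conversions still get verified, but low-confidence ones are looked at first.
  return result.confidence >= 80 ? "normal" : "high";
}
```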

The real win isn’t eliminating human involvement. It’s automating the conversion itself so consultants can focus on validation and the genuinely tough edge cases.

Putting an End to Stupid Things

If Tuomas had to pick one RELEX value that genuinely showed up in this work, it’s “Put an end to stupid things.”

Migrating thousands of integrations manually when the underlying logic could be automated? That’s stupid. Not because manual work is beneath anyone, but because it consumes massive consulting capacity on work that doesn’t create immediate value for customers. The value from the new platform will be realized years down the line as RELEX’s integration capabilities evolve. In the meantime, the company would spend thousands of hours just to maintain parity.

“We shouldn’t accept ‘wasteful’ work as inevitable just because ‘that’s how we’ve always done migrations.'”

“The entire motivation for building this tool was to put an end to that stupid thing,” he says. “Redirect those hours toward solving actual customer problems instead of just executing a necessary but ultimately mechanical migration.”

The Shakespeare Interlude

At one point, Tuomas decided it was time to increase the sophistication of the tool. And what better way to achieve that than by having it describe the inner workings of each integration in Shakespearean iambic pentameter?

Now, while the conversion process runs, the tool delivers technically accurate descriptions of what each integration does: which columns are being joined, what filters are applied, where the data flows, all as if Shakespeare were debugging ETL pipelines. “Hark! The JOIN node doth corrupt thy column names…”
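One can guess what the prompt addition might look like; the wording below is invented for illustration, and only the behaviour it asks for (accurate descriptions, delivered in iambic pentameter) comes from the story.

```typescript
// Invented wording for illustration; only the requested behaviour comes from the article.
const BARD_MODE = [
  "While converting, narrate what each integration does in Shakespearean iambic pentameter.",
  "Stay technically precise: name the columns being joined, the filters applied,",
  "and where the data flows. The verse is decoration; the facts are not.",
].join("\n");
```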

“The fact that Claude can deliver this while maintaining meter and accurately describing data transformations… it’s either impressive or deeply concerning. Possibly both.”


This story is part of RELEX’s approach to technical innovation: giving talented people hard problems, the space to solve them creatively, and the trust to figure out what works. If this sounds like the kind of place where you’d thrive, we’re always looking for bold thinkers who want to build the future with us.