AI agents in B2B software development

Over the past two years, the conversation about AI in software development has gone from "GitHub Copilot autocompletes my functions" to something a lot more uncomfortable: agents that read an issue, design a plan, modify files, run tests, revert when they're wrong, and open a PR with their own summary. At ITSense we see this live every sprint, and the experience has forced us to rewrite how we talk about engineering with our B2B clients.

This article is a cross-section of what we've learned working with agents as first-class citizens on real projects for banking, cooperatives, aviation, and government.

1. From assistants to executors

The difference between an assistant and an agent looks semantic until you live it. An assistant answers questions; an agent decomposes the task, executes, observes the result, and decides the next step. On our teams, a typical agent does all of this without step-by-step human prompting, while a senior engineer reviews the output in parallel.
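The decompose, execute, observe, decide cycle can be sketched as a small control loop. This is a minimal illustration, not ITSense's actual tooling: `run_agent`, `StepResult`, `AgentRun`, and the `execute`/`revert` callbacks are hypothetical names, and a real agent would call a model and a sandbox where this sketch takes plain functions.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class StepResult:
    step: str
    ok: bool
    detail: str = ""


@dataclass
class AgentRun:
    history: List[StepResult] = field(default_factory=list)  # every observed result
    reverted: List[str] = field(default_factory=list)        # steps that were undone


def run_agent(
    plan: List[str],
    execute: Callable[[str], StepResult],   # hypothetical: performs one step
    revert: Callable[[str], None],          # hypothetical: undoes one step
    max_retries: int = 1,
) -> AgentRun:
    """Run each planned step, observe the result, and decide what to do next."""
    run = AgentRun()
    for step in plan:
        attempts = 0
        while True:
            result = execute(step)       # act
            run.history.append(result)   # observe
            if result.ok:
                break                    # decide: move on to the next step
            revert(step)                 # decide: undo the failed change
            run.reverted.append(step)
            attempts += 1
            if attempts > max_retries:
                return run               # decide: stop and hand over to a human
    return run
```

The loop stops early once a step has failed more than `max_retries` times, which is where the reviewing engineer would take over.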

The operational consequence is concrete: the engineer stops writing routine code and becomes the orchestrator-in-chief. They define the contract, pick the right model for the task, review the diff, and approve the merge. That new role is what we mean by the ITSense Method: AI acts, humans decide.

2. Multi-model isn't a luxury, it's hygiene

Marrying a single AI provider in 2026 is the same bad decision as marrying a single cloud in 2016. Different models have different strengths:

Claude excels at complex reasoning, large refactors, and adherence to long instructions.
GPT-4/5 shines at exploratory tasks and creative generation.
Gemini offers broad multimodality and long context at low cost.
Cohere is a common choice in companies that work in English and Spanish under data restrictions.
Meta Llama and open weights let you run inference on-premise when data can't leave.

At a top-5 banking client, we ran a feature's full development cycle with three different models picked per subtask: one for architectural design, another for critical transactional code, a third for tests. Raw time savings vs. the prior single-model baseline: 41%.
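Picking a model per subtask can be expressed as a plain routing table. The sketch below is an assumption about how such a router might look, not the configuration used on that project: the `Subtask` categories mirror the three subtasks mentioned above, and every model label is a hypothetical placeholder rather than a real model identifier.

```python
from enum import Enum


class Subtask(Enum):
    ARCHITECTURE = "architecture"
    TRANSACTIONAL_CODE = "transactional_code"
    TESTS = "tests"
    DOCUMENTATION = "documentation"  # deliberately left without a dedicated route


# Hypothetical placeholder labels, not real model identifiers.
ROUTES = {
    Subtask.ARCHITECTURE: "model-for-design",
    Subtask.TRANSACTIONAL_CODE: "model-for-critical-code",
    Subtask.TESTS: "model-for-tests",
}
DEFAULT_MODEL = "general-purpose-model"
ON_PREMISE_MODEL = "on-premise-open-weights-model"


def pick_model(subtask: Subtask, on_premise_only: bool = False) -> str:
    """Return the model label for a subtask; data restrictions override the route."""
    if on_premise_only:
        return ON_PREMISE_MODEL
    return ROUTES.get(subtask, DEFAULT_MODEL)
```

Keeping the table as data rather than branching logic makes it cheap to swap a provider for one subtask without touching the others.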

"We pick the best model for each task. Not the one that sells best." — Operating principle of the ITSense stack.

3. Where agents work and where they don't

We've iterated a lot on where agents deliver real value. Summary:

Yes — high ROI

Mass stack migrations (e.g., Rails 5 → 7, .NET Framework → .NET 8).
Initial generation of test suites (unit, integration, E2E).
Naming or pattern refactors across large monorepos.
Technical documentation (ADRs, READMEs, runbooks) from real code.
Bug triage with full repo context.

Still under strict human supervision

High-risk financial logic (charges, interest, collateral).
Production database schema changes.
Regulator integrations (SEC, FINRA, IRS, OCC, NYDFS, or local equivalents).
Anything that touches credentials, secrets, or PII.

Not because the agent can't, but because the cost of error in those zones is asymmetric. A bug in a naming migration is caught in tests; a bug in interest calculation hits closing balances and can land in front of the regulator.
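One way to enforce that asymmetry is a gate that checks which files an agent's change touches before it can merge. This is a rough sketch under assumptions: the zone names follow the list above, but the path patterns and the function names `zones_touched` and `requires_human_approval` are hypothetical, and a real repository would define its own layout.

```python
from fnmatch import fnmatchcase
from typing import Dict, List

# Hypothetical path patterns; a real repository would define its own.
SUPERVISED_ZONES: Dict[str, List[str]] = {
    "financial logic": ["*/billing/*", "*/interest/*", "*/collateral/*"],
    "schema change": ["*/migrations/*", "*.sql"],
    "regulator integration": ["*/regulatory/*"],
    "secrets or PII": ["*.env", "*/secrets/*", "*/pii/*"],
}


def zones_touched(changed_paths: List[str]) -> Dict[str, List[str]]:
    """Map each supervised zone to the changed paths that fall inside it."""
    hits: Dict[str, List[str]] = {}
    for zone, patterns in SUPERVISED_ZONES.items():
        matched = [
            path for path in changed_paths
            if any(fnmatchcase(path, pattern) for pattern in patterns)
        ]
        if matched:
            hits[zone] = matched
    return hits


def requires_human_approval(changed_paths: List[str]) -> bool:
    """True when at least one changed path sits in a supervised zone."""
    return bool(zones_touched(changed_paths))
```

A naming refactor under `src/utils/` would pass straight through, while a change under `src/interest/` would be held for explicit sign-off.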

4. What this means for B2B clients

The B2B clients riding this wave best have three things in common:

They raise the bar on the backlog. If the backlog used to have 40 small issues per quarter, it now has 12 large initiatives, because the agent ships the 40 small ones in half the time.
They invest in persistent context. The bottleneck isn't the model anymore; it's the quality of the context we feed it. Living documentation, well-maintained ADRs, and a single shared context base per project multiply agent output.
They scrutinize IP and vendor lock-in. They ask for contracts with clear clauses on training data, data residency, and portability of generated code. See our piece AI-first vs. AI-enabled for the full checklist.

5. What's next: agents with client memory

We're piloting something that changes the game again: agents with persistent memory at the client level. The agent remembers every architectural decision, every incident, and every trade-off the team has made over recent sprints. When humans rotate, the knowledge stays.

At two cooperative clients this has cut new-engineer onboarding from three weeks to four days.
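To make the idea of client-level memory concrete, here is a deliberately small sketch: an append-only log of decisions, incidents, and trade-offs that can be searched and saved between sessions. It is an illustration under assumptions, not the system in the pilot; `ClientMemory`, `MemoryEntry`, and the keyword search are hypothetical, and a production version would more likely use a database and semantic retrieval.

```python
import json
from dataclasses import asdict, dataclass
from typing import List


@dataclass
class MemoryEntry:
    sprint: int
    kind: str      # e.g. "decision", "incident", "trade-off"
    summary: str


class ClientMemory:
    """Append-only record of what a team decided, kept per client."""

    def __init__(self) -> None:
        self._entries: List[MemoryEntry] = []

    def remember(self, entry: MemoryEntry) -> None:
        self._entries.append(entry)

    def recall(self, keyword: str, limit: int = 5) -> List[MemoryEntry]:
        """Return entries whose summary mentions the keyword, newest sprint first."""
        needle = keyword.lower()
        hits = [e for e in self._entries if needle in e.summary.lower()]
        return sorted(hits, key=lambda e: e.sprint, reverse=True)[:limit]

    def dump(self) -> str:
        """Serialize to JSON so the memory outlives any single session."""
        return json.dumps([asdict(e) for e in self._entries])

    @classmethod
    def load(cls, raw: str) -> "ClientMemory":
        memory = cls()
        for item in json.loads(raw):
            memory.remember(MemoryEntry(**item))
        return memory
```

A new engineer, or the agent itself, could then call `recall("queue")` to see why the team made a given choice and in which sprint.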

Closing

Agents won't replace engineers. They'll replace engineers who don't learn to direct agents. At ITSense every team member is paired with at least one AI agent as part of the daily work. It isn't marketing — it's how we've built software for two years.

If you're curious about how this would work in your organization, let's talk. A two-week Discovery is the fastest way to see the difference without committing to a large project.

