How We Migrated an Offshore AP Process to AI Agents in Three Weeks

If you're running an offshore AP arrangement and quietly wondering whether the management overhead is eating more time than the cost saving is worth, this is for you. Not a pitch. A walkthrough of exactly what a migration looks like — week by week, step by step — and what the numbers looked like before and after.

---

The Situation Before: What the Offshore AP Arrangement Actually Looked Like

The starting point was a 60-person UK professional services business running its accounts-payable function through a small offshore operations team. Three operators processed roughly 120–150 invoices per week, working from supplier invoices that arrived by email, a shared Xero login, and a WhatsApp group for queries, with an effective two-day lag on anything that required a decision.

The workflow on paper was straightforward: receive invoice by email → match to purchase order → code to chart of accounts → route to the relevant UK approver → post to Xero → file. In practice, it looked like this:

- VAT coding errors on roughly 8–10% of invoices: not catastrophic, but enough to create a recurring reconciliation task at month-end
- Supplier-matching failures on any invoice where the supplier name didn't exactly match the Xero contact record (abbreviations, trading-name variants, "Ltd" vs "Limited")
- A two-day exception loop: a query raised offshore at 5pm UK time would get a response the following morning, which would then feed back to the offshore team by that evening, adding 24–48 hours to any non-standard invoice

The UK-side finance lead was carrying approximately 4–6 hours per week of overhead that was pure coordination: chasing exception responses, correcting VAT codes before month-end close, reconciling discrepancies between what had been posted and what the payment run needed to show.
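
One way to surface this kind of VAT miscoding before month-end, rather than during it, is an arithmetic check of the VAT amount on each invoice against the rate its code implies. The sketch below is illustrative, not the check used in this engagement: the VAT code names and the 2p rounding tolerance are assumptions. Mixed-supply invoices fail it by design, which routes them to a person, where they belong anyway.

```python
from decimal import Decimal, ROUND_HALF_UP

# Hypothetical VAT code names mapped to UK rates; real Xero tax-rate names
# are configured per organisation and will differ.
VAT_RATES = {
    "STANDARD": Decimal("0.20"),
    "REDUCED": Decimal("0.05"),
    "ZERO": Decimal("0.00"),
    "EXEMPT": Decimal("0.00"),
}

# Assumed tolerance to absorb line-level rounding differences (2p).
TOLERANCE = Decimal("0.02")


def vat_looks_consistent(net: Decimal, vat: Decimal, code: str) -> bool:
    """True if the invoiced VAT matches the rate implied by the VAT code."""
    rate = VAT_RATES.get(code)
    if rate is None:
        # Unknown code: report as inconsistent so a person looks at it.
        return False
    expected = (net * rate).quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)
    return abs(expected - vat) <= TOLERANCE


# A £500 net invoice with £100 VAT is consistent with STANDARD but not ZERO.
print(vat_looks_consistent(Decimal("500.00"), Decimal("100.00"), "STANDARD"))  # True
print(vat_looks_consistent(Decimal("500.00"), Decimal("100.00"), "ZERO"))      # False
```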

> [VERIFY]: The volume, error rate, and coordination hours above are representative of similar-size AP pipelines in comparable engagements. Confirm against actual engagement log before publishing.

None of this was the fault of the offshore team. The model itself creates these conditions: structured work done by humans in a different timezone, with limited context about the UK business, operating through a shared-credential environment with no audit trail at the individual-action level.

---

Why Not Just Fix the Offshore Arrangement?

It's a fair question. The offshore team wasn't incompetent: the VAT coding errors weren't egregious, and turnaround was within the agreed SLA. So why not just tighten the process documentation and move on?

Three structural problems made that a losing proposition:

1. Seat-cost escalation. The arrangement was billed per operator-seat. Any volume increase — a new supplier category, a seasonal spike, an acquisition — triggered a headcount conversation. The cost didn't scale with the work; it scaled with the number of seats. Adding a part-seat wasn't on the menu.

2. Process opacity. You could see that an invoice had been posted. You couldn't easily see how it had been coded, which rules the operator had applied, or why a particular exception had been resolved the way it was — not without asking and waiting for a response. An audit request from HMRC wouldn't be answered by a log; it would be answered by an email thread.

3. Coordination overhead as a fixed tax. Every exception, every non-standard invoice, every supplier query carried a minimum 24-hour round-trip cost. That overhead didn't reduce as the team became more familiar with the business — it was baked into the timezone structure. You can document your way around human error; you can't document your way around time zones.

To be direct: the problem wasn't who was doing the work. The problem was the model — seat-based, opaque, asynchronous. That's what was worth replacing.

---

Week One: Mapping the Process Before Touching Any Tooling

The first week involved no automation. No agents. No integrations. Just understanding what the AP process actually was, rather than what the process document said it was.

That started with a full extract of the invoice log from Xero: twelve months of posted invoices, with supplier, amount, VAT code, GL code, and posting date. From that, a clear picture emerged:

- Volume profile: 120–150 invoices/week, with ~40% of volume coming from six recurring suppliers whose invoice formats were entirely consistent
- VAT variance: Standard-rated (20%) accounted for 74% of invoices; the remainder split between zero-rated, exempt, and a small number of mixed-supply invoices that legitimately needed human judgement
- Exception types: three recurring categories, namely supplier name mismatches, missing PO references, and invoices with non-standard line-item descriptions requiring manual GL coding
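
A profile like this comes from simple counting over the log extract. A minimal sketch, assuming the extract has been loaded as a list of records with hypothetical `supplier` and `vat_code` keys (a real Xero export uses its own column names):

```python
from collections import Counter


def profile(invoices: list[dict], top_n: int = 6) -> dict:
    """Top-supplier concentration and VAT-code mix for an invoice-log extract."""
    total = len(invoices)
    if total == 0:
        return {"top_supplier_share": 0.0, "vat_mix": {}}
    suppliers = Counter(inv["supplier"] for inv in invoices)
    # Share of all invoices that came from the top_n most frequent suppliers.
    top_count = sum(count for _, count in suppliers.most_common(top_n))
    vat_codes = Counter(inv["vat_code"] for inv in invoices)
    return {
        "top_supplier_share": top_count / total,
        "vat_mix": {code: n / total for code, n in vat_codes.most_common()},
    }


sample = [
    {"supplier": "Acme Ltd", "vat_code": "STANDARD"},
    {"supplier": "Acme Ltd", "vat_code": "STANDARD"},
    {"supplier": "Acme Ltd", "vat_code": "STANDARD"},
    {"supplier": "Beta Ltd", "vat_code": "ZERO"},
]
print(profile(sample, top_n=1))
# {'top_supplier_share': 0.75, 'vat_mix': {'STANDARD': 0.75, 'ZERO': 0.25}}
```

Run over twelve months of real postings with the default `top_n=6`, `top_supplier_share` is the kind of figure behind the six-supplier concentration, and `vat_mix` gives the standard-rated share.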

The workflow map went step by step: what triggers invoice receipt (an email alias, forwarded from the finance@ inbox), where it lands, who touches it and in what order, what the decision rules are at each step, and which steps are genuinely structured (apply a rule, get a deterministic output) versus which require contextual judgement (is this expense coded correctly for this client's reporting structure?).

The structured steps — field extraction, PO matching, standard VAT coding, Xero posting for matched invoices — are safe to automate. The judgement steps — supplier disputes, non-standard GL coding, multi-currency invoices, anything with a missing PO — stay in a human review queue.
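
The split between structured and judgement steps can be written down as an explicit routing rule. This is a sketch with a hypothetical `Invoice` shape, not the engagement's actual schema; supplier disputes and non-standard GL coding need signals this toy schema doesn't carry, so only the field-checkable conditions (unmatched supplier, missing PO, foreign currency, mixed VAT treatment) appear:

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Invoice:
    # Hypothetical schema; a real pipeline's fields come from its parsing layer.
    supplier_id: Optional[str]  # None if no Xero contact was matched
    po_number: Optional[str]    # None if no PO reference was found
    currency: str
    vat_codes: list[str] = field(default_factory=list)  # one per line item


def route(inv: Invoice, home_currency: str = "GBP") -> tuple[str, list[str]]:
    """Return ("auto", []) for a structured invoice, else ("review", reasons)."""
    reasons = []
    if inv.supplier_id is None:
        reasons.append("supplier not matched")
    if inv.po_number is None:
        reasons.append("missing PO")
    if inv.currency != home_currency:
        reasons.append("foreign currency")
    if not inv.vat_codes:
        reasons.append("no VAT lines extracted")
    elif len(set(inv.vat_codes)) > 1:
        reasons.append("mixed VAT treatment")
    return ("review", reasons) if reasons else ("auto", [])


print(route(Invoice("C-001", "PO-1042", "GBP", ["STANDARD"])))  # ('auto', [])
print(route(Invoice(None, None, "EUR", ["STANDARD", "ZERO"])))
# ('review', ['supplier not matched', 'missing PO', 'foreign currency', 'mixed VAT treatment'])
```

The design choice here is to collect every reason rather than stop at the first: the reviewer sees exactly why an invoice was held, and the reasons themselves become part of the audit record.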

This is what a single operator with context adds that a generic AP automation tool doesn't: the map isn't theoretical. It's built from the actual invoice log, not from a conversation about what the process is supposed to be.

> [VERIFY]: Confirm the specific toolchain: document parsing/OCR layer (e.g. AWS Textract, Google Document AI, or similar), agent orchestration framework, and Xero integration method (direct API via Xero OAuth, or via an intermediary). Name real tools before publishing.

---

Week Two: Standing Up the AI Agents in Parallel (Not as a Replacement)

The AI agents went live in week two — but not as the primary processor. The offshore team continued handling the full invoice queue. The agents ran alongside them, processing the same invoices, with outputs held in a staging review rather than posted to Xero.

The agent pipeline handled the structured portion of the workflow: ingesting PDF and email invoices via the parsing layer, extracting fields (supplier, invoice number, date, line items, VAT amount, total), matching against the Xero contacts and open PO list, proposing a GL code based on the supplier and line-item description, and routing to the exception queue anything that didn't hit a confident match on all fields.
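
The supplier-matching step is where the "Ltd" vs "Limited" failures from the offshore arrangement get absorbed. A minimal sketch using Python's standard-library `difflib` as a stand-in for whatever matcher a production pipeline would use; the alias table and the 0.9 threshold are assumptions, and anything below the threshold goes to the exception queue rather than being guessed:

```python
import re
from difflib import SequenceMatcher
from typing import Optional

# Hypothetical alias table for common UK trading-name variants.
ALIASES = {"limited": "ltd", "company": "co"}


def normalise(name: str) -> str:
    """Lower-case, spell out '&', strip punctuation, and unify suffix variants."""
    text = name.lower().replace("&", " and ")
    text = re.sub(r"[^\w\s]", " ", text)  # punctuation becomes whitespace
    return " ".join(ALIASES.get(tok, tok) for tok in text.split())


def best_match(name: str, contacts: list[str],
               threshold: float = 0.9) -> tuple[Optional[str], float]:
    """Best-scoring contact and its score; contact is None below threshold."""
    target = normalise(name)
    best, best_score = None, 0.0
    for contact in contacts:
        score = SequenceMatcher(None, target, normalise(contact)).ratio()
        if score > best_score:
            best, best_score = contact, score
    if best_score < threshold:
        return None, best_score  # not confident: send to the exception queue
    return best, best_score


contacts = ["Acme Widgets Ltd", "Smith and Sons Limited"]
print(best_match("ACME WIDGETS LIMITED", contacts))  # ('Acme Widgets Ltd', 1.0)
print(best_match("Smith & Sons Ltd.", contacts))     # ('Smith and Sons Limited', 1.0)
```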

Each morning of week two, the operator reviewed three things:

1. Match rate: what proportion of invoices the agents had processed with sufficient confidence to post without review
2. Exception queue: the invoices flagged for human review, and whether the flag reason was correct
3. Discrepancy log: any invoice where the agent's proposed coding differed from what the offshore team had posted
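
All three can be computed mechanically from the two sets of outputs. A sketch, assuming hypothetical dictionaries keyed by invoice ID (the agent's staged output, and the GL code the offshore team actually posted); a real comparison would check VAT codes the same way:

```python
def daily_review(agent: dict[str, dict], offshore: dict[str, str]) -> dict:
    """Match rate, exception queue and discrepancy log for one day's parallel run.

    agent:    invoice_id -> {"status": "staged" or "exception", "gl_code": str}
    offshore: invoice_id -> GL code the offshore team actually posted
    """
    total = len(agent)
    staged = [i for i, out in agent.items() if out["status"] == "staged"]
    exceptions = [i for i, out in agent.items() if out["status"] == "exception"]
    # In this sketch only staged invoices are compared against offshore postings.
    discrepancies = [
        (i, agent[i]["gl_code"], offshore[i])
        for i in staged
        if i in offshore and agent[i]["gl_code"] != offshore[i]
    ]
    return {
        "match_rate": len(staged) / total if total else 0.0,
        "exception_queue": exceptions,
        "discrepancy_log": discrepancies,
    }


agent = {
    "INV-1": {"status": "staged", "gl_code": "7100"},
    "INV-2": {"status": "staged", "gl_code": "7200"},
    "INV-3": {"status": "exception", "gl_code": ""},
    "INV-4": {"status": "staged", "gl_code": "7300"},
}
offshore = {"INV-1": "7100", "INV-2": "7250", "INV-3": "7400", "INV-4": "7300"}
print(daily_review(agent, offshore))
# {'match_rate': 0.75, 'exception_queue': ['INV-3'], 'discrepancy_log': [('INV-2', '7200', '7250')]}
```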

By the end of week two, the agent pipeline was achieving a match rate of approximately 85–90% on the structured portion of the invoice queue — meaning 85–90% of invoices were correctly extracted, matched, coded, and staged for posting without human intervention. The remaining 10–15% were correctly identified as exceptions and routed to the review queue. No invoice was silently misfiled.

> [VERIFY]: Confirm the actual match rate achieved by end of week two from the engagement log. The 85–90% figure is representative of similar structured AP pipelines; use the real number if available.

The parallel run is the trust mechanism. It's not a demo — it's a live confidence test, running against real invoices, with a documented comparison between agent output and human output. No cutover happens until the operator is confident the exception routing is reliable.

---

Week Three: Cutover, Operator Handoff, and What Supervision Looks Like Day-to-Day

At the start of week three, the offshore team stepped down from the AP pipeline. The agent pipeline became the primary processor.

The UK-side finance lead's daily AP involvement is now this:

- Morning exception review (approx. 20–30 minutes): Open the exception queue (typically 10–15 invoices flagged out of the weekly volume), review each flag, make a coding or routing decision, and release or reject. Most flags are genuinely ambiguous invoices; the queue isn't catching false positives from the structured batch.
- Weekly payment run approval (approx. 30 minutes): Review the staged payment batch in Xero, spot-check a sample of auto-posted invoices against source documents, approve the run.
- Month-end reconciliation (approx. 1–2 hours): Substantially reduced from the previous monthly reconciliation task, because VAT coding is consistent and the audit log shows exactly what was done to every invoice.
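
The audit log is what shortens the month-end task: every step taken on an invoice can be replayed rather than reconstructed from an email thread. A minimal sketch of an append-only JSON Lines log; the file layout and action names are hypothetical, and a production log would more likely live in a store with access controls than in a local file:

```python
import json
from datetime import datetime, timezone
from pathlib import Path


def log_action(path: Path, invoice_id: str, action: str, detail: dict) -> dict:
    """Append one agent action to a JSON Lines audit file; never rewrite old lines."""
    record = {
        "ts": datetime.now(timezone.utc).isoformat(),
        "invoice_id": invoice_id,
        "action": action,  # hypothetical names, e.g. "extract", "match", "route"
        "detail": detail,
    }
    with path.open("a", encoding="utf-8") as fh:
        fh.write(json.dumps(record) + "\n")
    return record


def history(path: Path, invoice_id: str) -> list[dict]:
    """Every logged action for one invoice, in the order it was written."""
    with path.open(encoding="utf-8") as fh:
        records = [json.loads(line) for line in fh if line.strip()]
    return [r for r in records if r["invoice_id"] == invoice_id]
```

At month-end, `history(Path("audit.jsonl"), "INV-1")` would return the extraction, match and routing records for that invoice in the order they happened.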

Total weekly overhead for the finance lead: approximately 1.5–2 hours. Down from 4–6 hours of coordination overhead under the previous arrangement — a reduction of roughly 60–65% in the time spent managing the AP function.

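> [VERIFY]: Confirm actual exception queue size and daily review time from the engagement log. The 10–15 exception figure and 1.5–2 hour weekly time are representative; use real numbers if available. Check the figures against each other: 20–30 minutes of daily review over a five-day week is 1.7–2.5 hours on its own, before the 30-minute payment run, and a 10–15% exception rate on 120–150 invoices implies roughly 12–23 exceptions per week.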

Post-handoff, the operator doesn't disappear. The agent pipeline is monitored; any novel exception type that falls outside the existing rule set gets escalated, investigated, and — if it's a repeating pattern — added to the ruleset. The client has a direct line to the person who built it, not a support ticket queue.
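
Deciding when a novel exception has become a "repeating pattern" can be as simple as counting reasons that fall outside the known rule set. A sketch; the known-reason set, the example reasons, and the threshold of three occurrences are all assumptions:

```python
from collections import Counter

# Hypothetical set of exception reasons the current ruleset already handles.
KNOWN_REASONS = {"supplier not matched", "missing PO", "foreign currency", "mixed VAT treatment"}


def recurring_novel_reasons(exception_reasons: list[str],
                            min_count: int = 3) -> list[tuple[str, int]]:
    """Novel exception reasons seen at least min_count times, most frequent first."""
    novel = Counter(r for r in exception_reasons if r not in KNOWN_REASONS)
    return [(reason, n) for reason, n in novel.most_common() if n >= min_count]


reasons = ["missing PO", "credit note", "credit note",
           "duplicate invoice number", "credit note", "foreign currency"]
print(recurring_novel_reasons(reasons))  # [('credit note', 3)]
```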

---

The Numbers: Before vs. After

> [VERIFY]: The figures in the table below are representative of similar AP engagements. Replace with actual engagement numbers before publishing.

| Metric | Offshore AP arrangement | AI agent pipeline |
|---|---|---|
| Invoice processing time | 24–48 hours average turnaround | Same business day for structured invoices |
| VAT coding error rate | ~8–10% of invoices requiring correction | ~1–2% (exceptions routed for human review rather than auto-posted) |
| Supplier match failures | Recurring; ~5–7% of invoices | Flagged to exception queue; not silently misfiled |
| UK finance lead coordination time | 4–6 hours/week | 1.5–2 hours/week |
| Audit trail | Invoice visible in Xero; method of processing not logged | Every agent action logged, including field extraction, match decisions, and exception routing |
| Data residency post-migration | Invoice data processed offshore | Agent pipeline running in UK/EU infrastructure; no offshore data processing [VERIFY processing location for GDPR confirmation] |
| Time to stand up | N/A (existing arrangement) | 3 weeks from discovery to live |
| Ongoing cost model | Seat-based; cost rises with volume | Scoped engagement + agent infrastructure cost; not seat-based [VERIFY engagement economics for this client before publishing cost comparison] |

The 50–70% reduction in manual hours cited in Halyard's value proposition holds here: the finance lead's AP coordination time dropped by approximately 60–65% from the prior arrangement.

---

What This Is Not (And Why That Matters)

A few things worth being direct about.

This is not a zero-touch system. A person reviews the exception queue every day. The payment run requires a human approval. Spot-checks happen. The agent pipeline doesn't operate unattended — it operates under light supervision, and that supervision is what makes it trustworthy. If you're looking for "set it and forget it", this isn't that.

It's not suitable for every AP setup without further scoping. A business with significant multi-entity structures, complex intercompany transactions, or heavily non-standard invoice formats will need a longer mapping phase and a more nuanced exception ruleset. Three weeks is achievable for a reasonably structured AP pipeline at SME scale — it's not a universal guarantee.

It's not a SaaS product. You don't subscribe to Halyard. An engagement is scoped, built, tested in parallel, handed over, and supported. The operator who built the pipeline is the person you call if something novel appears. That's a deliberate design choice, not a limitation.

It's not a comment on offshore workers. The offshore AP team in this engagement were doing their jobs competently within the structure they'd been given. The model they were operating in — timezone-dependent, opaque, seat-based — is what created the overhead and the opacity. That model is replaceable. The people operating within it aren't the problem.

---

What Comes Next

Three weeks to migrate a live AP process to AI agents, supervised by one operator, with a full audit trail and no offshore latency. That's the practical shape of this.

If you want to see the fuller evidence artefact, the finance ops proof page covers the engagement type in more detail: halyardbpo.com/finance-ops → [VERIFY URL is live before publishing]

If you're running an offshore AP arrangement and want to know what a migration would actually involve for your setup, book a scoped conversation — no deck, no sales process, just a direct look at whether it's the right fit.

---

All representative figures in this piece are typical of structured AP pipelines at UK SMEs in the 50–150 person range and are marked [VERIFY] for confirmation against actual engagement data before publication. See the writer checklist in the brief.
