Case study · B2B distribution · CEE

Custom sales and procurement software for a distributor, with a local AI that runs it.

A B2B distributor in Central and Eastern Europe (CEE) needed its quoting and procurement to stop eating people's days. We built the software that runs both, then put a local AI on top, grounded in the company's own data and running on its own hardware, that watches the business, retunes the system, reports on it, and answers questions about it. In production.

At a glance

Sector: B2B distribution · CEE
Built: Quoting + procurement software
Pattern: Deterministic core + sovereign local AI
Where it runs: On their own hardware · local only
Cloud: Frontier model · non-sensitive edge only
Status: In production

The problem, in plain terms

Two jobs run a distribution business. Both were done by hand.

If you move products business to business, two motions run your day: putting a price in front of a customer, and keeping the right stock on the shelf. At this distributor both were manual and spread across spreadsheets and inboxes. Slow to do, easy to get wrong, and impossible to grow without hiring more people to do the same clicking.

They wanted more than tooling. They wanted judgment on top: which products are about to move, which supplier to lean on, what to order before the trend shows up in the numbers. The catch is that this judgment runs on the most sensitive data the company has: pricing, margins, supplier terms, customer history, sales trends. None of it can be handed to a hosted cloud model, where it would leave their control.

What they needed

Judgment over their own numbers: what's moving, which supplier to call, what to order next.

The hard constraint

Nothing sensitive can leave their own infrastructure: pricing, margins, supplier terms, customer history.

The shape of the solution

One system, two minds, and a hard line around the data.

The design splits the work in two. Deterministic services handle anything that has to be exactly right. A local AI model handles the judgment: it runs on the company's own hardware and is grounded in their data through retrieval. The only thing that ever leaves the building is text with nothing confidential in it.

Read the diagram top to bottom: the team works against fast, predictable software; that software runs on the company's own data; a local model sits on top of all of it; and a single supervised channel reaches a cloud model, only for drafting.

On their hardware · local only

Sales & procurement team

asks a question · pastes a request

Deterministic core

Quotation engine

Procurement engine

plain code · exact · auditable · no model in this path

The company's own data

pricelists · sales history · supplier terms · customer history · past cases

Local AI · analyst & operator

spots trends · retunes the system · daily reports · answers questions

Cloud · at the edge

Claude Sonnet

no confidential data crosses

Sensitive data, and every model that reads it, stays inside the boundary. The cloud sees only finished, non-sensitive text.

The deterministic core

Start with everything that has to be exact. Two engines do the transactional work, and there is deliberately no model in either path. Pricing and ordering math must be correct, repeatable, and auditable, so they're plain, fast code you can read and test.

Quotation: a pricing engine, not a guess.

It ingests vendor pricelists and offers in whatever format they arrive and normalises them into one comparable catalogue. A salesperson pastes in a customer request and gets a ready offer back: part number, quantity, the best buying price across vendors, and a margin they can adjust on the spot, with vendor prioritisation and live stock ETAs folded in. Every number traces back to a source line, so a quote can be audited after the fact.

Quote path

01 Customer request
02 Normalise vendor pricelists
03 Best buying price · vendor priority
04 Margin (adjustable) · stock ETA
05 Ready offer

No model in this path · every number auditable
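
To make the quote path concrete, here is a minimal sketch of the price-selection and margin steps. It is not the production engine: the names (VendorOffer, QuoteLine, best_offer, quote_line) are hypothetical, the tie-break rule is an assumption, and margin is treated as a percentage markup on cost, which may differ from the client's definition. It uses Decimal so the arithmetic stays exact, and carries a source reference through so each number can be traced.

```python
from dataclasses import dataclass
from decimal import Decimal, ROUND_HALF_UP


@dataclass(frozen=True)
class VendorOffer:
    vendor: str
    part_number: str
    unit_cost: Decimal
    eta_days: int
    source: str  # hypothetical: pricelist file and row the cost was read from


@dataclass(frozen=True)
class QuoteLine:
    part_number: str
    quantity: int
    vendor: str
    unit_cost: Decimal
    unit_price: Decimal
    eta_days: int
    source: str


def best_offer(offers, part_number, vendor_priority):
    """Pick the lowest cost; on a tie, prefer the higher-priority vendor (assumed rule)."""
    candidates = [o for o in offers if o.part_number == part_number]
    if not candidates:
        raise LookupError(f"no vendor offers {part_number}")

    def rank(offer):
        if offer.vendor in vendor_priority:
            priority = vendor_priority.index(offer.vendor)
        else:
            priority = len(vendor_priority)  # unlisted vendors rank last
        return (offer.unit_cost, priority)

    return min(candidates, key=rank)


def quote_line(offers, part_number, quantity, margin_pct, vendor_priority):
    """Build one offer line; margin_pct is an adjustable markup on cost (assumption)."""
    offer = best_offer(offers, part_number, vendor_priority)
    factor = 1 + margin_pct / Decimal(100)
    unit_price = (offer.unit_cost * factor).quantize(
        Decimal("0.01"), rounding=ROUND_HALF_UP
    )
    return QuoteLine(
        part_number=part_number,
        quantity=quantity,
        vendor=offer.vendor,
        unit_cost=offer.unit_cost,
        unit_price=unit_price,
        eta_days=offer.eta_days,
        source=offer.source,
    )
```

Because the function is pure and takes the margin as an argument, a salesperson changing the margin simply re-runs the same calculation, and the result can be unit-tested like any other code.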

Procurement: ordering that follows the data.

It watches what's selling and what suppliers are holding, then recommends what to order against real demand instead of a buyer's gut feel. Same principle as quoting: the recommendation is computed, explainable, and reviewable before anyone places an order.

Ordering loop

01 What's selling
02 What suppliers hold
03 Demand read
04 What to order

Ordered against real demand
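
The ordering loop above can be sketched the same way. This is an illustration under stated assumptions, not the client's algorithm: demand is read as a plain average of recent daily sales, the target is cover for lead time plus a safety window, and the order is capped at what the supplier holds. All names and parameters (recommend_order, safety_days, supplier_stock) are hypothetical.

```python
from dataclasses import dataclass
from math import ceil


@dataclass(frozen=True)
class OrderRecommendation:
    part_number: str
    order_qty: int
    reason: str  # human-readable explanation shown to the buyer for review


def recommend_order(part_number, daily_sales, on_hand, on_order,
                    supplier_stock, lead_time_days, safety_days=7):
    """Recommend a quantity from recent demand; every input appears in the reason."""
    if not daily_sales:
        raise ValueError("need at least one day of sales history")

    avg_daily = sum(daily_sales) / len(daily_sales)   # 03 demand read
    cover_days = lead_time_days + safety_days
    target = ceil(avg_daily * cover_days)             # stock needed to cover the window
    available = on_hand + on_order
    needed = max(0, target - available)
    order_qty = min(needed, supplier_stock)           # cannot order more than the supplier holds

    reason = (
        f"avg {avg_daily:.2f}/day x {cover_days} days = {target}; "
        f"available {available}; supplier holds {supplier_stock}"
    )
    return OrderRecommendation(part_number, order_qty, reason)
```

The reason string is the point of the design: the recommendation arrives with the inputs that produced it, so a buyer can review it before anything is ordered.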

Where the judgment lives

The local AI: an analyst that reads the business, on their own hardware.

On top of the deterministic core sits a local language model. It runs entirely on the company's own hardware and answers from their own business data through retrieval: it looks the relevant records up at question time rather than being trained on them. Together, those two choices make it both safe and current: because it runs locally, the data never leaves, and because it retrieves at question time, the model is always reading today's numbers, not a frozen snapshot.

Reads the business

Surfaces trends and opportunities in the company's own sales data before they're obvious in the totals: retrieval pulls the relevant history, the model reasons over it.

Retunes the system

Add a supplier, exclude a product line, adjust the ordering logic: described in plain language and applied through typed tools, instead of someone hand-editing configuration.

Reports daily

Generates daily reports on the state of sales and stock from the same records the engines run on. No separate BI pipeline to drift out of sync.

Answers questions

Answers the team from the business's own data, with the source records behind each answer: any trend, any customer or vendor, how a shipping problem got solved last time.
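
The "Retunes the system" role above, where plain-language requests are applied through typed tools, can be sketched roughly as follows. The assumption here is that the local model turns a request into a small structured call, and that plain code validates the call before touching configuration. The tool names, the call shape, and SystemConfig are hypothetical, not the client's schema.

```python
from dataclasses import dataclass, field, fields


@dataclass
class SystemConfig:
    suppliers: set[str] = field(default_factory=set)
    excluded_product_lines: set[str] = field(default_factory=set)


@dataclass(frozen=True)
class AddSupplier:
    name: str


@dataclass(frozen=True)
class ExcludeProductLine:
    product_line: str


# The only changes the model is allowed to request.
TOOLS = {
    "add_supplier": AddSupplier,
    "exclude_product_line": ExcludeProductLine,
}


def parse_tool_call(call):
    """Turn a model-proposed call into a typed command, or reject it."""
    name = call.get("tool")
    if name not in TOOLS:
        raise ValueError(f"unknown tool: {name!r}")
    command_type = TOOLS[name]
    args = call.get("args", {})
    expected = {f.name for f in fields(command_type)}
    if set(args) != expected:
        raise ValueError(f"{name} expects arguments {sorted(expected)}")
    if not all(isinstance(v, str) and v.strip() for v in args.values()):
        raise ValueError("arguments must be non-empty strings")
    return command_type(**args)


def apply_command(config, command):
    """Apply a validated command; anything else is refused."""
    if isinstance(command, AddSupplier):
        config.suppliers.add(command.name)
    elif isinstance(command, ExcludeProductLine):
        config.excluded_product_lines.add(command.product_line)
    else:
        raise TypeError(f"unsupported command: {command!r}")
    return config
```

The model only proposes; the typed layer decides what is a legal change. A request to do something outside the tool list, such as setting a price, fails validation instead of reaching the configuration.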

Why retrieval, not fine-tuning

Fine-tuning would bake sensitive data into the model's weights: stale the moment a price changes, and impossible to fully audit or redact. Retrieval keeps the data in their systems, queried live, with every answer traceable to the records it came from. The model stays a reasoning engine; the data stays theirs.
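
A minimal sketch of that retrieval step, to show how answers stay traceable. The scoring here is a deliberately naive keyword overlap standing in for whatever retrieval method the production system uses, which the case study does not specify. Record, retrieve, and build_prompt are hypothetical names.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    record_id: str
    text: str


def tokens(text):
    """Lowercased word set; a stand-in for a real indexing step."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(question, records, k=3):
    """Return up to k records sharing the most words with the question."""
    question_tokens = tokens(question)
    scored = [(len(question_tokens & tokens(r.text)), r) for r in records]
    hits = [(score, r) for score, r in scored if score > 0]
    hits.sort(key=lambda pair: (-pair[0], pair[1].record_id))
    return [r for _, r in hits[:k]]


def build_prompt(question, hits):
    """Put the retrieved records, with their ids, in front of the local model."""
    context = "\n".join(f"[{r.record_id}] {r.text}" for r in hits)
    return (
        "Answer using only the records below and cite their ids.\n"
        f"{context}\n\nQuestion: {question}"
    )
```

The records are read at question time and passed in with their ids, so the answer can point back to them, and nothing about the records is stored in the model itself.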

The cloud, only at the edge.

Some jobs are pure writing: a customer reply, a tidy summary. For those, the system calls Claude Sonnet through a single controlled channel. The sensitive analysis has already happened locally; only the finished, non-confidential text crosses the line.

Data sovereignty

What stays inside, and the one thing that crosses.

The boundary is the whole point of the architecture. Everything that touches confidential data runs on the company's own infrastructure: storage, retrieval, and the model that reads it. Exactly one channel leaves it, and it carries finished, non-confidential text, never the underlying records.

Stays on-prem · inside the boundary

Pricing and margins
Supplier terms
Customer history
Sales trends and analysis
The local model and every retrieval over their data

Crosses to the cloud · Claude Sonnet

A drafted customer reply
A written summary or write-up

Sensitive analysis never leaves the local model. The egress channel carries non-sensitive composition only. It's the single place the system talks to anything external.
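
One way to picture a single egress channel is a gate that every outgoing text must pass through. The sketch below is an assumption about how such a gate could look, not a description of the client's controls: the checks (a pattern for amounts and currencies, a list of confidential terms) are illustrative and far from exhaustive, and send is a caller-supplied function rather than a real cloud client.

```python
import re


class EgressBlocked(Exception):
    """Raised when outgoing text fails the check; nothing is sent."""


# Illustrative pattern: decimal amounts, currency symbols, a few currency codes.
MONEY = re.compile(r"\d+[.,]\d{2}\b|[€$£]|\b(?:EUR|USD|PLN|CZK|HUF|RON)\b")


def check_outgoing(text, confidential_terms):
    """Return the reasons this text must not leave; an empty list means it may."""
    reasons = []
    if MONEY.search(text):
        reasons.append("looks like a price or amount")
    lowered = text.lower()
    for term in confidential_terms:
        if term.lower() in lowered:
            reasons.append(f"mentions confidential term {term!r}")
    return reasons


def send_to_cloud(text, confidential_terms, send):
    """The single exit: check first, then hand the text to the cloud drafting call."""
    reasons = check_outgoing(text, confidential_terms)
    if reasons:
        raise EgressBlocked("; ".join(reasons))
    return send(text)
```

Keeping the exit in one function means there is one place to review, log, and test. A pattern check like this is a backstop only; the primary control remains that sensitive analysis is done locally before anything is drafted.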

Right tool for each job

Deterministic where it must be exact. AI where it adds judgment.

Nothing goes to the AI by default. Each piece of work runs where it belongs, and the reason is the same one every time: be exact where a number has to be right, use judgment where there isn't a fixed answer.

Work | Runs on | Why there
Pricing and margins | Deterministic code | Must be exact and auditable
Demand-based ordering | Deterministic code | Follows real demand, not a gut feel
Spotting trends & opportunities | Local AI · retrieval | Judgment over their own data
Retuning the system | Local AI · tools | Plain language, not config edits
Daily reports & questions | Local AI · retrieval | Grounded in their data, on-prem
Drafting replies & write-ups | Claude Sonnet · edge | Non-sensitive composition only

If you're building something similar

How we'd approach the same problem for you.

If you run a distribution business, or anything where sensitive numbers meet a need for judgment, four decisions carry most of the weight. Get these right and the rest is execution.

Draw the data boundary first

Decide what can never leave before you pick a single tool. The boundary shapes everything after it: where inference runs, what the cloud is allowed to see, and how you'd pass an audit.

Keep the money math deterministic

Pricing, margins, ordering quantities, anything a customer or auditor will check, belong in plain code, not a model. Reserve AI for the judgment a fixed calculation can't make.

Ground the AI in your data with retrieval

Run a capable open model on your own hardware and let it read your records at question time. You get current answers, full traceability, and no sensitive data baked into weights.

Allow one narrow edge to the cloud

A frontier model is excellent at drafting. Give it a single, supervised channel that only ever sees non-confidential text. You get the drafting quality without the exposure.

Why it holds together

Deterministic code does the transactional work, where the answer has to be exact and you need an audit trail. The local AI sits on top as the analyst and operator, grounded in the company's own data through retrieval and kept on their hardware. A frontier model handles only the non-sensitive drafting at the edge.

It's a hybrid on purpose: AI where it adds judgment, not where it would only add risk to a number that has to be right.

The result

In production and in daily use across sales and procurement. Specific figures are withheld under the client's confidentiality terms.

What it deliberately wasn't

Not "AI for everything." The pricing and the ordering math stay deterministic. The AI never touches a number that has to be exact. What the AI owns is the judgment work: spotting trends, reconfiguring the system, reporting, and answering questions.

Right tool for each job. That's the difference between software that holds up in production and a demo that doesn't.

Have a workflow like this?

Book an architecture call