Custom sales and procurement software for a distributor, with a local AI that runs it.
A B2B distributor in Central and Eastern Europe (CEE) needed its quoting and procurement to stop eating people's days. We built the software that runs both, then put a local AI on top, grounded in the company's own data and running on its own hardware, that watches the business, retunes the system, reports on it, and answers questions about it. In production.
- Sector: B2B distribution · CEE
- Built: Quoting + procurement software
- Pattern: Deterministic core + sovereign local AI
- Where it runs: On their own hardware · local only
- Cloud: Frontier model · non-sensitive edge only
- Status: In production
Two jobs run a distribution business. Both were done by hand.
If you move products business to business, two motions run your day: putting a price in front of a customer, and keeping the right stock on the shelf. At this distributor both were manual and spread across spreadsheets and inboxes. Slow to do, easy to get wrong, and impossible to grow without hiring more people to do the same clicking.
They wanted more than tooling. They wanted judgment on top: which products are about to move, which supplier to lean on, what to order before the trend shows up in the numbers. The catch is that this judgment runs on the most sensitive data the company has: pricing, margins, supplier terms, customer history, sales trends. None of it can be handed to a hosted cloud model, where it would leave their control.
One system, two minds, and a hard line around the data.
The design splits the work in two. Deterministic services handle anything that has to be exactly right. A local AI model handles the judgment: it runs on the company's own hardware and is grounded in their data through retrieval. The only thing that ever leaves the building is text with nothing confidential in it.
Read the diagram top to bottom: the team works against fast, predictable software; that software runs on the company's own data; a local model sits on top of all of it; and a single supervised channel reaches a cloud model, only for drafting.
Sensitive data, and every model that reads it, stays inside the boundary. The cloud sees only finished, non-sensitive text.
Start with everything that has to be exact. Two engines do the transactional work, and there is deliberately no model in either path. Pricing and ordering math must be correct, repeatable, and auditable, so they're plain, fast code you can read and test.
Quotation: a pricing engine, not a guess.
It ingests vendor pricelists and offers in whatever format they arrive and normalises them into one comparable catalogue. A salesperson pastes in a customer request and gets a ready offer back: part number, quantity, the best buying price across vendors, and a margin they can adjust on the spot, with vendor prioritisation and live stock ETAs folded in. Every number traces back to a source line, so a quote can be audited after the fact.
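To make the quoting step concrete, here is a minimal sketch of how a best-price lookup with an adjustable margin could work. It is not the client's code. The `Offer` fields, the `normalise_part` rule, and the choice to treat margin as a share of the selling price are all assumptions made for illustration.

```python
from dataclasses import dataclass
from decimal import Decimal, ROUND_HALF_UP


@dataclass(frozen=True)
class Offer:
    """One normalised pricelist row (field names are hypothetical)."""
    vendor: str
    part_number: str
    unit_cost: Decimal
    source_line: str  # where the number came from, kept for the audit trail


def normalise_part(raw: str) -> str:
    # Assumed rule: vendors differ only in case, spaces, and dashes.
    return raw.strip().upper().replace("-", "").replace(" ", "")


def best_offer(offers: list[Offer], part_number: str) -> Offer:
    key = normalise_part(part_number)
    candidates = [o for o in offers if normalise_part(o.part_number) == key]
    if not candidates:
        raise LookupError(f"no vendor offers {part_number!r}")
    # Lowest cost wins; vendor name breaks ties so the result is repeatable.
    return min(candidates, key=lambda o: (o.unit_cost, o.vendor))


def quote_line(offers: list[Offer], part_number: str, quantity: int,
               margin_pct: Decimal) -> dict:
    if not Decimal(0) <= margin_pct < Decimal(100):
        raise ValueError("margin_pct must be in [0, 100)")
    offer = best_offer(offers, part_number)
    # Assumption: margin is a share of the selling price,
    # so price = cost / (1 - margin).
    raw_price = offer.unit_cost / (Decimal(1) - margin_pct / Decimal(100))
    unit_price = raw_price.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)
    return {
        "part_number": normalise_part(part_number),
        "quantity": quantity,
        "vendor": offer.vendor,
        "unit_cost": offer.unit_cost,
        "unit_price": unit_price,
        "total": unit_price * quantity,
        "source_line": offer.source_line,
    }


offers = [
    Offer("Vendor A", "ab-100", Decimal("8.40"), "vendor_a.csv:3"),
    Offer("Vendor B", "AB100", Decimal("8.00"), "vendor_b.xlsx:12"),
]
line = quote_line(offers, "AB 100", 5, Decimal("20"))
```

Using `Decimal` instead of floats keeps the arithmetic exact and repeatable, and carrying `source_line` through to the quote is one simple way to keep every number traceable.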
Procurement: ordering that follows the data.
It watches what's selling and what suppliers are holding, then recommends what to order against real demand instead of a buyer's gut feel. Same principle as quoting: the recommendation is computed, explainable, and reviewable before anyone places an order.
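A computed, explainable recommendation could look like the sketch below. It uses a textbook reorder rule (cover average demand over lead time plus a safety buffer), which is an assumption for illustration and not necessarily the rule the production system applies. All names and parameters are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class StockPosition:
    """Inputs for one part (hypothetical field names)."""
    part_number: str
    daily_sales: list[int]  # recent units sold per day
    on_hand: int
    on_order: int
    lead_time_days: int
    safety_days: int


def recommend_order(p: StockPosition) -> dict:
    if not p.daily_sales:
        raise ValueError("need at least one day of sales history")
    avg_daily = sum(p.daily_sales) / len(p.daily_sales)
    # Units needed to cover the supplier lead time plus a safety buffer.
    target = avg_daily * (p.lead_time_days + p.safety_days)
    available = p.on_hand + p.on_order
    order_qty = max(0, math.ceil(target - available))
    # Return the inputs with the result so a buyer can review the reasoning.
    return {
        "part_number": p.part_number,
        "order_qty": order_qty,
        "explanation": {
            "avg_daily_sales": avg_daily,
            "target_units": target,
            "available_units": available,
        },
    }


rec = recommend_order(StockPosition(
    part_number="AB100",
    daily_sales=[4, 6, 5, 5],
    on_hand=20,
    on_order=10,
    lead_time_days=7,
    safety_days=3,
))
```

Returning the intermediate figures alongside the quantity is what makes the recommendation reviewable: the buyer sees why the number is what it is before placing the order.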
The local AI: an analyst that reads the business, on their own hardware.
On top of the deterministic core sits a local language model. It runs entirely on the company's own hardware and answers from their own business data through retrieval. It looks the relevant records up at question time rather than being trained on them. That single choice is what makes it both current and safe: the data never leaves, and the model is always reading today's numbers, not a frozen snapshot.
Fine-tuning would bake sensitive data into the model's weights: stale the moment a price changes, and impossible to fully audit or redact. Retrieval keeps the data in their systems, queried live, with every answer traceable to the records it came from. The model stays a reasoning engine; the data stays theirs.
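The retrieval pattern described here can be sketched in a few lines. This is a toy version under stated assumptions: keyword overlap stands in for whatever retrieval the real system uses, `Record` is a hypothetical shape for a business record, and `local_model` is a placeholder for the on-premises model, passed in as a plain function.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Record:
    record_id: str
    text: str


def retrieve(question: str, records: list[Record], top_k: int = 2) -> list[Record]:
    # Toy scoring: count words shared between the question and each record.
    terms = set(question.lower().split())
    scored = [(len(terms & set(r.text.lower().split())), r) for r in records]
    hits = [(score, r) for score, r in scored if score > 0]
    # Highest score first; record id breaks ties so results are repeatable.
    hits.sort(key=lambda pair: (-pair[0], pair[1].record_id))
    return [r for _, r in hits[:top_k]]


def answer(question: str, records: list[Record],
           local_model: Callable[[str], str]) -> dict:
    context = retrieve(question, records)
    lines = [f"[{r.record_id}] {r.text}" for r in context]
    prompt = (
        "Answer using only these records.\n"
        + "\n".join(lines)
        + f"\nQuestion: {question}"
    )
    # The records are read at question time; nothing is trained into the model.
    return {
        "answer": local_model(prompt),
        "sources": [r.record_id for r in context],
        "prompt": prompt,
    }


records = [
    Record("inv-1", "widget stock 40 units"),
    Record("po-7", "gasket supplier lead time 12 days"),
    Record("inv-2", "gasket stock 3 units"),
]
# Stand-in for the local model: it only echoes how much context it received.
result = answer("gasket stock level", records, lambda p: f"{len(p)} chars read")
```

Because the answer carries its `sources`, every response can be traced back to the records it was built from, and updating a record changes the next answer without touching the model.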
What stays inside, and the one thing that crosses.
The boundary is the whole point of the architecture. Everything that touches confidential data runs on the company's own infrastructure: storage, retrieval, and the model that reads it. Exactly one channel leaves it, and it carries text, not data.
Sensitive analysis never leaves the local model. The egress channel carries non-sensitive composition only. It's the single place the system talks to anything external.
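One way to enforce a single egress channel is to route every outbound message through one guard function that refuses anything that looks confidential. The sketch below is an assumption about how such a check might be written, not a description of the production guard. The `EgressBlocked` exception, the term list, and the blunt "no digits" rule are all illustrative.

```python
import re


class EgressBlocked(Exception):
    """Raised when outbound text fails the check (hypothetical name)."""


def check_outbound(text: str, confidential_terms: set[str]) -> str:
    lowered = text.lower()
    # Refuse text naming any known customer, supplier, or product.
    leaked = sorted(t for t in confidential_terms if t.lower() in lowered)
    if leaked:
        raise EgressBlocked(f"confidential terms present: {leaked}")
    # Deliberately strict: any digit could be a price, margin, or quantity.
    if re.search(r"\d", text):
        raise EgressBlocked("digits are not allowed in outbound text")
    return text


terms = {"Acme Supply", "AB100"}
ok = check_outbound("Please make this reply more polite and concise.", terms)
```

The rule errs on the side of blocking. For a channel meant only for drafting help, a false refusal costs little, while a leak is the one failure the architecture exists to prevent.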
Deterministic where it must be exact. AI where it adds judgment.
Nothing goes to the AI by default. Each piece of work runs where it belongs, and the reason is the same one every time: be exact where a number has to be right, use judgment where there isn't a fixed answer.
How we'd approach the same problem for you.
If you run a distribution business, or anything where sensitive numbers meet a need for judgment, four decisions carry most of the weight. Get these right and the rest is execution.
Deterministic code does the transactional work, where the answer has to be exact and you need an audit trail. The local AI sits on top as the analyst and operator, grounded in the company's own data through retrieval and kept on their hardware. A frontier model handles only the non-sensitive drafting at the edge.
It's a hybrid on purpose: AI where it adds judgment, not where it would only add risk to a number that has to be right.
In production and in daily use across sales and procurement. Specific figures are withheld under the client's confidentiality terms.
Not "AI for everything." The pricing and the ordering math stay deterministic. The AI never touches a number that has to be exact. What the AI owns is the judgment work: spotting trends, reconfiguring the system, reporting, and answering questions.
Right tool for each job. That's the difference between software that holds up in production and a demo that doesn't.