№ 00 · AI architecture & implementation · Bratislava

Calling the model is one line. We build everything around it.

One architect designs, builds, and runs the production system around that call — and stays on-call for 30 days after cutover.

Sebrona is a Bratislava-based AI consulting and software-engineering practice building production AI systems for European mid-market and enterprise clients.

ARCHITECT
One architect · brief to handover
EVAL SET
50–600 prompts per route · red build blocks merge
PAGED ON-CALL
30 days post-cutover · we hold the pager
MODEL FLOOR
Claude Opus 4.8 · Mistral Small 3.2 (Apache 2.0) · routing policy in week 1
CAPABILITY
p95 budget < 600ms TTFT · 5-min Anthropic cache · backup-model route
NOT FOR
horizontal SaaS subscriptions · twenty-engineer war rooms · six-week throwaway POCs
№ 01 · Capabilities

Four practices.
One architect runs all of them.

We build custom AI systems for companies, the kind you can't buy off the shelf. We automate manual, repetitive work. And when no existing product fits how a business runs, we build the software that does. One architect owns each engagement from the first call to handover.

Not on the menu: horizontal SaaS subscriptions, twenty-engineer war rooms, or six-week throwaway POCs.

  1. 01

    Custom AI systems

    We build AI systems that work from your own data and run in production.

    It’s software that works from your own records (orders, prices, products, history), not the public internet, so the answers fit your business. For example, a sales rep asks in plain language “what did this customer buy last quarter, and what’s their agreed price?” and gets it straight back, instead of digging through your old ERP. It replaces the daily hunt for things the company already knows, and it runs in everyday use on your own systems, not a demo.

    Hybrid retrieval (BM25 + Voyage-3 dense embeddings, Cohere rerank-3 on top-50) over your own corpus. The Cohere rerank is opt-in, and sovereignty-strict corpora swap Voyage-3 for a local BGE-M3. Tool-using agents over Model Context Protocol (MCP) servers with typed contracts. Private inference on open-weight models (Mistral Small 3.2 / Qwen 3, Apache-2.0) where the data is too sensitive to leave the building. A 50–600-prompt gold set gates every prompt route through Promptfoo CI. Red eval blocks merge. ADRs for major decisions. Every call carries a full OpenTelemetry trace. Cutover on a tested rollback path: if shadow traffic flags a regression we don’t switch.

    Stack
    • Hybrid retrieval · BM25 + Voyage-3
    • Tool-using agents · MCP
    • Eval harness · CI-gated
    • Local + hosted routing
  2. 02

    Process automation

    Repetitive, rule-based work: document review, data extraction, reporting. Moved onto software, with people kept on the exceptions.

    We take the work your team repeats the same way every day and give it to software, keeping people on the cases that need a decision. For example, instead of a buyer checking every SKU each morning (what sold, what to reorder), the system learns from past orders, predicts demand, and prepares the order; they set it up once, then only get notified for the exceptions, like a sudden spike or a supplier price change. The routine clears off the desk, so the hours go to judgment, not re-keying.

    Document review, data extraction, report generation, lead qualification: the work an analyst spends thirty hours a week on. Agents and n8n workflows take the routine path; the human stays on the edge cases. Where the shape fits, forty minutes per task drops to four. The shapes that fit are narrow and we name them: structured-document review, contract-field extraction, and lead triage off a fixed schema. We baseline the gain per engagement before we commit to a number. Not every workflow fits that shape, and we’ll tell you on the first call which of yours don’t. If we can’t measure the gain, you don’t pay for the build.

    Stack
    • n8n · queues
    • Pydantic AI agents
    • Webhooks · CRON
  3. 03

    Custom software & platforms

    Custom software for the processes no off-the-shelf product fits. You own what we build.

    When no product on the market matches how you actually work, or the one you have is painful to use, we build software shaped to your process instead of bending your process to a generic tool. For example, you keep the old ERP that nobody wants to open, and we put a clean front-end on top: your staff work in software built for them, while it reads and writes to the ERP underneath. You own what we build outright: no per-seat subscription, no vendor deciding your roadmap.

    End-to-end product builds for clients whose problem doesn’t fit an off-the-shelf SaaS. Tenant model, billing, RBAC, audit log, admin console. The plumbing most teams only bolt on at Series A.

    Stack
    • Postgres RLS · per-tenant
    • Stripe · automatic_tax
    • RBAC + audit log
  4. 04

    Engineering & infrastructure

    The backend, frontend and infrastructure the systems above run on, handed to your team at the end.

    This is the foundation the three above run on (the servers, databases, and apps), built to hold up in real use, not just to demo. For example, the order data and the new front-end stay fast and online through your busiest ordering days, your data sits in the EU, and your own team can read and maintain the code after we leave. You get a working system handed over with its keys, not a black box you depend on us to touch.

    Full-stack delivery: tRPC contracts with Zod validation, Postgres with row-level security, HMAC-verified webhooks, React web + React Native mobile. Cloudflare on the edge. First-byte budget: under 50ms across EU on warm-cache static and API routes; model-backed routes are gated separately at p95 < 600ms TTFT (see the stack). Data stays in EU regions (Supabase Frankfurt or your own racks).

    Stack
    • TypeScript
    • Postgres
    • Cloudflare Workers
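
The "HMAC-verified webhooks" mentioned under Engineering & infrastructure can be illustrated with a minimal sketch. It assumes a hex-encoded HMAC-SHA256 signature over the raw request body; the function names and the encoding are illustrative choices, not a description of a specific client build, and real providers often add a timestamp to the signed payload.

```typescript
import { createHmac, timingSafeEqual } from "node:crypto";

// Compute the hex-encoded HMAC-SHA256 of the raw request body.
function signPayload(rawBody: string, secret: string): string {
  return createHmac("sha256", secret).update(rawBody, "utf8").digest("hex");
}

// Verify a webhook signature in constant time.
// Returns false for a malformed or mismatched signature instead of throwing.
function verifyWebhook(rawBody: string, signatureHex: string, secret: string): boolean {
  const expected = Buffer.from(signPayload(rawBody, secret), "hex");
  const received = Buffer.from(signatureHex, "hex");
  // timingSafeEqual throws when lengths differ, so compare lengths first.
  if (received.length !== expected.length) return false;
  return timingSafeEqual(received, expected);
}
```

The comparison runs over the raw body, before any JSON parsing, because re-serialising a parsed payload can change the bytes and break the signature.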
№ 02 · Practice

The line of code is the easy part. Production AI is everything that surrounds it:

Retrieval grounded in your own data, typed contracts that survive a schema change, evals that catch a silent faithfulness or refusal regression three sprints later and block the merge, a dashboard your DPO can read without our help, and on-call that pages the right person at 3am.

Principle 01
One lead architect owns the engagement end-to-end and writes the ADR behind every load-bearing decision. When scope demands a second head, the senior team joins. They don’t take it over.
Artefact
ADR · /docs/adr/NN.md
Principle 02
Sensitive prompts run on local open-weight models (Mistral Small 3.2 / Qwen 3, Apache-2.0). Everything else routes to Claude (Opus 4.8 for reasoning, Sonnet 4.6 for throughput, Haiku 4.5 for routine routes) or the GPT-5 family through a policy you sign off on. Every call is logged with model, policy, retrieval set, and cost. The routing decision is auditable, not assumed.
Artefact
routing.policy.yaml · signed off in week 1
Principle 03
Your team owns the system the day we leave. That means ADRs in-repo, the eval harness wired into your CI, a registry with rollback documented and tested, a runbook written against actual incidents, and thirty days of paged on-call alongside you. Day 31 you have the pager.
Artefact
runbook.md · 30 days paged on-call
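
The routing rule in Principle 02 can be sketched as a small pure function. This is a minimal illustration, assuming a boolean sensitivity flag set upstream and three workload classes; the model labels and the policy version string are hypothetical placeholders, since the real identifiers live in the signed-off routing.policy.yaml.

```typescript
type Workload = "reasoning" | "throughput" | "routine";

interface RouteRequest {
  sensitive: boolean; // assumed to be set by an upstream classifier or data tag
  workload: Workload;
}

interface RouteDecision {
  model: string;
  hosting: "local" | "hosted";
  policy: string; // policy version, recorded so the decision is auditable
  reason: string;
}

// Hypothetical labels; real identifiers would come from the policy file.
const POLICY_VERSION = "routing.policy.yaml@example";
const HOSTED_BY_WORKLOAD: Record<Workload, string> = {
  reasoning: "claude-opus",
  throughput: "claude-sonnet",
  routine: "claude-haiku",
};

function route(req: RouteRequest): RouteDecision {
  // Sensitivity wins over workload: sensitive prompts stay on local models.
  if (req.sensitive) {
    return {
      model: "mistral-small-local",
      hosting: "local",
      policy: POLICY_VERSION,
      reason: "sensitive prompt",
    };
  }
  return {
    model: HOSTED_BY_WORKLOAD[req.workload],
    hosting: "hosted",
    policy: POLICY_VERSION,
    reason: `non-sensitive ${req.workload}`,
  };
}
```

Returning the policy version and a reason with every decision is what lets the log answer "which model got which prompt and why" without reconstructing state after the fact.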
Live · review loop

Every system we ship goes through a review loop. This page went through it too.

Hit run and it scans the page you're reading. Each pass samples a different set of checks from the suite, and every one is already resolved in what you see.

Review complete · 4 of 54 checks
  1. Performance · three.js · ~1.18 MB deferred

    The hero's 3D scene (three.js, ~1.18 MB) loaded before the headline could paint.

    Resolved · Code-split it. The text paints first; the sphere fades in after.

  2. Accessibility · WAI-ARIA · aria-hidden

    The architecture diagram is decorative, but a screen reader announced every node as content.

    Resolved · Marked aria-hidden. The layer names live in the real text, not the geometry.

  3. Motion · prefers-reduced-motion

    Reveal animations and the rotating sphere ignored prefers-reduced-motion.

    Resolved · Both honor it now. The page renders settled and the sphere holds still.

  4. Layout regression · 328px shove · caught at 1920 · class guarded

    A fix from an earlier review pass added a second position class to a rail that was already fixed. The cascade resolved against intent and shoved the hero 328 pixels down on wide screens. It shipped to production before an eyes-on check at 1920 caught it.

    Resolved · Removed the duplicate class. The audit now fails any class string carrying two position utilities, so the whole bug class is locked out.

This is the defect class most sites ship and never catch. The full suite runs on every build of this page; every fix listed here is real code in this repository.
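
The guard described in check 4 can be sketched as follows. This is an illustration rather than the repository's actual audit: it assumes Tailwind-style utility names (static, fixed, absolute, relative, sticky) and treats two position utilities as a conflict only when they share the same variant prefix, so `relative md:fixed` stays legal.

```typescript
// Tailwind's position utilities (assumed framework; the page only says "position utilities").
const POSITION_UTILITIES = new Set(["static", "fixed", "absolute", "relative", "sticky"]);

// Returns the variant prefixes ("" for base, "md:" for a breakpoint, ...)
// that carry more than one position utility in the same class string.
function conflictingPositionVariants(classString: string): string[] {
  const counts = new Map<string, number>();
  for (const token of classString.split(/\s+/).filter(Boolean)) {
    const cut = token.lastIndexOf(":");
    const variant = token.slice(0, cut + 1); // "" when the token has no variant
    const utility = token.slice(cut + 1);
    if (POSITION_UTILITIES.has(utility)) {
      counts.set(variant, (counts.get(variant) ?? 0) + 1);
    }
  }
  return Array.from(counts.entries())
    .filter(([, count]) => count > 1)
    .map(([variant]) => variant);
}

conflictingPositionVariants("fixed top-0 sticky"); // [""]
conflictingPositionVariants("relative md:fixed");  // []
```

A build step could fail whenever this returns a non-empty array for any class string in the templates.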

№ 03 · Selected practice

Five projects.
Three we can show.

Same architect on every project, kick-off through cutover. Every build ships the same floor: typed contracts (tRPC + Zod), a 50–600-prompt eval set wired into CI (Promptfoo / Inspect AI), and OpenTelemetry traces from edge to model.

01 · Live · Shipping in production

ElektrikPro

Field-service SaaS for the Slovak electrical trade · operated jointly with M.Z.CONNECT s.r.o., the practicing electrician's company · paying customers

Our own vertical SaaS for the Slovak electrical trade: job tracking, invoicing, parts ordering. Co-founded with a domain operator who lives the workflow, with paying customers. The architecture patterns are the same ones we run on client engagements; we ship what we sell.

TypeScript · Postgres · Stripe · Mobile · Cloudflare
02 · Delivered · In production

B2B distribution operator · CEE

Custom sales & procurement software, with a sovereign local AI on top.

A deterministic engine handles quoting and procurement — best price across vendors, adjustable margins, stock ETAs, demand-based ordering. On top, a local AI grounded in the company's own data and running on their hardware: it surfaces trends, retunes the system in plain language, reports daily, and answers the team's questions. Cloud models touch only the non-sensitive drafting.

RAG · Local inference · On-prem · Deterministic core · Claude Sonnet
03 · Internal R&D · Sebrona testbed

JARVIS · internal R&D

The architect’s own 24/7 private AI infrastructure

The founder’s own stack, running on his hardware: voice (whisper-large-v3-turbo on Groq, ElevenLabs streaming TTS) and chat across Mac, iPhone, and Telegram, hooked into calendar, mail, messages, journal, health, and whatever app is on his screen. 204 tools reachable via Model Context Protocol. Routing is hybrid by design: Claude Opus 4.8 for reasoning-heavy turns, local Mistral Small 3.2 in the office for prompts that can’t leave the network. It’s also where we prove architecture patterns before they ship to clients.

Voice · Local-first · MCP · 24/7 · Founder-run
04 · Delivered · NDA

EU public‑sector AI pilot

Private inference, default‑deny data, full audit trail

AI document workflow for a customer that cannot send data to US-hosted models. Hybrid retrieval over ~40k documents (BM25 + Voyage-3 dense, Cohere rerank-3 on top-50), then summarization on a quantized open-weight model on-prem. Non-sensitive prompts route to an EU-region hosted model through a policy layer the customer signed off on. The eval harness tests faithfulness, citation accuracy, and refusal correctness, built against a 600-item gold set with the customer’s own domain reviewers. Default-deny at every boundary; full audit trail streamed to their SIEM. Client name under NDA.

RETRIEVAL
Hybrid · 40k docs
EVAL SET
600 items · expert-reviewed
INFERENCE
Mistral / Qwen · Q4_K_M–Q8 · on-prem GPU
AUDIT
Default-deny · SIEM-streamed
Local inference · Default-deny · EU-region · Audit
05 · In build · 2026
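
The hybrid retrieval step in this project merges a lexical ranking and a dense ranking before reranking. The page does not say how the two lists are combined; the sketch below uses reciprocal rank fusion as one common choice, with the conventional constant k = 60. Treat the technique, the function name, and the sample document ids as assumptions, not as the delivered system.

```typescript
// Reciprocal rank fusion: score(d) = sum over lists of 1 / (k + rank),
// where rank starts at 1 within each list.
function reciprocalRankFusion(lists: string[][], k = 60): string[] {
  const scores = new Map<string, number>();
  for (const list of lists) {
    list.forEach((id, index) => {
      scores.set(id, (scores.get(id) ?? 0) + 1 / (k + index + 1));
    });
  }
  return Array.from(scores.entries())
    .sort((a, b) => b[1] - a[1])
    .map(([id]) => id);
}

const lexicalHits = ["doc-a", "doc-b", "doc-c"]; // e.g. BM25 order
const denseHits = ["doc-b", "doc-d", "doc-a"];   // e.g. embedding-similarity order

// Keep the top 50 fused candidates for the optional reranker.
const candidates = reciprocalRankFusion([lexicalHits, denseHits]).slice(0, 50);
// candidates is ["doc-b", "doc-a", "doc-d", "doc-c"]
```

Rank fusion needs no score normalisation between BM25 and cosine similarity, which is why it is a frequent default when the two scorers live on different scales.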

European venue operator · pre-launch

Bookings, assets, reporting, built around how the venue works

Custom operations platform for an asset-heavy venue operator that doesn’t fit horizontal SaaS. Bookings, asset management, reporting. Modelled on how the team works today, not how Mews or Lightspeed would prefer they worked. Six-month pre-launch, three internal cohorts before public open. Client name under NDA.

PRE-LAUNCH
6 months · 3 internal cohorts
DOMAIN
Bookings · assets · reporting
BUILT FOR
Asset-heavy ops, not horizontal SaaS
OPEN
Q4 2026 · staged rollout
Bookings · Asset mgmt · Multi-tenant · Custom
№ 04 · How an engagement runs

Five phases. By the end of week one, a fixed price.

Same order, every engagement. You commit to week one up front. By that Friday you have a fixed-scope proposal and a price. If the project doesn’t make sense as scoped, we say so and you owe nothing.

  1. 01

    Diagnostic week

    5 days

    On-site or remote. Read the codebase, the data, the team. End the week with a written architecture brief and a fixed-scope proposal.

  2. 02

    Architecture spec

    1–2 weeks

    Signed-off design: data model, inference routing, eval plan, integration contracts, on-call runbook. Reviewable by your team before any code ships.

  3. 03

    Build sprint

    4–10 weeks

    The build. We commit to your repo from day one, with weekly demos against the spec. The lead architect codes through to ship; senior engineers join when scope demands. No senior-to-junior handoff in week six.

  4. 04

    Eval & cutover

    1–2 weeks

    The eval harness runs against production data. Shadow traffic, then partial cutover, then full. Rollback path documented and tested before the switch.

  5. 05

    On-call handover

    30 days, included

    Paged on-call for the first month after cutover. Then a clean handover to your team: runbook, dashboards, escalation policy, post-mortem template.
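
The cutover sequence in phase 04 (shadow traffic, then partial, then full, with rollback on regression) can be sketched as a small state function. The stage names follow the text; the score fields, the tolerance parameter, and the choice to fall back to shadow on any regression are assumptions made for illustration.

```typescript
type Stage = "shadow" | "partial" | "full";

interface StageReport {
  baselineScore: number;  // eval score of the current system on the same traffic
  candidateScore: number; // eval score of the new system
}

const STAGE_ORDER: Stage[] = ["shadow", "partial", "full"];

// Advance one stage only when the candidate does not regress beyond the
// tolerance; on a regression, return to shadow (the rollback path).
function nextStage(current: Stage, report: StageReport, tolerance = 0): Stage {
  const regressed = report.candidateScore < report.baselineScore - tolerance;
  if (regressed) return "shadow";
  const index = STAGE_ORDER.indexOf(current);
  return STAGE_ORDER[Math.min(index + 1, STAGE_ORDER.length - 1)];
}

nextStage("shadow", { baselineScore: 0.9, candidateScore: 0.92 });  // "partial"
nextStage("partial", { baselineScore: 0.9, candidateScore: 0.85 }); // "shadow"
```

Keeping the decision a pure function of the eval report makes the rollback rule testable before the switch, which is the point of documenting and rehearsing it.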

Anti-patterns
  • No senior-to-junior bait-and-switch.
  • No phase-6 ‘optimisation’ upsell.
  • No time-and-materials drift past the fixed-scope proposal.
Total
7–15 weeks · diagnostic week through cutover · then 30 days of on-call
Contracted in EUR · fixed-scope or T&M

We don't publish typical ranges because every engagement starts with a fixed-price Diagnostic week. The only number that matters is the one in your proposal on the Friday of week one.

When Sebrona, when not
01

Sebrona is the right call when

  • Production AI inside an EU data boundary.
  • One architect end-to-end, not a six-person staff-aug rotation.
  • The system replaces a process you used to staff with people.
02

Sebrona is the wrong call when

  • You want a horizontal SaaS subscription. Buy Copilot or ChatGPT Enterprise.
  • You want twenty engineers in a war room. Call a Big 4.
  • You want a POC you'll throw away in six weeks. Don't waste either of our time.
№ 05 · Point of view

Sovereign by default

Your data must not leave your jurisdiction without your explicit permission.

Buyers across European mid-market and public sector have walked away from "send everything to a US server, trust us." They want systems that run where the data already lives: own racks, a chosen EU region, a private inference endpoint behind the firewall. We run vLLM on Linux/CUDA, llama.cpp or Ollama on Apple Silicon for smaller footprints, with quantization picked per workload (Q4_K_M for latency-bound routes, Q8 / FP16 when GPU budget allows). In every build we ship — included, not upsold — default-deny data policies, local-first inference for sensitive prompts, a full audit trail with model and routing policy logged per call, and EU residency end to end.

Residency stops
  • Dublin · CF EU
  • Frankfurt · Supabase
  • Bratislava · Office
  • On-prem · Your racks
Data stays where it lives. We pick the region; we don’t move yours.
Compliance frameworks
  • GDPR · ART. 32 · SECURITY
  • AI ACT · ART. 9 · RISK MGMT
  • NIS2 · ART. 21 · CYBER
Frameworks we map to in the architecture brief. Not badges, citations.
Read the terms · DPA · Privacy policy
Position 01

Local-first inference where it matters.

Sensitive prompts run on open-weight models on your hardware. The rest can route to Claude (Opus 4.8 / Sonnet 4.6 / Haiku 4.5) or the GPT-5 family. Your call, written into a policy you own. Every trace shows which model got which prompt and why.

Position 02

Audit on every call.

Prompts, retrievals, tool calls, and outputs all logged with model version, routing policy, data lineage, token count, and cost. Stream exports to your SIEM. Auditable by your DPO without our help.
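
One way to picture the per-call audit entry is a typed record serialised as one JSON object per line, a shape most log shippers accept. The field names, the EUR cost unit, and the helper functions below are illustrative assumptions; the page specifies what is logged, not the schema.

```typescript
interface AuditRecord {
  timestamp: string;     // ISO 8601
  traceId: string;
  modelVersion: string;
  routingPolicy: string;
  dataLineage: string[]; // identifiers of the retrieved sources
  toolCalls: string[];
  tokenCount: number;
  costEur: number;       // currency is an assumption
}

// Serialise one record as a single JSON line for export.
function toJsonLine(record: AuditRecord): string {
  return JSON.stringify(record);
}

// Sum cost per model version so a reviewer can reconcile spend against the log.
function costByModel(records: AuditRecord[]): Map<string, number> {
  const totals = new Map<string, number>();
  for (const record of records) {
    totals.set(record.modelVersion, (totals.get(record.modelVersion) ?? 0) + record.costEur);
  }
  return totals;
}
```

Because every record carries the model version and routing policy, questions such as "what did the sensitive route cost last month" become a filter and a sum over the export.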

Position 03

EU regions, or your regions.

Cloudflare with EU Data Boundary, Supabase Frankfurt, or your own hardware in your own racks. Sebrona s.r.o. (Bratislava) holds every contract; no US partner pays us a referral fee.

Trade-off we name out loud
Open-weight models still lag the frontier on hard reasoning and long-horizon planning. You pay for sovereignty in GPU capex, ops burden, and a worse answer on the hardest few percent of prompts. Where that gap matters, those routes go to a hosted model with explicit consent and a redaction pass. The spec names which ones. Sovereignty is the default. It is not the rule.
№ 06 · Reference architecture

Six layers. One stack that doesn't move when the model does.

In plain terms: one team owns all six layers. When a model or vendor changes, your system keeps running — no rebuild, no chasing fixes across suppliers.

Defensible defaults at every layer, each picked against a documented alternative in an ADR you can argue with. We build all six in-house. No subcontracted frontend, no off-shored data tier. Swap any cell for a tool your team already runs; the contracts above and below don’t move.

The stack is opinionated. If you already run a different orchestration framework or a non-Postgres data plane, we adopt yours and the spec records the swap. The evals, the model registry, and the OTel traces don’t change.

  1. L5
    Interface

    The visible surface: chat, dashboards, embeds, the widget your buyer recognises.

    Web · React/TS
    Mobile · React Native
    Chat · copilot
    Dashboards · embeds

    React + TypeScript on the web, React Native for the mobile clients. The component primitives are shared across both. We don’t double-build the design system.

  2. L4
    API Gateway

    Auth, rate limits, request shaping. The seam between the public web and the model floor.

    tRPC + Zod
    REST · OpenAPI
    Webhooks
    Auth · RLS

    tRPC + Zod is the default. One contract from DB row to React prop. We drop to REST when an external consumer needs OpenAPI; the validation discipline is the same either way.

  3. L3
    Orchestration

    Workflow graphs and agent loops. Retries, fallbacks, and the policies that catch a partial failure before the user notices.

    LangGraph · MCP servers
    LiteLLM router · policy YAML
    Langfuse · Promptfoo
    n8n · queues

    LangGraph for stateful agent flows we run from spec to cutover; LiteLLM as the router behind the routing.policy.yaml signed off in week 1. Tool surfaces standardise on Model Context Protocol. Same server contract for Claude, local models, and the IDE. n8n behind a queue for analyst-facing automations the client will edit themselves. Langfuse for trace-level observability and prompt-version diffs; Promptfoo as the CI gate. Eval harness and prompt-and-model registry live in this layer, not as an afterthought.

  4. L2
    Model Layer

    Router across providers. Prompts under version control, an eval harness that runs on every push.

    Claude Opus 4.8 · Sonnet 4.6 · Haiku 4.5
    Mistral Small 3.2 · Qwen 3 (Apache 2.0)
    Cohere rerank-3
    Voyage-3 · BGE-M3

    Routing policy decides which model sees which prompt. Hosted frontier when reasoning load is high and data is non-sensitive; local open-weight when sovereignty wins, at the cost of GPU capex and a real gap on the hardest prompts.

  5. L1
    Data

    Postgres, pgvector, object storage. The ingestion pipelines that feed every prompt and retrieval.

    Postgres
    pgvector
    Object store
    Event stream

    One database until we prove we need two. pgvector keeps retrieval next to the row it grounds; we’ll move to a dedicated vector store only when scale forces it, and we’ll write the ADR explaining why.

  6. L0
    Infra

    EU data boundary by default. On-prem when the regulator requires it. Secrets in KMS, SLOs agreed week one.

    Cloudflare · EU
    Supabase · Frankfurt
    OTel + SLOs
    Vault · KMS

    EU data boundary by default, on-prem when the regulator requires it. Secrets in a KMS, never in env files. SLOs and error budgets agreed in week one. What we won’t measure, we won’t bill for.

  7. FULL STACK
    Six layers, one system

    All six layers run in production at once. Type contracts hold across the seams; the eval harness gates every route.

    Type contracts
    Eval harness
    OTel + audit log
    EU sovereignty
    On-prem option
    Versioned · IaC

    Every seam is a contract. When a regression slips in, the eval gates fire at the layer that broke it before it reaches the user.


↑ data flows up · ↓ requests flow down · BUDGETs are eval-gate commitments, not historical SLOs. Measured numbers sit in each engagement’s ADR.
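
The orchestration layer's "retries, fallbacks" and the backup-model route can be sketched as an ordered list of routes tried in turn. This is a minimal illustration with hypothetical names; it is not the LiteLLM or LangGraph API, and a production version would add timeouts and per-route retry limits.

```typescript
type ModelCall = (prompt: string) => Promise<string>;

interface ModelRoute {
  name: string;
  call: ModelCall;
}

interface FallbackResult {
  output: string;
  servedBy: string;   // which route produced the answer
  attempts: string[]; // every route tried, in order, for the trace
}

// Try each route in order and return the first successful answer.
async function callWithFallback(prompt: string, routes: ModelRoute[]): Promise<FallbackResult> {
  const attempts: string[] = [];
  let lastError: unknown = undefined;
  for (const route of routes) {
    attempts.push(route.name);
    try {
      const output = await route.call(prompt);
      return { output, servedBy: route.name, attempts };
    } catch (error) {
      lastError = error; // remember the failure and move on to the next route
    }
  }
  throw new Error(`all routes failed (${attempts.join(", ")}): ${String(lastError)}`);
}
```

Recording `attempts` alongside `servedBy` lets a trace show that the user was served by the backup route, so a partial failure is visible in observability even when the user never notices it.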

Discipline

Discipline across every layer.

CONTRACTS
tRPC + Zod, generated
EVAL HARNESS
CI-gated, every route
DATA POLICY
Default-deny + egress allowlist
OBSERVABILITY
OTel traces + logs + metrics
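
The "default-deny + egress allowlist" data policy can be sketched as a check that rejects every outbound URL unless its host is explicitly listed. The single allowlisted host and the HTTPS-only rule are assumptions for illustration; a real allowlist would come from the engagement's data policy and is usually enforced at the network edge as well as in code.

```typescript
// Hypothetical allowlist; real hosts would come from the signed-off data policy.
const EGRESS_ALLOWLIST = new Set(["api.anthropic.com"]);

// Default-deny: anything unparseable, non-HTTPS, or unlisted is rejected.
function isEgressAllowed(rawUrl: string, allowlist: Set<string> = EGRESS_ALLOWLIST): boolean {
  let url: URL;
  try {
    url = new URL(rawUrl);
  } catch {
    return false; // not a valid absolute URL
  }
  return url.protocol === "https:" && allowlist.has(url.hostname);
}

isEgressAllowed("https://api.anthropic.com/v1/messages"); // true
isEgressAllowed("https://api.anthropic.com.evil.example/"); // false
```

Matching on the parsed hostname rather than on a string prefix is what stops look-alike hosts from slipping through.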
№ 07 · Build baseline

Seven things in every shipped system.

  • Type-safe end to end. tRPC contracts, Zod validation at every boundary, generated clients; no untyped JSON across a network hop.
  • Eval harness on every prompt route, CI-gated via Promptfoo / Inspect AI on per-route gold sets sized to the blast radius (50–600 prompts): faithfulness, groundedness, refusal correctness, jailbreak resistance, cost ceiling, p95 latency. Pass-thresholds live in the route’s ADR (faithfulness ≥ 90 · groundedness ≥ 90 · refusal-correctness ≥ 95 · regression-detection p95 < 5%). A red build blocks merge.
  • Prompt and model registry pins every prompt and route to a version, with one-command rollback and A/B and shadow traffic out of the box.
  • OpenTelemetry traces from edge to model and back — a trace per user action, retrievals and tool calls as spans, token and cost as span attributes.
  • Default-deny data policies. PII redaction at the boundary, egress allowlist at the network edge, secrets in a KMS, never in env files.
  • ADRs in-repo for every architectural decision worth defending, so a new engineer can read them and understand why the system looks the way it does.
  • Cloudflare Pages preview to main in under six minutes, rollback is `git revert`, and database migrations are reversible or they don’t ship.
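
The eval gate in the second item above can be sketched as a threshold check over per-route scores. The thresholds are the ones quoted in the list, read here as percentages on a 0–100 scale (an assumption); the function and type names are hypothetical and this is not Promptfoo's or Inspect AI's API.

```typescript
type Metric = "faithfulness" | "groundedness" | "refusalCorrectness";
type Scores = Record<Metric, number>;

// Thresholds quoted in the baseline, interpreted as percentages.
const THRESHOLDS: Scores = {
  faithfulness: 90,
  groundedness: 90,
  refusalCorrectness: 95,
};

interface GateResult {
  pass: boolean;
  failing: Metric[];
}

// A route passes only when every metric meets or exceeds its threshold.
function evalGate(scores: Scores, thresholds: Scores = THRESHOLDS): GateResult {
  const failing = (Object.keys(thresholds) as Metric[]).filter(
    (metric) => scores[metric] < thresholds[metric]
  );
  return { pass: failing.length === 0, failing };
}

const gateResult = evalGate({ faithfulness: 93, groundedness: 88, refusalCorrectness: 97 });
// gateResult.pass is false and gateResult.failing is ["groundedness"]
```

In CI, a non-passing result would exit non-zero so the red build blocks the merge, and the `failing` list tells the author which metric to look at first.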
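agent.ts · reference pattern · TypeScript
// reference pattern (pseudocode, not a published SDK)
// `eval` cannot be a binding name in ES-module (strict-mode) code, so it is aliased on import
import { agent, eval as evals, pii } from '@sebrona/core'
// assumed: the three tools are defined in a sibling module with tool() and zod schemas
import { searchDocs, openTicket, notifyOps } from './tools'

export const triageAgent = agent({
  model: 'claude-opus-4-8',
  policy: 'default-deny', // nothing leaves the boundary unless allowlisted
  retrieval: { store: 'pgvector', dense: 'voyage-3', lexical: 'bm25', rerank: 'rerank-3', topK: 8 },
  tools: [searchDocs, openTicket, notifyOps],
  guardrails: [pii.redact, evals.faithfulness(0.90)], // redact PII, then gate on faithfulness >= 0.90
  observability: { otel: true, traces: 'always' },
})

// → typed end-to-end · eval'd on every prompt · observable by default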

We build everything around the model.

№ 08 · Office
Miroslav Striško — Bratislava · 2026
Miroslav Striško
Sebrona s.r.o.
IČO 57 639 272
Budatínska 3230/16A · 851 06 Petržalka
Bratislava · EST. 2026

The architect on the contract is the engineer in the commit log.

One architect. Two to three engagements a quarter. The reply lands inside twenty-four hours, and your repo is in your team’s hands by week one.

Seventeen years on the buying side of enterprise technology before he flipped to the build side. A decade in financial markets first: execution, risk, the operations end. Then seven years running senior export at a B2B technology distributor, EU coverage. In 2024 he started building the systems his last two careers had spent seventeen years buying. The first engagements predate the company itself: delivered before the 2026 incorporation and carried into Sebrona s.r.o.

Sebrona ships two of its own products. ElektrikPro: co-founded with a domain operator who runs the workflow daily, paying customers. JARVIS: the founder’s private 24/7 AI stack and the test rig where we prove an architecture pattern before it touches a client engagement. The senior team joins when the scope demands a second head. The full shipping log is at sebrona.com/changelog. Recent field notes at sebrona.com/blog.

Capacity · 12 weeks
Now · Booked · Open
Next opening · Week 2 · for diagnostic week
On the desk
  • Venue operator · build sprint · week 5 of 8 · cohort-2 demo Friday.
  • Diagnostic week · CEE industrial group · written brief + fixed-price proposal Friday.
  • Architecture spec · sovereign RAG pilot · routing policy under legal review.
№ 09 · Contact

First call is free.

Bring something you want built, or something that’s stuck. The architect picks up. First thirty minutes are free; if there’s nothing worth building, the call ends with that conclusion in writing.

Bring to the first call
  • A system you want built, or one that’s stuck
  • The data it would touch: regions, volumes, sensitivity
  • Who on your side owns the outcome
  • A date that matters, if there is one
First reply · Within 24 hours · from the architect · not a BDR
Engagement · Fixed-price Diagnostic week · then fixed-scope or T&M · contracted in EUR