Sebrona · Reference build

JARVIS

JARVIS is Sebrona’s own AI system, the always-on butler running on hardware we own. A Mac, an iPhone, and a Telegram bot share one local brain. Model routing is hybrid by design: sensitive prompts stay on a local Mistral Small instance in the office; everything else routes to Claude when capability earns the round-trip. One local server holds the memory. 204 tools available to the brain.

He is also where we prove what a custom assistant for a team looks like before we sell one. The Build your own section below covers what travels to your engagement, what gets re-decided, and where the trust boundary sits.


№ 01 · A day with the butler

He notices things between turns. Twelve background routines, checked once a minute.

05:30 · phase 01

Morning

A brief is generated automatically: weather, calendar, callbacks from yesterday’s journal, commitment counts, health correlations. Read on demand or delivered as a voice notification.

Mid-day · phase 02

Triage and follow-up

Say the wake word, or open the chat. Inbox triage by priority tier. Draft replies with a voice read-back. Free-slot search across the calendar. Overdue commitments surfaced. Workouts logged. Music controlled. A managed Chrome browser when an API doesn’t exist.

Evening · phase 03

Soft nudges

Escalating reminders to write the day’s journal entry. Flag the supplement stack if missed. A Sunday relationship review surfaces drifting contacts and threads that deserve a reply.

Background · phase 04

The 60-second tick

Twelve autonomous routines walk the state every minute: proactive scanner for events about to fire, commitment extraction from recent turns, pattern noticing across the trajectory log, weekly self-critique, nightly synthesis. Surfaces things you’d otherwise have to remember to ask about. Never blocks.

№ 02 · HUD

A notch overlay that knows what mode he’s in.

A PyQt6 always-on-top window renders the macOS notch as a state-coloured pill at 60 fps. Custom GL shaders paint a raymarched core; QPainter draws the pill outline. State is read from a JSON file via a filesystem watcher, with a 500 ms backup poll because the watcher can miss atomic renames.
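The backup poll can be sketched in plain Python. This is a minimal illustration, not the HUD's actual code: `StatePoller`, `write_state_atomically`, and the file name used in the usage note are hypothetical. It compares the file's inode, mtime, and size, since an atomic rename swaps in a new inode even when a watcher stays attached to the old one.

```python
import json
import os
import tempfile
from pathlib import Path


class StatePoller:
    """Backup poll: reload the state file when its identity or mtime changes."""

    def __init__(self, path: Path) -> None:
        self.path = path
        self._signature = None  # (inode, mtime_ns, size) of the last loaded file
        self.state = {}

    def poll(self) -> bool:
        """Return True if the file changed since the last poll and was reloaded."""
        try:
            st = self.path.stat()
        except FileNotFoundError:
            return False
        signature = (st.st_ino, st.st_mtime_ns, st.st_size)
        if signature == self._signature:
            return False
        try:
            self.state = json.loads(self.path.read_text(encoding="utf-8"))
        except (json.JSONDecodeError, FileNotFoundError):
            return False  # replaced between stat and read; retry on the next poll
        self._signature = signature
        return True


def write_state_atomically(path: Path, state: dict) -> None:
    """Writer side: temp file in the same directory, then rename over the target."""
    fd, tmp = tempfile.mkstemp(dir=path.parent, suffix=".tmp")
    with os.fdopen(fd, "w", encoding="utf-8") as fh:
        json.dump(state, fh)
    os.replace(tmp, path)
```

In a Qt application, a timer firing every 500 ms could call `poll()` and repaint only when it returns True, so the poll stays cheap when nothing has changed.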

IDLE · collapsed to the literal notch shape, dim teal pulse
LISTENING · fills the screen-wide pill, bright teal core, mic-level ring
THINKING · slow rotation, amber tint
SPEAKING · sentence-pulse synced to TTS, full bloom on syllable peaks
PROACTIVE · tilt-and-glow with a chime when something surfaces

Pre-rendered chimes under 400 ms. Sounds never block.

№ 03 · Brain visualization

A 3D galaxy of his working memory.

A stdlib HTTP server on port 8742 renders a real-time 3D-force-graph of JARVIS’s current state. Spin it with two fingers. Embedded inside the Mac dashboard via QWebEngineView, inside the iPhone app via WKWebView, and reachable from any browser when you just want to look.

Core · pulses on activity

People · ringed by importance

Tools · clustered by category

Memory · faded by age

Live · drag to orbit · scroll to zoom · click a node to fly the camera · LISTEN / THINK / SPEAK / MODE controls bottom-right

Read-only. The visualization never writes back: it is a view, not a control panel.

№ 04 · Three surfaces, one brain

One server, three devices, and the state stays in sync.

A single local server in our office holds the state. Mac and iPhone are read replicas; writes go through HTTP and round-trip back in roughly ten seconds. When the server or the mesh VPN flaps, writes queue locally and replay FIFO on the next round trip. Nothing is lost across a reboot.

№ 01 · Mac

Wake word, microphone, GL-shaded HUD overlay, a 60 fps notch pill. Native dashboard with chat, today, brain, and learning tabs. Ten launchd daemons run in the background: wake listener, agent tick, Telegram bot, VIP watcher, HUD, hotkeys, browser gateway, tool bridge, API, state sync.

№ 02 · iPhone

Native SwiftUI app, sideloaded over Tailscale, never touches the App Store. Chat, today, brain, learning tabs. On-device SFSpeechRecognizer. Voice never leaves the phone. Server streams MP3 chunks back over a private mesh VPN.

№ 03 · Telegram

A private bot whitelisted to one user ID. Text, voice notes, image vision. Confirmation prompts as inline-keyboard buttons. Memory, commitments, and skills are unchanged from the Mac and iPhone clients; Telegram just adds reach to wherever the user already chats.

№ 05 · How a turn works

From “Hey JARVIS” to “you’re welcome, sir” in three seconds.

The brain lifecycle is 8,345 lines of Python. Most of them are careful prompt-cache management: the stable system block stays byte-identical across turns so it hits Anthropic’s 5-min and 1-hour cache breakpoints. Get that right, the assistant costs cents per day. Get it wrong, it costs fifty euros.
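What "byte-identical" demands in practice can be sketched as follows. This is an illustration under assumptions, not the code in the brain lifecycle: the function names, the section headers, and the normalisation rules are hypothetical. The `cache_control` marker follows Anthropic's prompt-caching request format as commonly documented, and should be checked against the current API reference.

```python
import hashlib

# Fixed order: dict insertion order must never influence the output bytes.
SECTION_ORDER = ("persona", "context", "memory.md", "learned.md")


def build_stable_block(sections: dict[str, str]) -> str:
    """Join sections in a fixed order with normalised newlines and whitespace."""
    parts = []
    for name in SECTION_ORDER:
        body = sections[name].replace("\r\n", "\n").strip()
        parts.append(f"## {name}\n{body}")
    return "\n\n".join(parts) + "\n"


def system_payload(sections: dict[str, str]) -> list[dict]:
    """Stable block with a cache marker; volatile per-turn text goes elsewhere."""
    return [
        {
            "type": "text",
            "text": build_stable_block(sections),
            # Assumed request shape for marking a cache breakpoint.
            "cache_control": {"type": "ephemeral"},
        }
    ]


def fingerprint(sections: dict[str, str]) -> str:
    """Hash of the stable block; a changed hash between turns means a cache miss."""
    return hashlib.sha256(build_stable_block(sections).encode("utf-8")).hexdigest()
```

Logging the fingerprint each turn is one cheap way to notice when a timestamp or a reordered section has crept into the stable block.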

№ 01
Wake
Custom-trained wake-word model running on 80 ms audio chunks. A 1500 ms VAD silence window closes the utterance.
№ 02
STT
Whisper via Groq for 200 ms median latency. Local whisper.cpp small.en when the network’s down.
№ 03
Stable system block
Persona + context + memory.md + learned.md assembled byte-identical across turns to hit Anthropic’s 5-min and 1-hour prompt-cache breakpoints.
№ 04
Per-turn block
Semantic-similar past turns (SBERT cosine), routing hint, channel hint, idle context, pending learnings.
№ 05
Tool filter
SBERT picks the most relevant 30–60 of 204 tool schemas. Sending all of them every turn would waste tokens.
№ 06
Stream
Routing policy picks the model: Claude Opus 4.8 for capability-heavy turns, local Mistral Small for prompts that must stay on Sebrona hardware. Tool calls route by locality: local, Mac-only, or Mac-then-cloud fallback. Confirmation gates pause on irreversible actions.
№ 07
TTS
ElevenLabs WebSocket streaming, sentence-segmented. Mic mutes during playback to prevent feedback. Falls back to macOS `say` when ElevenLabs throws.
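The two routing decisions in step № 06 can be sketched as a pair of small functions. This is a simplified illustration: the model identifiers, the `sensitive` flag, and the function names are hypothetical, and how a prompt gets classified as sensitive is not covered here. The locality tags (`any`, `mac`, `mac_or_cloud`) are the ones listed under Tools.

```python
from dataclasses import dataclass

LOCAL_MODEL = "mistral-small-local"  # hypothetical identifier
HOSTED_MODEL = "claude-hosted"       # hypothetical identifier


@dataclass(frozen=True)
class Turn:
    text: str
    sensitive: bool  # assumed to be set upstream by the routing policy


def pick_model(turn: Turn) -> str:
    """Sensitive prompts stay on local hardware; everything else goes hosted."""
    return LOCAL_MODEL if turn.sensitive else HOSTED_MODEL


def pick_executor(locality: str, mac_online: bool) -> str:
    """Map a tool's locality tag to the place where the call runs."""
    if locality == "any":
        return "local"
    if locality == "mac":
        if not mac_online:
            raise RuntimeError("tool requires the Mac, which is unreachable")
        return "mac"
    if locality == "mac_or_cloud":
        return "mac" if mac_online else "cloud"
    raise ValueError(f"unknown locality tag: {locality!r}")
```

Keeping both decisions as pure functions makes them easy to cover with a gold set of turns and expected routes.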

Wake detect · 80 ms
VAD silence · 1500 ms
STT · 200–400 ms
Brain TTFT · 800–1200 ms
TTS first chunk · 350 ms
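Adding the stage figures above shows where the "three seconds" comes from. The sketch assumes the stages run strictly one after another with no overlap, which is a simplification; the dictionary keys are illustrative names.

```python
# (best case, worst case) in milliseconds, taken from the figures above.
BUDGET_MS = {
    "wake_detect": (80, 80),
    "vad_silence": (1500, 1500),
    "stt": (200, 400),
    "brain_ttft": (800, 1200),
    "tts_first_chunk": (350, 350),
}

best_ms = sum(low for low, _ in BUDGET_MS.values())
worst_ms = sum(high for _, high in BUDGET_MS.values())

print(best_ms, worst_ms)  # 2930 3530
```

Under that assumption the first audible syllable lands between roughly 2.9 and 3.5 seconds after the wake word, and the 1500 ms silence window alone accounts for about half of it.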

№ 06 · Voice pipeline

Wake on the word, listen, and stand down on command.

Custom-trained wake-word models per user. Spotify auto-ducks to 30% on wake. A cached greeting path skips the brain entirely. Stand-down phrases ("thank you, that's all", "stop", "shut up") return to wake state without a brain round-trip.

Mic mutes during TTS playback. After playback, the wake-word buffer is flushed so the tail of the assistant’s own voice doesn’t re-trigger it. Slovak is detected automatically at the STT stage; replies stay in the user’s preferred language.
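The stand-down shortcut amounts to a lookup on the normalised transcript before any model is called. A minimal sketch, assuming exact-phrase matching after lowercasing and stripping punctuation; the function names are hypothetical and the real matcher may be looser.

```python
import string

# Phrases quoted above, stored in normalised form.
STAND_DOWN = {"thank you thats all", "stop", "shut up"}

# Strip ASCII punctuation plus the typographic apostrophe.
_STRIP = str.maketrans("", "", string.punctuation + "\u2019")


def normalise(transcript: str) -> str:
    """Lowercase, drop punctuation, collapse runs of whitespace."""
    return " ".join(transcript.lower().translate(_STRIP).split())


def is_stand_down(transcript: str) -> bool:
    """True when the whole utterance is a stand-down phrase."""
    return normalise(transcript) in STAND_DOWN
```

Matching the whole utterance rather than a substring keeps "stop the music" on the normal path to the brain.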

№ 07 · Tools the brain can call

204 tools at the brain’s elbow.

Every tool carries a JSON schema, a docstring, a list of natural-language phrases it’s for, a list it’s not for, and a locality tag. The brain doesn’t see all 204 every turn. An SBERT filter picks the 30–60 most relevant based on the user’s text. Embeddings are cached at startup; per-query encoding is 10–20 ms warm.
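The shape of that filter is a cached index plus a cosine ranking. The sketch below is self-contained, so it swaps the SBERT encoder for a toy bag-of-words `embed` function; that stand-in, the `ToolFilter` class, and the tool names in the usage note are all hypothetical. Only the structure (embed once at startup, encode the query, rank, take the top k) mirrors the description above.

```python
import math
from collections import Counter


def embed(text: str) -> Counter:
    """Stand-in for an SBERT encoder: a bag-of-words count vector."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class ToolFilter:
    def __init__(self, tools: dict[str, list[str]]) -> None:
        # One cached embedding per tool, built from its "for" phrases at startup.
        self._index = {name: embed(" ".join(phrases)) for name, phrases in tools.items()}

    def top_k(self, query: str, k: int) -> list[str]:
        """Return the k tool names most similar to the query, best first."""
        q = embed(query)
        ranked = sorted(self._index, key=lambda n: cosine(q, self._index[n]), reverse=True)
        return ranked[:k]
```

For example, with tools `spotify_play` ("play music", "play a song") and `email_send` ("send an email"), the query "play some music" ranks `spotify_play` first. A real encoder would also catch paraphrases that share no words with the stored phrases, which this stand-in cannot.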

Health · 18 · sleep, HRV, workouts, supplements, pattern watch
Email · 10 · Gmail + Outlook triage, draft, send, reply
Calendar · 9 · today, week, free-slot, create, brief
Commitments · 14 · add, overdue, at-risk, stale, review, mark done
Journal · 7 · append, today, streak, weekly review, search
People · 6 · brief, log, open threads, relationship audit
Messaging · 4 · WhatsApp, iMessage, Teams, Instagram DM
Music · 12 · Spotify play, pause, queue, volume, skip
macOS · 25 · open app, screenshot, vision, type, lock, brightness
Code orchestration · 7 · Claude Code project register, run, progress, stop
Git / GitHub · 6 · status, commit, push, diff, log, repo status
Browser control · 1 · managed Chrome via CDP, confirmation gate
HUD / mode / wake · 8 · state, mode label, pulse, pause, resume
Today / anchors · 7 · today list, add, remove, anchor set/get
Proactive · 3 · queue list, dismiss, emit
Learning / proposals · 5 · observation, confirm, propose, apply, reject
Skills · 4 · create, invoke, list, audit
World / context · 10 · world state, desktop, notifications, time, weather

any · 111 · runs locally
mac · routes via WS bridge
mac_or_cloud · Mac-first, cloud fallback

№ 08 · The staff

Twelve routines that watch between the user’s turns.

A 60-second tick walks the routine modules and runs whichever ones say it’s their time. Each routine has its own Anthropic client; failure of one never affects the others. Surfacing rules respect quiet hours and content-hash deduplication, so the same alert never fires twice.
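The tick described above can be sketched as a loop with per-routine failure isolation and a surfacing step that applies quiet hours and content-hash deduplication. The class names, the quiet-hours window (22:00 to 07:00), and the choice to drop rather than defer alerts during quiet hours are assumptions made for the example.

```python
import hashlib
from datetime import datetime, time
from typing import Callable, Optional


class Routine:
    def __init__(self, name: str, is_due: Callable[[datetime], bool],
                 run: Callable[[datetime], Optional[str]]) -> None:
        self.name = name
        self.is_due = is_due  # the routine decides whether it is its time
        self.run = run        # returns a message to surface, or None


class Surfacer:
    def __init__(self, quiet_start: time = time(22, 0), quiet_end: time = time(7, 0)) -> None:
        self.quiet_start, self.quiet_end = quiet_start, quiet_end  # assumed window
        self._seen: set[str] = set()
        self.delivered: list[str] = []

    def in_quiet_hours(self, now: datetime) -> bool:
        t = now.time()
        if self.quiet_start <= self.quiet_end:
            return self.quiet_start <= t < self.quiet_end
        return t >= self.quiet_start or t < self.quiet_end  # window wraps midnight

    def surface(self, now: datetime, message: str) -> bool:
        digest = hashlib.sha256(message.encode("utf-8")).hexdigest()
        if digest in self._seen or self.in_quiet_hours(now):
            return False
        self._seen.add(digest)
        self.delivered.append(message)
        return True


def tick(routines: list[Routine], surfacer: Surfacer, now: datetime) -> list[str]:
    """Run every due routine; return the names of those that raised."""
    failed = []
    for routine in routines:
        if not routine.is_due(now):
            continue
        try:
            message = routine.run(now)
        except Exception:
            failed.append(routine.name)  # one failure never stops the others
            continue
        if message:
            surfacer.surface(now, message)
    return failed
```

A scheduler would call `tick` once every 60 seconds; because each routine is wrapped in its own `try`, a failing API call in one leaves the remaining routines untouched.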

№ 01 · daily 05:30

Morning brief

Synthesises calendar + inbox triage + commitments digest + health patterns + journal streak into one delivered brief.

№ 02 · every tick

Proactive scanner

Calendar events 4–6 min ahead, high-priority email, overdue commitments.

№ 03 · every tick

Commitment extraction

Pulls new commitments from recent conversation turns into the structured table.

№ 04 · periodic

Lookahead scanner

Multi-day calendar lookahead. Surfaces conflicts and preparation suggestions.

№ 05 · periodic

Pattern extraction

Queues candidate observations for the user’s nod (yes/no).

№ 06 · 21:30 / 22:30 / 23:30

Journal nudge

Polite first nudge, then firmer, then last call, if today’s entry is missing.

№ 07 · 08:00–09:30

Health pattern watch

Symptom cluster fires a proactive event on 3+ mentions in 14 days.

№ 08 · Mon 09:00

Commitments review

Unified weekly digest: overdue / at-risk / stale / closed-this-week.

№ 09 · Sun 10:00

Relationship review

Surfaces drifting contacts and open threads.

№ 10 · 23:30

Nightly synthesis

End-of-day summary writer.

№ 11 · weekly

Weekly auto-dream

Self-reflection: what worked, what failed, what to try next week.

№ 12 · weekly

Weekly critique

Reviews JARVIS’s own tool choices and surface failures.

Three surfacing channels: auto-speak (full TTS read-out), chime-notify (sound + banner), silent (HUD pulse only). The brain decides which channel based on time, mode, and importance.
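The trigger rule for routine № 07 (a symptom mentioned 3+ times in 14 days) reduces to a windowed count. A minimal sketch, assuming mentions have already been extracted as (date, symptom) pairs and that the window covers the 14 days up to and including today; the function name and both of those choices are illustrative.

```python
from collections import defaultdict
from datetime import date, timedelta


def symptom_clusters(mentions: list[tuple[date, str]], today: date,
                     window_days: int = 14, threshold: int = 3) -> list[str]:
    """Return symptoms mentioned at least `threshold` times inside the window."""
    cutoff = today - timedelta(days=window_days)
    counts: dict[str, int] = defaultdict(int)
    for day, symptom in mentions:
        if cutoff < day <= today:
            counts[symptom.lower()] += 1
    return sorted(name for name, n in counts.items() if n >= threshold)
```

Each returned symptom would become one proactive event; the content-hash deduplication in the surfacing step keeps the same cluster from firing again on the next tick.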

№ 09 · State

One source of truth. Replicas everywhere else.

The SQLite database, journal, learned patterns, commitments, workouts, and trajectories all live on the server. Mac and iPhone hold read replicas. Writes go through HTTP to the server; bidirectional rsync mirrors changes back every ten seconds, and the newer mtime wins.

Offline durability: when the mesh VPN is down or the server is rebooting, writes queue to local JSONL files and replay FIFO on the next round trip. No data loss across a flap.
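A JSONL write queue with FIFO replay can be sketched in a few lines. This is an illustration, not the client code: the `OfflineQueue` class and the `send` callback (assumed to return True when the server accepted the write) are hypothetical.

```python
import json
from pathlib import Path
from typing import Callable


class OfflineQueue:
    """Append-only JSONL queue that survives restarts and replays in order."""

    def __init__(self, path: Path) -> None:
        self.path = path

    def enqueue(self, write: dict) -> None:
        with self.path.open("a", encoding="utf-8") as fh:
            fh.write(json.dumps(write) + "\n")

    def replay(self, send: Callable[[dict], bool]) -> int:
        """Send queued writes oldest first; stop at the first failure."""
        if not self.path.exists():
            return 0
        lines = self.path.read_text(encoding="utf-8").splitlines()
        pending = [json.loads(line) for line in lines if line.strip()]
        sent = 0
        for write in pending:
            if not send(write):
                break  # keep this and all later writes, preserving order
            sent += 1
        remaining = pending[sent:]
        if remaining:
            tmp = self.path.with_suffix(".tmp")
            tmp.write_text("".join(json.dumps(w) + "\n" for w in remaining), encoding="utf-8")
            tmp.replace(self.path)  # atomic swap so a crash never truncates the queue
        else:
            self.path.unlink()
        return sent
```

Stopping at the first failed send is what keeps the replay FIFO: a later write is never delivered ahead of an earlier one that is still queued.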

No conflict resolution beyond “newer wins”. There’s exactly one user, so there’s nothing to resolve.

№ 10 · Safety

Two confirmation gates. That’s the whole list.

A curated Python set in brain.py names every tool whose effect is irreversible: mail send, calendar create, message send, type keystrokes, log workout, mark commitment done. When the brain picks one, it pauses and reads the exact arguments back to the user in plain English. Yes proceeds. No abandons.
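The first gate can be sketched as a set lookup in front of the tool dispatcher. The tool identifiers below are hypothetical names for the six actions listed above, and `confirm` stands in for the voice or button prompt; the real read-back wording will differ.

```python
from typing import Any, Callable, Optional

# Hypothetical identifiers for the irreversible actions named above.
IRREVERSIBLE_TOOLS = frozenset({
    "mail_send", "calendar_create", "message_send",
    "type_keystrokes", "log_workout", "commitment_mark_done",
})


def read_back(tool: str, args: dict[str, Any]) -> str:
    """Render the exact arguments as a plain-English confirmation prompt."""
    details = ", ".join(f"{key} = {value}" for key, value in args.items())
    return f"About to run {tool} with {details}. Proceed?"


def dispatch(tool: str, args: dict[str, Any],
             execute: Callable[[str, dict], Any],
             confirm: Callable[[str], bool]) -> Optional[Any]:
    """Run the tool, pausing for confirmation when it is irreversible."""
    if tool in IRREVERSIBLE_TOOLS and not confirm(read_back(tool, args)):
        return None  # "No abandons": nothing is executed
    return execute(tool, args)
```

Keeping the list as one literal set in one file makes additions visible in code review, which suits a list that is meant to stay short.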

The second gate sits in front of the browser agent: any planned action that mentions send / submit / buy / delete / pay / book / confirm is read back before execution. The audit trail captures the plan, the response, and the result.
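The browser gate can be sketched as a keyword check on the planned action plus an audit record. The sketch assumes whole-word, case-insensitive matching on the seven verbs above; the function names and the audit record's field names are hypothetical.

```python
import re
from typing import Callable, Optional

MUTATING_VERBS = ("send", "submit", "buy", "delete", "pay", "book", "confirm")
_PATTERN = re.compile(r"\b(" + "|".join(MUTATING_VERBS) + r")\b", re.IGNORECASE)


def needs_read_back(planned_action: str) -> bool:
    """True when the plan mentions one of the mutating verbs as a whole word."""
    return _PATTERN.search(planned_action) is not None


def run_browser_step(plan: str, confirm: Callable[[str], bool],
                     act: Callable[[str], str], audit: list[dict]) -> str:
    """Gate the step if needed, execute it, and record plan, response, result."""
    response: Optional[bool] = confirm(plan) if needs_read_back(plan) else None
    result = "abandoned" if response is False else act(plan)
    audit.append({"plan": plan, "response": response, "result": result})
    return result
```

Whole-word matching lets "Open facebook.com" through but also misses inflected forms such as "booking" or "deleted"; whether to match those is a tuning decision that trades false prompts against missed ones.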

Every confirmation slows the user down, so the list stays short: only actions whose blast radius can't be undone.

№ 11 · Tech stack

No NoSQL. No Kafka. No Postgres.

Stack at a glance · 155 Python files · 47k LOC · 2k Swift

Server

Python 3.14 · Starlette · uvicorn · systemd

Async-first ASGI, WS native, one unit, one user.

Mac

Python 3.14 · PyQt6 · launchd · GL shaders

Native look and embedded web views; macOS screen permissions are the one trap.

iPhone

Swift 5.10 · SwiftUI · WKWebView · SFSpeech

On-device STT. Tailscale-only network. No App Store.

LLM (hosted)

Claude Opus 4.8 · Haiku 4.5 · Sonnet 4.6

Tool-use quality + cache discipline (5-min and 1-hour breakpoints), when capability earns the round-trip.

LLM (local)

Mistral Small 3.2 · Qwen 3 32B

Sovereign inference on a Mac Studio for sensitive prompts that don’t leave the network.

STT

Groq whisper-large-v3-turbo → whisper.cpp small.en

200 ms hot, local offline.

TTS

ElevenLabs WS → macOS `say`

Streaming sentence segmentation. Fallback works mute.

Wake

openWakeWord · per‑user models

Voice fingerprint as auth signal.

VAD

silero-vad

Best on short utterances.

Embeddings

all-MiniLM-L6-v2 · 384-dim

90 MB, ≈10 ms inference.

Storage

SQLite + markdown + JSONL

No Postgres, no Redis, no Kafka. Single user, single server.

Browser

OpenClaw via CDP

Drives real Chrome, persisted profiles, mutation gate.

Mesh VPN

Tailscale

No public exposure. Three devices, one tailnet.

Code agent

Claude Code CLI

Long-running multi-file orchestration from inside the brain.

Tool protocol

MCP servers (Model Context Protocol)

One tool contract reachable from Claude, local models, and the IDE.

Eval / registry

Promptfoo · Inspect AI

Per-route gold sets, regression-gated in CI.

Build your own

JARVIS is the test rig where we prove architecture patterns before they ship to clients. We don’t sell it as a product.

The differences from JARVIS itself are concrete. A tighter confirmation-gate list because the user count is higher than one. Bidirectional state sync to systems your team already lives in (Linear, Salesforce, Snowflake, an internal app) instead of one local SQLite. On-prem inference where the regulator or the client requires it, not just where the founder prefers it. The cache discipline and the local-Mistral fallback for sensitive prompts carry over unchanged; almost everything else gets re-decided in the Diagnostic week below.

If you want a private assistant for your team, on the apps they already use, hooked into the systems they already work in, with the data on your hardware, that’s the engagement we run.

Phase 01 · Week 1

Diagnostic

We read your codebase, your data, your team’s actual workflow. By Friday we know whether the project makes sense as scoped, and which cells of the stack travel and which need a swap.

Phase 02 · Weeks 2–10

Build

Lead architect in your repo. Demos every Friday against the spec. Cutover on a tested rollback path.

Phase 03 · 30 days

On-call

We carry the pager for thirty days. By the end, your team owns it: runbook, eval harness, prompt-and-model registry, ADRs in the repo.
