JARVIS
JARVIS is Sebrona’s own AI system, the always-on butler running on hardware we own. A Mac, an iPhone, and a Telegram bot share one local brain. Model routing is hybrid by design: sensitive prompts stay on a local Mistral Small instance in the office; everything else routes to Claude when capability earns the round-trip. One local server holds the memory. 204 tools available to the brain.
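The routing rule can be pictured as one small function. This is a minimal sketch, not the production router: how JARVIS decides that a prompt is sensitive is not described here, so the keyword list and the names `Backend`, `SENSITIVE_MARKERS`, and `route` are all hypothetical.

```python
from enum import Enum


class Backend(Enum):
    LOCAL_MISTRAL = "local-mistral-small"  # stays on hardware in the office
    CLAUDE = "claude"                      # remote, used when capability is worth it


# Hypothetical markers; a real classifier would be richer than substring checks.
SENSITIVE_MARKERS = ("password", "iban", "salary", "diagnosis")


def route(prompt: str) -> Backend:
    """Pick a backend: a prompt flagged as sensitive never leaves the local model."""
    text = prompt.lower()
    if any(marker in text for marker in SENSITIVE_MARKERS):
        return Backend.LOCAL_MISTRAL
    return Backend.CLAUDE
```

The useful property is that the sensitive check runs first, so no capability argument can override it.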
He is also where we prove what a custom assistant for a team looks like before we sell one. The Build your own section below covers what travels to your engagement, what gets re-decided, and where the trust boundary sits.
He notices things between turns. Twelve background routines, checked once a minute.
A notch overlay that knows what mode he’s in.
A PyQt6 always-on-top window renders the macOS notch as a state-coloured pill at 60 fps. Custom GL shaders paint a raymarched core; QPainter draws the pill outline. State is read off a JSON file via a filesystem watcher with a 500 ms backup poll, because the watcher can miss atomic renames.
- IDLE · collapsed to the literal notch shape, dim teal pulse
- LISTENING · fills the screen-wide pill, bright teal core, mic-level ring
- THINKING · slow rotation, amber tint
- SPEAKING · sentence-pulse synced to TTS, full bloom on syllable peaks
- PROACTIVE · tilt-and-glow with a chime when something surfaces
Pre-rendered chimes under 400 ms. Sounds never block.
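The backup poll for the state file can be sketched with the standard library alone. This is an illustration under assumptions: the class `StatePoller`, the helper `write_state_atomically`, and the shape of the JSON are hypothetical, and the real overlay pairs a poll like this with a filesystem watcher that is not shown.

```python
import json
import os
from pathlib import Path
from typing import Optional

POLL_INTERVAL_S = 0.5  # the 500 ms backup poll from the text


class StatePoller:
    """Backup poll: notices a replaced state file even if a watcher event was missed."""

    def __init__(self, path: Path) -> None:
        self.path = path
        self._last_sig: Optional[tuple] = None

    def poll(self) -> Optional[dict]:
        """Return the parsed state if the file changed since the last call, else None."""
        try:
            st = os.stat(self.path)
        except FileNotFoundError:
            return None
        # An atomic rename swaps in a new inode, so the inode is part of the signature.
        sig = (st.st_ino, st.st_mtime_ns, st.st_size)
        if sig == self._last_sig:
            return None
        self._last_sig = sig
        return json.loads(self.path.read_text(encoding="utf-8"))


def write_state_atomically(path: Path, state: dict) -> None:
    """Writer side: write a temp file, then rename it over the target."""
    tmp = path.with_suffix(".tmp")
    tmp.write_text(json.dumps(state), encoding="utf-8")
    os.replace(tmp, path)
```

A caller would invoke `poll()` every `POLL_INTERVAL_S` seconds and repaint only when it returns a state.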
A 3D galaxy of his working memory.
A stdlib HTTP server on port 8742 renders a real-time 3D-force-graph of JARVIS’s current state. Spin it with two fingers. Embedded inside the Mac dashboard via QWebEngineView, inside the iPhone app via WKWebView, and reachable from any browser when you just want to look.
Read-only. The visualization never writes back: it’s a view, not a control panel.
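A read-only stdlib server of this kind might look roughly like the sketch below. Only the port comes from the text; `current_graph`, `ReadOnlyHandler`, `make_server`, and the JSON shape are hypothetical, and the loopback bind is an assumption made to keep the example safe to run.

```python
import json
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

PORT = 8742  # port named in the text


def current_graph() -> dict:
    """Hypothetical snapshot source; the real graph schema is not described here."""
    return {"nodes": [{"id": "memory"}], "links": []}


class ReadOnlyHandler(BaseHTTPRequestHandler):
    def do_GET(self) -> None:
        body = json.dumps(current_graph()).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def _reject(self) -> None:
        # No write path exists: every mutating verb gets 405 Method Not Allowed.
        self.send_error(405, "read-only view")

    do_POST = do_PUT = do_PATCH = do_DELETE = _reject


def make_server(port: int = PORT) -> ThreadingHTTPServer:
    """Build the server; call .serve_forever() on the result to run it."""
    # Loopback bind is an assumption for this sketch, not the real deployment.
    return ThreadingHTTPServer(("127.0.0.1", port), ReadOnlyHandler)
```

Because the handler has no code path that mutates state, "never writes back" is enforced by construction rather than by convention.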
One server, three devices, and the state stays in sync.
A single local server in our office holds the state. Mac and iPhone are read replicas; writes go through HTTP and round-trip back in roughly ten seconds. When the server or the mesh VPN flaps, writes queue locally and replay FIFO on the next round trip. Nothing is lost across a reboot.
On the Mac: wake word, microphone, GL-shaded HUD overlay, a 60 fps notch pill. Native dashboard with chat, today, brain, and learning tabs. Ten launchd daemons run in the background: wake listener, agent tick, Telegram bot, VIP watcher, HUD, hotkeys, browser gateway, tool bridge, API, state sync.
On the iPhone: a native SwiftUI app, sideloaded over Tailscale, that never touches the App Store. Chat, today, brain, learning tabs. On-device SFSpeechRecognizer. Voice never leaves the phone. The server streams MP3 chunks back over a private mesh VPN.
A private bot whitelisted to one user ID. Text, voice notes, image vision. Confirmation prompts as inline-keyboard buttons. Memory, commitments, and skills are unchanged from the Mac and iPhone clients; Telegram just adds reach to wherever the user already chats.
From “Hey JARVIS” to “you’re welcome, sir” in three seconds.
The brain lifecycle is 8,345 lines of Python. Most of them are careful prompt-cache management: the stable system block stays byte-identical across turns so it keeps hitting Anthropic’s prompt cache within its 5-minute and 1-hour lifetimes. Get that right, the assistant costs cents per day. Get it wrong, it costs fifty euros.
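The cache discipline comes down to keeping the cached prefix frozen and putting everything volatile after it. This is a minimal sketch that only builds the request payload: the prompt text and the helper names `build_system_blocks` and `stable_fingerprint` are hypothetical, while the `cache_control` marker is the one Anthropic documents for prompt caching.

```python
import hashlib

# Hypothetical stable prompt; in practice this is the large, unchanging system text.
STABLE_SYSTEM_TEXT = "You are JARVIS, a butler-style assistant."


def build_system_blocks(volatile_context: str) -> list[dict]:
    """Stable block first with a cache marker, volatile block after it."""
    return [
        {
            "type": "text",
            "text": STABLE_SYSTEM_TEXT,
            # Prompt-caching marker: the prefix up to and including this block is cacheable.
            "cache_control": {"type": "ephemeral"},
        },
        # Anything that changes per turn goes after the marker so it cannot break the prefix.
        {"type": "text", "text": volatile_context},
    ]


def stable_fingerprint(blocks: list[dict]) -> str:
    """Hash of the cached prefix text; it must not change between turns."""
    return hashlib.sha256(blocks[0]["text"].encode("utf-8")).hexdigest()
```

Comparing `stable_fingerprint` across turns in a test is a cheap way to catch a stray timestamp or reordered tool list before it shows up on the invoice.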
Wake on the word, listen, and stand down on command.
Custom-trained wake-word models per user. Spotify auto-ducks to 30% on wake. A cached greeting path skips the brain entirely. Stand-down phrases ("thank you, that's all", "stop", "shut up") return to wake state without a brain round-trip.
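The stand-down shortcut is a lookup that runs before any model call. This is a sketch that assumes exact-phrase matching after light normalisation; the real matching rules are not described here, and `normalise` and `is_stand_down` are hypothetical names.

```python
import re

STAND_DOWN_PHRASES = {"thank you, that's all", "stop", "shut up"}  # from the text


def normalise(utterance: str) -> str:
    """Lowercase, straighten curly apostrophes, collapse spaces, drop trailing punctuation."""
    text = utterance.lower().replace("\u2019", "'")
    text = re.sub(r"\s+", " ", text).strip()
    return text.rstrip(".!?")


def is_stand_down(utterance: str) -> bool:
    """True when the transcript is exactly a stand-down phrase, so no brain call is needed."""
    return normalise(utterance) in STAND_DOWN_PHRASES
```

Matching the whole utterance rather than a substring means "stop the timer" still reaches the brain.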
Mic mutes during TTS playback. After playback, the wake-word buffer is flushed so the tail of the assistant’s own voice doesn’t re-trigger it. STT detects Slovak automatically; reply policy stays in the user’s preferred language.
204 tools at the brain’s elbow.
Every tool carries a JSON schema, a docstring, a list of natural-language phrases it’s for, a list it’s not for, and a locality tag. The brain doesn’t see all 204 every turn. An SBERT filter picks the 30–60 most relevant based on the user’s text. Embeddings are cached at startup; per-query encoding is 10–20 ms warm.
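The filter's shape is: embed every tool once at startup, embed the user's text per turn, keep the top k by cosine similarity. The sketch below swaps SBERT for a toy bag-of-words vector so it runs without a model download; `toy_embed`, `ToolFilter`, and the tool names in the usage are hypothetical.

```python
import math
from collections import Counter


def toy_embed(text: str) -> Counter:
    """Stand-in for an SBERT encoder: a bag-of-words count vector."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class ToolFilter:
    def __init__(self, tools: dict[str, str]) -> None:
        # Embed each tool's "for" phrases once, at startup.
        self._vectors = {name: toy_embed(phrases) for name, phrases in tools.items()}

    def top_k(self, user_text: str, k: int) -> list[str]:
        """Return the k tool names most similar to the user's text."""
        query = toy_embed(user_text)
        ranked = sorted(
            self._vectors,
            key=lambda name: cosine(query, self._vectors[name]),
            reverse=True,
        )
        return ranked[:k]
```

With a real sentence encoder in place of `toy_embed`, the same structure gives semantic rather than lexical matches, and only the per-query encoding sits on the hot path.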
Twelve routines that watch between the user’s turns.
A 60-second tick walks the routine modules and runs whichever ones say it’s their time. Each routine has its own Anthropic client; failure of one never affects the others. Surfacing rules respect quiet hours and content-hash deduplication, so the same alert never fires twice.
Three surfacing channels: auto-speak (full TTS read-out), chime-notify (sound + banner), silent (HUD pulse only). The brain decides which channel based on time, mode, and importance.
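The tick, the failure isolation, and the content-hash deduplication fit in a few lines. This is a sketch under assumptions: `Routine` and `Ticker` are hypothetical names, quiet hours and channel selection are left out, and the per-routine Anthropic client is reduced to a plain callable.

```python
import hashlib
from typing import Callable, Optional


class Routine:
    def __init__(
        self,
        name: str,
        is_due: Callable[[float], bool],
        run: Callable[[], Optional[str]],
    ) -> None:
        self.name, self.is_due, self.run = name, is_due, run


class Ticker:
    def __init__(self, routines: list[Routine]) -> None:
        self.routines = routines
        self._seen: set[str] = set()  # content hashes already surfaced

    def tick(self, now: float) -> list[str]:
        """Run every due routine once; return alerts that were not surfaced before."""
        alerts = []
        for routine in self.routines:
            if not routine.is_due(now):
                continue
            try:
                message = routine.run()
            except Exception:
                continue  # one failing routine never stops the others
            if message is None:
                continue
            digest = hashlib.sha256(message.encode("utf-8")).hexdigest()
            if digest not in self._seen:
                self._seen.add(digest)
                alerts.append(message)
        return alerts
```

A scheduler would call `tick(time.time())` every 60 seconds and hand each returned alert to whichever surfacing channel applies.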
One source of truth. Replicas everywhere else.
The server holds the SQLite database, journal, learned patterns, commitments, workouts, and trajectories. Mac and iPhone are read replicas. Writes go through HTTP to the server; a bidirectional rsync mirrors changes back every ten seconds, and the newer mtime wins.
Offline durability: when the mesh VPN is down or the server is rebooting, writes queue to local JSONL files and replay FIFO on the next round trip. No data loss across a flap.
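An on-disk JSONL queue with FIFO replay could look like the following sketch. The class `WriteQueue`, the file layout, and the `send` callback (which would wrap the HTTP write in practice) are assumptions for illustration.

```python
import json
import os
from pathlib import Path
from typing import Callable


class WriteQueue:
    """Append-only JSONL queue that survives restarts because it lives on disk."""

    def __init__(self, path: Path) -> None:
        self.path = path

    def enqueue(self, write: dict) -> None:
        """Append one pending write as a single JSON line."""
        with self.path.open("a", encoding="utf-8") as f:
            f.write(json.dumps(write) + "\n")

    def replay(self, send: Callable[[dict], bool]) -> int:
        """Send queued writes oldest-first; stop at the first failure and keep the rest."""
        if not self.path.exists():
            return 0
        lines = self.path.read_text(encoding="utf-8").splitlines()
        sent = 0
        for line in lines:
            if not send(json.loads(line)):
                break
            sent += 1
        # Rewrite the queue with only the unsent tail, via a rename so it is never half-written.
        tmp = self.path.with_suffix(".tmp")
        tmp.write_text("".join(rest + "\n" for rest in lines[sent:]), encoding="utf-8")
        os.replace(tmp, self.path)
        return sent
```

Stopping at the first failure preserves ordering: a later write is never delivered ahead of an earlier one that is still stuck.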
No conflict resolution beyond “newer wins”. There’s exactly one user, so there’s nothing to resolve.
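Stated as code, "newer wins" is a single comparison. The `Version` type and the tie-break in favour of the local copy are assumptions of this sketch, not something the sync layer is known to do.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Version:
    mtime: float    # modification time, seconds since the epoch
    payload: bytes


def newer_wins(local: Version, remote: Version) -> Version:
    """Keep whichever copy was modified later; a tie keeps the local copy."""
    return remote if remote.mtime > local.mtime else local
```

With several concurrent writers this rule would silently drop edits, which is why it only holds up for a single user.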
Two confirmation gates. That’s the whole list.
A curated Python set in brain.py names every tool whose effect is irreversible: mail send, calendar create, message send, type keystrokes, log workout, mark commitment done. When the brain picks one, it pauses and reads the exact arguments back to the user in plain English. Yes proceeds. No abandons.
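The first gate can be sketched as a set membership test in front of the tool call. The tool names below are hypothetical stand-ins for the actions listed above, and `read_back`, `call_tool`, and the callbacks are illustrative rather than the contents of brain.py.

```python
from typing import Callable

# Hypothetical tool names for the irreversible actions listed in the text.
IRREVERSIBLE_TOOLS = {
    "mail_send", "calendar_create", "message_send",
    "type_keystrokes", "log_workout", "mark_commitment_done",
}


def read_back(tool: str, args: dict) -> str:
    """Plain-English summary of exactly what is about to happen."""
    details = ", ".join(f"{key} = {value!r}" for key, value in args.items())
    return f"About to run {tool} with {details}. Proceed?"


def call_tool(
    tool: str,
    args: dict,
    run: Callable[[str, dict], str],
    confirm: Callable[[str], bool],
) -> str:
    """Run reversible tools directly; gate irreversible ones behind a yes/no."""
    if tool in IRREVERSIBLE_TOOLS and not confirm(read_back(tool, args)):
        return "abandoned"
    return run(tool, args)
```

Keeping the list as a literal set in one file makes the whole policy reviewable at a glance.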
The second gate sits in front of the browser agent: any planned action that mentions send / submit / buy / delete / pay / book / confirm is read back before execution. The audit trail captures the plan, the response, and the result.
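The second gate is a keyword check on the planned action plus an audit record. This sketch assumes whole-word, case-insensitive matching, which the text does not specify; `needs_read_back` and `audit_entry` are hypothetical names.

```python
import re

RISKY_VERBS = ("send", "submit", "buy", "delete", "pay", "book", "confirm")  # from the text
_RISKY = re.compile(r"\b(" + "|".join(RISKY_VERBS) + r")\b", re.IGNORECASE)


def needs_read_back(planned_action: str) -> bool:
    """True when the plan mentions one of the risky verbs as a whole word."""
    return _RISKY.search(planned_action) is not None


def audit_entry(plan: str, response: str, result: str) -> dict:
    """One audit-trail record: what was planned, what the user said, what happened."""
    return {"plan": plan, "response": response, "result": result}
```

Whole-word matching is the design choice to watch: it avoids tripping on "facebook", but it also lets inflected forms such as "booking" through, so a real list might include them explicitly.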
Every confirmation slows the user down, so the list stays short: only actions whose blast radius can't be undone.
No NoSQL. No Kafka. No Postgres.
JARVIS is the test rig where we prove architecture patterns before they ship to clients. We don’t sell it as a product.
A client build differs from JARVIS itself in concrete ways. A tighter confirmation-gate list because the user count is higher than one. Bidirectional state sync to systems your team already lives in (Linear, Salesforce, Snowflake, an internal app) instead of one local SQLite. On-prem inference where the regulator or the client requires it, not just where the founder prefers it. The cache discipline and the local-Mistral fallback for sensitive prompts carry over unchanged; almost everything else gets re-decided in the Diagnostic week below.
If you want a private assistant for your team, on the apps they already use, hooked into the systems they already work in, with the data on your hardware, that’s the engagement we run.