Granola
Private, self-hosted AI meeting notes. Record on any machine, transcribe on your 3080 desktop, view transcript & AI summaries in a native Iced app. Zero cloud dependency.
System architecture
Client (Rust · Iced · cpal):
- capture + encode: Opus .ogg → memory
- transcript + summary: in-app tabs
Auto-discovery:
- mDNS (LAN): granola-server.local
- Tailscale (remote): granola-server:9800
Server (Arch + RTX 3080, Axum API):
- Axum HTTP API: POST /record · GET /status · GET /result
- whisper.cpp: large-v3-turbo (CUDA)
- LLM summariser: Ollama / llama.cpp
- Notion pusher: background task
Output:
- Notion tingies DB: summary + transcript + actions
Flow: UI → capture → state → POST /api/record → API; UI → display → view → poll /api/status → API; client → discover → auto-discovery → resolve → server; API → spawn → whisper.cpp → transcript → LLM → structured notes → Notion pusher → API call → Notion.
App window (Iced native)
Alex (0:02) So for the API migration, we need to decide whether we're doing a phased rollout or a big bang cutover.
You (0:15) Phased is safer. We can route a percentage of traffic to the new endpoints and monitor for issues.
Sam (0:28) I agree. But we need feature flags for every endpoint or the frontend team will be blocked.
… 14 minutes remaining
API protocol: client ↔ server
Client sends
Client receives
Iced recording app
Single Rust binary. One codebase runs on M2 Mac and Arch. Native, no WebKit, no Electron.
- Iced native GUI: cosy widgets, GPU-accelerated rendering (wgpu)
- cpal mic capture: cross-platform audio input
- Opus encoding: ~2 MB for a 30-min meeting
- HTTP client: reqwest to POST audio + poll status
- In-app tabs: toggle between transcript, AI summary, history
- Auto-discovery: mDNS first, Tailscale fallback
- No tray icon: just a window, like a proper app
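The "~2 MB for a 30-min meeting" figure can be sanity-checked with simple arithmetic. The sketch below assumes a constant bitrate and ignores Ogg container overhead; the function names are illustrative, not part of the project.

```rust
/// Size in bytes of a constant-bitrate stream (container overhead ignored).
fn encoded_size_bytes(bitrate_bps: u64, duration_secs: u64) -> u64 {
    bitrate_bps * duration_secs / 8
}

/// Average bitrate (bits per second) implied by a file size and a duration.
fn implied_bitrate_bps(size_bytes: u64, duration_secs: u64) -> u64 {
    size_bytes * 8 / duration_secs
}
```

By this estimate, 2 MB over 1800 seconds works out to just under 9 kbit/s, which is at the low end for speech. If the encoder were set to 16 kbit/s (an assumed setting, not one stated here), the same meeting would come to about 3.6 MB, which is still small enough for a single POST.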
Axum API server
Your Arch + 3080 desktop. Runs an HTTP server instead of a file-watch daemon. Receives audio, returns results.
- Axum (Rust): async, fast, type-safe API
- POST → queue → process: receives Opus, decodes to WAV, spawns pipeline
- whisper.cpp (CUDA): 30-min meeting in ~90 sec
- Ollama API: summarises transcript to structured notes
- Notion push: background task, writes to tingies DBs
- Job queue: concurrent meetings don't collide
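One way the "concurrent meetings don't collide" property could be achieved is a shared registry that hands out unique job IDs and tracks each job's status independently. This is a minimal std-only sketch; `JobRegistry`, the status variants and the `u64` ID are assumptions for illustration, and a real Axum server would probably wrap something like this in an `Arc` and share it as handler state.

```rust
use std::collections::HashMap;
use std::sync::atomic::{AtomicU64, Ordering};
use std::sync::Mutex;

#[derive(Debug, Clone, PartialEq)]
enum JobStatus {
    Queued,
    Transcribing,
    Summarising,
    Done,
    Failed(String),
}

/// In-memory job table: unique IDs plus one status slot per job.
#[derive(Default)]
struct JobRegistry {
    next_id: AtomicU64,
    jobs: Mutex<HashMap<u64, JobStatus>>,
}

impl JobRegistry {
    /// Register a new job and return its ID; IDs are never reused.
    fn submit(&self) -> u64 {
        let id = self.next_id.fetch_add(1, Ordering::Relaxed);
        self.jobs.lock().unwrap().insert(id, JobStatus::Queued);
        id
    }

    /// Update one job's status; returns false if the ID is unknown.
    fn set_status(&self, id: u64, status: JobStatus) -> bool {
        match self.jobs.lock().unwrap().get_mut(&id) {
            Some(slot) => {
                *slot = status;
                true
            }
            None => false,
        }
    }

    /// Snapshot of a job's current status, if the job exists.
    fn status(&self, id: u64) -> Option<JobStatus> {
        self.jobs.lock().unwrap().get(&id).cloned()
    }
}
```

Each pipeline task would only ever touch its own ID, so two meetings uploaded at the same time progress through the states without seeing each other's updates.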
Auto-discovery
On local LAN
mDNS broadcast
Server advertises granola-server._http._tcp.local via Avahi (Arch) / Bonjour (macOS)
Client resolves
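Uses mdns-sd (Rust crate) to discover the server and resolve it to http://granola-server.local:9800 in ~200 ms. (libmdns is a responder only: it can advertise a service but cannot browse for one.)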
Over Tailscale (anywhere)
Fallback trigger
mDNS timed out (2s) → client tries Tailscale MagicDNS
Tailscale DNS
Resolves granola-server.tailnet-name.ts.net:9800 over encrypted WireGuard transport
Tailscale path handles local + remote seamlessly. mDNS is a nice LAN-only optimisation for the ~2s it saves on startup.
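The mDNS-first, Tailscale-fallback order can be sketched with the standard library alone. This is an illustration under assumptions: `Route`, `discover` and `resolve_with_timeout` are hypothetical names, and the OS resolver stands in for a real mDNS browse, so `.local` names only resolve here if the system resolver handles mDNS (Avahi with nss-mdns, or Bonjour).

```rust
use std::net::{SocketAddr, ToSocketAddrs};
use std::sync::mpsc;
use std::thread;
use std::time::Duration;

/// Which path the client ended up using.
#[derive(Debug, PartialEq)]
enum Route {
    Mdns(SocketAddr),
    Tailscale(SocketAddr),
}

/// Resolve "host:port" on a helper thread and give up after `timeout`.
fn resolve_with_timeout(host_port: &str, timeout: Duration) -> Option<SocketAddr> {
    let (tx, rx) = mpsc::channel();
    let target = host_port.to_owned();
    thread::spawn(move || {
        let addr = target.to_socket_addrs().ok().and_then(|mut it| it.next());
        let _ = tx.send(addr); // the receiver may already have timed out
    });
    rx.recv_timeout(timeout).ok().flatten()
}

/// Try the LAN name first, then the tailnet name.
/// `resolve` is injected so tests can substitute a fake resolver.
fn discover<F>(mdns_host: &str, tailscale_host: &str, resolve: F) -> Option<Route>
where
    F: Fn(&str) -> Option<SocketAddr>,
{
    resolve(mdns_host)
        .map(Route::Mdns)
        .or_else(|| resolve(tailscale_host).map(Route::Tailscale))
}
```

A caller would pass something like `|h| resolve_with_timeout(h, Duration::from_secs(2))` as the resolver, which reproduces the 2-second mDNS timeout before the Tailscale name is tried.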
End-to-end flow
Record
Click record in the Iced app, or hit a hotkey. cpal captures mic audio, encoded to Opus in-memory. Timer counts up.
Upload
Stop recording → app POSTs Opus bytes to /api/record. Server returns a job_id. App shows "uploaded, processing…"
Transcribe
Axum server spawns a background task. Decodes Opus to WAV, runs whisper.cpp --model large-v3-turbo. A 30-min meeting takes ~90 seconds on CUDA.
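Since Phase 1 plans to run whisper.cpp as a subprocess, the invocation could be assembled roughly as follows. This sketch only builds the command and does not run it. The binary name and all paths are assumptions supplied by the caller; the flags used (`-m` model file, `-f` input file, `-oj` JSON output, `-of` output path without extension) are whisper.cpp CLI options. Note that the model is passed as a file path rather than a bare model name, and that whisper.cpp expects 16 kHz 16-bit WAV input.

```rust
use std::path::Path;
use std::process::Command;

/// Build (but do not run) a whisper.cpp invocation for one decoded WAV file.
fn whisper_command(bin: &Path, model: &Path, wav: &Path, out_prefix: &Path) -> Command {
    let mut cmd = Command::new(bin);
    cmd.arg("-m")
        .arg(model) // ggml model file on disk
        .arg("-f")
        .arg(wav) // input audio: 16 kHz, 16-bit WAV
        .arg("-oj") // write the transcript as JSON
        .arg("-of")
        .arg(out_prefix); // output path without extension
    cmd
}
```

The pipeline task would then call `.output()` (or the async equivalent) on the returned command and read the JSON file written next to `out_prefix`.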
Summarise
Transcript goes to Ollama API with a prompt: extract action items, decisions, key discussion points, and open questions. Returns structured JSON.
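A rough sketch of how the summarisation request might be assembled. The JSON key names in `SECTIONS` and the prompt wording are assumptions, not the project's actual prompt. The body fields (`model`, `prompt`, `stream`, `format`) follow Ollama's `/api/generate` endpoint. JSON is built by hand here only to keep the example dependency-free; with serde already in the stack, `serde_json` would be the natural choice.

```rust
/// The four sections the summariser is asked to extract (key names are hypothetical).
const SECTIONS: [&str; 4] = ["action_items", "decisions", "key_points", "open_questions"];

/// Prompt asking for a JSON object with one array of strings per section.
fn build_summary_prompt(transcript: &str) -> String {
    format!(
        "Summarise the meeting transcript below. Reply with a JSON object whose keys are {}, \
         each mapping to an array of strings.\n\nTranscript:\n{}",
        SECTIONS.join(", "),
        transcript
    )
}

/// Escape a string for embedding inside a JSON string literal.
fn json_escape(s: &str) -> String {
    let mut out = String::with_capacity(s.len() + 2);
    for c in s.chars() {
        match c {
            '"' => out.push_str("\\\""),
            '\\' => out.push_str("\\\\"),
            '\n' => out.push_str("\\n"),
            '\r' => out.push_str("\\r"),
            '\t' => out.push_str("\\t"),
            c if (c as u32) < 0x20 => out.push_str(&format!("\\u{:04x}", c as u32)),
            c => out.push(c),
        }
    }
    out
}

/// Request body for a non-streaming generate call that asks for JSON output.
fn ollama_request_body(model: &str, prompt: &str) -> String {
    format!(
        r#"{{"model":"{}","prompt":"{}","stream":false,"format":"json"}}"#,
        json_escape(model),
        json_escape(prompt)
    )
}
```

The server would POST this body to the local Ollama instance and parse the returned text into the shared summary type.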
Display + push
App polls GET /api/status/{job_id} until status = "done", then GET /api/result/{job_id}. Transcript and summary render in the in-app tabs. In the background, the server pushes the same data to the Notion tingies DB.
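The polling step can be sketched as a small loop that is independent of the HTTP layer. Here `fetch_status` stands in for the real GET /api/status/{job_id} call (it would wrap reqwest in the actual client). The `JobStatus` variants, the attempt cap and the error strings are assumptions for illustration.

```rust
use std::thread::sleep;
use std::time::Duration;

#[derive(Debug, Clone, PartialEq)]
enum JobStatus {
    Processing,
    Done,
    Failed(String),
}

/// Call `fetch_status` until the job is done, sleeping `interval` between tries.
/// Returns the number of attempts it took, or an error message.
fn poll_until_done<F>(
    mut fetch_status: F,
    interval: Duration,
    max_attempts: u32,
) -> Result<u32, String>
where
    F: FnMut() -> Result<JobStatus, String>,
{
    for attempt in 1..=max_attempts {
        match fetch_status()? {
            JobStatus::Done => return Ok(attempt),
            JobStatus::Failed(reason) => return Err(format!("job failed: {}", reason)),
            JobStatus::Processing => sleep(interval),
        }
    }
    Err(format!("gave up after {} attempts", max_attempts))
}
```

In an Iced app this blocking loop would not run on the UI thread; it would more likely be expressed as an async task or subscription, but the control flow stays the same.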
Done
App shows "Ready" with a link to the Notion page. All past recordings accessible in the History tab.
Tech stack
| Component | Runs on | Lang | Key deps |
|---|---|---|---|
| App GUI | macOS, Arch | Rust | iced + wgpu |
| Audio capture | - | Rust | cpal + Opus encoder (e.g. opus crate; symphonia only decodes) |
| HTTP client | - | Rust | reqwest |
| API server | Arch | Rust | axum + tokio + serde |
| Transcription | Arch | C++ | whisper.cpp + CUDA |
| Summarisation | Arch | - | Ollama API (local model) |
| Notion push | Arch | Rust | reqwest + notion API |
| mDNS discovery | - | Rust | libmdns / mdns-sd |
Why all Rust? Shared types between client and server (same crate for API structs, job IDs, audio encoding). One build system. No language boundary headaches.
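To illustrate what sharing types buys, here is a minimal sketch of what `granola-core` might export. The exact fields, variants and wire strings are assumptions (only `"done"` and the two URL shapes appear in the flow above), and a real version would likely derive serde traits instead of hand-writing the string conversions.

```rust
use std::fmt;
use std::str::FromStr;

/// Opaque job identifier returned by POST /api/record.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
pub struct JobId(pub u64);

impl fmt::Display for JobId {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "{}", self.0)
    }
}

/// Job state as it appears in the status response.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum JobStatus {
    Queued,
    Processing,
    Done,
    Failed,
}

impl JobStatus {
    /// Wire representation used by both client and server.
    pub fn as_str(self) -> &'static str {
        match self {
            JobStatus::Queued => "queued",
            JobStatus::Processing => "processing",
            JobStatus::Done => "done",
            JobStatus::Failed => "failed",
        }
    }
}

impl FromStr for JobStatus {
    type Err = String;

    fn from_str(s: &str) -> Result<Self, Self::Err> {
        match s {
            "queued" => Ok(JobStatus::Queued),
            "processing" => Ok(JobStatus::Processing),
            "done" => Ok(JobStatus::Done),
            "failed" => Ok(JobStatus::Failed),
            other => Err(format!("unknown job status: {}", other)),
        }
    }
}

/// Path the client polls for a job's status.
pub fn status_path(id: JobId) -> String {
    format!("/api/status/{}", id)
}

/// Path the client fetches once the job is done.
pub fn result_path(id: JobId) -> String {
    format!("/api/result/{}", id)
}
```

Because both binaries depend on the same crate, a renamed status or changed route becomes a compile error on both sides rather than a runtime mismatch.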
Crate layout

    granola/
    ├── Cargo.toml                  # workspace root
    ├── crates/
    │   ├── granola-core/           # shared types: JobId, JobStatus, Transcript, Summary
    │   │   ├── Cargo.toml
    │   │   └── src/lib.rs
    │   ├── granola-client/         # Iced app: mic capture + encode + HTTP + GUI
    │   │   ├── Cargo.toml
    │   │   └── src/
    │   │       ├── main.rs         # entry + iced runtime
    │   │       ├── ui/             # widget tree (record/stop, tabs, transcript view)
    │   │       ├── audio/          # cpal capture → Opus encoder
    │   │       ├── client.rs       # reqwest API client
    │   │       └── discovery.rs    # mDNS + Tailscale resolver
    │   └── granola-server/         # Axum API + whisper + Ollama + Notion push
    │       ├── Cargo.toml
    │       └── src/
    │           ├── main.rs         # axum server bootstrap
    │           ├── routes/         # POST /record, GET /status, GET /result
    │           ├── pipeline/       # whisper runner, Ollama summariser
    │           ├── notion.rs       # Notion API client
    │           └── queue.rs        # in-memory job queue (tokio tasks)
    └── scripts/
        └── setup.sh                # install whisper.cpp model, configure avahi
Build phases
Phase 1 · Core pipeline
Server-side: receive audio → transcribe → summarise → push
- granola-core shared types
- granola-server Axum API scaffold
- whisper.cpp integration (subprocess)
- Ollama summariser
- Notion push
- Test with curl
Working API, testable with curl
Phase 2 · Iced client
Native GUI: record → upload → display results
- granola-client app scaffold
- Iced window + tab layout
- Mic capture with cpal + Opus
- HTTP client + status polling
- Auto-discovery (mDNS + Tailscale)
Record on laptop → view in app
Phase 3 · Polish
Hotkeys, history, reliability
- Global hotkey toggle
- History tab (past recordings)
- Offline queue (record while server's away)
- Speaker diarisation
- LLM model swapping in settings
Daily driver ready
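The Phase 3 offline queue could start as little more than an ordered buffer that is drained once the server is reachable again. This is a sketch under assumptions: `PendingRecording`, `OfflineQueue` and the `upload` callback (standing in for the POST to /api/record) are hypothetical, and persistence to disk is not covered.

```rust
use std::collections::VecDeque;

/// A finished recording waiting to be uploaded.
struct PendingRecording {
    label: String,
    opus_bytes: Vec<u8>,
}

/// FIFO buffer of recordings made while the server was unreachable.
#[derive(Default)]
struct OfflineQueue {
    pending: VecDeque<PendingRecording>,
}

impl OfflineQueue {
    fn push(&mut self, rec: PendingRecording) {
        self.pending.push_back(rec);
    }

    fn len(&self) -> usize {
        self.pending.len()
    }

    /// Upload oldest-first; stop at the first failure so order is preserved
    /// and nothing is dropped. Returns how many recordings were sent.
    fn flush<F>(&mut self, mut upload: F) -> usize
    where
        F: FnMut(&PendingRecording) -> bool,
    {
        let mut sent = 0;
        while let Some(rec) = self.pending.front() {
            if !upload(rec) {
                break;
            }
            self.pending.pop_front();
            sent += 1;
        }
        sent
    }
}
```

The client could call `flush` whenever discovery succeeds, so recordings made on the train are uploaded the next time the laptop can reach the server.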
Key decisions made
| Decision | Choice | Why |
|---|---|---|
| GUI framework | Iced | Native Rust. No WebKit. Proper Wayland support. One codebase for Mac + Arch. |
| App model | Window (no tray) | No Wayland tray headaches. Iced window is native and clean. |
| Communication | HTTP API (Axum) | Lets the app fetch results back. Polling is simple and reliable. Shared Rust types between client + server. |
| Process location | Desktop (3080) | 20-30× realtime whisper. Desktop is always-on. Single server to maintain. |
| Audio format | Opus → WAV | Opus for POST (tiny), server decodes to WAV for Whisper |
| Discovery | mDNS primary + Tailscale fallback | Sub-second LAN discovery. Tailscale handles everything else. |
| LLM | Local Ollama | Private. 3080 runs local models easily. No API costs. |
| Output | Notion + in-app display | Both. App shows instant results. Notion is the durable archive. |
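For the "Opus → WAV" decision, the WAV half is simple enough to sketch without dependencies. The function below wraps already-decoded 16-bit mono PCM in a minimal 44-byte RIFF/WAVE header. The Opus decoding step itself is not shown, and the function name is illustrative. whisper.cpp expects 16 kHz 16-bit WAV, so `sample_rate` would be 16000 after resampling.

```rust
/// Wrap 16-bit mono PCM samples in a minimal 44-byte RIFF/WAVE header.
fn pcm16_mono_to_wav(samples: &[i16], sample_rate: u32) -> Vec<u8> {
    let data_len = (samples.len() * 2) as u32;
    let mut wav = Vec::with_capacity(44 + samples.len() * 2);
    wav.extend_from_slice(b"RIFF");
    wav.extend_from_slice(&(36 + data_len).to_le_bytes()); // bytes after this field
    wav.extend_from_slice(b"WAVE");
    wav.extend_from_slice(b"fmt ");
    wav.extend_from_slice(&16u32.to_le_bytes()); // fmt chunk size
    wav.extend_from_slice(&1u16.to_le_bytes()); // format tag 1 = PCM
    wav.extend_from_slice(&1u16.to_le_bytes()); // channels = mono
    wav.extend_from_slice(&sample_rate.to_le_bytes());
    wav.extend_from_slice(&(sample_rate * 2).to_le_bytes()); // bytes per second
    wav.extend_from_slice(&2u16.to_le_bytes()); // block align (bytes per frame)
    wav.extend_from_slice(&16u16.to_le_bytes()); // bits per sample
    wav.extend_from_slice(b"data");
    wav.extend_from_slice(&data_len.to_le_bytes());
    for s in samples {
        wav.extend_from_slice(&s.to_le_bytes());
    }
    wav
}
```

A 30-minute meeting at 16 kHz mono is about 57.6 MB as WAV, which is why the compact Opus form is what travels over the network and the WAV only ever exists on the server.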
Still to decide
- Ollama model? Which local model for summarisation? Llama 3.x / Mistral / Qwen? 8B fits comfortably on the 3080.
- Desktop always on? If not, we should build a local recording queue in the client so you can record offline and upload later.
- Hotkey? Start/stop with a global keybind without focusing the window? (Possible with Iced + a key listener crate)
- Starting point? Phase 1 (server API + curl testing) gives you a working pipeline fastest. Phase 2 (Iced app) is the fun UI bit.