M2 Mac · Arch Linux · RTX 3080 · Iced · Tailscale

Granola

Private, self-hosted AI meeting notes. Record on any machine, transcribe on your 3080 desktop, view transcript & AI summaries in a native Iced app. Zero cloud dependency.

System architecture

flowchart LR
  subgraph Client["Iced app (each laptop)"]
    direction TB
    UI["Native GUI<br/>Rust · Iced · cpal"]
    STATE[("🎤 capture + encode<br/>Opus .ogg → memory")]
    VIEW["📖 transcript + summary<br/>in-app tabs"]
  end
  subgraph Discovery["Auto-discovery"]
    MDNS["mDNS (LAN)<br/>granola-server.local"]
    TS["Tailscale (remote)<br/>granola-server:9800"]
  end
  subgraph Server["Arch + RTX 3080 (Axum API)"]
    direction TB
    API["Axum HTTP API<br/>POST /record · GET /status · GET /result"]
    WHISPER["whisper.cpp<br/>large-v3-turbo (CUDA)"]
    LLM["LLM summariser<br/>Ollama / llama.cpp"]
    PUSH["Notion pusher<br/>background task"]
  end
  subgraph Output["Output"]
    NT["📄 Notion tingies DB<br/>summary + transcript + actions"]
  end
  UI -->|capture| STATE
  UI -->|display| VIEW
  STATE -->|POST /api/record| API
  VIEW -->|poll /api/status| API
  Client -.->|discover| Discovery
  Discovery -.->|resolve| Server
  API -->|spawn| WHISPER
  WHISPER -->|transcript| LLM
  LLM -->|structured notes| PUSH
  PUSH -->|API call| NT
  style Client fill:#e0e7ff,stroke:#a5b4fc,stroke-width:1
  style Discovery fill:#fef3c7,stroke:#fcd34d,stroke-width:1
  style Server fill:#ccfbf1,stroke:#5eead4,stroke-width:1
  style Output fill:#e0e7ff,stroke:#a5b4fc,stroke-width:1

🔲 App window (Iced native)

Granola granola-server.local · connected

REC 0:12:34

Sprint Planning · 12 Jun

Transcript Summary History

Alex (0:02) So for the API migration, we need to decide whether we're doing a phased rollout or a big bang cutover.

You (0:15) Phased is safer. We can route a percentage of traffic to the new endpoints and monitor for issues.

Sam (0:28) I agree. But we need feature flags for every endpoint or the frontend team will be blocked.

… 14 minutes remaining

Uploaded · Processing · ~45s remaining

📡 API protocol — client ↔ server

Client sends

POST /api/record
Content-Type: audio/ogg
X-Machine: m2-mac-work
X-Recorded: 2026-06-16T09:12:00Z

--- binary Opus data ---

→ 201 { "job_id": "abc123" }

GET /api/status/abc123

→ 200 { "status": "processing", "progress": "42%" }

Client receives

GET /api/result/abc123

→ 200

{
  "title": "Sprint Planning - 12 Jun",
  "duration_sec": 1842,
  "transcript": [
    { "speaker": "Alex", "ts": 2, "text": "..." }
  ],
  "summary": {
    "key_points": ["..."],
    "decisions": ["Phased rollout"],
    "action_items": [
      { "assignee": "You", "item": "Write feature flags" }
    ]
  },
  "notion_url": "https://notion.so/..."
}
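The response above maps naturally onto the shared types that granola-core is meant to hold. The following is a minimal, std-only sketch of what those types might look like; the field names come from the JSON example, while the `Queued` and `Failed` statuses, the `Option` on `notion_url`, and the hand-written string conversion are assumptions (a real crate would more likely derive serde's `Serialize`/`Deserialize`).

```rust
use std::str::FromStr;

/// Job lifecycle as the client sees it. Only "processing" and "done" appear
/// in the protocol example; `Queued` and `Failed` are assumed additions.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum JobStatus {
    Queued,
    Processing,
    Done,
    Failed,
}

impl JobStatus {
    /// Wire representation used in the `status` field.
    pub fn as_str(self) -> &'static str {
        match self {
            JobStatus::Queued => "queued",
            JobStatus::Processing => "processing",
            JobStatus::Done => "done",
            JobStatus::Failed => "failed",
        }
    }

    /// True once the client can stop polling.
    pub fn is_terminal(self) -> bool {
        matches!(self, JobStatus::Done | JobStatus::Failed)
    }
}

impl FromStr for JobStatus {
    type Err = String;

    fn from_str(s: &str) -> Result<Self, Self::Err> {
        match s {
            "queued" => Ok(JobStatus::Queued),
            "processing" => Ok(JobStatus::Processing),
            "done" => Ok(JobStatus::Done),
            "failed" => Ok(JobStatus::Failed),
            other => Err(format!("unknown job status: {other}")),
        }
    }
}

/// One transcript entry, e.g. { "speaker": "Alex", "ts": 2, "text": "..." }.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Segment {
    pub speaker: String,
    pub ts: u32, // seconds from the start of the recording
    pub text: String,
}

#[derive(Debug, Clone, PartialEq, Eq)]
pub struct ActionItem {
    pub assignee: String,
    pub item: String,
}

#[derive(Debug, Clone, Default, PartialEq, Eq)]
pub struct Summary {
    pub key_points: Vec<String>,
    pub decisions: Vec<String>,
    pub action_items: Vec<ActionItem>,
}

/// Body of GET /api/result/{job_id}.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct MeetingResult {
    pub title: String,
    pub duration_sec: u32,
    pub transcript: Vec<Segment>,
    pub summary: Summary,
    /// Assumed optional, since the Notion push runs in the background.
    pub notion_url: Option<String>,
}
```

Keeping the status strings in one place means the client's polling code and the server's handlers cannot drift apart on spelling.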

💻 Iced recording app

Single Rust binary. One codebase runs on M2 Mac and Arch. Native, no WebKit, no Electron.

✓ Iced native GUI — cosy widgets, GPU-accelerated rendering (wgpu)
✓ cpal mic capture — cross-platform audio input
✓ Opus encoding — ~2MB for a 30-min meeting
✓ HTTP client — reqwest to POST audio + poll status
✓ In-app tabs — toggle between transcript, AI summary, history
✓ Auto-discovery — mDNS first, Tailscale fallback
✗ No tray icon — just a window, like a proper app
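The "~2MB for a 30-min meeting" figure implies a fairly low Opus bitrate. A small sketch of the arithmetic, assuming a constant bitrate and ignoring Ogg container overhead (both function names are hypothetical):

```rust
/// Encoded size in bytes for a constant-bitrate stream.
pub fn encoded_size_bytes(bitrate_bps: u64, duration_secs: u64) -> u64 {
    bitrate_bps * duration_secs / 8
}

/// Average bitrate (bits per second) implied by a file size and a duration.
pub fn implied_bitrate_bps(size_bytes: u64, duration_secs: u64) -> u64 {
    size_bytes * 8 / duration_secs
}
```

For a 1,800-second meeting, `implied_bitrate_bps(2_000_000, 1_800)` gives 8,888 bit/s, so the 2 MB target corresponds to roughly 9 kbit/s. At 16 kbit/s the same meeting would be `encoded_size_bytes(16_000, 1_800)` = 3,600,000 bytes, still small enough to POST in one request.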

🖥️ Axum API server

Your Arch + 3080 desktop. Runs an HTTP server instead of a file-watch daemon. Receives audio, returns results.

✓ Axum (Rust) — async, fast, type-safe API
✓ POST → queue → process — receives Opus, decodes to WAV, spawns pipeline
✓ whisper.cpp (CUDA) — 30-min meeting → ~90 sec
✓ Ollama API — summarises transcript to structured notes
✓ Notion push — background task, writes to tingies DBs
✓ Job queue — concurrent meetings don't collide
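One way to keep concurrent meetings from colliding is to give each upload its own ID and track its state in a shared map. The sketch below illustrates the idea with std threads standing in for tokio tasks; `JobQueue`, `Status`, and the numeric IDs are all hypothetical, and the closure passed to `submit` is a placeholder for the real decode, transcribe, and summarise pipeline.

```rust
use std::collections::HashMap;
use std::sync::atomic::{AtomicU64, Ordering};
use std::sync::{Arc, Mutex};
use std::thread::{self, JoinHandle};

#[derive(Debug, Clone, PartialEq, Eq)]
pub enum Status {
    Processing,
    Done(String),   // finished output, e.g. the transcript
    Failed(String), // error message
}

/// In-memory job table. Cloning shares the same underlying state.
#[derive(Clone, Default)]
pub struct JobQueue {
    next_id: Arc<AtomicU64>,
    jobs: Arc<Mutex<HashMap<u64, Status>>>,
}

impl JobQueue {
    /// Registers a job, runs `work` on its own thread, and returns the job ID
    /// plus a handle the caller may join (the handle is mainly for tests).
    pub fn submit<F>(&self, work: F) -> (u64, JoinHandle<()>)
    where
        F: FnOnce() -> Result<String, String> + Send + 'static,
    {
        let id = self.next_id.fetch_add(1, Ordering::Relaxed);
        self.jobs.lock().unwrap().insert(id, Status::Processing);

        let jobs = Arc::clone(&self.jobs);
        let handle = thread::spawn(move || {
            let outcome = match work() {
                Ok(output) => Status::Done(output),
                Err(why) => Status::Failed(why),
            };
            jobs.lock().unwrap().insert(id, outcome);
        });
        (id, handle)
    }

    /// What GET /api/status/{job_id} would report; `None` for unknown IDs.
    pub fn status(&self, id: u64) -> Option<Status> {
        self.jobs.lock().unwrap().get(&id).cloned()
    }
}
```

Because each job only ever writes to its own map entry, two meetings uploaded at the same time cannot overwrite each other's results. A GPU-bound pipeline would probably also want a limit on how many jobs run at once, which this sketch leaves out.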

📍 Auto-discovery

On local LAN

mDNS broadcast

Server advertises granola-server._http._tcp.local via Avahi (Arch) / Bonjour (macOS)

Client resolves

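Uses mdns-sd (Rust crate) to browse for the service and resolve http://granola-server.local:9800 in ~200ms. (libmdns is a responder-only crate: it can advertise a service but cannot discover one.)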

Over Tailscale (anywhere)

Fallback trigger

mDNS timed out (2s) → client tries Tailscale MagicDNS

Tailscale DNS

Resolves granola-server.tailnet-name.ts.net:9800 — encrypted WireGuard transport

💡 Both machines on your Tailscale tailnet? The Tailscale path handles local + remote seamlessly. mDNS is a nice LAN-only optimisation for the ~2s it saves on startup.
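The fallback order described above (mDNS first with a 2-second budget, then Tailscale MagicDNS) can be sketched independently of any particular resolver. In this sketch the two lookups are passed in as closures, so `discover` and `Route` are hypothetical names and the actual mDNS and DNS calls are left to the caller.

```rust
use std::time::Duration;

/// Which path the client ended up using to reach the server.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum Route {
    Lan(String),       // base URL found via mDNS
    Tailscale(String), // base URL found via Tailscale MagicDNS
}

/// Tries mDNS with the given timeout, then falls back to Tailscale.
/// The Tailscale lookup is only invoked if mDNS returns nothing.
pub fn discover<M, T>(mdns: M, tailscale: T, mdns_timeout: Duration) -> Option<Route>
where
    M: FnOnce(Duration) -> Option<String>,
    T: FnOnce() -> Option<String>,
{
    if let Some(url) = mdns(mdns_timeout) {
        return Some(Route::Lan(url));
    }
    tailscale().map(Route::Tailscale)
}
```

Injecting the lookups keeps the ordering logic testable without a network, and makes it easy to swap the order if the Tailscale-only setup mentioned in the tip turns out to be enough.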

⏩ End-to-end flow

Record

Click record in the Iced app, or hit a hotkey. cpal captures mic audio, which is encoded to Opus in memory. A timer counts up.

Upload

Stop recording → app POSTs Opus bytes to /api/record. Server returns a job_id. App shows "uploaded, processing…"

Transcribe

The Axum server spawns a background task that decodes the Opus audio to 16 kHz mono WAV (the input whisper.cpp expects) and runs whisper.cpp with the large-v3-turbo model. A 30-min meeting takes ~90 seconds on CUDA.

Summarise

Transcript goes to Ollama API with a prompt: extract action items, decisions, key discussion points, and open questions. Returns structured JSON.

Display + push

App polls GET /api/status/{job_id} until status = "done", then fetches GET /api/result/{job_id}. Transcript and summary render in the in-app tabs. In the background, the server pushes the same data to the Notion tingies DB.

Done

App shows "Ready" with a link to the Notion page. All past recordings accessible in the History tab.
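The polling part of this flow is simple enough to sketch without an HTTP client. In the example below the status request is a closure, so the loop can be exercised offline; `Poll`, `poll_until_done`, the attempt limit, and the `Failed` variant are assumptions rather than part of the protocol as specified.

```rust
use std::thread::sleep;
use std::time::Duration;

/// One reply from GET /api/status/{job_id}, reduced to what the loop needs.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum Poll {
    Processing(String), // progress text such as "42%"
    Done,
    Failed(String),
}

/// Calls `fetch_status` until the job finishes, sleeping `interval` between
/// attempts. Returns the number of polls it took, or an error string if the
/// request fails, the job fails, or `max_attempts` is exhausted.
pub fn poll_until_done<F>(
    mut fetch_status: F,
    interval: Duration,
    max_attempts: u32,
) -> Result<u32, String>
where
    F: FnMut() -> Result<Poll, String>,
{
    for attempt in 1..=max_attempts {
        match fetch_status()? {
            Poll::Done => return Ok(attempt),
            Poll::Failed(why) => return Err(format!("job failed: {why}")),
            Poll::Processing(_) => sleep(interval),
        }
    }
    Err(format!("gave up after {max_attempts} polls"))
}
```

In the Iced app this loop would run off the UI thread, with each `Processing` reply forwarded to the status line. Once it returns `Ok`, the client can fetch GET /api/result/{job_id}.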

🧱 Tech stack

Component	Runs on	Lang	Key deps
App GUI	macOS, Arch	Rust	iced + wgpu
Audio capture	macOS, Arch	Rust	cpal + an Opus encoder crate such as opus or audiopus (symphonia is decode-only and has no Opus support)
HTTP client	macOS, Arch	Rust	reqwest
API server	Arch	Rust	axum + tokio + serde
Transcription	Arch	C++	whisper.cpp + CUDA
Summarisation	Arch	n/a	Ollama API (local model)
Notion push	Arch	Rust	reqwest + notion API
mDNS discovery	macOS, Arch	Rust	libmdns / mdns-sd

Why all Rust? Shared types between client and server (same crate for API structs, job IDs, audio encoding). One build system. No language boundary headaches.

📦 Crate layout

granola/
├── Cargo.toml              # workspace root
├── crates/
│   ├── granola-core/          # shared types: JobId, JobStatus, Transcript, Summary
│   │   ├── Cargo.toml
│   │   └── src/lib.rs
│   ├── granola-client/        # Iced app: mic capture + encode + HTTP + GUI
│   │   ├── Cargo.toml
│   │   └── src/
│   │       ├── main.rs          # entry + iced runtime
│   │       ├── ui/              # widget tree (record/stop, tabs, transcript view)
│   │       ├── audio/           # cpal capture → Opus encoder
│   │       ├── client.rs        # reqwest API client
│   │       └── discovery.rs     # mDNS + Tailscale resolver
│   └── granola-server/        # Axum API + whisper + Ollama + Notion push
│       ├── Cargo.toml
│       └── src/
│           ├── main.rs          # axum server bootstrap
│           ├── routes/          # POST /record, GET /status, GET /result
│           ├── pipeline/        # whisper runner, Ollama summariser
│           ├── notion.rs        # Notion API client
│           └── queue.rs         # in-memory job queue (tokio tasks)
└── scripts/
    └── setup.sh                # install whisper.cpp model, configure avahi

🗺️ Build phases

Phase 1 · Core pipeline

Server-side: receive audio → transcribe → summarise → push

→ granola-core shared types
→ granola-server Axum API scaffold
→ whisper.cpp integration (subprocess)
→ Ollama summariser
→ Notion push
→ Test with curl

Working API, testable with curl
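For the subprocess integration in this phase, the server needs to build a whisper.cpp command line for each job. The sketch below assembles one with `std::process::Command`. The `-m`, `-f`, and `-oj` flags are taken from whisper.cpp's CLI as commonly documented, and the binary name and model path in the usage note are assumptions, so check them against `--help` for the build actually installed.

```rust
use std::path::Path;
use std::process::Command;

/// Builds (but does not run) a whisper.cpp invocation for one WAV file.
/// `binary` is the path to the whisper.cpp CLI executable and `model` the
/// path to a ggml model file; both are supplied by the caller.
pub fn whisper_command(binary: &Path, model: &Path, wav: &Path) -> Command {
    let mut cmd = Command::new(binary);
    cmd.arg("-m")
        .arg(model) // model file to load
        .arg("-f")
        .arg(wav) // input audio (WAV decoded from the uploaded Opus)
        .arg("-oj"); // request JSON output
    cmd
}
```

A caller might use `whisper_command(Path::new("whisper-cli"), Path::new("models/ggml-large-v3-turbo.bin"), &wav_path).status()` and treat a non-zero exit status as a failed job. Building the command separately from running it keeps the argument list easy to test.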

Phase 2 · Iced client

Native GUI: record → upload → display results

→ granola-client app scaffold
→ Iced window + tab layout
→ Mic capture with cpal + Opus
→ HTTP client + status polling
→ Auto-discovery (mDNS + Tailscale)

Record on laptop → view in app

Phase 3 · Polish

Hotkeys, history, reliability

→ Global hotkey toggle
→ History tab (past recordings)
→ Offline queue (record while server's away)
→ Speaker diarisation
→ LLM model swapping in settings

Daily driver ready
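The offline queue in this phase could be as simple as a FIFO of finished recordings that is flushed whenever the server becomes reachable. A minimal in-memory sketch follows; `OfflineQueue` and `PendingUpload` are hypothetical names, the upload itself is a closure, and a real client would presumably persist the queue to disk so recordings survive a restart.

```rust
use std::collections::VecDeque;

/// A finished recording waiting for the server to come back.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct PendingUpload {
    pub recorded_at: String, // value for the X-Recorded header
    pub opus_bytes: Vec<u8>,
}

#[derive(Debug, Default)]
pub struct OfflineQueue {
    pending: VecDeque<PendingUpload>,
}

impl OfflineQueue {
    pub fn push(&mut self, recording: PendingUpload) {
        self.pending.push_back(recording);
    }

    pub fn len(&self) -> usize {
        self.pending.len()
    }

    /// Uploads oldest-first and stops at the first failure, so recordings
    /// stay in order and nothing is dropped. Returns how many were sent.
    pub fn flush<F>(&mut self, mut upload: F) -> usize
    where
        F: FnMut(&PendingUpload) -> Result<String, String>,
    {
        let mut sent = 0;
        while let Some(next) = self.pending.front() {
            if upload(next).is_err() {
                break;
            }
            self.pending.pop_front();
            sent += 1;
        }
        sent
    }
}
```

A recording is only removed after its upload succeeds, so a server that disappears mid-flush leaves the remaining items queued for the next attempt.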

🤔 Key decisions made

Decision	Choice	Why
GUI framework	Iced	Native Rust. No WebKit. Proper Wayland support. One codebase for Mac + Arch.
App model	Window (no tray)	No Wayland tray headaches. Iced window is native and clean.
Communication	HTTP API (Axum)	Lets the app fetch results back. Polling is simple and reliable. Shared Rust types between client + server.
Process location	Desktop (3080)	20-30× realtime whisper. Desktop is always-on. Single server to maintain.
Audio format	Opus → WAV	Opus for POST (tiny), server decodes to WAV for Whisper
Discovery	mDNS primary + Tailscale fallback	Sub-second LAN discovery. Tailscale handles everything else.
LLM	Local Ollama	Private. 3080 runs local models easily. No API costs.
Output	Notion + in-app display	Both. App shows instant results. Notion is the durable archive.

❓ Still to decide

• Ollama model? Which local model for summarisation? Llama 3.x / Mistral / Qwen? 8B fits comfortably on the 3080.
• Desktop always on? If not, we should build a local recording queue in the client so you can record offline and upload later.
• Hotkey? Start/stop with a global keybind without focusing the window? (Possible with Iced + a key listener crate)
• Starting point? Phase 1 (server API + curl testing) gives you a working pipeline fastest. Phase 2 (Iced app) is the fun UI bit.

Granola architecture v2 · updated 16 Jun 2026